A Corpus-Driven Approach To Language Contact: E...
Previous PhDs I have supervised include:
A corpus-based examination of the concept of political correctness in British broadsheet newspapers
The language of marriage rituals in Botswana
Combining corpus approaches and CDA to examine discourses of terrorism in the British and Chinese popular press
Combining corpus approaches and CDA to examine discourses of homophobia in a right-wing political organisation
A corpus study to compare lexical bundle use of Chinese learners of English with native speakers of English
A corpus study of keywords to examine gender identity in British and Malaysian children's writing
The construction of gender identity in Iranian bloggers
A corpus-based comparison of two academic books about Wahhabi Islam
Based on the comparative method, this article conducts a feature-by-feature comparison between the corpus-based and corpus-driven approaches in corpus linguistics. The two approaches are similar in that both use a corpus as the primary tool for collecting and analysing data. They differ in four respects: top-down vs. bottom-up orientation, selection and sampling methods, opposing views on corpus annotation, and paradigmatic claims. The most significant advantages of the corpus-based approach lie in the value added by annotation and in flexible corpus size. Its disadvantages include subjective or incorrect annotation and overreliance on intuition. The primary advantages of the corpus-driven approach include an objective perspective, a novel methodology, and full exploitation of corpus evidence, while its weaknesses include the difficulty of collecting meaningful data and formulating a theory from the corpus alone, and its refusal to annotate the corpus.
Creativity is a complex, multi-faceted concept encompassing a variety of related aspects, abilities, properties and behaviours. If we wish to study creativity scientifically, then a tractable and well-articulated model of creativity is required. Such a model would be of great value to researchers investigating the nature of creativity and in particular, those concerned with the evaluation of creative practice. This paper describes a unique approach to developing a suitable model of how creative behaviour emerges that is based on the words people use to describe the concept. Using techniques from the field of statistical natural language processing, we identify a collection of fourteen key components of creativity through an analysis of a corpus of academic papers on the topic. Words are identified which appear significantly often in connection with discussions of the concept. Using a measure of lexical similarity to help cluster these words, a number of distinct themes emerge, which collectively contribute to a comprehensive and multi-perspective model of creativity. The components provide an ontology of creativity: a set of building blocks which can be used to model creative practice in a variety of domains. The components have been employed in two case studies to evaluate the creativity of computational systems and have proven useful in articulating achievements of this work and directions for further research.
The aim of the work reported in this paper is to examine the nature of creativity and to identify within it a set of components, representing key dimensions, that are recognised across a combination of different viewpoints. We present a novel, empirical approach to the problem of modelling how creative behaviour is manifested, that focuses on what is revealed about our understanding of creativity and its attributes by the words we use to discuss and debate the nature of the concept. Analysis of this language provides a sound basis for constructing a sufficiently detailed and comprehensive model of creativity [13, 14]. The current work is intended as a significant, methodological contribution towards addressing the Grand Challenge of evaluation in computational creativity research. It should provide researchers with a firm foundation for evaluating exactly how creative so-called creative systems actually are.
In our approach, statistical language processing techniques are used to identify words significantly associated with creativity in a corpus of academic papers on the subject. A corpus spanning some 60 years of research into the nature of creativity was assembled. The papers were gathered from a wide variety of disciplines including psychology, educational testing and computational creativity, amongst others. The language data drawn from this collection was then analysed and contrasted with data from a corpus of matched papers on subjects unrelated to creativity. From this analysis, a set of 694 creativity words was identified, where each creativity word appeared significantly more often than expected in the corpus of creativity papers. A measure of lexical similarity provided a basis for clustering the creativity words into groups of words with similar or shared aspects of meaning. Through inspection of these clusters, a total of fourteen key components of creativity was identified, where each represents a key theme or attribute of creativity. The set of components yields information about the nature of creativity, based on what is collectively emphasised in discussions about the concept.
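The comparison of a target corpus against a reference corpus described above can be sketched in a few lines of Python. The passage does not name the significance statistic used, so this sketch assumes the log-likelihood ratio that is common in keyword analysis. The function names, the whitespace-tokenised input and the critical value of 10.83 (chi-square, one degree of freedom, p < 0.001) are illustrative assumptions, not details from the study.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness score (an assumed statistic, not the study's).

    a: frequency of the word in the target corpus
    b: frequency of the word in the reference corpus
    c: total tokens in the target corpus
    d: total tokens in the reference corpus
    """
    # Expected frequencies if the word were equally likely in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(target_tokens, reference_tokens, critical=10.83):
    """Return (word, score) pairs overused in the target corpus, highest first."""
    target = Counter(target_tokens)
    reference = Counter(reference_tokens)
    c = sum(target.values())
    d = sum(reference.values())
    results = []
    for word, a in target.items():
        b = reference.get(word, 0)
        # Keep only words whose relative frequency is higher in the target.
        if a / c > b / d:
            score = log_likelihood(a, b, c, d)
            if score >= critical:
                results.append((word, score))
    return sorted(results, key=lambda pair: -pair[1])
```

Calling `keywords(creativity_tokens, reference_tokens)` on two token lists would return the candidate words ranked by score. Clustering those words by lexical similarity, as the passage goes on to describe, would be a separate step.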
Our approach makes use of an empirical study and analysis of the language used to talk about creativity in order to gather and collate knowledge about the concept. In addition, following from the observations above, a confluence approach to creativity is adopted [16, 26, 52]. This works on the principle that creativity results from several components converging and goes on to examine what these components are. Taking this approach in conjunction with the application of tools from computational linguistics and statistical analysis allows a wider disciplinary spectrum of perspectives on creativity to be captured than has previously been attempted. This is achieved by breaking down the whole into smaller and more tractable constituent parts identified through a broad cross-disciplinary examination of creativity research.
This paper has described the methods used to identify a set of components of creativity using corpus-based, statistical language processing techniques. The motivation for the work is the need for a shared, comprehensive and multi-perspective model of creativity. Such a model should be of great value to researchers investigating the nature of creativity and in particular those concerned with the evaluation of creative practice. More broadly, the inter-disciplinary approach described here exemplifies a general approach to the investigation and representation of semantically fuzzy and essentially-contested concepts. For this reason, we expect that it will interest researchers investigating computational methods for analysing and representing other such concepts.
Manchester has pioneered the corpus-based approach to studying translation through the establishment of the Translational English Corpus, the largest corpus of translated language anywhere in the world. Another strand of our research uses diachronic corpus studies to investigate the relationship between translation, as language contact, and language change.
The present study implemented a genre-based approach to analyze the rhetorical structure of English language research articles (RAs): specifically, the Introduction-Methods-Results-Discussion-Conclusion (I-M-R-D-C) sections. Next, lexical bundles (LBs) associated with patterns of moves were identified by applying a corpus-driven approach. The study analyzed two corpora of 30 RAs purposely selected from 16 peer-reviewed journals of applied linguistics published in Saudi Arabia and internationally during the years of 2011-2016. First, a genre-based approach was used to identify the move structures of RAs through analyzing different RA sections by different models. Next, lexical bundles associated with each identified move in each IMRDC section were analyzed using a corpus-driven approach, based on structural and functional taxonomies. The study findings showed that both corpora share similarities and differences related to rhetorical structures and lexical bundles. These findings have pedagogical implications for novice writers, graduate students, and English for Academic Purposes (EAP) instruction, including raising awareness of rhetorical structures and LBs in academic writing for publication, which could help produce more successful publishable research articles.
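Corpus-driven identification of lexical bundles is usually operationalised as extracting recurrent word sequences that meet a frequency threshold and occur in a minimum number of texts. The following is a minimal sketch of that idea. The bundle length of four words, the thresholds, the whitespace tokenisation and the function name are all assumptions for illustration; the study's actual settings are not given in the abstract.

```python
from collections import Counter, defaultdict


def lexical_bundles(texts, n=4, min_freq=3, min_range=2):
    """Return {bundle: frequency} for n-word sequences that recur across texts.

    texts: list of strings, one per research article (or article section)
    n: bundle length in words (assumed value)
    min_freq: minimum total occurrences in the corpus (assumed value)
    min_range: minimum number of different texts containing the bundle (assumed value)
    """
    freq = Counter()
    text_range = defaultdict(set)
    for index, text in enumerate(texts):
        # Simplification: lowercase and split on whitespace, ignoring punctuation.
        tokens = text.lower().split()
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            freq[gram] += 1
            text_range[gram].add(index)
    return {
        " ".join(gram): count
        for gram, count in freq.items()
        if count >= min_freq and len(text_range[gram]) >= min_range
    }
```

In a move-based analysis like the one described, the function could be run separately on the texts belonging to each move, so that the resulting bundles can then be classified with structural and functional taxonomies.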
The distinction between corpus-based and corpus-driven language study was introduced by Tognini-Bonelli (2001). Corpus-based studies typically use corpus data in order to explore a theory or hypothesis, aiming to validate it, refute it or refine it. The definition of corpus linguistics as a method underpins this approach.
In her current research, she continues to apply data-driven approaches to text processing and corpus-based methods to text analysis, and remains interested in analyzing classroom discourse. In her earlier work, she reported on ways in which teachers use language differently from students in varying disciplines and levels of instruction in a university setting. She also documented turn-taking patterns in university classes, and reported on the relationship between interactivity and lexical and grammatical patterns used by teachers and students. She looked at lexical bundles in discourse structure, and also explored student talk in different contexts in an academic setting. She also explored cultural differences in the way discourse is organized in university classes in three disciplines in English Medium Instructional (EMI) settings. While she continues to be interested in university classroom discourse, she is also analyzing academic vocabulary use, lexical bundle use, syntactic complexity features, and stance in college-level writing by students in English as a Foreign Language (EFL) contexts. Her most recent publications (2021) are co-authored with her former graduate students. The journal article with Ryan Young uses a diachronic keyword analysis of Star Trek to show changing gender roles as portrayed in telecinematic discourse, and the one with Katy Bailey explores lexical differences between papers written by second language students in a US-based writing program and outsourced papers.