University of Nottingham Ningbo China
School of Education and English

Corpus of Chinese Academic Written and Spoken English (CAWSE) 

***ANNOUNCEMENT: After August 2019, the updates of the CAWSE project will only be available on ***




A corpus is a collection of digitised texts from which researchers can draw evidence to identify features and patterns.

The Corpus of Chinese Academic Written and Spoken English (CAWSE) project aims to build a large collection of students’ English language samples from the University of Nottingham Ningbo China (UNNC). A variety of assessment tasks (both written and spoken) and speech events (spoken and multimodal) will be collected from the preliminary-year programme at UNNC. While the majority of the subjects will be L1 Chinese students, there may be a small number of students from other L1 backgrounds, such as Russian or Indonesian, reflecting the student population of UNNC. The project will collect data for up to three years (2016-2018), and expansion of the corpus, including transcription, tagging and preliminary annotation, will continue until Year 5 (2020). The corpus is expected to contain no fewer than one million tokens and no fewer than 1,000 pieces of writing or 100 speech events. The project also includes a multimodal pilot corpus.

The UNNC CAWSE, once completed, will provide a rich resource for investigating distinctive characteristics of UNNC students’ language samples in terms of lexical, syntactic or discoursal features across different band scores, assessment tasks, genres and other contexts. The final product of UNNC CAWSE, along with a technical manual documenting the development process, will be available to researchers and practitioners. A selection of the accompanying audio/video recordings will also be made accessible to staff and students at UNNC for research and teaching purposes. External access may be considered in the future for registered users.

We also welcome any researchers or practitioners, including research students, who wish to use the CAWSE raw data for a research project and who may be able to offer some transcription in return. If you have any questions, please contact Dr. Yu-Hua Chen (


The project is funded by the Ningbo 3315 plan (1 million RMB) awarded to Dr Yu-Hua Chen as part of the government’s innovation development scheme to raise the profile of Ningbo. The corpus development is also assisted by funding (200k RMB) from The University of Nottingham Ningbo China Matching Funding Scheme.


