(BAWE) British Academic Written English Corpus

About BAWE

The British Academic Written English Corpus (BAWE) was collected as part of the project 'An Investigation of Genres of Assessed Writing in British Higher Education'. The project was funded by the Economic and Social Research Council (2004-2007, project number RES-000-23-0800).

The corpus is a record of proficient university-level student writing at the turn of the 21st century. It contains just under 3,000 good-standard student assignments (6,506,995 words). Holdings are fairly evenly distributed across four broad disciplinary areas (Arts and Humanities, Social Sciences, Life Sciences and Physical Sciences) and across four levels of study (undergraduate and taught master's level). Thirty main disciplines are represented. This Excel spreadsheet contains information about the corpus holdings; a more detailed spreadsheet is available from the Oxford Text Archive. Parsed versions of the corpus, created by Phil Durrant using the Stanford CoreNLP parser, are available here: https://phildurrant.net/parsed-bawe-corpus/

Selected links to the corpus are available through the BAWE Quicklinks project, designed for EAP teachers who would like to use corpus data in their feedback to students. 

The corpus is available free of charge to researchers who agree to the conditions of use and who register with the Oxford Text Archive. It can also be searched online via the Sketch Engine open site or Lextutor. Please contact Hilary Nesi for further information, or if you have any queries or comments relating to the project.

Project team
