The general mark-up design of the ELC follows the choices made for the BASE corpus: the files are encoded in XML and use TEI-compliant structural mark-up.
The structural elements are:
- Container elements – these are the tags used in the header metadata, together with the speaker utterance tags in the body of the transcript, which identify the sex and academic status of each speaker. Each speaker is given a two-letter code: nf and nm mark a female and a male non-student (the lecturer), while <u who="sm"> is a male student, <u who="sf"> is a female student, and <u who="ss"> is a group of students. The code is followed by a four-digit unique identifier, such as <u who="nm1003">, which signifies a male lecturer from the UK component who is identified as 1003. The close of each utterance is marked by the end tag </u>. (Container elements are also used in the annotation system.)
- Empty elements – six of these are commonly used: <gap reason="inaudible"/>, <gap reason="pause"/>, <vocal desc="laughter"/>, <vocal desc="voice from video"/>, <event desc="writes on board"/>, and <event desc="draws on board"/>. Other empty elements record unusual occurrences, for example <event desc="drops pen"/>, which has been inserted to make sense of the “oops” that follows. All pauses of perceivable length are recorded as <gap reason="pause"/>. Significant gaps, such as breaks in the recording and stretches of inaudible speech, are marked in the same way, with a dur attribute added to record their length, for example <gap reason="break in recording" dur="00:01:12"/>.
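As an illustration of how these elements might be read by a program, the following is a minimal Python sketch using only the standard library. The transcript fragment, the helper names (decode_who, dur_to_seconds, summarise) and the wrapping <body> element are all hypothetical and written for this example. The sketch also assumes that tags carry no XML namespace and that dur values follow the hh:mm:ss pattern suggested by the single example above; real ELC files may differ on both points.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical fragment written for illustration; it is not taken from an ELC file.
SAMPLE = """<body>
<u who="nm1003">so the load is <gap reason="pause"/> shared <event desc="drops pen"/> oops</u>
<u who="sf">is that <gap reason="inaudible"/></u>
<u who="ss"><vocal desc="laughter"/></u>
<gap reason="break in recording" dur="00:01:12"/>
</body>"""

# Two-letter speaker codes as described in the text above.
CODES = {
    "nf": ("non-student", "female"),
    "nm": ("non-student", "male"),
    "sf": ("student", "female"),
    "sm": ("student", "male"),
    "ss": ("student", "group"),
}
# A code, optionally followed by a four-digit unique identifier.
WHO_PATTERN = re.compile(r"^(nf|nm|sf|sm|ss)(\d{4})?$")

EMPTY_TAGS = {"gap", "vocal", "event"}


def decode_who(who):
    """Split a who attribute into academic status, sex and identifier."""
    match = WHO_PATTERN.match(who)
    if match is None:
        raise ValueError(f"unrecognised speaker code: {who!r}")
    code, ident = match.groups()
    status, sex = CODES[code]
    return {"status": status, "sex": sex, "id": ident}


def dur_to_seconds(dur):
    """Convert a dur value to seconds, assuming an hh:mm:ss layout."""
    hours, minutes, seconds = (int(part) for part in dur.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def summarise(xml_text):
    """Return the decoded speakers and a count of each empty element."""
    root = ET.fromstring(xml_text)
    speakers = [decode_who(u.get("who", "")) for u in root.iter("u")]
    empties = Counter()
    for element in root.iter():
        if element.tag in EMPTY_TAGS:
            # gap elements use reason; vocal and event elements use desc.
            label = element.get("reason") or element.get("desc")
            empties[(element.tag, label)] += 1
    return speakers, empties


speakers, empties = summarise(SAMPLE)
for speaker in speakers:
    print(speaker)
for (tag, label), count in sorted(empties.items()):
    print(tag, label, count)
```

The speaker codes are kept in an explicit lookup table rather than decoded letter by letter, so that any code not described above raises an error instead of being silently accepted.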
Each file header contains a file description (including title and citation information along with a source description of recording and transcription information), a description of encoding, and a profile description of non-bibliographic information (such as the number of participants, the meaning of unique identifiers, level and module).
Here is an example of a file header used in the ELC.