Statistics and Transcription Conventions
Our transcriptions are not written in standard English. Understand the spelling and punctuation system, and how it can be useful.
The JSCC was transcribed using the standard MICASE transcription conventions, with one exception: in this corpus, speaker information is double-coded. In addition to the standard speaker (S) number, which corresponds to the order in which speakers appear in each transcript, we have added stable participant (PID) initials; each identifiable speaker is marked in the header with a unique set of initials. This allows investigators, if they wish, to track the rhetorical preferences and “speaking styles” of individuals throughout the corpus.
NOTE: Not every speaker could be identified. When the speaker is unknown, you will see only an S-number and no PID.
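To illustrate how the double coding can be used, here is a minimal Python sketch that groups speaker turns by PID across transcripts. It assumes the turns have already been extracted from the transcript files; the Turn record, the turns_by_participant function, the file names, the initials, and the sample utterances are all hypothetical and are not part of the JSCC or MICASE tooling.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Turn:
    """One speaker turn (hypothetical record layout, not the JSCC file format)."""
    transcript: str        # transcript file the turn comes from
    s_number: str          # per-transcript speaker label, e.g. "S1"
    pid: Optional[str]     # stable participant initials, or None if unidentified
    text: str              # transcribed words of the turn


def turns_by_participant(turns):
    """Group turns by PID; turns without a PID are returned separately."""
    grouped = defaultdict(list)
    unidentified = []
    for turn in turns:
        if turn.pid is None:
            # Unknown speakers carry only an S-number, which is not
            # comparable across transcripts, so they are not merged.
            unidentified.append(turn)
        else:
            grouped[turn.pid].append(turn)
    return dict(grouped), unidentified


# Made-up sample data: the same participant appears under different
# S-numbers in two transcripts but keeps the same PID.
sample = [
    Turn("file01", "S1", "AB", "so the first point is"),
    Turn("file01", "S2", None, "could you repeat that"),
    Turn("file07", "S3", "AB", "i have a question about"),
]

by_pid, unknown = turns_by_participant(sample)
```

Because the S-number restarts in every transcript, only the PID can link a speaker's turns across files, which is why the sketch keys on the PID and sets unidentified turns aside rather than guessing.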
The JSCC contains 23 transcribed files. Each file contains one lecture, averaging about 20 minutes, and the corresponding question-and-answer session of about 10 minutes.
The corpus contains just over 100,000 words: 77% from the presentations and 23% from the Q&A sessions.