Anonymized Audio BNC

This drive contains all of the anonymized (muted) sound files from the Audio part of the British National Corpus, together with the associated time-aligned transcription files, which are Praat TextGrids. Each TextGrid has four tiers (layers of description): 1) phonemes (labels with their start and end times), 2) words (with their timings), 3) muted words (i.e. anonymized portions, with their timings), and 4) "errors", the locations of various kinds of suspected alignment errors; this last tier serves as quality-control data. All of these files are suitable for public release according to the original agreements with the participants. The copyright and access terms governing (a) the recordings and (b) the time-aligned transcription files we have created are spelled out at http://www.phon.ox.ac.uk/AudioBNC.
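
As a rough illustration of the four-tier layout described above, the following Python sketch reads the interval tiers of a TextGrid into a dictionary. It rests on several assumptions that this README does not confirm: that the files use Praat's long text format, that all four tiers are interval tiers, and that the tiers are named "phone", "word", "muted" and "error". The sample text, its labels and its timings are invented for the example; check an actual file from the drive before relying on any of these names.

```python
import re

# Invented sample in Praat's long text format. Tier names, labels and
# timings are hypothetical; the real files may differ.
SAMPLE = '''File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 0.5
tiers? <exists>
size = 4
item []:
    item [1]:
        class = "IntervalTier"
        name = "phone"
        xmin = 0
        xmax = 0.5
        intervals: size = 2
        intervals [1]:
            xmin = 0
            xmax = 0.2
            text = "h"
        intervals [2]:
            xmin = 0.2
            xmax = 0.5
            text = "@U"
    item [2]:
        class = "IntervalTier"
        name = "word"
        xmin = 0
        xmax = 0.5
        intervals: size = 1
        intervals [1]:
            xmin = 0
            xmax = 0.5
            text = "hello"
    item [3]:
        class = "IntervalTier"
        name = "muted"
        xmin = 0
        xmax = 0.5
        intervals: size = 1
        intervals [1]:
            xmin = 0
            xmax = 0.5
            text = ""
    item [4]:
        class = "IntervalTier"
        name = "error"
        xmin = 0
        xmax = 0.5
        intervals: size = 1
        intervals [1]:
            xmin = 0
            xmax = 0.5
            text = ""
'''


def parse_textgrid(text):
    """Return {tier_name: [(start, end, label), ...]} for interval tiers."""
    tiers = {}
    current = None        # interval list of the tier being read
    xmin = xmax = None    # most recently seen boundary times
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r'name = "(.*)"$', line)
        if m:
            current = tiers.setdefault(m.group(1), [])
            continue
        m = re.match(r'(xmin|xmax) = ([-+0-9.eE]+)$', line)
        if m:
            if m.group(1) == "xmin":
                xmin = float(m.group(2))
            else:
                xmax = float(m.group(2))
            continue
        m = re.match(r'text = "(.*)"$', line)
        if m and current is not None:
            # Praat writes a literal double quote inside a label as "".
            current.append((xmin, xmax, m.group(1).replace('""', '"')))
    return tiers


tiers = parse_textgrid(SAMPLE)
for name, intervals in tiers.items():
    print(name, intervals)
```

To try this on a real file, read its contents into a string and pass that to parse_textgrid. Praat can save TextGrids in more than one text encoding and also in a short text format, neither of which this sketch handles, so an established TextGrid library may be a better choice for serious work.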