Elan/Praat Machine Segmenting

My number one hated stage in transcription work is segmenting. I would sit there fuming while manually segmenting the recordings I made before I could even start transcribing. It was frustrating because it seemed like something that a machine could do to a relatively good approximation, instead of me sitting there for hours doing it for each file!

Luckily, it turns out that between Praat and ELAN, you can very easily have a decent approximation of segmentation done for you. It's not perfect, but it saves HEAPS of time. If you have a ton of recordings to segment into units before you can transcribe, this is the process for you!

Thank you to T. Mark Ellison for helping out with this heaps.

Praat stage

First, load the sound file that you want to segment into Praat (Open > Read from File). Then create a Praat TextGrid file based on silences:

01.jpg

These are the settings that worked best for us after a few trials:

02.jpg

The resulting TextGrid should look something like this:

Screen Shot 2017-05-01 at 11.14.15 am.png

The *** marks the intervals where Praat has detected sound. It’s not perfect, but it gives a pretty good shot at things, and you can adjust the boundaries manually in ELAN. Save this TextGrid now.
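If you are curious what silence-based segmenting boils down to, here is a minimal Python sketch of the general idea. It is not Praat's actual algorithm, and it does not use the settings from the screenshot above: the function name `segment_by_silence`, the per-frame loudness list and every number in the example are made up for illustration. Frames louder than a threshold count as sound, and pauses or bursts shorter than a minimum duration get absorbed into their neighbours.

```python
from itertools import groupby


def segment_by_silence(levels_db, frame_step, threshold_db, min_silent, min_sounding):
    """Return (start, end, label) intervals: "***" for sound, "" for silence.

    levels_db is a hypothetical list of loudness values, one per frame.
    """
    # 1. Mark each frame as sounding (True) or silent (False).
    flags = [level >= threshold_db for level in levels_db]

    # 2. Collapse consecutive identical flags into runs of [is_sounding, n_frames].
    runs = [[flag, len(list(group))] for flag, group in groupby(flags)]

    # 3. First relabel too-short silences as sound, then too-short sounds as
    #    silence, merging neighbouring runs with the same label after each pass.
    for target, min_dur in ((False, min_silent), (True, min_sounding)):
        for run in runs:
            if run[0] == target and run[1] * frame_step < min_dur:
                run[0] = not target
        merged = []
        for flag, n_frames in runs:
            if merged and merged[-1][0] == flag:
                merged[-1][1] += n_frames
            else:
                merged.append([flag, n_frames])
        runs = merged

    # 4. Convert frame counts into time intervals.
    intervals = []
    start = 0
    for flag, n_frames in runs:
        end = start + n_frames
        intervals.append((start * frame_step, end * frame_step, "***" if flag else ""))
        start = end
    return intervals


# Made-up loudness values, one every 0.25 s (illustration only).
example = [-60, -60, -20, -20, -60, -20, -20, -60, -60, -60, -20, -60, -60]
print(segment_by_silence(example, 0.25, -35, 0.5, 0.5))
```

In this toy example the one-frame pause inside the first stretch of speech is swallowed, and the one-frame blip near the end is treated as silence, which is roughly the behaviour you want from the minimum-duration settings.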

ELAN stage

Import your Praat text grid:

04.jpg

Cheers Hedvig for letting me know that if you tick the “exclude silences” box, you can have ELAN automatically remove any empty segments from the Praat text file:

Screen Shot 2017-05-05 at 4.15.56 pm.png

And you will have your segmented Praat text grid as a layer in ELAN, looking something like this!

Screen Shot 2017-05-01 at 11.26.42 am.png
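Conceptually, excluding silences just means throwing away every interval that has no label. The Python sketch below shows that idea on a hand-made list of (start, end, label) tuples. It is not how ELAN does it internally, and the function name `drop_empty_intervals` and the sample data are hypothetical.

```python
def drop_empty_intervals(intervals):
    """Keep only the intervals whose label contains something other than whitespace."""
    return [(start, end, label) for start, end, label in intervals if label.strip()]


# Hand-made example: "***" marks sound, an empty label marks silence.
segments = [
    (0.0, 0.5, ""),
    (0.5, 1.75, "***"),
    (1.75, 3.25, ""),
    (3.25, 4.0, "***"),
]
print(drop_empty_intervals(segments))
```

Only the two *** intervals survive, so the tier you end up with contains the stretches you actually need to transcribe.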

The longest file we tried it on was a 1-hour recording of Samoan (cheers Hedvig Skirgård from Humans Who Read Grammars for providing the file!). It took Praat about 8 minutes to segment. A 10-minute recording is done in no time.

Now be on your merry way setting up your tiers and transcribing to your heart's content 🙂

One thought on “Elan/Praat Machine Segmenting”

  1. I tried this out with a recording of one person in a quiet room, using a headset mic, and it seemed to work well. I compared it with ELAN's built-in silence recogniser, ‘Silence recogniser MPI-PL’. That worked well too, but it also creates annotations for the silences, which are a bit odd. If you really didn’t like them, you’d have to work out how to delete them later.
