ELAN/Praat Machine Segmenting

My number one hated stage in transcription work is segmenting. I would sit there fuming while manually segmenting the recordings I had made before I could even start transcribing. It was frustrating because it seemed like something a machine could do a relatively good approximation of, instead of me sitting there for hours doing it for each file!

Luckily, it turns out that between Praat and ELAN, you can very easily have a decent approximation of segmentation done for you.  Not perfect, but it saves HEAPS of time. If you have a ton of recordings to segment into units before you need to transcribe, this is the process for you!

Thank you to T. Mark Ellison for helping out with this heaps.

Praat stage

First load the sound file that you want to segment into Praat (Open > Read from file...). Then, with the Sound object selected, create a Praat TextGrid file based on silences (Annotate > To TextGrid (silences)...):


This next part is our best setting after a few trials:


The resulting text grid should look something like this:

Screen Shot 2017-05-01 at 11.14.15 am.png

The *** marks the intervals where Praat has detected sound. It’s not perfect, but it gives a pretty good shot at things, and you can adjust the boundaries manually in ELAN. Save this text grid now.
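For the curious, here is a rough sketch of what silence-based segmenting involves. This is a simplified illustration in plain Python, not Praat's actual algorithm: it measures the loudness of short frames, calls a frame "sounding" if it is within a threshold of the loudest frame, and then absorbs stretches that are too short to count. The function name, the frame size, and all parameter values are placeholders of mine, not the settings used in this post.

```python
import math

def _runs(flags):
    """Collapse a list of booleans into [value, first_frame, end_frame) runs."""
    runs = []
    for i, flag in enumerate(flags):
        if runs and runs[-1][0] == flag:
            runs[-1][2] = i + 1
        else:
            runs.append([flag, i, i + 1])
    return runs

def _flip_short(flags, value, min_frames):
    """Flip every run of `value` shorter than min_frames to the opposite value."""
    out = list(flags)
    for flag, start, end in _runs(flags):
        if flag == value and end - start < min_frames:
            out[start:end] = [not value] * (end - start)
    return out

def segment_by_silence(samples, sample_rate, frame_s=0.01, threshold_db=-25.0,
                       min_silent_s=0.1, min_sounding_s=0.1, sounding_label="***"):
    """Return (start_s, end_s, label) intervals; silent intervals get label ''."""
    if not samples:
        return []
    frame_len = max(1, round(sample_rate * frame_s))
    # RMS level of each frame.
    rms = []
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        rms.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    peak = max(rms)
    total = len(samples) / sample_rate
    if peak == 0.0:
        return [(0.0, total, "")]  # the whole file is digital silence
    # A frame is sounding if its level is within threshold_db of the loudest frame.
    flags = [r > 0 and 20 * math.log10(r / peak) > threshold_db for r in rms]
    # Absorb short silences first, then short bursts of sound.
    flags = _flip_short(flags, False, round(min_silent_s / frame_s))
    flags = _flip_short(flags, True, round(min_sounding_s / frame_s))
    return [(start * frame_len / sample_rate,
             min(end * frame_len / sample_rate, total),
             sounding_label if flag else "")
            for flag, start, end in _runs(flags)]
```

Raising the minimum silent duration gives fewer, longer segments; lowering the threshold (e.g. -35) makes quieter speech count as sound. Those are the same kinds of trade-offs you will be juggling when you trial settings in Praat.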

ELAN stage

Import your Praat text grid:


Cheers Hedvig for letting me know that if you tick the “exclude silences” box, you can have ELAN automatically remove any empty segments from the Praat text file:

Screen Shot 2017-05-05 at 4.15.56 pm.png

And you will have your segmented Praat text grid as a layer in ELAN looking something like this!

Screen Shot 2017-05-01 at 11.26.42 am.png
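The effect of excluding silences is easy to picture: every interval with an empty label is thrown away and only the labelled (sounding) ones become annotations. A minimal sketch, assuming the intervals are held as (start, end, label) tuples, which is my own representation and not ELAN's internal format:

```python
def drop_empty_intervals(intervals):
    """Keep only intervals whose label contains some non-whitespace text."""
    return [(start, end, label)
            for start, end, label in intervals
            if label.strip()]

# Hypothetical intervals, in seconds, as they might come out of a TextGrid.
intervals = [(0.0, 0.5, ""), (0.5, 1.0, "***"), (1.0, 1.5, " "), (1.5, 2.2, "***")]
kept = drop_empty_intervals(intervals)
```

Here `kept` holds only the two `***` intervals, with their original start and end times untouched.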

The longest file we tried it on was a 1-hour recording of Samoan (cheers Hedvig Skirgård from Humans Who Read Grammars for providing the file!). It took about 8 minutes for Praat to segment. A 10-minute recording is done in no time.

Now be on your merry way setting up your tiers and transcribing to your heart’s content 🙂


Procedure: DEM+COP Decategorialisation

Kandra interviewing Wenembu in Nen language, Bimadebn Village. 2014.

Notes for myself regarding a Nen/Nambo study my supervisors and I are working on.

Observation: DEM+COP constructions in Nen and Nambo function as a focus marker of some sort. The copula verb in these sister languages (seems to?) agree in person, number, and tense with the A/S argument of the clause. This focus marker, however, appears to be losing its agreement as the combined DEM+COP is becoming fossilised/grammaticalised.

Question: Is this grammaticalisation happening at the same rate in Nenland and Namboland? What are the heavy-use bilinguals doing?

Need to:

  • See what Nen speakers are doing. Are they decategorialising?
  • See what Nambo speakers are doing. Are they decategorialising?
  • See what the heavy-use bilinguals are doing. Are they decategorialising?

Code for Nen:

  • Identify the focus markers (DEM+COP followed by ambifixing verb)
  • Note: Is it decategorialising? Y/N. If yes, on: Tense, Number, Person, All?
  • Also mark instances where DEM+COP is functioning just as the copula (code = Z)

Tier Structure

Pastor Blag interviewing his wife Sambo in Nen language, Bimadebn Village. 2014.

Tier structure:

  • Prom: Speaker initials and PM (e.g. BT PM)
    • Sph: Spelling (orthographic) Phon. Type whatever is in the transcription text.
    • ClPh: Close Phonetic. Paying attention to consonant voicing, and syllable boundaries where possible. Don’t worry too much about vowel quality at this point in time. (e.g. ge.ym, gym, ge.dn.z.ron)
    • PromGl: Prominence gloss. Breakdown of demonstrative type, and agreement of copula. (e.g. DEM1+3sgU:nphd. See below, “Types of DEM” and “Copula Code”)
  • WhichV: Where is the main verb? (e.g. R, RR. See below, “Which V”)
  • Decat: Is the copula of the prominence marker decategorialising? Yes (D), No (A), others. See below, “Decat Tier Code”
    • Decatcat: Category that is decategorialised.
  • NP: What are the person and number of the NP of the prominence construction?
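For checking a file's tiers against this layout, the parent/child relationships can be written down as a small lookup table. This is only a sketch: the hierarchy is read off the indentation of the list above, and the names `TIER_PARENTS`, `children_of` and `depth` are my own.

```python
# Each tier maps to its parent tier; None marks a top-level tier.
TIER_PARENTS = {
    "Prom": None,
    "Sph": "Prom",
    "ClPh": "Prom",
    "PromGl": "Prom",
    "WhichV": None,
    "Decat": None,
    "Decatcat": "Decat",
    "NP": None,
}

def children_of(parent):
    """List the tiers that depend directly on `parent`, in listed order."""
    return [tier for tier, p in TIER_PARENTS.items() if p == parent]

def depth(tier):
    """Count how many ancestors a tier has (0 for a top-level tier)."""
    levels = 0
    while TIER_PARENTS[tier] is not None:
        tier = TIER_PARENTS[tier]
        levels += 1
    return levels
```

For example, `children_of("Prom")` lists the three tiers that hang off the prominence marker tier, and `children_of(None)` lists the top-level tiers.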

Tier Codes

Rusien (left) being interviewed by Fasawar (middle) and Jimmi (right), Bimadebn Village. 2014.

Types of DEM

  • DEM1 = ge
  • DEM2 = gs
  • PV = äte
  • FUT1 = bä
  • FUT2 = ä

Copula Code

  • ym = 3sgU:nphd
  • tm = 3sgU:ypst
  • dnzron = 3sgU:rmpst
  • däron = 3duU:rmpst
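The two tables above are enough to move between a PromGl gloss and the forms it stands for. A minimal sketch, assuming every PromGl value has the shape DEM code, plus sign, copula gloss (as in the example DEM1+3sgU:nphd); the function and dictionary names are hypothetical, and only the forms listed above are covered.

```python
# Demonstrative codes and copula glosses, copied from the tables above.
DEM_FORMS = {"DEM1": "ge", "DEM2": "gs", "PV": "äte", "FUT1": "bä", "FUT2": "ä"}
COPULA_GLOSSES = {
    "ym": "3sgU:nphd",
    "tm": "3sgU:ypst",
    "dnzron": "3sgU:rmpst",
    "däron": "3duU:rmpst",
}
# Reverse lookup: gloss -> copula form.
GLOSS_TO_COPULA = {gloss: form for form, gloss in COPULA_GLOSSES.items()}

def promgl_to_forms(gloss):
    """Split a PromGl value into (demonstrative form, copula form)."""
    dem_code, copula_gloss = gloss.split("+", 1)
    return DEM_FORMS[dem_code], GLOSS_TO_COPULA[copula_gloss]

def forms_to_promgl(dem_form, copula_form):
    """Build the PromGl value for a demonstrative form and a copula form."""
    dem_code = next(code for code, form in DEM_FORMS.items() if form == dem_form)
    return dem_code + "+" + COPULA_GLOSSES[copula_form]
```

An unknown code or form raises an error (`KeyError` or `StopIteration`) rather than guessing, which is handy for catching typos in the tier.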


Which V

  • R – The first verb to the right (in the transcription) is the main verb.
  • RR – The second verb to the right (in the transcription) is the main verb.
  • Z – Zero copula.

Decat Tier Code:

  • D – Decategorialised
  • A – Agreeing
  • C – Caveat. There appears to be decategorialisation happening, but it may not be a true case due to contextual information (e.g. this bag gs ym, it was made like this back in the past). The Decatcat tier for code C is still to be coded as though a D.

Decatcat Tier Code:

  • N – Number
  • P – Person
  • T – Tense
  • When there is more than one, code in alphabetical order. (e.g. tense and number = NT, person and tense = PT)
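The alphabetical-order rule is easy to get wrong by hand, so here is a small sketch that builds and checks Decatcat codes. The names are my own, and one rule is an assumption on my part rather than something stated above: that an agreeing (A) annotation leaves the Decatcat tier empty.

```python
CATEGORY_CODES = {"number": "N", "person": "P", "tense": "T"}

def decatcat_code(categories):
    """Combine category names into one code, letters in alphabetical order."""
    return "".join(sorted(CATEGORY_CODES[c] for c in set(categories)))

def is_valid_pair(decat, decatcat):
    """Check a Decat code against the Decatcat code on its child tier."""
    if decat not in {"D", "A", "C"}:
        return False
    if decat == "A":
        # Assumption: nothing is decategorialised, so Decatcat stays empty.
        return decatcat == ""
    # D and C are both coded on Decatcat: letters from N, P, T,
    # with no repeats and in alphabetical order.
    return (decatcat != ""
            and set(decatcat) <= set("NPT")
            and decatcat == "".join(sorted(set(decatcat))))
```

So `decatcat_code(["tense", "number"])` gives NT, and a hand-typed TN would be flagged by `is_valid_pair("D", "TN")`.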


NP Tier Code:

  • Orthographic representation of the prominence construction NP.
  • NPDeets: person and number, e.g. 3sg

Examples in ELAN:

Screen Shot 2017-02-02 at 9.02.44 am.png
A regular example, where the prominence marker agrees with the NP in person and number and with the main verb in TAM.
Screen Shot 2017-02-02 at 9.21.06 am.png
Note the Decat tier. Coded C for ‘caveat’, and Decatcat tier is coded T for tense. The prominence construction reads something like “This house gs ym had been burned.”

Notes on odd cases

  • When the PM or main verb has a super plural
    • Coded as D on the decat tier. The NP tier for the noun phrase sets the person and number as 3sg.

Also coding for gesture

Pastor Blag mid-gesture.

Code on Gesture tier:

  • TO = Touch Object
  • FP = Full Point
  • HP = Half Point
  • FB = Full Beat
  • HB = Half Beat
  • OG = Other Gesture, e.g. motion, eye gaze, nod, without a point or beat. If these other gestures are accompanying a point or beat, it is coded for point or beat.
  • MOT = Motion gesture (e.g. acting out verb, motion of inclusion (e.g. ‘all of us’), motion of hither/thither)
  • NG = No Gesture

Where ‘Point’ includes open hand gesturing to a referent (real or imagined) as well as the canonical one-finger point.

Where ‘Beat’ = non-pointed hand motion approximating an up-down/down-up movement of the hand.

Where ‘Half’ means below the waist.
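The gesture codes can be kept in a lookup table, with the Full/Half split for points and beats derived from where the hand is. A minimal sketch with names of my own choosing; it assumes ‘Full’ simply means anything that is not below the waist, which the notes above imply but do not state outright.

```python
# Codes for the Gesture tier, copied from the list above.
GESTURE_CODES = {
    "TO": "Touch Object",
    "FP": "Full Point",
    "HP": "Half Point",
    "FB": "Full Beat",
    "HB": "Half Beat",
    "OG": "Other Gesture",
    "MOT": "Motion gesture",
    "NG": "No Gesture",
}

def point_or_beat_code(kind, below_waist):
    """Return FP/HP/FB/HB for kind 'point' or 'beat'; Half means below the waist."""
    height = "H" if below_waist else "F"
    return height + {"point": "P", "beat": "B"}[kind]

def is_gesture_code(code):
    """True if `code` is one of the codes used on the Gesture tier."""
    return code in GESTURE_CODES
```

For instance, an open-hand gesture to a referent made at chest height would be `point_or_beat_code("point", False)`, i.e. FP.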