Pythonic Use

Here, we’ll outline how to use fasttrackpy functions and classes either in an interactive notebook, or within your own package.

import IPython
from fasttrackpy import process_audio_file, \
    process_directory, \
    process_audio_textgrid, \
    process_corpus
from pathlib import Path

Function use

The easiest way to start using fasttrackpy directly is to call one of the process_* functions, which return either a single CandidateTracks object or a list of CandidateTracks objects.

Process an audio file

You can process a single audio file, and adjust the relevant settings, with process_audio_file().

audio_path = Path("..", "assets", "audio", "ay.wav")
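candidates = process_audio_file(
    path=audio_path,
    # search candidate maximum formants between 3000 and 6000 Hz
    min_max_formant=3000,
    max_max_formant=6000
)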

Inspecting the candidates object.

There are a few key attributes you can get from the candidates object, including:

  • The error terms for each smooth
  • The winning candidate (candidates.winner)
array([0.22375927, 0.25078503, 0.18971138, 0.14314098, 0.13560708,
       0.11605314, 0.11936134, 0.03661058, 0.03557605, 0.05296531,
       0.06031541, 0.07804997, 0.1034784 , 0.07002199, 0.0586435 ,
       0.03941962, 0.02852621, 0.05098893, 0.03282294, 0.03173693])
A formant track object. (4, 385)

Inspecting the candidates.winner object

The candidates.winner object has a few useful attributes to access as well, including the maximum formant.
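
The error terms and the winner's maximum formant are linked: the winning candidate is the one with the lowest error. The sketch below re-derives this by hand from the 20 error terms printed above. It assumes the candidate maximum formants are evenly spaced between 3000 Hz and 6000 Hz. Those bounds and the variable names are illustrative assumptions rather than values read from the fasttrackpy API, although they do reproduce the max_formant value of 5526.315789 that appears in the dataframe output further down.

```python
# Error terms for the 20 candidates, copied from the output above.
smooth_errors = [
    0.22375927, 0.25078503, 0.18971138, 0.14314098, 0.13560708,
    0.11605314, 0.11936134, 0.03661058, 0.03557605, 0.05296531,
    0.06031541, 0.07804997, 0.1034784, 0.07002199, 0.0586435,
    0.03941962, 0.02852621, 0.05098893, 0.03282294, 0.03173693,
]

# ASSUMPTION: candidates are evenly spaced between these two bounds (Hz).
assumed_min_max_formant = 3000.0
assumed_max_max_formant = 6000.0

n_candidates = len(smooth_errors)
step = (assumed_max_max_formant - assumed_min_max_formant) / (n_candidates - 1)
candidate_max_formants = [
    assumed_min_max_formant + i * step for i in range(n_candidates)
]

# The winner is the candidate with the smallest error term.
winner_index = min(range(n_candidates), key=lambda i: smooth_errors[i])
winner_max_formant = candidate_max_formants[winner_index]

print(winner_index, round(winner_max_formant, 6))
```

Under these assumptions the lowest error (0.02852621) sits at index 16, which corresponds to a maximum formant of roughly 5526.32 Hz.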


Data Output - Spectrograms

You can get a spectrogram plot from either candidates.winner or the candidates object itself.



Data Output - DataFrames

You can output the candidates to a polars dataframe.

candidates.to_df(which = "winner").head()
shape: (5, 14)
F1 F2 F3 F4 F1_s F2_s F3_s F4_s error time max_formant n_formant smooth_method file_name
f64 f64 f64 f64 f64 f64 f64 f64 f64 f64 f64 i32 str str
604.374108 1175.26731 2636.119643 2820.424313 630.934881 1196.361693 2541.865094 2975.642101 0.028526 0.025406 5526.315789 4 "dct_smooth_reg… "ay.wav"
613.663049 1183.807981 2638.781798 2764.336825 630.93103 1196.374946 2541.857538 2975.593274 0.028526 0.027406 5526.315789 4 "dct_smooth_reg… "ay.wav"
620.821348 1196.465294 2629.617697 2645.793985 630.919489 1196.414696 2541.834875 2975.446894 0.028526 0.029406 5526.315789 4 "dct_smooth_reg… "ay.wav"
627.364908 1212.220604 2490.175081 2648.947744 630.900286 1196.480913 2541.797116 2975.203274 0.028526 0.031406 5526.315789 4 "dct_smooth_reg… "ay.wav"
633.400922 1227.997019 2396.727652 2646.907343 630.873472 1196.573552 2541.744282 2974.862929 0.028526 0.033406 5526.315789 4 "dct_smooth_reg… "ay.wav"

Processing an Audio + TextGrid combination.

To process a combination of an audio + textgrid, you can use the process_audio_textgrid() function. There are a few more options to add here related to textgrid processing.

TextGrid Processing


fasttrackpy uses aligned-textgrid to process TextGrids. By default, it will assume your textgrid is formatted as the output of forced alignment, with a Word and a Phone tier. If your textgrid doesn’t have these tiers, you can pass entry_classes=[SequenceInterval] instead.


You need to let process_audio_textgrid() know which tier(s) to process, either by telling it which entry class to target (defaults to "Phone") or by giving it the name of the tier.


To process only specific textgrid intervals (say, the vowels), you can pass target_labels a regex string that will match the labels of intervals.
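
As a rough illustration of how a target_labels pattern narrows down intervals, the sketch below applies a regex to a handful of made-up ARPABET-style labels using Python's re module. The pattern, the label list, and the choice of re.match are all assumptions for illustration; how fasttrackpy applies the pattern internally may differ.

```python
import re

# Hypothetical interval labels, as a forced aligner might produce them.
labels = ["AY1", "K", "AH0", "IY1", "T", "ER2", ""]

# Illustrative pattern: a vowel letter, any one character, then primary stress "1".
pattern = re.compile(r"[AEIOU].1")

# Keep only the labels that the pattern matches from the start of the string.
targets = [label for label in labels if pattern.match(label)]

print(targets)
```

Here only the labels carrying primary stress survive the filter, so consonants, unstressed vowels, and empty intervals would be skipped.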

Running the processing

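speaker_audio = Path("..", "assets", "corpus", "josef-fruehwald_speaker.wav")
speaker_textgrid = Path("..", "assets", "corpus", "josef-fruehwald_speaker.TextGrid")
all_vowels = process_audio_textgrid(
    audio_path=speaker_audio,
    textgrid_path=speaker_textgrid,
    entry_classes=["Word", "Phone"],
    target_tier="Phone",
    # just stressed vowels (illustrative pattern)
    target_labels="[AEIOU].1"
)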
100%|██████████| 174/174 [00:01<00:00, 144.35it/s]

Inspecting the results

The all_vowels object is a list of CandidateTracks. Each CandidateTracks object has the same attributes discussed above, plus a few additional values taken from the textgrid interval.

The SequenceInterval object

You can access the aligned-textgrid.SequenceInterval itself, and its related attributes.


Labels & Ids

Interval properties also get added to the CandidateTracks object itself, including .label, which contains the interval label, and .id, which contains a unique id for the interval within the textgrid.

['AY1', '0-0-4-3']

Outputting to a dataframe.

To output the results to one large dataframe, you’ll have to use polars.concat().

import polars as pl
import plotly.express as px

all_df = [vowel.to_df() for vowel in all_vowels]
big_df = pl.concat(all_df, how="diagonal")
big_df.shape
(8012, 17)
# one max_formant value per interval
max_formants = big_df\
    .group_by(["id", "label"])\
    .agg(pl.col("max_formant").mean())
fig = px.violin(max_formants, y = "max_formant", points="all")
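
The how="diagonal" strategy matters when the individual dataframes do not all share the same columns: polars takes the union of the columns and fills the gaps with nulls. The plain-Python sketch below mimics that behaviour on lists of dictionaries so it can be run without polars. The column values are made up, and concat_diagonal is a hypothetical helper, not part of polars or fasttrackpy.

```python
def concat_diagonal(tables):
    """Stack row-dict tables, taking the union of their columns.

    Hypothetical helper: a plain-Python stand-in for a diagonal concat.
    """
    # Collect the union of column names, in first-seen order.
    columns = []
    for table in tables:
        for row in table:
            for name in row:
                if name not in columns:
                    columns.append(name)
    # Emit every row with every column, using None where a value is missing.
    return [
        {name: row.get(name) for name in columns}
        for table in tables
        for row in table
    ]


# Two made-up tables: the second one has an extra "F4" column.
three_formants = [{"id": "a", "F1": 600.0, "F2": 1200.0, "F3": 2600.0}]
four_formants = [{"id": "b", "F1": 610.0, "F2": 1180.0, "F3": 2640.0, "F4": 2800.0}]

combined = concat_diagonal([three_formants, four_formants])
for row in combined:
    print(row)
```

The first row ends up with F4 set to None, which is the plain-Python counterpart of the null that a diagonal concat would place there.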

Processing a corpus

To process all audio/textgrid pairs in a given directory, you can use process_corpus(), which will return a list of all CandidateTracks from the corpus.

corpus_path = Path("..", "assets" , "corpus")
all_vowels = process_corpus(corpus_path)
100%|██████████| 65/65 [00:00<00:00, 254.32it/s]
100%|██████████| 274/274 [00:01<00:00, 250.69it/s]

Just like processing an audio file + textgrid combination, you’ll need to use polars.concat() to get one large data frame as output. The columns file_name and group will distinguish between measurements from different files and from different speakers within the files.
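
To see why file_name and group are both needed as a key, here is a plain-Python sketch that counts the distinct interval ids under each (file_name, group) pair. The rows, file names, and group values are made up for illustration; only the three column names come from the text above.

```python
from collections import defaultdict

# Made-up rows: several measurements (frames) can share one interval id.
rows = [
    {"file_name": "speaker_a.wav", "group": "group_0", "id": "0-0-4-3"},
    {"file_name": "speaker_a.wav", "group": "group_0", "id": "0-0-4-3"},
    {"file_name": "speaker_a.wav", "group": "group_0", "id": "0-0-7-1"},
    {"file_name": "speaker_b.wav", "group": "group_0", "id": "0-0-4-3"},
]

# Collect the set of interval ids seen under each (file_name, group) key.
ids_by_key = defaultdict(set)
for row in rows:
    ids_by_key[(row["file_name"], row["group"])].add(row["id"])

# Number of distinct intervals per file and group.
unique_counts = {key: len(ids) for key, ids in ids_by_key.items()}
print(unique_counts)
```

Because an id is only unique within its own textgrid, the same id can appear under two different files, which is why file_name has to be part of the key.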

big_df = pl.concat(
    [cand.to_df() for cand in all_vowels],
    how = "diagonal"
)
unique_groups = big_df \
    .select("file_name", "group", "id") \
    .unique() \
    .group_by(["file_name", "group"]) \