Pythonic Use

Here, we’ll outline how to use fasttrackpy functions and classes either in an interactive notebook, or within your own package.

import IPython
from fasttrackpy import (
    process_audio_file,
    process_directory,
    process_audio_textgrid,
    process_corpus,
)
from pathlib import Path

Function use

The easiest way to start using fasttrackpy directly is to call one of the process_* functions, which return either a single CandidateTracks object or a list of CandidateTracks objects.
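As a quick sketch of the return types (using the same assets as the examples below), process_audio_file() returns a single CandidateTracks object, while process_audio_textgrid() and process_corpus() return lists:

# A single audio file gives back one CandidateTracks object...
candidates = process_audio_file(path=Path("..", "assets", "audio", "ay.wav"))
# ...while a whole corpus gives back a list of them.
all_vowels = process_corpus(Path("..", "assets", "corpus"))
first_vowel = all_vowels[0]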

Process an audio file

You can process an audio file, and adjust the relevant settings, with process_audio_file().

audio_path = Path("..", "assets", "audio", "ay.wav")
IPython.display.Audio(audio_path)
candidates = process_audio_file(
    path=audio_path,
    min_max_formant=3000,
    max_max_formant=6000
    )

Inspecting the candidates object

There are a few key attributes you can get from the candidates object, including:

  • The error terms for each smooth
  • The winning candidate
candidates.smooth_errors
array([0.22375927, 0.25078503, 0.18971138, 0.14314098, 0.13560708,
       0.11605314, 0.11936134, 0.03661058, 0.03557605, 0.05296531,
       0.06031541, 0.07804997, 0.1034784 , 0.07002199, 0.0586435 ,
       0.03941962, 0.02852621, 0.05098893, 0.03282294, 0.03173693])
candidates.winner
A formant track object. (4, 385)

Inspecting the candidates.winner object

The candidates.winner object has a few useful attributes to access as well, including the maximum formant.

candidates.winner.maximum_formant
5526.315789473684

Data Output - Spectrograms

You can get a spectrogram plot from either candidates.winner or the candidates object itself.

candidates.winner.spectrogram()

candidates.spectrograms()

Data Output - DataFrames

You can output the candidates to a polars dataframe.

candidates.to_df(which = "winner").head()
shape: (5, 14)
F1 F2 F3 F4 F1_s F2_s F3_s F4_s error time max_formant n_formant smooth_method file_name
f64 f64 f64 f64 f64 f64 f64 f64 f64 f64 f64 i32 str str
604.374108 1175.26731 2636.119643 2820.424313 630.934881 1196.361693 2541.865094 2975.642101 0.028526 0.025406 5526.315789 4 "dct_smooth_reg… "ay.wav"
613.663049 1183.807981 2638.781798 2764.336825 630.93103 1196.374946 2541.857538 2975.593274 0.028526 0.027406 5526.315789 4 "dct_smooth_reg… "ay.wav"
620.821348 1196.465294 2629.617697 2645.793985 630.919489 1196.414696 2541.834875 2975.446894 0.028526 0.029406 5526.315789 4 "dct_smooth_reg… "ay.wav"
627.364908 1212.220604 2490.175081 2648.947744 630.900286 1196.480913 2541.797116 2975.203274 0.028526 0.031406 5526.315789 4 "dct_smooth_reg… "ay.wav"
633.400922 1227.997019 2396.727652 2646.907343 630.873472 1196.573552 2541.744282 2974.862929 0.028526 0.033406 5526.315789 4 "dct_smooth_reg… "ay.wav"

Processing an Audio + TextGrid combination

To process a combination of an audio file and a TextGrid, you can use the process_audio_textgrid() function. There are a few more options here related to TextGrid processing.

TextGrid Processing

entry_classes

fasttrackpy uses aligned-textgrid to process TextGrids. By default, it assumes your TextGrid is formatted as the output of forced alignment, with a Word and Phone tier. If your TextGrid doesn't have these tiers, you can pass entry_classes=["SequenceInterval"] instead (see the sketch after these options).

target_tier

You need to let process_audio_textgrid() know which tier(s) to process, either by telling it which entry class to target (defaults to "Phone") or by giving it the name of the tier.

target_labels

To process only specific TextGrid intervals (say, the vowels), you can pass target_labels a regex string that matches the labels of the intervals you want.
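Putting these options together, here is a hedged sketch for a TextGrid that wasn't produced by forced alignment. The file paths, the "segments" tier name, and the label regex are all made up for illustration:

# A hypothetical TextGrid with a single, generically labelled tier:
# use the generic SequenceInterval entry class, target the tier by name,
# and pass a regex for the labels you want to keep.
generic_vowels = process_audio_textgrid(
    audio_path=Path("some_recording.wav"),
    textgrid_path=Path("some_recording.TextGrid"),
    entry_classes=["SequenceInterval"],
    target_tier="segments",
    target_labels="[AEIOU]",
    min_max_formant=3000,
    max_max_formant=6000
)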

Running the processing

speaker_audio = Path("..", "assets" , "corpus", "josef-fruehwald_speaker.wav")
speaker_textgrid = Path("..", "assets", "corpus", "josef-fruehwald_speaker.TextGrid")
all_vowels = process_audio_textgrid(
    audio_path=speaker_audio,
    textgrid_path=speaker_textgrid,
    entry_classes=["Word", "Phone"],
    target_tier="Phone",
    # just stressed vowels
    target_labels="[AEIOU].1",
    min_duration=0.05,
    min_max_formant=3000,
    max_max_formant=6000,
    n_formants=4
)
100%|██████████| 174/174 [00:01<00:00, 144.35it/s]

Inspecting the results

The all_vowels object is a list of CandidateTracks objects. Each one has the same attributes discussed above, plus a few additional values taken from its TextGrid interval.

The SequenceInterval object

You can access the aligned-textgrid SequenceInterval itself, along with its related attributes.

all_vowels[0].interval.label
'AY1'
all_vowels[0].interval.fol.label
'K'
all_vowels[0].interval.inword.label
'strikes'

Labels & Ids

Interval properties also get added to the CandidateTracks object itself, including .label, which contains the interval label, and .id, which contains a unique id for the interval within the textgrid.

[all_vowels[0].label,
 all_vowels[0].id]
['AY1', '0-0-4-3']

Outputting to a dataframe

In order to output the results to one large dataframe, you'll have to use polars.concat().

import polars as pl
import plotly.express as px


all_df = [vowel.to_df() for vowel in all_vowels]
big_df = pl.concat(all_df, how="diagonal")
big_df.shape
(8012, 17)
max_formants = big_df\
    .group_by(["id", "label"])\
    .agg(
        pl.col("max_formant").mean()
    )
fig = px.violin(max_formants, y = "max_formant", points="all")
fig.show()

Processing a corpus

To process all audio/textgrid pairs in a given directory, you can use process_corpus(), which will return a list of all CandidateTracks from the corpus.

corpus_path = Path("..", "assets" , "corpus")
all_vowels = process_corpus(corpus_path)
100%|██████████| 65/65 [00:00<00:00, 254.32it/s]
100%|██████████| 274/274 [00:01<00:00, 250.69it/s]

Just like when processing an audio + TextGrid combination, you'll need to use polars.concat() to get one large data frame as output. The file_name and group columns will distinguish measurements from different files and from different speakers within each file.

big_df = pl.concat(
    [cand.to_df() for cand in all_vowels],
    how = "diagonal"
    )
unique_groups = big_df \
    .select("file_name", "group", "id") \
    .unique() \
    .group_by(["file_name", "group"]) \
    .count()
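
From here, one possible next step (not part of the output above) is to write the combined dataframe to disk for later analysis, for example as a CSV:

# Save the combined corpus measurements; the file name is just an example.
big_df.write_csv("corpus_formants.csv")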