Pythonic Use
Here, we’ll outline how to use fasttrackpy functions and classes either in an interactive notebook or within your own package.

```python
import IPython
from pathlib import Path

from fasttrackpy import (
    process_audio_file,
    process_directory,
    process_audio_textgrid,
    process_corpus,
)
```
Function use
The easiest way to start using fasttrackpy directly is to call one of the process_* functions, which will return either a single CandidateTracks object or a list of CandidateTracks objects.
Process an audio file
You can process an audio file, and adjust the relevant settings, with process_audio_file().
```python
audio_path = Path("..", "assets", "audio", "ay.wav")
IPython.display.Audio(audio_path)
```

```python
candidates = process_audio_file(
    path=audio_path,
    min_max_formant=3000,
    max_max_formant=6000
)
```
Inspecting the candidates object.
There are a few key attributes you can get from the candidates object, including:
- The error terms for each smooth.
- The winning candidate.

```python
candidates.smooth_errors
```

```
array([0.22544256, 0.25078883, 0.18968704, 0.14313632, 0.13567622,
       0.11614815, 0.11945207, 0.03664588, 0.03567658, 0.05305987,
       0.06046965, 0.07819208, 0.10361329, 0.07013581, 0.05866597,
       0.0394219 , 0.02852071, 0.05098306, 0.03279   , 0.03167985])
```

```python
candidates.winner
```

```
A formant track object. (4, 385)
```
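If you want to relate the two, here is a quick sanity check (a sketch, assuming the winner is the candidate that minimizes the smoothing error when no heuristics are applied):

```python
import numpy as np

# Index of the candidate with the smallest smoothing error.
# With no heuristics applied, this should correspond to the winner.
best_idx = np.argmin(candidates.smooth_errors)
print(best_idx, candidates.smooth_errors[best_idx])
```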
Inspecting the candidates.winner object
The candidates.winner object has a few useful attributes to access as well, including the maximum formant.
```python
candidates.winner.maximum_formant
```

```
5526.315789473684
```
Data Output - Spectrograms
You can get a spectrogram plot out of either the candidates.winner or the candidates object itself.
```python
candidates.winner.spectrogram()
candidates.spectrograms()
```
Data Output - DataFrames
You can output the candidates to a polars dataframe.
```python
candidates.to_df(which="winner").head()
```

| F1 | F2 | F3 | F4 | F1_s | F2_s | F3_s | F4_s | B1 | B2 | B3 | B4 | error | time | max_formant | n_formant | smooth_method | file_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | i32 | str | str |
| 604.374108 | 1175.26731 | 2636.119643 | 2820.424313 | 631.116096 | 1196.387362 | 2542.421732 | 2978.273932 | 250.189519 | 92.942056 | 292.110777 | 797.16362 | 0.028521 | 0.025406 | 5526.315789 | 4 | "dct_smooth_regression" | "ay.wav" |
| 613.663049 | 1183.807981 | 2638.781798 | 2764.336825 | 631.108164 | 1196.413506 | 2542.406271 | 2978.173053 | 250.177347 | 92.935681 | 292.228197 | 797.053295 | 0.028521 | 0.027406 | 5526.315789 | 4 | "dct_smooth_regression" | "ay.wav" |
| 620.821348 | 1196.465294 | 2629.617697 | 2645.793985 | 631.09232 | 1196.465775 | 2542.375357 | 2977.971503 | 250.153031 | 92.922935 | 292.463006 | 796.832788 | 0.028521 | 0.029406 | 5526.315789 | 4 | "dct_smooth_regression" | "ay.wav" |
| 627.364908 | 1212.220604 | 2490.175081 | 2648.947744 | 631.068604 | 1196.544133 | 2542.329008 | 2977.669702 | 250.116629 | 92.903827 | 292.815137 | 796.502384 | 0.028521 | 0.031406 | 5526.315789 | 4 | "dct_smooth_regression" | "ay.wav" |
| 633.400922 | 1227.997019 | 2396.727652 | 2646.907343 | 631.037077 | 1196.648522 | 2542.267248 | 2977.268279 | 250.068224 | 92.878371 | 293.284494 | 796.062508 | 0.028521 | 0.033406 | 5526.315789 | 4 | "dct_smooth_regression" | "ay.wav" |
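Since to_df() returns an ordinary polars DataFrame, any polars method can be used on the result. For example, here is a sketch of saving the winner's measurements (the output filename is hypothetical):

```python
# to_df() returns a polars DataFrame, so standard polars methods
# apply; "ay_winner.csv" is a hypothetical output filename.
winner_df = candidates.to_df(which="winner")
winner_df.write_csv("ay_winner.csv")
```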
Processing an Audio + TextGrid combination.
To process an audio + TextGrid combination, you can use the process_audio_textgrid() function. There are a few additional options here related to TextGrid processing.
TextGrid Processing
entry_classes
fasttrackpy uses aligned-textgrid to process TextGrids. By default, it will assume your TextGrid is formatted as the output of forced alignment, with a Word and Phone tier. If your TextGrid doesn’t have these tiers, you can pass entry_classes=[SequenceInterval] instead.
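For example, here is a minimal sketch of processing a TextGrid without Word/Phone tiers (the "flat" file paths and the "segments" tier name are hypothetical):

```python
# A sketch, assuming a single-tier TextGrid; the "flat" file paths
# and the "segments" tier name are hypothetical stand-ins.
from aligned_textgrid import SequenceInterval

flat_results = process_audio_textgrid(
    audio_path=Path("..", "assets", "corpus", "flat.wav"),
    textgrid_path=Path("..", "assets", "corpus", "flat.TextGrid"),
    entry_classes=[SequenceInterval],
    target_tier="segments"
)
```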
target_tier
You need to let process_audio_textgrid() know which tier(s) to process, either by telling it which entry class to target (defaults to "Phone") or by giving it the name of the tier.
target_labels
To process only specific TextGrid intervals (say, the vowels), you can pass target_labels a regex string that matches the labels of the intervals to process.
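For instance, a pattern like "[AEIOU].1" targets primary-stressed ARPABET vowels. Here is a quick illustration with Python's re module (the labels below are just examples):

```python
import re

# "[AEIOU].1" matches ARPABET vowels bearing primary stress
# (e.g. AY1, EH1), but not unstressed vowels or consonants.
pattern = re.compile("[AEIOU].1")
[bool(pattern.match(label)) for label in ["AY1", "EH1", "AH0", "K"]]
# [True, True, False, False]
```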
Running the processing
```python
speaker_audio = Path("..", "assets", "corpus", "josef-fruehwald_speaker.wav")
speaker_textgrid = Path("..", "assets", "corpus", "josef-fruehwald_speaker.TextGrid")

all_vowels = process_audio_textgrid(
    audio_path=speaker_audio,
    textgrid_path=speaker_textgrid,
    entry_classes=["Word", "Phone"],
    target_tier="Phone",
    # just stressed vowels
    target_labels="[AEIOU].1",
    min_duration=0.05,
    min_max_formant=3000,
    max_max_formant=6000,
    n_formants=4
)
```

```
100%|██████████| 174/174 [00:06<00:00, 27.08it/s]
```
Inspecting the results
The all_vowels object is a list of CandidateTracks. Each candidate track object has the same attributes discussed above, but a few additional values added from the textgrid interval.
The SequenceInterval object
You can access the aligned-textgrid SequenceInterval itself, along with its related attributes.
```python
all_vowels[0].interval.label
```

```
'AY1'
```

```python
all_vowels[0].interval.fol.label
```

```
'K'
```

```python
all_vowels[0].interval.inword.label
```

```
'strikes'
```
Labels & Ids
Interval properties also get added to the CandidateTracks object itself, including .label, which contains the interval label, and .id, which contains a unique id for the interval within the textgrid.
```python
[all_vowels[0].label,
 all_vowels[0].id]
```

```
['AY1', '0-0-4-3']
```
Outputting to a dataframe.
In order to output the results to one large dataframe, you’ll have to use polars.concat().
```python
import polars as pl
import plotly.express as px

all_df = [vowel.to_df() for vowel in all_vowels]
big_df = pl.concat(all_df, how="diagonal")
big_df.shape
```

```
(8012, 21)
```

```python
max_formants = (
    big_df
    .group_by(["id", "label"])
    .agg(
        pl.col("max_formant").mean()
    )
)

fig = px.violin(max_formants, y="max_formant", points="all")
fig.show()
```
Processing a corpus
To process all audio/textgrid pairs in a given directory, you can use process_corpus(), which will return a list of all CandidateTracks from the corpus.
```python
corpus_path = Path("..", "assets", "corpus")
all_vowels = process_corpus(corpus_path)
```

```
100%|██████████| 65/65 [00:00<00:00, 154.23it/s]
100%|██████████| 273/273 [00:01<00:00, 152.10it/s]
```
Just like processing an audio file + textgrid combination, you’ll need to use polars.concat() to get one large data frame as output. The columns file_name and group will distinguish between measurements from different files and from different speakers within the files.
```python
big_df = pl.concat(
    [cand.to_df() for cand in all_vowels],
    how="diagonal"
)

unique_groups = (
    big_df
    .select("file_name", "group", "id")
    .unique()
    .group_by(["file_name", "group"])
    .len()
)
```
Formant tracking heuristics
There are a few pre-specified heuristics for formant tracking implemented in fasttrackpy. You can import them and pass them as a list to the heuristics argument of any processing function. For example, F1_Max specifies that F1 can’t be higher than 1200 Hz. Here’s an example of using it for formant tracking.
```python
from fasttrackpy import F1_Max

candidates = process_audio_file(
    path=audio_path,
    min_max_formant=3000,
    max_max_formant=6000,
    heuristics=[F1_Max]
)
```
The heuristic results can be accessed from the .heuristic_errors attribute of the candidates object. Candidates that pass the heuristic receive 0, and any that violate the heuristic receive np.inf.
```python
candidates.heuristic_errors
```

```
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
```
Custom Heuristics
The F1_Max heuristic is an instance of the MinMaxHeuristic class. If you want to create your own custom heuristic, you can do so by creating a new instance.
```python
from fasttrackpy import MinMaxHeuristic

## The first formant average must not be higher
## than 500 Hz.
LowF1 = MinMaxHeuristic(
    edge="max",
    measure="frequency",
    number=1,
    boundary=500
)

candidates = process_audio_file(
    path=audio_path,
    min_max_formant=3000,
    max_max_formant=6000,
    heuristics=[LowF1]
)

candidates.heuristic_errors
```

```
array([ 0.,  0.,  0.,  0.,  0., inf,  0., inf, inf, inf, inf, inf, inf,
       inf, inf, inf, inf, inf, inf, inf])
```
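Since heuristics takes a list, several heuristics can be combined in one call. For example, here is a sketch combining the built-in F1_Max with the custom LowF1 defined above:

```python
# Combining a built-in heuristic with the custom one above (a sketch).
candidates = process_audio_file(
    path=audio_path,
    min_max_formant=3000,
    max_max_formant=6000,
    heuristics=[F1_Max, LowF1]
)
```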