Pythonic Use
Here, we’ll outline how to use fasttrackpy functions and classes either in an interactive notebook, or within your own package.
import IPython
from fasttrackpy import process_audio_file, \
    process_directory, \
    process_audio_textgrid, \
    process_corpus
from pathlib import Path
Function use
The easiest way to start using fasttrackpy directly will be by calling one of the process_* functions, which will either return a single CandidateTracks object, or a list of CandidateTracks objects.
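If you're writing code that might receive either shape, a small normalization step keeps downstream loops uniform. This is just a convenience pattern, not part of the fasttrackpy API; the path reuses the example file from the next section, and the variable names are illustrative.
# Process a single file (returns one CandidateTracks object)...
result = process_audio_file(
    path=Path("..", "assets", "audio", "ay.wav"),
    min_max_formant=3000,
    max_max_formant=6000
)
# ...and wrap it in a list, so the same loop also works for the list-returning
# processors (process_audio_textgrid, process_corpus, process_directory).
results = result if isinstance(result, list) else [result]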
Process an audio file
You can process an audio file, and adjust the relevant settings, with process_audio_file().
= Path("..", "assets", "audio", "ay.wav")
audio_path IPython.display.Audio(audio_path)
= process_audio_file(
candidates =audio_path,
path=3000,
min_max_formant=6000
max_max_formant )
Inspecting the candidates object
There are a few key attributes you can get from the candidates object, including
- The error terms for each smooth.
- The winning candidate.
candidates.smooth_errors
array([0.22375927, 0.25078503, 0.18971138, 0.14314098, 0.13560708,
0.11605314, 0.11936134, 0.03661058, 0.03557605, 0.05296531,
0.06031541, 0.07804997, 0.1034784 , 0.07002199, 0.0586435 ,
0.03941962, 0.02852621, 0.05098893, 0.03282294, 0.03173693])
candidates.winner
A formant track object. (4, 385)
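The winner corresponds to the candidate with the smallest smoothing error. As a quick cross-check (numpy is used here only as a convenience; it isn't part of the fasttrackpy API), the lowest value in smooth_errors matches the error reported for the winner in the dataframe output further down (≈ 0.0285):
import numpy as np
# Index and value of the lowest smoothing error among the candidates.
best_idx = int(np.argmin(candidates.smooth_errors))
best_idx, candidates.smooth_errors[best_idx]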
Inspecting the candidates.winner object
The candidates.winner object has a few useful attributes to access as well, including the maximum formant.
candidates.winner.maximum_formant
5526.315789473684
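This is the analysis ceiling of the winning candidate, so it will always fall inside the min_max_formant/max_max_formant range passed above (here, 3000–6000 Hz). A trivial sanity check:
# The winning ceiling falls inside the candidate range we asked for.
assert 3000 <= candidates.winner.maximum_formant <= 6000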
Data output - Spectrograms
You can get a spectrogram plot out of either the candidates.winner object or the candidates object itself.
candidates.winner.spectrogram()
candidates.spectrograms()
Data Output - DataFrames
You can output the candidates to a polars dataframe.
candidates.to_df(which = "winner").head()
F1 | F2 | F3 | F4 | F1_s | F2_s | F3_s | F4_s | error | time | max_formant | n_formant | smooth_method | file_name |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | i32 | str | str |
604.374108 | 1175.26731 | 2636.119643 | 2820.424313 | 630.934881 | 1196.361693 | 2541.865094 | 2975.642101 | 0.028526 | 0.025406 | 5526.315789 | 4 | "dct_smooth_reg… | "ay.wav" |
613.663049 | 1183.807981 | 2638.781798 | 2764.336825 | 630.93103 | 1196.374946 | 2541.857538 | 2975.593274 | 0.028526 | 0.027406 | 5526.315789 | 4 | "dct_smooth_reg… | "ay.wav" |
620.821348 | 1196.465294 | 2629.617697 | 2645.793985 | 630.919489 | 1196.414696 | 2541.834875 | 2975.446894 | 0.028526 | 0.029406 | 5526.315789 | 4 | "dct_smooth_reg… | "ay.wav" |
627.364908 | 1212.220604 | 2490.175081 | 2648.947744 | 630.900286 | 1196.480913 | 2541.797116 | 2975.203274 | 0.028526 | 0.031406 | 5526.315789 | 4 | "dct_smooth_reg… | "ay.wav" |
633.400922 | 1227.997019 | 2396.727652 | 2646.907343 | 630.873472 | 1196.573552 | 2541.744282 | 2974.862929 | 0.028526 | 0.033406 | 5526.315789 | 4 | "dct_smooth_reg… | "ay.wav" |
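Since this is an ordinary polars DataFrame, you can hand it straight to any of polars' writers; for example (the output filename here is just illustrative):
# Write the winner's measurements out as a CSV.
candidates.to_df(which = "winner").write_csv("ay_winner.csv")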
Processing an Audio + TextGrid combination.
To process a combination of an audio + textgrid, you can use the process_audio_textgrid() function. There are a few more options to add here related to textgrid processing.
TextGrid Processing
entry_classes
fasttrackpy uses aligned-textgrid to process TextGrids. By default, it will assume your textgrid is formatted as the output of forced alignment, with a Word and Phone tier. If your textgrid doesn’t have these tiers, you can pass entry_classes=[SequenceInterval] instead.
target_tier
You need to let process_audio_textgrid() know which tier(s) to process, either by telling it which entry class to target (defaults to "Phone") or by giving the name of the tier.
target_labels
To process only specific textgrid intervals (say, the vowels), you can pass target_labels a regex string that will match the labels of the intervals.
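For example, the pattern used in the next code block, "[AEIOU].1", is meant to pick out ARPABET vowel labels carrying primary stress. A quick standalone check of what that regex does and doesn't match (this snippet is just an illustration with Python's re module, not part of fasttrackpy, and the library's exact matching behavior is up to it):
import re
pattern = "[AEIOU].1"
# Matches stressed vowel labels like AY1 or IY1...
print(bool(re.match(pattern, "AY1")))   # True
print(bool(re.match(pattern, "IY1")))   # True
# ...but not unstressed vowels or consonants.
print(bool(re.match(pattern, "AH0")))   # False
print(bool(re.match(pattern, "K")))     # False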
Running the processing
= Path("..", "assets" , "corpus", "josef-fruehwald_speaker.wav")
speaker_audio = Path("..", "assets", "corpus", "josef-fruehwald_speaker.TextGrid") speaker_textgrid
= process_audio_textgrid(
all_vowels =speaker_audio,
audio_path=speaker_textgrid,
textgrid_path=["Word", "Phone"],
entry_classes="Phone",
target_tier# just stressed vowels
="[AEIOU].1",
target_labels=0.05,
min_duration=3000,
min_max_formant=6000,
max_max_formant=4
n_formants )
100%|██████████| 174/174 [00:01<00:00, 144.35it/s]
Inspecting the results
The all_vowels object is a list of CandidateTracks. Each candidate track object has the same attributes discussed above, but a few additional values added from the textgrid interval.
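For instance, you can check how many intervals were measured and pull a single token's winning formant ceiling, just as in the single-file case (the values you get will depend on your own corpus):
# Number of measured intervals in the file.
len(all_vowels)
# The winning formant ceiling for the first token, as before.
all_vowels[0].winner.maximum_formant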
The SequenceInterval object
You can access the aligned-textgrid SequenceInterval itself, and its related attributes.
all_vowels[0].interval.label
'AY1'
all_vowels[0].interval.fol.label
'K'
all_vowels[0].interval.inword.label
'strikes'
Labels & Ids
Interval properties also get added to the CandidateTracks object itself, including .label, which contains the interval label, and .id, which contains a unique id for the interval within the textgrid.
[all_vowels[0].label,
 all_vowels[0].id]
['AY1', '0-0-4-3']
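These attributes are handy for filtering tokens before you build a dataframe; for example, keeping only the /ay/ tokens (the variable name here is just illustrative):
# Keep only the tokens whose interval label was AY1.
ay_tokens = [cand for cand in all_vowels if cand.label == "AY1"]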
Outputting to a dataframe.
In order to output the results to one large dataframe, you’ll have to use polars.concat().
import polars as pl
import plotly.express as px
all_df = [vowel.to_df() for vowel in all_vowels]
big_df = pl.concat(all_df, how="diagonal")
big_df.shape
(8012, 17)
max_formants = big_df\
    .group_by(["id", "label"])\
    .agg(
        pl.col("max_formant").mean()
    )
fig = px.violin(max_formants, y = "max_formant", points="all")
fig.show()
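If you’re running this in a script rather than a notebook, fig.show() will typically open the plot in a browser; you can also persist the figure as a standalone HTML file (the filename here is just illustrative):
# Save the interactive violin plot to a standalone HTML file.
fig.write_html("max_formant_violin.html")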
Processing a corpus
To process all audio/textgrid pairs in a given directory, you can use process_corpus(), which will return a list of all CandidateTracks from the corpus.
= Path("..", "assets" , "corpus")
corpus_path = process_corpus(corpus_path) all_vowels
100%|██████████| 65/65 [00:00<00:00, 254.32it/s]
100%|██████████| 274/274 [00:01<00:00, 250.69it/s]
Just like processing an audio file + textgrid combination, you’ll need to use polars.concat() to get one large data frame as output. The columns file_name and group will distinguish between measurements from different files and from different speakers within the files.
big_df = pl.concat(
    [cand.to_df() for cand in all_vowels],
    how = "diagonal"
)
unique_groups = big_df \
    .select("file_name", "group", "id") \
    .unique() \
    .group_by(["file_name", "group"]) \
    .count()
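If you want to take the combined data frame further, you can also split it back out by file and speaker using polars' partition_by(); the loop and variable names here are just illustrative:
# One data frame per (file_name, group) combination.
per_speaker = big_df.partition_by("file_name", "group")
for df in per_speaker:
    # Each partition holds all measurements for one speaker in one file.
    print(df["file_name"][0], df["group"][0], df.shape)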