process_audio_textgrid
process_audio_textgrid(audio_path, textgrid_path, entry_classes=['Word', 'Phone'], target_tier='Phone', target_labels='[AEIOU]', min_duration=0.05, min_max_formant=4000, max_max_formant=7000, nstep=20, n_formants=4, window_length=0.025, time_step=0.002, pre_emphasis_from=50, smoother=Smoother(), loss_fun=Loss(), agg_fun=Agg())
Process an audio and TextGrid file together.
Parameters
Name | Type | Description | Default |
---|---|---|---|
audio_path |
str | Path | Path to an audio file. | required |
textgrid_path |
str | Path | Path to a TextGrid | required |
entry_classes |
list | Entry classes for the textgrid tiers. Defaults to [“Word”, “Phone”]. | ['Word', 'Phone'] |
target_tier |
str | The tier to target. Defaults to “Phone”. | 'Phone' |
target_labels |
str | A regex that will match intervals to target. Defaults to “[AEIOU]”. | '[AEIOU]' |
min_duration |
float | Minimum vowel duration to mention. Defaults to 0.05. | 0.05 |
min_max_formant |
float | The lowest max-formant value to try. Defaults to 4000. | 4000 |
max_max_formant |
float | The highest max formant to try. Defaults to 7000. | 7000 |
nstep |
int | The number of steps from the min to the max max formant. Defaults to 20. | 20 |
n_formants |
int | The number of formants to track. Defaults to 4. | 4 |
window_length |
float | Window length of the formant analysis. Defaults to 0.025. | 0.025 |
time_step |
float | Time step of the formant analyusis window. Defaults to 0.002. | 0.002 |
pre_emphasis_from |
float | Pre-emphasis threshold. Defaults to 50. | 50 |
smoother |
Smoother | The smoother method to use. Defaults to Smoother() . |
Smoother() |
loss_fun |
Loss | The loss function to use. Defaults to Loss(). | Loss() |
agg_fun |
Agg | The loss aggregation function to use. Defaults to Agg(). | Agg() |
Returns
Type | Description |
---|---|
list[CandidateTracks] | A list of candidate tracks. |