Spoken commands example
This example uses an audio classifier model from a TensorFlow tutorial: https://www.tensorflow.org/tutorials/sequences/audio_recognition
N.B. This script downloads a large (2.3GB) speech commands dataset!
[1]:
import sys
sys.path.append('..')
from pathlib import Path
import tarfile
import shutil
import pandas as pd
from scipy.io.wavfile import read, write
from sklearn.metrics import confusion_matrix
from dpemu.nodes.series import Series
from dpemu.nodes.tuple import Tuple
from dpemu.filters.sound import ClipWAV
from dpemu.filters.common import ApplyToTuple
from dpemu.plotting_utils import visualize_confusion_matrix
First we download the dataset unless it is already present. If you have downloaded and extracted the dataset into a different directory, change the data_dir variable accordingly.
[2]:
data_url = "https://storage.googleapis.com/download.tensorflow.org/data/speech_commands_v0.02.tar.gz"
fname = "speech_commands_v0.02.tar.gz"
data_dir = Path.home() / "datasets/speech_data"
if not data_dir.exists():
    data_dir.mkdir(parents=True)
    !wget {data_url} -P {data_dir}
    tarfile.open(data_dir / fname, "r:gz").extractall(data_dir)
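The `!wget` line relies on IPython shell syntax and on a `wget` binary being installed. Where either is missing, the same download-and-extract step can be sketched with the standard library alone. The helper name `fetch_and_extract` is hypothetical (it is not part of dpEmu), and the sketch assumes the archive is a gzip-compressed tar file, as the `r:gz` mode above suggests.

```python
import tarfile
import urllib.request
from pathlib import Path


def fetch_and_extract(url, dest_dir, fname):
    """Download url to dest_dir/fname unless it is already there, then extract it."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / fname
    if not archive.exists():
        # urlretrieve streams the response to disk instead of holding it in memory.
        urllib.request.urlretrieve(url, str(archive))
    # The context manager closes the archive even if extraction fails.
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest_dir)
    return dest_dir


# Usage with the variables defined in the cell above:
# fetch_and_extract(data_url, data_dir, fname)
```

Unlike the cell above, this version skips only the download when the archive already exists and always re-runs the extraction, so an interrupted extraction can be repaired by calling it again.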
[3]:
trained_categories = ["yes", "no", "up", "down", "left", "right", "on", "off", "stop", "go"]
labels = ["_silence_", "_unknown_", "yes", "no", "up", "down", "left", "right", "on", "off", "stop", "go"]
test_set_rel_paths = !cat {data_dir / "testing_list.txt"}
test_set_files = [data_dir / p for p in test_set_rel_paths]
test_categories = !cut -d'/' -f1 {data_dir / "testing_list.txt"} | sort -u
len(test_set_files), len(test_categories), len(trained_categories)
[3]:
(11005, 35, 10)
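The `!cat` and `!cut ... | sort -u` calls depend on a Unix shell. As a rough, portable equivalent, the list of test files and the set of test categories can be derived in plain Python. The function name `parse_testing_list` is hypothetical, and the sketch assumes each line of `testing_list.txt` has the form `<category>/<file>.wav`, which is what the `cut -d'/' -f1` call implies.

```python
from pathlib import Path


def parse_testing_list(lines, data_dir):
    """Return (full file paths, sorted unique category names) for the test set."""
    rel_paths = [line.strip() for line in lines if line.strip()]
    files = [Path(data_dir) / rel for rel in rel_paths]
    # The category is the first path component of each relative path.
    categories = sorted({rel.split("/", 1)[0] for rel in rel_paths})
    return files, categories


# Usage, assuming data_dir points at the extracted dataset:
# with open(data_dir / "testing_list.txt") as fh:
#     test_set_files, test_categories = parse_testing_list(fh, data_dir)
```

Python's `sorted` and the shell's `sort -u` may order names differently under some locales, but the set of categories is the same.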
To locate the example scripts and the trained model, we need to set the variables dpemu_path and example_path.
[4]:
dpemu_path = Path.cwd().parents[1]
example_path = dpemu_path / "examples/speech_commands"
Choose a category in which to generate errors. Later on we will generate errors in all of the test set categories.
[5]:
category = "stop"
data_subset_dir = data_dir / category
fs = list(data_subset_dir.iterdir())
wavs = [read(f) for f in fs]
Create an error-generating tree and generate errors in the category chosen above.
[6]:
wav_node = Tuple()
wav_node.addfilter(ApplyToTuple(ClipWAV("dyn_range"), 1))
root_node = Series(wav_node)
err_params = {"dyn_range": .2}
clipped = root_node.generate_error(wavs, err_params)
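To build some intuition for the `dyn_range` parameter, here is a minimal sketch of amplitude clipping in plain Python. It assumes that `dyn_range` is the fraction of the signal's original peak-to-peak range that is kept, centred on the midpoint of that range. This is an illustrative assumption, not dpEmu's `ClipWAV` implementation, and the function name `clip_to_dynamic_range` is hypothetical.

```python
def clip_to_dynamic_range(samples, dyn_range):
    """Clip samples so that only a fraction dyn_range of the original range remains."""
    lo, hi = min(samples), max(samples)
    middle = (lo + hi) / 2
    # Shrink the half-width of the allowed band by the requested fraction.
    half_range = (hi - lo) / 2 * dyn_range
    lower, upper = middle - half_range, middle + half_range
    # Samples outside [lower, upper] are flattened onto the nearest bound.
    return [min(max(s, lower), upper) for s in samples]


# Toy signal (hypothetical values): peaks beyond +/-50 are flattened.
print(clip_to_dynamic_range([-100, -50, 0, 50, 100], 0.5))
```

Under this reading, `dyn_range = 1` leaves the signal untouched, and smaller values flatten progressively more of each peak.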
Now we arbitrarily choose a speech command example from the data subset. To try another audio clip, change the index.
[7]:
example_index = 123
[8]:
clipped_filename = data_dir / 'clipped.wav'
write(clipped_filename, 16000, clipped[example_index][1])
[9]:
!aplay {fs[example_index]}
Playing WAVE '/home/jpssilve/datasets/speech_data/stop/3ec05c3d_nohash_0.wav' : Signed 16 bit Little Endian, Rate 16000 Hz, Mono
[10]:
!aplay {clipped_filename}
Playing WAVE '/home/jpssilve/datasets/speech_data/clipped.wav' : Signed 16 bit Little Endian, Rate 16000 Hz, Mono
Define a function to filter out irrelevant output (e.g. Python deprecation warnings):
[11]:
def filter_scores(output):
    return [line for line in output if "score" in line or ".wav" in line]
Run the model on the clean clip selected above:
[12]:
scores_clean = !python {example_path}/label_wav.py \
--graph={example_path}/trained_model/my_frozen_graph.pb \
--labels={example_path}/trained_model/conv_labels.txt \
--wav={fs[example_index]}
filter_scores(scores_clean)
[12]:
['stop (score = 0.54378)',
'off (score = 0.19993)',
'_unknown_ (score = 0.07233)']
Run the model on the corresponding errorified clip:
[13]:
scores_clipped = !python {example_path}/label_wav.py \
--graph={example_path}/trained_model/my_frozen_graph.pb \
--labels={example_path}/trained_model/conv_labels.txt \
--wav={clipped_filename}
filter_scores(scores_clipped)
[13]:
['stop (score = 0.22963)',
'down (score = 0.16858)',
'_unknown_ (score = 0.11415)']
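To compare the two runs numerically rather than by eye, the score lines can be parsed into dictionaries. The sketch below assumes the `label (score = x)` line format shown in the outputs above; the names `SCORE_RE` and `parse_scores` are hypothetical.

```python
import re

# Matches lines such as "stop (score = 0.54378)".
SCORE_RE = re.compile(r"^(\S+) \(score = ([0-9.]+)\)$")


def parse_scores(lines):
    """Map each label to its score, skipping lines that are not score lines."""
    scores = {}
    for line in lines:
        match = SCORE_RE.match(line.strip())
        if match:
            scores[match.group(1)] = float(match.group(2))
    return scores


# Values copied from the two outputs above.
clean = parse_scores(["stop (score = 0.54378)", "off (score = 0.19993)", "_unknown_ (score = 0.07233)"])
clipped = parse_scores(["stop (score = 0.22963)", "down (score = 0.16858)", "_unknown_ (score = 0.11415)"])
drop = clean["stop"] - clipped["stop"]
print(f"'stop' score dropped by {drop:.5f}")  # prints: 'stop' score dropped by 0.31415
```

For this clip the model still ranks "stop" first after clipping, but with less than half of its original score.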
You can also run the model on an entire directory of .wav files in one go:
[14]:
scores_clean_dir = !python {example_path}/label_wav_dir.py \
--graph={example_path}/trained_model/my_frozen_graph.pb \
--labels={example_path}/trained_model/conv_labels.txt \
--wav_dir={data_subset_dir}
filter_scores(scores_clean_dir)
[14]:
['0f46028a_nohash_4.wav',
'stop (score = 0.84888)',
'up (score = 0.10150)',
'_unknown_ (score = 0.02897)',
'095847e4_nohash_0.wav',
'stop (score = 0.83839)',
'up (score = 0.10791)',
'down (score = 0.01377)',
'f8ba7c0e_nohash_1.wav',
'stop (score = 0.99616)',
'down (score = 0.00215)',
'_unknown_ (score = 0.00114)',
'4cee0c60_nohash_1.wav',
'stop (score = 0.94652)',
'up (score = 0.04828)',
'_unknown_ (score = 0.00210)',
'52e228e9_nohash_1.wav',
'stop (score = 0.98153)',
'down (score = 0.00989)',
'up (score = 0.00290)',
'42f81601_nohash_0.wav',
'stop (score = 0.95047)',
'up (score = 0.02973)',
'_unknown_ (score = 0.01149)',
'bc065a17_nohash_1.wav',
'stop (score = 0.51887)',
'down (score = 0.33725)',
'_unknown_ (score = 0.13488)',
'692a88e6_nohash_1.wav',
'stop (score = 0.83974)',
'up (score = 0.15001)',
'_unknown_ (score = 0.00472)',
'96a48d28_nohash_0.wav',
'stop (score = 0.99714)',
'_unknown_ (score = 0.00153)',
'up (score = 0.00119)',
'763188c4_nohash_0.wav',
'stop (score = 0.92912)',
'_unknown_ (score = 0.04616)',
'go (score = 0.02025)',
'53fd1780_nohash_0.wav',
'stop (score = 0.26836)',
'down (score = 0.19330)',
'_unknown_ (score = 0.13893)',
'e9323bd9_nohash_0.wav',
'stop (score = 0.69810)',
'up (score = 0.15704)',
'_unknown_ (score = 0.06251)',
'686d030b_nohash_4.wav',
'stop (score = 0.99679)',
'up (score = 0.00229)',
'down (score = 0.00053)',
'fc3ba625_nohash_0.wav',
'stop (score = 0.94688)',
'_unknown_ (score = 0.02761)',
'up (score = 0.02019)',
'c4e00ee9_nohash_1.wav',
'stop (score = 0.61116)',
'up (score = 0.25302)',
'off (score = 0.04122)',
'66774579_nohash_0.wav',
'stop (score = 0.71052)',
'off (score = 0.07688)',
'up (score = 0.07429)',
'ee07dcb9_nohash_0.wav',
'stop (score = 0.90859)',
'up (score = 0.03769)',
'_unknown_ (score = 0.01578)',
'4634529e_nohash_1.wav',
'stop (score = 0.29116)',
'up (score = 0.19151)',
'off (score = 0.13839)',
'8f3f252c_nohash_0.wav',
'stop (score = 0.86171)',
'up (score = 0.10564)',
'off (score = 0.01070)',
'3e31dffe_nohash_4.wav',
'stop (score = 0.96517)',
'_unknown_ (score = 0.03305)',
'up (score = 0.00064)',
'3d794813_nohash_4.wav',
'stop (score = 0.91730)',
'up (score = 0.05711)',
'_unknown_ (score = 0.02363)',
'f15a354c_nohash_0.wav',
'stop (score = 0.99641)',
'up (score = 0.00274)',
'off (score = 0.00026)',
'c71e3acc_nohash_0.wav',
'stop (score = 0.69207)',
'up (score = 0.10310)',
'go (score = 0.07329)',
'004ae714_nohash_0.wav',
'stop (score = 0.90009)',
'off (score = 0.02894)',
'_unknown_ (score = 0.02361)',
'a16013b7_nohash_4.wav',
'stop (score = 0.57909)',
'up (score = 0.28567)',
'down (score = 0.03399)',
'c22ebf46_nohash_0.wav',
'stop (score = 0.92027)',
'up (score = 0.06529)',
'_unknown_ (score = 0.01240)',
'2a89ad5c_nohash_0.wav',
'stop (score = 0.56984)',
'up (score = 0.14682)',
'_unknown_ (score = 0.06884)',
'b2ae3928_nohash_0.wav',
'stop (score = 0.98627)',
'_unknown_ (score = 0.01248)',
'down (score = 0.00076)',
'37dca74f_nohash_3.wav',
'stop (score = 0.78066)',
'up (score = 0.07329)',
'_unknown_ (score = 0.07164)',
'3bb68054_nohash_1.wav',
'stop (score = 0.99162)',
'go (score = 0.00331)',
'up (score = 0.00219)',
'a6f2fd71_nohash_1.wav',
'stop (score = 0.59300)',
'up (score = 0.39114)',
'_unknown_ (score = 0.00468)',
'893705bb_nohash_1.wav',
'stop (score = 0.44148)',
'up (score = 0.21584)',
'go (score = 0.08655)',
'46114b4e_nohash_1.wav',
'stop (score = 0.86626)',
'up (score = 0.10812)',
'down (score = 0.00908)',
'32561e9e_nohash_0.wav',
'stop (score = 0.91559)',
'up (score = 0.03920)',
'_unknown_ (score = 0.01798)',
'513aeddf_nohash_4.wav',
'stop (score = 0.95504)',
'_unknown_ (score = 0.04000)',
'go (score = 0.00360)',
'0137b3f4_nohash_3.wav',
'stop (score = 0.74314)',
'up (score = 0.22082)',
'off (score = 0.01753)',
'85851131_nohash_1.wav',
'stop (score = 0.98796)',
'up (score = 0.01117)',
'_unknown_ (score = 0.00050)',
'28612180_nohash_0.wav',
'up (score = 0.42242)',
'stop (score = 0.14160)',
'down (score = 0.09028)',
'e07dd7d4_nohash_0.wav',
'stop (score = 0.34194)',
'up (score = 0.33417)',
'_unknown_ (score = 0.13248)',
'01bb6a2a_nohash_1.wav',
'stop (score = 0.92179)',
'up (score = 0.03660)',
'_unknown_ (score = 0.01734)',
'645ed69d_nohash_3.wav',
'stop (score = 0.99787)',
'up (score = 0.00141)',
'_unknown_ (score = 0.00037)',
'34d5aa5a_nohash_1.wav',
'stop (score = 0.83214)',
'up (score = 0.03625)',
'down (score = 0.02931)',
'333784b7_nohash_3.wav',
'stop (score = 0.97966)',
'up (score = 0.01852)',
'_unknown_ (score = 0.00124)',
'9a69672b_nohash_4.wav',
'stop (score = 0.89391)',
'up (score = 0.07879)',
'go (score = 0.01087)',
'31f01a8d_nohash_4.wav',
'stop (score = 0.97557)',
'up (score = 0.01867)',
'off (score = 0.00205)',
'0d6d7360_nohash_1.wav',
'stop (score = 0.70121)',
'up (score = 0.14588)',
'_unknown_ (score = 0.04814)',
'4a1e736b_nohash_1.wav',
'stop (score = 0.98115)',
'_unknown_ (score = 0.01325)',
'up (score = 0.00284)',
'3b4f8f24_nohash_0.wav',
'stop (score = 0.98360)',
'down (score = 0.00743)',
'_unknown_ (score = 0.00367)',
'982babaf_nohash_1.wav',
'stop (score = 0.98975)',
'_unknown_ (score = 0.00548)',
'up (score = 0.00266)',
'7fd25f7c_nohash_1.wav',
'stop (score = 0.99790)',
'_unknown_ (score = 0.00121)',
'up (score = 0.00055)',
'a7200079_nohash_3.wav',
'stop (score = 0.98081)',
'up (score = 0.00732)',
'_unknown_ (score = 0.00548)',
'af6fbbf5_nohash_0.wav',
'stop (score = 0.99353)',
'_unknown_ (score = 0.00320)',
'up (score = 0.00177)',
'e882abb2_nohash_1.wav',
'stop (score = 0.89639)',
'_unknown_ (score = 0.04231)',
'up (score = 0.04003)',
'7ff4fc72_nohash_0.wav',
'stop (score = 0.65718)',
'down (score = 0.19441)',
'_unknown_ (score = 0.05682)',
'80c45ed6_nohash_0.wav',
'stop (score = 0.94357)',
'down (score = 0.02360)',
'up (score = 0.02062)',
'fc2411fe_nohash_1.wav',
'stop (score = 0.89233)',
'up (score = 0.03672)',
'down (score = 0.02436)',
'54ad8f22_nohash_3.wav',
'stop (score = 0.51013)',
'down (score = 0.17139)',
'go (score = 0.08875)',
'692a88e6_nohash_3.wav',
'stop (score = 0.93051)',
'up (score = 0.06421)',
'_unknown_ (score = 0.00301)',
'893705bb_nohash_6.wav',
'stop (score = 0.44148)',
'up (score = 0.21584)',
'go (score = 0.08655)',
'171edea9_nohash_2.wav',
'stop (score = 0.99756)',
'up (score = 0.00147)',
'_unknown_ (score = 0.00069)',
'f0522ff4_nohash_4.wav',
'stop (score = 0.99686)',
'_unknown_ (score = 0.00174)',
'up (score = 0.00112)',
'824e8ce5_nohash_1.wav',
'stop (score = 0.86248)',
'up (score = 0.06937)',
'_unknown_ (score = 0.01894)',
'a9ca1818_nohash_4.wav',
'stop (score = 0.84088)',
'up (score = 0.14219)',
'_unknown_ (score = 0.00756)',
'48a9f771_nohash_2.wav',
'stop (score = 0.72441)',
'up (score = 0.08968)',
'off (score = 0.05914)',
'6c429c7b_nohash_1.wav',
'up (score = 0.35406)',
'stop (score = 0.30395)',
'off (score = 0.16752)',
'f035e2ea_nohash_3.wav',
'stop (score = 0.99878)',
'_unknown_ (score = 0.00053)',
'up (score = 0.00030)',
'b06c19b0_nohash_0.wav',
'stop (score = 0.99874)',
'_unknown_ (score = 0.00081)',
'up (score = 0.00021)',
'9a356ab9_nohash_0.wav',
'down (score = 0.30055)',
'_unknown_ (score = 0.19656)',
'stop (score = 0.15799)',
'0cd323ec_nohash_1.wav',
'stop (score = 0.41698)',
'off (score = 0.14234)',
'down (score = 0.12729)',
'f19c1390_nohash_0.wav',
'stop (score = 0.99386)',
'up (score = 0.00474)',
'down (score = 0.00070)',
'435695e3_nohash_0.wav',
'stop (score = 0.99527)',
'_unknown_ (score = 0.00271)',
'up (score = 0.00107)',
'179a61b7_nohash_0.wav',
'stop (score = 0.45093)',
'_unknown_ (score = 0.24628)',
'go (score = 0.06381)',
'190821dc_nohash_0.wav',
'stop (score = 0.85112)',
'up (score = 0.06795)',
'go (score = 0.03845)',
'82951cf0_nohash_1.wav',
'stop (score = 0.37752)',
'up (score = 0.32371)',
'_unknown_ (score = 0.05701)',
'bd76a7fd_nohash_4.wav',
'stop (score = 0.98600)',
'up (score = 0.01023)',
'_unknown_ (score = 0.00227)',
'b4ea0d9a_nohash_2.wav',
'stop (score = 0.95925)',
'up (score = 0.03561)',
'_unknown_ (score = 0.00178)',
'e4be0cf6_nohash_0.wav',
'stop (score = 0.43057)',
'up (score = 0.36197)',
'off (score = 0.13706)',
'626e323f_nohash_0.wav',
'stop (score = 0.99196)',
'up (score = 0.00500)',
'down (score = 0.00119)',
'3589bc72_nohash_3.wav',
'stop (score = 0.98538)',
'go (score = 0.00724)',
'_unknown_ (score = 0.00485)',
'9a7c1f83_nohash_0.wav',
'stop (score = 0.97974)',
'_unknown_ (score = 0.01620)',
'go (score = 0.00237)',
'90804775_nohash_2.wav',
'stop (score = 0.45099)',
'_unknown_ (score = 0.12778)',
'off (score = 0.09222)',
'1e412fac_nohash_0.wav',
'stop (score = 0.97331)',
'_unknown_ (score = 0.01048)',
'down (score = 0.00528)',
'72e382bd_nohash_2.wav',
'stop (score = 0.99472)',
'down (score = 0.00212)',
'_unknown_ (score = 0.00174)',
'37d38e44_nohash_0.wav',
'stop (score = 0.87186)',
'down (score = 0.06694)',
'go (score = 0.02199)',
'322d17d3_nohash_3.wav',
'stop (score = 0.99799)',
'up (score = 0.00112)',
'_unknown_ (score = 0.00045)',
'a045368c_nohash_4.wav',
'stop (score = 0.99478)',
'up (score = 0.00325)',
'off (score = 0.00098)',
'b69002d4_nohash_0.wav',
'stop (score = 0.99936)',
'up (score = 0.00028)',
'_unknown_ (score = 0.00019)',
'a7200079_nohash_2.wav',
'stop (score = 0.85205)',
'down (score = 0.12575)',
'_unknown_ (score = 0.00612)',
'5b09db89_nohash_3.wav',
'stop (score = 0.78082)',
'up (score = 0.13418)',
'_unknown_ (score = 0.03140)',
'fa446c16_nohash_3.wav',
'stop (score = 0.99090)',
'up (score = 0.00493)',
'_unknown_ (score = 0.00252)',
'b4ea0d9a_nohash_1.wav',
'stop (score = 0.95521)',
'up (score = 0.03692)',
'_unknown_ (score = 0.00382)',
'493392c6_nohash_0.wav',
'stop (score = 0.97289)',
'up (score = 0.01633)',
'_unknown_ (score = 0.00663)',
'ca4eeab0_nohash_0.wav',
'stop (score = 0.81987)',
'down (score = 0.05813)',
'up (score = 0.05399)',
'f34e6f44_nohash_0.wav',
'stop (score = 0.39819)',
'_unknown_ (score = 0.22448)',
'up (score = 0.10589)',
'92e17cc4_nohash_1.wav',
'stop (score = 0.98871)',
'up (score = 0.00868)',
'_unknown_ (score = 0.00104)',
'36050ef3_nohash_1.wav',
'stop (score = 0.87487)',
'up (score = 0.05612)',
'go (score = 0.02334)',
'6a014b29_nohash_1.wav',
'stop (score = 0.79478)',
'off (score = 0.06393)',
'_unknown_ (score = 0.04445)',
'5b26c81b_nohash_0.wav',
'stop (score = 0.92717)',
'off (score = 0.03468)',
'up (score = 0.02418)',
'88d009d2_nohash_0.wav',
'stop (score = 0.23585)',
'up (score = 0.15615)',
'_unknown_ (score = 0.11477)',
'ab00c4b2_nohash_1.wav',
'stop (score = 0.98693)',
'up (score = 0.00912)',
'down (score = 0.00194)',
'a7acbbeb_nohash_2.wav',
'stop (score = 0.91138)',
'up (score = 0.08291)',
'_unknown_ (score = 0.00190)',
'87d5e978_nohash_1.wav',
'stop (score = 0.99362)',
'up (score = 0.00550)',
'_unknown_ (score = 0.00048)',
'15dd287d_nohash_2.wav',
'stop (score = 0.60951)',
'down (score = 0.11640)',
'go (score = 0.10724)',
'69a1a79f_nohash_2.wav',
'stop (score = 0.99213)',
'up (score = 0.00532)',
'_unknown_ (score = 0.00173)',
'd3831f6a_nohash_0.wav',
'stop (score = 0.99898)',
'up (score = 0.00088)',
'_unknown_ (score = 0.00011)',
'd9b8fab2_nohash_1.wav',
'stop (score = 0.87669)',
'off (score = 0.03555)',
'_unknown_ (score = 0.03319)',
'2dc4f05d_nohash_2.wav',
'stop (score = 0.72243)',
'_unknown_ (score = 0.16989)',
'down (score = 0.03805)',
'5af0ca83_nohash_0.wav',
'stop (score = 0.93210)',
'up (score = 0.03181)',
'down (score = 0.01050)',
'1dc86f91_nohash_2.wav',
'stop (score = 0.99897)',
'_unknown_ (score = 0.00045)',
'up (score = 0.00029)',
'89947bd7_nohash_0.wav',
'stop (score = 0.50612)',
'up (score = 0.47685)',
'off (score = 0.00776)',
'b528edb3_nohash_1.wav',
'stop (score = 0.71747)',
'down (score = 0.13705)',
'_unknown_ (score = 0.06468)',
'c1d39ce8_nohash_0.wav',
'stop (score = 0.66247)',
'go (score = 0.17296)',
'_unknown_ (score = 0.07923)',
'b93528e3_nohash_0.wav',
'stop (score = 0.32753)',
'up (score = 0.17626)',
'off (score = 0.11351)',
'c4a7a867_nohash_0.wav',
'stop (score = 0.98290)',
'up (score = 0.00738)',
'_unknown_ (score = 0.00650)',
'6d1dcca6_nohash_0.wav',
'stop (score = 0.68633)',
'up (score = 0.08010)',
'off (score = 0.07813)',
'26e573a9_nohash_0.wav',
'stop (score = 0.81587)',
'down (score = 0.05165)',
'up (score = 0.04305)',
'5188de0d_nohash_0.wav',
'stop (score = 0.22536)',
'_unknown_ (score = 0.18580)',
'go (score = 0.12804)',
'0585b66d_nohash_3.wav',
'stop (score = 0.94474)',
'up (score = 0.02064)',
'_unknown_ (score = 0.01619)',
'1b835b87_nohash_1.wav',
'stop (score = 0.94834)',
'up (score = 0.01997)',
'down (score = 0.01052)',
'aff582a1_nohash_3.wav',
'stop (score = 0.99678)',
'_unknown_ (score = 0.00310)',
'down (score = 0.00006)',
'f9273a21_nohash_1.wav',
'stop (score = 0.95208)',
'up (score = 0.04081)',
'_unknown_ (score = 0.00239)',
'4a4e28f1_nohash_0.wav',
'stop (score = 0.26048)',
'down (score = 0.16748)',
'go (score = 0.13965)',
'1ed0b13d_nohash_3.wav',
'stop (score = 0.99921)',
'_unknown_ (score = 0.00047)',
'up (score = 0.00021)',
'3ec05c3d_nohash_0.wav',
'stop (score = 0.54378)',
'off (score = 0.19993)',
'_unknown_ (score = 0.07233)',
'7846fd85_nohash_3.wav',
'stop (score = 0.92796)',
'down (score = 0.02070)',
'_unknown_ (score = 0.01309)',
'8dc18a75_nohash_0.wav',
'stop (score = 0.78939)',
'up (score = 0.19312)',
'_unknown_ (score = 0.00795)',
'7846fd85_nohash_4.wav',
'stop (score = 0.89709)',
'down (score = 0.03361)',
'_unknown_ (score = 0.01997)',
'f5626af6_nohash_3.wav',
'stop (score = 0.94139)',
'_unknown_ (score = 0.04825)',
'up (score = 0.00626)',
'ffd2ba2f_nohash_3.wav',
'stop (score = 0.99623)',
'_unknown_ (score = 0.00219)',
'up (score = 0.00138)',
'513aeddf_nohash_2.wav',
'stop (score = 0.89436)',
'_unknown_ (score = 0.05919)',
'go (score = 0.02806)',
'551e42e8_nohash_0.wav',
'stop (score = 0.45233)',
'_unknown_ (score = 0.16313)',
'off (score = 0.11665)',
'26e573a9_nohash_1.wav',
'stop (score = 0.93726)',
'down (score = 0.02824)',
'up (score = 0.01093)',
'9be15e93_nohash_3.wav',
'stop (score = 0.99688)',
'up (score = 0.00160)',
'_unknown_ (score = 0.00101)',
'264f471d_nohash_1.wav',
'stop (score = 0.80808)',
'_unknown_ (score = 0.10039)',
'up (score = 0.03336)',
'b959cd0c_nohash_0.wav',
'stop (score = 0.24488)',
'_unknown_ (score = 0.11815)',
'left (score = 0.10235)',
'bd76a7fd_nohash_2.wav',
'stop (score = 0.99836)',
'up (score = 0.00083)',
'_unknown_ (score = 0.00038)',
'74241b28_nohash_1.wav',
'stop (score = 0.97935)',
'_unknown_ (score = 0.01288)',
'up (score = 0.00481)',
'6ef407da_nohash_1.wav',
'stop (score = 0.98262)',
'up (score = 0.01171)',
'_unknown_ (score = 0.00243)',
'51eefcc6_nohash_0.wav',
'stop (score = 0.64022)',
'go (score = 0.16679)',
'no (score = 0.06918)',
'a827e3a1_nohash_3.wav',
'stop (score = 0.76328)',
'off (score = 0.15193)',
'up (score = 0.02800)',
'0d82fd99_nohash_3.wav',
'stop (score = 0.92439)',
'up (score = 0.05283)',
'_unknown_ (score = 0.00782)',
'5efb758c_nohash_0.wav',
'stop (score = 0.79350)',
'go (score = 0.05066)',
'up (score = 0.04322)',
'6094340e_nohash_1.wav',
'stop (score = 0.98467)',
'down (score = 0.00778)',
'_unknown_ (score = 0.00234)',
'067f61e2_nohash_3.wav',
'stop (score = 0.99034)',
'up (score = 0.00893)',
'_unknown_ (score = 0.00047)',
'54d9ccb5_nohash_0.wav',
'stop (score = 0.98414)',
'up (score = 0.01113)',
'down (score = 0.00169)',
'01b4757a_nohash_0.wav',
'stop (score = 0.67216)',
'up (score = 0.13359)',
'_unknown_ (score = 0.05075)',
'953fe1ad_nohash_2.wav',
'stop (score = 0.78502)',
'up (score = 0.10579)',
'down (score = 0.05232)',
'af790082_nohash_0.wav',
'stop (score = 0.96869)',
'up (score = 0.02890)',
'_unknown_ (score = 0.00105)',
'9a7c1f83_nohash_4.wav',
'stop (score = 0.95722)',
'go (score = 0.02449)',
'down (score = 0.00590)',
'94de6a6a_nohash_1.wav',
'up (score = 0.88006)',
'_unknown_ (score = 0.03862)',
'stop (score = 0.02416)',
'332d33b1_nohash_0.wav',
'stop (score = 0.44896)',
'down (score = 0.21524)',
'no (score = 0.15847)',
'674ca5ea_nohash_0.wav',
'stop (score = 0.93532)',
'up (score = 0.02308)',
'off (score = 0.01062)',
'b97c9f77_nohash_0.wav',
'stop (score = 0.65481)',
'down (score = 0.10290)',
'_unknown_ (score = 0.09691)',
'bfd26d6b_nohash_1.wav',
'stop (score = 0.83370)',
'up (score = 0.15832)',
'_unknown_ (score = 0.00376)',
'f632210f_nohash_1.wav',
'stop (score = 0.68990)',
'up (score = 0.12806)',
'down (score = 0.05177)',
'4290ca61_nohash_0.wav',
'stop (score = 0.86563)',
'off (score = 0.03857)',
'up (score = 0.03350)',
'893705bb_nohash_8.wav',
'stop (score = 0.71492)',
'up (score = 0.16641)',
'go (score = 0.04935)',
'3bfd30e6_nohash_0.wav',
'stop (score = 0.63599)',
'go (score = 0.14249)',
'up (score = 0.10391)',
'8ff44869_nohash_1.wav',
'stop (score = 0.79370)',
'_unknown_ (score = 0.06695)',
'down (score = 0.06271)',
'439c84f4_nohash_3.wav',
'stop (score = 0.93857)',
'go (score = 0.02124)',
'down (score = 0.01481)',
'd264f7b6_nohash_2.wav',
'stop (score = 0.35819)',
'up (score = 0.19114)',
'down (score = 0.08839)',
'b414c653_nohash_4.wav',
'up (score = 0.58615)',
'stop (score = 0.25748)',
'off (score = 0.13660)',
'0d85a428_nohash_0.wav',
'stop (score = 0.16377)',
'up (score = 0.14177)',
'_unknown_ (score = 0.14092)',
'525eaa62_nohash_2.wav',
'stop (score = 0.99972)',
'up (score = 0.00016)',
'_unknown_ (score = 0.00011)',
'6aafb34f_nohash_0.wav',
'stop (score = 0.77719)',
'up (score = 0.11667)',
'_unknown_ (score = 0.03379)',
'226537ab_nohash_0.wav',
'stop (score = 0.97408)',
'up (score = 0.02085)',
'_unknown_ (score = 0.00265)',
'559bc36a_nohash_1.wav',
'stop (score = 0.97451)',
'go (score = 0.01067)',
'_unknown_ (score = 0.00638)',
'df280250_nohash_1.wav',
'stop (score = 0.98016)',
'down (score = 0.00753)',
'go (score = 0.00371)',
'b5cf6ea8_nohash_4.wav',
'stop (score = 0.99810)',
'up (score = 0.00149)',
'_unknown_ (score = 0.00032)',
'587f3271_nohash_0.wav',
'stop (score = 0.85410)',
'_unknown_ (score = 0.06047)',
'up (score = 0.02200)',
'1ecfb537_nohash_4.wav',
'stop (score = 0.99523)',
'_unknown_ (score = 0.00262)',
'left (score = 0.00094)',
'321aba74_nohash_1.wav',
'stop (score = 0.99719)',
'_unknown_ (score = 0.00182)',
'up (score = 0.00048)',
'5170b77f_nohash_2.wav',
'stop (score = 0.97167)',
'up (score = 0.01566)',
'_unknown_ (score = 0.00619)',
'd78858d9_nohash_0.wav',
'stop (score = 0.52191)',
'up (score = 0.22323)',
'left (score = 0.05919)',
'113b3fbc_nohash_0.wav',
'stop (score = 0.64668)',
'down (score = 0.20922)',
'no (score = 0.04130)',
'3d9200b9_nohash_0.wav',
'stop (score = 0.61945)',
'up (score = 0.12795)',
'off (score = 0.09970)',
'06f6c194_nohash_4.wav',
'stop (score = 0.58292)',
'up (score = 0.39636)',
'off (score = 0.00788)',
'093f65a1_nohash_0.wav',
'stop (score = 0.96773)',
'_unknown_ (score = 0.00939)',
'up (score = 0.00694)',
'e41a903b_nohash_0.wav',
'stop (score = 0.99393)',
'_unknown_ (score = 0.00234)',
'up (score = 0.00195)',
'1b88bf70_nohash_0.wav',
'stop (score = 0.89692)',
'up (score = 0.04895)',
'off (score = 0.02443)',
'1ffd513b_nohash_0.wav',
'stop (score = 0.99402)',
'up (score = 0.00317)',
'down (score = 0.00159)',
'ecef25ba_nohash_0.wav',
'stop (score = 0.18026)',
'left (score = 0.13037)',
'down (score = 0.12699)',
'837a0f64_nohash_2.wav',
'stop (score = 0.99877)',
'_unknown_ (score = 0.00084)',
'up (score = 0.00034)',
'64220627_nohash_1.wav',
'stop (score = 0.48925)',
'_unknown_ (score = 0.12916)',
'right (score = 0.11773)',
'bd8412df_nohash_1.wav',
'stop (score = 0.97986)',
'_unknown_ (score = 0.01157)',
'up (score = 0.00352)',
'ed3c2d05_nohash_0.wav',
'stop (score = 0.98379)',
'up (score = 0.01037)',
'_unknown_ (score = 0.00274)',
'578d3efb_nohash_3.wav',
'stop (score = 0.99270)',
'up (score = 0.00675)',
'_unknown_ (score = 0.00050)',
'3df9a3d4_nohash_0.wav',
'stop (score = 0.27674)',
'up (score = 0.25178)',
'no (score = 0.10935)',
'2f0ce4d9_nohash_2.wav',
'stop (score = 0.86411)',
'off (score = 0.06190)',
'_unknown_ (score = 0.03669)',
'b3bdded5_nohash_2.wav',
'stop (score = 0.88871)',
'up (score = 0.09937)',
'_unknown_ (score = 0.00315)',
'a8e25ebb_nohash_0.wav',
'stop (score = 0.48994)',
'up (score = 0.12429)',
'no (score = 0.10686)',
'ad63d93c_nohash_1.wav',
'stop (score = 0.51262)',
'up (score = 0.38220)',
'_unknown_ (score = 0.05622)',
'4c3cddb8_nohash_4.wav',
'stop (score = 0.99406)',
'up (score = 0.00474)',
'_unknown_ (score = 0.00085)',
'0132a06d_nohash_2.wav',
'stop (score = 0.98721)',
'up (score = 0.01022)',
'_unknown_ (score = 0.00182)',
'7846fd85_nohash_1.wav',
'stop (score = 0.92152)',
'down (score = 0.04298)',
'no (score = 0.00939)',
'a04817c2_nohash_1.wav',
'stop (score = 0.98223)',
'up (score = 0.01197)',
'down (score = 0.00265)',
'605ed0ff_nohash_0.wav',
'stop (score = 0.45783)',
'up (score = 0.13047)',
'_unknown_ (score = 0.08471)',
'90e72357_nohash_2.wav',
'stop (score = 0.98573)',
'_unknown_ (score = 0.00698)',
'go (score = 0.00461)',
'2da58b32_nohash_4.wav',
'stop (score = 0.79965)',
'off (score = 0.06937)',
'up (score = 0.04585)',
'28ce0c58_nohash_1.wav',
'go (score = 0.48640)',
'up (score = 0.16947)',
'_unknown_ (score = 0.15874)',
'24a3e589_nohash_2.wav',
'stop (score = 0.98158)',
'up (score = 0.01708)',
'_unknown_ (score = 0.00080)',
'2579e514_nohash_1.wav',
'stop (score = 0.48661)',
'up (score = 0.15354)',
'down (score = 0.10605)',
'437455be_nohash_0.wav',
'stop (score = 0.33172)',
'up (score = 0.12149)',
'_unknown_ (score = 0.10527)',
'1b4c9b89_nohash_0.wav',
'stop (score = 0.98440)',
'up (score = 0.00508)',
'_unknown_ (score = 0.00465)',
'f798ac78_nohash_4.wav',
'stop (score = 0.95439)',
'up (score = 0.03564)',
'_unknown_ (score = 0.00394)',
'29229c21_nohash_1.wav',
'stop (score = 0.99749)',
'up (score = 0.00134)',
'_unknown_ (score = 0.00095)',
'd78858d9_nohash_2.wav',
'stop (score = 0.70006)',
'up (score = 0.20123)',
'left (score = 0.03805)',
'5ebc1cda_nohash_6.wav',
'stop (score = 0.63192)',
'_unknown_ (score = 0.15333)',
'go (score = 0.12291)',
'ab46af55_nohash_0.wav',
'stop (score = 0.99821)',
'up (score = 0.00156)',
'_unknown_ (score = 0.00010)',
'2296b1af_nohash_1.wav',
'stop (score = 0.92353)',
'_unknown_ (score = 0.02751)',
'down (score = 0.02737)',
'9a69672b_nohash_2.wav',
'stop (score = 0.99818)',
'up (score = 0.00158)',
'_unknown_ (score = 0.00013)',
'964e8cfd_nohash_1.wav',
'stop (score = 0.95619)',
'up (score = 0.02845)',
'_unknown_ (score = 0.00523)',
'cb72dfb6_nohash_0.wav',
'stop (score = 0.93942)',
'up (score = 0.03724)',
'go (score = 0.00980)',
'9be15e93_nohash_4.wav',
'stop (score = 0.99567)',
'up (score = 0.00360)',
'_unknown_ (score = 0.00044)',
'5ebc1cda_nohash_1.wav',
'stop (score = 0.75682)',
'_unknown_ (score = 0.09401)',
'down (score = 0.04084)',
'1e9b215e_nohash_1.wav',
'stop (score = 0.46735)',
'down (score = 0.11926)',
'_unknown_ (score = 0.11014)',
'28ce0c58_nohash_4.wav',
'_unknown_ (score = 0.35277)',
'stop (score = 0.25628)',
'up (score = 0.22787)',
'0135f3f2_nohash_1.wav',
'down (score = 0.64915)',
'stop (score = 0.13520)',
'go (score = 0.06183)',
'742d6431_nohash_0.wav',
'stop (score = 0.51185)',
'up (score = 0.45758)',
'off (score = 0.01466)',
'563aa4e6_nohash_3.wav',
'stop (score = 0.98674)',
'_unknown_ (score = 0.00840)',
'up (score = 0.00273)',
'bbaa7946_nohash_0.wav',
'stop (score = 0.99574)',
'down (score = 0.00201)',
'_unknown_ (score = 0.00110)',
'a7acbbeb_nohash_1.wav',
'stop (score = 0.78802)',
'up (score = 0.11744)',
'down (score = 0.04498)',
'ec21c46b_nohash_0.wav',
'stop (score = 0.97416)',
'_unknown_ (score = 0.01517)',
'up (score = 0.00404)',
'18ffa72d_nohash_1.wav',
'off (score = 0.16264)',
'_silence_ (score = 0.11814)',
'yes (score = 0.09154)',
'32ad5b65_nohash_0.wav',
'stop (score = 0.99422)',
'up (score = 0.00445)',
'_unknown_ (score = 0.00075)',
'fd32732a_nohash_0.wav',
'stop (score = 0.98304)',
'_unknown_ (score = 0.00850)',
'down (score = 0.00307)',
'a2b16113_nohash_0.wav',
'stop (score = 0.91704)',
'up (score = 0.07075)',
'off (score = 0.00562)',
'92b0a735_nohash_0.wav',
'stop (score = 0.96610)',
'up (score = 0.01413)',
'down (score = 0.00977)',
'6124b431_nohash_1.wav',
'stop (score = 0.91341)',
'up (score = 0.04878)',
'down (score = 0.01150)',
'7fb8d703_nohash_2.wav',
'stop (score = 0.98021)',
'up (score = 0.01911)',
'_unknown_ (score = 0.00029)',
'5e3dde6b_nohash_1.wav',
'stop (score = 0.87034)',
'up (score = 0.06751)',
'_unknown_ (score = 0.04405)',
'ce7a8e92_nohash_1.wav',
'stop (score = 0.13540)',
'up (score = 0.11850)',
'_unknown_ (score = 0.10476)',
'c0fb6812_nohash_0.wav',
'stop (score = 0.67461)',
'up (score = 0.11249)',
'down (score = 0.05729)',
'c79159aa_nohash_4.wav',
'stop (score = 0.98290)',
'up (score = 0.00918)',
'_unknown_ (score = 0.00575)',
'b3327675_nohash_1.wav',
'stop (score = 0.49829)',
'down (score = 0.13202)',
'_unknown_ (score = 0.11668)',
'87070229_nohash_4.wav',
'stop (score = 0.98843)',
'_unknown_ (score = 0.00946)',
'go (score = 0.00120)',
'a5d1becc_nohash_2.wav',
'stop (score = 0.42095)',
'_unknown_ (score = 0.13196)',
'up (score = 0.09977)',
'0ff728b5_nohash_3.wav',
'stop (score = 0.97009)',
'_unknown_ (score = 0.02019)',
'go (score = 0.00626)',
'964e8cfd_nohash_4.wav',
'stop (score = 0.98068)',
'up (score = 0.00856)',
'_unknown_ (score = 0.00304)',
'017c4098_nohash_4.wav',
'stop (score = 0.99057)',
'up (score = 0.00858)',
'down (score = 0.00050)',
'b72e58c9_nohash_0.wav',
'stop (score = 0.98696)',
'_unknown_ (score = 0.00550)',
'up (score = 0.00223)',
'ef77b778_nohash_2.wav',
'stop (score = 0.93676)',
'up (score = 0.06078)',
'_unknown_ (score = 0.00106)',
'0f3f64d5_nohash_1.wav',
'stop (score = 0.84753)',
'up (score = 0.11528)',
'down (score = 0.01605)',
'131e738d_nohash_4.wav',
'stop (score = 0.86223)',
'_unknown_ (score = 0.04718)',
'go (score = 0.03105)',
'bab36420_nohash_1.wav',
'stop (score = 0.99879)',
'up (score = 0.00114)',
'_unknown_ (score = 0.00003)',
'27c24504_nohash_0.wav',
'stop (score = 0.91839)',
'up (score = 0.03990)',
'_unknown_ (score = 0.01542)',
'21e8c417_nohash_0.wav',
'no (score = 0.16717)',
'go (score = 0.14407)',
'down (score = 0.13581)',
'54d9ccb5_nohash_1.wav',
'stop (score = 0.98426)',
'up (score = 0.01385)',
'_unknown_ (score = 0.00076)',
'9448c397_nohash_4.wav',
'stop (score = 0.43997)',
'_unknown_ (score = 0.21405)',
'go (score = 0.05974)',
'c7dc7278_nohash_0.wav',
'stop (score = 0.99714)',
'up (score = 0.00277)',
'_unknown_ (score = 0.00007)',
...]
That was not pretty! We’d better define some helper functions to extract the model’s guesses from that messy output:
[15]:
def get_guesses(scores):
    scores = filter_scores(scores)
    # Each file contributes four lines: the file name followed by its top three guesses.
    if len(scores) % 4 != 0:
        raise ValueError(f"Expected scores list to have a length divisible by 4 after filtering but got length {len(scores)}")
    fnames = scores[0::4]
    guesses = [guess.split(' ')[0] for guess in scores[1::4]]
    return zip(fnames, guesses)

def score_directory(directory):
    scores = !python {example_path}/label_wav_dir.py \
        --graph={example_path}/trained_model/my_frozen_graph.pb \
        --labels={example_path}/trained_model/conv_labels.txt \
        --wav_dir={directory}
    return filter_scores(scores)
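`get_guesses` relies on every file producing exactly four filtered lines. A slightly more tolerant variant, sketched below, pairs each file name with the first score line that follows it, however many guesses are printed per file. It assumes, as in the output shown earlier, that guesses are listed in descending order of score; the name `top_guesses` is hypothetical.

```python
def top_guesses(lines):
    """Return (filename, top_label) pairs from filtered label_wav_dir.py output."""
    pairs = []
    current_file = None
    for line in lines:
        if line.endswith(".wav"):
            current_file = line
        elif current_file is not None and "score" in line:
            # The first score line after a file name is taken as the top guess.
            pairs.append((current_file, line.split(" ")[0]))
            current_file = None  # ignore the lower-ranked guesses for this file
    return pairs


# Lines copied from the directory output above, with one guess omitted on purpose.
sample = [
    "0f46028a_nohash_4.wav",
    "stop (score = 0.84888)",
    "up (score = 0.10150)",
    "_unknown_ (score = 0.02897)",
    "28612180_nohash_0.wav",
    "up (score = 0.42242)",
    "stop (score = 0.14160)",
]
print(top_guesses(sample))
```

The trade-off is that this version silently accepts malformed output, whereas `get_guesses` raises an error, which is the safer default when the output format is trusted to be fixed.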
Define a function to generate errors in all wav files in a given directory. If an inclusion list is provided, only files on the list will be processed.
[16]:
def errorify_directory(data_root_dir, dir_name, tree_root, err_params, inclusion_list=None):
    clean_data_dir = data_root_dir / dir_name
    if not clean_data_dir.exists():
        raise ValueError(f"Directory {clean_data_dir} does not exist.")
    err_data_dir = data_root_dir / (dir_name + "_err")
    if not err_data_dir.exists():
        err_data_dir.mkdir()
    if not inclusion_list:
        inclusion_list = [f for f in clean_data_dir.iterdir() if ".wav" in str(f)]
    for file in inclusion_list:
        fname = file.name
        wav = read(file)
        clipped = tree_root.generate_error([wav], err_params)[0]
        err_file_path = err_data_dir / fname
        write(err_file_path, clipped[0], clipped[1])
    return err_data_dir
Define a function to generate errors in all wav files on a list. The function is needed when files from multiple categories are present on the list. To facilitate comparisons between clean and errorified data, the clean files on the list can be automatically copied to suitably named directories. To do this, pass the parameter copy_clean=True.
[17]:
def errorify_list(data_files, categories, tree_root, err_params, copy_clean=False):
    data_root_dir = data_files[0].parents[1]
    for cat in categories:
        files_in_cat = [f for f in data_files if (cat + "/") in str(f)]
        print("category:", cat)
        print(f"{len(files_in_cat)}")
        errorify_directory(data_root_dir, cat, tree_root, err_params, inclusion_list=files_in_cat)
        if copy_clean:
            copy_dir = data_root_dir / (cat + "_clean")
            copy_dir.mkdir(exist_ok=True)
            for file in files_in_cat:
                shutil.copy(file, copy_dir)
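The substring test `(cat + "/") in str(f)` matches anywhere in the path, so a category such as "up" would also match a file under a directory named "backup", and it would not match at all on platforms whose path separator is not "/". A stricter grouping, sketched below, compares the name of each file's parent directory instead. The function name `group_by_category` is hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def group_by_category(data_files, categories):
    """Group files by the name of their parent directory, keeping only known categories."""
    wanted = set(categories)
    groups = defaultdict(list)
    for f in data_files:
        f = Path(f)
        cat = f.parent.name  # e.g. "stop" for .../speech_data/stop/abc.wav
        if cat in wanted:
            groups[cat].append(f)
    return dict(groups)


# Toy paths (hypothetical): the "backup" file is not mistaken for category "up".
files = [Path("/d/stop/a.wav"), Path("/d/go/b.wav"), Path("/d/backup/c.wav"), Path("/d/stop/d.wav")]
print(group_by_category(files, ["stop", "up"]))
```

With the dataset layout used here both approaches should select the same files; the difference only shows up when the surrounding path happens to contain a category name.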
Define a function to compare the model’s guesses on clean and errorified data. The results are returned in a Pandas dataframe.
[18]:
def compare(data_root, category, clean_ext="_clean", err_ext="_err"):
    scores_clean = score_directory(data_root / (category + clean_ext))
    guesses_clean = get_guesses(scores_clean)
    scores_err = score_directory(data_root / (category + err_ext))
    guesses_err = get_guesses(scores_err)
    df_clean = pd.DataFrame(guesses_clean, columns=["file", "clean_guess"])
    df_err = pd.DataFrame(guesses_err, columns=["file", "err_guess"])
    res = pd.merge(df_clean, df_err, on="file", how="inner")
    res['true_label'] = category
    return res
Generate errors in all test set audio clips that belong to the trained categories.
[19]:
errorify_list(test_set_files, trained_categories, root_node, err_params, copy_clean=True)
category: yes
419
category: no
405
category: up
425
category: down
406
category: left
412
category: right
396
category: on
396
category: off
402
category: stop
411
category: go
402
Run the model on clean and errorified data.
[20]:
results = [compare(data_dir, cat) for cat in trained_categories]
df = pd.concat(results)
Create confusion matrices for clean and errorified data, respectively.
[21]:
cm_clean = confusion_matrix(df['true_label'], df['clean_guess'], labels=labels)
cm_err = confusion_matrix(df['true_label'], df['err_guess'], labels=labels)
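For readers unfamiliar with confusion matrices, the sketch below shows what the two calls above compute: entry [i][j] counts the clips whose true label is labels[i] and whose predicted label is labels[j]. It is a plain-Python illustration on hypothetical toy data, not a replacement for scikit-learn's `confusion_matrix`; the function names are made up for this example.

```python
from collections import Counter


def confusion_counts(true_labels, predicted_labels, labels):
    """Rows follow the true label, columns the predicted label, both ordered as in labels."""
    pairs = Counter(zip(true_labels, predicted_labels))
    return [[pairs[(t, p)] for p in labels] for t in labels]


def accuracy(true_labels, predicted_labels):
    """Fraction of items whose prediction equals the true label."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


# Hypothetical toy data: one "stop" clip is misheard as "up".
truth = ["stop", "stop", "go", "go"]
guess = ["stop", "up", "go", "go"]
order = ["stop", "go", "up"]
print(confusion_counts(truth, guess, order))
print(accuracy(truth, guess))
```

A model that is robust to clipping would keep most of the counts on the diagonal in both matrices; counts moving off the diagonal in the errorified matrix indicate the commands that clipping makes harder to recognise.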
Visualize the confusion matrix for the clean data.
[22]:
visualize_confusion_matrix(df, cm_clean, 0, labels, "dyn_range", "true_label", "clean_guess")
Visualize the confusion matrix for the errorified data.
[23]:
visualize_confusion_matrix(df, cm_err, 0, labels, "dyn_range", "true_label", "err_guess")
The notebook for this case study can be found here.