Spoken commands example

This example uses an audio classifier model from a TensorFlow tutorial: https://www.tensorflow.org/tutorials/sequences/audio_recognition

N.B. This script downloads a large (2.3GB) speech commands dataset!

[1]:
import sys
sys.path.append('..')
from pathlib import Path
import tarfile
import shutil
import pandas as pd
from scipy.io.wavfile import read, write
from sklearn.metrics import confusion_matrix
from dpemu.nodes.series import Series
from dpemu.nodes.tuple import Tuple
from dpemu.filters.sound import ClipWAV
from dpemu.filters.common import ApplyToTuple
from dpemu.plotting_utils import visualize_confusion_matrix

First we download the dataset unless it is already present. If you have downloaded and extracted the dataset into a different directory, change the data_dir variable accordingly.

[2]:
data_url = "https://storage.googleapis.com/download.tensorflow.org/data/speech_commands_v0.02.tar.gz"
fname = "speech_commands_v0.02.tar.gz"
data_dir = Path.home() / "datasets/speech_data"

if not data_dir.exists():
    data_dir.mkdir(parents=True)
    !wget {data_url} -P {data_dir}
    # Open the archive in a context manager so that it is closed after extraction.
    with tarfile.open(data_dir / fname, "r:gz") as tar:
        tar.extractall(data_dir)
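
If wget is not available, the same download step can be sketched with the standard library alone. This is a minimal sketch, not part of the original notebook: the helper name download_and_extract is hypothetical, and unlike the cell above it checks for the archive file rather than for the directory. Only extract archives from sources you trust.

```python
import tarfile
import urllib.request
from pathlib import Path


def download_and_extract(url, dest_dir, archive_name):
    """Download url into dest_dir (unless already there) and unpack the .tar.gz."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive_path = dest_dir / archive_name
    if not archive_path.exists():
        # urlretrieve writes the response body to the given file name.
        urllib.request.urlretrieve(url, str(archive_path))
    # The context manager closes the archive once extraction is done.
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_dir)
    return dest_dir
```

With the variables defined above, the call would be download_and_extract(data_url, data_dir, fname).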
[3]:
trained_categories = ["yes", "no", "up", "down", "left", "right", "on", "off", "stop", "go"]
labels = ["_silence_", "_unknown_", "yes", "no", "up", "down", "left", "right", "on", "off", "stop", "go"]

test_set_rel_paths = !cat {data_dir / "testing_list.txt"}
test_set_files = [data_dir / p for p in test_set_rel_paths]
test_categories = !cut -d'/' -f1 {data_dir / "testing_list.txt"} | sort -u

len(test_set_files), len(test_categories), len(trained_categories)
[3]:
(11005, 35, 10)
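
The shell commands above (cat, cut, sort) are not available on every platform. A pure-Python equivalent might look like the sketch below. The helper name read_test_list and the three-line listing are made up for illustration; the real testing_list.txt is far longer.

```python
import tempfile
from pathlib import Path, PurePosixPath


def read_test_list(list_file, data_dir):
    """Return (full file paths, sorted unique categories) for a listing file."""
    text = Path(list_file).read_text()
    rel_paths = [line.strip() for line in text.splitlines() if line.strip()]
    files = [Path(data_dir) / p for p in rel_paths]
    # The category is the first path component, e.g. "stop" in "stop/x.wav".
    categories = sorted({PurePosixPath(p).parts[0] for p in rel_paths})
    return files, categories


# Made-up listing, used only to exercise the function.
with tempfile.TemporaryDirectory() as tmp:
    listing = Path(tmp) / "testing_list.txt"
    listing.write_text(
        "stop/3ec05c3d_nohash_0.wav\n"
        "yes/example_nohash_0.wav\n"
        "stop/0f46028a_nohash_4.wav\n"
    )
    demo_files, demo_categories = read_test_list(listing, tmp)

print(len(demo_files), demo_categories)
```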

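To run the model later on, we need to set the variables dpemu_path and example_path. They point to the dpemu repository root and to the speech commands example directory, which contains the labelling scripts and the trained model.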

[4]:
dpemu_path = Path.cwd().parents[1]
example_path = dpemu_path / "examples/speech_commands"

Choose a category in which to generate errors. Later on we will generate errors in all of the test set categories.

[5]:
category = "stop"
data_subset_dir = data_dir / category

fs = list(data_subset_dir.iterdir())
# Read the files in the same order as fs so that indices into fs and wavs match.
wavs = [read(f) for f in fs]
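
Each element of wavs is a (sample_rate, data) tuple, because that is what scipy.io.wavfile.read returns. The sketch below illustrates this with a synthetic tone instead of a dataset file. The 440 Hz tone and the temporary file name are assumptions made for the example.

```python
import tempfile
from pathlib import Path

import numpy as np
from scipy.io.wavfile import read, write

# One second of a 440 Hz sine wave as 16-bit samples (synthetic stand-in data).
rate = 16000
t = np.arange(rate) / rate
tone = (0.5 * 32767 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "tone.wav"
    write(path, rate, tone)
    # read returns the sample rate first and the sample array second.
    loaded_rate, loaded_data = read(path)

print(loaded_rate, loaded_data.dtype, loaded_data.shape)
```

This is why the sample array sits at tuple index 1 in the cells that follow.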

Create an error-generating tree and generate errors in the category chosen above.

[6]:
wav_node = Tuple()
wav_node.addfilter(ApplyToTuple(ClipWAV("dyn_range"), 1))
root_node = Series(wav_node)

err_params = {"dyn_range": .2}
clipped = root_node.generate_error(wavs, err_params)
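
To build some intuition for the dyn_range parameter, here is a NumPy sketch of one plausible reading of dynamic range clipping: the samples are clamped to a band around the midpoint of the signal whose width is dyn_range times the original range. This is an assumption made for illustration. The function clip_dynamic_range is hypothetical, and the actual ClipWAV filter may be implemented differently.

```python
import numpy as np


def clip_dynamic_range(samples, dyn_range):
    """Clamp samples to a fraction dyn_range of their original range.

    Assumed semantics for illustration only, not the dpemu implementation.
    """
    lo, hi = float(samples.min()), float(samples.max())
    middle = (lo + hi) / 2
    half_range = (hi - lo) / 2 * dyn_range
    clipped = np.clip(samples, middle - half_range, middle + half_range)
    # Cast back so the result can be written with the original sample type.
    return clipped.astype(samples.dtype)


samples = np.array([-100, -50, 0, 50, 100], dtype=np.int16)
clipped_demo = clip_dynamic_range(samples, 0.2)
print(clipped_demo)
```

Under this reading, a dyn_range of 0.2 flattens every sample that lies outside the central fifth of the range, which distorts loud passages the most.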

Now we arbitrarily choose a speech command example from the data subset. To try another audio clip, change the index.

[7]:
example_index = 123
[8]:
clipped_filename = data_dir / 'clipped.wav'
write(clipped_filename, 16000, clipped[example_index][1])
[9]:
!aplay {fs[example_index]}
Playing WAVE '/home/jpssilve/datasets/speech_data/stop/3ec05c3d_nohash_0.wav' : Signed 16 bit Little Endian, Rate 16000 Hz, Mono
[10]:
!aplay {clipped_filename}
Playing WAVE '/home/jpssilve/datasets/speech_data/clipped.wav' : Signed 16 bit Little Endian, Rate 16000 Hz, Mono

Define a function to filter out irrelevant output (e.g. Python deprecation warnings):

[11]:
def filter_scores(output):
    return [line for line in output if "score" in line or ".wav" in line]

Run the model on the clean clip selected above:

[12]:
scores_clean = !python {example_path}/label_wav.py \
--graph={example_path}/trained_model/my_frozen_graph.pb \
--labels={example_path}/trained_model/conv_labels.txt \
--wav={fs[example_index]}

filter_scores(scores_clean)
[12]:
['stop (score = 0.54378)',
 'off (score = 0.19993)',
 '_unknown_ (score = 0.07233)']

Run the model on the corresponding errorified clip:

[13]:
scores_clipped = !python {example_path}/label_wav.py \
--graph={example_path}/trained_model/my_frozen_graph.pb \
--labels={example_path}/trained_model/conv_labels.txt \
--wav={clipped_filename}

filter_scores(scores_clipped)
[13]:
['stop (score = 0.22963)',
 'down (score = 0.16858)',
 '_unknown_ (score = 0.11415)']
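
The effect of clipping is easier to quantify once the score lines are parsed into numbers. The sketch below does this for the two outputs shown above. The helper parse_scores and its regular expression are our own additions and assume the "label (score = x)" format seen in the output.

```python
import re

# Matches lines such as "stop (score = 0.54378)".
SCORE_LINE = re.compile(r"^(?P<label>\S+) \(score = (?P<score>[0-9.]+)\)$")


def parse_scores(lines):
    """Map each label to its score, skipping lines that do not match."""
    parsed = {}
    for line in lines:
        match = SCORE_LINE.match(line)
        if match:
            parsed[match.group("label")] = float(match.group("score"))
    return parsed


clean_scores = parse_scores([
    "stop (score = 0.54378)",
    "off (score = 0.19993)",
    "_unknown_ (score = 0.07233)",
])
clipped_scores = parse_scores([
    "stop (score = 0.22963)",
    "down (score = 0.16858)",
    "_unknown_ (score = 0.11415)",
])

# How much confidence in the correct label was lost through clipping.
drop = clean_scores["stop"] - clipped_scores["stop"]
print(f"{drop:.5f}")
```

For this clip the model still picks "stop", but its score for the correct label falls by roughly 0.31.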

You can also run the model on an entire directory of .wav files in one go:

[14]:
scores_clean_dir = !python {example_path}/label_wav_dir.py \
--graph={example_path}/trained_model/my_frozen_graph.pb \
--labels={example_path}/trained_model/conv_labels.txt \
--wav_dir={data_subset_dir}

filter_scores(scores_clean_dir)
[14]:
['0f46028a_nohash_4.wav',
 'stop (score = 0.84888)',
 'up (score = 0.10150)',
 '_unknown_ (score = 0.02897)',
 '095847e4_nohash_0.wav',
 'stop (score = 0.83839)',
 'up (score = 0.10791)',
 'down (score = 0.01377)',
 'f8ba7c0e_nohash_1.wav',
 'stop (score = 0.99616)',
 'down (score = 0.00215)',
 '_unknown_ (score = 0.00114)',
 '4cee0c60_nohash_1.wav',
 'stop (score = 0.94652)',
 'up (score = 0.04828)',
 '_unknown_ (score = 0.00210)',
 '52e228e9_nohash_1.wav',
 'stop (score = 0.98153)',
 'down (score = 0.00989)',
 'up (score = 0.00290)',
 '42f81601_nohash_0.wav',
 'stop (score = 0.95047)',
 'up (score = 0.02973)',
 '_unknown_ (score = 0.01149)',
 'bc065a17_nohash_1.wav',
 'stop (score = 0.51887)',
 'down (score = 0.33725)',
 '_unknown_ (score = 0.13488)',
 '692a88e6_nohash_1.wav',
 'stop (score = 0.83974)',
 'up (score = 0.15001)',
 '_unknown_ (score = 0.00472)',
 '96a48d28_nohash_0.wav',
 'stop (score = 0.99714)',
 '_unknown_ (score = 0.00153)',
 'up (score = 0.00119)',
 '763188c4_nohash_0.wav',
 'stop (score = 0.92912)',
 '_unknown_ (score = 0.04616)',
 'go (score = 0.02025)',
 '53fd1780_nohash_0.wav',
 'stop (score = 0.26836)',
 'down (score = 0.19330)',
 '_unknown_ (score = 0.13893)',
 'e9323bd9_nohash_0.wav',
 'stop (score = 0.69810)',
 'up (score = 0.15704)',
 '_unknown_ (score = 0.06251)',
 '686d030b_nohash_4.wav',
 'stop (score = 0.99679)',
 'up (score = 0.00229)',
 'down (score = 0.00053)',
 'fc3ba625_nohash_0.wav',
 'stop (score = 0.94688)',
 '_unknown_ (score = 0.02761)',
 'up (score = 0.02019)',
 'c4e00ee9_nohash_1.wav',
 'stop (score = 0.61116)',
 'up (score = 0.25302)',
 'off (score = 0.04122)',
 '66774579_nohash_0.wav',
 'stop (score = 0.71052)',
 'off (score = 0.07688)',
 'up (score = 0.07429)',
 'ee07dcb9_nohash_0.wav',
 'stop (score = 0.90859)',
 'up (score = 0.03769)',
 '_unknown_ (score = 0.01578)',
 '4634529e_nohash_1.wav',
 'stop (score = 0.29116)',
 'up (score = 0.19151)',
 'off (score = 0.13839)',
 '8f3f252c_nohash_0.wav',
 'stop (score = 0.86171)',
 'up (score = 0.10564)',
 'off (score = 0.01070)',
 '3e31dffe_nohash_4.wav',
 'stop (score = 0.96517)',
 '_unknown_ (score = 0.03305)',
 'up (score = 0.00064)',
 '3d794813_nohash_4.wav',
 'stop (score = 0.91730)',
 'up (score = 0.05711)',
 '_unknown_ (score = 0.02363)',
 'f15a354c_nohash_0.wav',
 'stop (score = 0.99641)',
 'up (score = 0.00274)',
 'off (score = 0.00026)',
 'c71e3acc_nohash_0.wav',
 'stop (score = 0.69207)',
 'up (score = 0.10310)',
 'go (score = 0.07329)',
 '004ae714_nohash_0.wav',
 'stop (score = 0.90009)',
 'off (score = 0.02894)',
 '_unknown_ (score = 0.02361)',
 'a16013b7_nohash_4.wav',
 'stop (score = 0.57909)',
 'up (score = 0.28567)',
 'down (score = 0.03399)',
 'c22ebf46_nohash_0.wav',
 'stop (score = 0.92027)',
 'up (score = 0.06529)',
 '_unknown_ (score = 0.01240)',
 '2a89ad5c_nohash_0.wav',
 'stop (score = 0.56984)',
 'up (score = 0.14682)',
 '_unknown_ (score = 0.06884)',
 'b2ae3928_nohash_0.wav',
 'stop (score = 0.98627)',
 '_unknown_ (score = 0.01248)',
 'down (score = 0.00076)',
 '37dca74f_nohash_3.wav',
 'stop (score = 0.78066)',
 'up (score = 0.07329)',
 '_unknown_ (score = 0.07164)',
 '3bb68054_nohash_1.wav',
 'stop (score = 0.99162)',
 'go (score = 0.00331)',
 'up (score = 0.00219)',
 'a6f2fd71_nohash_1.wav',
 'stop (score = 0.59300)',
 'up (score = 0.39114)',
 '_unknown_ (score = 0.00468)',
 '893705bb_nohash_1.wav',
 'stop (score = 0.44148)',
 'up (score = 0.21584)',
 'go (score = 0.08655)',
 '46114b4e_nohash_1.wav',
 'stop (score = 0.86626)',
 'up (score = 0.10812)',
 'down (score = 0.00908)',
 '32561e9e_nohash_0.wav',
 'stop (score = 0.91559)',
 'up (score = 0.03920)',
 '_unknown_ (score = 0.01798)',
 '513aeddf_nohash_4.wav',
 'stop (score = 0.95504)',
 '_unknown_ (score = 0.04000)',
 'go (score = 0.00360)',
 '0137b3f4_nohash_3.wav',
 'stop (score = 0.74314)',
 'up (score = 0.22082)',
 'off (score = 0.01753)',
 '85851131_nohash_1.wav',
 'stop (score = 0.98796)',
 'up (score = 0.01117)',
 '_unknown_ (score = 0.00050)',
 '28612180_nohash_0.wav',
 'up (score = 0.42242)',
 'stop (score = 0.14160)',
 'down (score = 0.09028)',
 'e07dd7d4_nohash_0.wav',
 'stop (score = 0.34194)',
 'up (score = 0.33417)',
 '_unknown_ (score = 0.13248)',
 '01bb6a2a_nohash_1.wav',
 'stop (score = 0.92179)',
 'up (score = 0.03660)',
 '_unknown_ (score = 0.01734)',
 '645ed69d_nohash_3.wav',
 'stop (score = 0.99787)',
 'up (score = 0.00141)',
 '_unknown_ (score = 0.00037)',
 '34d5aa5a_nohash_1.wav',
 'stop (score = 0.83214)',
 'up (score = 0.03625)',
 'down (score = 0.02931)',
 '333784b7_nohash_3.wav',
 'stop (score = 0.97966)',
 'up (score = 0.01852)',
 '_unknown_ (score = 0.00124)',
 '9a69672b_nohash_4.wav',
 'stop (score = 0.89391)',
 'up (score = 0.07879)',
 'go (score = 0.01087)',
 '31f01a8d_nohash_4.wav',
 'stop (score = 0.97557)',
 'up (score = 0.01867)',
 'off (score = 0.00205)',
 '0d6d7360_nohash_1.wav',
 'stop (score = 0.70121)',
 'up (score = 0.14588)',
 '_unknown_ (score = 0.04814)',
 '4a1e736b_nohash_1.wav',
 'stop (score = 0.98115)',
 '_unknown_ (score = 0.01325)',
 'up (score = 0.00284)',
 '3b4f8f24_nohash_0.wav',
 'stop (score = 0.98360)',
 'down (score = 0.00743)',
 '_unknown_ (score = 0.00367)',
 '982babaf_nohash_1.wav',
 'stop (score = 0.98975)',
 '_unknown_ (score = 0.00548)',
 'up (score = 0.00266)',
 '7fd25f7c_nohash_1.wav',
 'stop (score = 0.99790)',
 '_unknown_ (score = 0.00121)',
 'up (score = 0.00055)',
 'a7200079_nohash_3.wav',
 'stop (score = 0.98081)',
 'up (score = 0.00732)',
 '_unknown_ (score = 0.00548)',
 'af6fbbf5_nohash_0.wav',
 'stop (score = 0.99353)',
 '_unknown_ (score = 0.00320)',
 'up (score = 0.00177)',
 'e882abb2_nohash_1.wav',
 'stop (score = 0.89639)',
 '_unknown_ (score = 0.04231)',
 'up (score = 0.04003)',
 '7ff4fc72_nohash_0.wav',
 'stop (score = 0.65718)',
 'down (score = 0.19441)',
 '_unknown_ (score = 0.05682)',
 '80c45ed6_nohash_0.wav',
 'stop (score = 0.94357)',
 'down (score = 0.02360)',
 'up (score = 0.02062)',
 'fc2411fe_nohash_1.wav',
 'stop (score = 0.89233)',
 'up (score = 0.03672)',
 'down (score = 0.02436)',
 '54ad8f22_nohash_3.wav',
 'stop (score = 0.51013)',
 'down (score = 0.17139)',
 'go (score = 0.08875)',
 '692a88e6_nohash_3.wav',
 'stop (score = 0.93051)',
 'up (score = 0.06421)',
 '_unknown_ (score = 0.00301)',
 '893705bb_nohash_6.wav',
 'stop (score = 0.44148)',
 'up (score = 0.21584)',
 'go (score = 0.08655)',
 '171edea9_nohash_2.wav',
 'stop (score = 0.99756)',
 'up (score = 0.00147)',
 '_unknown_ (score = 0.00069)',
 'f0522ff4_nohash_4.wav',
 'stop (score = 0.99686)',
 '_unknown_ (score = 0.00174)',
 'up (score = 0.00112)',
 '824e8ce5_nohash_1.wav',
 'stop (score = 0.86248)',
 'up (score = 0.06937)',
 '_unknown_ (score = 0.01894)',
 'a9ca1818_nohash_4.wav',
 'stop (score = 0.84088)',
 'up (score = 0.14219)',
 '_unknown_ (score = 0.00756)',
 '48a9f771_nohash_2.wav',
 'stop (score = 0.72441)',
 'up (score = 0.08968)',
 'off (score = 0.05914)',
 '6c429c7b_nohash_1.wav',
 'up (score = 0.35406)',
 'stop (score = 0.30395)',
 'off (score = 0.16752)',
 'f035e2ea_nohash_3.wav',
 'stop (score = 0.99878)',
 '_unknown_ (score = 0.00053)',
 'up (score = 0.00030)',
 'b06c19b0_nohash_0.wav',
 'stop (score = 0.99874)',
 '_unknown_ (score = 0.00081)',
 'up (score = 0.00021)',
 '9a356ab9_nohash_0.wav',
 'down (score = 0.30055)',
 '_unknown_ (score = 0.19656)',
 'stop (score = 0.15799)',
 '0cd323ec_nohash_1.wav',
 'stop (score = 0.41698)',
 'off (score = 0.14234)',
 'down (score = 0.12729)',
 'f19c1390_nohash_0.wav',
 'stop (score = 0.99386)',
 'up (score = 0.00474)',
 'down (score = 0.00070)',
 '435695e3_nohash_0.wav',
 'stop (score = 0.99527)',
 '_unknown_ (score = 0.00271)',
 'up (score = 0.00107)',
 '179a61b7_nohash_0.wav',
 'stop (score = 0.45093)',
 '_unknown_ (score = 0.24628)',
 'go (score = 0.06381)',
 '190821dc_nohash_0.wav',
 'stop (score = 0.85112)',
 'up (score = 0.06795)',
 'go (score = 0.03845)',
 '82951cf0_nohash_1.wav',
 'stop (score = 0.37752)',
 'up (score = 0.32371)',
 '_unknown_ (score = 0.05701)',
 'bd76a7fd_nohash_4.wav',
 'stop (score = 0.98600)',
 'up (score = 0.01023)',
 '_unknown_ (score = 0.00227)',
 'b4ea0d9a_nohash_2.wav',
 'stop (score = 0.95925)',
 'up (score = 0.03561)',
 '_unknown_ (score = 0.00178)',
 'e4be0cf6_nohash_0.wav',
 'stop (score = 0.43057)',
 'up (score = 0.36197)',
 'off (score = 0.13706)',
 '626e323f_nohash_0.wav',
 'stop (score = 0.99196)',
 'up (score = 0.00500)',
 'down (score = 0.00119)',
 '3589bc72_nohash_3.wav',
 'stop (score = 0.98538)',
 'go (score = 0.00724)',
 '_unknown_ (score = 0.00485)',
 '9a7c1f83_nohash_0.wav',
 'stop (score = 0.97974)',
 '_unknown_ (score = 0.01620)',
 'go (score = 0.00237)',
 '90804775_nohash_2.wav',
 'stop (score = 0.45099)',
 '_unknown_ (score = 0.12778)',
 'off (score = 0.09222)',
 '1e412fac_nohash_0.wav',
 'stop (score = 0.97331)',
 '_unknown_ (score = 0.01048)',
 'down (score = 0.00528)',
 '72e382bd_nohash_2.wav',
 'stop (score = 0.99472)',
 'down (score = 0.00212)',
 '_unknown_ (score = 0.00174)',
 '37d38e44_nohash_0.wav',
 'stop (score = 0.87186)',
 'down (score = 0.06694)',
 'go (score = 0.02199)',
 '322d17d3_nohash_3.wav',
 'stop (score = 0.99799)',
 'up (score = 0.00112)',
 '_unknown_ (score = 0.00045)',
 'a045368c_nohash_4.wav',
 'stop (score = 0.99478)',
 'up (score = 0.00325)',
 'off (score = 0.00098)',
 'b69002d4_nohash_0.wav',
 'stop (score = 0.99936)',
 'up (score = 0.00028)',
 '_unknown_ (score = 0.00019)',
 'a7200079_nohash_2.wav',
 'stop (score = 0.85205)',
 'down (score = 0.12575)',
 '_unknown_ (score = 0.00612)',
 '5b09db89_nohash_3.wav',
 'stop (score = 0.78082)',
 'up (score = 0.13418)',
 '_unknown_ (score = 0.03140)',
 'fa446c16_nohash_3.wav',
 'stop (score = 0.99090)',
 'up (score = 0.00493)',
 '_unknown_ (score = 0.00252)',
 'b4ea0d9a_nohash_1.wav',
 'stop (score = 0.95521)',
 'up (score = 0.03692)',
 '_unknown_ (score = 0.00382)',
 '493392c6_nohash_0.wav',
 'stop (score = 0.97289)',
 'up (score = 0.01633)',
 '_unknown_ (score = 0.00663)',
 'ca4eeab0_nohash_0.wav',
 'stop (score = 0.81987)',
 'down (score = 0.05813)',
 'up (score = 0.05399)',
 'f34e6f44_nohash_0.wav',
 'stop (score = 0.39819)',
 '_unknown_ (score = 0.22448)',
 'up (score = 0.10589)',
 '92e17cc4_nohash_1.wav',
 'stop (score = 0.98871)',
 'up (score = 0.00868)',
 '_unknown_ (score = 0.00104)',
 '36050ef3_nohash_1.wav',
 'stop (score = 0.87487)',
 'up (score = 0.05612)',
 'go (score = 0.02334)',
 '6a014b29_nohash_1.wav',
 'stop (score = 0.79478)',
 'off (score = 0.06393)',
 '_unknown_ (score = 0.04445)',
 '5b26c81b_nohash_0.wav',
 'stop (score = 0.92717)',
 'off (score = 0.03468)',
 'up (score = 0.02418)',
 '88d009d2_nohash_0.wav',
 'stop (score = 0.23585)',
 'up (score = 0.15615)',
 '_unknown_ (score = 0.11477)',
 'ab00c4b2_nohash_1.wav',
 'stop (score = 0.98693)',
 'up (score = 0.00912)',
 'down (score = 0.00194)',
 'a7acbbeb_nohash_2.wav',
 'stop (score = 0.91138)',
 'up (score = 0.08291)',
 '_unknown_ (score = 0.00190)',
 '87d5e978_nohash_1.wav',
 'stop (score = 0.99362)',
 'up (score = 0.00550)',
 '_unknown_ (score = 0.00048)',
 '15dd287d_nohash_2.wav',
 'stop (score = 0.60951)',
 'down (score = 0.11640)',
 'go (score = 0.10724)',
 '69a1a79f_nohash_2.wav',
 'stop (score = 0.99213)',
 'up (score = 0.00532)',
 '_unknown_ (score = 0.00173)',
 'd3831f6a_nohash_0.wav',
 'stop (score = 0.99898)',
 'up (score = 0.00088)',
 '_unknown_ (score = 0.00011)',
 'd9b8fab2_nohash_1.wav',
 'stop (score = 0.87669)',
 'off (score = 0.03555)',
 '_unknown_ (score = 0.03319)',
 '2dc4f05d_nohash_2.wav',
 'stop (score = 0.72243)',
 '_unknown_ (score = 0.16989)',
 'down (score = 0.03805)',
 '5af0ca83_nohash_0.wav',
 'stop (score = 0.93210)',
 'up (score = 0.03181)',
 'down (score = 0.01050)',
 '1dc86f91_nohash_2.wav',
 'stop (score = 0.99897)',
 '_unknown_ (score = 0.00045)',
 'up (score = 0.00029)',
 '89947bd7_nohash_0.wav',
 'stop (score = 0.50612)',
 'up (score = 0.47685)',
 'off (score = 0.00776)',
 'b528edb3_nohash_1.wav',
 'stop (score = 0.71747)',
 'down (score = 0.13705)',
 '_unknown_ (score = 0.06468)',
 'c1d39ce8_nohash_0.wav',
 'stop (score = 0.66247)',
 'go (score = 0.17296)',
 '_unknown_ (score = 0.07923)',
 'b93528e3_nohash_0.wav',
 'stop (score = 0.32753)',
 'up (score = 0.17626)',
 'off (score = 0.11351)',
 'c4a7a867_nohash_0.wav',
 'stop (score = 0.98290)',
 'up (score = 0.00738)',
 '_unknown_ (score = 0.00650)',
 '6d1dcca6_nohash_0.wav',
 'stop (score = 0.68633)',
 'up (score = 0.08010)',
 'off (score = 0.07813)',
 '26e573a9_nohash_0.wav',
 'stop (score = 0.81587)',
 'down (score = 0.05165)',
 'up (score = 0.04305)',
 '5188de0d_nohash_0.wav',
 'stop (score = 0.22536)',
 '_unknown_ (score = 0.18580)',
 'go (score = 0.12804)',
 '0585b66d_nohash_3.wav',
 'stop (score = 0.94474)',
 'up (score = 0.02064)',
 '_unknown_ (score = 0.01619)',
 '1b835b87_nohash_1.wav',
 'stop (score = 0.94834)',
 'up (score = 0.01997)',
 'down (score = 0.01052)',
 'aff582a1_nohash_3.wav',
 'stop (score = 0.99678)',
 '_unknown_ (score = 0.00310)',
 'down (score = 0.00006)',
 'f9273a21_nohash_1.wav',
 'stop (score = 0.95208)',
 'up (score = 0.04081)',
 '_unknown_ (score = 0.00239)',
 '4a4e28f1_nohash_0.wav',
 'stop (score = 0.26048)',
 'down (score = 0.16748)',
 'go (score = 0.13965)',
 '1ed0b13d_nohash_3.wav',
 'stop (score = 0.99921)',
 '_unknown_ (score = 0.00047)',
 'up (score = 0.00021)',
 '3ec05c3d_nohash_0.wav',
 'stop (score = 0.54378)',
 'off (score = 0.19993)',
 '_unknown_ (score = 0.07233)',
 '7846fd85_nohash_3.wav',
 'stop (score = 0.92796)',
 'down (score = 0.02070)',
 '_unknown_ (score = 0.01309)',
 '8dc18a75_nohash_0.wav',
 'stop (score = 0.78939)',
 'up (score = 0.19312)',
 '_unknown_ (score = 0.00795)',
 '7846fd85_nohash_4.wav',
 'stop (score = 0.89709)',
 'down (score = 0.03361)',
 '_unknown_ (score = 0.01997)',
 'f5626af6_nohash_3.wav',
 'stop (score = 0.94139)',
 '_unknown_ (score = 0.04825)',
 'up (score = 0.00626)',
 'ffd2ba2f_nohash_3.wav',
 'stop (score = 0.99623)',
 '_unknown_ (score = 0.00219)',
 'up (score = 0.00138)',
 '513aeddf_nohash_2.wav',
 'stop (score = 0.89436)',
 '_unknown_ (score = 0.05919)',
 'go (score = 0.02806)',
 '551e42e8_nohash_0.wav',
 'stop (score = 0.45233)',
 '_unknown_ (score = 0.16313)',
 'off (score = 0.11665)',
 '26e573a9_nohash_1.wav',
 'stop (score = 0.93726)',
 'down (score = 0.02824)',
 'up (score = 0.01093)',
 '9be15e93_nohash_3.wav',
 'stop (score = 0.99688)',
 'up (score = 0.00160)',
 '_unknown_ (score = 0.00101)',
 '264f471d_nohash_1.wav',
 'stop (score = 0.80808)',
 '_unknown_ (score = 0.10039)',
 'up (score = 0.03336)',
 'b959cd0c_nohash_0.wav',
 'stop (score = 0.24488)',
 '_unknown_ (score = 0.11815)',
 'left (score = 0.10235)',
 'bd76a7fd_nohash_2.wav',
 'stop (score = 0.99836)',
 'up (score = 0.00083)',
 '_unknown_ (score = 0.00038)',
 '74241b28_nohash_1.wav',
 'stop (score = 0.97935)',
 '_unknown_ (score = 0.01288)',
 'up (score = 0.00481)',
 '6ef407da_nohash_1.wav',
 'stop (score = 0.98262)',
 'up (score = 0.01171)',
 '_unknown_ (score = 0.00243)',
 '51eefcc6_nohash_0.wav',
 'stop (score = 0.64022)',
 'go (score = 0.16679)',
 'no (score = 0.06918)',
 'a827e3a1_nohash_3.wav',
 'stop (score = 0.76328)',
 'off (score = 0.15193)',
 'up (score = 0.02800)',
 '0d82fd99_nohash_3.wav',
 'stop (score = 0.92439)',
 'up (score = 0.05283)',
 '_unknown_ (score = 0.00782)',
 '5efb758c_nohash_0.wav',
 'stop (score = 0.79350)',
 'go (score = 0.05066)',
 'up (score = 0.04322)',
 '6094340e_nohash_1.wav',
 'stop (score = 0.98467)',
 'down (score = 0.00778)',
 '_unknown_ (score = 0.00234)',
 '067f61e2_nohash_3.wav',
 'stop (score = 0.99034)',
 'up (score = 0.00893)',
 '_unknown_ (score = 0.00047)',
 '54d9ccb5_nohash_0.wav',
 'stop (score = 0.98414)',
 'up (score = 0.01113)',
 'down (score = 0.00169)',
 '01b4757a_nohash_0.wav',
 'stop (score = 0.67216)',
 'up (score = 0.13359)',
 '_unknown_ (score = 0.05075)',
 '953fe1ad_nohash_2.wav',
 'stop (score = 0.78502)',
 'up (score = 0.10579)',
 'down (score = 0.05232)',
 'af790082_nohash_0.wav',
 'stop (score = 0.96869)',
 'up (score = 0.02890)',
 '_unknown_ (score = 0.00105)',
 '9a7c1f83_nohash_4.wav',
 'stop (score = 0.95722)',
 'go (score = 0.02449)',
 'down (score = 0.00590)',
 '94de6a6a_nohash_1.wav',
 'up (score = 0.88006)',
 '_unknown_ (score = 0.03862)',
 'stop (score = 0.02416)',
 '332d33b1_nohash_0.wav',
 'stop (score = 0.44896)',
 'down (score = 0.21524)',
 'no (score = 0.15847)',
 '674ca5ea_nohash_0.wav',
 'stop (score = 0.93532)',
 'up (score = 0.02308)',
 'off (score = 0.01062)',
 'b97c9f77_nohash_0.wav',
 'stop (score = 0.65481)',
 'down (score = 0.10290)',
 '_unknown_ (score = 0.09691)',
 'bfd26d6b_nohash_1.wav',
 'stop (score = 0.83370)',
 'up (score = 0.15832)',
 '_unknown_ (score = 0.00376)',
 'f632210f_nohash_1.wav',
 'stop (score = 0.68990)',
 'up (score = 0.12806)',
 'down (score = 0.05177)',
 '4290ca61_nohash_0.wav',
 'stop (score = 0.86563)',
 'off (score = 0.03857)',
 'up (score = 0.03350)',
 '893705bb_nohash_8.wav',
 'stop (score = 0.71492)',
 'up (score = 0.16641)',
 'go (score = 0.04935)',
 '3bfd30e6_nohash_0.wav',
 'stop (score = 0.63599)',
 'go (score = 0.14249)',
 'up (score = 0.10391)',
 '8ff44869_nohash_1.wav',
 'stop (score = 0.79370)',
 '_unknown_ (score = 0.06695)',
 'down (score = 0.06271)',
 '439c84f4_nohash_3.wav',
 'stop (score = 0.93857)',
 'go (score = 0.02124)',
 'down (score = 0.01481)',
 'd264f7b6_nohash_2.wav',
 'stop (score = 0.35819)',
 'up (score = 0.19114)',
 'down (score = 0.08839)',
 'b414c653_nohash_4.wav',
 'up (score = 0.58615)',
 'stop (score = 0.25748)',
 'off (score = 0.13660)',
 '0d85a428_nohash_0.wav',
 'stop (score = 0.16377)',
 'up (score = 0.14177)',
 '_unknown_ (score = 0.14092)',
 '525eaa62_nohash_2.wav',
 'stop (score = 0.99972)',
 'up (score = 0.00016)',
 '_unknown_ (score = 0.00011)',
 '6aafb34f_nohash_0.wav',
 'stop (score = 0.77719)',
 'up (score = 0.11667)',
 '_unknown_ (score = 0.03379)',
 '226537ab_nohash_0.wav',
 'stop (score = 0.97408)',
 'up (score = 0.02085)',
 '_unknown_ (score = 0.00265)',
 '559bc36a_nohash_1.wav',
 'stop (score = 0.97451)',
 'go (score = 0.01067)',
 '_unknown_ (score = 0.00638)',
 'df280250_nohash_1.wav',
 'stop (score = 0.98016)',
 'down (score = 0.00753)',
 'go (score = 0.00371)',
 'b5cf6ea8_nohash_4.wav',
 'stop (score = 0.99810)',
 'up (score = 0.00149)',
 '_unknown_ (score = 0.00032)',
 '587f3271_nohash_0.wav',
 'stop (score = 0.85410)',
 '_unknown_ (score = 0.06047)',
 'up (score = 0.02200)',
 '1ecfb537_nohash_4.wav',
 'stop (score = 0.99523)',
 '_unknown_ (score = 0.00262)',
 'left (score = 0.00094)',
 '321aba74_nohash_1.wav',
 'stop (score = 0.99719)',
 '_unknown_ (score = 0.00182)',
 'up (score = 0.00048)',
 '5170b77f_nohash_2.wav',
 'stop (score = 0.97167)',
 'up (score = 0.01566)',
 '_unknown_ (score = 0.00619)',
 'd78858d9_nohash_0.wav',
 'stop (score = 0.52191)',
 'up (score = 0.22323)',
 'left (score = 0.05919)',
 '113b3fbc_nohash_0.wav',
 'stop (score = 0.64668)',
 'down (score = 0.20922)',
 'no (score = 0.04130)',
 '3d9200b9_nohash_0.wav',
 'stop (score = 0.61945)',
 'up (score = 0.12795)',
 'off (score = 0.09970)',
 '06f6c194_nohash_4.wav',
 'stop (score = 0.58292)',
 'up (score = 0.39636)',
 'off (score = 0.00788)',
 '093f65a1_nohash_0.wav',
 'stop (score = 0.96773)',
 '_unknown_ (score = 0.00939)',
 'up (score = 0.00694)',
 'e41a903b_nohash_0.wav',
 'stop (score = 0.99393)',
 '_unknown_ (score = 0.00234)',
 'up (score = 0.00195)',
 '1b88bf70_nohash_0.wav',
 'stop (score = 0.89692)',
 'up (score = 0.04895)',
 'off (score = 0.02443)',
 '1ffd513b_nohash_0.wav',
 'stop (score = 0.99402)',
 'up (score = 0.00317)',
 'down (score = 0.00159)',
 'ecef25ba_nohash_0.wav',
 'stop (score = 0.18026)',
 'left (score = 0.13037)',
 'down (score = 0.12699)',
 '837a0f64_nohash_2.wav',
 'stop (score = 0.99877)',
 '_unknown_ (score = 0.00084)',
 'up (score = 0.00034)',
 '64220627_nohash_1.wav',
 'stop (score = 0.48925)',
 '_unknown_ (score = 0.12916)',
 'right (score = 0.11773)',
 'bd8412df_nohash_1.wav',
 'stop (score = 0.97986)',
 '_unknown_ (score = 0.01157)',
 'up (score = 0.00352)',
 'ed3c2d05_nohash_0.wav',
 'stop (score = 0.98379)',
 'up (score = 0.01037)',
 '_unknown_ (score = 0.00274)',
 '578d3efb_nohash_3.wav',
 'stop (score = 0.99270)',
 'up (score = 0.00675)',
 '_unknown_ (score = 0.00050)',
 '3df9a3d4_nohash_0.wav',
 'stop (score = 0.27674)',
 'up (score = 0.25178)',
 'no (score = 0.10935)',
 '2f0ce4d9_nohash_2.wav',
 'stop (score = 0.86411)',
 'off (score = 0.06190)',
 '_unknown_ (score = 0.03669)',
 'b3bdded5_nohash_2.wav',
 'stop (score = 0.88871)',
 'up (score = 0.09937)',
 '_unknown_ (score = 0.00315)',
 'a8e25ebb_nohash_0.wav',
 'stop (score = 0.48994)',
 'up (score = 0.12429)',
 'no (score = 0.10686)',
 'ad63d93c_nohash_1.wav',
 'stop (score = 0.51262)',
 'up (score = 0.38220)',
 '_unknown_ (score = 0.05622)',
 '4c3cddb8_nohash_4.wav',
 'stop (score = 0.99406)',
 'up (score = 0.00474)',
 '_unknown_ (score = 0.00085)',
 '0132a06d_nohash_2.wav',
 'stop (score = 0.98721)',
 'up (score = 0.01022)',
 '_unknown_ (score = 0.00182)',
 '7846fd85_nohash_1.wav',
 'stop (score = 0.92152)',
 'down (score = 0.04298)',
 'no (score = 0.00939)',
 'a04817c2_nohash_1.wav',
 'stop (score = 0.98223)',
 'up (score = 0.01197)',
 'down (score = 0.00265)',
 '605ed0ff_nohash_0.wav',
 'stop (score = 0.45783)',
 'up (score = 0.13047)',
 '_unknown_ (score = 0.08471)',
 '90e72357_nohash_2.wav',
 'stop (score = 0.98573)',
 '_unknown_ (score = 0.00698)',
 'go (score = 0.00461)',
 '2da58b32_nohash_4.wav',
 'stop (score = 0.79965)',
 'off (score = 0.06937)',
 'up (score = 0.04585)',
 '28ce0c58_nohash_1.wav',
 'go (score = 0.48640)',
 'up (score = 0.16947)',
 '_unknown_ (score = 0.15874)',
 '24a3e589_nohash_2.wav',
 'stop (score = 0.98158)',
 'up (score = 0.01708)',
 '_unknown_ (score = 0.00080)',
 '2579e514_nohash_1.wav',
 'stop (score = 0.48661)',
 'up (score = 0.15354)',
 'down (score = 0.10605)',
 '437455be_nohash_0.wav',
 'stop (score = 0.33172)',
 'up (score = 0.12149)',
 '_unknown_ (score = 0.10527)',
 '1b4c9b89_nohash_0.wav',
 'stop (score = 0.98440)',
 'up (score = 0.00508)',
 '_unknown_ (score = 0.00465)',
 'f798ac78_nohash_4.wav',
 'stop (score = 0.95439)',
 'up (score = 0.03564)',
 '_unknown_ (score = 0.00394)',
 '29229c21_nohash_1.wav',
 'stop (score = 0.99749)',
 'up (score = 0.00134)',
 '_unknown_ (score = 0.00095)',
 'd78858d9_nohash_2.wav',
 'stop (score = 0.70006)',
 'up (score = 0.20123)',
 'left (score = 0.03805)',
 '5ebc1cda_nohash_6.wav',
 'stop (score = 0.63192)',
 '_unknown_ (score = 0.15333)',
 'go (score = 0.12291)',
 'ab46af55_nohash_0.wav',
 'stop (score = 0.99821)',
 'up (score = 0.00156)',
 '_unknown_ (score = 0.00010)',
 '2296b1af_nohash_1.wav',
 'stop (score = 0.92353)',
 '_unknown_ (score = 0.02751)',
 'down (score = 0.02737)',
 '9a69672b_nohash_2.wav',
 'stop (score = 0.99818)',
 'up (score = 0.00158)',
 '_unknown_ (score = 0.00013)',
 '964e8cfd_nohash_1.wav',
 'stop (score = 0.95619)',
 'up (score = 0.02845)',
 '_unknown_ (score = 0.00523)',
 'cb72dfb6_nohash_0.wav',
 'stop (score = 0.93942)',
 'up (score = 0.03724)',
 'go (score = 0.00980)',
 '9be15e93_nohash_4.wav',
 'stop (score = 0.99567)',
 'up (score = 0.00360)',
 '_unknown_ (score = 0.00044)',
 '5ebc1cda_nohash_1.wav',
 'stop (score = 0.75682)',
 '_unknown_ (score = 0.09401)',
 'down (score = 0.04084)',
 '1e9b215e_nohash_1.wav',
 'stop (score = 0.46735)',
 'down (score = 0.11926)',
 '_unknown_ (score = 0.11014)',
 '28ce0c58_nohash_4.wav',
 '_unknown_ (score = 0.35277)',
 'stop (score = 0.25628)',
 'up (score = 0.22787)',
 '0135f3f2_nohash_1.wav',
 'down (score = 0.64915)',
 'stop (score = 0.13520)',
 'go (score = 0.06183)',
 '742d6431_nohash_0.wav',
 'stop (score = 0.51185)',
 'up (score = 0.45758)',
 'off (score = 0.01466)',
 '563aa4e6_nohash_3.wav',
 'stop (score = 0.98674)',
 '_unknown_ (score = 0.00840)',
 'up (score = 0.00273)',
 'bbaa7946_nohash_0.wav',
 'stop (score = 0.99574)',
 'down (score = 0.00201)',
 '_unknown_ (score = 0.00110)',
 'a7acbbeb_nohash_1.wav',
 'stop (score = 0.78802)',
 'up (score = 0.11744)',
 'down (score = 0.04498)',
 'ec21c46b_nohash_0.wav',
 'stop (score = 0.97416)',
 '_unknown_ (score = 0.01517)',
 'up (score = 0.00404)',
 '18ffa72d_nohash_1.wav',
 'off (score = 0.16264)',
 '_silence_ (score = 0.11814)',
 'yes (score = 0.09154)',
 '32ad5b65_nohash_0.wav',
 'stop (score = 0.99422)',
 'up (score = 0.00445)',
 '_unknown_ (score = 0.00075)',
 'fd32732a_nohash_0.wav',
 'stop (score = 0.98304)',
 '_unknown_ (score = 0.00850)',
 'down (score = 0.00307)',
 'a2b16113_nohash_0.wav',
 'stop (score = 0.91704)',
 'up (score = 0.07075)',
 'off (score = 0.00562)',
 '92b0a735_nohash_0.wav',
 'stop (score = 0.96610)',
 'up (score = 0.01413)',
 'down (score = 0.00977)',
 '6124b431_nohash_1.wav',
 'stop (score = 0.91341)',
 'up (score = 0.04878)',
 'down (score = 0.01150)',
 '7fb8d703_nohash_2.wav',
 'stop (score = 0.98021)',
 'up (score = 0.01911)',
 '_unknown_ (score = 0.00029)',
 '5e3dde6b_nohash_1.wav',
 'stop (score = 0.87034)',
 'up (score = 0.06751)',
 '_unknown_ (score = 0.04405)',
 'ce7a8e92_nohash_1.wav',
 'stop (score = 0.13540)',
 'up (score = 0.11850)',
 '_unknown_ (score = 0.10476)',
 'c0fb6812_nohash_0.wav',
 'stop (score = 0.67461)',
 'up (score = 0.11249)',
 'down (score = 0.05729)',
 'c79159aa_nohash_4.wav',
 'stop (score = 0.98290)',
 'up (score = 0.00918)',
 '_unknown_ (score = 0.00575)',
 'b3327675_nohash_1.wav',
 'stop (score = 0.49829)',
 'down (score = 0.13202)',
 '_unknown_ (score = 0.11668)',
 '87070229_nohash_4.wav',
 'stop (score = 0.98843)',
 '_unknown_ (score = 0.00946)',
 'go (score = 0.00120)',
 'a5d1becc_nohash_2.wav',
 'stop (score = 0.42095)',
 '_unknown_ (score = 0.13196)',
 'up (score = 0.09977)',
 '0ff728b5_nohash_3.wav',
 'stop (score = 0.97009)',
 '_unknown_ (score = 0.02019)',
 'go (score = 0.00626)',
 '964e8cfd_nohash_4.wav',
 'stop (score = 0.98068)',
 'up (score = 0.00856)',
 '_unknown_ (score = 0.00304)',
 '017c4098_nohash_4.wav',
 'stop (score = 0.99057)',
 'up (score = 0.00858)',
 'down (score = 0.00050)',
 'b72e58c9_nohash_0.wav',
 'stop (score = 0.98696)',
 '_unknown_ (score = 0.00550)',
 'up (score = 0.00223)',
 'ef77b778_nohash_2.wav',
 'stop (score = 0.93676)',
 'up (score = 0.06078)',
 '_unknown_ (score = 0.00106)',
 '0f3f64d5_nohash_1.wav',
 'stop (score = 0.84753)',
 'up (score = 0.11528)',
 'down (score = 0.01605)',
 '131e738d_nohash_4.wav',
 'stop (score = 0.86223)',
 '_unknown_ (score = 0.04718)',
 'go (score = 0.03105)',
 'bab36420_nohash_1.wav',
 'stop (score = 0.99879)',
 'up (score = 0.00114)',
 '_unknown_ (score = 0.00003)',
 '27c24504_nohash_0.wav',
 'stop (score = 0.91839)',
 'up (score = 0.03990)',
 '_unknown_ (score = 0.01542)',
 '21e8c417_nohash_0.wav',
 'no (score = 0.16717)',
 'go (score = 0.14407)',
 'down (score = 0.13581)',
 '54d9ccb5_nohash_1.wav',
 'stop (score = 0.98426)',
 'up (score = 0.01385)',
 '_unknown_ (score = 0.00076)',
 '9448c397_nohash_4.wav',
 'stop (score = 0.43997)',
 '_unknown_ (score = 0.21405)',
 'go (score = 0.05974)',
 'c7dc7278_nohash_0.wav',
 'stop (score = 0.99714)',
 'up (score = 0.00277)',
 '_unknown_ (score = 0.00007)',
 ...]

That was not pretty! We’d better define some helper functions to extract the model’s guesses from that messy output:

[15]:
def get_guesses(scores):
    # After filtering, every file contributes four lines:
    # the file name followed by the model's top three guesses.
    scores = filter_scores(scores)
    if len(scores) % 4 != 0:
        raise ValueError(f"Expected scores list to have a length divisible by 4 after filtering but got length {len(scores)}")
    fnames = scores[0::4]
    # Keep only the label of the top guess and drop the "(score = ...)" part.
    guesses = [guess.split(' ')[0] for guess in scores[1::4]]
    return zip(fnames, guesses)

def score_directory(directory):
    scores = !python {example_path}/label_wav_dir.py \
        --graph={example_path}/trained_model/my_frozen_graph.pb \
        --labels={example_path}/trained_model/conv_labels.txt \
        --wav_dir={directory}
    return filter_scores(scores)
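
To see what get_guesses produces, the sketch below repeats the two pure-Python helpers so that it runs on its own and applies them to a few lines copied from the output above. The warning line is made up to show the filtering.

```python
def filter_scores(output):
    return [line for line in output if "score" in line or ".wav" in line]


def get_guesses(scores):
    scores = filter_scores(scores)
    if len(scores) % 4 != 0:
        raise ValueError(f"expected a multiple of 4 lines, got {len(scores)}")
    fnames = scores[0::4]
    guesses = [guess.split(" ")[0] for guess in scores[1::4]]
    return zip(fnames, guesses)


raw_output = [
    "DeprecationWarning: hypothetical noise line",  # made up; gets filtered out
    "0f46028a_nohash_4.wav",
    "stop (score = 0.84888)",
    "up (score = 0.10150)",
    "_unknown_ (score = 0.02897)",
    "28612180_nohash_0.wav",
    "up (score = 0.42242)",
    "stop (score = 0.14160)",
    "down (score = 0.09028)",
]

# zip returns a one-shot iterator, so materialize it before reusing the result.
guesses = list(get_guesses(raw_output))
print(guesses)
```

Only the top guess per file is kept, so the second clip counts as a misclassification ("up" instead of "stop").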

Define a function to generate errors in all wav files in a given directory. If an inclusion list is provided, only files on the list will be processed.

[16]:
def errorify_directory(data_root_dir, dir_name, tree_root, err_params, inclusion_list=None):
    clean_data_dir = data_root_dir / dir_name
    if not clean_data_dir.exists():
        raise ValueError(f"Directory {clean_data_dir} does not exist.")
    err_data_dir = data_root_dir / (dir_name + "_err")
    err_data_dir.mkdir(exist_ok=True)
    # Process the whole directory only when no inclusion list was given at all.
    if inclusion_list is None:
        inclusion_list = [f for f in clean_data_dir.iterdir() if f.suffix == ".wav"]
    for file in inclusion_list:
        fname = file.name
        wav = read(file)
        clipped = tree_root.generate_error([wav], err_params)[0]
        err_file_path = err_data_dir / fname
        write(err_file_path, clipped[0], clipped[1])
    return err_data_dir

Define a function to generate errors in all wav files on a list. The function is needed when files from multiple categories are present on the list. To facilitate comparisons between clean and errorified data, the clean files on the list can be automatically copied to suitably named directories. To do this, pass the parameter copy_clean=True.

[17]:
def errorify_list(data_files, categories, tree_root, err_params, copy_clean=False):
    data_root_dir = data_files[0].parents[1]
    for cat in categories:
        # Compare the parent directory name instead of searching for a substring in the path.
        files_in_cat = [f for f in data_files if f.parent.name == cat]
        print("category:", cat)
        print(f"{len(files_in_cat)}")
        errorify_directory(data_root_dir, cat, tree_root, err_params, inclusion_list=files_in_cat)
        if copy_clean:
            copy_dir = data_root_dir / (cat + "_clean")
            copy_dir.mkdir(exist_ok=True)
            for file in files_in_cat:
                shutil.copy(file, copy_dir)
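
When splitting a file list by category, comparing the name of the parent directory is more robust than looking for a substring in the path. The sketch below shows the difference. The helper group_by_category and the directory "marathon" are hypothetical and chosen only to provoke a false substring match.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_category(paths, categories):
    """Group paths by the name of their parent directory."""
    wanted = set(categories)
    groups = defaultdict(list)
    for p in paths:
        category = PurePosixPath(p).parent.name
        if category in wanted:
            groups[category].append(p)
    return dict(groups)


paths = [
    "speech_data/on/a_nohash_0.wav",
    "speech_data/marathon/b_nohash_0.wav",  # hypothetical directory ending in "on"
    "speech_data/no/c_nohash_0.wav",
]

groups = group_by_category(paths, ["on", "no"])
# A substring test also picks up the file in "marathon/".
substring_matches = [p for p in paths if "on/" in p]
print(groups)
print(substring_matches)
```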

Define a function to compare the model’s guesses on clean and errorified data. The results are returned in a Pandas dataframe.

[18]:
def compare(data_root, category, clean_ext="_clean", err_ext="_err"):
    scores_clean = score_directory(data_root / (category + clean_ext))
    guesses_clean = get_guesses(scores_clean)
    scores_err = score_directory(data_root / (category + err_ext))
    guesses_err = get_guesses(scores_err)
    df_clean = pd.DataFrame(guesses_clean, columns=["file", "clean_guess"])
    df_err = pd.DataFrame(guesses_err, columns=["file", "err_guess"])
    res = pd.merge(df_clean, df_err, on="file", how="inner")
    res['true_label'] = category
    return res
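
The inner join keeps only the files that were scored in both the clean and the errorified directory. The sketch below runs the same merge on a small made-up table and computes the accuracy on each side. All file names and guesses in it are invented for illustration.

```python
import pandas as pd

# Made-up guesses for three clean and three errorified clips.
df_clean = pd.DataFrame(
    [("a.wav", "stop"), ("b.wav", "stop"), ("c.wav", "up")],
    columns=["file", "clean_guess"],
)
df_err = pd.DataFrame(
    [("a.wav", "stop"), ("b.wav", "down"), ("d.wav", "stop")],
    columns=["file", "err_guess"],
)

# c.wav and d.wav appear on one side only, so the inner join drops them.
res = pd.merge(df_clean, df_err, on="file", how="inner")
res["true_label"] = "stop"

clean_accuracy = (res["clean_guess"] == res["true_label"]).mean()
err_accuracy = (res["err_guess"] == res["true_label"]).mean()
print(res)
print(clean_accuracy, err_accuracy)
```

If the two directories are expected to contain exactly the same files, comparing len(res) with the lengths of the inputs is a cheap sanity check.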

Generate errors in all test set audio clips.

[19]:
errorify_list(test_set_files, trained_categories, root_node, err_params, copy_clean=True)
category: yes
419
category: no
405
category: up
425
category: down
406
category: left
412
category: right
396
category: on
396
category: off
402
category: stop
411
category: go
402

Run the model on clean and errorified data.

[20]:
results = [compare(data_dir, cat) for cat in trained_categories]
df = pd.concat(results)

Create confusion matrices for clean and errorified data, respectively.

[21]:
cm_clean = confusion_matrix(df['true_label'], df['clean_guess'], labels=labels)
cm_err = confusion_matrix(df['true_label'], df['err_guess'], labels=labels)
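
In the matrices returned by confusion_matrix, rows correspond to true labels and columns to predicted labels. Since the true labels only come from trained_categories, the rows for _silence_ and _unknown_ contain only zeros, which needs some care when computing per-class figures. The NumPy sketch below shows one way to handle this on a small made-up matrix. The helper per_class_recall is our own addition.

```python
import numpy as np


def per_class_recall(cm):
    """Fraction of each true class that was predicted correctly.

    Classes that never occur as a true label get NaN instead of a division by zero.
    """
    cm = np.asarray(cm, dtype=float)
    row_totals = cm.sum(axis=1)
    recall = np.full(len(cm), np.nan)
    has_samples = row_totals > 0
    recall[has_samples] = np.diag(cm)[has_samples] / row_totals[has_samples]
    return recall


# Made-up 3x3 matrix: rows are true labels, columns are predictions.
demo_labels = ["_unknown_", "stop", "up"]
cm_demo = np.array([
    [0, 0, 0],  # "_unknown_" never occurs as a true label
    [1, 8, 1],
    [0, 2, 8],
])

recall = per_class_recall(cm_demo)
accuracy = np.trace(cm_demo) / cm_demo.sum()
print(dict(zip(demo_labels, recall)), accuracy)
```

Applying the same two computations to cm_clean and cm_err would give a numerical summary to go with the plots.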

Visualize the confusion matrix for the clean data.

[22]:
visualize_confusion_matrix(df, cm_clean, 0, labels, "dyn_range", "true_label", "clean_guess")
../_images/case_studies_WAVClipping_39_0.png

Visualize the confusion matrix for the errorified data.

[23]:
visualize_confusion_matrix(df, cm_err, 0, labels, "dyn_range", "true_label", "err_guess")
../_images/case_studies_WAVClipping_41_0.png

The notebook for this case study can be found here.