Week 6: New features for pronunciation and improved audiorecorder.js

This week James met Dr. Nakagawa and came up with two slightly different new feature extraction approaches:

First approach:

  1. Decode the wav file with its original phone-alignment grammar.
  2. Use pre-defined cepstral mean normalization values and the top 64 Gaussians for scoring.
  3. Store the start and end times for each phone.
  4. Create separate (alternative) wav files from the triphones extracted from the original wav file.
  5. Decode the alternative wav files with alternative grammars, created by choosing all the phonemes as probable replacements for the target phoneme.
  6. Store the target phone's triphone duration and the total number of alternative grammars in which the target phoneme is found. These constitute the features for training (see the sketch after this list).
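
Concretely, the per-phone feature record could be assembled roughly as in the sketch below. This is only my own illustration: the function and field names (buildFeatureRecord, triphoneDuration, grammarMatches, and the shape of the alignment and decode objects) are placeholders, not taken from the actual implementation.

```js
// Hypothetical sketch (names are my own): assemble the training features for one
// target phone. `alignment` maps each phone to its start/end times from step 3,
// and `altDecodes` holds one decode result per alternative grammar from step 5.
function buildFeatureRecord(targetPhone, alignment, altDecodes) {
  // Triphone duration of the target phone, from the original alignment.
  var duration = alignment[targetPhone].end - alignment[targetPhone].start;

  // Number of alternative grammars in which the target phoneme is still found.
  var grammarMatches = altDecodes.filter(function (decode) {
    return decode.phones.indexOf(targetPhone) !== -1;
  }).length;

  // These two values are the features stored for training.
  return { phone: targetPhone, triphoneDuration: duration, grammarMatches: grammarMatches };
}
```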

Second approach:
  1. This approach differs only in how the alternative grammars are created: neighbor phonemes are used instead of the actual phonemes present in the word being evaluated. The rest of the approach is the same as the first (see the sketch below).
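
To make the difference concrete, the sketch below parameterises the grammar generation by the candidate set: the phonemes of the word itself (first approach) versus a neighbor list for the target phoneme (second approach). The helper name, the neighbor table, and the example word are placeholders of my own, not from the actual implementation.

```js
// Hypothetical sketch: generate alternative phone sequences for one target phone.
// The candidate set is what distinguishes the two approaches.
function alternativeSequences(phones, targetIndex, candidates) {
  var target = phones[targetIndex];
  return candidates
    .filter(function (p) { return p !== target; })   // skip the original phoneme
    .map(function (p) {
      var alt = phones.slice();                      // copy the phone sequence
      alt[targetIndex] = p;                          // substitute the candidate
      return alt;                                    // one alternative grammar each
    });
}

// First approach: candidates are the phonemes of the word being evaluated.
var wordPhones = ['HH', 'AH', 'L', 'OW'];
var firstApproach = alternativeSequences(wordPhones, 1, wordPhones);

// Second approach: candidates are neighbor phonemes of the target (placeholder table).
var NEIGHBORS = { 'AH': ['AA', 'AE', 'EH'] };
var secondApproach = alternativeSequences(wordPhones, 1, NEIGHBORS['AH'] || []);
```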

I worked towards implementing these approaches in the browser using Pocketsphinx.js. I faced some challenges due to the tight coupling between the Pocketsphinx.js recognizer and the audio recorder: the recorder lets the recognizer consume the audio buffer directly, which makes it difficult to instantiate multiple recognizers with the same buffer and different grammars. So I am extending the recorder to store the buffer and export it as a blob that can be re-used by multiple recognizer instances. This will also decouple the recorder from the recognizer.
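
The rough sketch below shows the idea; it is not the actual audiorecorder.js code. The recorder keeps the captured chunks, can export them as a blob, and can replay them to any consumer, assuming a recognizer worker that accepts {command: 'process', data: ...} messages as in the Pocketsphinx.js demos.

```js
// Rough sketch only, not the actual audiorecorder.js code: a recorder that keeps
// the captured samples so the same audio can be fed to several recognizers.
function BufferingRecorder() {
  this.chunks = [];  // Int16Array chunks captured so far
}

// Called for each chunk of (already down-sampled) audio from the capture pipeline.
BufferingRecorder.prototype.record = function (int16Chunk) {
  this.chunks.push(int16Chunk);
};

// Export everything recorded so far as a blob that can be stored or re-decoded later.
BufferingRecorder.prototype.exportBlob = function () {
  return new Blob(this.chunks, { type: 'audio/raw' });
};

// Replay the stored audio into any consumer, e.g. a recognizer worker started with
// a different grammar (assumes the worker handles {command: 'process'} messages).
BufferingRecorder.prototype.replayTo = function (recognizerWorker) {
  this.chunks.forEach(function (chunk) {
    recognizerWorker.postMessage({ command: 'process', data: chunk });
  });
};
```

With something like this in place, one recording can be decoded once per alternative grammar by replaying the same stored buffer into each recognizer instance.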

Will keep you posted on further updates.
