Assignment 3. Sphinx Language Modeling
Part two: Make a Difference! Due December 19th
FIRST (for everyone): Run a new baseline on the Voicemail data using the tool of your choice (submit results December 9th)
- Download the test and training data
- Build a new LM
- Run a baseline test
NEXT: Choose one of the following projects: All projects should be done individually.
1. Significantly improve the performance starting from either your new VM baseline or your original SWBD baseline
- Better dictionary or cleaner data
- More data (e.g. select from web)
- Changing parameter settings
2. Analyze performance using SCLITE
- Change the output of Sphinx to go through Sclite (will require changes to the config and probably some scripting).
- Come up with at least 3 hypotheses of errors (e.g. OOV, speech detection) and use the data to determine actual impact (again, some simple scripting can help)
- Suggest techniques that would address these issues (but you don't have to try them out)
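As one illustration of the "simple scripting" mentioned above, here is a minimal Python sketch that measures how often reference words fall outside the recognizer's dictionary (the OOV hypothesis). The dictionary format (word followed by its phones on each line), the toy data, and the function names `load_vocab` and `oov_report` are assumptions for illustration, not part of Sphinx or Sclite.

```python
from collections import Counter

def load_vocab(lines):
    """Collect the first whitespace-separated token of each dictionary line."""
    vocab = set()
    for line in lines:
        parts = line.split()
        if parts:
            vocab.add(parts[0].lower())
    return vocab

def oov_report(reference_sentences, vocab):
    """Return (OOV token rate, Counter of OOV words) over the reference text."""
    missing = Counter()
    total = 0
    for sentence in reference_sentences:
        for token in sentence.lower().split():
            total += 1
            if token not in vocab:
                missing[token] += 1
    rate = sum(missing.values()) / total if total else 0.0
    return rate, missing

# Toy data standing in for the real dictionary and reference transcripts.
dictionary_lines = ["hello HH AH L OW", "world W ER L D", "call K AO L"]
refs = ["hello world", "call tomorrow hello"]

rate, missing = oov_report(refs, load_vocab(dictionary_lines))
print(f"OOV rate: {rate:.1%}", missing.most_common(5))
```

Comparing this rate with the number of errors Sclite aligns to those same words gives a rough sense of how much of the WER the OOV hypothesis can explain.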
3. Use more sophisticated LM techniques offered by the toolsets and test the results with perplexity, rather than WER (should use VM data for this, since there's more of it and it's a narrower domain)
- Get a baseline perplexity result
- Try 3 new LMs, such as class grammars, higher order ngrams, different models for long and short utterances
- Show results and discuss
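For reference, perplexity is the inverse probability of the test set, normalized by the number of predicted tokens. The sketch below computes it for a toy add-one-smoothed bigram model. The smoothing choice, the single extra vocabulary slot for unseen words, and the function names are assumptions made to keep the example short; the toolkits use better discounting and will report different numbers.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count histories and bigrams, padding each sentence with <s> and </s>."""
    histories, bigrams = Counter(), Counter()
    vocab = {"</s>"}
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens[1:])
        histories.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    return histories, bigrams, vocab

def perplexity(sentences, model):
    """Perplexity = 2 ** (-average log2 probability per predicted token)."""
    histories, bigrams, vocab = model
    v = len(vocab) + 1  # one extra slot shared by all unseen words
    log_sum, n = 0.0, 0
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for h, w in zip(tokens, tokens[1:]):
            p = (bigrams[(h, w)] + 1) / (histories[h] + v)  # add-one smoothing
            log_sum += math.log2(p)
            n += 1
    if n == 0:
        raise ValueError("no tokens to score")
    return 2 ** (-log_sum / n)

model = train_bigram(["a b", "a b"])
print(perplexity(["a b"], model))  # word order seen in training
print(perplexity(["b a"], model))  # unseen word order scores worse
```

Lower is better, and results are only comparable across models that share the same vocabulary and test set.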
4. I noticed some discussion of acoustic adaptation for Sphinx. Try it out. (Not for the faint of heart).
- Find the recipe
- Figure out what data you'll need and let me know
- Try it, show results and discuss
Part one: Get it running
Due date for parts 1 & 2 is November 28th. Submit the summary data provided by Sphinx for the following. If you make any changes other than swapping in the language model, please note them, though I would prefer you simply change the models and hold off on improvements such as a reasonable dictionary until the next step.
- The initial run, using Sphinx's built in models
- The baseline with the model built with CMUCLMTK
- The baseline with the model built with SRILM
Part 3 will be to improve on these baselines, which will be due December 12th, so feel free to start on that right away. There's a lot of room for improvement! I may be getting you additional training and test data for Part 3, so make sure you get the baselines running and understand what the tools can do as soon as possible.
If you run into issues downloading the tools and getting them running, use the Language Modeling tool support forum on Latte.
Step 1: Download Sphinx 4 and decode the switchboard files (Sphinx will produce a WER, so skip sclite for now) to get a baseline.
- Sphinx4 download, documentation and more
- Data up on Latte:
- Test data: 20 test sentences and the .batch file for them
- Training data: a file of training sentences, All_SWBD_LM_training.text, zipped ~5MB, ~3M words.
Step 2: Use SRILM and CMUCLMTK to create 2 language models using the data on Latte and run a baseline for each
- SRILM:
- CMUCLMTK:
- Making your models work with Sphinx
- convert to DMP format with sphinx_lm_convert from the sphinxbase package
- Use tests/performance/wsj5k_8kHz as an example of what you need to do
- Use the acoustic model that I emailed out a couple weeks ago
- You'll need to change the config and build files (best to make your own directory and copy these in). I also recommend keeping a log of what you're doing in case you need to backtrack.
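The SRILM half of this step can be sketched as two command lines, assembled here in Python so they are easy to log and rerun. This is only a sketch: the output file names `swbd.arpa` and `swbd.lm.DMP` and the trigram order are assumptions, and it assumes `sphinx_lm_convert` picks the DMP output format from the file extension. The function only builds the commands; nothing is executed.

```python
import shlex

def lm_build_commands(train_text, arpa_path, dmp_path, order=3):
    """Return argv lists: build an ARPA LM with SRILM, then convert it to DMP."""
    return [
        # SRILM: count n-grams in the training text and write an ARPA-format LM.
        ["ngram-count", "-text", train_text, "-order", str(order), "-lm", arpa_path],
        # sphinxbase: convert the ARPA file to the binary DMP format.
        ["sphinx_lm_convert", "-i", arpa_path, "-o", dmp_path],
    ]

# Hypothetical output names; only the training file name comes from the assignment.
for cmd in lm_build_commands("All_SWBD_LM_training.text", "swbd.arpa", "swbd.lm.DMP"):
    print(shlex.join(cmd))
```

To actually run them, pass each list to `subprocess.run(cmd, check=True)` once both tools are on your PATH, and keep the printed lines in your log.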
Assignment 4: Recognizer analysis
Evaluate the differences in a selection of the papers below according to one of the following dimensions:
- Diachronic: Changes within one recognizer from 2000 to 2006.
- Synchronic: Changes across recognizers in one time frame (2006 preferred).
- Application/language specific: Differences between the general 2006 description and the application or language specific 2007 system.
For this assignment, you will work in pairs to compare speech recognizers diachronically or synchronically. Your specific assignments were emailed out on the class forum and the papers are up on Latte.
By the next class (11/7), read the papers assigned to you and come up with a list of interesting similarities or differences among the papers (which either describe the same recognizer over time or different recognizers at the same time). You may use any of the other papers for background if you think it will be helpful. It's important you're ready to discuss the papers on the 7th!
You will have time in class on 11/7 to coordinate with your partner. Agree on a set of interesting things you'd like to share with the class and create a plan for a 10 minute presentation (you won't have time to complete it, just to discuss the points and divvy up the work).
On 11/9 you and your partner will present a small number of slides describing what you have found. Either email them to me before class or be ready to plug into the projector. Your presentation should be no more than 10 minutes (we'll have 7 groups and need time to switch between them.)
Email me the final set of slides by class on the 9th.
| | BBN | IBM | SRI |
|---|---|---|---|
| 2000 | The 2000 BBN Byblos LVCSR system | Recent Improvements in Speech Recognition Performance | The SRI March 2000 Hub-5 Conversational Speech Transcription System |
| 2004 | The BBN RT04 Broadcast News Transcription System; The 2004 BBN/LIMSI English CT Speech Recognition System | The IBM Conversational Telephony System for Rich Transcription | SRI’s 2004 Broadcast News Speech to Text System |
| 2006 | Advances in the Transcription …within the combined EARS BBN/LIMSI system | Advances in Speech Transcription at IBM under the DARPA EARS program | Recent Innovations in Speech-to-Text Transcription at SRI-ICSI-UW |
| Apps | Progress in the BBN 2007 Mandarin Speech to Text System | The IBM Rich Transcription 2007 ... for Lecture Meetings | The SRI-ICSI Spring 2007 Meeting and Lecture Recognition System |
Quiz 2. Due Wednesday, October 19th
- Read the papers on the web assigned for 10/17
- Select 3 of the papers and briefly answer the following questions:
- What language modeling technique is the paper describing?
- What is the linguistic intuition or motivation for the approach?
- How was the technique evaluated?
- Describe one element of the process that was particularly interesting (data collection, backoff/interpolation strategy, combination with some other technique or approach).
Assignment 2. Due Wednesday, October 12th
Given a dictionary, determine which words are potentially “confusable” by computing the distance between the pronunciations. Just like edit distance, there should be costs for insertion, deletion, and substitution. Substitution costs should take into account similarity based on articulatory features to create a “distance” metric.
Using what you've learned about articulatory phonetics, come up with classes that take into account the fact that some substitutions are more likely than others for each phoneme pair. The write-up should include the details of the metrics. Your program should be able to do the following:
Given an existing dictionary and a new set of words to be added to that dictionary:
1. Determine pronunciations for the new words (Dictionary Documentation)
2. For each new word, say whether that word is potentially confusable with other words in the dictionary and why, given your phonetic analysis.
Extra credit: What are the top 3 most confusable pairs in the dictionary and why (given your metrics).
Submit the program code for both the computation of the distance function and the comparisons along with the write up of your assumptions and results.
Use dynamic programming (e.g. edit distance) and declarative data structures that make your program easy to modify given different assumptions about confusability.
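To make the last point concrete, here is a minimal sketch of a feature-weighted edit distance over phone sequences. The feature table is deliberately tiny and hypothetical (a handful of consonants described by place, manner, and voicing), the cost scheme (fraction of differing features, flat insertion and deletion cost) is just one possible assumption, and the names `FEATURES`, `sub_cost`, and `phone_distance` are made up for illustration. Your own classes and costs should come from your phonetic analysis.

```python
# Declarative feature table: phone -> (place, manner, voiced).
# Hypothetical and incomplete; extend or replace it to change the metric.
FEATURES = {
    "P": ("bilabial", "stop", False),
    "B": ("bilabial", "stop", True),
    "T": ("alveolar", "stop", False),
    "D": ("alveolar", "stop", True),
    "K": ("velar", "stop", False),
    "S": ("alveolar", "fricative", False),
    "Z": ("alveolar", "fricative", True),
    "M": ("bilabial", "nasal", True),
    "N": ("alveolar", "nasal", True),
}
INS_DEL_COST = 1.0  # assumed flat cost for inserting or deleting a phone

def sub_cost(a, b):
    """Substitution cost: fraction of articulatory features that differ."""
    if a == b:
        return 0.0
    fa, fb = FEATURES.get(a), FEATURES.get(b)
    if fa is None or fb is None:
        return 1.0  # phone missing from the table (e.g. vowels here): full cost
    return sum(x != y for x, y in zip(fa, fb)) / len(fa)

def phone_distance(p, q):
    """Weighted edit distance between two phone sequences (row-by-row DP)."""
    prev = [j * INS_DEL_COST for j in range(len(q) + 1)]
    for i, a in enumerate(p, 1):
        cur = [i * INS_DEL_COST]
        for j, b in enumerate(q, 1):
            cur.append(min(
                prev[j] + INS_DEL_COST,          # delete a from p
                cur[j - 1] + INS_DEL_COST,       # insert b into p
                prev[j - 1] + sub_cost(a, b),    # substitute or match
            ))
        prev = cur
    return prev[-1]

bat, pat, cat = ["B", "AE", "T"], ["P", "AE", "T"], ["K", "AE", "T"]
print(phone_distance(bat, pat))  # differs in voicing only
print(phone_distance(bat, cat))  # differs in place and voicing
```

Because the costs live in a table rather than in the algorithm, trying a different assumption about confusability only means editing `FEATURES` or `sub_cost`. A word pair could then be flagged as confusable when its distance falls below a threshold you choose and justify.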
Programming Assignment 1: Build a speech application for submission to the AVIOS Student Contest.
We're going to start out the semester by building a speech application. Commercial tools have been made available through AVIOS (Applied Voice Input Output Society): http://www.avios.org/contest2012/info.htm as part of a student contest. (Extra credit will be given to students who submit their applications to the AVIOS contest). You may work alone or in pairs for this assignment. No more than two to a team.
Design: Due Monday October 10:
- Pick a toolset
- Describe your application
- Describe what resources you’ll need (grammar, prompts, etc)
- Indicate what issues you are running up against and might need help with.
Prototype: Running application due November 14th. Submit the following:
- Description of the application's functionality (does not have to be fully implemented)
- Instructions for how to test the application (indicate limits)
Final application and presentations. Due December 14th (date of presentations TBD). Submit the following:
- Description of the application's functionality
- Instructions for how to test the application
- Demonstrate the application.
- Submit to AVIOS contest (Jan 13th).
