Great site for notes and tutorials.

Andrew Moore, CMU professor and Google employee, has a slew of very useful notes on his site, including some great slides covering SVM material.

Speech Recognition and HMMs

Here is a chapter for a mild introduction to speech recognition and HMMs, written by yours truly.

Nice Summary of Topics of Recent Consideration

You may find this document by Seong-Wook Joo to be a useful summary of some recent topics.

Test Two

Here is test two.

Final Project

The final project will involve two parts which may be related:

1) The first part includes a research project, paper, and presentation.

2) The second part includes locating or collecting problem-specific data and building a system using PR techniques to achieve a specified goal.

Peruse research papers from the past 5 years from major IEEE or ACM pattern-recognition conferences such as CVPR or ICPR, or from related journals such as PAMI. (Most conferences from the past few years still have their programs with paper titles online, such as here (rough place for a conference, by the way). You can also use the ACM Portal or the IEEE Digital Library through our library.)

Your goal is to choose a current topic. You may choose something more traditional, such as a face recognizer, word recognizer, or handwriting recognizer, or a more recent problem, such as loading an image and retrieving visually similar images from Flickr. Choose a task that interests you.

First, complete a brief proposal. This should include a brief statement on the following:

  • Topic to research, with at least one initial source.
  • Data set to use or collect for classification project.
  • Techniques to consider for classification project.

Email proposal on or before Mon., Nov. 16.

Second, for your project, complete the following:

  • 1) Obtain at least three recently published conference or journal papers as your initial sources.  You may, of course, use more.
  • 1) Read, analyze, and summarize the techniques used.
  • 1) Prepare a brief document and presentation concerning the work you researched.
  • 2) Obtain or collect a data set for a PR problem that interests you.
  • 2) Build a classifier-based system for your data.
  • 2) Prepare your data to present along with your research.

Graduate students should build their PR-system in a manner directly related to their researched topic. Undergraduates may choose separate current research topics and PR-systems if desired.

Due Thur., Dec. 10 at 7 p.m. (our final exam slot).

SVM and HMM

Continuing our adventure in abbreviated alphabet soup, we’ll be discussing SVMs (support vector machines) and HMMs (hidden Markov models). Here are some materials to read.

Support Vector Machines:
Read 5.11 in the Duda text.
There are several great tutorials online. One of my favorites is Steve Gunn’s.
You might also find the Wikipedia SVM entry useful.

Numerous software implementations exist, many of them listed on this page. (There are also a lot of other great resources there.) I have used and like Gunn’s toolkit for Matlab, which you can find here. Here is one promising implementation available for Python. (I have not tried it yet, but I am hoping to find one that I like as much as Gunn’s.) Another option is PyML, which also has some other tools and a wrapper for LIBSVM.

Hidden Markov Models:
Read 3.10 in the Duda text.
One of the best tutorials for HMM is this classic by Rabiner.
Also, read through the following tutorial on HMM. It contains information about a Matlab implementation of HMM.
Also, the Wikipedia HMM entry is a good introduction.

One of the best software implementations available is HTK.
There is also a Python wrapper for it now. :-)
Also, here are two other possible Python implementations: GHMM with a Python wrapper and Michael Hamilton’s HMM Python class.
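
If you want a feel for what these toolkits compute before installing one, here is a minimal sketch of the forward algorithm (the "evaluation" problem in Rabiner's tutorial) for a discrete HMM. It is written for a current Python 3 and NumPy, not for any of the packages above, and the two-state model and all of its probabilities are made-up numbers for illustration only.

```python
import numpy as np

def forward_likelihood(start_prob, trans_prob, emit_prob, observations):
    """Return P(observations | model) for a discrete HMM using the forward algorithm."""
    # alpha[i] = P(observations so far, current state = i)
    alpha = start_prob * emit_prob[:, observations[0]]
    for obs in observations[1:]:
        # Propagate one step through the transition matrix, then weight by the emission.
        alpha = (alpha @ trans_prob) * emit_prob[:, obs]
    return float(alpha.sum())

# Hypothetical toy model: 2 hidden states, 2 observation symbols (0 and 1).
start_prob = np.array([0.6, 0.4])
trans_prob = np.array([[0.7, 0.3],    # row i holds P(next state | current state i)
                       [0.4, 0.6]])
emit_prob = np.array([[0.9, 0.1],     # row i holds P(symbol | state i)
                      [0.2, 0.8]])

likelihood = forward_likelihood(start_prob, trans_prob, emit_prob, [0, 1, 0])
print(likelihood)
```

For long observation sequences the alpha values underflow, so a practical version would rescale alpha at each step or work with log probabilities.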

Ubuntu and Matplotlib

Some tips, courtesy of Richie:

The Beta release of Ubuntu (9.10) supports matplotlib 0.99.1 “out of
the box”. This may be of interest to others on Linux who wish to take
advantage of mplot3d without having to manually build 0.99.1.

Quick install:

1) Download Ubuntu 9.10 ISO (use nightly build)

2) Install Ubuntu on favorite VM platform.

3) Once installed, browse Panel System-Synaptic Package Manager. Search
for matplotlib and select it for install. ALL DEPENDENCIES WILL BE
INSTALLED AUTOMATICALLY.

4) Install IDLE 2.6 using Applications-Add remove applications.

Done. No manual builds required, no dependency hunting either. Tested
following example with no issues:

http://matplotlib.sourceforge.net/plot_directive/mpl_examples/mplot3d/surface3d_demo.py

Richie
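
Once mplot3d is available, a 3-D scatter plot with a color key takes only a few lines. The sketch below uses synthetic points rather than the course data and is written against a recent matplotlib release; older releases such as the 0.99.1 mentioned above may need a different way of creating the 3-D axes. The output file name is just a placeholder.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in data: two classes of 3-D points (not the course data set).
class_a = rng.normal(loc=0.0, scale=1.0, size=(50, 3))
class_b = rng.normal(loc=3.0, scale=1.0, size=(50, 3))

fig = plt.figure()
ax = fig.add_subplot(projection="3d")  # 3-D axes provided by the mplot3d toolkit
ax.scatter(class_a[:, 0], class_a[:, 1], class_a[:, 2], c="blue", label="class A")
ax.scatter(class_b[:, 0], class_b[:, 1], class_b[:, 2], c="red", label="class B")
ax.set_xlabel("feature 1")
ax.set_ylabel("feature 2")
ax.set_zlabel("feature 3")
ax.legend()  # the color key
fig.savefig("scatter3d_demo.png")  # placeholder file name
```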

Reading

Read this tutorial on principal components analysis (PCA) that is also linked under the “resources” page on this website. Also read 10.13 in your text related to component analysis. You may also find the Wikipedia page on PCA useful.
Secondly, begin reading chapter 5 in the Duda text related to linear discriminant functions. Also related and useful is the LDA entry at Wikipedia.
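
To connect the PCA reading to code, here is a minimal sketch that computes principal components with the singular value decomposition in NumPy. The function name `pca` and the synthetic 2-D data are my own choices for illustration; the tutorial derives the same directions from the eigenvectors of the covariance matrix, and the two routes agree up to sign.

```python
import numpy as np

def pca(data, n_components):
    """Project the rows of `data` onto its top principal components."""
    mean = data.mean(axis=0)
    centered = data - mean
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    # Variance of the data along each kept direction.
    explained_variance = singular_values[:n_components] ** 2 / (len(data) - 1)
    projected = centered @ components.T
    return projected, components, explained_variance

rng = np.random.default_rng(0)
# Synthetic data: 200 points stretched along the (1, 1) direction plus small noise.
latent = rng.normal(size=(200, 1)) * 5.0
noise = rng.normal(size=(200, 2)) * 0.5
data = latent @ (np.array([[1.0, 1.0]]) / np.sqrt(2)) + noise

projected, components, explained_variance = pca(data, n_components=1)
print(components[0])        # close to +/-(0.707, 0.707)
print(explained_variance)   # variance captured by the first component
```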

Useful summary documents for NumPy.

The idea of the “Mathesaurus” is to provide a quick reference for switching to open-source mathematical computation environments for computer algebra, numeric processing and data visualisation.

If you have ever used MATLAB, you may find the NumPy for MATLAB users particularly useful.

And even if you have not been a MATLAB user, you will likely still find the side-by-side operation summaries useful for finding NumPy methods.
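
As a small taste of that kind of side-by-side comparison, here are a few common operations in NumPy with the MATLAB equivalent in each comment. This is my own short selection, not copied from the Mathesaurus pages, and it assumes a current Python 3 and NumPy (the `@` operator is not available in older Python versions, where `np.dot` does the same job).

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])   # MATLAB: A = [1 2; 3 4]
B = np.eye(2)                             # MATLAB: B = eye(2)

rows, cols = A.shape                      # MATLAB: [rows, cols] = size(A)
first_row = A[0, :]                       # MATLAB: A(1, :)   (NumPy indexes from 0)
transposed = A.T                          # MATLAB: A'
elementwise = A * B                       # MATLAB: A .* B
matrix_product = A @ B                    # MATLAB: A * B
solution = np.linalg.solve(A, np.array([1.0, 0.0]))  # MATLAB: A \ [1; 0]
```

The biggest habit to unlearn is that `*` on NumPy arrays is elementwise, not a matrix product.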

Project 1: Due Thursday, October 22

Setup:

  • Download the face aging dataset referenced previously in this course blog.
  • Prepare separate folders for each problem solution.  You may find this easier if you keep your data set in one folder and have each coded solution refer back to that same folder (../data/).
  • Label and create keys (color) for each plot.
  • Save all plots, error rates computed, etc. to facilitate writing a short description of your work to submit.

Complete the following (exercises marked ** are required only for graduate students but may be completed by undergraduates for bonus):

  1. Design and implement Python code to load all 540 feature vector files as 238 x 1 feature vectors.
    • Generate a 2-D plot with a color key to visually distinguish gender (male/female) feature vectors using only the first two parameters of 238.
    • ** Generate a 2-D plot with a color key to visually distinguish ethnicity (can use letters used in file names as labels; see filenames and database paper for more info).
    • Generate a 3-D plot using the first three parameters for gender.
    • ** Generate a 3-D plot for ethnicity using the first three parameters.
  2. Bayesian Classifier [Randomly choose 440 vectors for training parameters and the other 100 vectors for testing purposes].
    • Using the first two parameters of each feature vector, find parameters assuming Gaussian distribution of class data, and build a Bayesian classifier to classify gender (M/F).  Create a 2-D plot (with colors for Male, Female, Misclassified) and compute the % error.
    • Using the first three parameters of each feature vector, build a Bayesian classifier.  Create a 3-D plot (with colors…) and compute the % error.
  3. K-Nearest Neighbors [Randomly choose 440 prototypes, 100 test vectors].
    • Use the first three parameters to classify gender with a k=1 nearest neighbor rule.  Create a 3D plot (with colors…).  Compute % error.
    • Do the same for k=3.
    • Do the same for k=5.
    • ** Do the same using the full feature vectors for gender classification.  (You may still plot the first three parameters for visualization.)
    • ** Do the same using the full feature vectors for ethnicity.
  4. K-Means [May use all data. How well does K-means perform?]
    • Using 2-means and the first three parameters, classify the data based on gender.  Create a 3D plot as before.  Compute the % error.
    • ** Using k-means and the full feature vectors, classify the data based on ethnicities.  Create a 3D plot (using the first three parameters for values) with colors (for ethnicities and error points).  Compute the % error.
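
To make the shape of items 2 and 3 concrete, here is a rough sketch of the random 440/100 split, a Gaussian Bayesian classifier, a k-nearest-neighbor rule, and the % error computation. It runs on synthetic two-class data, not the face aging dataset, and every function name in it is my own invention for illustration. Treat it as one possible starting point under those assumptions, not as the expected solution.

```python
import numpy as np

def split_train_test(features, labels, n_train, rng):
    """Randomly pick n_train rows for training; the remaining rows are for testing."""
    order = rng.permutation(len(features))
    train, test = order[:n_train], order[n_train:]
    return features[train], labels[train], features[test], labels[test]

def fit_gaussian_bayes(features, labels):
    """Estimate a prior, mean vector, and covariance matrix for each class."""
    model = {}
    for cls in np.unique(labels):
        rows = features[labels == cls]
        prior = len(rows) / len(features)
        model[cls] = (prior, rows.mean(axis=0), np.cov(rows, rowvar=False))
    return model

def predict_gaussian_bayes(model, features):
    """Choose the class with the largest log posterior (constant terms dropped)."""
    classes = list(model)
    scores = []
    for cls in classes:
        prior, mean, cov = model[cls]
        diff = features - mean
        # Squared Mahalanobis distance of every row to the class mean.
        mahalanobis = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
        _, logdet = np.linalg.slogdet(cov)
        scores.append(np.log(prior) - 0.5 * (mahalanobis + logdet))
    return np.array(classes)[np.argmax(scores, axis=0)]

def predict_knn(train_x, train_y, test_x, k):
    """Majority vote among the k nearest prototypes (Euclidean distance)."""
    dists = np.linalg.norm(test_x[:, None, :] - train_x[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = train_y[nearest]  # labels must be non-negative integers for bincount
    return np.array([np.bincount(row).argmax() for row in votes])

def percent_error(predicted, actual):
    """Percentage of test vectors that were misclassified."""
    return 100.0 * np.mean(predicted != actual)

rng = np.random.default_rng(0)
# Synthetic stand-in: 540 vectors, 3 features, labels 0/1 (not the real data).
labels = np.repeat([0, 1], 270)
features = rng.normal(size=(540, 3)) + labels[:, None] * 3.0

train_x, train_y, test_x, test_y = split_train_test(features, labels, 440, rng)
model = fit_gaussian_bayes(train_x, train_y)
bayes_error = percent_error(predict_gaussian_bayes(model, test_x), test_y)
knn_errors = {k: percent_error(predict_knn(train_x, train_y, test_x, k), test_y)
              for k in (1, 3, 5)}
print(bayes_error, knn_errors)
```

The boolean array `predicted != actual` also tells you which test points to draw in the "misclassified" color on your plots.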

When finished, create a .pdf (or other electronic document) that includes a brief summary of your work, all of your figures, and one table that includes all of your % error values.  [Undergraduates should have 8 figures and a table with 6 error values.  Graduate students should have 13 plots -- 5 more for the additional components -- and a table with 9 error values].

Make sure that your project is organized in one main folder containing a sub-directory with data in your format used, sub-directories for each solution, and your report document in the main folder that includes your brief discussion, figures, and error table.  You may wish to make some brief concluding remarks in your report commenting on the error rates and success for each technique used.  Create a .zip archive of this project folder to submit by the deadline.  Enjoy!
