CS534 Final Project
The goal of this class project is to give you an opportunity to explore an interesting machine learning problem of your choice in the context of a real-world data set. You are encouraged to combine machine learning research ideas with problems in your own research area. Your class project must be about new things you have done this term; you can't use results you developed prior to this class. Your project will be worth 25% of your final class grade.
Note that, as with any conference, the page limits are strict! Papers over the limit will not be considered.
What you need to do
- Find a partner to work on this project. (Two-person teams are encouraged, though you may work alone. No three-person teams, please.)
- Project title and team members
- The data set that you will use
- Project idea. This should be approximately two paragraphs.
- Software you will need to write.
- Papers to read. Include 1-3 relevant papers. You will probably want to read at least one of them before submitting your proposal.
- May 22nd milestone: What will you complete by May 22nd? Experimental results of some kind are expected here.
- Turn in a final paper (no longer than 8 pages in NIPS format, including references, figures, and tables), due June 5th by 2pm. Each team should turn in a single report; please email me your report before the deadline.
Grading and determining when you have done enough
A project that does a solid job building the base learning system and carefully evaluating and describing it might get 75–80% credit. A project that also pursues interesting extensions or alternatives, investigates important issues (such as overfitting, noise tolerance, or feature selection), or achieves very impressive results might get 90–100% credit. Weight will also be given to how interesting and novel the learning task is.
Be creative! Exploring your own interesting ideas and comparing them with the baseline approaches will receive credit whether they beat the baseline or not.
Some Possible Learning Problems
- Computer Vision Recognition Tasks
  - Optical Character Recognition
  - Face Recognition Tasks
  - Scene recognition (e.g., house vs. no house)
  - Object recognition in aerial/satellite images
  - See http://elm.eeng.dcu.ie/~oconaire/cv_datasets.html for a list of computer vision benchmark data sets
- Audio Recognition Tasks
  - Speaker identification
  - Speaker sentiment
  - Music genre
  - Bird species recognition by songs
- Text Classification and Clustering
  - Spam
  - Email Folder Predictor
  - Newsgroup document classifier
  - Sentiment
  - Author classifier (i.e., take LaTeX files from different authors and try to classify them according to author)
  - There are a few text classification datasets on Andrew McCallum's webpage
- Bio-informatics
  - gene clustering/expression profile analysis
  - gene sequence analysis
- etc