Much of my research involves investigating novel machine learning / data mining problems that arise in other fields. Here are my main research interests:
I. Interactive Machine Learning
End-user debugging of Machine Learning Systems (with Margaret Burnett, Simone Stumpf, and Alex Groce)
How do you debug a program that was written by a machine instead of a person, especially when you do not know much about programming or machine learning, and are working with a program you cannot even see? This is the problem faced by users of a new type of program being used today, namely, machine learning systems that, after being deployed, customize themselves by learning from an end-user's behavior. Prime examples of these programs include adaptive user interfaces, intelligent desktop assistants, email classifiers, and recommender systems. Inaccurate predictions by these learning systems erode users' trust and curb widespread acceptance of such systems. We seek to improve both the performance and acceptance of machine learning systems by allowing end-users to debug these learned programs when they make incorrect predictions.
II. Ecosystem Informatics
Species Distribution Mapping (with Tom Dietterich, Matt Betts, and Julia Jones)
Species distribution mapping involves understanding species-habitat relationships. Environmental features such as temperature, precipitation, vegetation, and land use determine whether a site is viable habitat for a species. Species distribution mapping can therefore be viewed as a supervised learning problem. However, a crucial predictive feature is the presence or absence of other species, so predicting each species in isolation ignores useful information. We are developing novel multi-label classification algorithms that predict multiple species simultaneously.
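To make the multi-label idea concrete, here is a minimal sketch of a classifier chain, one standard multi-label technique in which the prediction for each species is fed in as an extra feature when predicting the next one. It is only an illustration and not the algorithms under development; the nearest-neighbor base classifier, the function names, and the toy two-feature sites are all assumptions made for this example.

```python
from typing import List, Sequence


def nearest_neighbor_predict(train_x: Sequence[Sequence[float]],
                             train_y: Sequence[int],
                             x: Sequence[float]) -> int:
    """Return the label of the training row closest to x (squared Euclidean distance)."""
    best = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
    )
    return train_y[best]


def chain_predict(train_x: Sequence[Sequence[float]],
                  train_labels: Sequence[Sequence[int]],
                  x: Sequence[float]) -> List[int]:
    """Predict presence (1) or absence (0) of every species at a new site.

    Species j is predicted from the environmental features plus the
    presence/absence of species 0..j-1 (true values at training sites,
    predicted values at the new site).
    """
    n_species = len(train_labels[0])
    predicted: List[int] = []
    for j in range(n_species):
        # Augment each training site with the true labels of the earlier species.
        aug_train = [list(row) + list(labels[:j])
                     for row, labels in zip(train_x, train_labels)]
        # Augment the new site with the predictions made so far.
        aug_x = list(x) + predicted
        y_j = [labels[j] for labels in train_labels]
        predicted.append(nearest_neighbor_predict(aug_train, y_j, aug_x))
    return predicted


# Hypothetical toy data: (scaled temperature, scaled precipitation) per site,
# with presence/absence of two species.
sites = [(0.1, 0.2), (0.2, 0.1), (0.8, 0.9), (0.9, 0.8)]
presence = [(1, 1), (1, 1), (0, 0), (0, 1)]
print(chain_predict(sites, presence, (0.1, 0.15)))
```

The order of species in the chain matters in practice, because later species can use earlier predictions but not the other way around.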
Range Shift (with Matt Betts and Julia Jones)
Several papers have indicated that species react in different ways to climate and land use change. Two common reactions are shifting their range and changing their community structure. We are developing machine learning algorithms to detect range shift and community change in species distribution data.
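As a rough illustration of what a range shift looks like in presence/absence data, the sketch below compares the mean latitude of occupied sites between two survey periods. This is a deliberately simple summary statistic, not one of the detection algorithms mentioned above; the function name and the `(latitude, present)` record format are assumptions for the example.

```python
from statistics import mean
from typing import List, Tuple

Survey = List[Tuple[float, bool]]  # (site latitude in degrees, species present?)


def latitudinal_shift(early: Survey, late: Survey) -> float:
    """Mean latitude of occupied sites in the late survey minus the early survey.

    A positive value means the occupied sites have moved toward higher latitudes.
    """
    def occupied_mean(survey: Survey) -> float:
        latitudes = [lat for lat, present in survey if present]
        if not latitudes:
            raise ValueError("survey has no presence records")
        return mean(latitudes)

    return occupied_mean(late) - occupied_mean(early)


# Hypothetical surveys of the same three sites in two periods.
early_survey = [(44.0, True), (45.0, True), (46.0, False)]
late_survey = [(44.0, False), (45.0, True), (46.0, True)]
print(latitudinal_shift(early_survey, late_survey))
```

A raw difference like this cannot by itself separate a real shift from sampling noise or changes in survey effort, which is part of what makes the detection problem hard.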
III. Time Series Classification
Activity Recognition and Energy Expenditure Prediction through Accelerometry (with Stewart Trost)
I am currently working with Stewart Trost (Health and Human Sciences) on algorithms for activity recognition from accelerometer data. This research involves classifying entire time series into activity classes and predicting energy expenditure.
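The idea of classifying an entire time series can be sketched as follows: summarize each accelerometer trace with a few features, then assign it to the activity whose training examples have the closest average features. This is a minimal nearest-centroid baseline under assumed inputs (a one-dimensional list of acceleration magnitudes per trace), not the algorithms used in this project; the feature choice and function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple

Features = Tuple[float, ...]


def features(series: Sequence[float]) -> Features:
    """Summarize a whole trace by its mean and (population) standard deviation."""
    return (mean(series), pstdev(series))


def fit_centroids(labelled: Sequence[Tuple[Sequence[float], str]]) -> Dict[str, Features]:
    """Average the feature vectors of the training traces for each activity label."""
    grouped: Dict[str, List[Features]] = {}
    for series, label in labelled:
        grouped.setdefault(label, []).append(features(series))
    return {
        label: tuple(mean(column) for column in zip(*feats))
        for label, feats in grouped.items()
    }


def classify(series: Sequence[float], centroids: Dict[str, Features]) -> str:
    """Return the activity whose centroid is closest to the trace's features."""
    f = features(series)
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(f, centroids[label])),
    )


# Hypothetical training traces (acceleration magnitude, arbitrary units).
training = [
    ([1.0, 1.1, 0.9, 1.0], "sitting"),
    ([1.0, 1.0, 1.05, 0.95], "sitting"),
    ([0.2, 2.5, 0.1, 2.8], "running"),
    ([0.3, 2.2, 0.2, 3.0], "running"),
]
model = fit_centroids(training)
print(classify([0.1, 2.9, 0.3, 2.4], model))
```

Energy expenditure prediction would use the same kind of per-trace features but with a regression model in place of the classifier.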
IV. Anomaly Detection
Rare category detection
A challenging problem in anomaly detection is to discover interesting anomalies and not just statistically significant ones. Rare category detection is a human-in-the-loop process that bears a resemblance to active learning. In rare category detection, the machine learning algorithm presents interesting categories of anomalies to the user for labeling.
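The human-in-the-loop process can be sketched as a query loop: the algorithm picks a data point, the user labels it, and the loop continues until the labeling budget is spent. The selection rule below (query the point farthest from everything labeled so far) is a simple stand-in chosen for illustration, not a published rare category detection method; `oracle` is a hypothetical callback standing in for the user.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def discover_categories(points: Sequence[Sequence[float]],
                        oracle: Callable[[int], str],
                        budget: int) -> Tuple[List[int], Dict[int, str]]:
    """Ask the user (oracle) to label up to `budget` points.

    Each query targets the unlabeled point that is farthest from all points
    labeled so far, which tends to surface isolated groups quickly.
    Returns the query order and the labels obtained.
    """
    def dist2(p: Sequence[float], q: Sequence[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(p, q))

    queried: List[int] = [0]  # start from the first point
    labels: Dict[int, str] = {0: oracle(0)}
    while len(queried) < min(budget, len(points)):
        candidates = [i for i in range(len(points)) if i not in labels]
        nxt = max(
            candidates,
            key=lambda i: min(dist2(points[i], points[j]) for j in queried),
        )
        labels[nxt] = oracle(nxt)  # the human supplies the category
        queried.append(nxt)
    return queried, labels


# Hypothetical data: a dense common cluster plus two isolated rare points.
data = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (5, 5), (-4, 3)]
truth = ["common", "common", "common", "common", "rare_a", "rare_b"]
order, found = discover_categories(data, truth.__getitem__, budget=3)
print(order, sorted(set(found.values())))
```

With three queries, the loop reaches both rare points in this toy data without labeling the rest of the common cluster.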
Surveillance
In the past, I worked on the early detection of disease outbreaks through the analysis of health-care data.