In the summer of 2012, CBI hosted interns from:
Dos Pueblos High School in Santa Barbara
California State University, San Bernardino
École polytechnique de l'université de Nantes in Nantes, France
The Botanicam system is designed for plant image identification and is backed by the Bisque database. In Botanicam's workflow, a user uploads an image of a plant to the server through the web interface or mobile application and receives information about the plant, such as its genus, species, and Wikipedia entry. Identification is performed on the server by first computing various image features and then using a trained model to classify the input image. We are using a local dataset of bushes from the Coal Oil Point Reserve that contains 11 classes, and we are adding a publicly available dataset from CLEF 2011 that consists of several thousand images of leaves, trees, and bushes. Our project consists of improving the speed and accuracy of classification, automating the model training process, and accommodating new datasets and data types.
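The two-stage flow described above (compute image features, then classify with a trained model) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which features or classifier Botanicam uses, so a per-channel color histogram and a nearest-centroid classifier stand in for them, and the function names, class labels, and pixel data are all hypothetical.

```python
from collections import defaultdict
from math import dist


def color_histogram(pixels, bins=4):
    """Normalized per-channel histogram of 8-bit RGB pixels (stand-in feature)."""
    hist = [0.0] * (3 * bins)
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            hist[channel * bins + min(value * bins // 256, bins - 1)] += 1
    total = len(pixels) or 1
    return [count / total for count in hist]


def train_centroids(labeled_images):
    """Average the feature vectors of each class (stand-in for model training)."""
    grouped = defaultdict(list)
    for label, pixels in labeled_images:
        grouped[label].append(color_histogram(pixels))
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def classify(pixels, centroids):
    """Return the class whose centroid is closest to the image's features."""
    features = color_histogram(pixels)
    return min(centroids, key=lambda label: dist(features, centroids[label]))


# Toy data: hypothetical class names and flat lists of (R, G, B) pixels.
training = [
    ("green_bush", [(30, 200, 40)] * 10),
    ("green_bush", [(50, 180, 60)] * 10),
    ("red_bush", [(210, 40, 30)] * 10),
]
model = train_centroids(training)
prediction = classify([(40, 190, 50)] * 10, model)
```

Keeping feature extraction separate from the classifier means either stage can be swapped when a new dataset or data type is added, without touching the other.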
Probabilistic Spatial Object Representation in Databases
Predicting Visual Attention Under Varying Camera Focus
A saliency map is a prediction of the regions in a photograph (or any visual scene) that capture the viewer's visual attention. Until recently, most of these predictions used bottom-up approaches based on low-level features. Low-level features, which include bright colors, hard edges, and strong contrast, can be computed reliably from images. Relatively new algorithms make use of high-level semantic information, such as face, text, person, and other object detections, to predict visual attention. Some of the recent state-of-the-art advances come from Tilke Judd's work at MIT. Apart from high-level semantics, we observe that camera focus plays a significant role in directing visual attention. Our work aims to understand and quantify the role of camera focus in visual saliency. With the recently available Lytro camera, we can take a snapshot of the complete light field of a scene, which essentially contains multiple images, each focused on a different region. We will have subjects view all the images while we track their eye movements and fixations. We will then compare the resulting visual attention map with our predicted saliency map, which is learned pixelwise using a support vector machine. Finally, we will separate the effect of focus on the viewer's attention from that of other semantics. This technique could also be applied to create future autofocus algorithms once object detectors are built into commercial cameras.
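One common way to compare a predicted saliency map with a map built from eye-tracking fixations is the linear (Pearson) correlation coefficient between the two maps. The abstract does not state which comparison measure the project uses, so the sketch below is an assumption for illustration, and the function name and toy maps are hypothetical.

```python
from math import sqrt


def pearson_cc(predicted, observed):
    """Pearson correlation between two same-sized 2D maps (lists of rows)."""
    p = [value for row in predicted for value in row]
    o = [value for row in observed for value in row]
    if len(p) != len(o) or not p:
        raise ValueError("maps must be non-empty and have the same size")
    mean_p = sum(p) / len(p)
    mean_o = sum(o) / len(o)
    covariance = sum((a - mean_p) * (b - mean_o) for a, b in zip(p, o))
    var_p = sum((a - mean_p) ** 2 for a in p)
    var_o = sum((b - mean_o) ** 2 for b in o)
    if var_p == 0 or var_o == 0:
        return 0.0  # a constant map carries no spatial information
    return covariance / sqrt(var_p * var_o)


# Toy 2x3 maps: the prediction peaks where the fixations cluster.
predicted_map = [[0.1, 0.8, 0.1], [0.0, 0.4, 0.0]]
fixation_map = [[0.0, 1.0, 0.0], [0.0, 0.5, 0.0]]
score = pearson_cc(predicted_map, fixation_map)
```

A score near 1 means the predicted map peaks where viewers actually looked, and a score near 0 means the two maps are unrelated.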
Computer Vision and Robot Control
The Microsoft Kinect is a small, mountable device with both a standard (RGB) camera and an infrared sensor that produces a point cloud. The goal of our project is to implement computer vision algorithms that use both types of image data to detect and track various objects. Ultimately we will track objects (e.g., obstacles and game tokens) in real time to autonomously control an iRobot Create, a small and inexpensive robot intended for educational purposes. A second goal is to incorporate gesture recognition using skeletal tracking so that human users may control the robot.
Time Series Analysis and Classification
Regie Felix and Sophie Darcy
This summer we aim to gain a better understanding of time series analysis and classification. A time series is a sequence of data points recorded at consistent time intervals. Using the statistical software R, we will cover topics such as decomposition, classification, transformations, model fitting, and forecasting, as well as machine learning techniques such as decision trees and clustering. We will apply these techniques to a variety of data sets to identify significant trends and predict future observations.
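As an illustration of decomposition, the sketch below splits a series into a trend (centered moving average) and an additive seasonal component. It is written in Python rather than R, the function names and toy series are hypothetical, and it assumes an odd seasonal period so that the moving average is symmetric.

```python
def centered_moving_average(series, window):
    """Trend estimate; positions without a full window are left as None."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    trend = [None] * len(series)
    for i in range(half, len(series) - half):
        trend[i] = sum(series[i - half:i + half + 1]) / window
    return trend


def seasonal_component(series, trend, period):
    """Average detrended value at each phase of the cycle, centered on zero."""
    buckets = [[] for _ in range(period)]
    for i, (value, level) in enumerate(zip(series, trend)):
        if level is not None:
            buckets[i % period].append(value - level)
    means = [sum(b) / len(b) if b else 0.0 for b in buckets]
    offset = sum(means) / period
    return [m - offset for m in means]


# Toy series: linear trend (0, 1, 2, ...) plus a repeating pattern of period 3.
pattern = [1, 0, -1]
series = [t + pattern[t % 3] for t in range(12)]
trend = centered_moving_average(series, window=3)
seasonal = seasonal_component(series, trend, period=3)
```

Subtracting both the trend and the seasonal component from the series leaves the residual, which is what model fitting and forecasting steps typically examine next.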
Improving Part Detection Algorithms using Functional MRI
Carter De Leo
The literature shows that humans can detect people in images better than machines can. After breaking person detection into a four-step algorithm, we hypothesize that testing several combinations of humans and/or machines on these steps will show that detection is considerably more effective when humans perform the feature extraction.
Based on this analysis, we are trying to find out whether the human brain reacts differently when it sees human bodies (or human body parts) than when it sees any other kind of image (objects, blur, etc.). Using functional MRI, we record the subject's brain activity while he or she views different types of images.
The next step is to extract features from the functional MRI data in order to create our own detection model and, we hope, obtain better results than existing detection algorithms.
Instance Search on a Large Scale Data Set of Videos
An important need in many situations involving video collections (archive video search, personal video organization, surveillance, law enforcement, protection of brand/logo use) is to find more video segments of a specific person, object, or place, given a visual example. We are developing a system that, given a collection of test clips and a collection of queries that each delimit a person, object, or place in some example video, locates for each query the clips most likely to contain a recognizable instance of that entity. The algorithm should be invariant to changes in illumination, viewpoint, and scale. We are investigating a system that works on a large-scale database containing 70,000 video clips taken from different cameras, with 21 topics.
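The retrieval step, returning for each query the clips most likely to contain the entity, can be sketched as a ranking by descriptor similarity. The abstract does not describe the descriptors or the matching method, so this sketch assumes each clip and query has already been reduced to a fixed-length feature vector and uses cosine similarity. The clip identifiers, vectors, and function names are hypothetical.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_clips(query, clip_descriptors, top_k=3):
    """Return the ids of the top_k clips most similar to the query descriptor."""
    scored = [
        (cosine_similarity(query, descriptor), clip_id)
        for clip_id, descriptor in clip_descriptors.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [clip_id for _, clip_id in scored[:top_k]]


# Toy index: three clips described by hypothetical 3-dimensional descriptors.
clips = {
    "clip_a": [1.0, 0.0, 0.0],
    "clip_b": [0.9, 0.1, 0.0],
    "clip_c": [0.0, 1.0, 0.0],
}
ranked = rank_clips([1.0, 0.0, 0.0], clips, top_k=2)
```

On a collection of 70,000 clips, a linear scan like this one would usually be replaced by an approximate nearest-neighbor index, but the ranking logic stays the same.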