The Center for Bio-Image Informatics (CBI) hosted five undergraduate students from California State University, San Bernardino (CSUSB) during the eight-week (June 22 to August 14) summer program at UCSB. Professor Art Concepcion helped identify the applicant pool and select the five students. This is the twelfth consecutive year that the CBI has hosted this summer research program. In addition to the five CSUSB students, we also hosted one international student from the Universidade Estadual do Rio Grande do Sul, Guaíba, Brazil, and a UCSB student from the Network Science IGERT Program (PI: Ambuj Singh). Each student had a graduate student and a faculty mentor, and the whole team met weekly to review progress. The undergraduate interns gave weekly presentations, and a final end-of-program presentation was held on August 12 and attended by Prof. Concepcion. The students also prepared posters that were presented at the UCSB-wide event on August 14.
Projects
Micro-UAV Sensor Fusion with Latent-Dynamic Conditional Random Fields in Coronal Plane Estimation
Mentor
Amir Mohaymen Rahimi, B.S. Manjunath
Student Interns
Raphael Ruschel dos Santos, Universidade Estadual do Rio Grande do Sul, Guaíba, Brazil
Abstract
We present an autonomous unmanned aerial vehicle (UAV) system capable of performing the following tasks in real time: human detection, coronal plane estimation, and face recognition. In the challenging environment of a low-altitude hovering UAV, the on-board camera is highly susceptible to parallax effects. P-N learning, from tracking-learning-detection (TLD), is a fast and robust technique that uses many positive and negative templates to model the visual appearance of a target. We chose the P-N learning technique to model the appearance of the human body mainly because of the robustness of the TLD algorithm to camera movement. We create appearance models for eight surrounding viewpoints of the human body. Each model is then evaluated on a real-time video sequence, and the UAV is automatically sent to face the front of the person. We search for the face within the top part of the human body using a cascade of Haar features; after the face has been detected, we use optical flow to track it continuously. Our current dataset consists of 124 videos captured at different altitudes, orientations, gimbal angles, locations, and times. Using the frontal-view videos, we created a face dataset containing images of eight selected subjects, which was used to train a Fisherface classifier for face recognition.
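To make the detect-then-track step concrete, the sketch below pairs OpenCV's stock frontal-face Haar cascade with pyramidal Lucas-Kanade optical flow. The input file name, the top-half search region, and all thresholds are illustrative assumptions; the abstract does not give the project's exact parameters.

    import cv2
    import numpy as np

    # Hedged sketch: detect a face with a stock Haar cascade, then track it
    # with Lucas-Kanade optical flow. File names and thresholds are
    # illustrative, not taken from the project.
    face_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture("frontal_view.mp4")  # hypothetical input video

    prev_gray, pts = None, None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if pts is None:
            # Search only the top part of the frame, where the head is expected.
            top = gray[: gray.shape[0] // 2]
            faces = face_cascade.detectMultiScale(top, scaleFactor=1.1,
                                                  minNeighbors=5)
            if len(faces) > 0:
                x, y, w, h = faces[0]
                mask = np.zeros_like(gray)
                mask[y:y + h, x:x + w] = 255
                # Corner features inside the detected face box seed the tracker.
                pts = cv2.goodFeaturesToTrack(gray, maxCorners=50,
                                              qualityLevel=0.01,
                                              minDistance=5, mask=mask)
        else:
            # Track the seeded points with pyramidal Lucas-Kanade optical flow.
            nxt, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
            pts = nxt[status.ravel() == 1].reshape(-1, 1, 2)
            if len(pts) < 10:  # too many points lost: fall back to re-detection
                pts = None
        prev_gray = gray

Falling back to re-detection when tracking degrades mirrors the detector/tracker interplay that makes TLD robust to camera movement.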
Automated Diabetic Retinopathy Detection Using Deep Neural Networks
Mentor
Oytun Ulutan, B.S. Manjunath
Student Interns
Keaton Boardman, California State University, San Bernardino
Abstract
Diabetic retinopathy (DR) is increasingly prevalent. According to the National Eye Institute, in the USA "the number of cases of diabetic retinopathy increased 89 percent from 4.06 million to 7.69 million" between 2000 and 2010. Given the high cost of examinations and the shortage of physicians, an automated process for early diagnosis of DR is needed. In this project we explore the utility of deep neural networks for retinal image analysis. Given retinal fundus images, our method classifies each image into one of five classes: healthy, mild, moderate, severe, or proliferative DR. Multiple neural network models are trained for this purpose, each specialized for a different objective, such as per-class classification (one-vs-rest) or regression. The outputs of these models are treated as features and combined using an early feature-fusion algorithm to obtain the final classification. The model is trained and tested on the Kaggle DR challenge dataset, which consists of 88,704 retinopathy images, with 35,127 labeled for training and 53,577 for testing.
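A minimal sketch of the early-fusion step, assuming the specialized networks have already been run and their per-image outputs saved as feature arrays; the file names and the logistic-regression fusion classifier are hypothetical stand-ins for the project's actual fusion algorithm.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Hedged sketch of early feature fusion: per-model outputs are
    # concatenated per image and one classifier is trained on the fused
    # representation. Arrays and file names are hypothetical.
    onevsrest_feats = np.load("onevsrest_features.npy")    # (n_images, d1)
    regression_feats = np.load("regression_features.npy")  # (n_images, d2)
    labels = np.load("labels.npy")  # 0=healthy ... 4=proliferative DR

    fused = np.concatenate([onevsrest_feats, regression_feats], axis=1)
    clf = LogisticRegression(max_iter=1000)
    clf.fit(fused, labels)
    predicted = clf.predict(fused)  # in practice, predict on a held-out split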
Gaze Scanpath Prediction Using Recurrent Neural Networks
Mentor
Thuyen Ngo, Rohan Jain, B.S. Manjunath
Student Interns
Michael Monaghan, California State University, San Bernardino
Joshua Dunham, California State University, San Bernardino
Abstract
Human overt attention covers only 4 to 8 degrees of our 205-degree field of view, and a significant portion of all visual processing is performed within this small region. To understand the environment, we continuously move our eyes to build up a representation of the scene. In this project we model this behavior using human eye-tracking data: given an input image, we predict the most likely sequence of locations that humans look at. We build and compare two models, one based on Long Short-Term Memory (LSTM) networks and one based on Reservoir Computing. Both models are trained using stochastic gradient descent on features extracted from a pretrained convolutional network, and we validate them on the MIT1003 dataset.
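The sketch below shows one plausible wiring of the LSTM variant in PyTorch: the pretrained CNN feature of the image is fed to the recurrent network at every step, and a linear head emits a normalized (x, y) fixation per step. The dimensions, sequence length, and MSE loss are illustrative assumptions, not the project's actual configuration.

    import torch
    import torch.nn as nn

    # Hedged sketch of an LSTM scanpath model: the image feature is
    # repeated at each step and the recurrent state carries the history
    # of previously emitted fixations. Sizes are illustrative.
    class ScanpathLSTM(nn.Module):
        def __init__(self, feat_dim=512, hidden_dim=256, seq_len=8):
            super().__init__()
            self.seq_len = seq_len
            self.lstm = nn.LSTM(input_size=feat_dim, hidden_size=hidden_dim,
                                batch_first=True)
            self.to_xy = nn.Linear(hidden_dim, 2)  # normalized (x, y) per step

        def forward(self, feats):
            x = feats.unsqueeze(1).expand(-1, self.seq_len, -1)
            out, _ = self.lstm(x)
            return torch.sigmoid(self.to_xy(out))  # (batch, seq_len, 2) in [0, 1]

    model = ScanpathLSTM()
    feats = torch.randn(4, 512)  # stand-in for pretrained CNN features
    scanpaths = model(feats)     # predicted fixation sequences
    loss = nn.functional.mse_loss(scanpaths, torch.rand(4, 8, 2))
    loss.backward()              # optimized with SGD, as in the abstract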
Deep Learning for Object Recognition
Mentor
Utkarsh Gaur, B.S. Manjunath
Student Interns
Mark Swoope, California State University, San Bernardino
Abstract
Object recognition is an important problem in computer vision with interesting applications such as autonomous driving, image-based search, and robotics. With the advent of large internet databases such as Flickr and YouTube, the computer vision research community now has access to terabytes of data. Recent research has shown that models known as deep neural networks (DNNs) are capable of taking advantage of such large image databases. DNNs are composed of basic linear models called "neurons" placed in hierarchical stacks to attain a "deep", non-linear overall structure. These DNNs are highly scalable and have been shown to effectively model complex visual concepts. In this project, we implemented multiple machine learning algorithms and simple neural network models to classify handwritten digits from the MNIST dataset. Next, we extended these models to construct a deep convolutional neural network to recognize objects in a challenging large-scale dataset called Tiny-ImageNet, which consists of 100,000 web images spanning 200 object categories, including pedestrians, vehicles, and buildings.
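As an illustration of the starting point, the sketch below is a small convolutional network of the kind commonly used for 28x28 MNIST digits; the abstract does not specify the architectures the project actually used, so every layer size here is an assumption.

    import torch
    import torch.nn as nn

    # Hedged sketch of a small CNN for MNIST digit classification.
    class DigitNet(nn.Module):
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),  # 28x28 -> 14x14
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),  # 14x14 -> 7x7
            )
            self.classifier = nn.Linear(32 * 7 * 7, 10)  # ten digit classes

        def forward(self, x):
            return self.classifier(self.features(x).flatten(1))

    model = DigitNet()
    logits = model(torch.randn(64, 1, 28, 28))  # a dummy batch of digits
    loss = nn.functional.cross_entropy(logits, torch.randint(0, 10, (64,)))
    loss.backward()

Scaling this design up, with more convolutional stages and larger inputs, is the usual route from MNIST-sized models to a network that can handle Tiny-ImageNet.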
Understanding the Perceptual Importance of Camera Views (Best Project Award)
Mentor
S. Karthikeyan, B.S. Manjunath
Student Interns
Mark Martinez, Computer Systems, California State University, San Bernardino
Abstract
When an analyst queries a multi-camera network, selecting the most relevant camera views to transmit is a challenging problem. We quantify the relevance of events occurring in the video feeds with a perceptual rating on a scale of 1 to 10, obtained from multiple subjects who simulate analysts. The primary objective of this project is to predict the analysts' perceptual rating for a given video feed. We propose a regression-based learning algorithm that computes low-level (background subtraction, optical flow), mid-level (face/person detection), and high-level (action bank, tweets) features from the videos. These features are fused using state-of-the-art early- and late-fusion techniques to predict the perceptual rating. Our regression methods use a leave-one-view-out testing scheme to ensure generalizability to unseen camera views. The proposed method is evaluated on a large-scale high-definition dataset of about 45 hours of video, where we demonstrate promising results, obtaining a mean absolute error of less than one in predicting the human perceptual rating.
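A minimal sketch of the leave-one-view-out protocol, assuming precomputed fused features, per-clip ratings, and camera-view ids are available; the file names and the support-vector regressor are hypothetical stand-ins for the fused regression models described above.

    import numpy as np
    from sklearn.model_selection import LeaveOneGroupOut
    from sklearn.svm import SVR
    from sklearn.metrics import mean_absolute_error

    # Hedged sketch: hold out all clips from one camera view per fold so
    # the regressor is always tested on an unseen view.
    X = np.load("fused_features.npy")  # (n_clips, d) fused features, hypothetical
    y = np.load("ratings.npy")         # analyst ratings on a 1-10 scale
    views = np.load("view_ids.npy")    # camera-view id for each clip

    errors = []
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=views):
        reg = SVR().fit(X[train_idx], y[train_idx])
        pred = reg.predict(X[test_idx])
        errors.append(mean_absolute_error(y[test_idx], pred))
    print("mean absolute error:", np.mean(errors))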