Visual Attention applied to Object Recognition

Eye-tracking and attention visualization for object recognition

Comparing CNN attention with human eye-tracking.

Python · CNN · TensorFlow · Keras · Eye-tracking

Project description

The aim of the project is to compare how Convolutional Neural Networks and humans see the world by examining where a computer and a human pay attention during an object recognition task.

Overview

With the rise of visual attention mechanisms in machine learning, it is now possible to model the cognitive process of paying attention in a computer. For a Cognitive Science course project, we investigated whether a computer chooses to look at the same regions as a human when recognizing objects from 10 categories, and whether human eye-tracking data can be used to speed up the training of the machine learning algorithm.

The project is part of a Cognitive Science course at the University of Copenhagen.

Technical Details

We use the POET dataset, which provides eye-tracking data from multiple annotators who classified more than 6,000 images into 10 categories.

[Figure: eye-tracking fixations and the corresponding heatmaps. Source: http://calvin.inf.ed.ac.uk/datasets/poet-dataset/]
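
Heatmaps like the ones above can be derived from raw fixation points by accumulating them on a grid and smoothing with a Gaussian kernel. Below is a minimal sketch of that idea; the `(x, y)` input format and the helper name `fixations_to_heatmap` are illustrative assumptions, not the actual POET file layout:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fixations_to_heatmap(fixations, image_shape, sigma=30.0):
    """Render eye-tracking fixations as a smoothed attention heatmap.

    fixations:   iterable of (x, y) pixel coordinates (assumed format).
    image_shape: (height, width) of the target image.
    sigma:       Gaussian blur radius in pixels, roughly modeling foveal spread.
    """
    h, w = image_shape
    heatmap = np.zeros((h, w), dtype=np.float64)
    for x, y in fixations:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= yi < h and 0 <= xi < w:
            heatmap[yi, xi] += 1.0          # accumulate fixation counts
    heatmap = gaussian_filter(heatmap, sigma=sigma)
    if heatmap.max() > 0:
        heatmap /= heatmap.max()            # normalize to [0, 1]
    return heatmap
```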

The project is divided into multiple parts that explore different training setups and the relationship between machine and human attention:

  1. A standard CNN trained with a global average pooling layer on top to obtain spatially localized class activation maps (CAM) (reference); see the sketch after this list.
  2. A CNN with a soft attention mechanism (reference).
  3. Using eye-tracking data to improve the training of the model.
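
As a rough sketch of item 1: a CAM is a weighted sum of the final convolutional feature maps, where the weights come from the dense layer that follows global average pooling. The layer names `last_conv` and `predictions` below are placeholders, not the names from our actual model:

```python
import numpy as np
import tensorflow as tf

def class_activation_map(model, image, class_idx,
                         last_conv_name="last_conv", dense_name="predictions"):
    """Compute a CAM for one image, assuming the architecture above:
    conv layers -> global average pooling -> a single dense softmax layer.
    """
    # Sub-model that maps the input image to the last conv feature maps.
    conv_layer = model.get_layer(last_conv_name)
    feature_model = tf.keras.Model(model.input, conv_layer.output)
    features = feature_model(image[np.newaxis, ...])[0]   # (H, W, C)

    # Dense kernel connecting pooled features to class scores: (C, n_classes).
    weights = model.get_layer(dense_name).get_weights()[0]

    # CAM = per-class weighted sum over the C feature maps.
    cam = tf.reduce_sum(features * weights[:, class_idx], axis=-1).numpy()
    cam = np.maximum(cam, 0)                              # keep positive evidence
    if cam.max() > 0:
        cam /= cam.max()                                  # normalize to [0, 1]
    return cam  # upsample to the image size before overlaying
```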

Results

The final report can be found here. My work can be found in the sections on developing the attention visualization based on gradient theory and on comparing CAM, soft attention, and human attention.

My contribution

I came up with the idea for the project and devised the overall approach. I also proposed a novel method for visualizing human attention from eye-tracking data based on gradient theory.

I developed a soft attention model in TensorFlow and extended an open-source CAM model. I visualized both attention mechanisms and compared them to human attention using the Pearson correlation coefficient (PCC).
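
A minimal sketch of the PCC comparison, assuming the machine attention map has already been resized to the same resolution as the human heatmap (the helper name `attention_similarity` is illustrative):

```python
import numpy as np
from scipy.stats import pearsonr

def attention_similarity(machine_map, human_map):
    """Pearson correlation between a model attention map and a human
    fixation heatmap. Both maps must share the same spatial resolution,
    so the coarser map (e.g. a CAM) should be upsampled beforehand.
    """
    assert machine_map.shape == human_map.shape
    r, _ = pearsonr(machine_map.ravel(), human_map.ravel())
    return r
```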