Emotiva Is the World’s Second Most Accurate Predictive Eye Tracking Model

 

TL;DR: In March 2026, Emotiva V2 was evaluated on the MIT/Tübingen Saliency Benchmark (CAT2000 dataset) and achieved an AUC of 0.8755. That makes it the second-highest-scoring model in the world among all saliency models tested on the benchmark, behind only DeepGaze MSDB from the Bethge Lab at the University of Tübingen. No other commercial model currently matches this level of accuracy in predicting where the human eye will land on an image.

 

What Is the MIT/Tübingen Saliency Benchmark?

 

The MIT/Tübingen Saliency Benchmark is the primary international standard for evaluating predictive eye tracking models: algorithms that forecast where the human eye will fixate when looking at an image, before anyone has actually looked at it.

Originally established at MIT and co-managed by the Bethge Lab at the University of Tübingen, the benchmark compares a model’s gaze predictions against real eye-tracking data collected from thousands of human participants across standardized image datasets. The main evaluation datasets are CAT2000 and MIT300.

Performance is measured across seven metrics: AUC (Area Under Curve), shuffled AUC (sAUC), NSS (Normalized Scanpath Saliency), CC (Correlation Coefficient), KL Divergence, SIM (Similarity), and Information Gain. Together, they measure how precisely a model separates fixated areas from non-fixated ones, accounting for known biases like center-bias that can inflate scores artificially.
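
To make two of these metrics concrete, here is a minimal sketch of NSS and CC in plain Python. It is an illustration under simplifying assumptions, not the benchmark's own evaluation code: the function names, the 3x3 maps, and the fixation coordinates are all made up for the example.

```python
import math

def _mean(xs):
    return sum(xs) / len(xs)

def _std(xs):
    m = _mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def nss(saliency, fixations):
    """Normalized Scanpath Saliency: mean z-scored saliency at fixated pixels."""
    flat = [v for row in saliency for v in row]
    m, s = _mean(flat), _std(flat)
    return _mean([(saliency[r][c] - m) / s for r, c in fixations])

def cc(saliency, fixation_density):
    """Correlation Coefficient: Pearson correlation between two same-shape maps."""
    a = [v for row in saliency for v in row]
    b = [v for row in fixation_density for v in row]
    ma, mb = _mean(a), _mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / (len(a) * _std(a) * _std(b))

# Toy data (hypothetical values, not benchmark data).
saliency = [[0.1, 0.2, 0.1],
            [0.2, 0.9, 0.3],
            [0.1, 0.3, 0.1]]
fixations = [(1, 1), (1, 2)]  # (row, col) of observed fixations
density = [[0.0, 0.1, 0.0],
           [0.1, 1.0, 0.4],
           [0.0, 0.2, 0.0]]

print("NSS:", round(nss(saliency, fixations), 3))
print("CC:", round(cc(saliency, density), 3))
```

A positive NSS means the model assigns above-average saliency to the places people actually looked; a CC close to 1 means the predicted map and the measured fixation density rise and fall together.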

To put the difficulty in context: of the approximately 35 models submitted to the CAT2000 leaderboard, fewer than ten exceed AUC 0.86. The theoretical upper limit, the Gold Standard, derived from actual human fixation data, stands at 0.9159.

 

Emotiva V2 Results (March 2026)

 

On the CAT2000 dataset, Emotiva V2 posted the following results (tested March 5, 2026):

Metric           | Emotiva V2 | DeepGaze MSDB (1st) | Gold Standard
AUC              | 0.8755     | 0.8847              | 0.9159
NSS              | 2.3232     | 2.5127              | 2.7429
CC               | 0.8721     | 0.9155              | 0.9685
Information Gain | 0.3526     | 0.4932              | 0.8026
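
For readers who want the distances rather than the raw scores, the short sketch below computes, for each metric, how far Emotiva V2 sits from DeepGaze MSDB and from the Gold Standard. The numbers are copied from the table above; the variable names are ours and purely illustrative.

```python
# Scores from the table above: (Emotiva V2, DeepGaze MSDB, Gold Standard).
results = {
    "AUC": (0.8755, 0.8847, 0.9159),
    "NSS": (2.3232, 2.5127, 2.7429),
    "CC": (0.8721, 0.9155, 0.9685),
    "Information Gain": (0.3526, 0.4932, 0.8026),
}

# For each metric: (gap to DeepGaze MSDB, gap to the Gold Standard).
gaps = {
    metric: (round(deepgaze - emotiva, 4), round(gold - emotiva, 4))
    for metric, (emotiva, deepgaze, gold) in results.items()
}

for metric, (to_first, to_gold) in gaps.items():
    print(f"{metric}: {to_first:.4f} behind DeepGaze MSDB, "
          f"{to_gold:.4f} below the Gold Standard")
```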

 

Only one real-world model ranks above Emotiva V2: DeepGaze MSDB, developed by Matthias Kümmerer’s team at the University of Tübingen and presented at ICCV 2025. It is, notably, from the same research group that manages the benchmark itself.

Ranked below Emotiva V2 are DeepGaze IIE (AUC 0.8692), DeepGaze II (0.8640), UNISAL (0.8607), SALICON (0.8406), and all other competing commercial and academic architectures.

Source: MIT/Tübingen Saliency Benchmark — CAT2000 Leaderboard, test dated March 5, 2026.

 

Why Predictive Eye Tracking Accuracy Matters in Marketing

 

Traditional eye tracking requires lab equipment, recruited participants, and weeks of setup. Predictive eye tracking via AI returns comparable attention insights in seconds, on any image, before production begins.

The practical difference between a model at AUC 0.83, the typical range of commercial solutions before 2024, and one at 0.8755 is not a footnote in a technical paper. It is the difference between a directional estimate and an operational prediction: one you can stake real creative and media decisions on.

These are the use cases where that gap becomes measurable:

  • Packaging validation before print. Does the logo draw the eye before the product claim? Does the certification badge register at all? A model with AUC 0.87+ gives you a reliable reading before a single unit goes to press.
  • Pre-testing video scenes. Which frames hold attention — and which ones lose it? Predictive eye tracking on key scenes identifies drop-off points before editing is locked.
  • Headline and layout testing. Does the headline dominate the composition, or does a competing visual element steal focus first? Know before you buy media.
  • On-paper A/B testing. Run dozens of creative variants in minutes without a lab, without participants, without budget.
  • Regulatory and compliance review. Is the disclaimer visible? Does the warning label fall in the natural reading path? Verifiable with data, not assumptions.

The Emotiva V2 result on the MIT/Tübingen Benchmark is not a claim we are making. It is a publicly verifiable score, on a public leaderboard, produced by an independent academic institution. Open the link, find the row.

 

From Eye to Conversion: The Full Chain

 

Knowing where someone looks is only the first link. The full chain is:

eye → attention → emotion → decision → conversion

Each link depends on the precision of the one before it. A more accurate attention model produces more reliable emotional inference, which in turn yields more actionable behavioral predictions, all the way to the final step: conversion.

This is the territory explored in our latest research, Seeing Beyond: Unlocking Image Emotion with Contextual Depths, authored by Federico Cozzi, Andrea D’Eusanio, and Giuseppe Boccignone, published in ICIAP 2025 Workshops, Lecture Notes in Computer Science, vol. 16169, Springer, Cham, 2026.

The paper demonstrates that enriching a visual model with contextual text descriptions improves emotion classification accuracy by nearly 5%, even with relatively straightforward contextual additions. The principle is clear: the more precise the attentional model, the more robust the emotional one. The improvement from Emotiva V1 to V2 on the benchmark is not an isolated academic milestone; it raises the ceiling of every downstream link in the pipeline.

 

Predictive vs. Traditional Eye Tracking: When to Use Which

 

The honest answer is: they are complementary, not competing.

Traditional eye tracking (hardware-based, in-lab) remains the gold standard for academic research, clinical studies, and final-stage validation where precision on a specific stimulus in a specific context is required. It captures scanpath sequences, pupil dilation, blink rate — behavioral signals that predictive models do not produce.

Predictive eye tracking (AI-based) is unmatched at the pre-production stage: testing dozens of variants without a lab, scaling across markets and formats, integrating into a creative pipeline at speed. When the underlying model achieves AUC above 0.87, the two methodologies converge substantially for most marketing applications.

The choice is not which is “better.” It is which is right for the moment in your workflow.

 

Frequently Asked Questions

 

What is the MIT/Tübingen Saliency Benchmark?

The MIT/Tübingen Saliency Benchmark is an independent academic test that evaluates how accurately AI models can predict where human eyes will fixate on an image. It compares model predictions against real eye-tracking data on standardized datasets (CAT2000, MIT300, COCO Freeview). It is the primary international reference for ranking visual saliency models.

 

What is the most accurate visual saliency model in 2026?

As of May 2026, on the CAT2000 dataset, the highest-scoring real-world model is DeepGaze MSDB (AUC 0.8847), developed at the University of Tübingen. Emotiva V2 ranks second (AUC 0.8755). The theoretical Gold Standard, derived from human fixation data itself, is 0.9159.

 

What does AUC mean in eye tracking and saliency?

AUC (Area Under Curve) measures how well a saliency map distinguishes fixated areas from non-fixated ones, using a binary classifier framework. A score of 0.5 equals chance performance; 1.0 is a perfect prediction. Scores above 0.85 are considered state-of-the-art. It is the most widely used metric for comparing saliency models across benchmarks.
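
As a rough illustration of that binary-classifier framing, the sketch below computes a rank-based AUC: the probability that a randomly chosen fixated pixel receives a higher saliency value than a randomly chosen non-fixated pixel, with ties counted as half. This is a simplified version for intuition only; the benchmark's own AUC implementation may differ in its details, and the function name and toy data are assumptions.

```python
def saliency_auc(saliency, fixations):
    """Rank-based AUC: fixated pixels are positives, all other pixels negatives."""
    fixated = set(fixations)
    pos = [saliency[r][c] for r, c in fixated]
    neg = [v for r, row in enumerate(saliency)
           for c, v in enumerate(row) if (r, c) not in fixated]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # fixated pixel correctly ranked higher
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))

# Toy data (hypothetical values, not benchmark data).
saliency = [[0.1, 0.2, 0.1],
            [0.2, 0.9, 0.3],
            [0.1, 0.3, 0.1]]
fixations = [(1, 1), (1, 2)]  # (row, col) of observed fixations

auc = saliency_auc(saliency, fixations)
print("AUC:", round(auc, 3))

# A flat map carries no information, so it scores at chance level.
flat = [[1.0, 1.0], [1.0, 1.0]]
print("Flat map AUC:", saliency_auc(flat, [(0, 0)]))
```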

 

Can I verify Emotiva V2’s benchmark results independently?

Yes. The leaderboard is publicly accessible at saliency.tuebingen.ai/results_CAT2000.html. Locate the row labeled “Emotiva V2”, first tested March 5, 2026.

 

What does the Springer Nature study demonstrate?

Seeing Beyond: Unlocking Image Emotion with Contextual Depths (Cozzi, D’Eusanio, Boccignone — ICIAP 2025 Workshops, Springer 2026) shows that enriching a visual model with contextual text descriptions improves emotion classification accuracy by nearly 5%. The finding highlights the value of multimodal inputs for understanding the affective content of images, and its direct link to purchase decisions.

 

Is predictive eye tracking accurate enough to replace focus groups?

For most pre-production creative decisions (layout, visual hierarchy, attention flow), a model with AUC above 0.87 provides sufficiently reliable data to replace focus groups and informal review cycles. It does not replace human qualitative research for brand perception, messaging resonance, or emotional narrative. The right framing: predictive eye tracking answers “where does the eye go” at speed and scale; it does not answer “what does this mean to me.”

 

What is the difference between AUC and Information Gain in saliency benchmarks?

AUC measures classification accuracy in separating fixated from non-fixated pixels. Information Gain (IG) measures how much more predictive a model is than a center-bias baseline. In other words, it shows how much of the accuracy comes from understanding image content rather than from exploiting the statistical tendency of people to look toward the center. A positive IG (such as Emotiva V2’s 0.3526) signals genuine content-driven predictive power, not statistical shortcutting.
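
The comparison against a center-bias baseline can be sketched in a few lines. The version below assumes the common formulation of Information Gain as the average difference, in bits per fixation, between the log-probability the model assigns to each fixated pixel and the log-probability the baseline assigns to it. The function names and toy maps are hypothetical, and the benchmark's exact procedure may differ.

```python
import math

def normalize(density):
    """Scale a non-negative map so its values sum to 1 (a probability map)."""
    total = sum(v for row in density for v in row)
    return [[v / total for v in row] for row in density]

def information_gain(model, baseline, fixations):
    """Average log2 advantage of the model over the baseline at fixated pixels."""
    p = normalize(model)
    q = normalize(baseline)
    # All map values must be strictly positive, otherwise log2 is undefined.
    gains = [math.log2(p[r][c]) - math.log2(q[r][c]) for r, c in fixations]
    return sum(gains) / len(gains)

# Toy data (hypothetical values, not benchmark data).
center_bias = [[1, 2, 1],
               [2, 4, 2],
               [1, 2, 1]]
model = [[1, 1, 6],
         [1, 2, 1],
         [1, 1, 1]]
fixations = [(0, 2), (0, 2), (1, 1)]  # two fixations top-right, one in the center

ig = information_gain(model, center_bias, fixations)
print("Information Gain (bits per fixation):", round(ig, 3))
```

In this toy case the model earns a positive IG because it correctly places probability on the off-center corner that drew most fixations, something a pure center-bias map cannot do. A model identical to the baseline would score exactly zero.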

 

Want to see Emotiva V2 applied to your own assets?

BOOK A DEMO