Visual Attention

Automatic or intentional


One of the most enduring questions in cognitive science: how does the human visual system decide what to pay attention to? When we look at a crowded scene, our attention doesn’t spread evenly. It jumps, guided either by the stimuli themselves (a bright red dot catches our eye — a “bottom-up” process) or by our intentions and goals (we search for a friend in a crowd — a “top-down” process).

Researchers have long debated which of these forces dominates at the earliest, “pre-attentive” stage of vision — the moment before we consciously decide where to look. Two classic studies define the poles of this debate. Theeuwes (1992) claimed that our attention is inevitably captured by any physically salient item, even if we try to ignore it. Bacon and Egeth (1994), on the other hand, argued that attention can be guided strategically depending on the task — that top-down control can override distraction.

I tackled this question in my MSc Cognitive Science thesis research at Bogazici University in 2006. My project set out to test these claims through three visual search experiments. Participants viewed displays of coloured letters on a screen and had to find a target letter (for example, a “G” among many “C”s). Sometimes one of the distractors was a colour singleton, brighter or darker than the rest, to see whether it would pull attention even when irrelevant. Reaction times and accuracy were measured across different layouts (random or circular), set sizes (5 or 20 letters), and colour combinations.
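
To make the setup concrete, here is a minimal sketch of how one such trial could be generated. This is an illustrative reconstruction in Python, not the software actually used in 2006; the names and parameters (make_trial, set_size, layout, singleton) are my own.

```python
import random

def make_trial(set_size=20, layout="circular", target_present=True,
               singleton=True):
    """Build one search display: a target 'G' among 'C' distractors,
    optionally with one task-irrelevant colour-singleton distractor."""
    letters = (["G"] if target_present else ["C"]) + ["C"] * (set_size - 1)
    colours = ["grey"] * set_size
    if singleton:
        # The singleton is always a distractor, never the target,
        # so the optimal strategy is to ignore it.
        colours[random.randrange(1, set_size)] = "bright"
    # Shuffle letter/colour pairs so the target (and the singleton)
    # land at a random position in the display.
    items = list(zip(letters, colours))
    random.shuffle(items)
    if layout == "circular":
        positions = [i * 360.0 / set_size for i in range(set_size)]  # ring, degrees
    else:
        positions = [(random.random(), random.random()) for _ in range(set_size)]  # random x, y
    return [(letter, colour, pos) for (letter, colour), pos in zip(items, positions)]
```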

Surprisingly, the results did not fully support either of the existing theories. Participants were generally able to ignore irrelevant singletons, suggesting that bottom-up capture was weaker than predicted by Theeuwes. Yet the pattern of reaction times also showed that not all attention was under conscious control. Search efficiency depended on several intertwined factors — the colour of the target, its location on the screen, the layout of the display, and the number of items. The upper half of the display, for instance, was consistently processed faster than the lower half, hinting at asymmetries in how visual attention is deployed across space.
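
For readers unfamiliar with the standard analysis: search efficiency is typically quantified by regressing reaction time on set size, and the slope, in milliseconds per item, is then compared across conditions (target colour, display half, layout, and so on). A toy example with invented numbers, not the thesis data:

```python
import numpy as np

# Invented reaction times (ms) for set sizes 5 and 20; a real analysis
# would fit this separately per condition and compare the slopes.
set_sizes = np.array([5, 5, 5, 20, 20, 20], dtype=float)
rts = np.array([610, 645, 630, 920, 960, 905], dtype=float)

slope, intercept = np.polyfit(set_sizes, rts, deg=1)
print(f"slope: {slope:.1f} ms/item, intercept: {intercept:.0f} ms")
# A near-zero slope suggests parallel, "pop-out" search; slopes of
# tens of ms/item suggest item-by-item scanning.
```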

An important discovery was that the average time spent per item changed depending on whether the target was present or absent. When no target existed, participants spent longer inspecting each element — evidence that visual search involves an elimination process rather than a simple, uniform scan. These findings point toward a more dynamic model of attention in which perceptual and cognitive factors interact continuously rather than obeying a strict hierarchy.
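
The classic serial self-terminating model makes this pattern easy to see with a little arithmetic. If inspecting an item takes t milliseconds and the target sits at a random position among N items, a present target is found after (N + 1) / 2 inspections on average, while an absent response requires exhausting all N. A quick sketch with assumed numbers (the per-item time and baseline below are placeholders, not fitted values):

```python
def expected_rt(n_items, target_present, t_per_item=30.0, base_ms=450.0):
    # Serial self-terminating search: on average (N + 1) / 2 items are
    # inspected before a present target is found; all N for "absent".
    inspected = (n_items + 1) / 2 if target_present else n_items
    return base_ms + t_per_item * inspected

for n in (5, 20):
    present, absent = expected_rt(n, True), expected_rt(n, False)
    # Dividing total RT by set size gives a higher per-item figure on
    # absent trials, and one that falls as set size grows, because the
    # fixed baseline is spread over more items.
    print(n, round(present / n, 1), round(absent / n, 1))
```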

To explore this idea further, one of the experiments was simulated in ACT-R/PM, a cognitive architecture that models human perception, memory, and motor control. While the model successfully captured the overall task structure, it was slower than human participants and failed to reproduce some key human patterns — especially the inverse relation between set size and item-processing time. This discrepancy highlights how much remains to be understood about the coordination between perceptual and cognitive mechanisms in human vision.
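
To see why such a model can come out slow, consider a back-of-envelope timing estimate using ACT-R's default parameters (roughly 50 ms per production firing and 85 ms to shift and encode visual attention). The production breakdown below is my own rough reconstruction, not the thesis model:

```python
PRODUCTION_MS = 50  # ACT-R default production firing time
ATTEND_MS = 85      # ACT-R default visual attention shift / encoding time

# Suppose scanning one item takes three production firings
# (find-location, attend, compare) plus one attention shift.
per_item_ms = 3 * PRODUCTION_MS + ATTEND_MS  # 235 ms per item

for n in (5, 20):
    print(f"{n} items: ~{n * per_item_ms} ms to scan the whole display")
# At ~235 ms/item such a model sits an order of magnitude above typical
# human search slopes (tens of ms/item), one plausible source of the
# mismatch described above.
```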

In sum, the study suggests that our attention is neither purely stimulus-driven nor purely goal-driven. Instead, it emerges from the ongoing negotiation between what the world presents and what the mind seeks.

(For those interested in the full study, the complete thesis is available here.)