Vision is one of the most comprehensive and complex systems in the brain. It is often estimated that over half of our brain’s energy is committed to some sort of visual processing at any given time. As a result, our experience of reality is largely shaped by the process by which our brain produces visual percepts. Considerable progress has been made in understanding the neurophysiological basis of our visual sense in the last two decades, especially with imaging technologies such as PET scanning, fMRI, EEG, MEG and DTI (Diffusion Tensor Imaging). But many theories of visual perception were developed decades before scientists could visualize and interpret brain firing patterns. In this essay, we will first present the biological basis of visual perception and what is known of the main neural pathways of visual processing. We will then review major theories of visual perception. Finally, we will discuss the neuroscientific evidence behind the emergence of an “integrative model” of visual perception.
The Biological Basis of Visual Perception
Visual processing happens in many areas of the brain. The process is rather linear, involves multiple levels of compression, and relies on several processing stations. First, photons are captured by photoreceptive cells in the retina. Visual data is initially compressed 150 times by ganglion cells and then transmitted to the Lateral Geniculate Nucleus (LGN) of the thalamus as well as the superior colliculus (SC) of the midbrain. Both areas are subcortical structures rooted in the oldest parts of the brain. Though it is estimated that only 10% of the visual information is transmitted to the LGN, this is still a huge volume of information, considering that it far exceeds the entire volume of fibers which make up the auditory tract. After the information has been partially transmitted to the thalamus, it continues its journey to the primary visual cortex (PVC), which is located at the back of the brain in the occipital lobe. By then, the information has not only been heavily compressed but has also been processed by four different types of neurons.
The visual cortex is made of more than 30 sections which are specialized in different types of processing. Many studies on the visual cortex are done on monkeys so there is less direct evidence on the visual neural pathways of humans than those of animals. However, while variability between animals and humans is obvious at the anatomical level, most studies suggest that few differences exist at the functional level. For instance, an area called V5 in macaque monkeys contains neurons that show activity when an object enters the visual field. This area has also been confirmed to exist in humans (Zeki, 1993).
Our visual perceptions appear to be produced by cells which fire on a hierarchical basis, coding from simple to more complex features. A cell which fires when a subject is exposed to a complex object is called a gnostic unit. We understand that memory is involved in this process to the degree that a visual perception represents a match for an object that has been pre-coded in the brain. However, the process by which we code the visual perception of a new object remains mostly unexplained. There is some support for a theory referred to as ensemble firing, which suggests that visual perception is a function of many cells firing together rather than of gnostic units alone. Also, the less familiar or more complex an object is, the more energy visual extraction demands from the brain, which again supports the idea that firing is easier when objects are quickly recognized or matched with prior memories (Gazzaniga, 2009, p. 213).
Finally, it is important to note that there are sites in the superior colliculus (SC) which have cells that respond to input from multiple senses. The SC has a significant role in controlling and orienting movement, which would suggest that the evolution of the brain has favored the integration of multiple senses to guide reflexive actions as well as behaviors that ensure our survival. In fact, studies show that the combined response to multiple stimuli is greater than the cumulative response to each stimulus presented alone, a process called multisensory integration (Gazzaniga, 2009, p. 199).
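The claim that the combined response exceeds the cumulative response can be made concrete with a small arithmetic sketch. The function names and the firing rates below are hypothetical and chosen only for illustration; they are not data from Gazzaniga (2009) or any other study.

```python
def is_superadditive(response_a, response_b, response_combined):
    """True when the combined response exceeds the sum of the two single-sense responses."""
    return response_combined > response_a + response_b


def gain_over_sum(response_a, response_b, response_combined):
    """Ratio of the combined response to the sum of the single-sense responses."""
    return response_combined / (response_a + response_b)


# Hypothetical firing rates (spikes per second) for one SC cell; illustrative only.
visual_only = 4.0
auditory_only = 6.0
visual_and_auditory = 15.0

print(is_superadditive(visual_only, auditory_only, visual_and_auditory))
print(gain_over_sum(visual_only, auditory_only, visual_and_auditory))
```

A ratio above 1 corresponds to the pattern described above, in which the whole response is larger than the sum of its parts.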
While imaging technology has helped scientists identify firing patterns produced by the processing of visual percepts, it is very difficult to control for other stimuli that could trigger other action potentials, such as thoughts or any other subcortical activities, particularly those related to autonomic functions like breathing or digestion. Nevertheless, there are at least two major cortical pathways that are known to specialize in either the what of the visual percept or the where of the visual percept.
The What and the Where Visual Pathways
The what pathway is represented by a ventral cortical stream of fibers which respond to a large field of view. The function of this pathway is to identify the object. Shape, color, and motion are critical cues in that process. The where pathway follows a dorsal cortical stream projecting onto the parietal lobe, which is also responsible for the management of our attention. The where pathway responds to a much more limited field of view than the what pathway. In effect, the ventral pathway specializes in recognizing while the dorsal pathway manages what comes next in the form of an action or a plan. It can be said that we have two visual perception systems at play in any given situation, one to help us label the stimuli and another to guide our interaction with it or with the context it creates.
While recent imaging studies do show that there are specific neural pathways producing our visual percepts, such experiments are difficult to conduct. That is why it is important we now review the dominant cognitive theories developed on visual perceptions over the last century. We will then revisit the neuroscientific evidence produced in the last decade to propose an integrative theory of visual perception.
Dominant General Theories on Visual Perception
Considering that visual processing is such a critical function of human behavior, it is not surprising that so many theories have been developed to explain and predict how we form visual percepts. There are two main theoretical frameworks to which most important schools of thought can be assigned: the constructive perception framework and the direct perception framework.
The Constructive Perception Framework
According to scientists who support this view, visual percepts are created primarily by capturing the stimuli and comparing them with existing information already encoded in memory. This process is believed to result from prodigious and almost instant computational activities produced by several areas in our brain, many operating below our level of consciousness. As a result, it is commonly referred to as a top-down perspective because it relies on our ability to generate a massive number of inferences at lightning speed, especially in subcortical areas such as the thalamus and the midbrain.
Many cognitive psychologists, such as Bruner, Gregory, and Rock, have embraced the constructive view. One of the key aspects of the constructive theory is that the brain has an amazing capacity to store a database of patterns of objects it has encountered over time. This ability to recognize a large number of patterns quickly is commonly referred to as template matching. It is known to happen faster for common objects than for novel objects, for which deeper feature extraction is necessary in order to find a good match.
Despite rather wide support, many scientists still challenge the validity of the template matching theory because it does not properly explain how we can recognize so many objects, places, or even faces that do not offer a perfect match to a previously encoded memory. An alternative to this model is called prototype theory, a concept that expands on the template theory by suggesting that our brain does not need a perfect match, but rather searches for the closest match to a prototype before it produces a complete percept. In that context, a prototype represents an abstraction of templates or the average of a set of exemplars (Neumann, 1977, p. 77).
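The difference between the two accounts can be illustrated with a short sketch. It assumes, purely for illustration, that an object can be reduced to a pair of numeric features and that closeness is measured by Euclidean distance; the labels, feature values, and function names are hypothetical and are not drawn from Neumann (1977).

```python
from math import dist
from statistics import mean


def template_match(stimulus, templates):
    """Return the label of a stored template identical to the stimulus, else None."""
    for label, template in templates.items():
        if template == stimulus:
            return label
    return None


def build_prototype(exemplars):
    """Average a set of exemplars feature by feature to form a prototype."""
    return tuple(mean(values) for values in zip(*exemplars))


def prototype_match(stimulus, prototypes):
    """Return the label of the prototype closest to the stimulus."""
    return min(prototypes, key=lambda label: dist(stimulus, prototypes[label]))


# Hypothetical (width, height) features; illustrative values only.
templates = {"cup": (3.0, 4.0), "plate": (10.0, 1.0)}
exemplars = {
    "cup": [(3.0, 4.0), (4.0, 4.0), (3.0, 5.0)],
    "plate": [(9.0, 1.0), (10.0, 1.0), (11.0, 1.0)],
}
prototypes = {label: build_prototype(items) for label, items in exemplars.items()}

novel = (3.5, 4.5)  # a cup-like object never seen in exactly this form
print(template_match(novel, templates))
print(prototype_match(novel, prototypes))
```

Under these assumptions, strict template matching finds nothing for the novel object because no stored pattern is identical to it, while prototype matching still assigns it to the nearest category.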
The Direct Perception Framework
This view supports the notion that all percepts are produced by the direct acquisition of information from the environment at any given time. Access to memory is not necessary because all the information we need to decode and integrate any percept is assumed to be available in the stimulus itself. Leading supporters of this approach are Gibson (1979) and Cutting (1993). In Cutting’s own words, “Direct perception assumes that the richness of the optic array just matches the richness of the world” (1993, p. 247).
Direct perception is believed to be primal or even instinctual rather than the result of top-down computational activity. This view is largely influenced by the Gestalt movement, which emerged in the early part of the 20th century. Though perception is only one of the many aspects of Gestalt theory, it is an important one. For supporters of the Gestalt approach, patterns are naturally organized (Wertheimer, 1923). They coined the term Prägnanz to describe the quality of good form and argue that we are biased towards seeing or perceiving organized views of objects or even motion. One such bias is called the law of continuity, which states that we are essentially wired to finish our visual perception to complete an existing pattern, even when our physical access to the information is partial or distorted. This explains, for instance, why we are able to easily deduce what an object is based on a sketch or even recognize a face with limited view or light. This also explains what happens with many optical illusions. Though we believe our percepts are direct translations of information, in many cases our brain produces an interpretation which goes beyond the stimuli by projecting either motion or completion so as to match the image to a familiar prototype.
Toward An Integrative View of Visual Perception
The constructive and the direct perception frameworks appear to represent opposite views from an information processing standpoint. However, they actually offer complementary positions which are supported by discoveries generated by the last decade of neuroscientific research. For instance, we now know that information processing in the brain is not just hierarchical or linear but rather organized as a distributed system. There are in fact three specific aspects of both theories which are now getting considerable attention: feature extraction, domain-specific areas and hemispheric dominance.
Feature Extraction
There is clear evidence that the brain is constantly engaged in processing features of objects and that special cortical areas exist in the occipital lobe for different aspects of such processing. For instance, some visual areas specialize in processing form, while others process color or movement. The constructive perception approach is largely supported by current research in that we know of special neurons which respond to horizontal lines while others respond to vertical lines, colors or a particular movement of an object.
Domain Specific Areas
fMRI studies have demonstrated the existence of areas in the brain that are specialized for specific categories of stimuli, such as the fusiform face area (FFA) for faces or the parahippocampal place area (PPA) for places. Subjects who are exposed to images of either faces or places show distinct patterns of activation in those brain areas. Interestingly, patients who have lesions or disease that affect neural pathways in either the FFA or the PPA have trouble recognizing faces or places. It is as if they have lost the ability to process these kinds of visual percepts altogether. Clearly, feature extraction capabilities are not enough for the visual percept to be whole. In summary, recent research does not necessarily demonstrate that a visual percept is produced via a constructive perception process. It is possible that it may also result from a direct perception process managed by a domain-specific area of the brain.
Hemispheric Dominance
In 1985, Warrington suggested a two-stage model of object recognition. She claimed that the first stage is perceptual and managed by the right hemisphere (Gestalt); the second is semantic and done in the left hemisphere (feature extraction). This model may explain the existence of networks for specific types of objects or living things because they activate similar parts of either the right or the left brain. For instance, some manufactured objects are easier to recognize than others because they trigger activation of areas in the brain that are involved in associated movements or actions which can be performed with such objects. The division of visual perception between the right and the left hemisphere is a complex area of research. Clearly, our perception is enhanced by some level of specialization from either side of our cortex.
To conclude, visual perception will continue to gain attention from cognitive psychologists and neuroscientists for years to come. One promising area of research in that respect is DTI, a technology that can help neuroscientists map neural pathways better than fMRI can because it can image the fiber tracts which create the myriad of neural pathways in the brain.
Cutting, J. E. (1993). Perceptual artifacts and phenomena: Gibson's role in the 20th century. New York, NY: Elsevier.
Gazzaniga, M. S., Ivry, R. B., & Mangun, G. R. (2009). Cognitive neuroscience: The biology of the mind (3rd ed.). New York, NY: W. W. Norton & Company.
Gibson, J. J. (1979). The ecological approach to visual perception. Boston, MA: Houghton Mifflin.
Neumann, P. G. (1977). Visual prototype formation with discontinuous representation of dimensions of variability. Memory and Cognition, 5, 187-197.
Warrington, E. K., & Shallice, T. (1984). Category specific semantic impairments. Brain, 107, 829-854.
Wertheimer, M. (1923). Untersuchungen zur Lehre von der Gestalt. Psychologische Forschung, 4, 301-350.
Zeki, S. (1993). A vision of the brain. Oxford, UK: Blackwell.
Submitted by Christophe Morin to Fielding Graduate Institute, PhD program, September 2009.