Chapter 3: Pattern Recognition (words, objects, and faces)
Word Recognition
Early cognitive psychologists proposed template models to explain word recognition, suggesting that people match incoming visual stimuli to stored templates of specific letter shapes in memory. While this approach works well for systems like check scanners in banks, which match account and routing numbers to pre-defined templates, it falls short in explaining human word recognition. Unlike machines, people can easily recognize words written in unfamiliar fonts or styles. Research shows that instead of relying on templates, humans recognize words by breaking them down into basic features—such as edges and lines—using feature detectors. These features are combined to form letters, and letters are then assembled into words, allowing for flexibility in recognizing words across different fonts.
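The contrast between template matching and feature analysis can be made concrete with a toy sketch. Everything in it is an assumption made for illustration: the 5x4 bitmaps, the three hypothetical features (`vertical_stroke`, `top_bar`, `bottom_bar`), and the thresholds are invented, and it is not a model drawn from the research described above. It only shows why a rigid template struggles with a letter drawn slightly differently while a feature description still succeeds.

```python
# Toy glyphs: "1" = ink, "0" = blank. All bitmaps are invented for illustration.
TEMPLATE_L = ("1000", "1000", "1000", "1000", "1111")  # stored template
VARIANT_L = ("0100", "0100", "0100", "0100", "0111")   # same letter, "new font"
GLYPH_T = ("1110", "0100", "0100", "0100", "0100")


def template_match(glyph, template):
    """Return the fraction of cells where the glyph agrees with the template."""
    total = sum(len(row) for row in template)
    same = sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )
    return same / total


def longest_run(cells):
    """Length of the longest unbroken run of ink in a row or column."""
    best = run = 0
    for cell in cells:
        run = run + 1 if cell == "1" else 0
        best = max(best, run)
    return best


def extract_features(glyph):
    """Describe a glyph by hypothetical line features, ignoring exact position."""
    features = set()
    columns = ["".join(column) for column in zip(*glyph)]
    if any(longest_run(column) >= 4 for column in columns):
        features.add("vertical_stroke")
    if longest_run(glyph[0]) >= 3:
        features.add("top_bar")
    if longest_run(glyph[-1]) >= 3:
        features.add("bottom_bar")
    return frozenset(features)


# Assumed feature descriptions of two letters.
LETTER_FEATURES = {
    frozenset({"vertical_stroke", "bottom_bar"}): "L",
    frozenset({"vertical_stroke", "top_bar"}): "T",
}


def recognize_by_features(glyph):
    """Return the letter whose feature set matches, or None if nothing matches."""
    return LETTER_FEATURES.get(extract_features(glyph))


print(template_match(VARIANT_L, TEMPLATE_L))  # well below a perfect match
print(recognize_by_features(VARIANT_L))       # still identified as "L"
```

In this sketch the variant "L" agrees with the stored template on only about half of its cells, so a strict template comparison would reject it, yet both versions yield the same feature description and are identified as the same letter.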
The word superiority effect further demonstrates the complexity of human word recognition and the top-down processes involved in reading. This phenomenon shows that people can more accurately and quickly identify a letter, like "k," when it is presented within a familiar word (e.g., "work") rather than in a random string of letters (e.g., "wkro") or even in isolation ("k"). This suggests that our perception of individual letters is influenced by the context of the word: we do not simply process letters in isolation but integrate them with higher-level linguistic knowledge to enhance recognition.
Object Recognition: Biederman's Work
Biederman proposed that mental representations of three-dimensional objects are composed of basic shapes called geons, much like how written language consists of letters. When we recognize objects, we first break them down into their components (edges and geons) and analyze how these parts connect, which helps us match the object with stored information in memory (see Figure 3.1).
Figure 3.1. This figure illustrates Biederman’s recognition by components theory. Edges are combined to make basic shapes (i.e., geons) and these geons are combined to make objects like a flashlight.
"Recognition by components example objects and geons." by Kahan, T.A. is licensed under CC BY-NC-SA 4.0
Two crucial aspects of this process are identifying edges and vertices. First, finding the edges of an object helps us recognize relationships between these edges that remain consistent regardless of our viewing angle, like the parallel lines of a brick. Second, we focus on vertices, where lines intersect, often at concave angles, which are particularly informative for identifying an object's shape. For example, when looking at a briefcase, the rectangular body connected to a curved top segment provides a key description that helps with recognition.
Biederman's research showed that vertices play a critical role in object recognition. A degraded pattern is easier to recognize when segments of smooth edges are missing than when vertices are missing. Even when a large portion of a continuous line is deleted, people make relatively few errors identifying objects as long as the vertices are intact; when the vertices are deleted, error rates rise dramatically. Biederman argues that this happens because the vertices help us identify the geons.
In addition to examining how objects are recognized when presented alone, Biederman has also examined whether scenes have a top-down influence on object recognition.
In studies conducted by Biederman, participants were shown scenes and asked to identify objects within them. The findings revealed that recognition accuracy was significantly higher when objects matched the scene context (e.g., a fire hydrant in a city scene) compared to incongruent contexts (e.g., a fire hydrant in a diner scene). This phenomenon suggests that scene context facilitates object recognition by guiding attention towards relevant objects that fit the scene.
However, critics have questioned whether these findings reflect guessing biases rather than genuine context-based facilitation of recognition. Further research, including studies by Kathy Mathis, has attempted to address these concerns. Mathis examined whether objects, presented either in isolation or within scenes, automatically activate their semantic meanings even when participants intend to ignore them. Her work builds on previous research that found objects automatically activate their semantic representations, and it asks whether this activation is mandatory or can be modulated by the surrounding scene context.
In several experiments, words were embedded in pictures of objects, and participants categorized the words. Words embedded in incongruent objects were categorized more slowly, supporting the idea that objects activate their meanings involuntarily (i.e., a person cannot ignore the picture's meaning when responding to the word). In a further experiment, Mathis presented these pictures against probable or improbable backgrounds. Interference occurred when the objects were probable in the scene but not when they were improbable. These results suggest that scene context can inhibit or block the automatic semantic processing of objects that do not fit the scene, yet whether this top-down influence occurs before the object has been recognized (categorized) is still unclear.
Face Recognition: Importance of the Fusiform Face Area (FFA)
Faces are processed differently than other objects in the brain, particularly within the fusiform face area (FFA). This specialized region in the temporal lobe is dedicated to recognizing and processing facial features. Research by Isabelle Gauthier and Nancy Kanwisher has shown that while the FFA is primarily associated with face recognition, it may also respond to other visual categories.
For example, Gauthier's experiments involved training participants to recognize novel objects called "Greebles," which share a similar configuration but vary in specific features. Before training, FFA activity in response to Greebles was minimal, while responses to faces remained strong. After extensive training in "Greeble recognition," the FFA’s response to Greebles significantly increased, matching its response to faces. This suggests that the FFA is not exclusively dedicated to face processing but may also be involved in recognizing other complex objects, especially those with which individuals have extensive experience.
The concept behind these findings is known as experience-dependent plasticity, which posits that neural responses adapt based on experience. This idea was supported by earlier animal studies, such as experiments where kittens raised in environments with only vertical stimuli developed neurons that responded primarily to vertical orientations. Similar plasticity is observed in humans, where specialized regions for recognizing letters and word forms develop as people learn to read, indicating that the brain’s ability to adapt is not purely innate but shaped by experience.
While Gauthier’s research suggests that the FFA’s function may extend beyond faces to include familiar complex objects, Kanwisher has maintained that the FFA is specifically tuned for face recognition and that the Greeble studies are flawed because Greebles are face-like stimuli that are learned with names. To address this criticism, Gauthier has examined brain activity while bird and car experts view images of birds and cars. These studies also find activity in the FFA, which supports experience-dependent plasticity.
Face Superiority Effect
The face superiority effect illustrates that we are more adept at recognizing and processing intact faces compared to scrambled or rearranged facial features. This phenomenon underscores the holistic and specialized processing involved in face perception, wherein the brain integrates facial features into a coherent whole.
Disorders of Face Recognition
Disorders such as prosopagnosia (face blindness) and Capgras delusion provide insights into the neurological underpinnings of face perception. Individuals with prosopagnosia have difficulty recognizing familiar faces, often due to damage to the FFA. In contrast, Capgras delusion involves a belief that familiar individuals are impostors, highlighting the intricate relationship between facial recognition and emotional processing. With prosopagnosia, a person cannot consciously recognize a face but may nonetheless show an emotional reaction when it is measured with galvanic skin conductance. With the Capgras delusion, however, a person can consciously recognize a face as looking like a specific individual, yet the emotional reaction is missing. Together these conditions suggest that there is a double dissociation between conscious face recognition and emotional face recognition.
A double dissociation is a concept used in cognitive neuroscience and neuropsychology to demonstrate that two cognitive functions are independent of each other because they rely on different neural mechanisms. It is established by showing that one brain lesion or condition affects one function while sparing another, and a different lesion or condition has the opposite effect.
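The logic of a double dissociation can be written out as a simple check. The sketch below is only an illustration of the reasoning: the `Profile` class, its field names, and the `is_double_dissociation` function are hypothetical, and the two example profiles simply restate the pattern described above for prosopagnosia and Capgras delusion rather than encoding any data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical summary of which of two functions a condition impairs."""
    condition: str
    function_a_impaired: bool  # here: conscious face recognition
    function_b_impaired: bool  # here: emotional response to faces


def is_double_dissociation(first, second):
    """True if each condition impairs exactly one function, and they differ."""
    first_selective = first.function_a_impaired != first.function_b_impaired
    second_selective = second.function_a_impaired != second.function_b_impaired
    opposite = first.function_a_impaired != second.function_a_impaired
    return first_selective and second_selective and opposite


# Pattern described in the text: recognition lost but emotion spared,
# versus recognition spared but emotion lost.
prosopagnosia = Profile("prosopagnosia", True, False)
capgras = Profile("Capgras delusion", False, True)

print(is_double_dissociation(prosopagnosia, capgras))  # True
```

If both conditions impaired the same function, or if one of them impaired both functions, the check would fail, which mirrors why a single dissociation on its own is weaker evidence that the two functions are independent.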
Conclusion
The exploration of object and face recognition reveals the complexity and specificity of neural mechanisms underlying these processes. Biederman’s theory of recognition by components emphasizes how we decompose objects into basic shapes, or geons, to achieve recognition, with vertices playing a critical role in this process. The influence of scene context on object recognition adds another layer of complexity, highlighting the interaction between bottom-up and top-down processing.
Similarly, the study of face perception, particularly through research on the fusiform face area (FFA), demonstrates that our brains have specialized areas for processing faces and possibly other complex objects with which we have expertise. Gauthier’s work on experience-dependent plasticity challenges the notion that the FFA is exclusively for faces, showing that it can adapt to recognize other stimuli with sufficient experience. On the other hand, Kanwisher’s position that the FFA is primarily specialized for faces sparks an ongoing debate in the field.
Taken together, these studies highlight the adaptability of the visual system and the brain’s remarkable ability to fine-tune itself based on experience. Disorders like prosopagnosia and Capgras delusion further underscore the complexity of face recognition, demonstrating that conscious recognition and emotional response may rely on distinct neural pathways. The concept of double dissociation, as seen in these conditions, provides a powerful framework for understanding the independent yet interconnected nature of cognitive functions.