Chapter 7: Object Recognition
Introduction
Object recognition is essential for identifying and categorizing the various objects and entities that populate our environment. In this chapter, we will explore how our brains process and recognize objects, touching upon key concepts and principles that help shape our perceptual experiences.
Gestalt Principles: Making Sense of Visual Information
The Birth of Gestalt Psychology
Max Wertheimer is credited with developing the Gestalt view of psychology. As the story goes, Wertheimer stumbled upon these groundbreaking ideas during a train journey in 1911. His encounter with a toy stroboscope, featuring alternating flashing lights, sparked a revelation. Rather than perceiving two stationary lights flashing in turn, or even a light moving back and forth, Wertheimer perceived an invisible background element covering and uncovering the lights, creating the illusion of motion. This experience challenged the prevailing view, structuralism, which asserted that sensations lead to perceptions: here an object was perceived that corresponded to no sensation at all. Wertheimer's observation hinted at something deeper: that our perceptual experience transcends sensory input.
Gestalt Grouping Principles
Following this, Gestalt psychologists sought to formulate rules, or laws, that would help to explain visual perception. They introduced a series of principles that describe how we perceive objects and scenes as more than the sum of their individual parts. These principles offer insights into how our brains naturally organize visual information:
- Proximity suggests that we tend to group objects that are physically close to each other. For example, when presented with a set of dots, our brains instinctively group nearby dots, forming perceptual clusters. This principle often influences our perception of spatial relationships in scenes.
- The principle of similarity highlights our tendency to group objects that share common attributes, such as shape, color, or size. When faced with a collection of elements, we naturally group those that exhibit similar characteristics, making it easier to identify patterns and objects within a scene.
- Good continuation dictates that we prefer to perceive continuous and uninterrupted forms or patterns. When lines or contours intersect, we tend to follow the smoothest and most continuous path. This principle helps us distinguish objects from their backgrounds and identify cohesive shapes.
- Closure refers to our inclination to mentally complete or "close" incomplete shapes or forms. Even when presented with partial information, our brains strive to perceive whole objects. This phenomenon allows us to recognize objects even when some parts are obscured or missing.
- Common region suggests that we group elements that appear within the same spatial region or boundary as belonging together. When objects share a common space, we perceive them as belonging to the same group, despite potential differences in their individual attributes.
- Connectedness refers to our tendency to group elements that are physically connected by lines or other visual cues. This principle helps us perceive relationships and interactions between elements in a scene.
- The figure-ground distinction is a fundamental aspect of object recognition. We continuously assess our visual environment to distinguish between the main object of interest (the figure) and the surrounding context (the background). This distinction guides our attention and shapes our perception of objects.
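To make the proximity principle concrete, here is a minimal Python sketch. It is an illustration, not a model of the visual system. It assumes that dots are (x, y) coordinates and that two dots count as "close" when they lie within a hypothetical threshold, max_gap. The function name group_by_proximity and the threshold value are illustrative choices and are not part of Gestalt theory.

```python
from math import dist

def group_by_proximity(points, max_gap):
    """Cluster points: a point joins any group that has a member within max_gap."""
    groups = []
    for p in points:
        # Find every existing group that contains a dot near p.
        near = [g for g in groups if any(dist(p, q) <= max_gap for q in g)]
        merged = [p]
        for g in near:
            merged.extend(g)     # p links these groups together
            groups.remove(g)
        groups.append(merged)
    return groups

# Three dots near the origin and two dots far away (made-up coordinates).
dots = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
clusters = group_by_proximity(dots, max_gap=1.5)
print(len(clusters), "perceptual clusters")
```

With these coordinates the sketch yields two clusters, one of three dots and one of two, which mirrors the way nearby dots are seen as belonging together.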
Shading and Lighting: Perceiving Object Shape
Our perception of object shape and orientation is heavily influenced by shading and lighting cues. In our daily lives, we unconsciously assume that light comes from above, a bias likely shaped by our experience on Earth where sunlight illuminates objects from overhead. This assumption profoundly affects how we perceive the three-dimensional characteristics of objects.
Shading and Shape
Shading provides crucial information about an object's shape and depth. When an object is lit from above, its upper surface is typically illuminated while its lower parts remain in shadow. Ramachandran found that, as a result, a circle that is lighter at the top and darker at the bottom creates the impression of a raised or bulging form. Conversely, a circle that is lighter at the bottom and darker at the top appears as though its upper surface is in shadow, creating the illusion of a depression or dimple.
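The relationship between shading and perceived shape can be sketched in Python. The code below builds a disc with a vertical luminance gradient and then applies the light-from-above assumption as a simple rule. The function names shaded_disc and predicted_percept, the grid size, and the linear gradient are all assumptions made for illustration. They are not the stimuli or the analysis used in the research described above.

```python
def shaded_disc(size, light_on_top):
    """Return a size x size grid of luminance values (0.0 = dark, 1.0 = light).
    Cells outside the disc are None."""
    r = (size - 1) / 2
    rows = []
    for y in range(size):
        top_to_bottom = y / (size - 1)          # 0 at the top row, 1 at the bottom row
        lum = 1 - top_to_bottom if light_on_top else top_to_bottom
        row = []
        for x in range(size):
            inside = (x - r) ** 2 + (y - r) ** 2 <= r ** 2
            row.append(lum if inside else None)
        rows.append(row)
    return rows

def predicted_percept(disc):
    """Light-from-above rule: a lighter top half reads as a bump, otherwise a dimple."""
    half = len(disc) // 2
    top = [v for row in disc[:half] for v in row if v is not None]
    bottom = [v for row in disc[-half:] for v in row if v is not None]
    return "bump" if sum(top) / len(top) > sum(bottom) / len(bottom) else "dimple"

print(predicted_percept(shaded_disc(9, light_on_top=True)))
print(predicted_percept(shaded_disc(9, light_on_top=False)))
```

The rule compares only the average luminance of the top and bottom halves, which is enough to reproduce the bump and dimple percepts described above.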
Figure 7.1
People are biased to assume that light comes from above, which creates the perception of a bump when the top portion of a circle is lighter and a dimple when the bottom portion is lighter.
"Perception of circle." by Kahan, T.A. is licensed under CC BY-NC-SA 4.0
Real-World Application
In environments with microgravity, such as the International Space Station (ISS), there is no inherent "up" or "down." To address this, astronauts and engineers take advantage of our bias to perceive light from above. By designing spacecraft interiors with lighting schemes that mimic this bias, they provide visual cues for orientation, helping astronauts maintain their sense of uprightness and navigate their surroundings effectively.
Recognition by Components (RBC) Theory
Recognition by Components (RBC) theory, developed by Irving Biederman, is a major theory of object recognition that provides valuable insights into how our brain processes visual information. This theory suggests that the recognition of objects occurs by breaking them down into their fundamental components or geometric shapes, known as "geons." Here, we will delve into the details of Biederman's Recognition by Components theory.
Overview of Recognition by Components Theory
Biederman's theory postulates that the process of recognizing an object involves several stages:
- Edge Extraction: The initial step in object recognition is the extraction of edges from the visual input. When we perceive objects, our brain first identifies and processes the edges and contours that form the boundaries of those objects. This edge information is crucial for further analysis. This aligns with our understanding of visual processing in the brain, as regions like V1 (primary visual cortex) are known to respond strongly to edges in the visual field.
- Formation of Geons: Geons are basic geometric shapes that represent the building blocks of object recognition according to Biederman's theory. There are 36 such geons in total, and Biederman proposed that by combining these geons in various ways, we can recognize and represent all the diverse objects in the world. Geons can be thought of as three-dimensional shapes like cylinders, cones, cubes, and more. These shapes are simple yet versatile enough to compose complex objects.
- Object Recognition: By assembling and combining geons, we can recognize and identify different objects in our environment. For instance, putting together geons in a specific arrangement could represent a suitcase or a flashlight, demonstrating the versatility of this approach.
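The idea that objects are recognized from geons and their arrangement can be sketched as a small data structure. In the Python sketch below, an object is stored as a set of relations between geons, and recognition is a lookup of that arrangement. The cup and pail entries reflect a contrast often used to illustrate the theory (the same two components attached in different places). The geon labels, relation names, and the recognize function are hypothetical simplifications, not Biederman's formal notation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Relation:
    part: str    # geon being attached
    where: str   # how it is attached
    to: str      # geon it is attached to

# Hypothetical structural descriptions: same geons, different arrangements.
KNOWN_OBJECTS = {
    "cup": frozenset({Relation("curved handle", "side of", "cylinder")}),
    "pail": frozenset({Relation("curved handle", "top of", "cylinder")}),
}

def recognize(description):
    """Return the stored object whose geon arrangement matches, or None."""
    for name, stored in KNOWN_OBJECTS.items():
        if stored == description:
            return name
    return None

seen = frozenset({Relation("curved handle", "top of", "cylinder")})
print(recognize(seen))
```

The point of the sketch is that the arrangement matters as much as the parts: swapping "top of" for "side of" changes which object is recognized even though the geons are identical.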
Challenges and Controversies in RBC Theory
While Recognition by Components theory has provided valuable insights into object recognition, it has also faced debates and challenges:
- Contextual Influence: One major point of debate is the extent to which context influences object recognition. Biederman himself explored whether the context in which an object is presented affects perception (he did this using the object detection task described below). For instance, does seeing a deer in a forest make it easier to recognize than seeing it in an unexpected place like a bedroom? Biederman felt that scenes do provide a top-down influence that facilitates the recognition of congruent objects.
- Functional Isolation Model: Some researchers, such as Henderson and Hollingworth, have proposed a Functional Isolation Model of object recognition. This model suggests that scene context only plays a role after initial object recognition.
Object Detection Task and Contextual Influence
Biederman designed experiments like the Object Detection Task to explore the influence of context on object recognition. Participants were asked to indicate whether a prenamed object was present in a rapidly presented scene. The results suggested that context influenced recognition accuracy. For example, people were more accurate at detecting a fire hydrant when it was shown on a city street than when it was shown on the counter of a diner. However, some argued that these results could be attributed to guessing rather than context-driven recognition. Unfortunately, no study that I’m aware of has fully nailed this issue down.
Interference Task and Contextual Influence
Kathy Mathis attempted to answer this question by conducting experiments using an interference task. Participants had to categorize words as quickly as possible (as animal, food, furniture, or clothing), and these words appeared inside a picture of an object (e.g., a deer) or a nonobject (e.g., a blob that is object-like but has no meaning). The idea was that it would be harder to categorize the word "shirt" as clothing when it was shown inside a deer than when it was shown inside a blob, since the deer, if recognized, would make people want to press the animal key. This is exactly what happened. Critically, these words and pictures were then inserted into scenes where the objects were either probable (e.g., a deer in a forest) or improbable (e.g., a deer in a bedroom).
- In probable scenes (where the object fit the context), participants experienced more interference, indicating that the context facilitated recognition.
- In improbable scenes (where the object did not fit the context), participants experienced no interference.
Together, these findings indicate that the scene affects object recognition, but they raise questions about the timing of this contextual influence. At first glance it may appear as though the object was fully ignored when it didn’t fit the scene, but in order to know that the object mismatches the scene, the object and scene must first be recognized. So, when in the sequence of processing did the scene affect object recognition? Did it affect the online recognition of the object, or did the scene only have an influence after the object was recognized?
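The logic of the interference measure can be sketched in a few lines of Python. Interference is taken here as the mean time to categorize a word inside an object minus the mean time to categorize a word inside a nonobject, computed separately for each type of scene. The reaction times below are invented for illustration and are not data from the experiments described above. The trial format and the function name interference are likewise assumptions.

```python
from statistics import mean

# Hypothetical reaction times in milliseconds (made-up values, not real data).
trials = [
    {"scene": "probable", "picture": "object", "rt": 720},
    {"scene": "probable", "picture": "object", "rt": 740},
    {"scene": "probable", "picture": "nonobject", "rt": 660},
    {"scene": "probable", "picture": "nonobject", "rt": 680},
    {"scene": "improbable", "picture": "object", "rt": 675},
    {"scene": "improbable", "picture": "object", "rt": 665},
    {"scene": "improbable", "picture": "nonobject", "rt": 670},
    {"scene": "improbable", "picture": "nonobject", "rt": 670},
]

def interference(trials, scene):
    """Mean RT for words inside objects minus mean RT for words inside nonobjects."""
    obj = mean(t["rt"] for t in trials
               if t["scene"] == scene and t["picture"] == "object")
    non = mean(t["rt"] for t in trials
               if t["scene"] == scene and t["picture"] == "nonobject")
    return obj - non

for scene in ("probable", "improbable"):
    print(scene, interference(trials, scene))
```

A positive score means the pictured object slowed word categorization, which is taken as a sign that the object was recognized. The invented numbers are chosen to mirror the reported pattern: interference in probable scenes and none in improbable scenes.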
Debate About Timing of Contextual Influence
The debate continues over whether scene context acts early in the processing stream, facilitating recognition itself, or late, after the object has already been recognized. Biederman's view that scenes exert a top-down influence on recognition represents the former position, while the Functional Isolation Model represents the latter.
Object Updating Theory
Introduction to Object Updating Theory
The Object Updating Theory, developed by James Enns and Vincent DiLollo, is a fundamental concept in the field of visual perception. This theory elucidates how our brains process and update information about objects. Before delving into specific visual effects that might be explained by this theory (such as the Flash Lag Effect and Common Onset Masking), let's explore the core principles of the Object Updating Theory.
Principles of Object Updating Theory
- Object Files: The theory assumes that the brain represents each perceived object with an object file, a temporary record of that object's position and attributes.
- Continuous Updating: One of the key tenets of the Object Updating Theory is the idea of continuous updating. When we perceive moving objects, our brains constantly update the information within these object files to track the objects' positions and attributes over time.
- Handling New Objects: As we encounter new objects in our visual field, our brains generate new object files for them. This allows us to seamlessly integrate these objects into our ongoing perceptual experience.
The Flash Lag Effect
The Flash Lag Effect is a captivating visual illusion that aligns with the principles of the Object Updating Theory. This phenomenon creates the perception that a flashed object lags behind a moving object, even when they are physically aligned. In simpler terms, it makes the moving object appear ahead of the flashed object. As an example, imagine a dot moving clockwise around a central fixation point, and a square that is flashed between the dot and the fixation point at the moment the dot reaches the 3:00 position of a clock. If the dot stops moving at that point (stopped motion condition), the dot and square appear aligned. If the dot continues on its path toward the 4:00 position (continued motion condition), the dot appears to have been ahead of the square (i.e., the square appears to lag behind) at the moment the square was flashed.
Figure 7.2
In the flash lag effect a moving dot appears ahead of a flashed target if the dot continues the path of motion.
"Flash lag effect." by Kahan, T.A. is licensed under CC BY-NC-SA 4.0
Applying Object Updating Theory to the Flash Lag Effect: The Object Updating Theory offers an insightful explanation for the Flash Lag Effect. When we observe a moving object, our brains create an object file for it and continuously update its location. When a new object, such as a flashed square, appears in alignment with it, a new object file is generated. However, the updating process for the original moving object continues. As a result, when the dot continues moving and people are later asked where the dot had been when the square was flashed, they report that the two were misaligned.
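This account can be illustrated with a toy simulation in Python. The sketch assumes that an object file stores only the most recent position of its object, so a later report about the dot reflects wherever updating left off. Positions are given in degrees clockwise from 12:00 (so 3:00 is 90 and 4:00 is 120). The ObjectFile class, the run_trial function, and the step size are hypothetical, and the sketch greatly exaggerates the size of the misalignment people actually report.

```python
class ObjectFile:
    """Hypothetical record holding only the latest known state of one object."""
    def __init__(self, label, position):
        self.label = label
        self.position = position

    def update(self, position):
        self.position = position   # overwrites the old value; no history is kept

def run_trial(dot_path, flash_index, flash_position):
    """Return the remembered (dot, square) positions after the trial ends."""
    dot = ObjectFile("dot", dot_path[0])
    square = None
    for i, position in enumerate(dot_path):
        dot.update(position)                       # the dot's file keeps updating
        if i == flash_index:
            square = ObjectFile("square", flash_position)  # new object, new file
    return dot.position, square.position

# Square flashed when the dot is at 90 degrees (3:00).
continued = run_trial([60, 75, 90, 105, 120], flash_index=2, flash_position=90)
stopped = run_trial([60, 75, 90], flash_index=2, flash_position=90)
print("continued motion:", continued)
print("stopped motion:", stopped)
```

In the continued motion trial the dot's file ends up ahead of the square's file, while in the stopped motion trial the two positions match, which parallels the two conditions described above.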
Size Change and Object Perception: Moreover, the Object Updating Theory predicts that dramatic changes in an object's attributes, like size, can influence our perception. When an object undergoes a significant size change, our brains may treat it as a new object, leading us to perceive multiple objects instead of one object that changes over time. This aspect of the theory sheds light on why people might perceive three distinct objects in a display involving a rapidly moving dot, a flashed square, and a dramatically resized dot. In this situation people perceive the small dot as a distinct object that is aligned with the square and central fixation at the moment the square was flashed (rather than perceiving the dot as appearing ahead of the square at the moment the square was flashed).
Figure 7.3
The flash lag effect goes away if the moving dot has a dramatic size change.
"Flash lag effect." by Kahan, T.A. is licensed under CC BY-NC-SA 4.0
Common Onset Masking
Exploring Common Onset Masking: Common Onset Masking, also known as Object Substitution Masking or Four Dot Masking, is another visual phenomenon explicable through the lens of the Object Updating Theory. In this effect, a briefly displayed target object, surrounded by four small dots, becomes challenging to identify when the dots persist on the screen after the target vanishes. However, if the dots disappear simultaneously with the target, identification becomes more manageable. For example, if a person is asked to recognize a shape (e.g., a triangle) that is surrounded by four small dots, the person will have relatively little difficulty doing this if the whole display is shown briefly and then disappears. However, performance is much worse if the dots stay on the screen following the offset of the shape.
Figure 7.4
In common onset masking four dots will obscure a target if the dots remain visible beyond target offset.
"Common onset masking." by Kahan, T.A. is licensed under CC BY-NC-SA 4.0
Applying Object Updating Theory to Common Onset Masking: The Object Updating Theory provides a clear framework for understanding Common Onset Masking. When the target and surrounding dots are presented, our brains create a single object file for the target and masking dots (e.g., triangle plus dots). If the dots disappear with the target, this object file is left as it was, and the person performs well when asked what shape had been shown. However, when the dots persist after the target's disappearance, the continuous updating process results in the object file containing the dots alone (without the triangle). Consequently, identifying the target becomes more difficult under these conditions.
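The same updating logic can be sketched for masking. In the Python sketch below, a display is a sequence of frames, each frame is the set of items currently on screen, and a single object file is overwritten whenever something is visible. The report_target function and the frame format are hypothetical simplifications of the account described above and do not come from the original research.

```python
def report_target(frames):
    """Return the shape left in the object file after all frames, or None."""
    object_file = set()
    for frame in frames:
        if frame:
            # Something is visible, so the file is updated to match the screen.
            object_file = set(frame)
        # A blank screen offers nothing to update with; the file is kept as is.
    shapes = object_file - {"dots"}
    return shapes.pop() if shapes else None

# Target and dots appear together, then everything disappears.
common_offset = [{"triangle", "dots"}, set()]
# Target and dots appear together, then only the dots remain.
delayed_offset = [{"triangle", "dots"}, {"dots"}]

print(report_target(common_offset))
print(report_target(delayed_offset))
```

When the dots outlast the target, the update replaces "triangle plus dots" with "dots" alone, so nothing remains to report. When everything disappears together, the file still contains the triangle.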
Conclusion
In conclusion, Gestalt grouping principles influence how individuals perceive and organize information within the visual environment, shaping the way elements are grouped into coherent and meaningful objects. Shading influences our perception of object shape because humans have a bias to perceive the light source as coming from above. In addition, the Recognition by Components theory offers a framework that aligns with our understanding of brain processing, suggesting that objects are recognized by extracting their edges, assembling these into basic geometric shapes (geons), and combining the geons to form objects.
Furthermore, the Object Updating theory posits that iterative processing in the brain plays a pivotal role in tracking objects as they move and change, shedding light on various visual phenomena such as the flash lag effect and common onset masking. These theories collectively contribute to our comprehension of the intricate processes underlying object recognition and the dynamic nature of visual perception.