Tuesday, January 5, 2010

What is pattern recognition?


Introduction

How are objects recognized? This simple question encapsulates a difficult problem for the human visual system: taking a pattern of light from the world and finding in it surfaces, objects, other people, dangers, and pleasures. A paradox of the pattern-recognition process is that it all seems so easy: a person simply looks, and all the complexities of the world seem to jump out. One school of psychologists, typified by James Jerome Gibson, emphasizes direct perception, which to Gibson means that the first thing of which a person is aware in the perceptual process is the objects themselves. On this view, the intermediate stages, including the retinal image and the numerous levels of physiological processing in the brain, simply do not matter to psychology.






That is how things appear, but considerable information processing takes place between the light that strikes the eye and the recognition of a table or a friend. After the retina at the back of the eye converts the light into electrical signals, millions of nerve cells process the visual information. At the later stages of processing, many of the strategies used by the visual system resemble the logical processes that one uses consciously in solving problems. Because the visual system performs these processes unconsciously, its operation has been called "unconscious inference," a term coined by Hermann von Helmholtz in the mid-nineteenth century to refer to the logical processes that occur in vision.


Perspective illusions provide an example of this unconscious processing. An image of a small bar can be placed on a photograph of railroad tracks that converge in the distance. The bar will appear small if it is over the near part of the tracks, but larger if it is over the more distant tracks, even though its image is the same size in both places. On the photograph, one’s visual inference process fails because it applies a scheme that ordinarily works very well in the real world: a distant object must be larger than a near one to produce a retinal image of the same size. Because the bar over the distant tracks is taken to be farther away, it is seen as larger. The human visual system combines distance and retinal size so effectively that one normally experiences size constancy, the perception that objects at different distances keep the same size even though they project images of different sizes on the retina.
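The geometry behind this inference can be sketched in a few lines of Python. This is a minimal sketch that assumes the bar's visual angle is fixed and that the viewer scales that angle by whatever distance the scene suggests. The function name `inferred_size`, the 1-degree angle, and the 10 m and 100 m distances are hypothetical values chosen for illustration. They are not measurements.

```python
import math

def inferred_size(visual_angle_deg: float, assumed_distance_m: float) -> float:
    """Physical size implied by a visual angle at an assumed distance.

    Exact geometry: an object of height h at distance d subtends an
    angle theta with tan(theta / 2) = (h / 2) / d.
    """
    half_angle = math.radians(visual_angle_deg) / 2
    return 2 * assumed_distance_m * math.tan(half_angle)

# On the photograph the bar subtends the same angle wherever it is placed.
bar_angle_deg = 1.0  # hypothetical value

near = inferred_size(bar_angle_deg, assumed_distance_m=10)   # bar over the near tracks
far = inferred_size(bar_angle_deg, assumed_distance_m=100)   # bar over the distant tracks
print(f"near: {near:.2f} m, far: {far:.2f} m")
```

The implied size grows in proportion to the assumed distance, so the bar over the distant tracks comes out ten times larger here. The sketch shows only the direction of the effect. It is not a quantitative model of how large the illusion looks.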




Distance and Pattern

One part of pattern recognition involves judging the distance of the pattern. Perception of distance is a good example of the unconscious inference that occurs in vision, because so many different sources of information go into a distance estimate. One source, based on binocular vision, is particularly useful because it can provide an estimate of absolute range. If one fixates an object with both eyes, the angle at which the eyes converge tells the brain how far away the object is. This is called the "stereoscopic cue," and it depends on coordination of the two eyes.
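The convergence cue rests on simple triangulation. This minimal sketch assumes that both eyes fixate a point straight ahead and that the eyes are about 6.3 cm apart. That figure is a typical adult value used here as an assumption. The function names are hypothetical, and the code models only the geometry, not how the brain reads the position of the eyes.

```python
import math

EYE_SEPARATION_M = 0.063  # assumed typical adult value (about 6.3 cm)

def convergence_angle_deg(distance_m, separation_m=EYE_SEPARATION_M):
    """Angle between the two lines of sight when both eyes fixate a point
    straight ahead at the given distance (pure geometry)."""
    return math.degrees(2 * math.atan((separation_m / 2) / distance_m))

def distance_from_convergence(angle_deg, separation_m=EYE_SEPARATION_M):
    """Invert the triangle: recover absolute distance from the convergence angle."""
    return (separation_m / 2) / math.tan(math.radians(angle_deg) / 2)

for d in (0.5, 2.0, 10.0):
    angle = convergence_angle_deg(d)
    print(f"{d:4.1f} m -> {angle:.2f} deg -> {distance_from_convergence(angle):.2f} m")
```

The angle shrinks quickly with distance, from roughly 7 degrees at half a meter to well under half a degree at 10 meters. This is one way to see why this binocular information weakens beyond a few meters.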


Other distance cues are effective even with a single eye. One is superposition: if one object partly covers another, the covered object must be more distant. Unlike the stereoscopic cue, this cue gives only relative distance, not absolute distance. Superposition works at any distance, however, while stereoscopic vision is useful only for objects within a few meters of the head. Another cue works only at very long distances: very distant objects, such as mountains, look blue and hazy, while closer objects look sharp.
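Superposition yields only an ordering, so it can be sketched as a topological sort over "covers" relations. This minimal sketch uses Python's standard `graphlib` module (Python 3.9 or later). The scene, the object names, and the function name `depth_order` are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical scene: each pair (a, b) means "a partly covers b".
occlusions = [("cup", "book"), ("book", "table"), ("table", "wall"), ("lamp", "wall")]

def depth_order(pairs):
    """Return objects ordered nearest-first, using only occlusion relations.

    The result is an ordering, not a set of distances: it says the cup is
    nearer than the book, but not by how many meters.
    """
    sorter = TopologicalSorter()
    for front, behind in pairs:
        sorter.add(behind, front)  # 'front' must come before 'behind'
    return list(sorter.static_order())

print(depth_order(occlusions))
```

The relations here do not fix a unique order, because the lamp could sit anywhere in front of the wall. In that case any order consistent with the relations is returned. A contradictory set of relations raises `graphlib.CycleError`.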


Still another source of distance information is motion. When one moves one’s head, nearby objects will sweep by faster than more distant ones. If the brain knows the distance to any one point of this sweeping texture (such as the distance to the pavement beneath one’s feet), it can calculate the distances to all other objects. These calculations would be difficult to do consciously even if one knew the mathematics required, but the brain performs the operations almost instantly.
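The calculation described above can be sketched under a deliberately simple assumption. The head moves sideways at a steady speed, and each object lies roughly at right angles to that motion, so its angular speed equals head speed divided by distance. The function name and the angular speeds below are hypothetical. The point is that one known distance fixes the scale for all the others.

```python
def distances_from_parallax(angular_speeds, ref_name, ref_distance_m):
    """Scale motion parallax into distances using one known distance.

    Simplifying assumption: angular speed = head speed / distance for every
    object. The unknown head speed cancels out once one distance is known.
    """
    head_speed = angular_speeds[ref_name] * ref_distance_m  # v = omega * d for the reference
    return {name: head_speed / omega for name, omega in angular_speeds.items()}

# Hypothetical angular speeds (radians per second) measured while the head moves.
speeds = {"pavement": 0.5, "tree": 0.1, "house": 0.02}
print(distances_from_parallax(speeds, "pavement", ref_distance_m=1.5))
```

With the pavement assumed to be 1.5 m away, the tree and the house come out at 7.5 m and 37.5 m. Objects that sweep by five and twenty-five times more slowly are five and twenty-five times farther away.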


Add to these distance cues other information such as familiar distance (one knows how far away the other side of one’s living room is without measuring it), and one has a large palette of sources that can indicate the distance of an object. These cues are put together and weighted according to their reliability, and the brain produces a composite distance estimate. A similar process occurs in recognizing objects. The visual image, sound and touch information, and one’s knowledge of the situation all combine to identify objects, even if they are seen from unfamiliar angles or cannot be seen clearly. It is the combination of many information sources, including memory, that makes the process quick and reliable.
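One common way to model "weighting according to reliability" is inverse-variance weighting, in which each cue counts in proportion to one over the variance of its estimate. This is an assumed model for illustration, not necessarily the scheme the brain uses. The cue values below are made up, and `combine_cues` is a hypothetical name.

```python
def combine_cues(estimates):
    """Reliability-weighted average of distance estimates.

    Assumption: each cue gives an independent, unbiased estimate with a known
    standard deviation, and reliability is taken to be 1 / variance.
    """
    weights = {cue: 1.0 / sd ** 2 for cue, (_, sd) in estimates.items()}
    total = sum(weights.values())
    combined = sum(weights[cue] * est for cue, (est, _) in estimates.items()) / total
    combined_sd = (1.0 / total) ** 0.5  # smaller than any single cue's standard deviation
    return combined, combined_sd

# Hypothetical estimates in meters: (estimate, standard deviation).
cues = {
    "convergence": (2.0, 0.2),
    "motion parallax": (2.4, 0.4),
    "familiar distance": (2.2, 0.3),
}
print(combine_cues(cues))
```

The most reliable cue (here convergence) receives the largest weight. The combined standard deviation is smaller than that of any single cue, which fits the idea that combining sources makes the estimate more dependable.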


An influential theory of pattern recognition by psychologist Irving Biederman holds that people analyze objects into components that he calls "geons." These can be simple shapes, such as cylinders and cones, or patterns of edges. Each object has a characteristic set of geons, often no more than two or three, that defines it.




Pattern and Cognition

Pattern recognition stands at the center of human activity—it is essential for all aspects of mental life. In appreciating art, one is recognizing patterns, but the process also extends to the most mundane activities, such as recognizing coins when making change. The traditional way to study mental processes is to make a task requiring those processes more difficult until a subject can no longer perform it. In finding where and how the processes fail, psychologists can learn about their structures.


One way psychologists break down pattern recognition is to degrade images until they can no longer be recognized. A classic experiment by Jerome Bruner presents a good example. Bruner presented out-of-focus slides to groups of students, asking them to identify the objects in the slides as soon as they could. At first no one could name the objects, but as the slides were gradually brought into focus and the visual information improved, the task became easier. The first guesses about the identities of the objects, however, were frequently wrong. Then Bruner played a trick on his subjects. Half of them saw the slides beginning in a very unfocused state, and many made wrong guesses about the objects presented. When most of these subjects had a guess but were still uncertain, the other half of the subjects were allowed to see the same slides for the first time. As the pictures came into focus, the second group was able to identify the objects sooner than the first group, who had seen the unfocused images longer. The reason was that the first group had formed many incorrect assumptions about the pictures, and these assumptions hindered their ability to change their minds as new information became available. The experiment shows how strongly the information that one brings to a perceptual situation shapes what one perceives.


Another set of experiments, by Biederman, showed the key role of context and situation in identifying objects. A common object such as a sofa could be easily identified in a familiar setting such as a living room, but the same object in an unexpected setting, such as in a street scene, was very difficult to find. Subjects took several seconds to find the sofa in the street, even though they found it immediately in the living room. The sofa was identical in the two pictures—only the context was changed.


These examples show a contrast between two different types of information used to identify patterns. The first begins with signals from the senses and is called “bottom-up” information. It originates at the eyes, ears, and skin and is processed along the way before arriving at the sensory areas of the cerebral cortex, such as the visual cortex. There the bottom-up information meets attention, motivation, and memory, the “top-down” sources of information. It is the meeting of top-down and bottom-up information that defines perception.


Pattern recognition also matters in clinical medicine, where some patients have brain damage that interferes with their pattern-recognition abilities. The damage is usually caused by strokes (interruptions of the blood supply to part of the brain), surgery, or accidents. Damage in different parts of the brain interferes with different aspects of the pattern-recognition process. An injury that interrupts the nerve fibers linking the eye and the brain, for example, would obviously interrupt pattern recognition by causing blindness. More interesting cases leave visual thresholds intact while disturbing recognition. Patients with this kind of damage have no difficulty knowing that an object is present, or avoiding it if it comes toward them, but they do not know what it is.


One such case, described by Oliver Sacks in his book The Man Who Mistook His Wife for a Hat and Other Clinical Tales (1987), concerns a professor of music who remained productive and valued as a teacher even though he had difficulty recognizing things by vision alone. He would fail to recognize his students, for example, until he heard their voices. Then everything would snap into place and the lessons could begin. The professor suffered from visual agnosia, the inability to recognize objects by sight. Milder forms of agnosia involve symptoms such as difficulty telling one person from another by looking at their faces, or an inability to distinguish particular items within a category, such as different kinds of flowers. These patients cannot be cured, but knowledge of the many sources of information in pattern recognition can help them cope with their impairments. They can be taught to take more advantage of other information sources, such as sounds and context.




Pattern and Perception

Interest in pattern recognition is as old as the ancient Greeks, but little progress was made in explaining its mechanisms until well into the nineteenth century. During that century, mostly in Germany, methods were invented to investigate perception. Visual illusions, such as the railroad-track illusion described above, revealed some of the shortcuts that the visual system uses to interpret scenes. These illusions exemplify the technique of stressing perception until it breaks down and then learning about the process from the way the system fails. The railroad-track illusion shows that vision uses perspective, among other things, to judge distance. Even when perspective cannot work, as in viewing a photograph or a drawing on a flat sheet, the system tries to use it anyway.


Perception is more than bringing a pattern into the brain. A pattern without a meaning is of no practical use. Perception, then, is the attachment of meaning to a pattern, a link between top-down and bottom-up information. At its core, it is a matching process. This process has been described as template matching, like putting a stencil over a pattern. If the stencil (the template or concept in the head) matches the pattern (the signal from the eyes), then the world contains what was in the stencil, and the pattern is recognized.


One real-world example of the importance of linking a pattern to its meaning comes from neuroscience research on autism spectrum disorder (ASD). Studies conducted by neuroscientist Marcel Just and others suggest that while people with ASD can recognize patterns such as human faces, their brains have difficulty coordinating activity among the brain regions that would give those patterns meaning, such as their emotional content or familiarity.


There are many theories of perceptual recognition, but they all boil down to some form of template matching. Some machines, such as dollar-changing machines, also recognize patterns in this way. They look for exactly the required pattern, and no variation is allowed. If the dollar bill is dirty or upside down, it is rejected. The system works well at rejecting counterfeit bills, but it is not flexible enough to perform the kind of recognition across many contexts that humans do. Simple templates do not work very well when the pattern varies, as in identifying an “e” drawn one way and an “e” drawn another way (for example, in two different typefaces) as the same letter. A template that matched the first version would not recognize the second and would interpret the two patterns as having different meanings.
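A stencil-style matcher can be sketched with tiny bitmaps. This minimal sketch assumes that letters are 5 x 5 grids of ink ('#') and background ('.'). Both bitmaps and the function names are hypothetical. The second "e" stands in for the same letter drawn slightly differently.

```python
# Hypothetical 5 x 5 bitmaps: '#' is ink, '.' is background.
TEMPLATE_E = (
    ".###.",
    "#...#",
    "#####",
    "#....",
    ".###.",
)

# The same letter drawn slightly differently (a hypothetical second typeface).
OTHER_E = (
    ".###.",
    "#...#",
    "####.",
    "#....",
    ".####",
)

def exact_match(pattern, template):
    """Stencil-style matching: accept only if every pixel agrees."""
    return pattern == template

def pixel_agreement(pattern, template):
    """Fraction of pixels that agree between pattern and template."""
    pairs = [(p, t) for prow, trow in zip(pattern, template) for p, t in zip(prow, trow)]
    return sum(p == t for p, t in pairs) / len(pairs)

print(exact_match(TEMPLATE_E, TEMPLATE_E))   # True
print(exact_match(OTHER_E, TEMPLATE_E))      # False
print(pixel_agreement(OTHER_E, TEMPLATE_E))  # 0.92

# Rotating the template by 180 degrees (turning it "upside down") also breaks the match.
rotated = tuple(row[::-1] for row in reversed(TEMPLATE_E))
print(exact_match(rotated, TEMPLATE_E))      # False
```

The second bitmap agrees with the template on 23 of 25 pixels, yet the exact matcher rejects it, much as the change machine rejects a dirty bill. Loosening the threshold helps only a little, because a rotated or restyled letter can disagree on many pixels.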


One solution to the matching problem is to match not the geometric pattern itself but some transformation of it. The pattern in the head would then have undergone the same transformation. The letter “e,” for example, could be recognized by its features, as a “closed pattern above a curved tail running from upper left to lower right.” This pair of features would identify both versions of the letter without falsely matching other letters. Some modern computer-based pattern-recognition machines use features in this way. Future pattern-recognition research will be directed toward identifying the features that nature uses to recognize patterns and toward designing machines that use effective features. The human use of context and probability in recognition will also be built into more powerful recognition systems.
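Feature-based recognition can be sketched by computing a few properties of a bitmap instead of comparing pixels. This minimal sketch assumes 5 x 5 bitmaps. It uses simplified, hypothetical stand-ins for the features described above:

- a closed loop (background enclosed by ink) confined to the upper part;
- ink along the bottom for the tail.

The letter shapes and function names are illustrative only.

```python
from collections import deque

def bitmap(rows):
    """Build a bitmap from space-separated rows; '#' is ink, '.' is background."""
    return tuple(rows.split())

# Hypothetical letter shapes.
E_ONE = bitmap(".###. #...# ##### #.... .###.")
E_TWO = bitmap(".###. #...# ####. #.... .####")  # same letter, slightly different strokes
O = bitmap(".###. #...# #...# #...# .###.")
C = bitmap(".#### #.... #.... #.... .####")

def enclosed_cells(bm):
    """Background cells not reachable from the border: the inside of closed loops."""
    rows, cols = len(bm), len(bm[0])
    border = [(r, c) for r in range(rows) for c in range(cols)
              if (r in (0, rows - 1) or c in (0, cols - 1)) and bm[r][c] == "."]
    seen = set(border)
    queue = deque(border)
    while queue:  # flood-fill the background inward from the border
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and bm[nr][nc] == "." and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return {(r, c) for r in range(rows) for c in range(cols)
            if bm[r][c] == "." and (r, c) not in seen}

def features(bm):
    """Hypothetical features: where the closed loop lies, and whether a tail is present."""
    holes = enclosed_cells(bm)
    middle = len(bm) // 2
    return {
        "loop_in_upper_part": any(r < middle for r, _ in holes),
        "loop_in_lower_part": any(r > middle for r, _ in holes),
        "ink_along_bottom": "#" in bm[-1],
    }

def looks_like_e(bm):
    """Accept a closed loop on top, no loop below, and a tail along the bottom."""
    f = features(bm)
    return f["loop_in_upper_part"] and not f["loop_in_lower_part"] and f["ink_along_bottom"]

for name, bm in [("e (first)", E_ONE), ("e (second)", E_TWO), ("o", O), ("c", C)]:
    print(name, looks_like_e(bm))
```

Both versions of the "e" pass, even though they differ pixel by pixel. The "o" is rejected because its loop extends into the lower half, and the "c" is rejected because it has no loop.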




Bibliography


Biederman, Irving. “Perceiving Real-World Scenes.” Science 7 July 1972: 77–80. Print.



Bridgeman, Bruce. The Biology of Behavior and Mind. New York: Wiley, 1988. Print.



Gregory, R. L. Eye and Brain: The Psychology of Seeing. 5th ed. Princeton: Princeton UP, 1998. Print.



Hamilton, Jon. "What's Different about the Brains of People with Autism?" Morning Edition. NPR, 4 June 2012. Web. 2 July 2014.



Root-Bernstein, Michele, and Robert Root-Bernstein. "What's the Pattern?" Psychology Today. Sussex, 31 Mar. 2011. Web. 2 July 2014.



Sacks, Oliver. The Man Who Mistook His Wife for a Hat and Other Clinical Tales. London: Picador, 2011. Print.



Snowden, Robert, Peter Thompson, and Tom Troscianko. Basic Vision: An Introduction to Visual Perception. Rev. ed. New York: Oxford UP, 2012. Print.



Solso, Robert L., Otto H. Maclin, and M. Kimberly Maclin. Cognitive Psychology. 8th ed. Boston: Pearson, 2008. Print.



Weisberg, Robert W., and Lauretta Reeves. Cognition: From Memory to Creativity. Hoboken: Wiley, 2013. Print.
