WO2001009821A1 - Automatic image characterization by class correlation processing - Google Patents

Automatic image characterization by class correlation processing

Info

Publication number
WO2001009821A1
WO2001009821A1 (PCT/US2000/020110)
Authority
WO
WIPO (PCT)
Prior art keywords
image
images
composite
class
subject
Prior art date
Application number
PCT/US2000/020110
Other languages
French (fr)
Inventor
David T. Carrott
Original Assignee
Litton Systems, Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Litton Systems, Inc. filed Critical Litton Systems, Inc.
Publication of WO2001009821A1 publication Critical patent/WO2001009821A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/88 Image or video recognition using optical means, e.g. reference filters, holographic masks, frequency domain filters or spatial domain filters
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification


Abstract

The invention provides a method and apparatus for evaluating class membership of a subject (22), by comparing an image derived from the subject with library images derived from known members of the class. Subject images are sequentially compared, preferably by optical correlation, to one or more composite images or 'masks' which are each derived from a plurality of library images. Each library image is from a distinct member belonging to a defined class (for example, an ethnic group). A high degree of correlation indicates a high probability of the subject's membership in the class. A learning algorithm similar to a neural network training procedure is used to develop appropriate composite images representative of the class or classes to be recognized.

Description

AUTOMATIC IMAGE CHARACTERIZATION BY CLASS CORRELATION PROCESSING
BACKGROUND OF THE INVENTION
Field of the Invention
This invention relates to automatic pattern recognition by image correlation, and more specifically to the automatic classification of objects or persons, for example the automatic classification of persons into racial or ethnic groups based on imaged facial characteristics.
Description of the Related Art
The U.S. census identifies four major race groups: White; Black; American Indian, Eskimo and Aleut; and Asian and Pacific Islander. These categories are general and to some extent artificially defined. Much more specific categorization is possible. For example, the 1990 U.S. Census provides a statistical portrait of the nation's populace through a breakdown of the top ten ethnic categories: German (23.3%), Irish (23.3%), English (13.1%), African (9.6%), Italian (5.9%), Asian (5.0%), Mexican (4.7%), French (4.1%), Polish (3.8%) and American Indian (Native American, 3.5%).
For various purposes it is often desirable to recognize and classify individual persons according to physiological features or ethnic background. For example, such information might be useful in market research, political polling, medical screening, or to provide an index for database organization.
Optical correlation techniques have been used to recognize and identify individuals based upon facial characteristics. For example, U.S. Patent No. 5,699,449 to Javidi discloses a system which uses a nonlinear joint transform correlator employing a supervised perceptron learning algorithm in a two-layer neural network for face recognition. Javidi's system identifies a subject face by reference to clusters of weighted facial reference images, with a low probability of error. The facial reference images are all previously derived from the subject face. Such a system is useful for identifying an individual or verifying an identity, but it does not provide a way to directly classify an individual as a member of a class of individuals.
SUMMARY OF THE INVENTION
In view of the above problems, the present invention provides a method and apparatus for classifying a person as a member of an ethnic group based upon imaged facial characteristics. Although classification of facial features is a primary contemplated application, it is not the exclusive field of use. More generally, the invention provides a method and apparatus for evaluating class membership of a subject, by comparing an image derived from the subject with library images derived from known members of the class. For example, the invention can be used to recognize class membership of objects which fall generally into classes with common structural or textural characteristics, or with similar visual "rhythms".
The invention compares subject images to one or more composite images or "masks" which are each derived from a plurality of library images. Each library image is taken from a distinct member of a defined class (for example, an ethnic group). An image of the subject is correlated, preferably by an optical correlator, with the composite class image. A high degree of correlation indicates a high probability of the subject's membership in the class.
In one embodiment a learning algorithm is used to develop appropriate composite images representative of the class or classes to be recognized. Independently obtained indicia of class membership are used in a training procedure whereby images are added to the composite image only if an index of class membership exceeds a predetermined level.
In another embodiment, a learning process to recognize characteristics of class members involves provisionally adding an image to the composite image, then testing the composite sequentially against a library of verified class members' images. If the provisional addition results in an improved overall correlation with the verified library of images, the provisional addition is retained as part of the composite; otherwise, it is discarded.
Preferably, the invention correlates a subject image by reference to composites representing multiple classes, quantifies the resulting correlations as data fields, and stores the data fields ("metadata") for future reference. The resulting data fields can aid image selection or retrieval, or can serve as organizational indicia in a database.
These and other features and advantages of the invention will be apparent to those skilled in the art from the following detailed description of preferred embodiments, taken together with the accompanying drawings, in which:
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram summarizing the system of the invention;
FIG. 2 is a flow chart of the procedure used in the invention to classify images;
FIGs. 3a, 3b, 3c and 3d are representative views of library images from a single class which are used to form a composite image associated with the class;
FIG. 4 is a flow chart of a training procedure which is used in one embodiment of the invention to learn recognition of class members' images;
FIG. 5 is a symbolic diagram illustrating the manner of combining images in the training procedure of FIG. 4;
FIG. 6 is a flow diagram showing an alternate training procedure which is used in another embodiment of the invention; and
FIG. 7 is a schematic diagram showing an optical signal flow in the invention, and also an optional image formatting scheme.
DETAILED DESCRIPTION OF THE INVENTION
The system of the invention is shown in summary form in FIG. 1. An imaging device 20, which suitably may be an electronic camera, video camera, CCD imager, or other imaging device, views a subject 22 and produces an electronic signal representative of the subject 22. The subject is frequently a human face, but the invention is not limited to such subjects: any object which can be imaged could provide a subject. The signal from imaging device 20 is processed by pre-processor 24. Pre-processing typically includes digitizing the signal, formatting it suitably for optical correlation, and non-linear image manipulations such as thresholding or cropping. Pre-processing can be performed by a microprocessor with readily available image processing software; a 32-bit or better microprocessor, or a dedicated signal processing chip designed for image pre-processing, would be preferable.
The processed image signal from pre-processor 24 is output to input drive electronics 26 which drive an input SLM (spatial light modulator) of an optical correlator 28 (discussed in greater detail below). The optical correlator 28 correlates the input image with a "filter" provided at filter input 30 via filter drive electronics 32.
The filter presented for comparison is selected and/or modified by a post-processor 34 from image and data storage 36, according to the method of the invention as described below (in connection with FIGs. 2 and 4-6). Briefly, the filter is a composite image or "mask" formed according to a learning algorithm from multiple stored images (together comprising a characteristic "library") and/or input images. The filter thus represents a generalized class of images with certain characteristics in common. This filter is presented to the correlator 28 either in a Fourier transformed representation (if a Vanderlugt type correlator is used) or in a spatial representation (if a joint transform correlator is used). The correlator 28 produces a detector output at 38 which is a correlation image, expressing the cross-correlation between the input and filter images. This output 38 is processed by post-processor 34 to find and quantify correlation peaks, which indicate a high degree of correlation between the input and filter images. The correlation peaks are interpreted by the post-processor 34 and the resulting data is output via an output channel 40 and/or stored in image and data storage 36.
Post-processor 34 can suitably be a general purpose microprocessor. Any processor having sufficient speed could be used. The same processor could serve as both the pre-processor 24 and the post-processor 34, but performance is enhanced by using dual processors. Sufficient random access memory should be provided to store at least several images simultaneously. Images should preferably be of at least 128 x 128 (rows x columns) resolution, with higher resolution most preferred.
The system operates in two modes: learning mode and operating mode (although optionally learning can occur simultaneously with operating mode). When operating in learning mode, the invention receives manual information input via an input channel 42, which may suitably be a keyboard, bar code scanner, or any convenient data entry device. Although in actual use learning mode should generally precede operating mode, the latter mode will be discussed first because this sequence will facilitate explanation. The learning mode is discussed in detail below in connection with FIGs. 4-6.
In operating mode the system executes a method of classifying objects by image analysis as shown in FIG. 2. The steps are preferably programmed in software and controlled by post-processor 34. First, an image of a real object is input at step 50 (accomplished by imager 20 in FIG. 1). The post-processor 34 next selects (step 52) a filter or "mask" from image storage. The filter selected is specifically associated with a class of objects (members of the class). This filter has been previously produced, preferably by the method of FIG. 5 below, and stored for reference. Next, the post-processor 34 correlates the input image with the filter, preferably by writing the images respectively to the image and filter inputs of an optical correlator.
Image correlation by optical correlators is known. See, for example, U.S. Patent No. 5,311,359 to Lucas et al., and U.S. Patent No. 5,148,496 to Anderson. Both of these patents disclose compact optical correlators capable of performing cross-correlation of digitized, pixellated images. The principles of operation of these devices are discussed in those patents. Their operation may be summarized as follows: A digital electronic input image is written to an electronically addressable input spatial light modulator (SLM), pixel by pixel. That SLM, internal to the optical correlator, modulates a coherent light beam. The modulated beam is optically Fourier transformed and the transform focused on a second or "filter" SLM. The filter SLM is electronically modulated, pixel by pixel, with another image which is the (usually digitally obtained) Fourier transform of a comparison image. The twice-modulated beam is then inversely Fourier transformed (optically) and the resulting image is read by a photodetector array, to provide an output image which is the cross-correlation of the input image and the comparison image.
To use an optical correlator in the present invention, an input image is written, under control of pre-processor 24, to the image input of optical correlator 28 (shown in FIG. 1). The digitally obtained two-dimensional Fourier transform of the other image is similarly written to the filter input of the optical correlator 28. The correlator output 38 is then read by the post-processor 34. The cross-correlation operation could be performed by alternative methods, including digital computation in either a frequency domain or a spatial representation, using the post-processor 34 (or another signal processing circuit), by well known methods. Such alternatives are also within the intended scope of the invention. In such an alternative realization, the correlator block 28 in FIG. 1 would symbolize the processor or circuit which is used to produce a correlation image. However, use of an optical correlator will typically result in much faster correlation, by several orders of magnitude.
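For readers who prefer the digital alternative just mentioned, the following is a minimal NumPy sketch of frequency-domain cross-correlation; the function name and use of the magnitude of the result are illustrative assumptions, not a prescription from the patent.

```python
import numpy as np

def cross_correlate(subject: np.ndarray, composite: np.ndarray) -> np.ndarray:
    """Frequency-domain cross-correlation of two equal-size grayscale images.

    By the correlation theorem, multiplying the subject's spectrum by the
    conjugate of the filter's spectrum and inverse transforming yields the
    (circular) cross-correlation -- the digital analogue of the optical path.
    """
    spectrum = np.fft.fft2(subject) * np.conj(np.fft.fft2(composite))
    return np.abs(np.fft.ifft2(spectrum))
```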
Returning to FIG. 2, after the correlation step 54 the correlation image is analyzed (step 56) to detect correlation peaks. The position of the correlation peak signal will vary within the frame, but the presence of a clear localized peak indicates strong correlation between the input image and the filter (the composite image). The peak level is then quantified by the processor as a correlation value which is saved (step 58) as "metadata" in association with the subject, and preferably also output for the user. For example, if the subject correlates strongly with a given class, say class "A", then the correlation of the subject image with composite image or mask "A" has a prominent peak, and the peak intensity is saved in a database field indicating the subject's membership in class "A". If the application demands that a subject be compared with multiple classes, the method tests whether all classes have been tested (step 60) and if not returns via branch 62 to repeat steps 52-60. The resulting multiple correlation values are preferably all saved as metadata, and the combination of multiple values can be used to characterize a subject with specificity.
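The peak quantification of steps 56-58 might be sketched as below, continuing the NumPy example above. The energy normalization is an assumed choice that keeps values roughly in [0, 1]; the patent does not fix a formula for the saved correlation value.

```python
def correlation_value(subject: np.ndarray, mask: np.ndarray) -> float:
    """Peak of the normalized cross-correlation (steps 56-58 of FIG. 2).

    Normalizing by the images' energies (Cauchy-Schwarz bound) keeps the
    value roughly in [0, 1], so percentage thresholds are meaningful.
    """
    corr = cross_correlate(subject, mask)          # step 54: correlation image
    norm = np.linalg.norm(subject) * np.linalg.norm(mask)
    return float(corr.max() / (norm + 1e-12))      # peak height, wherever it falls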
The above described method can recognize a subject's membership in a group only to the extent that a useful composite image (mask) is provided which records and generalizes the class-defining characteristics in image form. In one embodiment, the invention relies upon a predefined database of composite images. Each composite image is associated with a specific class of objects, and is produced by a pixel-by-pixel summation of images derived from different members of that specific class.
For example, FIGs. 3a through 3d show images of four human faces. Assume that these faces are selected based upon their independently verified membership in an identified ethnic group (group B). Although the faces are diverse, members of a common ethnic group will tend to manifest some (often subtle) facial structural similarities. A composite image is formed, pursuant to the invention, by taking a sum of the images: that is to say, the invention scans each image according to a pixellated matrix, then assigns numerical values to each image pixel quantifying the image density (lightness or darkness) at that pixel location. The invention then sums pixel values at each location into a corresponding pixel on a composite image. Although only four images are shown in FIGs. 3a-3d, in practice a much larger number of images (ten or more) should preferably be accumulated in a weighted average, in order to capture a wide range of in-class variation in features among members of a class.
In a simple embodiment of the invention, as described above, a simple summation of images is performed, which assigns equal weight to each member image. This is appropriate to a circumstance in which any of the member images represents an equally probable instance of the class. If, on the other hand, it is known that certain member images in a library have a higher probability of occurrence in a population, the library images should preferably be multiplied by weighting factors before summation, the weighting factors corresponding to the statistical probability of the associated image, so as to assign proportionally greater weight to statistically more probable library images.
In another embodiment of the invention, instead of relying on pre-calculated composite images to characterize classes, or to supplement the initial class composites by learning experience, the invention adaptively "learns" to recognize members of one or more classes according to an algorithm which is essentially an optical realization of a supervised perceptron learning algorithm (as is commonly used with electronic neural networks).
FIG. 4 shows the procedure of a suitable learning algorithm for training the system. A subject image is first input into the system (step 80) and correlated with an initial library of stored composite images (step 82). The initial library is preferably created by summing weighted averages, pixel-by-pixel, of verified class members to form class composites, as described above. If the subject image correlates with any of the existing composites with a correlation peak value of more than some pre-determined threshold, the subject image is discarded and the system loops back via branch 84 to input another image and repeat. (A correlation value of 65% would be a suitable threshold in a typical application.) If the subject image fails to correlate above the threshold with any existing composite, a decision is made (step 86) and an operator is flagged to input class membership data associated with the subject (step 88). Typically, this might take the form of questioning the subject as to ethnic background, or reading information from a birth certificate or passport if available. In other applications where learning is from a pre-compiled image library, class membership data may already be associated with each record, and step 88 could be fully automated by simply retrieving a data field from a database.
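A condensed sketch of one pass of this FIG. 4 loop follows, reusing the hypothetical helpers from the earlier sketches (cross_correlate, correlation_value). The operator callback and the equal-weight addition are assumptions consistent with the text, not the patent's literal mechanism.

```python
THRESHOLD = 0.65   # the 65% figure suggested above; application-dependent

def learning_step(subject, composites, get_class_label):
    """One pass of the FIG. 4 training procedure (steps 80-96), sketched."""
    # Step 82: correlate the subject against every stored class composite.
    scores = {name: correlation_value(subject, mask)
              for name, mask in composites.items()}
    if scores and max(scores.values()) > THRESHOLD:
        return                                  # branch 84: recognized; discard and loop
    label = get_class_label()                   # step 88: operator query or database field
    if label is None:
        return                                  # steps 94: operator disregards the image
    if label in composites:                     # steps 90-92: fold into existing composite
        composites[label] = composites[label] + subject.astype(float)  # weight-1 addition
    else:                                       # step 96: new class seeded by this image
        composites[label] = subject.astype(float).copy()
```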
If class membership data associated with the subject is available, the procedure makes another decision (step 90): if the class identified with the subject is already associated with some composite image in the reference library, the method adds the image, optionally multiplied by a weight, to the associated composite (step 92). This results in an improved recognition mask for the class. Typically the new image should be weighted equally with each of the individual basis images originally used to generate the composite, but in some cases membership may be uncertain or merely partial. In such a case the weight may advantageously be adjusted downward to reflect the subject's partial or uncertain membership.
If the class independently input in step 88 is not already in the class library of composites, the method preferably interrogates an operator (step 94) for instructions: whether to create a new class composite or disregard the subject image. If the operator inputs instructions to create a new class (step 96), the subject image is used as the new composite for that class, with weight 1. As additional images are encountered from the same class, they will be averaged into the composite, broadening the scope of the composite image. For example, in an application identifying ethnicity of subjects, an Inuit Native American might be encountered by the system for the first time. The operator could then, after verifying the subject as Inuit (step 88), instruct the system in step 96 to create a new category for "Inuit". The subject's image would then form the initial composite image, with additional images averaged in as more Inuit subjects are imaged by the system.
This training mode is further illustrated in FIG. 5, which symbolically shows a method of combining weighted images into composites ("morphing" the images). As used herein, "morphing" means combining two or more images into a composite by (1) optionally multiplying each component image, pixel by pixel, by a weighting factor associated with that component image and held constant across the image, and (2) adding the images, pixel by pixel, to form a composite image. The pixel value to be multiplied represents an intensity or density (lightness or darkness), while the weighting factor represents a statistical weight. In a simple application, the weighting factors could all be equal, yielding a composite which is a simple average of the component images.
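A minimal sketch of this weighted pixel-by-pixel combination, continuing the NumPy examples above; dividing by the total weight is an assumption made here so the composite stays on the same intensity scale as its members (the patent describes a plain sum).

```python
def make_composite(library, weights=None) -> np.ndarray:
    """Weighted pixel-by-pixel sum ("morph") of verified class members' images."""
    if weights is None:
        weights = [1.0] * len(library)        # equal weights: the simple-average case
    acc = np.zeros_like(library[0], dtype=float)
    for w, img in zip(weights, library):
        acc += w * img.astype(float)          # statistically more probable images weigh more
    return acc / sum(weights)                 # assumed normalization, not in the patent
```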
In an embodiment which uses a variant of a Vanderlugt optical correlator (for example, as described in the Lucas patent, U.S. Patent No. 5,311,359), the composite image used for class identification should be two-dimensionally Fourier transformed before being written to the filter SLM for correlation. It may be desirable in some applications to save the Fourier transformed composite image rather than (or in addition to) the spatial representation. Learning can proceed according to the same method described above, by adding a weighted image to the composite image, performing the summation in the Fourier transform representation (first transform the images, then add them). This is made possible by the linear property of the Fourier transform, as is well known. The resulting modified composite image or images can then also be stored in Fourier transformed form.
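Concretely, the linearity argument reduces to the following two lines; composite, new_image and weight are assumed to be an equal-size image pair and a scalar from the sketches above.

```python
# F(a*x + b*y) = a*F(x) + b*F(y): learning never needs an inverse transform.
composite_ft = np.fft.fft2(composite)              # composite kept in transformed form
composite_ft += weight * np.fft.fft2(new_image)    # weighted addition in the frequency domain
```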
In FIG. 5, the formation of initial composites is illustrated generally at 102. Individual portrait-style images W1 through Wn, AP1 through APn, etc., correspond respectively to images of 1-n different white subjects, 1-n Asian Pacific subjects, etc. Images in the same grouping are added (pixel-by-pixel): corresponding pixels in each of W1 through Wn are added or "morphed" to create the corresponding pixel in the composite image "White", AP1 through APn are added to create "Asian Pacific", and so on. Other categories or classes are possible, including but not limited to those illustrated: male, female, and various hair types.
Learning mode is represented generally at 104. A new subject image 106, independently verified as pertaining to an Asian Pacific individual, is added to the existing Asian Pacific composite 108 to form a new Asian Pacific composite 110. Typically the new image would be weighted equally with each of the original images (AP1 through APn in this example).
Another training procedure which can be used with the invention is illustrated in FIG. 6. In this procedure, first a working composite image or "mask" is tested against a preferably large library of known class member images (stored in storage 36 of FIG. 1), and the resulting correlation values are accumulated (step 112). The invention then views various new subject images (which could occur during the course of routine subject classification procedures). During this operation, occasionally a subject image may be encountered which yields an exceptionally high correlation value with a particular composite (higher than some predetermined threshold T, which suitably can be the average correlation value of the class member images used in the existing composite). This image is then selected (step 113) for provisional addition to the composite. The new image is provisionally added to the initial composite image (step 114), and the modified composite is sequentially correlated with each image from the same library of class members' images (step 116). If the provisional addition results in improved average correlation between the new composite and the known library of class members' images, the provisional addition is retained (branch node 117); otherwise, it is discarded, a new subject image is selected (step 118) and the method repeats, returning via branch 119 to step 113. This method can be used to create more accurate composites by drawing upon a large image database for training, even while operating to classify subjects. (A code sketch of this acceptance test appears after the FIG. 7 discussion below.)
FIG. 7 shows an example of an optical signal process flow in one particular embodiment of the invention which is capable of correlating a subject image simultaneously with four different composite filters. The input SLM 120 is a 128 x 128 pixel matrix, electronically modulated with a 128 x 128 pixellated input image (subject image). The input SLM 120 modulates a coherent beam (as described in the Lucas et al. patent, for example) which is optically Fourier transformed, then further modulated by a filter SLM 122. The filter SLM 122 is advantageously a 256 x 256 pixellated matrix as shown, divided into four quadrants 122a, 122b, 122c, and 122d. Each quadrant is modulated with a distinct and different (Fourier transformed) composite image, each composite image derived from a different class. The twice modulated beam is then Fourier transformed again (resulting in a spatial domain representation once more) before focusing on the detector 124, which most suitably is another 256 x 256 matrix. Four independent correlation quadrants, 124a, 124b, 124c, and 124d, can be identified on the detector, corresponding respectively to the subject image's correlation with the filters 122a, 122b, 122c, and 122d. Optical correlation values OC1, OC2, OC3, and OC4 are then analyzed by a data processor.
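The FIG. 6 acceptance test referenced above (steps 112-117) might be sketched as follows, again reusing the earlier hypothetical helpers:

```python
def provisional_addition(composite, candidate, library):
    """FIG. 6 procedure: keep the candidate image only if it helps (steps 112-117)."""
    def avg_corr(mask):
        # Average normalized correlation of every verified library image with the mask.
        return float(np.mean([correlation_value(img, mask) for img in library]))
    baseline = avg_corr(composite)                  # step 112: score the current composite
    trial = composite + candidate.astype(float)     # step 114: provisional addition
    return trial if avg_corr(trial) > baseline else composite   # branch 117: keep or discard
```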
The correlation results can be summarized as "metadata" and stored in association with the subject's identification and image. The metadata is typically stored as one or more data fields, which succinctly summarize the subject's class memberships and facial characteristics. For example, data fields might be defined which record, for each subject, such data as ethnicity, presence or absence of facial hair, hair texture, sex, etc., which characteristics were extracted from an image by the invention. The multiple metadata fields measured from a subject are preferably stored as relative degrees of correlation. For example, a subject might yield a correlation value of 0.6 with an Asian Pacific mask, a correlation of 0.1 with an African American mask, and a correlation of 0.8 with an American Indian mask. These values can be considered somewhat arbitrary metadata identifier tags, specific to an individual. With a large number of such metadata values, an individual could be very specifically characterized, providing a biometrically related numerical identifier. It is noteworthy that these correlation values can provide an identifier regardless of whether the subject actually belongs to any of the ethnic groups used for correlation. The (multiple) resemblances provide the biometric.
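In code, such a record could be as simple as a mapping from class mask to score; the field names here are hypothetical.

```python
# The full vector of scores -- not any single class decision -- is the identifier.
subject_metadata = {
    "asian_pacific":    0.6,
    "african_american": 0.1,
    "american_indian":  0.8,
}
```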
The greater the array size or resolution depth, the more detail can be analyzed for facial characteristic detection and classification. Conversely, smaller arrays are preferable in applications where speed of operation is a priority.
The images used by the invention should preferably be taken under conditions and orientations as nearly identical as possible: for example, all frontal full-face or all profile, and with identical magnification and lighting. Perfectly identical conditions are not necessary, however, to obtain useful correlations. It is an advantage of the invention that it is relatively insensitive to linear translation registration errors, because such errors merely shift the location of the peak correlation within a frame, without noticeably affecting its peak value.
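With the circular (FFT-based) correlation sketched earlier, this translation insensitivity can be checked directly: a cyclic shift of the subject moves the correlation peak without changing its height. The arrays subject and composite are assumed from the earlier sketches.

```python
shifted = np.roll(subject, shift=(5, 9), axis=(0, 1))   # translate the subject image
c_orig = cross_correlate(subject, composite)
c_shift = cross_correlate(shifted, composite)
assert np.isclose(c_orig.max(), c_shift.max())          # same peak value, new location
```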
The invention can readily be adapted for use on other than full-face images. For example, the same technique could be used to analyze physiological details such as nose shape, eye shape, hair texture, etc. Indeed, any set of images which is categorizable into classes having more or less common shapes, textures, or rhythms can be analyzed by the invention. Some examples are imaged terrain, vegetation, physiological tissues, vehicles, wood grain or material textures. It is not necessary that the common characteristics be easily recognized by a human eye; it is sufficient if a composite image can be learned by the invention which successfully discriminates at some useful level of accuracy (which will depend upon the application).
While several illustrative embodiments of the invention have been shown and described, numerous variations and alternate embodiments will occur to those skilled in the art. For example, the invention has been described in connection with an optical image correlator. While such a correlator will in general be preferred, the correlation operations described could be performed by other means, including digital computation by a computer or a dedicated signal processor. This and other such variations are also within the scope of the invention. Such variations and alternate embodiments are contemplated, and can be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims

1. A method of evaluating membership of a subject (22) in a class, comprising the steps of: forming a composite image from a plurality of library images, wherein each of said plurality of library images is derived from a distinct known class member; cross-correlating (54) a subject image with the composite image to obtain a correlation value; and evaluating (56) the class membership of the subject based upon the magnitude of the correlation value.
2. The method of claim 1, wherein said step of cross-correlating comprises the steps of: inputting (50) the subject image and the composite image to an optical correlator (28); computing (54) in said optical correlator a cross-correlation image which is a cross-correlation of the subject image with the composite image; and generating a correlation signal in response to a peak in the cross-correlation image.
3. The method of claim 1, wherein the step of forming a composite image comprises the steps of: associating multiple library images derived from individuals which are verified members of a common class; and summing the associated multiple library images to obtain a composite image.
4. The method of claim 3, wherein said library images are spatial representations of individual subjects, represented by pixellated image frames, and said summation comprises adding image intensities, pixel-by-pixel, to obtain a corresponding composite pixel of the composite image.
5. The method of claim 3, further comprising the step of: before summing the associated multiple library images, multiplying each of said associated library images by an assigned weighting factor.
6. The method of claim 3, wherein said library images are frequency domain filters, represented by pixellated image frames, and said composite image is formed in the frequency domain by adding corresponding frequency components from frequency representations of library images, pixel-by-pixel, to obtain a frequency component of the composite image.
7. The method of claim 6, comprising the further step of: before summing the associated multiple library images, multiplying each of said associated library images by an assigned weighting factor.
8. The method of claim 3, wherein at least one of said subject image and said composite image is optically Fourier transformed and the other of said images is digitally Fourier transformed; and wherein said optical correlation is performed in the frequency domain.
9. A system for determining a person's membership in a specific facial characteristics group, comprising:
an imager (20) for capturing an input image of the person (22);
an image store (36) for storing a composite reference mask, which represents a weighted sum of library images independently known to pertain to a specific facial characteristics group; and
a correlator (28) for correlating the input image with the composite reference mask to produce a correlation output signal indicating the degree of correlation with the composite reference mask formed from said specific facial characteristics group.
10. The system of claim 9, wherein said correlator is a digital computer.
11. The system of claim 9, wherein said correlator is an optical correlator.
12. The system of claim 11, wherein said optical correlator comprises:
an input spatial light modulator (120) arranged to modulate a coherent beam of electromagnetic radiation with the input information to produce a modulated beam;
a first optical transformer positioned to optically transform said modulated beam to a frequency domain representation of said input information;
a filter spatial light modulator (122a-d) arranged to modify said frequency domain representation according to a filter; and
a second optical transformer positioned to optically transform the modified frequency domain representation to produce an output image.
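The stages recited in claim 12 correspond to a classical 4f correlator layout, which can be simulated numerically; the sketch below assumes unit-amplitude coherent illumination and idealized optics, and stands in for the optical path rather than modeling any particular spatial light modulator. The inverse FFT is used for the second transform so that the simulated output is not spatially inverted.

    import numpy as np

    def simulate_4f_correlator(input_image, frequency_filter):
        beam = input_image.astype(np.complex128)  # input SLM (120) modulates the coherent beam
        spectrum = np.fft.fft2(beam)              # first optical transformer: to frequency domain
        filtered = spectrum * frequency_filter    # filter SLM (122a-d) modifies the spectrum
        output = np.fft.ifft2(filtered)           # second optical transformer: back to image plane
        return np.abs(output) ** 2                # detected intensity of the output image

For correlation, frequency_filter would typically be the complex conjugate of the composite spectrum, for example the conjugate of the filter returned by the claim 6 sketch above.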
13. The system of claim 9, further comprising:
a data processor (34), programmed to receive a group characteristic input identifying a group characteristic associated with an input image, and to modify said composite reference mask by adding a weighted input image based upon the group characteristic identifying input, according to a programmed learning algorithm, thereby creating a modified composite reference mask based upon said group characteristic input.
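One plausible reading of claim 13's learning step is an incremental update in which a newly labeled input image is added to the stored mask with a positive or negative weight according to the group characteristic input, in the manner of a perceptron-style rule. The signed-weight rule and the learning rate below are assumptions, since the specification leaves the learning algorithm open.

    import numpy as np

    def update_reference_mask(mask, input_image, in_group, rate=0.1):
        # Add a weighted input image to the composite reference mask; the
        # sign of the weight follows the group characteristic input.
        weight = rate if in_group else -rate
        return mask + weight * input_image.astype(np.float64)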

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US36437699A 1999-07-30 1999-07-30
US09/364,376 1999-07-30

Publications (1)

Publication Number Publication Date
WO2001009821A1 (en) 2001-02-08

Family

ID=23434253

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2000/020110 WO2001009821A1 (en) 1999-07-30 2000-07-21 Automatic image characterization by class correlation processing

Country Status (1)

Country Link
WO (1) WO2001009821A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5050220A (en) * 1990-07-24 1991-09-17 The United States Of America As Represented By The Secretary Of The Navy Optical fingerprint correlator
US5329596A (en) * 1991-09-11 1994-07-12 Hitachi, Ltd. Automatic clustering method
US5699449A (en) * 1994-11-14 1997-12-16 The University Of Connecticut Method and apparatus for implementation of neural networks for face recognition
US5761330A (en) * 1995-06-07 1998-06-02 Mytec Technologies, Inc. Hybrid optical-digital method and apparatus for fingerprint verification

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008148343A1 (en) * 2007-06-01 2008-12-11 Tsinghua University Detecting device of special shot object and learning device and method thereof
US8457391B2 (en) 2007-06-01 2013-06-04 Tsinghua University Detecting device for specific subjects and learning device and learning method thereof
US8331698B2 (en) 2010-04-07 2012-12-11 Seiko Epson Corporation Ethnicity classification using multiple features
JP2012226609A (en) * 2011-04-20 2012-11-15 Canon Inc Information processor, information processor control method and program

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): CA JP KR

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE

121 EP: The EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
122 EP: PCT application non-entry in European phase
NENP Non-entry into the national phase

Ref country code: JP