G - Physics – 06 – K
Patent
354/59
G06K 9/36 (2006.01) G06F 17/24 (2006.01) G06K 9/20 (2006.01)
CA 2000012
A computer-implemented method, operable with conventional OCR scanning equipment and software, extracts character data from printed forms. A blank master form is scanned and its digital image stored. Clusters of ON bits in the master form image are first recognized as parts of a line and then connected to form lines. All of the lines in the master form image are then identified by row, column start position, and column end position, thereby creating a master-form-description. The resulting image, which consists only of the lines in the master form, can then be displayed. Regions, or masks, are then created in the displayed image of master form lines, each mask corresponding to a field where data would be located in a filled-in form. Each data mask is spaced from nearby lines by a predetermined data margin, referred to as D. A filled-in form, or data form, is then scanned, and its lines are recognized and identified in the same manner to create a data-form-description. The data-form-description is compared with the master-form-description by computing the horizontal and vertical offsets and the skew of the two forms relative to one another. The data masks, whose positions with respect to the master form have already been determined, are then transposed into the data form image using the computed horizontal offset, vertical offset, and skew. In this manner, the data masks are correctly located on the data form, so that the actual data values in the data form reside within the corresponding data masks. Routines are then implemented for detecting extraneous data intruding into the data masks and for growing the masks, i.e., enlarging them to capture data that may extend beyond their perimeter. Thus, the data masks are adaptive in that they are grown if data does not lie entirely within their perimeter. During the mask growth routine, lines that are part of the background form are detected and removed by line removal algorithms.
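The line identification and registration steps described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not say how clusters of ON bits are connected or how offsets and skew are computed, so the sketch assumes a simple run-length scan for horizontal lines and a rigid transform estimated from two matching reference points. The names `Line`, `find_horizontal_lines`, `estimate_transform`, `transpose_point`, and the parameter `min_length` are hypothetical.

```python
import math
from typing import List, NamedTuple, Sequence, Tuple


class Line(NamedTuple):
    row: int        # image row containing the line
    col_start: int  # first ON column of the line
    col_end: int    # last ON column of the line (inclusive)


Point = Tuple[float, float]  # (x, y) = (column, row)


def find_horizontal_lines(image: Sequence[Sequence[int]],
                          min_length: int) -> List[Line]:
    """Assumed simplification: a line is any run of ON bits in one row
    that is at least min_length pixels long."""
    lines = []
    for r, row in enumerate(image):
        c, n = 0, len(row)
        while c < n:
            if not row[c]:
                c += 1
                continue
            start = c
            while c < n and row[c]:
                c += 1
            if c - start >= min_length:
                lines.append(Line(r, start, c - 1))
    return lines


def estimate_transform(m1: Point, m2: Point,
                       d1: Point, d2: Point) -> Tuple[float, float, float]:
    """Estimate (dx, dy, skew) mapping master points m1, m2 onto the
    matching data-form points d1, d2 (assumed two-point rigid fit)."""
    angle_master = math.atan2(m2[1] - m1[1], m2[0] - m1[0])
    angle_data = math.atan2(d2[1] - d1[1], d2[0] - d1[0])
    skew = angle_data - angle_master
    cos_s, sin_s = math.cos(skew), math.sin(skew)
    # Choose the offsets so that the rotated m1 lands exactly on d1.
    dx = d1[0] - (cos_s * m1[0] - sin_s * m1[1])
    dy = d1[1] - (sin_s * m1[0] + cos_s * m1[1])
    return dx, dy, skew


def transpose_point(p: Point, dx: float, dy: float, skew: float) -> Point:
    """Map a master-form point (e.g. a mask corner) into the data form."""
    cos_s, sin_s = math.cos(skew), math.sin(skew)
    return (cos_s * p[0] - sin_s * p[1] + dx,
            sin_s * p[0] + cos_s * p[1] + dy)
```

Under these assumptions, the reference points could be the start positions of two lines that appear in both form descriptions, and each corner of a data mask would be passed through `transpose_point` to locate the mask on the filled-in form.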
Following the removal of extraneous data from the masks, the growth of the masks to capture data, and any subsequent line removal, the remaining data from the masks is extracted and transferred to a new file. The new file then contains only data comprising characters of the data values in the desired regions, which can then be operated on by conventional OCR software to identify the specific character values.
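The final extraction step can be sketched in the same spirit. The example below assumes that the image is a list of rows of 0/1 pixels and that each mask, after any growth, is an axis-aligned rectangle given as `(top, left, bottom, right)` with inclusive bounds. The function name `extract_masked_data` and this mask representation are hypothetical; the abstract specifies neither.

```python
from typing import List, Sequence, Tuple

Mask = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def extract_masked_data(image: Sequence[Sequence[int]],
                        masks: Sequence[Mask]) -> List[List[int]]:
    """Copy only the pixels inside the masks into a blank image of the
    same size, leaving everything outside the masks OFF."""
    height = len(image)
    width = len(image[0]) if height else 0
    output = [[0] * width for _ in range(height)]
    for top, left, bottom, right in masks:
        # Clip each mask to the image so out-of-range masks are harmless.
        for r in range(max(top, 0), min(bottom, height - 1) + 1):
            for c in range(max(left, 0), min(right, width - 1) + 1):
                output[r][c] = image[r][c]
    return output
```

The returned image would then be written to the new file and passed to conventional OCR software, which sees only the character pixels of the selected fields.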
Casey Richard G.
Ferguson David R.
International Business Machines Corporation
Saunders Raymond H.