Document structure identifier
Patent CA 2486528
International Patent Classification: G06F 17/27 (2006.01), G06F 17/22 (2006.01), G06K 9/20 (2006.01)
A method of automated document structure identification based on visual cues is disclosed. The two-dimensional layout of the document is analyzed to discern visual cues related to its structure, and the text of the document is tokenized so that similarly structured elements are treated similarly. The method can be applied to generating Extensible Markup Language (XML) files, to natural language parsing, and to search engine ranking mechanisms.
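The abstract does not specify which visual cues are used or how the tokenization works. As a rough illustration only, the Python sketch below assumes each line of text arrives with its font size, weight, and left indent already extracted from the page layout. The names `Line`, `signature`, `assign_roles`, and `to_xml`, the role names, and the heuristics are all hypothetical and are not taken from the patent. Lines that share the same visual signature receive the same structural role, loosely mirroring the idea that similarly structured elements are treated similarly, and the result is emitted as XML.

```python
from collections import Counter
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class Line:
    """One line of laid-out text plus visual attributes (hypothetical input format)."""
    text: str
    font_size: float
    bold: bool
    indent: int  # left offset of the line, e.g. in points


def signature(line: Line) -> tuple:
    # Lines that look alike share a signature, so they end up with the same role.
    return (line.font_size, line.bold, line.indent)


def assign_roles(lines: list[Line]) -> dict[tuple, str]:
    """Map each visual signature to a structural role using simple assumed heuristics."""
    counts = Counter(signature(line) for line in lines)
    # Assumption: the most frequent visual style is ordinary body text.
    body_sig = counts.most_common(1)[0][0]
    body_size, _, body_indent = body_sig
    roles = {}
    for sig in counts:
        size, bold, indent = sig
        if sig == body_sig:
            roles[sig] = "para"
        elif size > body_size or (bold and size == body_size):
            roles[sig] = "heading"  # larger or bolder than body text
        elif indent > body_indent:
            roles[sig] = "item"  # indented relative to body text
        else:
            roles[sig] = "para"
    return roles


def to_xml(lines: list[Line]) -> str:
    """Emit one XML element per line, tagged with the role of its visual signature."""
    roles = assign_roles(lines)
    root = ET.Element("document")
    for line in lines:
        ET.SubElement(root, roles[signature(line)]).text = line.text
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample input: two headings, body paragraphs, and two indented items.
SAMPLE = [
    Line("Introduction", 16, True, 0),
    Line("Documents carry structure in their layout.", 11, False, 0),
    Line("Visual cues include font size and indentation.", 11, False, 0),
    Line("indentation", 11, False, 20),
    Line("font weight", 11, False, 20),
    Line("Method", 16, True, 0),
    Line("Lines are grouped by visual signature.", 11, False, 0),
]

if __name__ == "__main__":
    print(to_xml(SAMPLE))
```

Grouping by an exact signature is the simplest possible choice; a fuller system would likely tolerate small differences in font size or indent and would also use vertical spacing, alignment, and other layout cues.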
Borden Ladner Gervais LLP
Tata Consultancy Services Limited
Tata Infotech Ltd.