G06F 17/00 (2006.01) G06F 7/00 (2006.01) G06F 17/27 (2006.01) G06F 17/30 (2006.01)
Patent
CA 2425850
Source documents are processed to generate data for indexing, and queries are processed to generate data for searching, in accordance with retrieved tokenization rules and, if desired, retrieved normalization rules. Tokenization rules define exactly which characters (letters, numbers, punctuation characters, etc.) and exactly which patterns of those characters (one or more contiguous characters, every individual character, etc.) constitute indexable and searchable units of data. Normalization rules are used to (potentially) modify the tokens created by the tokenizer in indexing and/or searching operations. Normalization accounts for such things as case-insensitive searching and language-specific nuances in which document authors can use accepted variations in the spelling of words. Query processing must employ the same tokenization and normalization rules as source processing in order for queries to search the databases accurately, and must also employ another set of concordable characters for use in the query language. This set of "reserved" characters includes characters for wildcard searching, quoted strings, field-qualified searching, range searching and so forth.
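The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `TokenizationRules`, `NormalizationRules`, `process_source` and `process_query`, the regex-based definition of concordable characters, and the choice of `*` as a trailing wildcard are all assumptions made for illustration. The sketch shows only the central point, that source text and query text pass through the same rule objects, with the query side additionally recognizing a reserved character.

```python
import re
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TokenizationRules:
    # Hypothetical rule set: a regex character class naming the
    # concordable (indexable) characters, plus the token pattern choice.
    concordable: str = r"[A-Za-z0-9]"
    single_char_tokens: bool = False  # True: every character is its own token


@dataclass
class NormalizationRules:
    # Hypothetical rule set: case folding and accepted spelling variants.
    fold_case: bool = True
    variants: dict = field(default_factory=dict)  # e.g. {"colour": "color"}


def tokenize(text, rules):
    """Split text into tokens made only of concordable characters."""
    if rules.single_char_tokens:
        pattern = rules.concordable
    else:
        pattern = f"(?:{rules.concordable})+"  # runs of contiguous characters
    return re.findall(pattern, text)


def normalize(tokens, rules):
    """Apply case folding first, then map spelling variants to one form."""
    result = []
    for token in tokens:
        if rules.fold_case:
            token = token.lower()
        result.append(rules.variants.get(token, token))
    return result


def process_source(text, tok_rules, norm_rules=None):
    """Produce index terms; normalization is optional ("if desired")."""
    tokens = tokenize(text, tok_rules)
    return normalize(tokens, norm_rules) if norm_rules else tokens


def process_query(query, tok_rules, norm_rules=None):
    """Produce (term, is_prefix) pairs using the SAME rules as the source.

    Assumption: a trailing '*' is the reserved wildcard character.
    """
    terms = []
    for raw in query.split():
        is_prefix = raw.endswith("*")
        tokens = process_source(raw.rstrip("*"), tok_rules, norm_rules)
        for i, token in enumerate(tokens):
            # Only the last token of a wildcard word is a prefix match.
            terms.append((token, is_prefix and i == len(tokens) - 1))
    return terms


tok_rules = TokenizationRules()
norm_rules = NormalizationRules(variants={"colour": "color"})
index_terms = process_source("The Colour-Blind test", tok_rules, norm_rules)
query_terms = process_query("COLOUR bli*", tok_rules, norm_rules)
```

Because both functions share `tok_rules` and `norm_rules`, the query word `COLOUR` and the source word `Colour` reduce to the same term. If the query side used different rules, the two would no longer match, which is the consistency requirement the abstract states.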
Dow Jones Reuters Business Interactive LLC
Factiva Inc.
Gowling Lafleur Henderson LLP