Method and system for automatically extracting data from web...

Classification: G06F 7/00 (2006.01)

Patent: CA 2614774

In accordance with an embodiment, data may be automatically extracted from semi-structured web sites. Unsupervised learning may be used to analyze web sites and discover their structure. One method uses a set of heterogeneous "experts," each capable of identifying certain types of generic structure. Each expert represents its discoveries as "hints." Based on these hints, the system may cluster the pages and text segments and identify semi-structured data that can be extracted. To identify a good clustering, a probabilistic model of the hint-generation process may be used.
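
The abstract does not specify what the experts look at, what a hint contains, or what the probabilistic model is, so the following is only a hypothetical sketch of the general idea, not the patented method. It assumes: pages are reduced to a URL and a sequence of HTML tag names; there are two toy experts (`url_pattern_expert`, `tag_structure_expert`) whose hints are unordered page pairs claimed to belong together; each expert independently emits a hint with probability `alpha` for a same-cluster pair and `beta` for a different-cluster pair, with both values fixed by hand; and a good clustering is found by greedy agglomerative merging on log-likelihood, a search strategy chosen here for simplicity. All names and parameter values are illustrative.

```python
import math
import re
from itertools import combinations

# Hypothetical page representation: URL -> sequence of HTML tag names.
Pages = dict[str, tuple[str, ...]]


def url_pattern_expert(pages: Pages) -> set[frozenset[str]]:
    """Toy expert: hint that two pages belong together if their URLs
    match once every run of digits is masked (e.g. /item/1 vs /item/2)."""
    key = {url: re.sub(r"\d+", "#", url) for url in pages}
    return {frozenset((a, b)) for a, b in combinations(pages, 2) if key[a] == key[b]}


def tag_structure_expert(pages: Pages) -> set[frozenset[str]]:
    """Toy expert: hint that two pages belong together if their tag sequences are identical."""
    return {frozenset((a, b)) for a, b in combinations(pages, 2) if pages[a] == pages[b]}


EXPERTS = [url_pattern_expert, tag_structure_expert]


def pair_score(hinted_by: int, n_experts: int, alpha: float, beta: float) -> float:
    """Log-likelihood gain of putting a pair in the same cluster instead of apart,
    under an assumed model where each expert independently emits a hint with
    probability alpha (same cluster) or beta (different clusters)."""
    silent = n_experts - hinted_by
    return hinted_by * math.log(alpha / beta) + silent * math.log((1 - alpha) / (1 - beta))


def cluster_pages(pages: Pages, alpha: float = 0.6, beta: float = 0.05) -> list[list[str]]:
    """Greedy agglomerative clustering: repeatedly merge the two clusters whose
    merge most increases the model log-likelihood; stop when no merge helps."""
    hint_sets = [expert(pages) for expert in EXPERTS]

    def gain(a: str, b: str) -> float:
        pair = frozenset((a, b))
        hinted_by = sum(pair in hints for hints in hint_sets)
        return pair_score(hinted_by, len(hint_sets), alpha, beta)

    clusters = [[url] for url in pages]
    while True:
        best, best_gain = None, 0.0
        for i, j in combinations(range(len(clusters)), 2):
            g = sum(gain(a, b) for a in clusters[i] for b in clusters[j])
            if g > best_gain:
                best, best_gain = (i, j), g
        if best is None:
            break
        i, j = best  # i < j, so popping j leaves index i valid
        clusters[i].extend(clusters.pop(j))
    return sorted(sorted(c) for c in clusters)
```

With three product pages sharing a URL pattern and tag layout plus one unrelated "about" page, both toy experts hint every product pair, so those pages merge into one cluster while the "about" page stays alone. A fuller version would also cluster text segments within pages and estimate `alpha` and `beta` per expert rather than fixing them.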
