Detecting duplicate and near-duplicate files

Section: G - Physics
Class: 06
Subclass: F

IPC: G06F 17/00 (2006.01), G06F 17/30 (2006.01)
Type: Patent
Patent number: CA 2660202
Abstract: Near duplicate documents may be identified by processing an accepted set of documents to determine a first set of near duplicate documents using a first technique, and processing the first set to determine a second set of near duplicate documents using a second technique. The first technique might be token order dependent, and the second technique might be order independent. The first technique might be token frequency independent, and the second technique might be frequency dependent. The first technique might determine whether two documents are near duplicates using representations based on a subset of the words or tokens of the documents, and the second technique might determine whether two documents are near duplicates using representations based on all of the words or tokens of the documents. The first technique might use set intersection to determine whether or not documents are near duplicates, and the second technique might use random projections to determine whether or not documents are near duplicates.
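
The two-stage filtering described in the abstract can be sketched roughly as follows. This is an illustration only, not the patented method: the listing gives no implementation details, so hashed k-token shingles stand in for the "set intersection" technique, sign bits of hash-derived projections stand in for the "random projections" technique, and every function name, parameter, and threshold below is a hypothetical choice made for the example.

```python
import hashlib
import re
from collections import Counter
from itertools import combinations


def tokens(text):
    """Lowercased word tokens of a document."""
    return re.findall(r"\w+", text.lower())


def stable_hash(s):
    """Deterministic 64-bit hash (assumption: MD5 is used only as a mixer)."""
    return int.from_bytes(hashlib.md5(s.encode("utf-8")).digest()[:8], "big")


def shingle_set(text, k=3, keep_every=1):
    """Stage 1 representation: a set of hashed k-token shingles.

    Order dependent (shingles are token sequences) and frequency
    independent (a set ignores repeats). keep_every > 1 keeps only a
    sampled subset of the shingles.
    """
    toks = tokens(text)
    hashed = {stable_hash(" ".join(toks[i:i + k]))
              for i in range(len(toks) - k + 1)}
    return {h for h in hashed if h % keep_every == 0}


def projection_bits(text, bits=64):
    """Stage 2 representation: sign bits of frequency-weighted projections.

    Order independent (only token counts matter), frequency dependent,
    and based on all tokens. Hash bits play the role of random +1/-1
    projection vectors.
    """
    totals = [0] * bits
    for tok, freq in Counter(tokens(text)).items():
        h = stable_hash(tok)
        for i in range(bits):
            totals[i] += freq if (h >> i) & 1 else -freq
    return sum(1 << i for i, t in enumerate(totals) if t > 0)


def bit_agreement(a, b, bits=64):
    """Fraction of bit positions on which two fingerprints agree."""
    return 1 - bin(a ^ b).count("1") / bits


def near_duplicates(docs, overlap_min=0.5, agree_min=0.9):
    """Return pairs that pass stage 1 and are confirmed by stage 2."""
    sketches = {name: shingle_set(text) for name, text in docs.items()}
    first = []
    for a, b in combinations(docs, 2):
        union = sketches[a] | sketches[b]
        shared = sketches[a] & sketches[b]
        if union and len(shared) / len(union) >= overlap_min:
            first.append((a, b))
    # Stage 2 only looks at documents that survived stage 1.
    prints = {n: projection_bits(docs[n]) for pair in first for n in pair}
    return [(a, b) for a, b in first
            if bit_agreement(prints[a], prints[b]) >= agree_min]
```

Calling `near_duplicates` on a dictionary that maps document names to text returns the name pairs that survive both stages. The thresholds `overlap_min` and `agree_min` are arbitrary here and would need tuning on real data.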

Inventor: Henzinger Monika H.
Owner: Google Inc.
Agent: Kirby Eades Gale Baker
