Databases can generate a list of identical data records within a matter of moments. They do this by creating indices. These tree-like data structures can find a particular record even in very large databases with just a few access operations. Searching for absolute duplicates, in other words completely identical data records, is therefore a very simple task.
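As a rough illustration of why exact duplicates are cheap to find, the sketch below groups records by their full content using an in-memory index. It uses a Python dictionary (a hash table) as a stand-in for the tree-like database index described above; the function name and the sample records are hypothetical.

```python
from collections import defaultdict


def find_exact_duplicates(records):
    """Return groups of positions whose records are identical in every field."""
    index = defaultdict(list)  # record content -> positions where it occurs
    for pos, record in enumerate(records):
        index[tuple(record)].append(pos)  # one index lookup per record
    return [positions for positions in index.values() if len(positions) > 1]


# Hypothetical sample data: records 0 and 2 are completely identical.
records = [
    ("Anna Schmidt", "Hauptstrasse 5", "Berlin"),
    ("John Miller", "12 Oak Street", "Boston"),
    ("Anna Schmidt", "Hauptstrasse 5", "Berlin"),
]
duplicates = find_exact_duplicates(records)
```

Each record needs only one lookup, so the work grows roughly linearly with the number of records. A single differing character, however, puts two records under different keys, which is why this approach cannot detect similar records.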
In contrast, finding similar records, e.g. addresses containing minor spelling mistakes, transposed letters, missing letters etc., is a major problem for a computer. While a person will see at first glance that two data records are similar, the term "similar" is extremely difficult to express in computer rules (algorithms).
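One common way to put "similar" into a rule is to compute a similarity score between two strings and compare it against a threshold. The sketch below uses `SequenceMatcher` from Python's standard `difflib` module purely as a stand-in; it is not the algorithm used by the product, and the threshold of 0.85 and the sample addresses are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a score between 0.0 (nothing in common) and 1.0 (identical)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_fuzzy_duplicate(a, b, threshold=0.85):
    """Treat two records as duplicates if their score reaches the threshold.

    The threshold is an assumed value; a suitable one depends on the data.
    """
    return similarity(a, b) >= threshold


# Hypothetical addresses with a spelling mistake and a missing letter.
original = "Jonathan Meyer, Main Street 12"
misspelled = "Jonathon Mayer, Main Stret 12"
unrelated = "Maria Lopez, Park Avenue 7"

same_person = is_fuzzy_duplicate(original, misspelled)
different_person = is_fuzzy_duplicate(original, unrelated)
```

Choosing the threshold is the hard part: set too low, it flags distinct people as duplicates; set too high, it misses records with several small errors.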
Conversely, a person will find it practically impossible to pick out the duplicate records from a pool of even a few hundred data records. Duplicates typically account for at least 1% to 3% of each database that we come across, even the best maintained.
These duplicate records are a major source of increased costs when performing tasks such as sending out catalogs, and they also cause serious problems in accounting, support, controlling etc. Performing a fuzzy duplicate search is particularly important if you are merging data, e.g. after purchasing new addresses.
Features:
* Fast fuzzy duplicate search in many data sources
* Fuzzy merge of two lists
* Fuzzy match with external list
* Complete migration to .NET 2.0/C#. This makes the program future-proof and stable, and provides automatic 64-bit operation under a 64-bit operating system.
* Enhanced algorithms, in particular our new pattern-matching algorithm (see above)
* Uses considerably less memory
* User-defined normalization rules
* Full Unicode support. This generally allows fuzzy duplicate searches to be performed in Unicode languages
* Improved user interface
* Direct deletion from MS Outlook and Windows Address Book
* Many other refinements
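
To make two of the features above more concrete, the following sketch shows how user-defined normalization rules and a fuzzy match against an external list could fit together. It is an illustrative assumption, not the product's implementation: the rule format (regular expression and replacement pairs), the function names, the threshold of 0.9, and the sample addresses are all hypothetical, and `difflib.SequenceMatcher` again stands in for the actual matching algorithm.

```python
import re
from difflib import SequenceMatcher

# Hypothetical user-defined rules: (pattern, replacement), applied in order.
RULES = [
    (r"\bstr\.", "street"),  # expand the abbreviation "str."
    (r"\bst\.", "street"),   # expand the abbreviation "st."
    (r"[^\w\s]", " "),       # replace punctuation with a space
    (r"\s+", " "),           # collapse runs of whitespace
]


def normalize(text, rules=RULES):
    """Lower-case the text and apply every normalization rule in order."""
    text = text.lower()
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text.strip()


def fuzzy_match(own, external, threshold=0.9):
    """Return (own, external) pairs whose normalized forms are similar enough."""
    matches = []
    for a in own:
        for b in external:
            score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
            if score >= threshold:
                matches.append((a, b))
    return matches


own_list = ["12 Oak St., Boston"]
external_list = ["12 Oak Street Boston", "7 Park Avenue, Denver"]
matches = fuzzy_match(own_list, external_list)
```

Normalizing before comparing means that differences in abbreviations, punctuation and capitalization no longer lower the similarity score. The nested loop compares every pair, so this sketch is only practical for small lists.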