DZone

When you compare two records from Salesforce, or from any other CRM for that matter, side by side, you can easily determine whether they are duplicates. However, even with a small number of records, let’s say fewer than 100,000, it would be almost impossible to sift through them one by one and perform such a comparison. This is why companies have developed various tools that automate the process, but to do a good job, the machines need to be able to recognize all of the similarities and differences between the records. In this article, we will take a closer look at some of the methods data scientists use to train machine learning systems to identify duplicates.

How Can Machine Learning Systems Compare and Contrast Records? 

One of the main tools researchers use is string metrics. A string metric takes two strings of data and returns a number that is low if the strings are similar and high if they are different. How does this work in practice? Well, let’s take a look at the two records below:

Source: DZone
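
To make the idea of a string metric concrete, here is a minimal Python sketch. It uses Levenshtein (edit) distance, which is one common string metric and not necessarily the one any particular deduplication tool relies on. The two records, their field names, and the helper `field_distances` are hypothetical and chosen only for illustration; they are not the records from the figure above.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, or substitutions needed to turn a into b."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (ch_a != ch_b)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


# Hypothetical CRM records (assumed field names, made-up values).
record_a = {"first_name": "Jonathan", "last_name": "Smith",
            "email": "jon.smith@example.com"}
record_b = {"first_name": "Jonathon", "last_name": "Smyth",
            "email": "jon.smith@example.com"}


def field_distances(r1: dict, r2: dict) -> dict:
    """Compare two records field by field, ignoring letter case."""
    return {field: levenshtein(r1[field].lower(), r2[field].lower())
            for field in r1}


distances = field_distances(record_a, record_b)
print(distances)
```

Every field here scores 0 or 1, which is the kind of low result that suggests the two records may describe the same person. How low a score has to be before a pair counts as a duplicate is a separate decision that this sketch does not make.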