SigParser uses ML.NET to detect "non-human" emails
Customer: SigParser
Products & services: ML.NET, Office 365
Industry: Software / Telecommunications
Organization size: Small (<100 employees)
Country/region: USA
SigParser is an API and service that automates the tedious (and often expensive) process of adding to and maintaining customer relationship management (CRM) systems. SigParser extracts contact information, such as names, email addresses, and phone numbers, from email signatures and feeds all that information as contacts into CRM systems or databases.
Business problem
When SigParser processes emails for a company, many of the emails are non-human (for example, newsletters, payment notifications, and password resets). The sender's information from these types of emails should not show up in contact lists or be pushed into a CRM system. Thus, SigParser decided to use machine learning to predict if email messages are "spammy looking."
Take a notification email from a forum as an example. The sender of this email isn't a contact that should show up in a CRM, so the machine learning model predicts that "isSpammyLookingEmailMessage" is true.
Why ML.NET?
When the team at SigParser decided to utilize machine learning, they originally tried using R; however, they found it was very difficult to maintain and integrate with their API, which is built with .NET Core.
Paul Mendoza, CEO and founder of SigParser, said that R "was just too disconnected from the development process. With R, we were generating all the constants and then we would copy and paste those into .NET and then try the model out for real and learn it didn't quite work and have to repeat. This was too slow."
Thus, they turned to ML.NET to bring everything into one application.
"With ML.NET, we're able to train the model and then immediately test it inside of our code. This makes shipping new changes faster because all the tooling was together in one place."
Impact of ML.NET
The move from R to ML.NET has brought a 10x productivity improvement. Additionally, until SigParser moved to ML.NET, they used only one machine learning model. Since the conversion to ML.NET, they have built six machine learning models for various aspects of email parsing. This increase came about because ML.NET makes it possible to experiment quickly with new machine learning ideas and show the results in the application.
Solution architecture
Data processing
SigParser first used the well-known Enron dataset to train their model. When they realized that it was quite outdated, they instead labeled a couple thousand emails from their own email accounts (in keeping with GDPR compliance) as either human or non-human and used these as the training dataset.
Machine learning features
SigParser's ML.NET model has two Features (used to make the prediction "IsHumanEmail"):
HasUnsubscribes: True if the email body contains "unsubscribe" or "opt out."
EmailBodyCleaned: The HTML email body, normalized to make the email language agnostic and to remove any personally identifiable information.
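As a rough illustration of how these two Features might be derived from a raw HTML body, here is a minimal C# sketch. It is not SigParser's implementation: the LabeledEmail and EmailFeatures types, the regular expressions, and the TEXT placeholder are all assumptions made for this example. In particular, it assumes that "cleaning" means reducing each tag to its bare name and replacing every run of text with a placeholder token, which is one possible way to strip out both the language and any personal details.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Demo: build one labeled example from a raw HTML body.
var example = EmailFeatures.FromHtml(
    "<p>Hi Ann, your order 48213 shipped.</p><a href=\"https://example.com/u\">Unsubscribe</a>",
    isHumanEmail: false);
Console.WriteLine(example.HasUnsubscribes);   // True
Console.WriteLine(example.EmailBodyCleaned);  // <p> TEXT </p> <a> TEXT </a>

// Hypothetical row type; property names mirror the column names used in training.
public sealed class LabeledEmail
{
    public bool HasUnsubscribes { get; set; }
    public string EmailBodyCleaned { get; set; } = "";
    public bool IsHumanEmail { get; set; }
}

// Hypothetical helper; the rules below are assumptions, not SigParser's actual logic.
public static class EmailFeatures
{
    // Matches "unsubscribe", "opt out", "opt-out", or "optout" at a word boundary.
    private static readonly Regex UnsubscribePattern =
        new Regex(@"\b(?:unsubscribe|opt[\s-]?out)", RegexOptions.IgnoreCase);

    // Splits HTML into tags and the text runs between them.
    private static readonly Regex Token = new Regex(@"<[^>]*>|[^<]+");

    // Captures an optional closing slash and the tag name; comments and doctypes do not match.
    private static readonly Regex TagName = new Regex(@"^<\s*(/?)\s*([A-Za-z][A-Za-z0-9]*)");

    public static LabeledEmail FromHtml(string html, bool isHumanEmail)
    {
        return new LabeledEmail
        {
            // Computed on the raw body, because cleaning discards the text.
            HasUnsubscribes = UnsubscribePattern.IsMatch(html),
            EmailBodyCleaned = CleanBody(html),
            IsHumanEmail = isHumanEmail,
        };
    }

    public static string CleanBody(string html)
    {
        var parts = new List<string>();
        foreach (Match token in Token.Matches(html))
        {
            if (token.Value[0] == '<')
            {
                // Keep only the tag name; attributes may carry addresses or URLs.
                var tag = TagName.Match(token.Value);
                if (tag.Success)
                    parts.Add("<" + tag.Groups[1].Value + tag.Groups[2].Value.ToLowerInvariant() + ">");
            }
            else if (!string.IsNullOrWhiteSpace(token.Value))
            {
                // Replace any human-readable text with a language-neutral placeholder.
                parts.Add("TEXT");
            }
        }
        return string.Join(" ", parts);
    }
}
```

Under these assumptions, a collection of such rows, one per hand-labeled email, would play the role of the training sample set.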
Machine learning algorithm
These two Features are fed into a Binary FastTree algorithm, which is an algorithm for classification scenarios. The output is the prediction of whether the email was sent by a "real human" or by an automated source. Currently, SigParser is processing millions of emails per month with this ML.NET model.
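using System;
using System.IO;
using Microsoft.ML;

// Note: these calls appear to target a pre-1.0 ML.NET release; later versions renamed several of them.
// totalSampleSet and emailParsingPath are defined elsewhere in the application.
var mlContext = new MLContext();

// Hold out 20% of the labeled emails for evaluation.
var (trainData, testData) = mlContext.BinaryClassification.TrainTestSplit(
    mlContext.CreateStreamingDataView(totalSampleSet), testFraction: 0.2);

// Featurize the cleaned body, combine it with HasUnsubscribes, and train FastTree.
var pipeline = mlContext.Transforms.Text.FeaturizeText("EmailBodyCleaned", "EmailHTMLFeaturized")
    .Append(mlContext.Transforms.Concatenate("Features", "HasUnsubscribes", "EmailHTMLFeaturized"))
    .Append(mlContext.BinaryClassification.Trainers.FastTree(labelColumn: "IsHumanEmail", featureColumn: "Features"));

Console.WriteLine("Fitting data");
var fitResult = pipeline.Fit(trainData);
Console.WriteLine("Evaluating metrics");
var metrics = mlContext.BinaryClassification.Evaluate(fitResult.Transform(testData), label: "IsHumanEmail");
Console.WriteLine("Accuracy: " + metrics.Accuracy);

// Save the trained model so the application can load it for prediction.
using (var stream = File.Create(emailParsingPath + "EmailHTMLTypeClassifier.zip"))
{
    mlContext.Model.Save(fitResult, stream);
}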
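The Accuracy value printed by this code is the fraction of held-out emails whose predicted label matches the hand-assigned label. The following minimal sketch shows that calculation on its own. It uses made-up labels and a hypothetical Metrics helper rather than ML.NET itself, so it only illustrates what the number means.

```csharp
using System;

// Hypothetical held-out labels and model predictions (true = human email).
bool[] actual    = { true, true,  false, false, false, true, false, false };
bool[] predicted = { true, false, false, false, true,  true, false, false };

// 6 of the 8 predictions match the labels, so accuracy is 6 / 8 = 0.75.
Console.WriteLine("Accuracy: " + Metrics.Accuracy(actual, predicted));

// Hypothetical helper, not part of ML.NET.
public static class Metrics
{
    public static double Accuracy(bool[] actual, bool[] predicted)
    {
        if (actual.Length != predicted.Length || actual.Length == 0)
            throw new ArgumentException("Label arrays must be non-empty and equal in length.");

        int correct = 0;
        for (int i = 0; i < actual.Length; i++)
        {
            if (actual[i] == predicted[i]) correct++;
        }
        return (double)correct / actual.Length;
    }
}
```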
SigParser uses ML.NET's data transformations and algorithms for multiple machine learning solutions, including the spam detection model described above. These models have enabled SigParser to automatically export the correct contact information from email signatures to customer databases, bypassing the need for time-consuming and error-prone manual contact data entry.