SigParser uses ML.NET

Customer: SigParser
Products & services: ML.NET, Office 365
Industry: Software / Telecommunications
Organization Size: Small (<100 employees)
Country: USA

For many companies, CRM systems are integral to storing and managing customer information. However, keeping these systems up to date can be a hassle, and manually adding and maintaining records in a CRM system or customer database is often a tedious (and expensive) task. Although the average office worker receives hundreds of e-mails a day, only 20% of the contacts and contact details in those e-mails make it into CRM systems.

SigParser was created to automate this task and ensure that contact data is not overlooked. SigParser is an API and service that parses signatures from e-mails, converts them into contacts, and then feeds those contacts into other systems, such as a database or CRM. It can find names, phone numbers, titles, and even social media from single e-mails, e-mail chains, and e-mails from years in the past.

The SigParser application lets you provide a sample email and preview the metadata it is able to determine about the email.

Why ML.NET?

When the team at SigParser decided to use machine learning, they originally tried R; however, they found it very difficult to maintain and to integrate with their API, which is built with .NET Core.

Paul Mendoza, CEO and founder of SigParser, said that R "was just too disconnected from the development process. With R we were generating all the constants and then we would copy and paste those into .NET and then try the model out for real and learn it didn't quite work and have to repeat. This was too slow."

Thus, they turned to ML.NET to bring everything into one application.

"With ML.NET, we're able to train the model and then immediately test it inside of our code. This makes shipping new changes faster because all the tooling was together in one place."

Paul Mendoza, CEO and founder of SigParser

ML.NET Scenario — Detecting "Non-Human" E-mails

When SigParser processes e-mails for a company, many of the e-mails are non-human (e.g. newsletters, payment notifications, password resets, etc.). The sender's information from these types of e-mails should not show up in contact lists or be pushed into a CRM system. Thus, SigParser uses ML.NET to predict whether a message is a "spammy looking E-mail message." Take the following notification email from a forum as an example. The sender of this message isn't a contact that should show up in a CRM:

The sample email comes from a noreply email address and has generated information about unread notifications etc.

SigParser classifies the sample email as a 'spammy looking E-mail message' using its ML.NET model.

ML.NET Training Data

SigParser first used the well-known Enron dataset to train their model, but when they realized that it was quite outdated, they labeled a couple thousand e-mails from their own e-mail accounts (in keeping with GDPR compliance) as either human or non-human and used these as the training dataset.

ML.NET Features

SigParser's ML.NET model has two features, which are used to predict the label "IsHumanEmail":

  • HasUnsubscribes — True if an e-mail has an "unsubscribe" or "opt out" in the e-mail body
  • EmailBodyCleaned — Normalizes the HTML e-mail body to make the e-mail language agnostic and to remove any personally identifiable information
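
The two features above could be computed along the following lines. This is only a minimal sketch: the article does not show SigParser's actual feature code, so the `EmailFeatures` class, its method names, the regular expressions, and the `TEXT` placeholder are all assumptions made for illustration. The cleaning step here keeps the HTML tag names and replaces every run of visible text with a placeholder, which is one plausible way to make the body language agnostic and drop personal details.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helpers; SigParser's real feature extraction is not published in the article.
public static class EmailFeatures
{
    // True when the body mentions "unsubscribe" or "opt out" (case-insensitive).
    public static bool HasUnsubscribes(string body) =>
        Regex.IsMatch(body ?? string.Empty,
                      @"unsubscribe|\bopt[\s-]?out\b",
                      RegexOptions.IgnoreCase);

    // Assumed cleaning strategy: keep only the tag structure of the HTML.
    // This is a regex-based sketch, not a robust HTML parser.
    public static string CleanBody(string html)
    {
        if (string.IsNullOrEmpty(html)) return string.Empty;

        // Reduce each tag to its bare name, e.g. <a href="..."> becomes <a>.
        string tagsOnly = Regex.Replace(
            html, @"<\s*(/?)\s*([a-zA-Z][a-zA-Z0-9]*)[^>]*>", "<$1$2>");

        // Replace each run of text between tags with a placeholder,
        // dropping runs that contain only whitespace.
        return Regex.Replace(
            tagsOnly, @"(?<=^|>)[^<]+(?=<|$)",
            m => string.IsNullOrWhiteSpace(m.Value) ? string.Empty : "TEXT");
    }
}

public static class Program
{
    public static void Main()
    {
        string html = "<p>Hi Bob</p><a href=\"https://example.com/u\">Unsubscribe</a>";
        Console.WriteLine(EmailFeatures.HasUnsubscribes(html)); // True
        Console.WriteLine(EmailFeatures.CleanBody(html));       // <p>TEXT</p><a>TEXT</a>
    }
}
```

With this approach a newsletter and a personal note produce visibly different tag skeletons, while names, addresses, and the language of the text never reach the model.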

ML.NET Algorithm

These two features are fed into the FastTree binary classification trainer, a boosted decision tree algorithm for classification scenarios. The cleaned body text is first converted into a numeric feature vector and combined with HasUnsubscribes, and the output is a prediction of whether the e-mail was sent by a "real human" or by an automated source. Currently, SigParser is processing millions of e-mails per month with this ML.NET model.


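// Excerpt: totalSampleSet (the labeled e-mails) and emailParsingPath are defined elsewhere.
// Written against a preview release of ML.NET; some of these method names and
// argument orders changed in later releases.
using System;
using System.IO;
using Microsoft.ML;

var mlContext = new MLContext();

// Hold out 20% of the labeled e-mails for evaluation.
var (trainData, testData) = mlContext.BinaryClassification.TrainTestSplit(
    mlContext.CreateStreamingDataView(totalSampleSet), testFraction: 0.2);

// Featurize the cleaned body text, combine it with HasUnsubscribes into "Features",
// and train a FastTree binary classifier on the "IsHumanEmail" label.
var pipeline = mlContext.Transforms.Text.FeaturizeText("EmailBodyCleaned", "EmailHTMLFeaturized")
    .Append(mlContext.Transforms.Concatenate("Features", "HasUnsubscribes", "EmailHTMLFeaturized"))
    .Append(mlContext.BinaryClassification.Trainers.FastTree(labelColumn: "IsHumanEmail", featureColumn: "Features"));

Console.WriteLine("Fitting data");
var fitResult = pipeline.Fit(trainData);

// Score the held-out e-mails and report accuracy.
Console.WriteLine("Evaluating metrics");
var metrics = mlContext.BinaryClassification.Evaluate(fitResult.Transform(testData), label: "IsHumanEmail");
Console.WriteLine("Accuracy: " + metrics.Accuracy);

// Save the trained model so the application can load it later.
using (var stream = File.Create(emailParsingPath + "EmailHTMLTypeClassifier.zip"))
{
    mlContext.Model.Save(fitResult, stream);
}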

Impact of ML.NET

The move from R to ML.NET has brought a 10x productivity improvement. Additionally, while SigParser was using R, they had only one machine learning model. Since the conversion to ML.NET, they have six machine learning models covering various aspects of email parsing. This increase came about because ML.NET makes it possible to experiment with new machine learning ideas quickly and see the results in the application right away.
