
Customer: SigParser
Products & services: ML.NET, Office 365
Industry: Software / Telecommunications
Organization size: Small (<100 employees)
Country: USA
SigParser is an API and service that automates the tedious (and often expensive) process of adding to and maintaining customer relationship management (CRM) systems. SigParser extracts contact information, such as names, e-mail addresses, and phone numbers, from e-mail signatures and feeds all that information as contacts into CRM systems or databases.

When SigParser processes e-mails for a company, many of the e-mails are non-human (for example, newsletters, payment notifications, password resets, and so on). The sender's information from these types of e-mails should not show up in contact lists or be pushed into a CRM system. Thus, SigParser decided to use machine learning to predict if e-mail messages are "spammy looking."
Take the following notification e-mail from a forum as an example. The sender of this e-mail isn't a contact that should show up in a CRM, so a machine learning model predicts that "isSpammyLookingEmailMessage" is true:


When the team at SigParser decided to use machine learning, they originally tried R; however, they found it very difficult to maintain and integrate with their API, which is built with .NET Core.
Paul Mendoza, CEO and founder of SigParser, said that R "was just too disconnected from the development process. With R we were generating all the constants and then we would copy and paste those into .NET and then try the model out for real and learn it didn't quite work and have to repeat. This was too slow."
Thus, they turned to ML.NET to bring everything into one application.
"With ML.NET, we're able to train the model and then immediately test it inside of our code. This makes shipping new changes faster because all the tooling was together in one place."
The impact of moving to ML.NET from R has been a 10x productivity improvement. Additionally, while SigParser was using R, they had only one machine learning model. Since the conversion to ML.NET, they now have six machine learning models for various aspects of e-mail parsing. This increase has come about because ML.NET makes it possible to experiment with new machine learning ideas and show the results in the application quickly.
SigParser first used the well-known Enron dataset to train their model, but when they realized that it was quite outdated, they labeled a couple of thousand e-mails from their own e-mail accounts (in keeping with GDPR compliance) as either human or non-human and used these as a training dataset.
SigParser's ML.NET model has two Features (used to make the prediction "IsHumanEmail"):
HasUnsubscribes: True if an e-mail has an "unsubscribe" or "opt out" in the e-mail body
EmailBodyCleaned: Normalizes the HTML e-mail body to make the e-mail language agnostic and to remove any personally identifiable information
These two Features are inputted into a Binary FastTree algorithm, which is an algorithm for classification scenarios, and the output is the prediction of whether the e-mail was sent from a "real human" or from an automated source. Currently, SigParser is processing millions of e-mails per month with this ML.NET model.
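The source does not show how the two Features are computed, so the following is only a rough sketch of one way they might be derived from a raw HTML body. The class name `EmailFeatures`, the method `CleanBody`, the placeholder token `TEXT`, and the regular expression are all assumptions made for illustration; SigParser's actual normalization is not described in this article and is likely more thorough (for example, this sketch leaves tag attributes untouched).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: not SigParser's code, only an illustration of the two Features.
public static class EmailFeatures
{
    // The article names only these two phrases; a real list may be longer.
    private static readonly string[] UnsubscribePhrases = { "unsubscribe", "opt out" };

    // Matches a run of visible text (containing at least one non-whitespace character)
    // that starts at the beginning of the string or right after '>' and ends
    // right before '<' or at the end of the string.
    private static readonly Regex VisibleText =
        new Regex(@"(?<=^|>)[^<>]*[^<>\s][^<>]*(?=<|$)", RegexOptions.Compiled);

    // True if the body contains any of the phrases, ignoring case.
    public static bool HasUnsubscribes(string body)
    {
        if (string.IsNullOrEmpty(body)) return false;
        foreach (var phrase in UnsubscribePhrases)
        {
            if (body.IndexOf(phrase, StringComparison.OrdinalIgnoreCase) >= 0) return true;
        }
        return false;
    }

    // Assumed normalization: keep the HTML tag structure and replace every visible
    // text run with a fixed token, so the result carries no words (language agnostic)
    // and no names or addresses from the text itself.
    public static string CleanBody(string html)
    {
        if (string.IsNullOrEmpty(html)) return string.Empty;
        return VisibleText.Replace(html, "TEXT");
    }
}
```

In this sketch, `HasUnsubscribes` has to run on the raw body before `CleanBody` is applied, because the cleaning step removes the very words it looks for.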
// Note: written against a preview (pre-1.0) ML.NET API; several of these calls were renamed in later releases.
var mlContext = new MLContext();

// Hold out 20% of the labeled e-mails for testing.
var (trainData, testData) = mlContext.BinaryClassification.TrainTestSplit(
    mlContext.CreateStreamingDataView(totalSampleSet), testFraction: 0.2);

// Featurize the cleaned body text, combine it with HasUnsubscribes, and train a FastTree binary classifier.
var pipeline = mlContext.Transforms.Text.FeaturizeText("EmailBodyCleaned", "EmailHTMLFeaturized")
    .Append(mlContext.Transforms.Concatenate("Features", "HasUnsubscribes", "EmailHTMLFeaturized"))
    .Append(mlContext.BinaryClassification.Trainers.FastTree(labelColumn: "IsHumanEmail", featureColumn: "Features"));

Console.WriteLine("Fitting data");
var fitResult = pipeline.Fit(trainData);

Console.WriteLine("Evaluating metrics");
var metrics = mlContext.BinaryClassification.Evaluate(fitResult.Transform(testData), label: "IsHumanEmail");
Console.WriteLine("Accuracy: " + metrics.Accuracy);

// Save the trained model so the application can load it later.
using (var stream = File.Create(emailParsingPath + "EmailHTMLTypeClassifier.zip"))
{
    mlContext.Model.Save(fitResult, stream);
}
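The snippet above uses calls such as `CreateStreamingDataView` and the `labelColumn:` parameter, which come from a preview release of ML.NET. For readers on ML.NET 1.x or later, the same three-step pipeline might look roughly like the sketch below. It assumes the `Microsoft.ML` and `Microsoft.ML.FastTree` NuGet packages are referenced; the `EmailSample` class, the `TrainAndEvaluate` method, and the choice to store `HasUnsubscribes` as a `float` (so it can be concatenated with the text features) are assumptions for illustration, not SigParser's actual code.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;

// Hypothetical input row; column names mirror those in the original snippet.
public class EmailSample
{
    public string EmailBodyCleaned { get; set; }
    public float HasUnsubscribes { get; set; }  // assumed encoding: 1 if a phrase was found, else 0
    public bool IsHumanEmail { get; set; }      // label
}

public static class EmailClassifierTraining
{
    public static ITransformer TrainAndEvaluate(
        MLContext mlContext, IEnumerable<EmailSample> samples, string modelPath)
    {
        // Load the in-memory samples and hold out 20% for testing.
        IDataView data = mlContext.Data.LoadFromEnumerable(samples);
        var split = mlContext.Data.TrainTestSplit(data, testFraction: 0.2);

        // In the 1.x API the output column name comes first, the input column second.
        var pipeline = mlContext.Transforms.Text.FeaturizeText("EmailHTMLFeaturized", "EmailBodyCleaned")
            .Append(mlContext.Transforms.Concatenate("Features", "HasUnsubscribes", "EmailHTMLFeaturized"))
            .Append(mlContext.BinaryClassification.Trainers.FastTree(
                labelColumnName: "IsHumanEmail", featureColumnName: "Features"));

        Console.WriteLine("Fitting data");
        ITransformer model = pipeline.Fit(split.TrainSet);

        // Evaluation needs both classes present in the test split.
        Console.WriteLine("Evaluating metrics");
        var metrics = mlContext.BinaryClassification.Evaluate(
            model.Transform(split.TestSet), labelColumnName: "IsHumanEmail");
        Console.WriteLine("Accuracy: " + metrics.Accuracy);

        // Save the model together with the input schema.
        mlContext.Model.Save(model, data.Schema, modelPath);
        return model;
    }
}
```

A caller would pass its own labeled samples and a destination path, for example `EmailClassifierTraining.TrainAndEvaluate(new MLContext(), samples, "EmailHTMLTypeClassifier.zip")`.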
SigParser uses ML.NET's data transformations and algorithms for multiple machine learning solutions, including the spam detection model described above. These models let SigParser automatically export the correct contact information from e-mail signatures to customer databases, replacing time-consuming and error-prone manual contact data entry.