ML.NET Tutorial - Get started in 10 minutes

Download and add data

Download the Sentiment Labelled Sentences datasets from the UCI Machine Learning Repository. Unzip the downloaded archive and save the yelp_labelled.txt file to the myMLApp directory.

Each row in yelp_labelled.txt represents a different review of a restaurant left by a user on Yelp. The first column represents the comment left by the user, and the second column represents the sentiment of the text (0 is negative, 1 is positive). The columns are separated by tabs, and the dataset has no header. The data looks like the following:

Wow... Loved this place.	        1
Crust is not good.	        0
Not tasty and the texture was just nasty.	        0
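
If you want to sanity-check the file format outside of Model Builder, the layout described above (two tab-separated columns, no header) can be read with a few lines of Python. This is only an illustrative sketch and is not part of the Model Builder workflow; the function name read_reviews and the inline SAMPLE string are assumptions made for the example.

```python
import csv
import io

# Three sample rows taken from the excerpt above; a tab separates the two columns.
SAMPLE = (
    "Wow... Loved this place.\t1\n"
    "Crust is not good.\t0\n"
    "Not tasty and the texture was just nasty.\t0\n"
)

def read_reviews(handle):
    """Return a list of (review_text, sentiment) tuples from a tab-separated stream.

    Hypothetical helper for illustration; Model Builder does its own loading.
    """
    rows = []
    # QUOTE_NONE keeps any quotation marks inside a review as literal text.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for record in reader:
        if len(record) != 2:
            continue  # skip blank or malformed lines
        text, label = record
        # Strip stray padding around either field before converting the label.
        rows.append((text.strip(), int(label.strip())))
    return rows

reviews = read_reviews(io.StringIO(SAMPLE))
for text, sentiment in reviews:
    print(sentiment, text)
```

To try it on the real dataset, pass an open file instead of the sample string, for example open("yelp_labelled.txt", newline="", encoding="utf-8"), assuming the file sits in the current directory.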

Add data

In Model Builder, you can add data from a local file or connect to a SQL Server database. In this case, you'll add yelp_labelled.txt from a file.

  1. Select File as the input data source type.

  2. Browse for yelp_labelled.txt. Once you select your dataset, a preview of your data appears in the Data Preview section. Since your dataset does not have a header, headers are auto-generated ("col0" and "col1").

  3. Under Column to predict (Label), select "col1". The Label is what you're predicting, which in this case is the sentiment found in the second column ("col1") of the dataset.

  4. The columns that are used to help predict the Label are called Features. All of the columns in the dataset besides the Label are automatically selected as Features. In this case, the review comment column ("col0") is the Feature column. You can update the Feature columns and modify other data loading options in Advanced data options, but it is not necessary for this example.
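
The Label and Feature roles chosen in the steps above can be pictured with a small Python sketch. It mimics the idea only: a headerless row gets generated names ("col0", "col1"), one column is set aside as the Label, and everything else becomes a Feature. The helper names name_columns and split_label are hypothetical, and this is not how Model Builder is implemented internally.

```python
def name_columns(row):
    """Assign generated names col0, col1, ... to the values of a headerless row."""
    return {f"col{i}": value for i, value in enumerate(row)}

def split_label(record, label="col1"):
    """Separate the Label column from the remaining Feature columns."""
    features = {name: value for name, value in record.items() if name != label}
    return features, record[label]

# One row from the dataset: review text first, sentiment second.
record = name_columns(["Crust is not good.", 0])
features, label = split_label(record)
print(features)
print(label)
```

Choosing a different Label here only means passing another column name to split_label, which is the same decision the Column to predict (Label) dropdown captures.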

Model Builder Data step

After adding your data, go to the Train step.