Each row in
yelp_labelled.txt represents a different review of a restaurant left by a user on Yelp. The first column represents the comment left by the user, and the second column represents the sentiment of the text (0 is negative, 1 is positive). The columns are separated by tabs, and the dataset has no header. The data looks like the following:
Wow... Loved this place. 1 Crust is not good. 0 Not tasty and the texture was just nasty. 0
In Model Builder, you can add data from a local file or connect to a SQL Server database. In this case, you'll add
yelp_labelled.txt from a file.
Select File as the input data source type.
yelp_labelled.txt. Once you select your dataset, a preview of your data appears in the Data Preview section. Since your dataset does not have a header, headers are auto-generated ("col0" and "col1").
Under Column to predict (Label), select "col1." The Label is what you're predicting, which in this case is the sentiment found in the second column ("col1") of the dataset.
The columns that are used to help predict the Label are called Features. All of the columns in the dataset besides the Label are automatically selected as Features. In this case, the review comment column ("col0") is the Feature column. You can update the Feature columns and modify other data loading options in Advanced data options, but it is not necessary for this example.
After adding your data, go to the Train step.