Download the Wikipedia detox dataset and save it as wikipedia-detox-250-line-data.tsv in the myMLApp directory you created.
Each row in wikipedia-detox-250-line-data.tsv represents a different review left by a user on Wikipedia. The first column represents the sentiment of the text (0 is non-toxic, 1 is toxic), and the second column represents the comment left by the user. The columns are separated by tabs. The data looks like the following:
Sentiment	SentimentText
1	==RUDE== Dude, you are rude upload that carl picture back, or else.
1	== OK! == IM GOING TO VANDALIZE WILD ONES WIKI THEN!!!
0	I hope this helps.
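If you want to check the file's layout before loading it into Model Builder, the following is a minimal sketch in Python that parses the tab-separated format described above. It is not part of the Model Builder workflow. The function name read_rows is hypothetical, and the sample is embedded as a string so the snippet runs without the downloaded file.

```python
import csv
import io

# Three sample rows in the tab-separated layout described above.
SAMPLE = (
    "Sentiment\tSentimentText\n"
    "1\t==RUDE== Dude, you are rude upload that carl picture back, or else.\n"
    "1\t== OK! == IM GOING TO VANDALIZE WILD ONES WIKI THEN!!!\n"
    "0\tI hope this helps.\n"
)


def read_rows(handle):
    """Return a list of (sentiment, text) tuples from a tab-separated file.

    read_rows is a hypothetical helper for illustration only.
    """
    # QUOTE_NONE keeps quotation marks inside comments as literal characters.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [(int(row["Sentiment"]), row["SentimentText"]) for row in reader]


rows = read_rows(io.StringIO(SAMPLE))
for sentiment, text in rows:
    print(sentiment, text)
```

To run the same check against the real file, open wikipedia-detox-250-line-data.tsv with open(path, newline="", encoding="utf-8") and pass the handle to read_rows. The UTF-8 encoding is an assumption about the file.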
In Model Builder, you can add data from a local file or connect to a SQL Server database. In this case, you'll add wikipedia-detox-250-line-data.tsv from a file.
Select File as the input data source in the drop-down, and in Select a file find and select wikipedia-detox-250-line-data.tsv.
Under Column to predict (Label), select "Sentiment." The Label is what you're predicting, which in this case is the Sentiment found in the first column of the dataset.
The columns that are used to help predict the Label are called Features. In this case, the review comment is the Feature, so leave "SentimentText" checked as the Input Column (Feature).
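The distinction between the Label and the Features can be sketched outside Model Builder as a simple column split. The Python snippet below is an illustration only and does not reflect how Model Builder handles the columns internally. The function name split_label_and_features and the example rows are hypothetical.

```python
def split_label_and_features(rows, label_column="Sentiment"):
    """Separate the column to predict (Label) from the input columns (Features).

    Hypothetical helper: each row is a dict keyed by column name.
    """
    labels = [row[label_column] for row in rows]
    features = [
        {name: value for name, value in row.items() if name != label_column}
        for row in rows
    ]
    return labels, features


# Two example rows shaped like the dataset's columns.
example_rows = [
    {"Sentiment": "1", "SentimentText": "== OK! == IM GOING TO VANDALIZE WILD ONES WIKI THEN!!!"},
    {"Sentiment": "0", "SentimentText": "I hope this helps."},
]

labels, features = split_label_and_features(example_rows)
print(labels)
print(features)
```

Here the Label list holds only the Sentiment values, and each Features entry holds the remaining column, SentimentText, which is the text the model learns from.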
After adding your data, go to the Train step.