machine learning

Using the ML.NET CLI tool or the new Model Builder extension for Visual Studio to automate model generation and training

ML.NET is the open-source machine learning framework created by Microsoft for the cross-platform .NET developer platform. Whatever your experience with machine learning or with ML.NET itself, this post should have something for you.

ML.NET can get a little more difficult once you venture outside the samples and demo applications provided. Deciding which model is best suited to your data, training your model to an accuracy you are happy with, and getting some sample code up and running to test predictions can all be time-consuming.

Enter the ML.NET Model Builder and CLI tools, which speed up that process by trying to find the best model for a given task, such as regression or classification.

In this post we will explore both the CLI tool and the new Model Builder extension for Visual Studio. We will cover how to get everything installed, how to have the tools suggest a model automatically, and how to scaffold some initial code to start testing predictions against your dataset.

I'm going to use a dataset from my local government's open data website: Northern Ireland Population by Age and Gender, which you can find here

If you wish to use another dataset, you can find a huge selection on the Awesome Public Datasets GitHub page here

Download the dataset in CSV format. We will copy it into our project folder once that folder has been created.

First up, let's get started with the CLI tool. We will go through a similar process for the Model Builder later in this post. From your command line, install the tool with the following command:

dotnet tool install -g mlnet

Once the tool is installed, create a folder for your machine learning project

mkdir PopulationML
cd PopulationML

Now copy the CSV file you downloaded into your newly created project folder. In my case, running the following command moves it from my Downloads folder to my new project folder.

mv ..\..\Downloads\northern-ireland-by-single-year-of-age-and-gender-mid-1971-to-mid-2018.csv .

We are now ready to run the ML.NET CLI tool against our data. We are going to try to predict the 'Population_Estimate' column in the CSV file, so we provide that as our label name

mlnet auto-train --task regression --dataset .\northern-ireland-by-single-year-of-age-and-gender-mid-1971-to-mid-2018.csv -n Population_Estimate

Once you run this command, the tool will experiment with each of the algorithms available for the selected task, in our case regression. This process may take a while; in my case it took approximately 30 minutes to complete.

Once complete, you will be presented with a summary of the experiment results, together with a trained model ready for use and some example code for testing it. You can see from the results below that FastTreeTweedieRegression was the best-performing algorithm for our dataset

RegressionResults
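Regression experiments like this one are typically compared on metrics such as R-squared and root mean squared error (RMSE). As a rough sketch of what those two numbers mean, here is a small plain C# program that computes them by hand. It does not use ML.NET, the class and method names are my own, and the values are made up purely for illustration.

```csharp
using System;
using System.Linq;

public static class RegressionMetricsSketch
{
    // Root mean squared error: the typical size of a prediction error,
    // expressed in the same units as the label column.
    public static double Rmse(double[] actual, double[] predicted)
    {
        if (actual.Length == 0 || actual.Length != predicted.Length)
            throw new ArgumentException("Arrays must be non-empty and of equal length.");

        double sumSquaredError = actual.Zip(predicted, (a, p) => (a - p) * (a - p)).Sum();
        return Math.Sqrt(sumSquaredError / actual.Length);
    }

    // R-squared: 1 - (residual sum of squares / total sum of squares).
    // A value near 1 means the predictions explain most of the variance.
    // Note: if every actual value is identical, the total is 0 and the result is not meaningful.
    public static double RSquared(double[] actual, double[] predicted)
    {
        if (actual.Length == 0 || actual.Length != predicted.Length)
            throw new ArgumentException("Arrays must be non-empty and of equal length.");

        double mean = actual.Average();
        double residual = actual.Zip(predicted, (a, p) => (a - p) * (a - p)).Sum();
        double total = actual.Sum(a => (a - mean) * (a - mean));
        return 1.0 - residual / total;
    }

    public static void Main()
    {
        // Made-up values for illustration only, not taken from the population dataset.
        double[] actual = { 100, 200, 300, 400 };
        double[] predicted = { 110, 190, 310, 390 };

        Console.WriteLine($"RMSE: {Rmse(actual, predicted):n2}");
        Console.WriteLine($"R-squared: {RSquared(actual, predicted):n4}");
    }
}
```

When comparing the models in the summary, a higher R-squared and a lower RMSE both point to a better fit.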

Within your project folder there will now be a SampleRegression folder containing your newly created model and sample code. Open that folder in your editor of choice. In my case that is VS Code, which you can launch from the command line as follows

cd SampleRegression
code .

You will now see two projects: one containing your model, and a console application you can run to test it.

If you run the console application you should see a console output showing the prediction for the first row of data in your dataset.

SinglePrediction

To see a random prediction rather than the first every time, open the Program.cs file and replace the following code

 ModelInput sampleForPrediction = mlContext.Data.CreateEnumerable<ModelInput>(dataView, false).First();

with

var random = new Random();
// Materialise the rows once so that counting and indexing do not each re-read the data
var list = mlContext.Data.CreateEnumerable<ModelInput>(dataView, false).ToList();
int index = random.Next(list.Count);
ModelInput sampleForPrediction = list[index];

When you run the console application again, it should now return a prediction for a random row from our dataset. You could go further and print 100 random predictions by replacing the single call in the Main method with a loop, tidying the output to include some additional fields from the input row.

Replace the following

    // Create sample data to do a single prediction with it 
    ModelInput sampleData = CreateSingleDataSample(mlContext, DATA_FILEPATH);

    // Try a single prediction
    ModelOutput predictionResult = predEngine.Predict(sampleData);

    Console.WriteLine($"Single Prediction --> Actual value: {sampleData.Population_Estimate} | Predicted value: {predictionResult.Score}");

with

 for (int i = 0; i < 100; i++)
 {
    // Create sample data to do a single prediction with it 
    ModelInput sampleData = CreateSingleDataSample(mlContext, DATA_FILEPATH);

    // Try a single prediction
    ModelOutput predictionResult = predEngine.Predict(sampleData);

    Console.WriteLine($"Predicted population of {predictionResult.Score:n0} in {sampleData.Mid_Year_Ending} for {sampleData.Gender} aged {sampleData.Age} (actual {sampleData.Population_Estimate:n0})");
 }

Running again now will give us a bigger sample of predictions

Multiple2
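One thing to be aware of: CreateSingleDataSample takes the file path, so a loop like this most likely re-reads the CSV on every iteration, and it can pick the same row more than once. If that matters, an alternative is to load the rows once and draw distinct rows from that list. Below is a minimal sketch of the sampling part only, in plain C#. The RowSampler class and its Sample method are hypothetical helpers of my own, not part of the generated code or of ML.NET, and the list of years simply stands in for the real rows.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RowSampler
{
    // Returns `count` distinct items chosen uniformly at random from `rows`,
    // using a partial Fisher-Yates shuffle over a copy of the list.
    public static List<T> Sample<T>(IReadOnlyList<T> rows, int count, Random rng)
    {
        if (count < 0 || count > rows.Count)
            throw new ArgumentOutOfRangeException(nameof(count));

        var pool = rows.ToList(); // copy so the caller's list is left untouched
        for (int i = 0; i < count; i++)
        {
            int j = rng.Next(i, pool.Count); // pick from the not-yet-chosen tail
            (pool[i], pool[j]) = (pool[j], pool[i]);
        }
        return pool.GetRange(0, count);
    }

    public static void Main()
    {
        // Stand-in for rows loaded once from the CSV (hypothetical data: the years 1971 to 2018).
        var rows = Enumerable.Range(1971, 48).ToList();
        var rng = new Random();

        foreach (int year in Sample(rows, 5, rng))
            Console.WriteLine(year);
    }
}
```

In Program.cs the idea would be to build the list once with mlContext.Data.CreateEnumerable<ModelInput>(dataView, false).ToList(), pass it to Sample with a count of 100, and call predEngine.Predict on each returned row.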

Now let's try the new Model Builder extension for Visual Studio to build and train our model and generate sample code.

First download the extension from here and get it installed.

Next, launch Visual Studio and create a new C# Console App (.NET Core)

Once it is created, you can add the machine learning functionality by right-clicking the project in Solution Explorer and choosing Add > Machine Learning

Add_ML

Next select 'Custom Scenario'

CustomScenario

Now browse to the CSV file downloaded earlier in this post (you can either reference it from the other project folder or copy it into this project folder). Once you add the CSV file, a preview of your data will be shown and you will be asked to select a column to predict. In this case we want to use 'Population_Estimate'

SelectFile

Click 'Train' at the bottom to progress. On the next screen, select 'regression' as the machine learning task and click 'Start Training'. On my machine the default 'Time to train' of 10 seconds worked OK, but you may want to increase it, say to 30 seconds, if the total number of models explored is too low.

Train2

Once this step has completed, click 'Evaluate' at the bottom to see a summary of the training, including the models that were evaluated and the accuracy of each. In this case 'LightGbmRegression' was the best model for our data.

evaluate

We are now ready to generate our sample code projects. Click 'Code' at the bottom of the page, then 'Add projects'. You should now see a list of next steps, and your sample code and console app should be available in Solution Explorer

Code

To test your model, set the ConsoleApp project to be the startup project in Visual Studio and hit F5. It should then launch your console application and return the prediction for the first row from the sample dataset.

If, however, you get errors about duplicate assembly attributes, you can resolve them by setting GenerateAssemblyInfo to false in your csproj files, for example

  <PropertyGroup>
    <TargetFramework>netstandard2.0</TargetFramework>
    <GenerateAssemblyInfo>false</GenerateAssemblyInfo>
  </PropertyGroup>

Your output should be similar to the following

Finally you can view the code generated to train your model under the console app project in the 'ModelBuilder.cs' file

ExplorerCode

As you can see, with the CLI tool and the new Model Builder extension for Visual Studio you can quickly determine the best algorithm for your dataset, train an initial model, and get to the point of testing that model or integrating it into your application.

Download the completed sample application from my GitHub

You can also find out more about ML.NET from here