Following up on the Adult Census Income tutorial for Azure Machine Learning experiments

As I mentioned in one of my previous posts, Azure provides very detailed and robust examples when it comes to Machine Learning.
One such helpful piece is a tutorial that gets you started and explains how to use ML Studio. To see it in action, log in to the studio, select experiments, click the "New" button and launch the tutorial from the dialog that appears.

Creating new experiments, accessing samples and tutorials.
Below, I cover the tutorial steps and add some personal notes that I found useful, since the tutorial is very practical but light on explanation.

  1. Add a data set to the experiment
    The left-hand side of ML Studio features a panel with the basic modules for an experiment. It has search capabilities, which is quite handy, and includes the "Saved Datasets" section from which you can select sample data.
    In a real-life scenario you would probably use a combination of other sections to get data into the experiment: "Data Input and Output", "Data Format Conversion" and "Data Transformation". Unfortunately, not all of the documentation is finalized at this point: certain pages show almost nothing and quickly redirect you to the main Azure Reference page.

    As you can see, the functionality for getting data into the system is much more flexible than in AWS, which at the moment only supports S3, Redshift and MySQL on RDS.

    In any case, for a basic example, you can just select an existing data set.
  2. Split the data set into "training" and "testing" blocks
    This is done in order to not only train, but also verify your model. Hurray, science! The process is straightforward: select the "Split Data" module from the "Data Transformation" section and set the split percentage. At this stage you can see the beauty of the GUI in the process. Dragging, dropping, connecting...
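
As a rough local analogue of the split in step 2, here is a minimal Python sketch using only the standard library. The function name `split_rows`, the 0.75 fraction and the seed are my own assumptions for illustration, not values taken from the tutorial or from the "Split Data" module itself.

```python
import random


def split_rows(rows, train_fraction=0.75, seed=42):
    """Shuffle rows reproducibly and cut them into (train, test) lists.

    Hypothetical helper: it only mimics the idea of a randomized split
    with a fixed fraction and seed.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(rows)                  # copy, so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # same seed gives the same split
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Stand-in for the sample data set: 100 numbered rows.
rows = list(range(100))
train, test = split_rows(rows, train_fraction=0.75)
print(len(train), len(test))  # 75 25
```

Fixing the seed keeps the split repeatable, so scores from different runs stay comparable.
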
  3. Select a Machine Learning algorithm and train the model
    With little explanation, the tutorial selects the "Two-Class Boosted Decision Tree" algorithm for the model. Again, real-life scenarios would benefit from some consideration and reflection on the choice of algorithm. This is one of the most important parts of the whole process. Some helpful information can be found in the cheat-sheet here and a blog-post here.
    Once the algorithm is selected, it is time to train the model. Again, it boils down to connecting the dots between the modules in use: the algorithm and the training portion of the data set are the inputs, and the trained model is the output. The "Train Model" module needs to have a column selected from the data set. By doing this, you specify the target of the experiment, that is, the column the model should learn to predict.
  4. Make predictions
    At this point we have our brand new shiny model, but it has not been assessed and we have no idea how good it is. Azure provides two additional modules that allow us to analyze our results quantitatively: "Score Model" and "Evaluate Model". Again, not all of the documentation has been finalized at this stage, and the site redirects you so quickly that it is difficult to read any text on the page. Here is a screenshot:

    We just need to connect the modules and run the experiment. After the run is finished, you can visualize the results and assess the scores by right-clicking on the modules and selecting the "Visualize" option from the context menu.

    That is it! If the data is right and the algorithm is chosen correctly for the task, you can now see the results of your effort and even check how good they are!
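
To make "how good the result is" a bit more concrete, here is a small Python sketch of the kind of numbers a two-class evaluation reports: accuracy, precision, recall and F1, computed from true labels and scored probabilities. The function name `evaluate`, the 0.5 threshold and the sample values are made up for illustration. This is not the "Evaluate Model" implementation.

```python
def evaluate(actual, scored_probabilities, threshold=0.5):
    """Return accuracy, precision, recall and F1 for a two-class problem.

    actual: list of booleans (True = positive class).
    scored_probabilities: model scores between 0 and 1, in the same order.
    """
    if not actual or len(actual) != len(scored_probabilities):
        raise ValueError("inputs must be non-empty and of equal length")
    predicted = [p >= threshold for p in scored_probabilities]
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # true positives
    fp = sum(1 for a, p in pairs if not a and p)      # false positives
    fn = sum(1 for a, p in pairs if a and not p)      # false negatives
    tn = sum(1 for a, p in pairs if not a and not p)  # true negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(actual),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels and scores, standing in for the scored test rows.
actual = [True, True, True, False, False, False, False, True]
scored = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]
metrics = evaluate(actual, scored)
print(metrics)
```

With these eight made-up rows there are three true positives, three true negatives, one false positive and one false negative, so all four metrics come out at 0.75.
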
  5. Set up a web service
    At this stage we have done the most important part of the work, and now it would be good to expose the model to the world so that it actually adds some value. Azure allows us to create a web service on top of the model, which is quite convenient. I am not so sure about access restrictions, security, scalability and other concerns, but at least we can test it with our apps and see how viable the idea is. Here is how we do it:
    1. Trigger the web service setup process
      In the previous screenshot you can see the "Set up Web service" button. Clicking it starts the process. Once this is done, you will see the "Predictive experiment" tab in ML Studio. The system will automatically redirect you to it and some more modules will appear: "Web Service Input" and "Web Service Output". You will also notice the "Trained Model" module that we now have. This is the result of the experiment that we created previously.
    2. Infer the input/output datasets
      Now we need to provide the datasets for the web service to work with. Basically, these are the set of input parameters and the output data model. Both are set up in much the same way and differ only in the column selection. Drag two "Select Columns in Dataset" modules onto the workspace. Connect the one you are going to use for input to our sample dataset (now we can select the columns based on the data) and the other one to the "Score Model" module (here we can select the probabilities).
    3. Run
      After this, all the dots are connected once again and we hit run to get the experiment going. If all is good we should see the green ticks for the new data set modules.
    4. Deploy 
      Deploying the web service is as easy as clicking the "Deploy web service" button next to "Run". Once you click it, you will be redirected to the Web Services tab of ML Studio, where you can see the new web service in the list of endpoints.

      The web service can be used in batch mode or by sending individual requests. If you click the "Test" button, you will see an input form for the API. After submitting it, you will see the result of your effort.
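
For the individual-request case, the sketch below builds (but does not send) an HTTP request in Python using only the standard library. The payload layout (`Inputs`, `input1`, `ColumnNames`, `Values`, `GlobalParameters`) and the bearer-token header reflect my understanding of the sample code ML Studio shows for a deployed service, so treat them as assumptions and compare them with the API help page of your own endpoint. The URL, API key, column subset and values are placeholders.

```python
import json
import urllib.request


def build_scoring_request(url, api_key, column_names, rows):
    """Build a POST request for a single scoring call (nothing is sent here).

    Hypothetical helper: the payload shape is an assumption, see above.
    """
    body = {
        "Inputs": {
            "input1": {
                "ColumnNames": list(column_names),
                # Values are sent as strings, one inner list per input row.
                "Values": [[str(value) for value in row] for row in rows],
            }
        },
        "GlobalParameters": {},
    }
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,
    }
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


request = build_scoring_request(
    url="https://example.invalid/execute?api-version=2.0",  # placeholder URL
    api_key="YOUR_API_KEY",                                  # placeholder key
    column_names=["age", "education", "hours-per-week"],     # illustrative subset
    rows=[[39, "Bachelors", 40]],
)
payload = json.loads(request.data)
print(request.get_method(), payload["Inputs"]["input1"]["Values"])

# To actually call a real endpoint you would then run:
# response = urllib.request.urlopen(request).read()
```

The columns you send have to match whatever you selected for the web service input in the predictive experiment.
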
