Azure Machine Learning - introduction

February 12, 2017

There are a number of competing machine learning systems out there, and some of the best known are built by cloud service providers such as AWS and Azure.

Later, I will write a post comparing some of them, but today I would like to look at the Azure Machine Learning solution and learn what Microsoft has to offer on this front.

The first thing I noticed is how much documentation, and how many tutorials and examples, are available for Azure Machine Learning, so it should be easy to navigate the unknown territory. It is easy to begin by just hitting this url.

The Documentation link on the page brings you straight to the concepts you need to know to get going. There are even introductory videos about data science and how to approach your data to make it usable for Azure ML. The videos are a bit simplistic, but they can still be useful, so let's start with those and pick up some knowledge by going through a few of them.

Machine Learning and required algorithms

Firstly, it is important to know which families of algorithms the service uses to answer data-related questions.

This is the gist:

Classification algorithms - these help you separate the data into categories and predict which category each future item belongs to.
There are two-class (boolean) and multiclass options available. The first gives a straightforward either/or answer, while the second chooses among several categories and so depends more heavily on how well each category is represented in the data. The more varied the training examples, the better the prediction.
Anomaly Detection algorithms - this group assesses how far a data point deviates from the norm. This helps the user spot behavior that is unexpectedly different or just plain weird.
Regression algorithms - these provide numerical predictions from the data and help answer "how much" or "how many" questions.
Clustering algorithms - really helpful for finding patterns in a data set. Such algorithms are handy when you are trying to understand what the data entries have in common.
Reinforcement learning algorithms - this group provides an actionable answer based on the data. The algorithms learn by trial and error from the positive or negative feedback received after an action is taken.

All these categories can be divided into three broader groups:

Supervised algorithms, which learn from data sets with labeled entries and use the trained model to make predictions for new data. Classification, Anomaly Detection and Regression algorithms fall into this group.
Unsupervised algorithms, for which the data has no labels and the goal is to categorize the data or analyze its structure. Of the categories above, only Clustering falls into this group.
Reinforcement algorithms. This group has been described above; because these algorithms trigger actions and receive feedback on them, they are kept separate from the other two groups.
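
To make the supervised/unsupervised split a bit more concrete, here is a minimal sketch in plain Python. It is not how Azure ML implements anything; the data, the function names and the choice of a nearest-neighbour lookup and a two-group k-means are all my own assumptions, picked only because they are short. The first function needs the labels, the second one never sees them.

```python
from statistics import mean

# Hypothetical labeled data: (request amount, label).
labeled = [(12.0, "ok"), (15.0, "ok"), (14.0, "ok"),
           (950.0, "fraud"), (1020.0, "fraud")]


def nearest_neighbour_label(labeled, x):
    """Supervised: copy the label of the closest labeled example."""
    nearest = min(labeled, key=lambda pair: abs(pair[0] - x))
    return nearest[1]


def two_means(values, iterations=10):
    """Unsupervised: split unlabeled values into two groups (1-D k-means, k=2)."""
    lo, hi = min(values), max(values)  # start from the two extremes
    if lo == hi:  # no spread, nothing to separate
        return sorted(values), []
    for _ in range(iterations):
        # Assign each value to the closer of the two current centres.
        group_lo = [v for v in values if abs(v - lo) <= abs(v - hi)]
        group_hi = [v for v in values if abs(v - lo) > abs(v - hi)]
        # Move each centre to the mean of its group.
        lo, hi = mean(group_lo), mean(group_hi)
    return sorted(group_lo), sorted(group_hi)


print(nearest_neighbour_label(labeled, 900.0))
print(two_means([amount for amount, _ in labeled]))
```

The supervised call answers "which known category does this new value belong to?", while the unsupervised call only reports that the amounts fall into two natural groups and leaves it to you to decide what those groups mean.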

My main interest at the moment lies in the supervised group, so I will look at those algorithms in more detail in one of the next posts.

Data preparation

Secondly, to have a correctly working system, we must train it with a large set of high-quality data. One might argue about what the right size is and how clean the data should be, but it all boils down to collecting data that:
- Is relevant to the posed question.
- Has as few gaps among data entries as possible.
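
The "gaps" point is easy to check before any training happens. The snippet below is a small sketch under my own assumptions: the records, the column names and the use of `None` to mark a missing value are all made up for illustration and are not an Azure ML format.

```python
# Hypothetical records; None marks a gap (a missing value).
rows = [
    {"amount": 12.0, "country": "NZ", "label": "ok"},
    {"amount": None, "country": "NZ", "label": "ok"},
    {"amount": 950.0, "country": None, "label": "fraud"},
    {"amount": 14.0, "country": "AU", "label": "ok"},
]


def count_gaps(rows):
    """Return a dict mapping each column to its number of missing values."""
    gaps = {}
    for row in rows:
        for column, value in row.items():
            gaps[column] = gaps.get(column, 0) + int(value is None)
    return gaps


def drop_incomplete(rows):
    """Keep only the rows that have a value in every column."""
    return [row for row in rows if all(v is not None for v in row.values())]


print(count_gaps(rows))
print(len(drop_incomplete(rows)), "complete rows out of", len(rows))
```

Dropping incomplete rows is only one option, and on a small data set it can throw away a lot. Filling gaps with a typical value is another, but which one is right depends on the question being asked.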

Asking the right question

After watching the video in this section, the question I asked myself was: what exactly do I want to know? A question about the same topic can be worded differently, and a different algorithm would then be used to find the answer.
Example:
- Is this request fraudulent? The answer is a yes or a no, so a classification algorithm would be best for it.
- What are the odds of this request being fraudulent? This is more of a regression algorithm domain, provided that you can express the risk as a number.
- How weird is this request? Anomaly detection will find the answer here.

I think I would have to play with all three wordings for this and see which one makes more sense.
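
As a first experiment, the three wordings can be sketched as three functions that return three different kinds of answer. None of this is a trained model. The history values, the threshold of 3 standard deviations and the way the score is squashed into a 0 to 1 range are assumptions I picked purely to show the shape of each answer.

```python
from statistics import mean, pstdev

# Hypothetical amounts of past, normal requests (must not all be equal).
history = [12.0, 15.0, 14.0, 13.0, 16.0]
FRAUD_THRESHOLD = 3.0  # assumed cut-off, in standard deviations


def weirdness(amount, history):
    """'How weird is this request?' -> distance from the norm in standard deviations."""
    return abs(amount - mean(history)) / pstdev(history)


def fraud_score(amount, history):
    """'What are the odds?' -> a number in [0, 1); higher means riskier."""
    z = weirdness(amount, history)
    return z / (1.0 + z)


def is_fraudulent(amount, history):
    """'Is this request fraudulent?' -> a plain yes or no."""
    return weirdness(amount, history) > FRAUD_THRESHOLD


for amount in (15.0, 100.0):
    print(amount, is_fraudulent(amount, history),
          round(fraud_score(amount, history), 2),
          round(weirdness(amount, history), 2))
```

All three functions look at the same request, but the yes/no answer hides how close the call was, the score needs someone to decide what counts as too risky, and the weirdness value says nothing about fraud at all until a human interprets it.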

Azure Machine Learning Gallery

Azure Machine Learning Gallery, also known as Cortana Intelligence Gallery, is a repository of experiments created by members of the Azure community. These experiments can be copied to your personal project space, where you can see how the data was manipulated, which algorithms were selected and how the system was trained. I think this is a great starting point. Besides, looking at patterns such as price growth predictions or restaurant ratings is entertaining in its own right. Another note: after the last US elections there is a new "Fake news" trend, and you can find related experiments in the gallery that judge whether a piece of news can be considered "fake" or not. I wonder if Facebook is using something similar for their content management.
Further example: this experiment illustrates the usage of Azure ML models for letter recognition, a very interesting read. You can also copy it and play with the data yourself.
