Monday, February 19, 2018

Beginner's guide to Artificial Intelligence

We've seen a lot of hype when it comes to AI. We also know about Sophia, the first robot to become a citizen. Some people have expressed concerns about AI being dangerous, while others are very excited about the endless possibilities it brings with it.
Along with this excitement come a lot of questions. The amount of information you get when you google AI is overwhelming. So, in today's post, I intend to answer some of those questions and give readers useful links to get started.

Now, there are three important things one needs to remember to understand AI:
  1. Lots of data is required.
  2. Artificial Neural Networks are the way to go.
  3. Training will take a lot of time.
The basic concept of AI is that it "learns" on its own from examples; you don't need to "hard-wire" it to do something specific when it is given a task.

Let's take an example:
This one is for EC (Electronics and Communication) students -
If you have to code a simple logic gate like NAND, you would write down its truth table, create the respective variables, and assign them using a nested if-else or a switch case. Your work is done!
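A minimal sketch of that hard-wired approach in Python (the function name and layout are just my illustration):

    # Hard-wired NAND: every input combination is handled explicitly.
    def nand(a, b):
        # NAND outputs 0 only when both inputs are 1.
        if a == 1 and b == 1:
            return 0
        else:
            return 1

    # Truth table check:
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, '->', nand(a, b))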
Now suppose you bring AI into the picture; the scenario changes -
  • Here, you create a model and train it by "teaching" it what a NAND gate does.
  • Later, the same model can be trained to produce outputs for other logic gates as well, by making it learn from examples.
  • While it is training, it'll give ridiculous outputs at first and get "rewarded" or "punished" accordingly. (Covered later in the post)
  • Based on how it is rewarded, it improves its own outputs and, over time, predicts accurately.

So the next time you ask, 'what is the output of A OR B?', it'll still be able to answer accurately, even though you thought you had created a model just for the NAND gate. In reality, you made a general logic-gates model, and the AI learnt from examples how to produce the output for that OR gate!
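One way to picture 'learning by example': the examples themselves are just input-output pairs. Here's a hypothetical encoding (entirely my own, for illustration) where the gate's identity is an extra input, so one model can cover several gates:

    # Each training example is just (inputs) -> expected output.
    # gate_id is a made-up code: 0 = NAND, 1 = OR.
    examples = [
        ((0, 0, 0), 1), ((0, 0, 1), 1), ((0, 1, 0), 1), ((0, 1, 1), 0),  # NAND
        ((1, 0, 0), 0), ((1, 0, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), 1),  # OR
    ]
    # A model trained on these pairs is never told the rules of NAND or OR;
    # it only ever sees inputs and the outputs it should have produced.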
Now, you might think that all of this sounds great, but how do I actually implement it?
By using Artificial Neural Networks!

Now let me introduce you to ANNs:
(Image: a neural network. Credit: Medium)

The concept of ANNs comes from a mathematical object called the 'perceptron'. If I combine multiple such mathematical elements together, in some order, I get a neural network! (I won't be jumping into the math of ANNs, as I have saved that for another post)
Generally, a neural network consists of these perceptrons arranged in an input layer, one or more hidden layers, and an output layer.
If I go back to the logic gates example: since I have 2 inputs, I'll have 2 perceptrons in my input layer. Since there are 4 input combinations, I'll use 4 perceptrons in my hidden layer (the size of the hidden layer is really a design choice; 4 is just a convenient pick here). And since I get a single output, I'll have just one perceptron in my output layer. The hidden layer is where the 'magic' happens.
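As a sketch of that 2-4-1 layout, here's how data would flow through it using NumPy (the weights here are random, i.e. untrained; this only shows the shapes and the flow):

    import numpy as np

    np.random.seed(0)  # so the 'random' weights are reproducible

    # 2 inputs -> 4 hidden perceptrons -> 1 output perceptron
    w_hidden = np.random.randn(2, 4)   # weights between input and hidden layer
    w_output = np.random.randn(4, 1)   # weights between hidden and output layer

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    x = np.array([1.0, 0.0])             # one input pair, e.g. A=1, B=0
    hidden = sigmoid(x @ w_hidden)       # 4 activations, one per hidden perceptron
    output = sigmoid(hidden @ w_output)  # a single number between 0 and 1
    print(output)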
(Image: a perceptron. Credit: http://ataspinar.com/2016/12/22/the-perceptron/)
Basically, a perceptron takes one or more inputs, and every input has a 'weight' attached to it. The perceptron multiplies each input by its weight, adds the results together, and turns that sum into an output. These weights are the values that can be changed to vary the perceptron's output, and they are the reason ANNs can 'learn'.
Learning involves changing these weights according to some criterion so that the output of the perceptron changes. This criterion is decided by "rewards" and "punishments".
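Here's what a single perceptron looks like as a short Python sketch (I've added a bias term and a step activation, which most textbook versions include):

    # A perceptron: multiply inputs by weights, add a bias, apply a step function.
    def perceptron(inputs, weights, bias):
        total = bias + sum(w * x for w, x in zip(weights, inputs))
        return 1 if total > 0 else 0

    # With these hand-picked weights, it behaves exactly like a NAND gate:
    weights, bias = [-1.0, -1.0], 1.5
    print(perceptron([0, 0], weights, bias))  # 1
    print(perceptron([1, 1], weights, bias))  # 0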

Suppose I'm teaching my perceptron and I know the output should be 1. Initially, the weights are set at random. If the perceptron does give me a 1, I 'reward' it.
Rewarding means nudging the weights so that the output stays what it was (in this case, 1).
If I get a 0 instead, I 'punish' it.
Punishing means nudging the weights the other way, so that I stop getting the output I just got (in this case, 0).
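In code, 'reward' and 'punish' collapse into a single update - this is the classic perceptron learning rule. A sketch (the learning rate of 0.1 is an arbitrary pick):

    # One training step for a perceptron (inputs, weights and bias as above).
    def train_step(inputs, weights, bias, target, learning_rate=0.1):
        # Forward pass: the same weighted-sum-plus-step as the perceptron sketch.
        total = bias + sum(w * x for w, x in zip(weights, inputs))
        output = 1 if total > 0 else 0
        error = target - output  # +1, 0 or -1
        # error == 0: the answer was right, so the weights stay put (the 'reward').
        # error != 0: every weight is nudged towards the right answer (the 'punishment').
        new_weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
        new_bias = bias + learning_rate * error
        return new_weights, new_bias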
Now the question arises: how do we reward or punish the network?
Here comes the 'loss function', which is simply the difference between the output the network gave and the output it should have given. This difference has to be minimized.
So how do I minimize this loss function?
The simplest way to do that is 'gradient descent', which means you reduce the loss in small steps until it is as close to zero as you can get. My next post will contain more detail on this.
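Here's a toy sketch of gradient descent on a one-variable loss, just to show the 'small steps' idea (the loss function and step size are my own picks):

    # Minimize loss(w) = (w - 3)**2 by stepping against its slope.
    # The slope (gradient) of this loss is 2 * (w - 3).
    w = 0.0       # an arbitrary starting guess
    step = 0.1    # the size of each small step (the 'learning rate')

    for _ in range(50):
        gradient = 2 * (w - 3)
        w = w - step * gradient   # move downhill, against the slope

    print(w)  # very close to 3, where the loss is smallest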

So the story so far is that the network learns using three things: changing its weights, evaluating the loss function, and minimizing that loss (gradient descent).
A classic use of ANNs is distinguishing between images. Collections of labelled images, called 'datasets', are available online and can be used to make your network learn.
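For instance, scikit-learn ships a small dataset of handwritten digit images you can experiment with (assuming you have scikit-learn installed):

    from sklearn.datasets import load_digits

    digits = load_digits()       # ~1800 labelled 8x8 images of handwritten digits
    print(digits.data.shape)     # (1797, 64): each image flattened into 64 numbers
    print(digits.target[:10])    # the known, correct labels for the first 10 images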

Now that you know how the network learns, it's time to dive in and see how this helps in a real-world application.

Let's take a look at the generalized steps:
  • First, you pick an algorithm of your choice and run a dataset, along with its known outputs, through your network.
  • The network 'trains' itself from the data you just gave it. This step takes a lot of time.
  • At the end of the training, the network will have adjusted its weights to get the right output on this large dataset. (The settings you choose beforehand, such as the learning rate or the number of layers, are called 'hyperparameters'; these aren't learned, you tune them yourself.)
  • Now, you test it - you send in images the network hasn't seen before and measure its accuracy. If it's good, yay! If it isn't, you change the algorithm (or the hyperparameters) and try again.
  • After you've got your optimized algorithm and trained your network, you can export the trained network's architecture and learned weights (for example, as JSON or XML files), deploy them wherever you like, and it'll work like magic!
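Putting those steps together, here's a minimal end-to-end sketch using Keras, one popular library for this (the layer sizes and epoch count are arbitrary picks, not tuned values):

    import numpy as np
    from keras.models import Sequential
    from keras.layers import Dense

    # Step 1: the dataset - the NAND truth table, with known outputs.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y = np.array([1, 1, 1, 0])

    # Steps 2-3: build a small network and train it (the slow part).
    model = Sequential()
    model.add(Dense(4, input_dim=2, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.fit(X, y, epochs=500, verbose=0)

    # Step 4: test it and check the predictions.
    print(model.predict(X).round())   # ideally [[1], [1], [1], [0]]

    # Step 5: export the trained network for deployment.
    with open('model.json', 'w') as f:
        f.write(model.to_json())       # the architecture, as JSON
    model.save_weights('weights.h5')   # the learned weights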

So, this is how ANNs can be used to do things without being explicitly coded for them. This is a first step towards a generalized AI, where a lot more goes in.

These steps describe a learning process called 'supervised learning', where you know the real-world output and you're training the network to reproduce it. There are two other learning formats - semi-supervised learning and unsupervised learning. I'll cover these two in a later post.

That's it for today's post. Stick around to know the math behind all this, which will be the topic for my next post. I'll also put up useful links there and several helpful videos.