From Zero to a first Classification Model with Fast.ai

The goal of this article is to give a quick first introduction to fast.ai and show how to create a first model.

I did it the old way, downloading the files and exporting them to Drive, before noticing that you can directly import data from Kaggle to Google Colab… Will do better next time.

For now, I just created the Jupyter file and the folder with the data in the same subfolder.

Let’s first define the place I can find the data:
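Since the exact path depends on your setup, here is a minimal sketch of how that location can be defined; the `flowers/` folder name is my assumption for the subfolder holding the data:

```python
from pathlib import Path

# Assumed layout: the notebook and the data share the same subfolder,
# with one directory per class (e.g. flowers/roses, flowers/tulips, ...).
path = Path('flowers')
if path.exists():
    # List the class folders to confirm the structure.
    print([d.name for d in path.iterdir() if d.is_dir()])
```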

Checking the data structure, notice that all the pictures are already organized into folders by label.

We normalize the data so that the network receives inputs on a consistent scale, which lets it learn as efficiently as possible.

Now that we have all the data loaded, let’s make sure everything is good:

Now comes the part I'm not really proud of, but I hope it helps others avoid the same mistake.

I started the training for 4 epochs (an epoch is one full pass over all the data):

89% accuracy with the default setup.

Unsurprisingly, this took way too long to process: 4.5 minutes. Indeed, each epoch goes through 4,000+ pictures, four times in total. Good results, though.

So before trying anything else, I created a smaller data set of ~400 pictures.
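Here is one way to build such a subset with the standard library; the folder layout and the per-class count are my assumptions:

```python
import random
import shutil
from pathlib import Path

def make_subset(src, dst, per_class=80, seed=42):
    """Copy a random sample of images from each class folder into a smaller tree."""
    random.seed(seed)
    src, dst = Path(src), Path(dst)
    for cls in (d for d in src.iterdir() if d.is_dir()):
        files = [f for f in cls.iterdir() if f.is_file()]
        out = dst / cls.name
        out.mkdir(parents=True, exist_ok=True)
        for f in random.sample(files, min(per_class, len(files))):
            shutil.copy(f, out / f.name)
```

With 5 classes, `per_class=80` gives the ~400 pictures used here; the small tree can then be loaded exactly like the full one.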

We could surely target a better LR, between 5e-3 and 2e-2, at the end of the slope.

Let's train the model on those 400 pictures with the following learning rate:

88% accuracy in 1 min time (for twice the original epochs).

Now let's analyze the learning rate/loss relationship again after unfreezing the model. This will help us get the best out of our CNN while minimizing the loss.

After plotting, we can define a better LR range: from 1e-6 to 2e-4.

The result is good enough for me: a 3.2-point accuracy improvement over the original model, ending at 91.2% accuracy.

Now that I’m happy with the results, I can run the same setup on the full data, for more epochs.

94.3% accuracy. Not bad; I could not hope for better for a first try.

It's always interesting to check where the biggest mistakes were and see whether they are honest mistakes or point to a real problem in the model.

Let’s first analyze the overall error.

Confusion Matrix

We can see that the model has issues classifying the roses and tulips. This doesn't surprise me much, as I have trouble distinguishing them myself before they are fully bloomed. All the other ratios look reasonably good.

Let's now check the worst misclassifications.

I have no idea what the fifth picture is, and the painting on the bottom left is of course not easy to classify.

We can see 3 tulip/rose issues, as noticed on the confusion matrix. The other pictures are either too zoomed out or zoomed in during early bloom, which makes them hard to analyze. It’s all good.

That's all for today :-)

The next article will be about Learning Rate and what it really is.

If you have any feedback either to improve this or to ask a question, feel free to ask!

Thanks also to Rémi for his article, which helped me shape mine.
