AI Explorations: Exploring Data, Statistics, Machine Learning and AI

Finishing up the Columbia University + Emeritus Post Graduate Diploma in ML and AI

I have had a very interesting nine months or so deepening my fundamental skills and learning new skills in AI and Machine Learning. With the Coronavirus crisis and the associated disruption to all our lives across the world in a matter of mere months, it seems like the world has changed overnight. It is now more than ever that skill development and improvement can sustain us all through tough times and unforeseen challenges.

In this context, the PGDMLAI program at Columbia was, in retrospect, a well-crafted program that pushed my boundaries. It developed genuinely new skills and changed the way I think about AI and ML problem statements, even though I have been an active industry professional in the AI and ML space for several years.

On the AI side, most of my new explorations have been in the realms of advanced deep learning and reinforcement learning. In the last month or two, for example, I’ve explored FaceNet, MTCNNs, GANs and DeepRL techniques. In this context, learning about search techniques, Markov decision processes, constraint satisfaction problems (CSPs) and reinforcement learning techniques (policy and value iteration methods) in this program was particularly rewarding.
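For readers who haven't met value iteration, here is a minimal sketch of the idea in Python. It is not taken from the course material: the function name, the dictionary layout for the Markov decision process and the three-state toy problem are all assumptions made up for illustration. Each sweep replaces a state's value with the best expected "reward plus discounted next-state value" over its actions, and stops once the values barely change.

```python
def value_iteration(transitions, gamma=0.9, tol=1e-8):
    """Return (values, policy) for a small finite MDP.

    transitions[state][action] is a list of (probability, next_state, reward)
    tuples; a state with no actions is treated as terminal.
    This layout is an assumption chosen for the sketch.
    """
    values = {state: 0.0 for state in transitions}

    def q_value(state, action):
        # Expected return of taking `action` in `state`, then following `values`.
        return sum(
            prob * (reward + gamma * values[next_state])
            for prob, next_state, reward in transitions[state][action]
        )

    while True:
        delta = 0.0
        for state, actions in transitions.items():
            if not actions:  # terminal state keeps value 0
                continue
            best = max(q_value(state, action) for action in actions)
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:  # values have stopped changing
            break

    # Greedy policy with respect to the converged values.
    policy = {
        state: max(actions, key=lambda action: q_value(state, action))
        for state, actions in transitions.items()
        if actions
    }
    return values, policy


# Hypothetical toy problem: quit now for a small reward, or go on for a larger one.
toy_mdp = {
    "A": {"go": [(1.0, "B", 0.0)], "quit": [(1.0, "end", 1.0)]},
    "B": {"go": [(1.0, "end", 10.0)]},
    "end": {},
}
values, policy = value_iteration(toy_mdp)
print(values, policy)
```

In this toy problem the discounted reward of going on from A outweighs the immediate reward of quitting, so the greedy policy picks "go" in both non-terminal states.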

Learning Experience, Faculty and Assignments

The course content is highly mathematically detailed and well paced. There is an emphasis on the core ideas of each algorithm you learn, and the assumptions you make in each case. Interesting concepts and sub-problems such as transformations for monotonic functions, two-class problems in SVMs, the maximum likelihood principle, Bellman’s equation, etc., are discussed in context, and the additional resources provided gave me an opportunity to dig deeper into this content once I had finished the course content or assignments.

As for prerequisites, even for me, an industry professional who regularly develops AI and ML code and solutions, the depth of ideas presented is advanced; it is a graduate-level course. I needed some study before lectures, and revision afterwards, to understand certain concepts correctly. At times, I’d feel out of my depth and would need to revisit course videos several times, and sometimes even the prerequisites. In my case, I went back quite a bit to linear algebra lessons from MIT OCW (Gilbert Strang’s lectures) and elsewhere. I also went back to a lot of Python programming on DataCamp, which all students in the course had access to.

In addition to the core topics within AI and ML, several interesting topics in CSPs and Reinforcement Learning were also discussed. Cryptarithmetic puzzles solved by graph search techniques like backtracking search, and the Sudoku solvers, stand out. Also, for someone like me without a formal background in computer science, the initial lectures on graph search algorithms were gold: they were essential to eventually understanding the deeper ideas within AI.
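To give a flavour of how backtracking search handles a cryptarithmetic puzzle, here is a small sketch in Python. It is an illustration under stated assumptions, not the course's solution: the function name is made up, and the sketch only prunes on the "all digits different" and "no leading zero" constraints, checking the sum once every letter has a digit. A fuller CSP solver would also prune column by column.

```python
def solve_cryptarithm(addends, total):
    """Find one digit assignment such that sum(addends) == total, or None.

    Hypothetical helper for illustration: letters are assigned one at a time,
    and an assignment is undone (backtracked) when it leads nowhere.
    """
    words = list(addends) + [total]
    letters = sorted(set("".join(words)))
    leading = {word[0] for word in words}  # these letters cannot be 0
    assignment = {}
    used_digits = set()

    def number(word):
        value = 0
        for letter in word:
            value = value * 10 + assignment[letter]
        return value

    def backtrack(index):
        if index == len(letters):
            # Every letter has a digit: check the arithmetic constraint.
            return sum(number(word) for word in addends) == number(total)
        letter = letters[index]
        for digit in range(10):
            if digit in used_digits or (digit == 0 and letter in leading):
                continue  # violates all-different or leading-zero constraint
            assignment[letter] = digit
            used_digits.add(digit)
            if backtrack(index + 1):
                return True
            # Undo the choice and try the next digit.
            del assignment[letter]
            used_digits.remove(digit)
        return False

    return dict(assignment) if backtrack(0) else None


solution = solve_cryptarithm(["TWO", "TWO"], "FOUR")
print(solution)
```

TWO + TWO = FOUR has more than one valid assignment, so the sketch simply returns the first one the search reaches.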

I want to take a moment to appreciate the incredible faculty and staff for this course:

  1. Prof. David Yakobovitch – the course leader for the program, who shared invaluable knowledge in all the webinars
  2. Prof. Ansaf Salleb-Aouissi – the AI expert who taught various search and ML techniques lucidly and clearly
  3. Prof. Jacob Koehler – who ran excellent office hours sessions that were incredibly helpful in clarifying hard problems in implementing solutions we’d learnt conceptually
  4. Prof. John Paisley – who provided lucid and mathematically comprehensive lectures in machine learning

All of the above faculty (and many others) routinely engaged with many of us students on the course forums, and I am sure I’ve made a few friends among the students during the course of this program.

I also want to thank the Emeritus and Columbia team for making DataCamp’s data science and ML courses available to all students. This definitely helped me in the course of the learning experience.

As someone who has been coding up ML algorithms in Python for years, some of the content wasn’t new to me, but that didn’t keep the assignments from being challenging or interesting. A whole lot of the program, especially the pieces around reinforcement learning, search and also many ML algorithms (KNNs, K-Means, Linear and Logistic regression, regularized regression methods, and more), involved implementing things from scratch. During this program, I picked up more TensorFlow and Keras than I already knew, and also picked up PyTorch!
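As an idea of what "from scratch" can look like for the simplest of these algorithms, here is a minimal k-nearest-neighbours classifier using only the Python standard library. It is a sketch, not one of the program's assignments: the function name and the tiny two-cluster dataset are made up for illustration, and it assumes Python 3.8+ for `math.dist`.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training label with its Euclidean distance to the query.
    neighbours = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]

near_origin = knn_predict(points, labels, (0.5, 0.5))
near_far_cluster = knn_predict(points, labels, (5.5, 5.5))
print(near_origin, near_far_cluster)
```

Sorting every training point is fine at this scale; for larger datasets a partial sort or a spatial index would be the usual next step.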

Time (and Energy) Management 

The course took many weekend sessions and evenings over the last nine or so months to complete. I would sometimes eagerly await the break weeks to keep pace. The weekly exercises were comprehensive: they were not just coding assignments but full-fledged problem-solving opportunities.

Being in a full-time job and managing responsibilities at home while learning ML and AI requires good time management. I definitely could have done some things better, looking back – after all, nobody is perfect! If I were to list three things I could have done better, they’d be: a) sticking with the assignment problems every day and trying different approaches, b) writing my own sub-problems to solve the assignment problem at hand, and c) setting aside time to read and replicate papers related to the topics at hand.

Capstone Project Experience

The final part of the program was a capstone assignment, featuring some particularly dirty and complex data. The assignment emphasized the importance of extracting business value from data. This aspect of data science has not changed over the years – it is as it was in 2015 when I first took up data science. Value from data is where the rubber hits the road for organizations adopting data science and ML/AI. In the last nine or so months at work, while on this program, I’ve helped build a serverless data lake on the cloud and helped solve many large-scale machine learning problems for telecommunications networks. In all these cases, as in the capstone, it came back to “How do we tie the results from our analysis to the business decisions to be taken?”

In this sense, the Capstone project was interesting and challenging. As in real-world projects, you have to make and state assumptions, qualify and confirm some of these assumptions, and ensure your analysis makes sense. You’d have to go back and redo some of the analysis based on new data that comes to light. You’d have to spend a significant amount of time planning your data preparation process. For instance, if you’re building a supervised classification model, you’d need to identify the hypotheses and the associated features, and budget time for data preparation tasks. A large part of data science is ensuring that your dataset is suitable for machine learning – that you have target variables identified and your features sorted – and this process of developing a data pipeline is as important as any other step of the process.

I was fortunate that my Capstone project submission was awarded an “Exemplary Assignment” badge by the course faculty (badge below)! I also received feedback via a grading rubric, which was very interesting and meaningful.

[Image: Emeritus “Exemplary Assignment” badge]

Onward!

As anyone within the technology industry will know, learning new things is a regular part of our lives, and our mental flexibility in doing so takes us forward in our careers. This is even more the case with data, machine learning, AI and related areas of technology today. I cherish the learning experience within the PGDMLAI, as with others in the past, and treasure it as much as a real project – in many ways, as a comprehensive, well put together program, it was a great way to pick up in-depth skills and solve challenging problems. The proverbial axe has, therefore, been sharpened.

I hope to spend even more time on reinforcement learning centric problem statements in the coming months. Topics such as GANs and the associated new challenges they present are also interesting. The current Coronavirus disease crisis has given me opportunities to think about the problems that need to be solved in the world today, and new and innovative ways in which data and AI can be used to solve these problems. I look forward to identifying such interesting problems and finding ways to apply my newly gained skills to them, be they in domains as diverse as healthcare, pharma, telecommunications, manufacturing or technology.

 

 
