Beginner to Data Scientist in 25 months: Every Project and What I Learnt
Learning to code as a full-time doctor 🩺
At the start of 2018, I had recently graduated from medical school. I knew basic web development and not much more.
Since mid-2020, I’ve been working full-time as a data scientist, analysing Big Data with Spark, SQL and Python (for a company with $300 million annual revenue).
I built my skills alongside a full-time job as a doctor, by doing courses and projects in my free moments.
Here I’m going to share every course and project that took me from zero to a full-time technical role in just over 2 years.
What’s the use of a project list?
This isn’t a prescriptive list. Nor is it a “how-to guide”.
Instead, I hope to:
- Give inspiration for your own project ideas
- Give insight into the journey of building data science + coding skills
I’ve shared project details, timelines and the rough amount of time spent on each. Of course, rate of progress depends on many factors (the amount of time you can or want to put in is key). I did this alongside a fairly demanding full-time job, so I obviously could have made faster progress. But it was high on my priority list outside of work, so it could also have been slower.
Themes I noticed
Making this list helped me identify two themes:
- I started with online courses but shifted to projects as soon as possible. (I’m a big believer that projects are where the real learning happens.)
- The projects got cooler and more complicated as time progressed. While each project builds your skills, it also helps you identify (i) what’s possible and (ii) what’s interesting.
THE JOURNEY: FROM BEGINNER TO NOW
(1) The Javascript Road Trips 1, 2 and 3 - CodeAcademy [COURSE]
- Date Completed: Jan–Feb 2018
- Time Committed: ~30 hours (roughly 2 evenings per week for 4-6 weeks)
- Link: Here (now on Pluralsight).
When starting off, I had no idea where to begin. I wasn’t even sure what coding language to learn.
So I googled something along the lines of “Best coding language to learn 2018”, and Javascript and Python seemed the most consistent top answers.
I decided to start with Javascript, looked for a course online and found this one.
Main takeaway(s):
- It’s more important to start learning than to start learning the “right” language. In retrospect, I should have started with Python (I have used it way more since). But this course taught me so much about object-oriented programming, which is relevant for many languages, so it absolutely wasn’t a waste of time.
- Projects help you learn and remember. Looking back, the main principles that I still remember from this course are those I applied to later projects. A lot of the knowledge slipped out of my brain (“use it or lose it”).
(2) Python for Data Science - DataCamp [COURSE]
- Date Completed: March 2018
- Time Committed: ~30 hours (roughly 2 evenings per week for 4–6 weeks)
- Link: Courses 1, 2 and 4 from this learning track.
This was a great course for introducing the basics of coding for data science: numpy, pandas and matplotlib, which I still use heavily to this day.
Main takeaway(s):
- When learning a new coding language or library, it’s important to build a reference of the functions you learn. I’ve lost count of the number of times I forgot and re-remembered (via Google) how to index into pandas DataFrames. The moment you have somewhere to refer to each time (whether that’s your own notes or a “CheatSheet” like this), life gets a whole lot easier.
(3) Text Analysis Tools [PROJECT]
- Date Completed: April 2018
- Time Committed: ~20 hours (roughly 2 evenings per week for 4 weeks)
- Link: I shared what I can on GitHub here.
While working part-time at a health-tech start-up in south London, I had the idea to make tools for analysing text. The company provides home care and they had a lot of unstructured text reports from visits.
I wrote code to perform various basic functions, such as (i) splitting long reports into individual sentences, (ii) counting the frequency of words to understand common topics and (iii) creating a .txt file that could be used for a language model (a later project).
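To give a sense of scale, the three steps described above can be sketched with the Python standard library alone. This is a minimal illustration, not the code from the GitHub repository: the function names, the regex-based sentence splitter and the sample reports are all assumptions made for the example.

```python
import re
from collections import Counter
from pathlib import Path


def split_sentences(report: str) -> list[str]:
    """Naively split a report into sentences on '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", report.strip())
    return [p for p in parts if p]


def word_frequencies(reports: list[str]) -> Counter:
    """Count lowercase word occurrences across all reports."""
    counts = Counter()
    for report in reports:
        counts.update(re.findall(r"[a-z']+", report.lower()))
    return counts


def write_corpus(reports: list[str], path: str) -> None:
    """Write one sentence per line to a .txt file for later language modelling."""
    lines = [s for r in reports for s in split_sentences(r)]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


# Hypothetical sample reports, invented for illustration.
reports = [
    "Visited at 9am. Client was in good spirits! Lunch was prepared.",
    "Client was tired. Medication was given.",
]
```

Calling `word_frequencies(reports).most_common(5)` would surface the most frequent words, and `write_corpus(reports, "corpus.txt")` would produce the one-sentence-per-line file (the file name is a placeholder).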
Main takeaway(s):
- Get started with projects as soon as possible (no matter how small or unimportant). Looking back, this was a really simple project that I could probably do now in less than 10 minutes. At the time, it took me much, much longer. But in the struggle of figuring out how to properly import a .csv file, define functions and execute loops, I learnt so much that I couldn’t have learnt in a course environment.
(4) Machine Learning by Andrew Ng - Coursera [COURSE]
- Date Completed: April–May 2018
- Time Committed: ~40–50 hours (roughly 2 x 2½-hour sessions per week for videos, plus 1 long session on the programming assignment, for ~2 months)
- Link: Here.
This is the course where so many MLers begin and a common first window into the world of MOOCs.
The course design is great: videos with text transcripts, regular check-in questions, and useful quizzes and assignments at the end.
Main takeaway(s):
- If you want to get started with machine learning, commit to finishing this course. It won’t be a waste of time. If you hate it, machine learning may not be for you. (I struggled with some of the maths at times; push through.)
- Doing programming assignments helps you understand the topics. They say you don’t truly understand something until you have taught it. I’d say until you have programmed it.
- There’s a lot of machine learning that this course doesn’t cover. No mention of tree-based approaches, graphical models, Bayesian inference and lots more good stuff.
(5) Deep Learning by Andrew Ng - Coursera [COURSE]
- Date Completed: June–July 2018
- Time Committed: ~15–20 hours (in free moments such as commutes and lunchtimes, for around 2 months. Only watched videos, didn’t do coding exercises.)
- Link: Here.
Main takeaway(s):
- Sometimes it’s more efficient to just watch the videos (and not do the programming assignments). It depends how much time you have and how deep you want to go. I learnt a lot without programming anything here.
(6) Physical Health Monitoring in Community Mental Health Trust [PROJECT]
- Date Completed: Aug–Sept 2018
- Time Committed: ~10 hours (roughly 2 x 2.5h sessions per week for ~2 weeks)
- Link: not able to share
While undertaking a psychiatry placement in South London, I noticed that the team were finding it hard to track and maintain the different physical health monitoring requirements of different patients. Patients on a particular drug would need blood tests or physical examinations at a certain frequency, but this varied a lot between patients.
I found that the key patient details could be exported to a .CSV file which I could put into Excel to analyse. I still wasn’t overly confident with Python at this stage, so I opted to use combinations of some simple Excel algorithms to process the data and highlight the next action required.
After we implemented this system, we were able to improve the percentage of patients receiving optimal monitoring from 22% to 71%.
Main takeaway(s):
- You don’t always have to write complex code to make a difference. I didn’t write a line of code, but this was the most impactful data science-type thing I’d done at that point. What problem in your workplace could you solve with what you currently know (or could learn)?
(7) Predicting drug response using epigenetics [PROJECT]
- Date Completed: November 2018
- Time Committed: ~14 hours (one full weekend, created at the Cambridge Cancer Genomics Hackathon)
The response to anti-cancer medication varies widely. Before starting treatment, it’s often unclear who will respond well and who won’t. Completing a course of ineffective drugs wastes valuable time and can cause side effects.
In a team of 5, we tried to build a tool that could predict who would respond based on epigenetic changes after initial administration of the drug. We built a proof-of-concept algorithm (and won an award at the Hackathon).
However, this can’t be applied in the real world yet because we simply don’t have the data. It’s not routine for epigenetic data to be collected before, during and after drug treatment. Hopefully in the future, though!
Main takeaway(s):
- Hackathons are a great way to work on interesting problems and level up skills. The technical mentors and other team members taught me so much, and I used scikit-learn for the first time.
(8) A language model based on carer reports [PROJECT]
- Date Completed: Dec 2018–Jan 2019
- Time Committed: ~10–15 hours (roughly 2 sessions per week for 4–6 weeks)
- Link: On my GitHub page here.
This was a continuation of earlier work for the health-tech start-up I was working for (Project 3).
To better understand the contents of written carer reports, I decided to build a language model (which uses the examples you show to write “fake” alternative reports). This was partly educational and partly to better understand the types of things coming up in the reports.
Main takeaway(s):
- Most of the battle is getting data into the right form. This was my first time making a language model and my first time using Keras. Almost all my time was spent figuring out how to process the data to feed it into the model. Almost no time was spent on coding the model itself (which Keras makes really easy).
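To make the data-preparation point concrete, here is a minimal sketch of one common way to turn raw text into training pairs for a character-level language model: slide a fixed-size window over the text and record the character that follows each window. It is not the code from the original project; the function name, the window size and the sample text are assumptions made for illustration, and the integer lists would still need converting to arrays before being passed to a model.

```python
def make_training_pairs(text: str, window: int = 5, step: int = 1):
    """Return (char_to_id, inputs, targets) for next-character prediction."""
    # Build a vocabulary mapping each distinct character to an integer id.
    vocab = sorted(set(text))
    char_to_id = {ch: i for i, ch in enumerate(vocab)}

    inputs, targets = [], []
    # Each input is `window` characters; the target is the character after it.
    for start in range(0, len(text) - window, step):
        chunk = text[start:start + window]
        next_char = text[start + window]
        inputs.append([char_to_id[ch] for ch in chunk])
        targets.append(char_to_id[next_char])
    return char_to_id, inputs, targets


# Hypothetical sample text, invented for illustration.
sample = "client ate lunch. client slept well."
char_to_id, X, y = make_training_pairs(sample, window=5)
```

Even in this toy form, nearly all of the code is about reshaping text into model-ready inputs, which mirrors where the time went in the project.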
(9) Preventing medication mistakes with “The Pill Detector” [PROJECT]
- Date Completed: Jan 2019
- Time Committed: ~20 hours (one full weekend, including nights, coded at the Cambridge University Hackathon)
- Links: code on GitHub / submission / video presentation
The idea behind this project was: People often make mistakes when taking their medication, such as confusing different pills. This is particularly a problem in the elderly.
We wanted to make a device to reduce this. We made a tool which takes a picture of a pill, classifies the medication and then checks against a patient’s medical record to see if it’s right to take. We felt this could be a helpful “last minute check” before taking the pill itself.
Main takeaway(s):
- If your coding is not strong, you can still add value with domain-specific insight. While I helped with some of the code, others on the team were much stronger. What was most useful for the team was my medical insight.
(10) Preventing clinical deterioration in the elderly [PROJECT]
- Date Completed: Aug 2018–Sept 2019
- Time Committed: ~200+ hours (not only writing code, working roughly 1 working day per week)
- Links: NHSx report / Media report / unable to share model (commercially sensitive)
While working for Cera Care (a healthtech company in London), we built an algorithmic platform for predicting clinical deterioration. This was a big factor in the company’s subsequent £54 million funding round.
This was a fairly hefty project, and the programming was only a fraction of the overall work.
Main learnings and reflections:
- Data structure is really, really important. A machine learning model is only as good as the data it’s trained on. For companies that utilise data science, their value is largely determined by the quality of the data they have. A huge part of this project was to improve the data structure; only a very small part was training the actual model.
👨‍💻 (11) A database of “social prescribing” services [PROJECT]
- Date Completed: April–July 2018
- Time Committed: ~20–25 hours (on average, an evening or two per week for a few months)
- Links: GitHub
Social prescribing is when a doctor “prescribes” a social activity, like a dance class, social meet-up or other event that “focuses on improved quality of life and emotional wellbeing”.
However, it’s hard to keep track of services. A GP and I set about making a web-facing database that could keep track.
I created an initial skeleton using the Django framework, but ultimately didn’t have enough time to commit for it to take off. Also, somebody else working on the same idea received a lot of money and started building it at scale, so we sidelined the project.
Main learnings and reflections:
- Accountability is really helpful. It can be the difference between a completed project and dormant code. In our case, once the end-goal dissipated, so did the energy.
👨‍🏫 (12) Educational coding exercises: “Coding Medical Applications” [PROJECT]
- Date Completed: June–Sept 2019
- Time Committed: ~15–20 hours (2 or 3 evenings per coding exercise; four exercises in total)
- Link: Code available here and here / Video series here
I decided to make some coding exercises specifically applied to healthcare, to encourage people with medical backgrounds to learn to code. I ended up making several:
- How to code a medical calculator for SIRS: blog / video
- How to code a neural network to predict hospital attendance: blog / video
- Diagnosing breast cancer with AI: blog / video
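As a flavour of what the first exercise involves, here is a minimal sketch of a SIRS (systemic inflammatory response syndrome) calculator. It is not the code from the linked blog or video: the function names and parameter names are hypothetical, and it is simplified to the four headline criteria, leaving out the PaCO2 and immature-band alternatives.

```python
def sirs_score(temp_c: float, heart_rate: float, resp_rate: float, wbc_10e9_per_l: float) -> int:
    """Count how many of the four (simplified) SIRS criteria are met, from 0 to 4."""
    criteria = [
        temp_c > 38.0 or temp_c < 36.0,                # abnormal temperature (deg C)
        heart_rate > 90,                               # tachycardia (beats per minute)
        resp_rate > 20,                                # tachypnoea (breaths per minute)
        wbc_10e9_per_l > 12.0 or wbc_10e9_per_l < 4.0, # abnormal white cell count (x10^9/L)
    ]
    return sum(criteria)


def meets_sirs(temp_c: float, heart_rate: float, resp_rate: float, wbc_10e9_per_l: float) -> bool:
    """SIRS is conventionally flagged when two or more criteria are met."""
    return sirs_score(temp_c, heart_rate, resp_rate, wbc_10e9_per_l) >= 2
```

For example, `meets_sirs(38.5, 95, 18, 8.0)` meets the temperature and heart-rate criteria and so returns `True`.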
Main learnings and reflections:
- Teaching is an amazing way to consolidate learning. Doing this was a helpful prompt to re-visit core principles and really reinforce the basics.
(13) Predicting Loan Non-Repayment [PROJECT]
- Date Completed: Oct 2019
- Time Committed: ~10–15 hours (a couple of full days over the course of a couple of weeks)
- Link: Code on GitHub
This project was set as an intra-university challenge at UCL, to secure some consulting work. In the end, I decided not to take on the work but still completed the full project.
The idea was to predict who wouldn’t make their loan payments based on a wide range of input variables.
Main learnings and reflections:
- Get feedback. I shared this code with a few different technical people, and the feedback was invaluable. I learn so much every time I ask people to review my code.
(14) Predicting depth of faults in “heat exchanger” tubes [PROJECT]
- Date Completed: March–April 2020
- Time Committed: ~150 hours (not only coding, but also making presentations and having team meetings. Worked 9–5, 5 days a week, for 5 weeks)
- Link: Code on GitHub
When coronavirus came along and we went into lockdown, I was keen to get some practical coding experience remotely. Thankfully, I was accepted onto the S2DS remote program.
I worked in a team of 4 to build a machine learning model that predicted depth of faults in “heat exchanger” tubes and achieved a good RMSE score.
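For readers unfamiliar with the metric, RMSE (root mean squared error) is the square root of the average squared difference between predicted and true values, so it is expressed in the same units as the target (here, fault depth). A minimal sketch follows; the depth values are made up for illustration and are not from the project.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up fault depths: three perfect predictions and one that is off by 2.
true_depths = [1.0, 2.0, 3.0, 4.0]
predicted_depths = [1.0, 2.0, 3.0, 6.0]
score = rmse(true_depths, predicted_depths)  # sqrt((0 + 0 + 0 + 4) / 4) = 1.0
```

Because errors are squared before averaging, a single large miss pulls the score up more than several small ones.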
The S2DS program is great; I’d highly recommend it.
Main learnings and reflections:
- The fastest way to learn is full-time. There’s really no substitute for immersing yourself in a project. I learnt more in 5 days of this project than I did in 5 spaced-out days spent on previous projects.
- Having technical mentors on hand is a game-changer. When you get stuck, having someone to message saves a huge amount of time.
(15) AI-generated Art [PROJECT]
- Date Completed: May–June 2020
- Time Committed: ~5–10 hours (a couple of evenings a week for around a month)
- Link: Blog
I used the AI technique “style transfer” to take personal photos and give them an artistic style. I adapted existing code from GitHub.
Main learnings and reflections:
- Having fun makes everything easier. This project was genuinely exciting and I felt I could embrace my creative side. It was also cool to decorate my house with the photos that I generated.
(16) Automating my job search with Python [PROJECT]
- Date Completed: May 2020
- Time Committed: ~10–15 hours (a couple of evenings for a couple of weeks)
- Link: Code on GitHub / Blog
I got bored while searching for new jobs, so I wrote code that would automate the process.
Main learnings and reflections:
- Scratch your own itch. This idea was just a small solution to a genuine problem I was having.
- Share what you make. I’ve had people reach out and say this code was helpful. This is a really cool feeling. You have to share your code to make this possible (of course!).
(17) I created my own YouTube algorithm (to stop me wasting time) [PROJECT]
- Date Completed: June–Nov 2020
- Time Committed: ~40 hours (around 10–15 hours a week for a few weeks, then a pause, then a weekend to wrap it up)
- Link: Code on GitHub / Blog
I felt the YouTube algorithm was hit-and-miss (at best), so I coded an alternative.
The blog write-up ended up ranking first on Medium, which was pretty cool. Several people reached out, and some built on top of the code too.
Main learnings and reflections:
- Think about “tech” that you don’t think works well. How could you improve it?
The end of the beginning: starting my first full-time data science job
After all of the above, I landed my first full-time data science job. I’m loving it! I learn a lot every day and work on exciting real-world problems.
I’ve written my best code since starting this job (but unfortunately can’t share it here). And of course, I’m still learning!
I hope some of these projects have given you ideas and inspiration for your coding journey.
Chris