
How to Write a Great Resume as a Data Scientist — For Beginners

Updated: Oct 26, 2021

As an AI director who has hired many AI engineers in recent years, I want to share how I select candidates based on their resumes. The hiring process for an AI engineer at most companies has many steps, such as a take-home AI assignment and a technical interview. However, you first need to be selected for those steps, and that never happens with a bad resume. Here, I describe the must-have skills that you should highlight on your resume to be selected for the next steps. These skills include, but are not limited to, coding, machine learning, and data.



— Coding

You code every day as a data scientist. So, you must know the best practices in coding and present them to your potential employer. The question is how. Nothing excites me more about a candidate than seeing that they have developed a well-documented ML package on GitHub. You do not need to develop a complex library; however, you must show that you know the basics of coding.


The best way to show that is to link to a past ML project maintained on GitHub or Bitbucket. You must apply the best practices of software development in your project; otherwise, think twice before sharing it. For example, your project must have succinct, easy-to-read, and useful documentation. In addition, the code must follow a specific style guide such as PEP 8 in Python. Last but not least, the code must have a suite of tests that can easily be run at each step of development. Having tests suggests that you know the fundamentals of test-driven development and appreciate the importance of tests in the development process.


As a sample, you can look at sent2vec, a simple Python package that I developed and host on GitHub. You should try to build something similar, more or less.
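
To make the three practices above concrete (documentation, a style guide, and tests), here is a minimal sketch of what a reviewer might hope to see in a small project. The `tokenize` function and the test names are hypothetical and are not taken from sent2vec; they only illustrate a documented, PEP 8 style function with tests that a runner such as pytest could collect.

```python
"""Toy text utility: a documented function plus tests (illustrative only)."""
from typing import List


def tokenize(text: str) -> List[str]:
    """Split text into lowercase word tokens, dropping punctuation.

    Any character that is not alphanumeric is treated as a separator.
    """
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


# Tests live next to the code here for brevity; in a real project they
# would usually sit in a separate tests/ directory.
def test_tokenize_lowercases_and_strips_punctuation():
    assert tokenize("Hello, World!") == ["hello", "world"]


def test_tokenize_empty_string_returns_empty_list():
    assert tokenize("") == []
```

Small, fast tests like these can run on every commit, which is the habit the tests are meant to demonstrate.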



— Machine Learning

A common misunderstanding is to reduce machine learning skills to “training ML models”. A successful ML product goes far beyond training an ML model. You must show on your resume that you can build and maintain an ML pipeline. Nowadays, one can train an ML model in a few lines of code using, for example, the scikit-learn library in Python. However, you need real-world experience to know the answers to questions such as “how to properly test an ML model?” or “how to maintain and update an ML model?”. Another important skill to show on your resume is knowing “how to serve an ML model in the cloud or on application-specific hardware?”. You can use different technologies such as AWS or Databricks, but it is important to show on your resume that you know an ML model is not useful until it is served to the customer.


In short, you must let your potential employer know that you understand the importance of an ML pipeline from training to serving. For example, you can take hints from the following statements and tailor them to your past experience to describe your skills: [a] “Built, deployed, and managed an NLP solution to analyze legal documents using spaCy, CircleCI, Docker Hub, Terraform, and AWS”, or [b] “Developed a complete ML pipeline including feature extraction, model building, model evaluation, and model selection”.
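
As a rough illustration of statement [b], the sketch below chains feature extraction, model building, model selection, and model evaluation with scikit-learn. The dataset (the bundled iris data), the choice of PCA and logistic regression, and the parameter grid are all assumptions made for the example; a real project would substitute its own data and models, and serving is not covered here.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hold out a test set that is never touched during model selection.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Feature extraction and model building live in one pipeline object,
# so the same transformations are applied at training and prediction time.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("features", PCA()),
    ("model", LogisticRegression(max_iter=1000)),
])

# Model selection: cross-validated search over a small, illustrative grid.
param_grid = {
    "features__n_components": [2, 3],
    "model__C": [0.1, 1.0, 10.0],
}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# Model evaluation: score the selected pipeline on the held-out data.
test_accuracy = accuracy_score(y_test, search.predict(X_test))
print("best params:", search.best_params_)
print("held-out accuracy:", round(test_accuracy, 3))
```

Keeping preprocessing inside the pipeline is one way to avoid leaking test data into training, which is the kind of detail an interviewer may ask about.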


— Data

Many candidates neglect this pillar. Being a good coder who knows various machine learning techniques does not by itself make you a good data scientist. To be a good data scientist, you must know the answers to many data-focused questions such as:

  • “How to design a scalable data pipeline?”,

  • “How to measure and evaluate the quality of data?”,

  • “How to identify the relevancy of data to the business objective?”, or

  • “How to efficiently collect a vast amount of data?”.

That said, you must show on your resume that you have worked with large amounts of data and understand the challenges that a bad data architecture or low-quality data can cause. So, make sure your resume emphasizes how you addressed these sorts of challenges in your past experience.


For example, one can write this: “Designed specifications of imaging setup such as camera model and lighting pattern to ensure collecting quality data with low noise.” This statement in a resume shows that the candidate is aware of the importance of quality data as well as data collection setup.
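
On the software side, measuring data quality can start with a few simple checks. The sketch below assumes tabular data in a pandas DataFrame; the `quality_report` helper, the column names, and the sample values are hypothetical and only show the kind of metrics (missing values, duplicate rows) one might track.

```python
import pandas as pd


def quality_report(df: pd.DataFrame) -> dict:
    """Return a few basic data-quality metrics for a DataFrame."""
    return {
        "n_rows": len(df),
        # Fraction of missing values per column.
        "missing_ratio": df.isna().mean().to_dict(),
        # Number of rows that exactly repeat an earlier row.
        "duplicate_rows": int(df.duplicated().sum()),
    }


# Made-up sample: one missing reading and one repeated row.
df = pd.DataFrame({
    "sensor_id": [1, 2, 3, 3],
    "reading": [0.5, None, 0.7, 0.7],
})
report = quality_report(df)
print(report)
```

Metrics like these are only a starting point; whether the data is relevant to the business objective still has to be judged case by case.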


The Last Words

To land an interview for a data scientist position, you must show the following skills on your resume:

  • “How to code in Python in compliance with best practices?”,

  • “How to build, evaluate, serve, and manage an ML pipeline?”,

  • “How to resolve issues raised by large volumes of data and low-quality data?”.

Please note that nothing is guaranteed, but taking these steps will increase your chances.







