This series is meant to help Econ PhDs understand the tech industry job landscape and prepare for the job hunting process.
Links to other parts of the series:
Part 3 — the Job Hunting Process I (when to apply, how to find positions, networking, CV)
Part 4 — the Job Hunting Process II (the interview process, timeline management, wage bargaining)
SELECT, FROM, WHERE, GROUP BY, ORDER BY, LEFT JOIN/INNER JOIN, UNION, CASE WHEN, window functions (e.g. RANK(), ROW_NUMBER()), etc.
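As a rough illustration of how several of these SQL pieces fit together, here is a small sketch run through Python's built-in `sqlite3` module. The `orders` table, its columns, and the 50-dollar cutoff are made up for this example, and window functions assume the bundled SQLite is version 3.25 or newer.

```python
import sqlite3

# Hypothetical toy table, made up for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (user_id INTEGER, city TEXT, amount REAL);
    INSERT INTO orders VALUES
        (1, 'NYC', 30.0), (1, 'NYC', 20.0),
        (2, 'NYC', 80.0),
        (3, 'SF', 10.0), (4, 'SF', 60.0);
""")

query = """
    WITH user_totals AS (
        -- GROUP BY: one row per user with their total spend
        SELECT user_id, city, SUM(amount) AS total
        FROM orders
        GROUP BY user_id, city
    )
    SELECT user_id, city, total,
           -- CASE WHEN: bucket users by an arbitrary cutoff
           CASE WHEN total >= 50 THEN 'high' ELSE 'low' END AS segment,
           -- window function: rank users within each city by spend
           RANK() OVER (PARTITION BY city ORDER BY total DESC) AS city_rank
    FROM user_totals
    ORDER BY city, city_rank;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Interview questions often ask for exactly this shape: aggregate first, then rank or filter within groups.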
Coding: algorithm and data structures
binary search, different sorting algorithms (e.g. quick sort, merge sort), operations on strings (e.g. de-duplication, palindromes); lists/arrays, tuples, dictionaries, trees, stacks, queues, heaps; time and space complexity analysis, etc.
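To give a feel for the level of these questions, here is a minimal sketch of three typical warm-ups, with their complexity noted in the docstrings. The function names are my own, and interviewers may specify the problems differently (for example, whether a palindrome check should ignore punctuation).

```python
def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if absent.

    O(log n) time, O(1) extra space.
    """
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1


def is_palindrome(text):
    """Check whether text reads the same both ways.

    Assumption: case and non-alphanumeric characters are ignored.
    O(n) time, O(n) extra space.
    """
    chars = [c.lower() for c in text if c.isalnum()]
    return chars == chars[::-1]


def dedupe_keep_order(text):
    """Drop repeated characters, keeping first occurrences.

    O(n) time, O(k) extra space for k distinct characters.
    """
    seen = set()
    out = []
    for c in text:
        if c not in seen:
            seen.add(c)
            out.append(c)
    return "".join(out)
```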
Coding: dataset manipulation
You should be able to do in Python’s pandas (or the equivalent R package) whatever you can do in Stata:
show summary statistics; deal with missing values, duplicates, outliers; drop or add columns or rows; replace values by a certain standard; encode categorical variables (label vs one-hot encoding); plot graphs (density plot, bar charts, line plot, scatter plots, etc.); collapse (e.g. groupby()); reshape (e.g. pivot()); merge data frames; run regressions; hypothesis testing, etc.
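A minimal pandas sketch of a few of the operations listed above, with the nearest Stata command noted in the comments. The data frames, column names, and the mean-imputation choice are hypothetical and only meant to show the shape of the code.

```python
import pandas as pd

# Hypothetical toy data, made up for illustration.
sales = pd.DataFrame({
    "store": ["A", "A", "A", "B", "B"],
    "year": [2020, 2021, 2021, 2020, 2021],
    "revenue": [100.0, 120.0, 120.0, 80.0, None],
})
stores = pd.DataFrame({"store": ["A", "B"], "region": ["East", "West"]})

print(sales.describe())          # Stata: summarize
sales = sales.drop_duplicates()  # Stata: duplicates drop

# Fill the missing value with the column mean (one of many possible rules).
sales["revenue"] = sales["revenue"].fillna(sales["revenue"].mean())

# Stata: collapse (mean) revenue, by(store)
collapsed = sales.groupby("store", as_index=False)["revenue"].mean()

# Stata: reshape wide revenue, i(store) j(year)
wide = sales.pivot(index="store", columns="year", values="revenue")

# Stata: merge m:1 store using stores
merged = sales.merge(stores, on="store", how="left")

# One-hot encode a categorical variable.
dummies = pd.get_dummies(merged["region"], prefix="region")
```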
A/B testing aka experimental design
what the unit of analysis is (e.g. user? user-day? city?), how to sample, how to randomize, what variables to collect, how long to run the experiment (sample size analysis), hypothesis testing, what to do if the result is not significant, whether to roll out if it is, etc.
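The sample size and hypothesis testing pieces can be sketched as below, assuming a binary outcome (e.g. conversion), equal-sized arms, and the usual normal approximation. The function names and the default `alpha` and `power` values are my own choices, not a standard any particular firm uses.

```python
from math import ceil, sqrt
from statistics import NormalDist

_norm = NormalDist()  # standard normal


def sample_size_per_arm(p_control, min_detectable_lift, alpha=0.05, power=0.8):
    """Approximate users needed per arm for a two-sided test of two proportions."""
    p_treat = p_control + min_detectable_lift
    z_alpha = _norm.inv_cdf(1 - alpha / 2)
    z_power = _norm.inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    return ceil((z_alpha + z_power) ** 2 * variance / min_detectable_lift ** 2)


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) using the pooled standard error."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - _norm.cdf(abs(z)))
    return z, p_value


# Example: 10% baseline conversion, hoping to detect a 2 percentage point lift.
n_needed = sample_size_per_arm(0.10, 0.02)
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
```

Dividing `n_needed` by the expected daily traffic per arm gives a rough answer to "how long should the experiment run".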
given this business objective, what metrics will you use, what are the pros and cons of them, what can you do to solve those cons, what alternative metric would you propose, etc.
diff-in-diff, RDD, synthetic control, propensity score matching, etc.; for each of them, be prepared for very detailed follow-up questions
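As a reminder of the simplest case, the 2x2 diff-in-diff estimate is just a difference of group-level changes. The sketch below uses made-up numbers and assumes parallel trends holds; a follow-up question would typically ask how you would check that assumption.

```python
from statistics import mean

# Hypothetical outcomes, made up for illustration: (group, period, outcome).
observations = [
    ("treated", "pre", 10.0), ("treated", "pre", 12.0),
    ("treated", "post", 18.0), ("treated", "post", 20.0),
    ("control", "pre", 9.0), ("control", "pre", 11.0),
    ("control", "post", 12.0), ("control", "post", 14.0),
]


def cell_mean(group, period):
    """Average outcome for one group-period cell."""
    return mean(y for g, p, y in observations if g == group and p == period)


def diff_in_diff():
    """(treated post - treated pre) - (control post - control pre)."""
    treated_change = cell_mean("treated", "post") - cell_mean("treated", "pre")
    control_change = cell_mean("control", "post") - cell_mean("control", "pre")
    return treated_change - control_change


did = diff_in_diff()
print(did)
```

The same number is the coefficient on the interaction term in a regression of the outcome on treated, post, and treated x post.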
supervised and unsupervised learning models (linear regression, logistic regression, naive Bayes, tree-based methods, KNN, k-means, LDA, PCA, etc.), feature selection and dimensionality reduction, ensemble methods (random forests, gradient-boosted decision trees, etc.), regularization (Lasso, Ridge, ElasticNet), standardizing variables, train-test split, k-fold cross-validation, result evaluation (mean squared error, confusion matrix, ROC, AUC, etc.), model selection, multi-armed bandits, etc.
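To make one of these ideas concrete, here is a hand-rolled sketch of k-fold cross-validation, scored by mean squared error, for a one-regressor linear model. In practice you would probably reach for a library such as scikit-learn; this version uses only the standard library so the mechanics are visible, and the function names are my own.

```python
import random
from statistics import mean


def fit_simple_ols(xs, ys):
    """Return (intercept, slope) of a one-regressor least-squares fit."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def k_fold_mse(xs, ys, k=5, seed=0):
    """Average out-of-fold mean squared error across k folds."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)       # shuffle before splitting
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    fold_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        # Fit on the training folds only, then score on the held-out fold.
        intercept, slope = fit_simple_ols(
            [xs[i] for i in train], [ys[i] for i in train]
        )
        errors = [(ys[i] - (intercept + slope * xs[i])) ** 2 for i in held_out]
        fold_errors.append(mean(errors))
    return mean(fold_errors)
```

The point interviewers usually probe is that the model is never scored on observations it was fit on.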
Open-ended case study
The interviewer describes a problem the firm is trying to solve and asks how you would solve it; in answering, you might need to use any of the above elements.
To prepare for this type of question, use their products (if they are consumer-facing); read their data science or engineering blog; search YouTube for talks their data scientists have given; check the company’s website for public-facing videos explaining their business (e.g. Uber has some).
Anything on your CV is fair game; you can be asked to talk about one of your research papers, and answer seminar-style questions
why not academia, why tech, why our firm, why our team, describe a time when you X-ed (e.g. succeeded, failed, collaborated, communicated, impressed, learnt, etc.). Be prepared to be asked follow-up questions
Often not asked as standalone questions (e.g. often folded into a case study): definitions of type I and II error, p-value, significance level, power, confidence interval, z-score, t-stat; hypothesis testing, different types of tests, the multiple hypothesis testing problem, etc.
‘what’s the probability of a certain poker hand’ type questions
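These questions usually reduce to counting. A small sketch for two five-card hands, assuming a standard 52-card deck and that "flush" excludes straight flushes (the function names are my own):

```python
from math import comb

TOTAL_HANDS = comb(52, 5)  # number of distinct five-card hands


def prob_full_house():
    """Three cards of one rank plus two cards of another rank."""
    # rank of the triple (13) and its suits, then rank of the pair (12) and its suits
    ways = 13 * comb(4, 3) * 12 * comb(4, 2)
    return ways / TOTAL_HANDS


def prob_flush():
    """Five cards of one suit, excluding straight flushes."""
    # 10 straight flushes per suit (ace-low through ace-high) are removed
    ways = 4 * (comb(13, 5) - 10)
    return ways / TOTAL_HANDS


print(f"full house: {prob_full_house():.5f}")
print(f"flush:      {prob_flush():.5f}")
```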
write Python code to run simulations, bootstraps, etc.
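For example, a percentile bootstrap confidence interval can be sketched with the standard library alone. The function name, the 2,000 resamples, and the fixed seed are assumptions for illustration; in an interview you would state such choices out loud.

```python
import random
from statistics import mean


def bootstrap_ci(sample, stat=mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(sample)."""
    rng = random.Random(seed)  # seeded so results are reproducible
    n = len(sample)
    # Resample with replacement, recompute the statistic, and sort the results.
    estimates = sorted(
        stat(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Simulated data: 200 draws from a normal distribution with mean 10 and sd 2.
data_rng = random.Random(1)
data = [data_rng.gauss(10, 2) for _ in range(200)]
ci_low, ci_high = bootstrap_ci(data)
print(ci_low, ci_high)
```

Passing a different `stat` (e.g. `statistics.median`) gives an interval for a statistic with no convenient closed-form standard error, which is the usual reason to bootstrap in the first place.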
Of course, not all interviews test all of the above elements. However, in my job search, *all* of the above topics were asked about or tested in one way or another in at least one interview at one of the firms. The recruiter will indeed let you know what to prepare for, but in my experience what they say is not always accurate, so it’s good to be prepared for anything that’s fair game.
There are *a ton* of resources out there on the web. To name a few:
- Susan Athey and Michael Luca wrote a nice review on ‘Economists (and Economics) in Tech Companies’
- Elite Data Science Academy: has both free and paid content. I paid for its ‘Machine Learning Accelerator’ and ‘Interview Prep Kit’ and found them useful (I was able to get an offer with ‘Machine Learning’ in its title after using these two). Caveat: my ML knowledge was nonexistent before that. I thought ‘supervised learning’ meant that a human needs to supervise the computer while it runs the algorithm and ‘unsupervised learning’ meant you can go do your house chores and leave the computer alone :) If you have already taken an ML class, you’ll likely find EDS too basic.
- TowardsDataScience: a platform with lots of articles on specific topics in Machine Learning or data science in general, e.g. ‘what is the bias-variance trade-off?’, ‘how to encode categorical variables?’, etc.
- Interview Query: website that aggregates Data Science interview questions. Also contains both free and paid content.
- LeetCode.com: for practicing coding in Python or SQL
- w3schools.com: for learning Python or SQL
- Kaggle.com: practice ML by working on projects. Caveat: in addition to Kaggle projects, I would recommend doing an ML project on a *real* dataset, either to have it on your CV or to be able to talk about it in an interview. Oftentimes in the real world the hardest part is figuring out what features you want to collect, collecting them, and cleaning them (‘out of these ~500 variables, where do I even start??’ is what I ran into when I tried using a *real* dataset for a house price prediction ML project).
- Levels.fyi, Glassdoor, Blind, Handshake, Indeed, LinkedIn, etc.: places to find salary info, reviews of different firms, interview questions, connections, job postings, job alerts, etc.
- withralph.com, candor.co, etc.: firms that help with your wage negotiation in exchange for a commission
- Coursera, Udemy, Udacity, DataCamp, etc.: platforms where you can take classes on data science related topics
- Insight Data Science, Zipfian, Laioffer, etc.: paid bootcamp-style job-seeking-oriented programs that teach/train you Data Science related knowledge/skills
- To prepare for the case study interviews, use the company’s product as much as possible and read or watch their ‘engineering blog’, ‘data science blog’, the videos on their website, and their YouTube videos. Bigger firms’ data science teams in particular often give talks at universities on the problems they’re solving. Here are examples from Google and Uber
The above are just examples of a large set of options in each category; in general if you just google around you’ll soon end up with more resources than you need.
How to Ace the In Person Data Science Interview
I’ve written previously about my recent job hunt, but this article is solely devoted to the in-person interview. That…
The Data Science Interview Study Guide
121 resources to help you land your data science dream job
Notes and technical questions from interviewing as a Data Scientist in 2018
After almost three years at Jobr/Monster, I have decided to leave to pursue other opportunities. This gave me the…
Giving Some Tips For Data Science Interviews, After Interviewing 60 Candidates at Expedia
During past year, I interviewed many people for data science position at Expedia group, from entry level to senior…
The Ultimate Data Interview Checklist
Nervous for your Data Science / Data Engineering interview? Start here.
Compilation of resources and insights that helped me on my journey to data scientist. Reposted for readability: The Big…