Data-driven decisions, actionable insights, business impact: you’ve seen these buzzwords in data science job descriptions. This article is about data-driven indecisiveness, finding misleading insights and making a negative business impact.

The Model

I was working at a fashion e-commerce website and built a product ranking model to show visitors the most relevant products. The idea was to predict whether a user would click on a product they see. Based on this score, we would then show the products with the highest expected click-through rate first.

A large and clean dataset with views and clicks was ready and I used it to train the model. Some fit-predict magic, and, voilà, we had predictions of whether a user would click on a product. The ROC AUC measured on a train/test split was close to 0.9. I was happy with the work I’d done and ready to put this on my achievements list. But alas, life had other plans.
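
For readers who want to see what that 0.9 actually measures, here is a minimal Python sketch of ROC AUC computed by hand: the share of (clicked, not clicked) pairs that the model puts in the right order. The labels and scores are made up for illustration, and `roc_auc` is a hypothetical helper, not the code used in the project.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. The double loop is quadratic in the number of
    rows, so this is only meant for illustration on small samples.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Made-up example: 1 = the user clicked, 0 = the user did not click.
clicked = [1, 0, 1, 0]
predicted = [0.9, 0.8, 0.7, 0.1]
example_auc = roc_auc(clicked, predicted)  # 3 of the 4 pairs are ordered correctly
```

Note that the number only says how well clicks are ranked on historical data. It says nothing about orders or revenue.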

The Metric

To quantify our success, we ran a controlled experiment on real users. A few weeks later, an A/B test showed a double-digit increase in CTR in the treatment group. Happy, I started browsing articles on how to negotiate a promotion. It’s not every day that you make a business impact on this scale!
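
The story doesn't say which statistical test was used to read the experiment. One common choice for comparing two click-through rates is a two-proportion z-test, sketched below in Python. The counts are made up, the function name is hypothetical, and the test assumes every product view is independent, which is a simplification because one visitor generates many views.

```python
import math


def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Two-sided z-test for the difference between two click-through rates.

    Returns (z, p_value). Uses the pooled rate for the standard error,
    which is the usual form under the null hypothesis of equal rates.
    """
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts: 10.0% CTR in control vs 11.5% in treatment.
z_score, p_value = two_proportion_z_test(
    clicks_a=1_000, views_a=10_000,
    clicks_b=1_150, views_b=10_000,
)
```

A small p-value here would only tell you that CTR moved. Whether that is good for the business is a separate question.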

A few more days passed by and it turned out that along with the uplift in CTR, the number of orders in the treatment group actually went down. Here’s what happened: the model optimised for products with catchy images and it resulted in window-shopping behaviour. Rather than increasing sales, it actually led people away from buying!

I quickly updated the model to predict purchases instead of clicks. To account for delayed conversions, it allowed for a one-week window between seeing a product and placing the order. Another experiment, a few more weeks of waiting, and the results were ready: both CTR and 7-day conversion were up. We had twice as many orders in treatment!

My joy didn’t last long, as it turned out that the average basket size went down. Each order was half the value, which means visitors in control and treatment spent exactly the same amount: twice the orders at half the value. Apparently, price was the most outcome-defining feature in the model. Instead of coats, we started selling a lot of underwear.

A few weeks later, a colleague mentioned a drop in order returns. We had a free return policy for one month after the purchase and most people tended to send their goods back around this deadline.

This was good news, wasn’t it? Return-related expenses (shipping, return handling, customer support costs) went down, and so did overall costs. While visitors in control and treatment spent the same amount, the drop in costs brought the contribution margin up, making the average order in the treatment group profitable. My promotion was a slam dunk!

The Finance

Now let’s take a look at the earnings report of an online retailer.

It starts with the Key Performance Indicators. Remember when we improved the number of orders but decreased the average basket size?

Then we have the income statement, and it starts with the top-line — revenue. Revenue is essentially the amount users paid and in our case, it’s equal to the number of orders multiplied by the average basket size.

Then we have the costs, and we contributed to this part by reducing the return-related expenses.

And finally, we have the bottom-line — profits.
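
To make the arithmetic concrete, here is a small Python sketch of that chain, from KPIs to revenue to costs to what is left. Every number is invented for illustration, only return-related costs are modelled, and the `Variant` class and its fields are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """One arm of the experiment. All figures are hypothetical."""

    orders: int
    avg_basket: float             # average order value
    return_cost_per_order: float  # shipping, return handling, customer support

    @property
    def revenue(self) -> float:
        # Top line: number of orders times average basket size.
        return self.orders * self.avg_basket

    @property
    def return_costs(self) -> float:
        return self.orders * self.return_cost_per_order

    @property
    def margin_after_returns(self) -> float:
        # Only return-related costs are subtracted here; a real income
        # statement has many more cost lines.
        return self.revenue - self.return_costs


# Twice the orders at half the basket size, with cheaper returns per order.
control = Variant(orders=1_000, avg_basket=100.0, return_cost_per_order=12.0)
treatment = Variant(orders=2_000, avg_basket=50.0, return_cost_per_order=4.0)
```

With these made-up inputs, both arms earn the same revenue, while the treatment arm keeps more after return costs, which mirrors the story above.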

As we were trying to optimise for different metrics, from conversions to profits, we followed the same order as in the financial report: KPIs first, then revenue, then costs, and finally profits.

Speaking of our high-performance model, how did it affect the business performance?

Our original hypothesis was that by building a model with a high ROC AUC we would increase CTR, which, in turn, was supposed to have a positive impact on conversion. These conversions would generate more revenue, which was expected to drive up the profits. In our case, all of these ideas turned out to be wrong. Whoops!

The Indecisiveness

Let’s get back to our experiment with equal revenue in control and treatment. As our model doubled the number of orders, it also increased the number of new buyers who created accounts and subscribed to our email updates. We were selling everyday items that people buy regularly, so some of the new users would come back and order again. In the following months, a significant number of users made repeat purchases and generated recurring revenue. Bingo, we made the right choice!

What Would You Do?

As a side project, I built a data science interview chatbot and included this case study. Over a thousand people provided free-text answers, not biased by a set of pre-defined options, and here’s the distribution of the results:

1. “You built a product ranking model that is expected to improve the relevance of product catalogues. What would you choose as the primary metric to help you decide if this model is better than the previous one?”

  • 47% of responses include clicks, click-through rate and similar metrics

  • 34% orders, conversion rate, revenue, and other sales-related metrics

  • 19% session length, customer satisfaction and other UX metrics

  • 12% ROC AUC, RMSE, precision/recall and other ML performance metrics

  • 8% website visits, customer satisfaction, and other unpopular options

(the total exceeds 100% as some responses mention several metrics from different groups)

2. “It turned out the users that saw the new version made twice as many purchases but each order was half the value compared to the control group. What other metrics would you look at to decide which version is better?”

  • 83% revenue, total spend amount and other revenue-related responses (in fact, revenue is equal between two versions)

  • 8% retention, lifetime value, frequency of purchases and other long-term metrics

  • 5% profit, cost of acquisition, operating expenses, return rate and other cost metrics

  • 3% customer satisfaction, NPS, and similar qualitative data points

  • 1% new customers, user acquisition

3. "Would you make the same decision about the results of this experiment if it wasn’t a clothing store but an online grocery store?"

  • 68% no (because “retention is higher”, “groceries expire and have smaller profitability margin”, etc)

  • 32% yes (because “statistics works everywhere”, “data is objective”, “both are retail”, “the goals are the same”, etc)

The Summary

In its guide to product metrics, Mixpanel recommends this simple question to test whether you’re dealing with a good indicator: “If we improve this number will the product’s long-term performance improve?”

We just walked through three different metric types: ML model performance, product metrics and business metrics. You can evaluate model performance on historical data, but it might be too far from the business goals. Business metrics take longer to measure but, at the end of the day, our models are expected to have a positive impact on the bottom line.

A good metric can be tracked and serves as a proxy for long-term business performance. And while it might be hard and time-consuming to track the direct impact on profits, the closer we can get, the better.

Originally posted by Oleg Novikov on

I was a data scientist in a startup when a product manager walked up to my desk and asked, “What is unsupervised learning?”. I gave them a textbook definition and it helped us increase the revenue and make our customers happier. Soon, I got promoted.

This story obviously never happened, but if you google “data science interview”, this is the type of question you’re going to come across 🤷.

How much does this “what is (some ML concept)” format tell you about one's skills?

Say you're hiring for a Product Data Scientist and part of their job is going to be designing experiments. On a stats exam, you can expect questions about p-values and distributions 🎓. But if a data scientist is expected to make data-driven decisions, why not provide them with the context and ask how they'd measure the impact of a new feature? Eventually, after a discussion about why it has to be an experiment and what success would look like, it'll come down to the duration and stats tests, among other criteria.

When working on a data science mock interview plot for a free interview preparation tool, I based it on my experiences while working at Uber and other startups. It includes some fond (and not so fond) memories such as A/B tests resulting in metrics going in opposite directions 🤯 or constantly balancing between working on long-term modelling projects and urgent ad-hoc requests.

Reviewing the first 500 mock interviews led me to a conversational format. It guides a candidate through these situations and lets them highlight their problem-solving skills.

I realised that a mix of open-ended hypothetical and behavioural questions works best. It’s also a good way to help candidates demonstrate their business sense and communication skills, in addition to technical knowledge.

Five Rules of Data Science Interview Questions

🧐 Provide clear expectations and don't anticipate a single correct answer

Often, interviewers expect to hear a certain solution or narrative in an answer but fail to articulate it when phrasing the question or giving hints. Think of some vague hypothetical situation that can be addressed in many different ways (e.g. "how would you predict user churn?"). Depending on the candidate's background, they might focus on certain things they're the most familiar with but not what the interviewer has in mind.

🧑‍🏫 Let it be as close to real work as possible

Unless you're looking for someone to help you cheat in exams, the fact that someone memorized a complex formula doesn't imply their value to the team. Unfortunately, it has become almost standard to leave just a few minutes at the end of an interview to let the candidate ask about the team and its projects. Interview questions relevant to day-to-day activities also help you provide more insight into the kind of work your team does.

🤯 Avoid questions that require a lot of context-setting

Interviewing time is limited and you don't want to waste it setting the stage. Asking about niche topics relevant only to a specific industry might also give an unfair advantage to a candidate who happened to work on a similar problem recently. It's worth focusing on the skills that take a longer time to acquire rather than something that can be learned in the first week at work.

🥱 If it can be Googled, don't ask it

... because they'll have a chance to look up things at work. According to Glassdoor, there's a link between interview difficulty and candidate satisfaction. As an interviewer, you’re trying to get the most signal about a candidate’s skills in a short time. Ideally, each series of questions should be a mini-project that allows a candidate to demonstrate their business sense, clear communication and technical depth. This approach also allows you to give hints and unblock them without spoiling the entire answer.

🤺 Know what skills are important for the role and focus on them

The goal of an interview is to assess how well a person would perform in a certain role and not what they don't know. I used to ask a tricky SQL question that half the candidates failed, until I realised that those who didn't pass it were still familiar with the topic. They simply missed a certain edge case (which is normal during an interview), but spotting it wasn't the goal of the assessment. In other words, have a clear picture of the skills and competencies critical for the role and focus on them.

At the end of the day, interviews are the most outcome-defining part of the hiring process. Keeping them fair and consistent is key to building a strong team. Good questions make interviews engaging and fun, and remind us why we chose to become data scientists in the first place.
