Get Started With Data Science In 3 Steps

Data science is probably one of the hottest topics in the world of business today. It comes with promises of transforming our decision-making process. But what is it exactly? How do you get started? And does data size matter?

How Do You Phrase The ‘Aha’ Question?

Data is all around us. Organizations collect information about their business on an unprecedented scale, and an increasing amount of data is publicly available if only you are equipped with a computer and an internet connection.

But how do you unleash the potential of the data? How do you phrase that single insightful question that makes you go “aha – that’s why…”? It’s easy: you don’t! At least not initially.

Extracting insight from data is a process in which you continuously:

1. phrase questions whose answers provide value to the business
2. transform, summarize, and visualize the data set to address the identified questions
3. evaluate your findings against the phrased questions and the data set available

Importantly, these steps are part of an iterative process – depicted in figure 1 below – where step 3 naturally leads you to integrate more data sources or refine your questions based on the insight obtained.

Ideally, this process converges to an insight that affects the way the business operates. In the following, we will apply this methodology to a case study.

Figure 1: The steps applied iteratively to extract insight from data.

The Data Science Process: Identify And Bridge The Gaps

Consider yourself in the seat of a data scientist working with FreezeCorp – an organization specializing in cooling equipment. Over the years, they have accumulated temperature measurements related to their freezers, and they are interested in knowing if you are able to see “something” in the data.

As depicted in figure 2, one important aspect of being a data scientist is identifying and bridging the gaps between the stakeholders within FreezeCorp.

Concretely, the business stakeholders have a cost-centric view of the freezers, wanting to reduce their operating costs. Simply accumulating data does not bring them closer to achieving this.

Grounded in their many years of experience, the domain experts have strong intuitions about how FreezeCorp freezers work in practice. Nevertheless, they are challenged when attempting to generalize this experience into knowledge that improves the existing product.

Finally, the primary concern of the IT specialists is how the freezers operate on a day-to-day basis; they fail to see how data-related insights may help them improve this.

Common to all groups is a skill gap in actually working with the data in a systematic fashion to transform it into actionable insight. This is where you – the data scientist – fit in.

Figure 2: Many different stakeholders are required to bring value from data. The data scientist is often the missing piece in the puzzle.

Step 1: Set up a hypothesis

Your first course of action is to concretize that “something” that FreezeCorp is looking for. Specifically, you need to understand the business of FreezeCorp in order to help them narrow down the questions whose answers provide the organization with actual value. If you don’t know what you are looking for, the chances of you discovering something valuable are slim.

Consequently, you facilitate a workshop with the FreezeCorp stakeholders – cf. figure 2. The outcome of the workshop is a hypothesis – “temperature measurements form patterns allowing for a grouping of the freezers” – rooted in a business case: “We may save money on maintenance if we can predict malfunctions before they occur”.

Step 2: Data analysis

Step 2 is to extract information from the data set, allowing us to accept or reject the hypothesis. Every data scientist has a favorite set of tools for achieving this. This step often requires a combination of hacking skills (being able to slice the data in ways that filter out the noise) and creativity (being able to present the insight in informative ways).

For this particular case, we learn that the majority of the temperature measurements may be assigned to one of three clusters. Therefore, we create informative visualizations depicting this.
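As an illustration of what such a grouping could look like in code, here is a minimal sketch in Python. Everything in it is an assumption made for the example: the readings are synthetic, the choice of one-dimensional k-means is ours rather than the method used for FreezeCorp, and the names kmeans_1d and TOLERANCE_C are hypothetical. Readings that lie far from every cluster center are left unassigned, mirroring the observation that only the majority of the measurements fall into a cluster.

```python
import random
from statistics import fmean


def assign(values, centers):
    """Group each value with its nearest center."""
    groups = [[] for _ in centers]
    for v in values:
        nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
        groups[nearest].append(v)
    return groups


def kmeans_1d(values, k=3, max_iterations=100):
    """Plain k-means (Lloyd's algorithm) for one-dimensional data."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start: take the value at the midpoint of each of
    # k equally sized slices of the sorted data.
    centers = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(max_iterations):
        groups = assign(values, centers)
        # An empty group keeps its previous center.
        updated = [fmean(g) if g else c for g, c in zip(groups, centers)]
        if updated == centers:
            break
        centers = updated
    return centers


# Synthetic stand-in for the freezer measurements (invented for this sketch):
# three operating modes plus a few readings that fit none of them.
random.seed(0)
readings = [
    random.gauss(mode, 0.5)
    for mode in (-20.0, -15.0, -8.0)
    for _ in range(100)
]
readings += [-35.0, 2.0, 5.5]

centers = kmeans_1d(readings, k=3)

# Assumed cut-off: readings further than this from every center stay unassigned.
TOLERANCE_C = 3.0
clustered = [[] for _ in centers]
unassigned = []
for t in readings:
    i = min(range(len(centers)), key=lambda j: abs(t - centers[j]))
    if abs(t - centers[i]) <= TOLERANCE_C:
        clustered[i].append(t)
    else:
        unassigned.append(t)

for center, members in zip(centers, clustered):
    print(f"cluster around {center:.1f} °C: {len(members)} readings")
print(f"unassigned readings: {len(unassigned)}")
```

The per-cluster counts and the list of unassigned readings are the kind of summary that can then be turned into a visualization and taken to the domain experts.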

Step 3: Evaluation

In step 3, we discuss our findings with the domain experts, who are able to align the identity of each cluster with their understanding of how freezers work in practice. Further, we inform the stakeholders that we need other types of data to identify the remaining measurements that fall outside the three clusters – ideally data containing information about freezers that have malfunctioned in the past. The IT experts inform us of the availability of log files containing such information.

In this way, we transition into the next iteration where we return to step 1 to refine the hypothesis to reflect the importance of identifying freezers that are malfunctioning.

Closing Thoughts: Size Doesn’t Matter

The story presented above should paint a picture of the way we like to work with data to provide actionable insight to a business. In particular, note the conspicuous absence of the celebrated “big data” term. Often, value from data does not come from having extreme amounts of data, but simply from integrating and unifying the many different data sources that already exist within the organization.

Importantly, data may come in many different shapes (databases, log files, or maybe tweets), and the opportunities lie in aligning these sources in ways that provide value to your business.
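To make the idea of aligning sources a little more tangible, the following Python sketch joins two small, made-up sources on a shared freezer identifier: rows as they might come from a measurements database, and lines as they might appear in a maintenance log. The field names, the log line format, the dates, and the helper parse_log_line are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import fmean

# Source 1 (hypothetical): rows from a measurements database.
readings = [
    {"freezer_id": "F1", "temperature_c": -19.8},
    {"freezer_id": "F1", "temperature_c": -20.0},
    {"freezer_id": "F2", "temperature_c": -8.4},
    {"freezer_id": "F2", "temperature_c": -7.6},
    {"freezer_id": "F3", "temperature_c": -15.2},
]

# Source 2 (hypothetical): raw lines from a maintenance log file.
log_lines = [
    "2015-03-02 freezer=F2 event=MALFUNCTION",
    "2015-03-05 freezer=F3 event=SERVICE",
]


def parse_log_line(line):
    """Turn 'DATE key=value key=value' into a dictionary."""
    date, *fields = line.split()
    record = dict(field.split("=", 1) for field in fields)
    record["date"] = date
    return record


# Freezers with at least one malfunction entry in the log.
malfunctioned = {
    record["freezer"]
    for record in map(parse_log_line, log_lines)
    if record["event"] == "MALFUNCTION"
}

# Collect the temperatures per freezer.
temperatures = defaultdict(list)
for row in readings:
    temperatures[row["freezer_id"]].append(row["temperature_c"])

# Unified view: one record per freezer combining both sources.
unified = {
    freezer_id: {
        "mean_temperature_c": fmean(values),
        "malfunctioned": freezer_id in malfunctioned,
    }
    for freezer_id, values in sorted(temperatures.items())
}

for freezer_id, summary in unified.items():
    print(freezer_id, summary)
```

Neither source is large, yet once they share a key it becomes possible to ask whether freezers with a history of malfunction behave differently from the rest.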

To learn more about how Mjølner’s data science team can add value to your business, please contact data scientist Kristian Sneskov: