Risks of Analytics

Jorge C. Leitão
Nov 8, 2018


An analysis of the risks of delivering analytics

In the past 10 years, there has been a major change in the perception of analytics and data science and their positive return on investment (ROI). To a large extent, the hypothesis that analytics has a positive ROI is true. This has motivated a huge investment in analytics departments, centers of excellence and the like. However, in the past 2 years, overwhelming evidence has emerged that the vast majority of investments in analytics have not delivered value: a Gartner analyst reports that 85% of analytics projects fail.

A positive ROI, by itself, is not a sufficient condition to invest, and the evidence just confirms this; an investment is a tradeoff between return on investment and risk, and the latter has been largely ignored in the decision-making process of investing in analytics.

This paper analyses the risks of delivering analytics and advocates using risk-adjusted return on investment as the primary metric to evaluate an investment in an analytics project.

Section 1 — the tradeoff of analytics

At its core, the problem that analytics solves is a decision problem: given existing information, what should I do? This applies to all levels of decision-making: from a CEO down to a highly specialised worker at a factory.

In many cases, the decision-making process can be systematised in a handful of rules like “if it rains, bring an umbrella”. However, most specialised decisions are difficult because the rules are either unknown or very complex in nature. This is true even if you have information that no one else has (e.g. data about your manufacturing process efficiency and settings of a machine that is part of that process).

There are many methodologies to derive an action from existing information, and here we group them into two categories:

  • Business intelligence (BI): an action is derived by a qualified person interpreting curated information
  • Analytics: an action is derived from an automatic process built by qualified persons

[Figure: the difference between BI and Analytics as methods to take decisions]

Let us decompose these two. BI’s goal is to provide insight to a person so that she can make a good decision. Regardless of whether it is a chart or a list of recommendations, there is a qualified person (a specialist in her field) who uses her experience and BI tools to take a decision.

Analytics is about having qualified persons design and monitor a system that takes decisions. This system can be improved by the team, but the team’s action is not to take these decisions by themselves; it is to build and monitor a system that takes good decisions.

From this simplified description, you can already see that, all else being equal, analytics is more valuable, and riskier, than BI: the less a decision process relies on humans, the more it can be systematised (e.g. through computer code or a robot) and therefore the better its economies of scale. However, the less a decision process relies on humans, the more assumptions it has to make about its input.

Section 2 — Assumptions and risks

From a project delivery perspective, the decision to use analytics comes at the cost of assuming that the hypotheses necessary for it to work are true, and these assumptions introduce risks. If, during implementation, any of these assumptions does not hold, extra work will be needed, which increases both the cost and the time to implementation. This is a traditional problem of project management, and analytics projects are, from this point of view, software projects with higher risk.

To understand why analytics has a higher risk than other software projects, we have to understand which extra assumptions analytics systems require. We split these assumptions into two types,

  • intrinsic assumptions: consequence of a methodological, lasting aspect of analytics
  • extrinsic assumptions: consequence of a social, transient aspect of analytics

and decompose them separately.

2.1 — Intrinsic assumptions

Software can be represented as an input-output system: an operation that receives an input and returns an output. Analytics is no different:

[Figure: representation of an analytics system as a pure function. For example, the input can be (past data, data of a financial transaction) and the output is whether that transaction is fraudulent.]
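The input-output view can be made concrete with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Transaction` record, the `is_fraudulent` name and the "more than three standard deviations above the mean" rule are hypothetical and are not a real fraud model.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Sequence


@dataclass(frozen=True)
class Transaction:
    """Hypothetical transaction record; only the amount is used here."""
    amount: float


def is_fraudulent(past_amounts: Sequence[float],
                  transaction: Transaction,
                  threshold: float = 3.0) -> bool:
    """Pure function: the output depends only on the two inputs.

    Illustrative rule (an assumption, not a real fraud model): flag the
    transaction when its amount is more than `threshold` sample standard
    deviations above the mean of the past amounts.
    """
    mu = mean(past_amounts)
    sigma = stdev(past_amounts)
    return transaction.amount > mu + threshold * sigma


# Input: [past data, data of a financial transaction]; output: a decision.
past = [10.0, 12.0, 9.0, 11.0, 10.5, 9.5]
typical = is_fraudulent(past, Transaction(amount=11.0))
extreme = is_fraudulent(past, Transaction(amount=500.0))
print(typical, extreme)
```

Note that no human sits between the input and the decision: whatever the function returns is the action taken.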

Most software makes assumptions about its input, and we will now show how an analytics system makes stronger ones. To this end, we first describe an assumption that is common to BI and analytics, an invariant data schema, and then describe two (strong) assumptions made by analytics, the output in case of unexpected input and data stationarity, that make analytics systems riskier to implement than BI.

Invariant data schema

A data schema is invariant when each datum has a type (e.g. it is an integer) that does not change. When the data schema changes, the software almost always requires a code change or otherwise it will not work as intended.

For example, the backend of a website almost always makes this assumption about its database. This assumption is necessary because data types have a strong impact on how a user expects the data to be represented in a browser (e.g. “10 €” vs “boat €”). BI needs to represent data (e.g. in a chart) and thus it also makes this assumption about the input data. Almost any analytics system in production needs this assumption because analytics models, which form a core part of analytics systems, require specific data types to work as expected.
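One way to make the schema assumption explicit is to check it at the boundary of the system. The sketch below is only an illustration: the field names in `EXPECTED_SCHEMA` and the `schema_violations` helper are hypothetical.

```python
from typing import Any, List, Mapping

# Hypothetical schema: field name -> expected Python type.
EXPECTED_SCHEMA: Mapping[str, type] = {
    "transaction_id": int,
    "amount_eur": float,
    "country": str,
}


def schema_violations(record: Mapping[str, Any],
                      schema: Mapping[str, type] = EXPECTED_SCHEMA) -> List[str]:
    """Return a description of every field that breaks the schema."""
    problems = []
    for field, expected in schema.items():
        if field not in record:
            problems.append(f"{field}: missing")
        elif not isinstance(record[field], expected):
            actual = type(record[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    return problems


good = {"transaction_id": 1, "amount_eur": 10.0, "country": "PT"}
bad = {"transaction_id": 2, "amount_eur": "boat", "country": "PT"}
print(schema_violations(good))
print(schema_violations(bad))
```

A check like this does not remove the assumption; it only makes a violation visible before it reaches a chart or a model.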

Output in case of unexpected input

Almost all software aims to be resilient to unexpected input. Achieving this resilience requires an important assumption: what the software should output in case of unexpected input.

If a BI tool outputs “error: unexpected input” in case of unexpected input, we, humans, can easily interpret that output in our decision-making and act accordingly, e.g. by providing our best guess without one of the BI tools. Analytics systems, on the other hand, have a much harder time encapsulating this behaviour because they do not rely on a human to “weigh in” on whether the system’s output should be trusted or not. We can obviously analyse the decision’s usefulness a posteriori and try to improve the system, but this does not change the fact that an analytics system requires strong a priori assumptions about what it should decide in case of unexpected input.
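The following sketch shows what such an a priori assumption can look like in code. The `Decision` values, the `decide` function, the 1000 € limit and the choice of escalating to a human by default are all hypothetical; the point is only that the fallback has to be chosen before the unexpected input arrives.

```python
import math
from enum import Enum


class Decision(Enum):
    APPROVE = "approve"
    REJECT = "reject"
    ESCALATE = "escalate"  # hand the case over to a human specialist


def decide(amount_eur: object,
           limit_eur: float = 1000.0,
           on_unexpected: Decision = Decision.ESCALATE) -> Decision:
    """Toy decision rule with an explicit fallback for unexpected input.

    `on_unexpected` is the a priori assumption: it is fixed when the system
    is designed, not when the strange input shows up.
    """
    is_number = (isinstance(amount_eur, (int, float))
                 and not isinstance(amount_eur, bool))
    if not is_number or math.isnan(amount_eur) or amount_eur < 0:
        return on_unexpected
    return Decision.APPROVE if amount_eur <= limit_eur else Decision.REJECT


print(decide(50.0))
print(decide("boat"))
print(decide(float("nan"), on_unexpected=Decision.REJECT))
```

Whether the fallback should be to escalate, reject or approve is a business decision with its own cost, which is exactly the risk described above.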

Data stationarity

Contrary to other software, analytics models make assumptions about the stationarity of the data-generating process, i.e. they assume that a specific summarisation of the data (e.g. a specific model in machine learning, the network architecture in AI, the state representation in reinforcement learning) that was useful in the past will remain useful in the future. This assumption is very demanding.

One example is when the empirical distribution of a datum changes from normal to skewed (see example below). In this situation, a previously performant analytics system will very likely start taking poor decisions.

[Figure: example of a quantity (x-axis) whose distribution changed (y-axis is the probability of each value) from normal to log-normal; the x-axis values changed dramatically.]
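A change like this can be simulated and detected with a few lines of Python. The sketch below is a deliberately simple illustration, not a recommended monitoring method: it compares the sample skewness of a reference window with that of a recent window. The distribution parameters, the `has_drifted` helper and its `tolerance` of 0.5 are arbitrary assumptions.

```python
import random
from statistics import mean, pstdev


def sample_skewness(values):
    """Third standardised moment: close to 0 for symmetric data."""
    mu = mean(values)
    sigma = pstdev(values)
    return mean([((v - mu) / sigma) ** 3 for v in values])


def has_drifted(reference, current, tolerance=0.5):
    """Flag a change in shape between two samples (tolerance is arbitrary)."""
    return abs(sample_skewness(current) - sample_skewness(reference)) > tolerance


rng = random.Random(0)  # fixed seed so the sketch is reproducible
past = [rng.gauss(10.0, 2.0) for _ in range(5000)]             # normal
more_past = [rng.gauss(10.0, 2.0) for _ in range(5000)]        # same process
recent = [rng.lognormvariate(2.0, 0.75) for _ in range(5000)]  # log-normal

print(has_drifted(past, more_past))
print(has_drifted(past, recent))
```

A model fitted on `past` has no way of knowing that `recent` comes from a different process unless a check of this kind is part of the system.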

In summary, analytics systems need stronger assumptions about their input and intended output because they bypass the control that is normally exercised by specialists; these assumptions introduce risks to their implementation. Regardless of whether we can mitigate these risks, their existence is sufficient to conclude that, all else being equal, analytics software projects have a higher risk than BI projects.

2.2 — Extrinsic assumptions

Analytics follows the same adoption life-cycle as other innovations, and this includes many of their hypes and counter-hypes. These are natural consequences of the uncertainty around the value of an innovation and when it should be adopted. The most relevant aspect of this cycle is that analytics is often oversold. While this is a natural process in the life-cycle of an innovation, from a value-delivery point of view it introduces risks, mainly due to misaligned expectations. Below we provide a list of some of the riskier assumptions made when analytics is chosen to solve a specific use-case. We divide them into three categories:

  • business hypotheses: related to the applicability and valuation of analytics
  • management hypotheses: related to the organisational capabilities to realise value from analytics
  • technical hypotheses: related to the technical feasibility of realising value from analytics

This list can be thought of as a checklist that can help you question the risk-adjusted ROI of a proposal to use analytics on a given use-case.

Business hypotheses

  • The use-case requires analytics
  • The use-case’s value can only be achieved through analytics
  • Using analytics in the use-case has a higher ROI than using other methodologies

Management hypotheses

  • The organisation has people with the necessary skill-set to solve this use-case through analytics
  • The organisation has the necessary infrastructure for the use-case to be solved using analytics
  • The organisation has capacity and skill-set to maintain the delivered analytics solution

Technical hypotheses

  • There is sufficient, available and quality data about the use-case
  • There is an existing methodology or technology to solve the use-case, or the team will be able to develop one
  • There is a technological solution that fits the existing technology stack used by the organisation

Section 3 — Risk mitigation

As in any project, risks can be assessed and mitigated.

Having listed the risks associated with analytics, we now shift gears and focus on tools that help us reduce them.

Prioritise risk-adjusted ROI

Before committing to a project, analyse the risk-adjusted ROI of delivering the use-case through different methodologies. The project’s main goal is to achieve a high ROI with a low risk and, as the analysis above shows, analytics is a methodology with higher risks. Therefore, its ROI needs to offset these risks for it to be justified as the best method to use.
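This article does not prescribe a formula for risk-adjusted ROI. As one possible simplification, assume that a project either succeeds and delivers its full gain, or fails and delivers nothing while its cost is still spent; the risk-adjusted ROI is then the expected ROI under that assumption. All figures below are hypothetical; the 15% success probability simply mirrors the 85% failure rate quoted in the introduction.

```python
def roi(gain: float, cost: float) -> float:
    """Nominal ROI: assumes the project delivers its full gain."""
    return (gain - cost) / cost


def risk_adjusted_roi(gain: float, cost: float, p_success: float) -> float:
    """Expected ROI under a simplified all-or-nothing model (an assumption):
    with probability `p_success` the gain is realised, otherwise it is zero,
    and the cost is spent in both cases."""
    if not 0.0 <= p_success <= 1.0:
        raise ValueError("p_success must be between 0 and 1")
    return (p_success * gain - cost) / cost


# Hypothetical figures for the same use-case delivered in two ways.
bi = {"gain": 300.0, "cost": 100.0, "p_success": 0.90}
analytics = {"gain": 1000.0, "cost": 200.0, "p_success": 0.15}

for name, option in (("BI", bi), ("analytics", analytics)):
    nominal = roi(option["gain"], option["cost"])
    adjusted = risk_adjusted_roi(**option)
    print(f"{name}: nominal ROI {nominal:.2f}, risk-adjusted ROI {adjusted:.2f}")
```

With these made-up numbers, analytics has the higher nominal ROI but the lower risk-adjusted ROI, which is the situation the comparison is meant to expose.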

One compromise, especially in cases in which little is in place, is to first deliver a BI solution that helps specialists make better decisions (lower ROI, but much lower risk), and then spend time with these specialists to understand what an analytics system that takes decisions for them would look like (migrating them to the role of maintainers of the system). This allows for a gradual migration from a traditional decision-making process to an analytics system, with all the advantages that this entails.

Use Big Data by necessity, not by design

Analytics is often positioned alongside Big Data, IoT, Data Lake, and a panoply of technologies (e.g. Hadoop, Spark, Nifi, Airflow). This increases the risk of failure and is often unnecessary. Firstly, analytics often requires neither big data nor big data technologies, as its ROI can be achieved with a good, small dataset (less than 10 gigabytes of data). Secondly, big data technologies incur a large cost associated with their deployment and maintenance in production. Thirdly, a team that is delivering both a Data Lake and analytics is working towards two different goals, each associated with a different customer: the consumers of the Data Lake and the consumers of the analytics system. Big data technologies do impressive things and are exciting to work with but, from a delivery point of view, they introduce a large amount of risk and complexity.

Be realistic

An effective risk-mitigation tool is to be realistic and open-minded, and to use common sense. Analytics is extremely useful, interesting and fun, and acknowledging its trade-offs is key. This amounts to questioning its usefulness for a specific use-case and understanding whether analytics is the right methodology, or whether other methodologies have a higher risk-adjusted ROI.

Section 4 — conclusions

Data science, machine learning, deep learning and reinforcement learning are tools that can dramatically change how an organisation realises its objectives. When used in the right use-case and delivered with excellence, they revolutionise how an organisation operates.

However, they are similar to space rockets: they are expensive, can easily blow up, and are most valuable when applied in a specific context.

This paper analysed analytics from a delivery perspective. Firstly, it introduced analytics as one of two methodologies used to realise value from information asymmetry and differentiated it as a methodology to deliver a decision-making process. Secondly, it analysed many of the risks that analytics projects incur, both from a technical and a management point of view. Thirdly, it enumerated some of the tools and mind-sets that you can use to mitigate these risks.

Analytics is extremely popular and valuable. Ensure that its risks are properly mitigated to realise its return on investment.

Acknowledgements

This analysis would have been impossible to formulate without numerous discussions with the following great (ex-)colleagues: Laura Frølich, Jukka Ylitalo, David Balaban and Ben MacKenzie. My big thanks to them!
