When was the last time that we saw a product announcement for ‘data analysis’? Instead, we’ve seen...
I would love to say the future of analytics is here and that all your problems are solved with the new AI-powered dashboards (or Augmented Analytics). But first, what’s wrong with what we have today? Is the change even needed?
The Challenges with Traditional BI
The evolution of BI is often divided into three generations. Gen One BI was essentially based on data managed, cleaned and extracted by IT. Data extracts were then further processed manually by data analysts to produce reports. Gen Two focused on self-service and data democratization in the hands of business users. Did it work? You tell me. There is a Gen Three now, and the jury is still out. But let us first understand the problems with the first two generations of BI.
- Backward looking
Self-serve BI applications typically provide backward-looking analysis: they present information about what has happened rather than what is happening. This approach may have been adequate in a less digitized and dynamic environment, but in today’s environment of constantly flowing new information, this model of BI is of limited use for making decisions.
- Complex GUI
Self-serve BI dashboards are also designed with overcrowded, complex GUIs featuring multiple filters, dimensions and conditions that need to be selected correctly. All this can be bewildering for a business user to navigate, particularly if dashboards are only a small part of their day job. With increasing volumes of data from digital sources, this problem only gets compounded.
- Require support from IT and Data Analysts for on-demand analysis
Traditional BI setups require intervention from both data analysts and IT to incorporate new data. As a result, updating reports can be time consuming and delays the use of data.
- Lack of insights and recommendations
The reports do not provide insights and recommendations, putting much of the onus on the user to work things out.
Clearly, if we are to have BI that is more closely linked to the decision-making process, we need some, or all, of these problems addressed!
AI Powered Dashboards or Augmented Analytics
So, what do next-gen BI dashboards look like? A 2021 Gartner report identified Augmented Analytics in dashboards as one of the most important trends. We have now entered 2023, and the Augmented Analytics trend is very much a part of the BI services offered by leading players.
So, what is Augmented Analytics? Gartner defines Augmented Analytics as being: “…the use of enabling technologies such as machine learning and AI to assist with data preparation, insight generation and insight explanation to augment how people explore and analyze data in analytics and BI platforms. It also augments the expert and citizen data scientists by automating many aspects of data science, machine learning, and AI model development, management and deployment.”
Essentially, Augmented Analytics adds a layer of AI and machine learning algorithms to enhance and facilitate standard self-service BI. All the major vendors are racing to incorporate some of these functionalities into their BI services. As Forbes notes, “Augmented analytics platforms are not one-size-fits-all. They vary widely in terms of the types of data they can ingest, as well as their ability to connect to different storage platforms”. That said, these capabilities broadly fall under three heads:
- Advanced predictive & diagnostic capabilities & recommendations
- Natural language querying and insight generation
- Augmented Data Prep
The extent to which these are available varies across platforms, with some offering a more comprehensive set of features than others. Let’s understand what each of these means from the BI perspective.
Advanced predictive & diagnostic capabilities & recommendations
Augmented Analytics aims to leverage data science and machine learning algorithms to facilitate business decisions. The addition of these can potentially transform a BI system from a backward-looking system to one that is forward looking. This breaks out into two broad related types of Augmented Analytics: firstly, advanced data science and machine learning algorithms for prediction and diagnosis of data; and secondly, recommendation systems.
Incorporation of advanced statistical or predictive/diagnostic analysis:
Most self-service BI reports focus on historical reporting, or the ‘what has happened’. Since a massive volume of data is available from multiple sources in near real-time, it would be far more helpful to business users to have input on likely courses of action, given all this data. This brings in the need for predictive analytics, essentially the process of predicting what could happen by leveraging historical data and data science techniques. In the context of Augmented Analytics in BI platforms, this requires AI technology running at the back end to automatically select the forecasting, clustering and prediction algorithms that fit the data best. The critical point is not that these advanced analytics features did not exist earlier. They did, but they required users to know which technique to use and when, and to be able to interpret results and troubleshoot. Augmented Analytics aims to take this burden away from the business user by automating the process in the backend. In fact, a Tableau blog notes that, in some of the more advanced settings, models run automatically to surface deep insights about the data that the analyst may not even have thought of.
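To make the idea of automated model selection concrete, here is a minimal sketch of what a platform might do behind the scenes: fit several candidate forecasters on a training window, score each on a holdout window, and surface the winner automatically, with no input from the business user. The candidate models and the sample data are purely illustrative, not taken from any vendor.

```python
# Toy "auto-select a forecaster" loop: the kind of automation Augmented
# Analytics platforms run in the backend. All names and data are illustrative.

def fit_mean(train):
    """Forecast the historical mean."""
    m = sum(train) / len(train)
    return lambda h: [m] * h

def fit_naive(train):
    """Forecast the last observed value."""
    last = train[-1]
    return lambda h: [last] * h

def fit_trend(train):
    """Ordinary least-squares linear trend, extrapolated forward."""
    n = len(train)
    x_mean = (n - 1) / 2
    y_mean = sum(train) / n
    cov = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(train))
    var = sum((x - x_mean) ** 2 for x in range(n))
    slope = cov / var
    intercept = y_mean - slope * x_mean
    return lambda h: [intercept + slope * (n + i) for i in range(h)]

def auto_select(series, holdout=3):
    """Pick the candidate with the lowest mean absolute error on a holdout."""
    train, test = series[:-holdout], series[-holdout:]
    candidates = {"mean": fit_mean, "naive": fit_naive, "trend": fit_trend}

    def mae(fit):
        preds = fit(train)(holdout)
        return sum(abs(p - a) for p, a in zip(preds, test)) / holdout

    best = min(candidates, key=lambda name: mae(candidates[name]))
    return best, candidates[best](series)  # refit the winner on full history

sales = [10, 12, 14, 16, 18, 20, 22, 24]  # a steadily trending series
name, model = auto_select(sales)
print(name, [round(v, 1) for v in model(2)])
```

Real platforms evaluate far richer model families (seasonal forecasters, clustering, anomaly detection), but the pattern is the same: the selection and validation loop runs out of sight, and only the winning model’s output reaches the dashboard.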
Inclusion of a Recommendation System
Predictive analytics can tell the user what is likely to happen. A recommendation system would go further and give recommendations for next steps. A definition of recommender systems by Nvidia is: “A recommendation system (or recommender system) is a class of machine learning that uses data to help predict, narrow down, and find what people are looking for among an exponentially growing number of options”. Recommender systems are widely used by ecommerce companies such as Amazon to offer product suggestions to consumers under heads such as “You may also like…”. Similarly, the Netflix recommendation system is a classic example, used successfully to surface movie recommendations from its vast library based on the preferences and viewing behavior of similar customers.
A recommender system essentially suggests to consumers new products and services that might be relevant to them based on various criteria, including past purchases, search history, demographic information, and other factors. This is definitely a marquee feature for Augmented Analytics systems, as deriving recommendations in BI can be tricky given how important context is in BI. A recommendation system, if implemented successfully, would be a strong way to democratize data and also to leverage data meaningfully for suggesting future courses of action.
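The core mechanic behind “you may also like” is easy to sketch. Below is a toy item-based collaborative filter: items are scored by how similar their audiences are to the audiences of items the user has already interacted with. The interaction data (users and dashboard names) is entirely made up for illustration; production systems use learned embeddings and far larger matrices.

```python
# Toy item-based collaborative filtering, the family of technique behind
# "you may also like" suggestions. Data below is illustrative only.
from math import sqrt

# users -> set of items they interacted with (e.g. dashboards viewed)
interactions = {
    "ana":   {"sales_report", "churn_report", "pipeline_report"},
    "ben":   {"sales_report", "pipeline_report"},
    "carla": {"churn_report", "nps_report"},
    "dan":   {"sales_report", "churn_report"},
}

def cosine(item_a, item_b):
    """Cosine similarity between two items over the users who used them."""
    users_a = {u for u, items in interactions.items() if item_a in items}
    users_b = {u for u, items in interactions.items() if item_b in items}
    if not users_a or not users_b:
        return 0.0
    return len(users_a & users_b) / sqrt(len(users_a) * len(users_b))

def recommend(user, top_n=2):
    """Rank unseen items by total similarity to the user's seen items."""
    seen = interactions[user]
    all_items = set().union(*interactions.values())
    scores = {
        item: sum(cosine(item, s) for s in seen)
        for item in all_items - seen
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend("ben"))
```

Here “ben” shares an audience with viewers of the churn report, so it ranks first; the same logic, applied to dashboards, filters and metrics instead of movies, is what a BI recommender would surface.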
All this sounds good in theory, but how does it work out in practice? Most of the leading BI platforms are working on this aspect of Augmented Analytics. However, there is still some distance to cover. A 2022 Gartner review of analytics capabilities found that, overall, BI platforms scored either average or below average on Augmented Analytics. Leading platforms such as Tableau and Power BI, though promising a lot, actually lag newer entrants such as Pyramid Analytics and Tellius in this area. Tellius, a Spark-based platform, for example, has the strongest feedback in the area of automated advanced analytics: its AI-powered insights automatically test millions of rows of data to surface hidden key drivers, trends and segments.
For example, the image shows a predictive analysis of campaign performance done with a few clicks using Tellius.
Screen grab sourced from Intro to Tellius AI-Driven Analytics
Are we there yet?
The challenges in implementing such systems are primarily about finding a business model that makes sense. The technical feasibility is in place. However, as we all know, the real world is a complex place, and developing business-driven solutions is not easy. It falls to tech teams, business end users and consultancies to jointly develop solutions that make sense.
Natural language query & automated insights
A second, very important capability, one that increasingly looks achievable given the advances in NLP, is natural language querying of the dashboard. Here the user can type, or even speak, a data-related question. In fact, Gartner predicts that “dashboards will be replaced with automated, conversational, mobile and dynamically generated insights customized to a user’s needs and delivered to their point of consumption. This shifts the insight knowledge from a handful of data experts to anyone in the organization.”
There are two components to NLP in Augmented Analytics. Natural Language Query (NLQ) is used to understand the user’s query, which can be typed or spoken; the query then has to be parsed and mapped to the relevant data tables and types of analysis. Natural Language Generation (NLG) creates textual descriptions of insights from the data. These can include explanations of data visualizations and identification of key trends and drivers, presented in a far more intuitive way for business users.
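The NLQ half of this pipeline can be illustrated with a deliberately tiny sketch: tokenize the question, match tokens against a hypothetical semantic layer (synonym-to-column mappings), map an intent word to an aggregation, and emit SQL. Real platforms use knowledge graphs and learned rankers for this mapping; the table names, synonyms and intent words below are all assumptions made for the example.

```python
# Minimal NLQ sketch: question -> tokens -> semantic-layer matches -> SQL.
# The semantic model and intent vocabulary below are hypothetical.
import re

# hypothetical semantic layer: synonym -> (table, column)
SEMANTIC_MODEL = {
    "sales":   ("orders", "amount"),
    "revenue": ("orders", "amount"),
    "region":  ("orders", "region"),
    "product": ("orders", "product"),
}
# intent words -> SQL aggregation (default to SUM when none is found)
INTENTS = {"total": "SUM", "average": "AVG", "count": "COUNT"}

def nlq_to_sql(question):
    tokens = re.findall(r"[a-z]+", question.lower())
    agg = next((INTENTS[t] for t in tokens if t in INTENTS), "SUM")
    matches = [SEMANTIC_MODEL[t] for t in tokens if t in SEMANTIC_MODEL]
    if not matches:
        raise ValueError("could not map the question to the data model")
    table, measure = matches[0]          # first match is the measure
    group_by = [col for _, col in matches[1:]]  # later matches group it
    sql = f"SELECT {agg}({measure}) FROM {table}"
    if group_by:
        sql += " GROUP BY " + ", ".join(group_by)
    return sql

print(nlq_to_sql("average sales by region"))
print(nlq_to_sql("total revenue"))
```

Even this toy makes the hard part visible: everything hinges on how well the semantic layer covers the vocabulary users actually type, which is exactly where production NLQ systems struggle.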
Well, that’s the vision. Where are we at a practical implementation level? A basic level of NLP querying is in place with most of the leading players. The 2022 Gartner review of the Augmented Analytics capabilities of different platforms finds Sisense to be the strongest on NLP query, automated insights and storytelling. The report says, “In particular, for natural language query, Sisense Fusion leverages its knowledge graph to understand the associations between all entities, helping it to learn how to map natural language queries to the underlying data model based on usage over time”. The image below shows the NLQ functionality, which also allows for predictive questions.
Screen grab from Sisense Demo
In Tableau this query functionality is called ‘Ask Data’ which provides natural language search including semantic search. Further, Tableau’s acquisition of Narrative Science, a leader in the area of Natural Language Generation, has enabled them to add textual descriptions to data via the Data Stories service.
That said, the sophistication of these systems, in terms of the analytical complexity of questions, data volumes and data types supported, varies widely. The NLQ engine parses the keywords in a query and matches them with elements in the related databases. This problem has not yet been fully solved. BI platforms struggle to link what the user wants semantically with the database, and then to solve the analytics query using the data. Yellowfin, a Visionary in the 2022 Gartner Magic Quadrant, notes that search-based NLQ faces challenges mapping a textual query back to data. Hence, the questions supported often tend to be too basic to be valuable.
Some players (Yellowfin, Tableau’s Ask Data) have tried to advance NLQ capabilities by offering guided queries, or Guided NLQ. According to Yellowfin, “Guided NLQ, is an approach to natural language query that guides the user through the process of formulating their query, by dynamically providing several lists of relevant questions, and prompting the user with popular suggested dimensions and filters, such as ‘count’ or ‘compare’ that helps them ask better questions of their data, and get more accurate answers”.
What can happen in the future?
Clearly, NLQ and NLG for insights have some way to go before they achieve the vision of data democratization. However, this is an area where we can expect rapid progress, given the advances currently under way in Large Language Models (LLMs) such as those behind ChatGPT. However, LLMs, though offering powerful natural language understanding, are too large and expensive to deploy on BI platforms. Instead, a TechCrunch blog mentions that organizations are more likely to use fine-tuned models, which are LLMs that have been further tuned for a specific task or domain. One example is OpenAI Codex, which powers GitHub Copilot for executing natural language coding commands. However, from the BI perspective, even fine-tuned models may be too large and expensive to run. Edge Language Models, small, portable fine-tuned models that can run on device, might actually be the way to go. These smaller, more specialized language models are suited to a particular domain and much cheaper to deploy, as they function offline or on device. The future looks bright, though some distance away…
Augmented Data Prep
A third key trend is change and automation in data preparation. Data prep is a messy, time-consuming and tricky phase of BI analytics. Data has been talked about as the new ‘gold’; however, raw data is rarely directly usable in analytics tools. The data prep cycle of Extract, Transform and Load (ETL) is essentially the process of collating datasets from multiple sources, unifying them and cleaning up obvious problems. It is a highly technical and time-consuming process, handled by data engineers and data scientists. Augmented data prep is envisaged to shorten this process using smart algorithms that detect schemas, profile and catalogue data, identify metadata and recommend the best actions for cleaning and enriching the data. This has several obvious benefits: firstly, new data sources, particularly digital ones, can be incorporated more easily; secondly, it significantly frees up the time of data engineers and data scientists; thirdly, the process is much faster, as it reduces the involvement of busy data engineering teams, often a bottleneck, in changes to BI reports.
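One of the building blocks behind augmented data prep, profiling raw rows to infer column types and flag likely quality issues, is simple enough to sketch. The column names, sample rows and thresholds below are invented for illustration; real tools profile at scale and feed the results into recommendation engines.

```python
# Toy data-profiling step of augmented data prep: infer column types and
# surface basic quality signals. All sample data here is illustrative.

def _is_num(value):
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False

def infer_type(values):
    """Classify a column as numeric, text, or unknown (all nulls)."""
    non_null = [v for v in values if v not in ("", None)]
    if not non_null:
        return "unknown"
    return "numeric" if all(_is_num(v) for v in non_null) else "text"

def profile(rows):
    """rows: list of dicts sharing the same keys (a raw CSV-like load)."""
    report = {}
    for col in rows[0]:
        values = [r.get(col) for r in rows]
        nulls = sum(v in ("", None) for v in values)
        report[col] = {
            "type": infer_type(values),
            "null_pct": round(100 * nulls / len(values), 1),
            "distinct": len({v for v in values if v not in ("", None)}),
        }
    return report

raw = [
    {"id": "1", "amount": "10.5", "region": "EU"},
    {"id": "2", "amount": "",     "region": "US"},
    {"id": "3", "amount": "7.0",  "region": "EU"},
]
print(profile(raw))
```

A profile like this (`amount` is numeric but a third of it is missing) is exactly the kind of signal from which an augmented prep tool would recommend a cleaning action, such as imputing or excluding the null rows.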
Augmented data prep in most of the platforms is enabled via sophisticated tools that make data prep easy and intuitive for non-technical users. Typically, augmented data preparation provides access to data that is integrated from multiple sources using drag and drop features. Pyramid, for example, has data flow tools for data preparation and cleaning using drag and drop. Semantic modeling tools allow the user to join data sets using AI based suggestions. Similar features are provided by Sisense and others.
Are we there yet?
Augmented data prep today is far more user-friendly, with drag-and-drop functionality and tools for data cleaning, but it still requires the user to know what they wish to achieve with the data. The vision of smart algorithms detecting schemas and making recommendations still seems some distance away.
Augmented Analytics clearly brings significant benefits and is a step up from self-serve BI models. It can advance the depth of analysis through advanced data science modeling, as well as making analytics far more accessible via natural language query, data stories, automated insights, data explanations and so on. Automated data prep tools make processing data much easier, though it is not yet a ‘smart’ process. In terms of players, the traditional leaders, Power BI and Tableau, are lagging behind more recent entrants such as Pyramid Analytics, Sisense and Yellowfin, among others.