On Business Intelligence

By Radosław Miernik · Published on 28 April 2023 · Comment on Reddit

Intro

If there’s data, there’s a reason to look into it – that’s a fact. Business intelligence (BI for short) is virtually everything that helps your business thrive using the data you have (or can have). And even though hiring a psychic to analyze your database is also technically a BI technique, I would rather recommend starting off with a plain old and boring chart instead.

Before we dive into the topic, let’s make it clear – BI tools are just tools. That is, they can help you answer questions and navigate through the toughest decisions. But it is you who use them, and it is you who ask questions. Sure, you can outsource it or hire a professional to help you with that, but they won’t read your mind. (That’s where the psychic comes in.)

But let’s start with the basics.

Where to start

Honestly? Don’t do anything at first. As long as you’re able to somehow import your data into a spreadsheet (Excel, Google Sheets, whatever), you’ll be fine for quite a while. Whether you want a chart or a table, some numerical analysis (e.g., average, median, standard deviation), or even basic machine learning (e.g., linear regression) – it has you covered!

Next, I’d start small with only a few high-level metrics that’d help you or your users. It can be as trivial as an aggregated number of something grouped per type (e.g., tasks in a given state). Yes, it’s still technically BI and will most likely bring immense value – counting things can get tedious and error-prone even if there are only less than forty elements.

Once you define and implement a few of them, you’ll have time to look into “proper” BI tools. In my experience, these few metrics buy you literally years at the cost of dozens of hours and a tiny fraction of cost. Remember that your end users (mostly) don’t mind the tools you’re using.

Tools and their audiences

One of the key factors in choosing a tool is the technical level of your target audience. As a rule of thumb, the vast majority of your end users won’t know SQL or how to create dashboards using YAML. On the other hand, your data analysts and product team will, and will most likely even prefer that over clicking through fancy configurators.

That scale is strongly reflected in the tooling:

Some are (almost) entirely self-service, e.g., Microsoft Power BI or Salesforce Tableau. From my experience (as a developer), it’s enough to spend an hour or two on setting it up and then let your analysts – or even business owners – play with them.
Some require more setup and some technical expertise to operate, e.g., Looker. Most of the work is done via a graphical interface, but there’s plenty of SQL (and custom configuration) involved. I’ve been using Looker for a while now (before Google bought it in early 2020), and I have rather mixed feelings about it.
Some are (almost) entirely code-driven, e.g., AWS CloudWatch or Grafana. Sure, there are some graphical visualization builders, but they’re more oriented toward developers; even the documentation is more technical. I highly recommend having one of these, as they can serve other purposes as well (e.g., send notifications when something goes wrong).

Interestingly, this separation is reflected in a completely unrelated area – data privacy. Your end users should access their own data exclusively, but your data analysts and product team are often more interested in general trends. (Whether or not they should have access to the real and unanonimized data is a separate topic.) While some tools have such functionalities built-in (e.g., access_filter in Looker), others do not and require additional care.

Everyone uses SQL

Right? Well, not really. As the first project, I needed a full-blown BI tool for using MongoDB as the main database. I wanted to connect these two directly, too – a direct connection would be cheaper but also allow the BI tool to operate in real-time. There are two ways to do so: either the BI tool understands MongoDB, or I somehow expose an understandable (in most cases SQL) interface to it.

It turns out the former limits the available tools significantly. Sure, there are some that support MongoDB, but it’s often highly constrained, e.g., doesn’t support views, newer operators and pipeline stages, or non-trivial data (like arrays of objects). There are also limitations on the tool itself, e.g., joining data is highly inefficient.

The latter has a plug-and-play solution: MongoDB Connector for BI. We’d evaluated it for about a month and… It was terrible. Firstly, the SQL interface is generated by sampling your data – it’s a good idea, but there’s no way to configure (or skip) it. If you use views, modify only a part of the database, face an error, or restart your database – you’re in bad luck. It can’t even notify you about an error or show a progress bar nor list the completed samplings. The list goes on.

And then there’s performance… Let me put it straight – we’re not talking 3 seconds slow; we’re talking 3 minutes slow. Luckily, we can look into the generated aggregations (almost all SQL operations translate to an aggregation; some are simple enough to be covered with a query), but there’s no way to improve them. Sure, better SQL queries led to better aggregations, but most BI tools generate complex and truly awful SQL queries.

So, what can we do then? We’ve decided to use an ETL tool to replicate our data in a SQL database. We’ve tried a few (seven, to be precise) and ended up using Hevo (some others had far steeper pricing or had problems with complex structures MongoDB). Adding an ETL tool complicates the infrastructure, incurs additional costs (you have to pay for the tool and for the SQL database), and raises privacy concerns (one more company that can access your data). I’m not saying it’s a deal breaker, but keep that in mind.

When it comes to cost

While some BI tools have a low entry-level cost, they tend to cost quite a lot in practice. That’s because the cheaper plans have rather low data caps, like a few gigabytes of (processed) data. Some of them (e.g., Looker) have no publicly available pricing at all and will only give you a quote once contacted. We’ve reached out to a few of them, and if you’re interested – we’re talking thousands of dollars a month for starters (at least it’s not growing rapidly afterwards).

The second thing is the infrastructure cost. You most likely want an additional database (or at least a replica not to interfere with daily operations), data transfer bandwidth to the BI tool (it can be higher than your app’s), and maybe an ETL tool. It’s not doubling the cost of the BI tool itself, but can add up easily. It’s also a piece of your architecture and requires maintenance.

Then there’s portability. While the “custom code” (e.g., SQL pieces) are easy to be moved, often the visualizations or dashboards have no common format, and once you decide to try another tool, you’ll have to recreate all of them. It’s most likely not a lot of work, but it makes it harder to migrate. Such vendor locking is not a big deal but can cripple trying out other tools.

While doing research for this text, I found quite a few cheap or free (and open source) full-blown BI tools. Some of them look great, but I can’t tell if they’re any good without trying. Hopefully, I’ll have a chance to give them a shot in the following months. Check them yourself on the Awesome Business Intelligence list on GitHub.

Lastly, here’s one for free: make it a feature. For some reason, BI capabilities are considered “enterprise” and, as such, can come with a price tag. Whether it’s an additional feature focused on a limited group of users or the selling point of your app – it can be the next big thing. In other words: instead of lowering the costs, focus on increasing the revenue.

Closing thoughts

I really like working with data. Starting off by building charts, getting new insights from them, forging them into ideas… But it’s hard. And it takes time. That’s exactly where tools come in and bring immense value to your business. Whether it’s a spreadsheet or an enterprise-grade BI tool mastered by only a few people – these are only tools, and you have to operate them.

Artificial intelligence is playing a bigger and bigger role in data analysis, but we’re not yet in a place where a very smart tool can answer questions like “what should we do to earn more”. But hey, if you’d like to know “what is the busiest day of the week in my data”, some tools will do that in no time (and won’t hallucinate while answering).

Now go and chart the hell out of your data.