Fundamental problems with software user analytics

My entire career has centered on using data to make decisions, whether that data was sales, inventory, product, financial, customer, or user data. I haven't just consumed data and used analytics products; I've built, purchased, and implemented them as well. So while I may not be the world's foremost expert on analytics, my experience has led me to some strongly held beliefs, particularly about software user analytics.

So, what is software user analytics? In short, I'm talking about data collected from web applications that tells us exactly who did exactly what in the product, and what the outcome was. This space has evolved a ton since the advent of the web. First we had simple web analytics, which told us what content on our sites was popular and a bit about our visitors (country, operating system, etc.). Then, as websites morphed into web applications, a new breed of analytics told us about usage rather than just page loads. I call this user analytics: which users clicked which buttons, and what the results were.

While the differences between web analytics and user analytics may seem subtle, user analytics provides 100x more value than web analytics, in my estimation. That's because user analytics helps us understand user intent and the value users derive. If content is your product, as with a blog, then web analytics is probably fine for you. However, if software is your product, as in social media, gaming, or banking, then you need user analytics. It can tell us where to put a button, which features to build, enhance, or deprecate, what type of user gains the most value from the product, what data they want, and so much more.

So, what's my beef with user analytics products? Honestly, I love them! Companies like Mixpanel, Google, Heap, and Segment make amazing products that I rely on. I've been successful in my career because of what they've allowed me to do as a user, and I've had the most fun at work when building analytics tools. They just aren't perfect. They leave a lot to be desired, and I believe there is an opportunity to deliver more to users today while building for the next shift in software tomorrow.

I see three fundamental problems with user analytics.

1. Most analytics providers want to own the entire solution stack

Most user analytics products do three things: they help collect data from an application, they store the data, and they offer visualization on top of that data. Naturally, they price for all of these. When you buy a user analytics solution like Heap or Mixpanel, you pay for all three, whether you need them or not. However, most businesses don't need all three from a user analytics vendor.

Any enterprise worth its salt treats data as a central piece of its operations, and no piece of data it collects should live in a silo. These organizations run a data warehouse: a collection of all the data used to run the business, including user analytics, sales, financial, marketing, and customer success data, among others.

While companies like Mixpanel and Heap offer best-in-class data capture for user analytics, they aren't data warehouses, and they don't offer best-in-class data visualization. In fact, it's nearly impossible today to buy best-in-class user analytics data capture and pair it with your own data warehouse, so that you can visualize usage alongside your other data.

Take my own experience as an example. I recently led the selection and implementation of a software user analytics solution where I work. One of our many requirements was to reduce the effort needed to collect usage data by at least 80%. Another was that we could store the data in our own Redshift instance and query it with our visualization tool of choice (Looker). So I needed to buy data capture, but not data storage or querying. We got what we needed, but we had no choice but to pay for more than we use.

Best-in-class data capture for user analytics comes from Heap and Mixpanel, with their auto-track capabilities. However, Mixpanel won't let you use that data outside of its product, and Heap charges a premium for that capability. To get best-in-class data capture, you have to pay for data storage and data visualization, whether you want them or not. Sure, Segment makes it easy to send my user analytics data anywhere I need it, but without auto-tracking, it falls short of the best-in-class label. My options were limited to a single provider, and today we pay for more product than we need.

As we collect more data from more places, it's not acceptable for data to live in walled gardens. We need to marry our marketing data with our sales data and our usage data. At the same time, enterprises expect control and ownership of their data. They are building business intelligence teams that require sophisticated data warehouses with flexibility in how the data can be used.

Unfortunately, buyers have to choose between enterprise-grade business intelligence (Segment to Redshift, in my opinion) and effortless usage data from their web apps (Heap or Mixpanel, for their auto-track features). Why can't we have both, without paying for things we don't need?

2. Little value comes out of the box

In the early days of the internet, web analytics solutions offered out of the box value. Once the tracking code was dropped onto a site, web analytics tools like Google Analytics and Webtrends would tell you a whole lot about your users...no other effort required. No writing queries, no building reports, no curation of dashboards. Just login and immediately view how many visitors you had last week, which pages were popular, where your users came from and what type of device they were using.

Unfortunately, that level of data isn't good enough anymore. I need to know which form was filled out the most, which selection from my dropdown menus resulted in the most image exports, which sequence of steps results in the most purchases, and which of my sales reps have the most customers interacting with the new feature we launched in beta.

To get any of that, most user analytics solutions require significant effort from the user. You start from scratch: literally a blank page. You learn a data structure and its definitions. You learn a query language or system, then start creating charts, graphs, tables, and dashboards. This may sound easy, but you are looking at many, many hours of effort, only to get data you may not fully understand and that may not even be correct.

The other day, I had to contact the customer support team of an analytics product I use to get help counting the number of unique users my app had. With all my experience using, buying, implementing, and building analytics products, I had to contact support for help with the simplest, most fundamental of queries.

It doesn't have to be this way. It wasn't this way with early web analytics, and it's not this way with software performance monitoring. When I log into New Relic, I am immediately presented with data that provides value, answers questions, and leads me down a rabbit hole. Yet when I engage with user analytics tools, I'm starting from nothing. And this pain isn't limited to getting started. Even after I've spent hours, even days, setting up dashboards, reports, graphs, and charts, I still have to start from scratch at other times. Build a new feature? Start writing more queries to learn how that feature is used! Want to know something different about your app? Get working on a new dashboard!

Nothing comes for free with user analytics. We have powerful access to data, but the data is meaningless without a significant investment to use that data. Why can't user analytics be more like New Relic software performance monitoring?

3. Users must know what they need to know

I find most user analytics tools to be incredibly limiting in the value they provide. The benefits are in theory unlimited, yet to reach much of that value, the user has to know what they need or want to know. With the amount of data we collect these days, and the rate at which our data collection grows, I believe there is significantly more value hidden in the data, and we don't even know how to extract it.

As discussed earlier, most user analytics tools require the user to write queries, create charts and graphs, and build dashboards. We do this with what we already know. I know I want to see a count of users. I know I want to see what percentage of my total user base was active in the last month, which pages are the most popular, which option from a dropdown menu was chosen most, and what percentage of people drop off at each step of my ideal user flow through the product.
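To make the "what we already know" questions concrete: once event data is in hand, each of them is a small, well-defined computation. The sketch below assumes a hypothetical event log of `(user_id, event_name, timestamp)` tuples; the event names, sample data, and function names are illustrative only, not any vendor's schema or API. It also assumes "total user base" means every user who appears in the log.

```python
from datetime import datetime, timedelta

# Hypothetical event log: (user_id, event_name, timestamp). Not a real vendor schema.
EVENTS = [
    ("u1", "signup", datetime(2024, 1, 1)),
    ("u1", "upload_image", datetime(2024, 1, 2)),
    ("u1", "export_image", datetime(2024, 1, 3)),
    ("u2", "signup", datetime(2024, 1, 5)),
    ("u2", "upload_image", datetime(2024, 1, 6)),
    ("u3", "signup", datetime(2024, 1, 10)),
]


def events_by_user(events):
    """Group event names per user, in timestamp order."""
    grouped = {}
    for user, name, _ts in sorted(events, key=lambda e: e[2]):
        grouped.setdefault(user, []).append(name)
    return grouped


def funnel_counts(events, steps):
    """Count users who reached each step in order; other events may occur in between."""
    counts = [0] * len(steps)
    for names in events_by_user(events).values():
        reached = 0
        for name in names:
            if reached < len(steps) and name == steps[reached]:
                reached += 1
        # A user who reached step k also counts toward every earlier step.
        for i in range(reached):
            counts[i] += 1
    return counts


def drop_off(counts):
    """Fraction of users lost between each consecutive pair of funnel steps."""
    return [1 - later / earlier if earlier else 0.0
            for earlier, later in zip(counts, counts[1:])]


def active_share(events, now, days=30):
    """Share of users in the log who had any event in the last `days` days."""
    all_users = {user for user, _, _ in events}
    cutoff = now - timedelta(days=days)
    active = {user for user, _, ts in events if ts >= cutoff}
    return len(active) / len(all_users) if all_users else 0.0


steps = ["signup", "upload_image", "export_image"]
counts = funnel_counts(EVENTS, steps)
print(len(events_by_user(EVENTS)))  # unique users: 3
print(counts)                       # [3, 2, 1]
print(drop_off(counts))
print(active_share(EVENTS, now=datetime(2024, 2, 4)))
```

This funnel lets other events happen between steps; a stricter definition that requires the steps to be consecutive would report lower numbers, which is exactly the kind of definitional choice a user has to learn before trusting a chart.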

What if users forged their own path through my product, defying the path I thought they would take and the path I built my reporting around? How would I know? How would I measure its impact? I likely wouldn't. Something in the data would have to pique my interest and drive me to discover this new funnel of user behavior before I updated my dashboard to capture what I'd learned.

How many more insights like this might be hidden in my user analytics data? I'm guessing there is a virtually unlimited number of observations that can be made on user analytics data, and somewhere in the noise is likely a signal we are missing because we didn't know to look for it. I want the user analytics solution of tomorrow to tell me what I should know, instead of waiting for me to ask the question.
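One naive way a tool could start to "tell me what I should know" is to mine the event stream for frequent paths and flag the ones the expected funnel never anticipated. The sketch below is a minimal illustration of that idea under stated assumptions: a hypothetical event log, a hypothetical expected path, and fixed-length runs of consecutive events. It is not how any particular product works; real path analysis also has to deal with sessions, noise, and far larger volumes.

```python
from collections import Counter

# Hypothetical event log: (user_id, event_name, timestamp). Timestamps are plain
# integers here for brevity; anything sortable works.
EVENTS = [
    ("u1", "search", 1), ("u1", "view_item", 2), ("u1", "add_to_cart", 3), ("u1", "checkout", 4),
    ("u2", "search", 1), ("u2", "view_item", 2), ("u2", "add_to_cart", 3), ("u2", "checkout", 4),
    ("u3", "home", 1), ("u3", "compare", 2), ("u3", "checkout", 3),
    ("u4", "home", 1), ("u4", "compare", 2), ("u4", "checkout", 3),
    ("u5", "home", 1), ("u5", "compare", 2), ("u5", "checkout", 3),
]

# Hypothetical "ideal" path that the dashboards were built around.
EXPECTED_PATH = ["search", "view_item", "add_to_cart", "checkout"]


def windows(seq, length):
    """All runs of `length` consecutive items in `seq`, as tuples."""
    return [tuple(seq[i:i + length]) for i in range(len(seq) - length + 1)]


def path_counts(events, length=3):
    """Count how often each run of `length` consecutive events occurs across users."""
    by_user = {}
    for user, name, _ts in sorted(events, key=lambda e: e[2]):
        by_user.setdefault(user, []).append(name)
    counts = Counter()
    for names in by_user.values():
        counts.update(windows(names, length))
    return counts


def unexpected_paths(events, expected, length=3, top=5):
    """Most frequent paths that never appear anywhere in the expected flow."""
    known = set(windows(expected, length))
    counts = path_counts(events, length)
    surprising = Counter({p: n for p, n in counts.items() if p not in known})
    return surprising.most_common(top)


for path, n in unexpected_paths(EVENTS, EXPECTED_PATH):
    print(n, " -> ".join(path))  # 3 home -> compare -> checkout
```

In this toy data, the path nobody planned for (home, compare, checkout) is the most common route to checkout, and it surfaces without anyone having written a query for it.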


User analytics products aren't perfect, but they aren't all bad either. I'm a new user of Heap Analytics, and I'm really enjoying the product they've built. Their auto-track and retroactive data capture saved my butt the other day! Mixpanel offers a fantastic product and is the best solution for funnel and cohort analysis on web or mobile apps. Google Analytics was a game changer and one of the best things that has happened to content producers.

What do you think? How do web and user analytics tools need to evolve to provide value tomorrow? Let me know on Twitter, or leave a comment wherever you found this post.