Google Analytics API V4: Histogram Buckets

July 27, 2017

Back in April of last year, Google released version 4 of their reporting API. One of the new features they’ve added is the ability to request histogram buckets straight from Google, instead of binning the data yourself. Histograms allow you to examine the underlying frequency distribution of a set of data, which can help you make better decisions with your data. They’re perfect for answering questions like:

  • Do most sessions take about the same amount of time to complete, or are there distinct groups?
  • What is the relationship between session count and transactions per user?

How It Really Works

Here’s how to use this new Histogram feature yourself with the API.

Note: we’re assuming you’ve got the technical chops to handle authorizing access to your own data and issuing the requests to the API.

Here’s what a typical query looks like with the new version of the API:

{
  "reportRequests": [
    {
      "viewId": "VIEW_ID",
      "dateRanges": [
        {
          "startDate": "30daysAgo",
          "endDate": "yesterday"
        }
      ],
      "metrics": [
        {
          "expression": "ga:users"
        }
      ],
      "dimensions": [
        {
          "name": "ga:hour"
        }
      ],
      "orderBys": [
        {
          "fieldName": "ga:hour",
          "sortOrder": "ASCENDING"
        }
      ]
    }
  ]
}

This query returns one row per hour, each with the number of users who generated a session during that hour. Simplified, the result looks something like this:


[
  ['0', 100],
  ['1', 100],
  ['2', 100],
  ['3', 110],
  ['4', 120],
  ['5', 140],
  ['6', 220],
  ['7', 300],
  ...
]

Wouldn’t this data be more useful if it were dayparted? Let’s use the histogram feature to bucket our data into traditional TV dayparts:

Daypart           Hours
Early Morning     6:00 AM – 10:00 AM
Daytime           10:00 AM – 5:00 PM
Early Fringe      5:00 PM – 8:00 PM
Prime Time        8:00 PM – 11:00 PM
Late News         11:00 PM – 12:00 AM
Late Fringe       12:00 AM – 1:00 AM
Post Late Fringe  1:00 AM – 2:00 AM
Graveyard         2:00 AM – 6:00 AM

To request that our data be returned in these new buckets, we’ll need to make two modifications to our query from before. The first change is to add a histogramBuckets array to the ga:hour object in our dimensions array. We’ll populate this with ["0", "2", "6", "10", "17", "20", "22", "24"]. Each number in this sequence marks the beginning of a new histogram bin; the final value, 24, just closes the 22-23 bin, since ga:hour never reaches 24.

The end of each bin is inferred from the number that follows it, and if values exist below the first bin’s minimum, an additional bin will be tacked on for us at the beginning to contain those values. For example, if we had started our histogramBuckets with "2" instead of "0", the API would add a new bucket to the beginning named "<2", and it would contain the values for matching rows where the ga:hour dimension was 0 or 1.

The second change we need to make is to add "orderType": "HISTOGRAM_BUCKET" to the orderBys portion of our request.

{
  "reportRequests": [
    {
      "viewId": "70570703",
      "dateRanges": [
        {
          "startDate": "30daysAgo",
          "endDate": "yesterday"
        }
      ],
      "metrics": [
        {
          "expression": "ga:users"
        }
      ],
      "dimensions": [
        {
          "name": "ga:hour",
          "histogramBuckets": [
            "0",
            "2",
            "6",
            "10",
            "17",
            "20",
            "22",
            "24"
          ]
        }
      ],
      "orderBys": [
        {
          "fieldName": "ga:hour",
          "orderType": "HISTOGRAM_BUCKET",
          "sortOrder": "ASCENDING"
        }
      ]
    }
  ]
}
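
If you build this request in code rather than by hand, a small helper keeps the two changes together. The following is a minimal Python sketch, not an official client: build_histogram_request and expected_labels are hypothetical names introduced here, and the "24+" style label for the open-ended last bin is an assumption based on my reading of the API reference rather than something shown in this post.

```python
import json


def build_histogram_request(view_id, dimension, metric, boundaries):
    """Build a request body that buckets `dimension` at `boundaries`."""
    bounds = [int(b) for b in boundaries]
    # Each boundary marks the start of a bin, so they must strictly increase.
    if any(a >= b for a, b in zip(bounds, bounds[1:])):
        raise ValueError("bucket boundaries must be strictly increasing")
    return {
        "reportRequests": [{
            "viewId": view_id,
            "dateRanges": [{"startDate": "30daysAgo", "endDate": "yesterday"}],
            "metrics": [{"expression": metric}],
            "dimensions": [{
                "name": dimension,
                # The request in this post passes the boundaries as strings.
                "histogramBuckets": [str(b) for b in bounds],
            }],
            "orderBys": [{
                "fieldName": dimension,
                "orderType": "HISTOGRAM_BUCKET",
                "sortOrder": "ASCENDING",
            }],
        }]
    }


def expected_labels(boundaries):
    """Predict the bucket names the API would use for these boundaries."""
    bounds = [int(b) for b in boundaries]
    labels = ["<%d" % bounds[0]]          # catch-all below the first boundary
    for lo, hi in zip(bounds, bounds[1:]):
        # A bin runs from lo up to, but not including, hi.
        labels.append(str(lo) if hi - lo == 1 else "%d-%d" % (lo, hi - 1))
    labels.append("%d+" % bounds[-1])     # assumed name for the open-ended last bin
    return labels


DAYPART_BOUNDS = [0, 2, 6, 10, 17, 20, 22, 24]
body = build_histogram_request("VIEW_ID", "ga:hour", "ga:users", DAYPART_BOUNDS)
print(json.dumps(body, indent=2))
print(expected_labels(DAYPART_BOUNDS))
```

The printed body should match the request above apart from the placeholder view ID. Buckets that contain no rows (here "<0" and "24+") would simply not appear in a response.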

Here’s what the response for that query looks like for some data from a personal site:

{
  "reports": [
    {
      "columnHeader": {
        "dimensions": [
          "ga:hour"
        ],
        "metricHeader": {
          "metricHeaderEntries": [
            {
              "name": "ga:users",
              "type": "INTEGER"
            }
          ]
        }
      },
      "data": {
        "rows": [
          {
            "dimensions": [
              "0-1"
            ],
            "metrics": [
              {
                "values": [
                  "31"
                ]
              }
            ]
          },
          {
            "dimensions": [
              "2-5"
            ],
            "metrics": [
              {
                "values": [
                  "113"
                ]
              }
            ]
          },
          {
            "dimensions": [
              "6-9"
            ],
            "metrics": [
              {
                "values": [
                  "155"
                ]
              }
            ]
          },
          {
            "dimensions": [
              "10-16"
            ],
            "metrics": [
              {
                "values": [
                  "247"
                ]
              }
            ]
          },
          {
            "dimensions": [
              "17-19"
            ],
            "metrics": [
              {
                "values": [
                  "52"
                ]
              }
            ]
          },
          {
            "dimensions": [
              "20-21"
            ],
            "metrics": [
              {
                "values": [
                  "25"
                ]
              }
            ]
          },
          {
            "dimensions": [
              "22-23"
            ],
            "metrics": [
              {
                "values": [
                  "21"
                ]
              }
            ]
          }
        ],
        "totals": [
          {
            "values": [
              "644"
            ]
          }
        ],
        "rowCount": 7,
        "minimums": [
          {
            "values": [
              "21"
            ]
          }
        ],
        "maximums": [
          {
            "values": [
              "247"
            ]
          }
        ],
        "isDataGolden": true
      }
    }
  ],
  "queryCost": 1
}
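
The nested response is easier to work with once it is flattened into simple pairs. Here is a rough Python sketch of one way to do that, assuming a single dimension and a single integer metric as in the response above; bucket_totals and sample_row are hypothetical helper names, and the sample dictionary is an abbreviated copy of that response.

```python
def bucket_totals(response):
    """Flatten the first report into (bucket label, metric value) pairs."""
    rows = response["reports"][0]["data"].get("rows", [])
    return [
        (row["dimensions"][0], int(row["metrics"][0]["values"][0]))
        for row in rows
    ]


def sample_row(label, value):
    """Build one row in the same shape the API returns."""
    return {"dimensions": [label], "metrics": [{"values": [value]}]}


# Abbreviated copy of the response shown above (headers and extras omitted).
sample = {"reports": [{"data": {
    "rows": [
        sample_row("0-1", "31"), sample_row("2-5", "113"),
        sample_row("6-9", "155"), sample_row("10-16", "247"),
        sample_row("17-19", "52"), sample_row("20-21", "25"),
        sample_row("22-23", "21"),
    ],
    "totals": [{"values": ["644"]}],
}}]}

pairs = bucket_totals(sample)
for label, users in pairs:
    print("%-6s %4d" % (label, users))
```

For this particular response the bucket values add up to the reported total of 644, which makes a handy sanity check.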

Some Downsides

As of this writing, the chief advantage of this feature is that it can save you a little logic and time when your own application wants to use histograms with your Google Analytics data. There’s no “give me X buckets” though – you have to know the range of your data ahead of time. Additionally, data is coerced into an integer, so floats are out.

That means if you want to generate bins dynamically (like we’re doing in our example), you need to first get the range of the data from Google Analytics, then calculate those buckets and send a second request. You may wish to simply request the raw data and calculate the histogram yourself.
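
To make that trade-off concrete, here is a minimal Python sketch of both halves: turning a known integer range into bucket boundaries, and binning raw rows yourself. The function names are hypothetical, the rows are the simplified hourly figures from earlier in this post, and the boundaries come out as at most the requested number of equal-width bins.

```python
from bisect import bisect_right


def equal_width_boundaries(minimum, maximum, n_buckets):
    """Return integer bin starts covering minimum..maximum in at most n_buckets bins."""
    if n_buckets < 1 or maximum < minimum:
        raise ValueError("need n_buckets >= 1 and maximum >= minimum")
    span = maximum - minimum + 1       # number of distinct integer values
    width = -(-span // n_buckets)      # ceiling division
    return list(range(minimum, maximum + 1, width))


def bin_rows(rows, boundaries):
    """Sum (value, count) rows into the bin with the largest start <= value."""
    totals = [0] * len(boundaries)
    for value, count in rows:
        index = bisect_right(boundaries, int(value)) - 1
        if index < 0:
            raise ValueError("value %r is below the first boundary" % (value,))
        totals[index] += count
    return list(zip(boundaries, totals))


# Simplified hourly rows from the first query in this post.
rows = [("0", 100), ("1", 100), ("2", 100), ("3", 110),
        ("4", 120), ("5", 140), ("6", 220), ("7", 300)]

boundaries = equal_width_boundaries(0, 7, 4)
print(boundaries)                      # [0, 2, 4, 6]
print(bin_rows(rows, boundaries))      # [(0, 200), (2, 210), (4, 260), (6, 520)]

# For a second API request, the same boundaries would go in as strings.
histogram_buckets = [str(b) for b in boundaries]
```

One caution on doing it yourself: summing rows is safe for additive metrics such as sessions, but a unique-count metric like ga:users may be overcounted when the same user appears in more than one row.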

Hopefully Google will add more functionality to this feature to simplify dynamic binning. I’d also welcome the ability to create histograms within the Google Analytics interface; perhaps this API feature is a sign that something like that is in the works.

Only a limited set of dimensions can be queried in this manner; here’s the complete list:

Dimension                API name
Count of Sessions        ga:sessionCount
Days Since Last Session  ga:daysSinceLastSession
Session Duration         ga:sessionDurationBucket
Days to Transaction      ga:daysToTransaction
Year                     ga:year
Month of the year        ga:month
Week of the Year         ga:week
Day of the month         ga:day
Hour                     ga:hour
Minute                   ga:minute
Month Index              ga:nthMonth
Week Index               ga:nthWeek
Day Index                ga:nthDay
Minute Index             ga:nthMinute
ISO Week of the Year     ga:isoWeek
ISO Year                 ga:isoYear
Hour Index               ga:nthHour
Any Custom Dimension     ga:dimensionX (where X is the Custom Dimension index)

Great Example Use Cases

Wondering how you might use this feature? Here are some more examples to get your juices flowing:

  • Use Events to capture more accurate page load times and store the time in the label, then bin the times using the API.
  • Capture blog publish dates and see when blog posts peak in engagement.
  • Look at months and transactions to identify seasonality.
  • Compare Session Count and Revenue to see, in general, the number of sessions required to drive your highest revenue.

Have a clever use case of your own? Let me know about it in the comments.