Structuring data
for tableau

This workshop will dive into how to successfully structure data sets for Tableau. We’ll discuss the five most common data structure and cleaning missteps and introduce basic pivots, unions, and joins.

Start
Here

Before we dive into restructuring data in Tableau, here’s a reminder about how to use this guide as well as a few notes about why data is messy.

How to use this guide

We move pretty quickly during the workshop. There’s a lot to cover in just 90 minutes! If you miss a step, don’t worry. You can find all the materials we discuss here. Use this guide to solidify your skills or as a reference this year as you dive into your own Tableau projects.

BALTIMORE’S 311 DATA

This workshop uses Baltimore’s 311 data from 2016. Click here to download the Excel spreadsheet we use during the workshop. The original data as well as its documentation can be found here. Thank you to Open Data Baltimore for making this data available to the public.

TABLEAU WORKBOOK

The complete Tableau workbook we use during the workshop can be found here. Just go to the bottom right hand corner and download your own copy. Then you can deconstruct each visualization we built with a fresh version.

STEP-BY-STEP GUIDE

The guide that follows documents the key steps that go into building each of the different chart types we create during the workshop. If you get stuck, download a copy of the original workbook and compare your worksheet to the ones there. Good luck!

Two Kinds of Messy Data

Prepping a data set for analysis is often much more time consuming than actually analyzing the data. When we talk about messy data, we’re usually talking about one of these two problems. The first one is easy to solve. The second one is much harder. To learn more, start by reading Lisa Gitelman and Virginia Jackson’s essay “Raw Data” is an Oxymoron.

STRUCTURAL

First we have structural challenges. In these scenarios, we have all the data we need for our analysis, but structure of the table (literally how the columns, rows, and cells are formatted) won’t play nicely with computer program we’re using. The problems we’ll work with in this workshop fall into this structural category, and much easier to fix than the second category of problems.

INFORMATIONAL

The second set of messy data problems have to do with the information itself: how the data was collected and what each cell actually represents. This is the hard stuff. Data, like art, doesn’t exist without us. We create data in our attempts to capture and better understand the world around us. In the act of observing and collecting, we shape the data with each decision we make.

Five
Problems

Let’s work through the five most common problems folks run into when they connect their own data to Tableau. These more technical solutions open up all the visualization possibilities we saw in the first workshop.

Concatenated Variables

When multiple variables have been shoved into a single column, we can’t work with any of them. Take a look at the Geo Location variable in our 311 data. This one column has four variables in it: the street address, the zip code, and the latitude and longitude coordinates. The solution here is to SPLIT that one column into four separate columns.

Data in the headers

When a data set has been maintained by hand, data often crops up in the headers. Take a look four columns titled: In-House, Phone, Email, and Mobile Apps. Instead of too much data in one column, we have one variable–Methods Received–spread across four columns. The solution is to PIVOT the data into a single column.

Too much data

When our data set has a bunch of extra data that’s not necessary for our analysis, it becomes cumbersome and just slows down our system. The solution is  to add a DATA SOURCE FILTER that will narrow down our data set before our data even hits the data pane in our worksheet building environment.

new records

When new records pop up with the same data structure as an existing data set, we often want to just tack those records onto our current data set without redoing any of our Tableau work. The solution here is to use a UNION these new records with the existing table in our dataset.

new columns

And finally, we often have new variables that we want utilize alongside our existing variables. A JOIN will allow us to tack on these new variables by using a common variable that exists in both tables. In our case, we’ll use the Agency variable to add the phone number and website for each department in our 311 data.

Now that you’re data is all set up, you can always go back to the first workshop and check out how to use Different Charts for Different Data.

YOUR FEEDBACK

Thank you so much for participating in this pair of workshops with me! I’m always trying to improve my content. If you’d be willing to answer just five quick questions, I’ll work to incorporate your feedback in future workshops. Thank you!

CLICK HERE TO TAKE A SHORT SURVEY

OUR MISSION

To help democratize quantitative knowledge by creating educational resources and experiences that make data visualization–and by extension, data itself–approachable, exciting, and meaningful.

DATA DOZEN, LLC

Copyright © 2018