How to visualize data

Script version 1.3

Aggregating data into a single image shows you the Big Picture, enabling insights that wouldn't be possible if you just pored over the numbers or words independently. This tutorial introduces you to some basic techniques for visualizing data, with an emphasis on tools that make it easy to turn numbers into charts and words into tag clouds.

Visualization types

There are dozens of types of charts and diagrams for visualizing numerical data, from bar, line, and pie charts to bubble diagrams to geographic and heat maps. But it is also possible to gain insights from visualizing the structure of written texts. If visualization has historically been more common to the sciences, the digital humanities have been invigorated by new techniques for understanding the relationships among words in texts, from the State of the Union to Shakespeare.

A tag cloud is one of the most common ways to visualize word frequency, and has become popular as a non-hierarchical means of navigating blogs and other textual compendiums. A typical tag cloud displays frequently used words in larger type than words used only once or twice. Often, clicking a word in a tag cloud brings up a list of pages or records that include that tag. Some tag clouds let you select more than one tag and see only the records that contain all of them.
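
To make the "more often means larger" idea concrete, here is a minimal Python sketch that counts words and scales each one linearly between a smallest and largest font size. The function name tag_sizes and the 12 to 48 pixel range are assumptions chosen for illustration, not part of any particular tag cloud tool.

```python
from collections import Counter
import re


def tag_sizes(text, min_px=12, max_px=48):
    """Map each word to a font size that grows with its frequency."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    span = high - low or 1  # avoid dividing by zero when all counts are equal
    return {
        word: round(min_px + (count - low) / span * (max_px - min_px))
        for word, count in counts.items()
    }


sizes = tag_sizes("war peace war economy war peace")
print(sizes)
```

Real tag clouds often use a logarithmic rather than linear scale so that one very common word doesn't dwarf everything else.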

Tag clouds are typically built in a scripting language such as PHP or JavaScript. Most of these programs rely on a list of so-called "stop words" that should not be counted in the cloud. Typically these include common words such as "the," "is," "like," and "because," though the program may also exclude special words such as proper names.
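
A stop-word filter can be sketched in a few lines. The example below is in Python rather than PHP or JavaScript, and the tiny STOP_WORDS set and the count_keywords function are illustrative assumptions; real stop lists typically contain hundreds of words.

```python
from collections import Counter
import re

# A tiny illustrative stop list (an assumption, not a standard list).
STOP_WORDS = {"the", "is", "like", "and", "because", "a", "of", "to"}


def count_keywords(text, stop_words=STOP_WORDS):
    """Count words in the text, skipping any that appear in stop_words."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in stop_words)


counts = count_keywords("The economy is strong because the economy is growing")
print(counts.most_common())
```

To exclude proper names or other special words, you would simply add them to the set you pass as stop_words.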

More sophisticated tag clouds then apply a "stemming" algorithm to lump words with similar roots together. For example, a tag cloud might replace the words "electing," "elects," and "elections" with a single standard word such as "election." This prevents the tally of word counts from being split across all the variations and results in a more accurate representation of key ideas in the source text.
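
The sketch below shows the general idea of stemming with a deliberately crude suffix-stripping rule. It is not a production algorithm such as the Porter stemmer; the SUFFIXES list and the function names are assumptions for illustration. Each group of variants is labeled with whichever form appears most often in the text.

```python
from collections import Counter, defaultdict

# Crude, illustrative suffix list, checked longest first.
SUFFIXES = ("ions", "ion", "ing", "ed", "s")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def merge_variants(words):
    """Pool counts for words that share a stem under their most common form."""
    groups = defaultdict(Counter)
    for word in words:
        groups[crude_stem(word)][word] += 1
    merged = {}
    for variants in groups.values():
        label = variants.most_common(1)[0][0]
        merged[label] = sum(variants.values())
    return merged


merged = merge_variants(
    ["election", "electing", "elects", "elections", "election", "vote"]
)
print(merged)
```

A rule this simple will mangle plenty of English words, which is why real tools use more careful stemmers.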

Data sources

Some applications will work with raw material (such as a text file of a politician's speech) while others require a dataset that is already pre-digested and structured. Common formats for the latter include CSV (comma-separated values), JSON (JavaScript Object Notation), RSS (a feed format often used for news headlines), or data directly from a Google Sheet or public API (typically an online source of JSON).
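
To see how two of these structured formats relate, here is a small Python sketch that reads the same word-count table from CSV and from JSON using only the standard library. The sample data, column names, and function names are made up for illustration.

```python
import csv
import io
import json

# Invented sample data: the same table in two formats.
csv_text = "word,count\nwar,3\npeace,2\n"
json_text = '[{"word": "war", "count": 3}, {"word": "peace", "count": 2}]'


def rows_from_csv(text):
    """Parse CSV text into a list of dicts, converting counts to integers."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"word": row["word"], "count": int(row["count"])} for row in reader]


def rows_from_json(text):
    """Parse JSON text; numbers already arrive as integers."""
    return json.loads(text)


csv_rows = rows_from_csv(csv_text)
json_rows = rows_from_json(json_text)
print(csv_rows == json_rows)
```

Note that CSV stores everything as text, so numeric columns have to be converted by hand, whereas JSON preserves the difference between numbers and strings.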

Many of the applications listed here offer free online datasets you can use if you don't have a dataset of your own to upload.

Visualization apps

The applications below all have a free option and are ranked roughly from easiest to most complex.

WordClouds.com

WordClouds is a free Web service that generates tag clouds from text you paste into its online form. WordClouds offers a variety of colors and typographic styles, though the images are not interactive and thus cannot be manipulated by the end user.

👍 Helps you analyze data as well as visualize it.

👍 You can run it right in the browser.

👍 Variety of color, font, and layout options.

👎 No interactivity.

👎 No live updates.

Flourish Public

Flourish is a browser-based app focused on telling stories by presenting a series of charts and other visualizations.

👍 You can create animated slideshows with multiple visualizations, to view at Flourish.studio or embed in your website.

👍 You can run it right in the browser.

👍 You can upload Excel files, comma-delimited files, or JSON.

👎 Doesn't do the analysis from raw material; can only visualize pre-digested data.

👎 Can't load data from online spreadsheets (such as Google Sheets) in free version.

Other options for browser-based visualizations include Infogram (good for marketers) and Datawrapper (good for journalists).

Tableau Public

Tableau Public is a downloadable app that offers a robust suite of data visualization tools. Tableau has a significant learning curve but offers video tutorials on how to use it.

👍 Helps you analyze data as well as visualize it.

👍 Wide variety of interactive charts.

👍 Works well with large datasets.

👎 No option to save locally.

👎 All creations must be shared publicly.

👎 Requires local installation of a 1.8 GB app.

Microsoft Power BI ("Business Intelligence")

Microsoft Power BI is geared toward small and mid-sized businesses.

👍 Helps you analyze data as well as visualize it.

👍 Can be used in the cloud or via a 300MB local download that connects to online sources.

👎 Some limits on size of datasets.

Spreadsheets (quantitative applications)

Several mainstream office applications make it relatively easy to produce bar and pie charts, line graphs, scatter plots, and many more visualizations of tabular data that you have in a spreadsheet. Microsoft Excel, Apple's Numbers, and to a lesser extent Google Sheets let you analyze data and create a variety of charts, though it can be a little tricky to export the charts as images to use outside the application. For more sophisticated users, MATLAB is a programming environment that can plot complex mathematical functions. All these permit analysis of data as well as visualization.

👍 Helps you analyze data as well as visualize it.

👎 Optimized for desktop rather than online use.

👎 Limited sharing options.

Code

Creators familiar with programming can generate dynamically updated charts. To post such charts on the web, several services offer an API that creates charts and graphs from snippets of code that you can embed in your own Web page.

Google Charts uses a very compact JavaScript data notation, which keeps it lightweight but can make the code a little hard to read. Unlike some image-generating services, Google Charts works only while you are online; in return, any changes you make to the data in your page show up in the chart in real time. Creators looking for more options can try the full Google Data Studio.
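
One compact form Google Charts accepts is an array of rows whose first row holds the column labels (the shape its arrayToDataTable helper expects). The Python sketch below only prepares data in that shape as JSON so you could paste it into a page; it does not call Google Charts itself, and the to_chart_table function and sample counts are assumptions for illustration.

```python
import json


def to_chart_table(counts, headers=("Word", "Count")):
    """Build a header row plus one [label, value] row per word, largest first."""
    rows = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [list(headers)] + [[word, count] for word, count in rows]


table = to_chart_table({"peace": 2, "war": 3, "economy": 1})
table_json = json.dumps(table)
print(table_json)
```

The compactness is easy to see here: there are no field names on each row, only positions, which is also what makes longer tables harder to read.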

Plotly offers more types of charts and can work with various languages, including JavaScript, Python, and R. R is also a popular programming language for general data visualization.

Tagline

Tagline is a free PHP library that combines a tag cloud with a slider to show trends in texts that evolve over time. A great example is this US Presidential Speeches Tag Cloud, which shows both changing and recurrent themes for American presidents from 1776 to 2007. (War and the economy are always prominent, though the country the US is at war with changes every few decades.)

👍 Helps you analyze data as well as visualize it.

👍 Interactive timeline feature.

👎 Using Tagline requires some knowledge of PHP and your own server.
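
The idea of pairing a tag cloud with a time slider can be sketched without Tagline itself. The Python example below (not Tagline's PHP code, and not its actual method) keeps a separate word count for each year, so that moving a slider amounts to looking up a different year. The sample texts are invented placeholders, not real speeches, and all names are assumptions.

```python
from collections import Counter
import re

# Invented placeholder texts keyed by year; not real speeches.
speeches = [
    (1790, "union and commerce and union"),
    (1863, "war and union and war"),
    (1942, "war and economy"),
]


def counts_by_year(dated_texts, stop_words=frozenset({"and", "the", "of"})):
    """Build one word-frequency table per year."""
    timeline = {}
    for year, text in dated_texts:
        words = re.findall(r"[a-z]+", text.lower())
        timeline[year] = Counter(w for w in words if w not in stop_words)
    return timeline


def cloud_at(timeline, year, top=2):
    """Return the most frequent words for the slider position `year`."""
    return timeline[year].most_common(top)


timeline = counts_by_year(speeches)
print(cloud_at(timeline, 1863))
```

Comparing the tables for neighboring years is what reveals which themes recur and which ones fade.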

Infranodus

A successor to Textexture, Infranodus creates a network diagram indicating which words are most closely tied to others in the text. The result is analogous to a "small-world" graph that shows the degrees of separation between any two friends in a social network like Facebook.

👍 Helps you analyze data as well as visualize it.

👍 Creates attractive, interactive network visualizations.

👎 Using it online requires paid version.

👎 Open source version requires technical skill to install locally (Node.js, Neo4j, Evernote and Twitter APIs).
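
To get a feel for what a word network is, here is a minimal Python sketch that links words appearing next to each other and then counts the "degrees of separation" between two words with a breadth-first search. This is an assumption-laden illustration of the general concept, not Infranodus's actual algorithm; the function names, the reach parameter, and the sample text are all hypothetical.

```python
from collections import defaultdict, deque


def cooccurrence_graph(words, reach=1):
    """Link each word to the next `reach` words that follow it."""
    graph = defaultdict(set)
    for i, word in enumerate(words):
        for neighbor in words[i + 1 : i + 1 + reach]:
            if neighbor != word:
                graph[word].add(neighbor)
                graph[neighbor].add(word)
    return graph


def degrees_of_separation(graph, start, goal):
    """Breadth-first search for the fewest links between two words."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        word, distance = queue.popleft()
        if word == goal:
            return distance
        for neighbor in graph[word]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, distance + 1))
    return None  # the two words are not connected


words = "war economy jobs economy taxes".split()
graph = cooccurrence_graph(words)
print(degrees_of_separation(graph, "war", "taxes"))
```

In this toy text, "economy" sits at the center of the network because every other word is linked through it.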

Conclusion

I hope this tutorial has shown you how visualization can make the whole greater than the sum of its parts. You can find more free new media tutorials here.