How to visualize data

Script version 1.0

Aggregating data into a single image shows you the big picture, enabling insights that would be hard to reach if you just pored over the numbers or words one at a time.

This tutorial introduces you to some basic techniques for visualizing data, with an emphasis on tools that make it easy to turn numbers into charts and words into tag clouds.

Visualizing numbers

Charts and graphs

Spreadsheets

Several mainstream office applications make it easy to produce bar and pie charts, line graphs, scatter plots, and many other visualizations of tabular data stored in a spreadsheet. Microsoft Excel and Apple's Numbers include options to create a variety of charts, though it can be a little tricky to export the charts as images for use outside the application. For more sophisticated users, MATLAB is a programming environment that can plot complex mathematical functions.

Google Charts

Google Charts has an API that creates charts and graphs from snippets of JavaScript code that you can embed in your own Web page. Its syntax is very compact, which keeps it lightweight but can also make the code a little hard to read. Unlike some other image-generating services, Google Charts works only when you are online; making up for this disadvantage, any changes you make to the data in your page are reflected in the chart in real time.

Many Eyes

Many Eyes is a free service from IBM that offers specialized visualization types such as maps of countries and treemaps, which nest topics in rectangles sized according to their proportions in the population you are studying. While Google Charts lets you keep the data on your own page, Many Eyes expects you to upload your data to the site. As a side benefit, however, you can use data sets uploaded by other users, from US census data to John Lennon lyrics.
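Many Eyes' own layout code isn't described here. As a rough illustration of how a treemap turns proportions into nested rectangles, the sketch below uses the classic "slice-and-dice" layout, one of several treemap algorithms: siblings split their parent rectangle into strips proportional to their size, and each level of nesting alternates the direction of the cut. The topic names and counts are hypothetical, and all sizes are assumed to be positive.

```python
def node_size(node):
    """A leaf is (label, number); a branch is (label, [child nodes])."""
    _, content = node
    if isinstance(content, list):
        return sum(node_size(child) for child in content)
    return content


def treemap(nodes, x, y, w, h, vertical=True, out=None):
    """Slice-and-dice layout returning (label, x, y, width, height) tuples.

    Each sibling gets a strip of the parent rectangle whose area is
    proportional to its size; nested levels alternate between vertical
    and horizontal cuts. Assumes every size is positive.
    """
    if out is None:
        out = []
    total = sum(node_size(n) for n in nodes)
    offset = 0.0
    for label, content in nodes:
        share = node_size((label, content)) / total
        if vertical:
            rect = (x + offset, y, w * share, h)
            offset += w * share
        else:
            rect = (x, y + offset, w, h * share)
            offset += h * share
        out.append((label, *rect))
        if isinstance(content, list):
            # Lay out the children inside this node's rectangle,
            # cutting in the other direction.
            treemap(content, *rect, not vertical, out)
    return out


topics = [("Fruit", [("apples", 30), ("pears", 10)]), ("Vegetables", 60)]
for label, x, y, w, h in treemap(topics, 0, 0, 100, 100):
    print(f"{label:<10} x={x:5.1f} y={y:5.1f} w={w:5.1f} h={h:5.1f}")
```

Slice-and-dice is the simplest layout to follow, but it tends to produce long, thin rectangles; "squarified" layouts trade some of that simplicity for rectangles that are easier to compare by eye.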

Visualizing words

If visualization has historically been more common in the sciences, the digital humanities have been invigorated by new techniques for understanding the relationships among words in texts, from the State of the Union to Shakespeare.

Diagrams

Textexture

Textexture creates a network diagram indicating which words are most closely tied to others in the text. The result is analogous to a "small-world" graph that shows the degrees of separation between any two people in a social network like Facebook.
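Textexture's exact method isn't described here. One common way to build this kind of word network, sketched below under that assumption, is to link two words whenever they appear close together and weight each link by how often that happens. The window size and the tiny stop-word list are arbitrary choices for illustration.

```python
import re
from collections import Counter

# Illustrative stop words so that "the", "of", etc. don't dominate the graph.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "to", "in"}


def cooccurrence_edges(text, window=3):
    """Count how often each pair of words appears together inside a span
    of `window` consecutive words. Keys are alphabetically sorted
    (word, word) tuples, so each undirected edge has a single key."""
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]
    edges = Counter()
    for i, word in enumerate(words):
        for other in words[i + 1 : i + window]:
            if other != word:
                edges[tuple(sorted((word, other)))] += 1
    return edges


def strongest_links(edges, word, n=3):
    """Return the n neighbours most tightly tied to `word`, with weights."""
    links = Counter()
    for (a, b), weight in edges.items():
        if a == word:
            links[b] += weight
        elif b == word:
            links[a] += weight
    return links.most_common(n)


edges = cooccurrence_edges("Cats chase mice. Cats chase birds. Dogs chase cats.")
print(strongest_links(edges, "chase", n=1))
```

The resulting weighted edge list is exactly what graph-drawing tools expect as input; heavier edges can be drawn thicker or pull their words closer together.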

Scalar

Scalar is a publishing platform that does not assume any preordained sequence among pages, but instead lets authors assign them to various "paths." These paths can be visualized using the D3 JavaScript library, which can display a fan diagram of connections between all the pages.

Tag clouds

A tag cloud is one of the most common ways to visualize word frequency, and it has become popular as a non-hierarchical means of navigating blogs and other textual compendiums. A typical tag cloud displays frequently used words in a larger font than words used only once or twice. Often, clicking on a word in a tag cloud brings up a list of pages or records that include that tag. Some tag clouds let you select more than one tag and see only the records that contain all of the selected tags.
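The sizing rule described above can be sketched in a few lines. This minimal version assumes linear interpolation between a hypothetical minimum and maximum font size in points; many real tag clouds use a logarithmic scale instead, so that a handful of very frequent words don't dwarf everything else.

```python
from collections import Counter


def font_sizes(counts, min_pt=10, max_pt=40):
    """Map each word's count to a font size between min_pt and max_pt.

    The least frequent word gets min_pt, the most frequent gets max_pt,
    and everything in between is scaled linearly.
    """
    lo, hi = min(counts.values()), max(counts.values())
    if lo == hi:  # all words equally common: avoid dividing by zero
        return {word: min_pt for word in counts}
    return {
        word: round(min_pt + (count - lo) * (max_pt - min_pt) / (hi - lo))
        for word, count in counts.items()
    }


counts = Counter({"war": 9, "economy": 5, "liberty": 1})
print(font_sizes(counts))
```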

Tag clouds are typically built in a scripting language such as PHP or JavaScript. Most of these programs rely on a list of so-called "stop words" that are left out of the count. Typically these include common words such as the, is, like, and because, though the program may also exclude special words such as proper names.
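To make the stop-word step concrete, here is a minimal sketch in Python rather than PHP or JavaScript. The stop-word list is deliberately tiny and illustrative, and the `extra_exclusions` parameter is a hypothetical way to drop special words such as proper names.

```python
import re
from collections import Counter

# A deliberately tiny, illustrative stop-word list; real tools ship lists
# with hundreds of entries.
STOP_WORDS = {"the", "is", "like", "and", "because", "a", "of", "to", "we"}


def word_counts(text, extra_exclusions=()):
    """Tally words in `text`, skipping stop words and any extra exclusions
    (for example, proper names you don't want in the cloud)."""
    excluded = STOP_WORDS | {w.lower() for w in extra_exclusions}
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in excluded)


speech = "We fight because we must, and the fight is like no other fight."
print(word_counts(speech).most_common(2))
```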

More sophisticated tag clouds then apply a "stemming" algorithm to lump together words that share a root. For example, a tag cloud might replace the words electing, elects, and elections with a standard word such as election. This prevents the tally of word counts from being split across all the variations and results in a more accurate representation of key ideas in the source text.
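Real stemmers, such as the Porter algorithm, apply dozens of ordered rules. The sketch below is a deliberately naive suffix stripper, not Porter, meant only to show the core idea: strip a known ending, then merge the counts of words that share what's left. It reduces the examples above to the root elect, so it also keeps the most common original spelling of each root to display in the cloud. The suffix list is an assumption and will mishandle plenty of English words.

```python
from collections import Counter

# Longer suffixes first, so "ions" is tried before "s". Illustrative only.
SUFFIXES = ("ions", "ion", "ing", "ed", "s")


def naive_stem(word, min_root=3):
    """Strip the first matching suffix, keeping at least `min_root` letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_root:
            return word[: -len(suffix)]
    return word


def stemmed_counts(words):
    """Merge counts for words that reduce to the same stem, labelling each
    stem with its most common original spelling."""
    by_stem = Counter()
    spellings = {}
    for word in words:
        stem = naive_stem(word)
        by_stem[stem] += 1
        spellings.setdefault(stem, Counter())[word] += 1
    return {spellings[stem].most_common(1)[0][0]: n for stem, n in by_stem.items()}


words = ["electing", "elects", "elections", "election", "election", "vote", "votes"]
print(stemmed_counts(words))
```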

Wordle

Wordle is a free Web service that generates tag clouds from text you paste into its online form. Wordle offers a variety of colors and typographic styles, though the images are not interactive and thus cannot be manipulated by the end user.

Tagline

Tagline is a free PHP library that combines a tag cloud with a slider to show trends in texts that evolve over time. A great example is the US Presidential Speeches Tag Cloud, which shows both changing and recurrent themes for American presidents from 1776 to 2007. (War and the economy are always prominent, though the country the US is at war with changes every few decades.)

Using Tagline requires some knowledge of PHP and access to your own server.
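Tagline's own PHP code isn't reproduced here. As an illustration in Python of the data a time slider needs, the sketch below tallies words separately for each period, so moving the slider just means switching which tally feeds the cloud; words present in every period are candidates for the recurrent themes. The decade bucketing and the sample texts are hypothetical, and stop-word filtering is omitted for brevity.

```python
import re
from collections import Counter, defaultdict


def counts_by_period(documents, period_years=10):
    """Group (year, text) pairs into periods and tally words per period.

    Returns {period_start_year: Counter}; with period_years=10,
    the key 1860 covers 1860-1869.
    """
    tallies = defaultdict(Counter)
    for year, text in documents:
        period = year - year % period_years
        tallies[period].update(re.findall(r"[a-z']+", text.lower()))
    return dict(tallies)


def recurring_words(tallies):
    """Words that appear in every period: candidates for recurrent themes."""
    word_sets = [set(counter) for counter in tallies.values()]
    return set.intersection(*word_sets) if word_sets else set()


speeches = [
    (1861, "union war union"),
    (1865, "peace union"),
    (1933, "economy war fear"),
]
by_decade = counts_by_period(speeches)
print(by_decade[1860].most_common(1))
print(recurring_words(by_decade))
```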

ThoughtMesh

ThoughtMesh is an unusual model for publishing and discovering scholarly papers online. It gives readers a tag-based navigation system that uses keywords to connect excerpts of essays published on different Web sites.

Add your essay to the mesh, and ThoughtMesh gives you a traditional navigation menu plus a tag cloud that enables nonlinear access to text excerpts. You can navigate across excerpts both within the original essay and from related essays distributed across the mesh.

So let's say you are reading an essay on modern art. You can pick a single word out of that essay's tag cloud (say, Picasso) and view a list of all the sections of that essay that relate to Picasso. Or you can view a list of sections from other essays tagged with Picasso, and jump right to one of those sections. You can also combine tags to narrow your search, such as Picasso + Cubism + 1900.
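ThoughtMesh's internals aren't described here, but combining tags to narrow a search amounts to a set intersection: keep only the sections tagged with every selected tag. A minimal sketch, assuming a hypothetical in-memory index whose section IDs and tags are invented for illustration:

```python
from collections import defaultdict

# Hypothetical sections and their tags, invented for illustration.
SECTIONS = {
    "modern-art#intro": {"Picasso", "Cubism", "1900"},
    "modern-art#guernica": {"Picasso", "Spain"},
    "braque-essay#studio": {"Braque", "Cubism", "1900"},
    "blue-period#paris": {"Picasso", "1900"},
}


def build_tag_index(sections):
    """Invert {section: tags} into {tag: set of sections carrying that tag}."""
    index = defaultdict(set)
    for section_id, tags in sections.items():
        for tag in tags:
            index[tag].add(section_id)
    return index


def sections_with_all(index, tags):
    """Sections tagged with every tag in `tags` (a set intersection)."""
    matches = [index.get(tag, set()) for tag in tags]
    return sorted(set.intersection(*matches)) if matches else []


index = build_tag_index(SECTIONS)
print(sections_with_all(index, ["Picasso"]))
print(sections_with_all(index, ["Picasso", "Cubism", "1900"]))
```

Inverting the data into a tag-to-sections index means each added tag only requires intersecting one more set, rather than rescanning every section.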

Conclusion

I hope this tutorial has shown you how visualization can make the whole greater than the sum of its parts.