Tracking the trends via Google’s New Book Database
December 17, 2010 by allkindsofhistory
Anyone who suspects that Google, like Starbucks, is secretly planning to take over the world might well point to the search giant’s latest innovation and smile knowingly. That’s because Google has, with surprisingly little fanfare, released a new tool that exploits its unparalleled – and ever faster-growing – holdings of data, and promises to revolutionise the lives of linguists, lexicographers and English scholars, while simultaneously churning out the odd bit of useful data for the rest of us. As today’s New York Times explains, the company’s latest launch is its New Book Database, containing 500 billion words culled from 5.2m digitised books. Quite a few of those words can already be accessed in their intended order via Google Books, but the NBD has another function – it allows users to search across time (the database covers the period 1800-2008) to track the changing popularity of individual words, and to compare the usage of several different words over the same period.
The NYT rather worthily put the new database to use comparing the frequency with which the likes of “men” and “women” feature (turns out the latter overtakes the former around 1986), but for our purposes it’s rather more revealing to track the progress of various Fortean topics. The results turn out to be informative. Take the frequency with which the phrase “Loch Ness Monster” appears, for example [top – you can click on all the graphs to see them in a much larger and more easily readable format].
Mentions of the LNM peak in the late 1930s – in fact surprisingly late in the 1930s, perhaps reflecting a delay in translating newspaper coverage into references in published books. The phrase then undergoes a sharp fall in popularity, only to revive in the 1950s and peak around 1977-78, at pretty much the time that optimism about the Rines underwater photos was at its height. What’s really striking is that the phrase continues to grow in popularity pretty much until 2000, despite a clear decline in public interest in the subject. What does this indicate? That the words have passed into common currency, most probably, so that “Loch Ness Monster” is used as a metaphor nearly as often as it is to refer to a – supposedly – living beast.
Here, anyway, are the results of some further searches. We can see how “UFO” swiftly overtook the earlier “Flying Saucer” [above], or how the number of references to angels soared in the run-up to the Millennium. More interesting, perhaps, are searches that track the relative performance of terms against each other – witness the triumph of “Bigfoot” over “Abominable Snowman” and “Sasquatch” [right]. These can show up some quite significant long-term trends. Used intelligently, indeed, the tool probably has a paper or two in it somewhere.
What is there to say, for example, about the ups and downs of this fairly random series of other Fortean phenomena [right]? What has caused the huge surge in the use of the word “teleportation”? Does this reflect nothing more than an abundance of borrowings in the science fiction literature (and a surfeit of Star Trek movies)? Is it linked to popular belief in UFO abductions? Or is something else altogether going on? And, while the numbers are probably too low to be statistically significant, can it be that more people are actually writing about ley lines now, in the 2000s, even though the Old Straight Track strikes most young Forteans as about as unashamedly 1970s as Slade and spandex loon pants?
What, finally, of the word “Fortean” itself [left]? Well, here the news is not so great. The NBD reveals a peak just after the year 2000, followed by what looks suspiciously like the beginnings of a long, sharp and irreversible decline. Anyway, the tool is easy to use and pretty addictive to play with. Feel free to give it a whirl at the site homepage.
UPDATE 24 October 2012. Google has released a significantly improved ngram viewer which makes more elaborate searches possible and smoothes out most of the data incongruities that marred the first release.