What has big data taught you about human culture?
I’ve learned that big data reveals the extent to which there are trends in cultural change that are completely invisible to us in our day-to-day lives. We don’t notice them and we aren’t aware of them, but they are completely measurable when we have access to the appropriate data sets. It’s interesting to think that people have lived through trends that can be described very precisely in quantitative terms without ever being conscious of them. And we can become conscious of those trends by using these kinds of powerful data approaches to dissect them.
Could you give an example of these invisible trends?
There are all kinds of very subtle changes, for instance in the grammar of a particular language. An example that I gave during my talk is how you form the past tense of a verb. Do you say “I thrived yesterday” or “I throve”? Do you say “I smote him with the sword” or “I smited him with the sword”? These types of changes actually take place in a fairly deterministic fashion. Such transitions can take centuries, so even though we live through them, we aren’t necessarily aware of them.
Was your research on human culture just the beginning?
When we started to investigate the frequency of words and phrases over time, for instance, we interrogated tens of thousands of such trajectories. By now hundreds of millions of people have used these tools, and it is remarkable how many research journeys have been kicked off by that.
I actually had a great time in the taxi talking to Kevin Ashton about the n-gram for “creativity”, about which he has written in really fascinating ways. He took that data and went in a direction I certainly would never have envisioned. And that is really what’s so exciting about these massive data sets. When you create them, you have your own ideas about where you are going with the data. But ultimately people will be able to go in directions that you never anticipated. That’s kind of baked into the process, baked into the nature of these kinds of data sets.
In what other areas will big data be an eye-opener?
Well, I think human culture is one example, but it’s going to be ubiquitous. In biomedicine, for instance, there’s this notion that patients, just people in the course of their ordinary lives, are going to be surrounded by sensors that do things such as measure their health. And it’s the patient who will say, “Some of the sensor values are wrong. I need to see a doctor and involve him in this healthcare process”. This ubiquitous biomedical sensing is going to become a day-to-day part of the genome revolution. Biology is another example: the instructions for creating any organism, such as a human, are encoded in its DNA, which is a massive data set, i.e. the genome. This has had a transformative impact on biology.
If you look at the economy, all of a sudden you can study markets comprehensively rather than with an anecdote here or an anecdote there. But to do so, you have to study every transaction that has taken place in a particular market. This is changing the way in which we study markets, and even the way in which markets behave, in the context of high-frequency trading, for example. In discipline after discipline, you can see that the notion that data can be rapidly aggregated, moved, analysed and acted upon is a transformative principle of contemporary experience.
Why did you switch from human culture to the human genome?
At any given moment in time, you need to ask what you want to do by studying human culture. We’ve continued to be active in this field, but we’ve been joined by hundreds of millions of people who are using some of the tools that we’ve created, and some that others have created, to interrogate human culture and changes in human culture. That’s incredibly exciting. For me, it also says: hey, there are other problems waiting to be cracked open that no one is looking at yet. But you can imagine that massive numbers of people will be thinking about those data sets once the critical experimental hurdles are overcome, once the critical initial activation energy has been put in. In the age of big data, you will have people who are a bit nomadic, who can move from one discipline to another. The unifying theme will be thinking about massive data sets and how they can organise your understanding of a whole discipline, rather than starting from a focus centred on a single discipline.
What does a 3D map of the human genome teach us?
The human genome is physically a large molecule. People don’t realise that if the genome inside every one of your cells were stretched out from end to end, it would be taller than you are. Yet it sits in a single cell and fits into the nucleus, which is only a few microns wide. Clearly, it must fold up somehow. What’s remarkable is that we are learning that the way in which the genome folds varies from one cell type to another. Your genome is the same in your heart cells, which beat, and your brain cells, which think, but it is folded in very different ways. This has raised questions about how the human genome is folded, how the folding facilitates the function of these various types of cells, and what role the folding plays in health and disease. This is what we are starting to learn from these maps.
To what extent has this helped to accelerate the research in genomics?
One area that wasn’t deeply appreciated prior to some of my work was how to interrogate genome folding in practice, and the insights this might yield into the mechanisms that shape DNA and the patterns of gene activity, and thus patterns of cell function. I think that my work in genomics really helped to accelerate that. The nature of the scientific enterprise is such that every generation builds on the work of previous generations. We were able to combine some technologies that had been developed decades ago with our own work. I want to emphasise that, especially when you look at something like genomics, it is really the integration of ideas developed over a century or more that has enabled us to do things like map genome folding in various cell types and in health and disease, and to understand its relationship to the kinds of clinical outcomes that we care about.