Since the landmark paper Computational Social Science was published in 2009, a new generation of data analysis tools has given researchers insight into fundamental questions about how we communicate, who we are and what we value.
For example, by analyzing the relative frequency of certain words in historical texts, researchers can identify important changes in our use of language over time.
In some cases, these shifts are obvious, such as archaic words being replaced by more contemporary ones. In other cases, they can reflect subtler but widespread social and cultural changes. Below are some of the most influential data-centric discoveries of the past 10 years.
How we communicate
Over the past decade, a growing number of global open data sources have helped researchers discover patterns in what we read, write, and pay attention to. Google Books, WorldCat and Project Gutenberg are just a few examples.
The release of the Google Books Ngram Viewer in the early 2010s was a game changer on this front. The tool draws on the entire Google Books database and shows the relative frequency of a specific word or phrase across hundreds of years of publications. Researchers have used this data to investigate the systematic suppression of references to Jewish painters, such as Marc Chagall, in German books during World War II.
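To make "relative frequency" concrete: in its simplest form, it is the number of times a term appears in a given year divided by the total number of words published that year, so that years with very different publishing volumes can be compared fairly. The Python sketch below is a simplified illustration, not the Ngram Viewer's actual pipeline; the function name `relative_frequency` and the tiny `toy_corpus` are hypothetical, and it only matches single words.

```python
import re
from collections import Counter


def relative_frequency(corpus_by_year, term):
    """Return {year: share of that year's words equal to `term`}.

    Hypothetical helper: `corpus_by_year` maps a year to a list of texts.
    Only single words are matched, case-insensitively.
    """
    term = term.lower()
    result = {}
    for year, texts in sorted(corpus_by_year.items()):
        # Tokenize every text from this year into lowercase words.
        words = [w for text in texts for w in re.findall(r"[a-z']+", text.lower())]
        counts = Counter(words)
        total = sum(counts.values())
        # Normalize by the year's total word count so years are comparable.
        result[year] = counts[term] / total if total else 0.0
    return result


# Toy corpus (invented for illustration): an archaic word fading out.
toy_corpus = {
    1900: ["Thou art kind", "Thou shalt not"],
    2000: ["You are kind", "Thou art old"],
}
freqs = relative_frequency(toy_corpus, "thou")
print(freqs)
```

Here "thou" makes up 2 of 6 words in 1900 and 1 of 6 in 2000, so its relative frequency halves even though both years contain the same number of texts.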
Data analysis can also reveal patterns in the expression of human emotions over time. CSIRO's We Feel tracks emotions in communities around the world by analyzing and mapping, in real time, the language people use on social media.
The tool can be used to determine the general mood over time (hour by hour, day by day) within certain cities and countries. Patterns in this data can then be examined in conjunction with other information, such as weather, vacations, and economic fluctuations.
Some research findings even claim to reveal fundamental changes in people’s social values, sense of community, and how we think (for example, the rise and fall of words associated with rationality, such as “method,” “analysis,” and “determine”).
Here are some key findings in this space:
- Cultural turnover is accelerating: A Harvard University-led analysis of more than a century of data from millions of books provides evidence that society’s attention span for historical events is shrinking as the hunger for new material grows. In other words, we forget the past faster. You can see this in the chart below, which tracks how often three specific years are mentioned across a wide range of literature over time. As time goes on, each year’s “half-life” (the point at which it receives only half the attention it had at its peak) arrives sooner.
- Human language diversity and biodiversity are correlated: By mapping linguistic and animal species diversity, researchers have shown that the two are geographically correlated; both increase with temperature and proximity to the equator. So the closer you get to the equator, the more variety there is in both spoken languages and species. Greater diversity, in turn, provides more complex and interactive environments for both animals and humans, nourishing a cycle in which “diversity breeds more diversity”.
- There have been societal shifts in language use over the past century: In an article published in December, researchers used machine learning to demonstrate long-lasting, consistent changes in how we use language. Specifically, they identified an inflection point in the 1980s, after which language shifted towards being more egocentric, more emotional and ostensibly less rational. The authors suggest (although not without contention) that this could mark the beginning of a “post-truth era”.
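The “half-life” described in the cultural-turnover finding above can be computed from any series of yearly mention frequencies: find the peak, then count the years until the frequency first falls to half of it or below. The Python sketch below illustrates that definition only; the function `attention_half_life` and both series are hypothetical, with invented numbers rather than data from the Harvard-led study.

```python
from typing import Dict, Optional


def attention_half_life(series: Dict[int, float]) -> Optional[int]:
    """Years from peak attention until frequency first drops to half the peak or below.

    `series` maps a publication year to the relative frequency of mentions.
    Returns None if the series never decays to half its peak.
    """
    years = sorted(series)
    peak_year = max(years, key=lambda y: series[y])
    half_peak = series[peak_year] / 2
    for year in years:
        # Only look at years after the peak for the decay.
        if year > peak_year and series[year] <= half_peak:
            return year - peak_year
    return None


# Invented example series: mentions of "1910" and "1950" in books.
mentions_1910 = {1905: 0.2, 1910: 1.0, 1915: 0.8, 1920: 0.6, 1925: 0.45, 1930: 0.3}
mentions_1950 = {1945: 0.3, 1950: 1.2, 1955: 0.9, 1960: 0.55, 1965: 0.4}

print(attention_half_life(mentions_1910))  # 15
print(attention_half_life(mentions_1950))  # 10
```

In this made-up data the later year loses half its attention in 10 years rather than 15, which is the shape of the pattern the study reports; real series are noisier and may need smoothing before the peak is picked.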
Who we are
In the field of psychology, the same data analysis tools have shown that people’s personalities can be measured using the “Big Five” traits, which are largely stable in adulthood.
This was made possible by extensive data sets such as HILDA in Australia, the German Socio-Economic Panel in Germany and the British Household Panel Survey in the UK.
Robust studies have also shown that personality traits can be reliably and accurately predicted from a variety of data sources, including voice recordings, mobile phone usage patterns and even portrait photos.
In turn, large-scale studies have found some notable associations between personality and:
- Elevation: A 2020 study based on data from more than three million people shows that people who live in the mountains tend to have different personality traits from people who live at sea level. They are generally more open to new experiences and more emotionally stable.
- Place: An earlier study shows that people living in the United States fall into three clear and measurable clusters of personality types, each with its own geographic footprint. New Yorkers and Texans (who belong to the same cluster) tend to be temperamental and uninhibited.
- Occupation: In our own research with colleagues, published in 2019, we analyzed the personality traits of people in more than 1,000 different occupations. We found that people in the same role tend to have similar traits. Scientists, for example, are more open to new ideas and more willing to argue, while tennis professionals are generally friendly and outgoing. The study used machine learning to infer the personality traits of more than 100,000 people from the language they used on social media.
What we value
In economics, data analysis is opening up major research frontiers, including:
- Network science: When it comes to success, we’ve learned that performance matters most when it can be measured (as in sports). But in areas where it can’t easily be measured (such as the art world), networks matter most.
- Behavioral economics: We can now observe how we behave as individuals en masse, revealing valuable clues for designing effective employment, tax and education policy interventions. For example, a large-scale study revealed that people who returned to the labor market fastest exhibited certain key behaviors. These included getting up early and being geographically mobile (perhaps meaning they were more willing to travel further or relocate for work).
Some have argued that data science poses a fundamental challenge to the traditional sciences, heralding the emergence of “post-theory science”. This is the idea that machines can capture the relationship between data and reality better than the traditional scientific cycle of hypothesis, prediction and testing.
However, reports of the death of theory may be greatly exaggerated. Data is not perfect, and data science based on incomplete or biased data can miss or mask important patterns in human activity. Only critical thinking and theory can address this.