Quantifying Crowd Size with Mobile Phone and Twitter Data

When a protest takes place, estimating the number of people present poses problems for authorities and organisers alike. We suggest that data generated simply through our use of mobile phones and Twitter might offer surprisingly accurate estimates of crowd size. We analysed data from Milan, Italy, and found that we could estimate attendance numbers for football matches at the San Siro stadium, as well as the number of people at Linate airport at any given time. Our results could be of value in a range of emergency situations, such as evacuations and crowd disasters.

The results of this analysis have been featured in:

Science: Measuring the mobs

BBC: Crowds ‘could be counted’ with phone and Twitter data

Si può calcolare la quantità di una folla grazie a Twitter (The size of a crowd can be calculated thanks to Twitter)

Smartphones, Twitter help gauge crowd size

Business Insider: Here’s how Twitter can help first responders in an emergency

Have a look at the original journal article if you are interested!

I presented the results of this work at the CS-DC’15 world e-conference. The presentation was recorded and is now available for everyone to access.

Or watch a live interview on BBC World News about this work:

You can also listen to a live radio interview on BBC Radio Scotland!

Or to another live radio interview on BBC Radio Wales!


News bulletins reported our results in several countries, including the UK, France, the U.S.A., China, Italy, Russia, Germany, India, Kenya, Pakistan, Malaysia, Jamaica, Australia, Indonesia, Bangladesh, and Brazil.

I would like to acknowledge that this project would not have been possible without the Big Data Challenge 2014 set up by Telecom Italia, which provided us with most of the data (with the exception of the attendance figures and the flight schedule). All of the data released for the challenge (with the exception of the Twitter dataset) are now also open access for everyone to download: Open Big Data


Finding network communities using modularity density

The analysis of increasingly complex data sets requires complex methodologies. A powerful tool that has gained increasing importance in the last two decades is that of networks. Networks are ubiquitous in society. Computers are connected together through the Internet; pages on the web have hyperlinks that allow users to navigate from page to page; cities are connected by airports and train stations; people are linked to each other on various levels, such as kinship, friendship, and work relationships; scientific discoveries build on previous work, thus creating links between scientists. Network thinking allows us to develop a framework to model and understand the properties of these systems.

Many real-world complex networks exhibit a community structure, in which the modules correspond to actual functional units. Identifying these communities is a key challenge for scientists. A common approach is to search for the network partition that maximizes a quality function. Here, we present a detailed analysis of a recently proposed function, namely modularity density. We show that it does not suffer from the drawbacks of traditional modularity and that it can identify networks without ground-truth community structure; we also derive its analytical dependence on link density in generic random graphs. In addition, we show that modularity density allows an easy comparison between networks of different sizes. Finally, we introduce an efficient community detection algorithm based on modularity density maximization, validating its accuracy against theoretical predictions and on a set of benchmark networks.
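To make the idea of a quality function concrete, here is a minimal Python sketch that scores a given partition of a small undirected graph with two functions. The first is the standard Newman modularity. The second assumes one commonly used definition of modularity density, namely the sum over communities of (2 × internal edges − external edges) divided by the number of nodes in the community; the exact form analysed in the paper may differ. The function names and the toy graph (two triangles joined by a single edge) are illustrative only, and this is not the detection algorithm mentioned above.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman modularity of a partition of an undirected, unweighted graph."""
    m = len(edges)
    internal = defaultdict(int)    # edges with both ends in community c
    degree_sum = defaultdict(int)  # total degree of the nodes in community c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)

def modularity_density(edges, community):
    """ASSUMED definition: sum over c of (2 * internal_c - external_c) / size_c."""
    size = defaultdict(int)
    for c in community.values():
        size[c] += 1
    internal = defaultdict(int)
    external = defaultdict(int)    # edges leaving community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        if cu == cv:
            internal[cu] += 1
        else:
            external[cu] += 1
            external[cv] += 1
    return sum((2 * internal[c] - external[c]) / size[c] for c in size)

# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}  # one community per triangle
merged = {node: "all" for node in range(6)}               # everything in one community

for name, partition in [("split", split), ("merged", merged)]:
    print(name, modularity(edges, partition), modularity_density(edges, partition))
```

On this toy graph both functions score the two-triangle partition higher than the single merged community. A search procedure would compare many candidate partitions in the same way and keep the best-scoring one.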

Quantifying Stock Return Distributions in Financial Markets

Being able to quantify the probability of large price changes in stock markets is of crucial importance in understanding financial crises that affect the lives of people worldwide. Large changes in stock market prices can arise abruptly, within a matter of minutes, or develop across much longer time scales. Here, we analyze a dataset comprising the stocks forming the Dow Jones Industrial Average at a second-by-second resolution in the period from January 2008 to July 2010 to quantify the distribution of changes in market prices at a range of time scales. We find that the tails of the distributions of logarithmic price changes, or returns, exhibit power law decays for time scales ranging from 300 seconds to 3600 seconds. For larger time scales, we find that the distribution tails exhibit exponential decay. Our findings may inform the development of models of market behavior across varying time scales.
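As an illustration of the two quantities discussed above, the following Python sketch computes logarithmic returns at a chosen time scale and estimates a power-law tail exponent. The Hill estimator used here is a standard textbook technique that I have swapped in for illustration; it is not necessarily the fitting procedure used in the study. The function names, the choice of `k`, and the synthetic Pareto sample are all assumptions, and no real market data is involved.

```python
import math
import random

def log_returns(prices, step):
    """Logarithmic returns over non-overlapping windows of `step` samples.

    With one price per second, step=300 corresponds to a 300-second time scale.
    """
    sampled = prices[::step]
    return [math.log(later / earlier)
            for earlier, later in zip(sampled, sampled[1:])]

def hill_tail_exponent(values, k):
    """Hill estimate of the tail exponent from the k largest absolute values."""
    magnitudes = sorted((abs(v) for v in values if v != 0), reverse=True)
    if not 0 < k < len(magnitudes):
        raise ValueError("k must satisfy 0 < k < number of non-zero values")
    threshold = magnitudes[k]  # the (k + 1)-th largest magnitude
    mean_log_excess = sum(math.log(x / threshold) for x in magnitudes[:k]) / k
    return 1.0 / mean_log_excess

# Sanity check on SYNTHETIC data: a Pareto sample with known tail exponent 3.
random.seed(0)
synthetic = [random.paretovariate(3.0) for _ in range(20000)]
print("estimated tail exponent:", hill_tail_exponent(synthetic, k=1000))
```

In practice the estimate depends on `k`, the number of tail observations used, so it is usually inspected over a range of `k` values rather than read off at a single one. For a real price series one would pass the output of `log_returns(prices, step)` to `hill_tail_exponent` for each time scale of interest.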