Complex Networks VIII - Proceedings of the 8th Conference on Complex Networks CompleNet 2017
B. Gonçalves, R. Menezes, R. Sinatra, V. Zlatik (Editors)
Springer (2017) ISBN: 9783319542416
Complex Networks VII - Proceedings of the 7th Workshop on Complex Networks CompleNet 2016
H. Cherifi, B. Gonçalves, R. Menezes, R. Sinatra (Editors)
Springer (2016) ISBN: 9783319305684
Social Phenomena: From Data To Models
B. Gonçalves, N. Perra (Editors)
Springer (2015) ISBN: 9783319140100


Modeling and predicting human infectious diseases
N. Perra, B. Gonçalves
In Social Phenomena: From Data To Models 59, (2015)
B. Gonçalves, N. Perra (Editors)
Springer (2015) ISBN: 9783319140100
Social networks, contagion processes and the spreading of infectious diseases
B. Gonçalves, N. Perra, A. Vespignani
In Handbook of Systems Biology: Concepts and Insights
Job Dekker, Marc Vidal and A.J. Marian Walhout (Editors)
Academic Press (2012) ISBN: 9780123859440
Abuse of social media and political manipulation
B. Gonçalves, M. Conover, F. Menczer
In The Death of The Internet
M. Jakobsson (Editor)
Wiley (2012) ISBN: 9781118062418


† "Equally contributing first author"

Immigrant community integration in world cities
F. Lamanna, M. Lenormand, M. Henar Salas-Olmedo, G. Romanillos, B. Gonçalves, J. J. Ramasco
arXiv: 1611.01056
Migrant and hosting communities face long-term challenges in the integration process. Immigrants must adapt to new laws and ways of life, while hosts need to adjust to multicultural societies. Integration impacts many facets of life such as access to jobs, real estate and public services and can be well approximated by the extent of spatial segregation of minority group residence. Here we conduct an extensive study of immigrant integration in 53 world cities by using Twitter language detection and by introducing metrics of spatial segregation. In this way, we quantify the Power of Integration of cities (their capacity to integrate diverse cultures), and characterize the relations between cultures when they act in the role of hosts and immigrants.
Interplay of homophily and communication in online social networks: Wikipedia-based semantic metric application on Twitter
S. Sćepanović, I. Mishkovski, B. Gonçalves, N. Trung Hieu, P. Hui
arXiv: 1606.08207
People are observed to assortatively connect on a set of traits. Uncovering the reasons for people exhibiting this strong assortative mixing in social networks is of great interest to researchers and in practice. A popular case application exploiting the insights about social correlation in social networks is in marketing and product promotion. Suggested tendencies to induce observed social correlation are homophily and social influence. However, clearly identifying the causal relationship between these tendencies has proven to be a hard task. In this study we present the interplay between communication happening on Twitter (represented by user mentions) and different semantical aspects of user communication content (tweets). As a semantic relatedness metric we employ a database built from the English Wikipedia corpus according to the Explicit Semantic Analysis method. Our work, to the best of our knowledge, is the first to offer an in-depth analysis on semantic homophily on communication in social networks. Moreover, we quantify diverse levels of homophily and/or social influence, identify the semantic traits as the foci of such homophily, offer insights into the temporal evolution of the homophily and influence and finally, we present their intricate interplay with the communication in Twitter.
OSoMe: The IUNI Observatory on Social Media
C. A. Davis, G. L. Ciampaglia, L. M. Aiello, K. Chung, M. Conover, E. Ferrara, A. Flammini, G. Fox, X. Gao, B. Gonçalves, P. Grabowicz, A. Hong, P.-M. Hui, S. McCaulay, K. McKelvey, M. Meiss, S. Patil, C. P. Kankanamalage, V. Pentchev, J. Qiu, J. Ratkiewicz, A. Rudnick, B. Serrette, P. Shiralkar, O. Varol, L. Weng, T.-L. Wu, A. Younge, and F. Menczer
arXiv: 1606.08207
The study of social phenomena is becoming increasingly reliant on big data from on-line social networks. Broad access to social media data, however, requires software development skills that not all researchers possess. Here we present the IUNI Observatory on Social Media, an open analytics platform designed to facilitate computational social science. The system leverages a historical, ongoing collection of over 70 billion public messages from Twitter. We illustrate a number of interactive open-source tools to retrieve, visualize, and analyze derived data from this collection. The Observatory, now available at osome.iuni.iu.edu, is the result of a large, six-year collaborative effort coordinated by the Indiana University Network Science Institute.
The happiness paradox: your friends are happier than you
J. Bollen, B. Gonçalves, I. van de Leemput, G. Ruan
arXiv: 1602.02665
Most individuals in social networks experience a so-called Friendship Paradox: they are less popular than their friends on average. This effect may explain recent findings that widespread social network media use leads to reduced happiness. However, the relation between popularity and happiness is poorly understood. A Friendship Paradox does not necessarily imply a Happiness Paradox where most individuals are less happy than their friends. Here we report the first direct observation of a significant Happiness Paradox in a large-scale online social network of 39,110 Twitter users. Our results reveal that popular individuals are indeed happier and that a majority of individuals experience a significant Happiness Paradox. The magnitude of the latter effect is shaped by complex interactions between individual popularity, happiness, and the fact that users cluster assortatively by level of happiness. Our results indicate that the topology of online social networks and the distribution of happiness in some populations can cause widespread psycho-social effects that affect the well-being of billions of individuals.
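The paradox described in this abstract can be illustrated on a toy graph. The following is a minimal Python sketch, not the authors' code: the function name `paradox_fraction`, the adjacency-dictionary layout and the star graph are assumptions made purely for illustration. It computes the share of nodes whose score is lower than the mean score of their neighbours. Using degree as the score gives the Friendship Paradox; any other per-user score (for example a happiness estimate) could be passed instead.

```python
from statistics import mean


def paradox_fraction(adjacency, score):
    """Share of non-isolated nodes scoring below the mean score of their neighbours.

    adjacency: dict mapping each node to a list of its neighbours (undirected graph).
    score:     dict mapping each node to a number (degree, happiness, ...).
    """
    affected = 0
    counted = 0
    for node, neighbours in adjacency.items():
        if not neighbours:
            continue  # isolated nodes have no friends to compare with
        counted += 1
        if score[node] < mean(score[n] for n in neighbours):
            affected += 1
    return affected / counted if counted else 0.0


# Hypothetical toy network: a star with one hub (node 0) and four leaves.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}

# Popularity is measured here simply as the number of neighbours.
degree = {node: len(neighbours) for node, neighbours in star.items()}

# Every leaf has degree 1 but its only friend has degree 4, so 4 of 5 nodes
# are less popular than their friends on average.
friendship_fraction = paradox_fraction(star, degree)
```

In this toy case only the hub escapes the paradox, which is the intuition behind the effect: highly connected nodes appear in many friend lists and pull the neighbourhood average up.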
Everyday the Same Picture: Popularity and Content Diversity
A. Bessi, F. Zollo, M. D. Vicario, A. Scala, G. Caldarelli, F. Petroni, B. Gonçalves, W. Quattrociocchi
arXiv: 1501.07201
Facebook is flooded by diverse and heterogeneous content, from kittens up to music and news, passing through satirical and funny stories. Each piece of that corpus reflects the heterogeneity of the underlying social background. In the Italian Facebook we have found an interesting case: a page having more than 40K followers that every day posts the same picture of Toto Cutugno, a popular Italian singer. In this work, we use such a page as a benchmark to study and model the effects of content heterogeneity on popularity. In particular, we use that page for a comparative analysis of information consumption patterns with respect to pages posting science and conspiracy news. In total, we analyze about 2M likes and 190K comments, made by approximately 340K and 65K users, respectively. We conclude the paper by introducing a model mimicking users' selection preferences accounting for the heterogeneity of contents.


38. Learning Spanish dialects through Twitter
B. Gonçalves, D. Sanchez
To Appear RILI (2016) arXiv: 1511.04970
We map the large-scale variation of the Spanish language by employing a corpus based on geographically tagged Twitter messages. Lexical dialects are extracted from an analysis of variants of tens of concepts. The resulting maps show linguistic variations on an unprecedented scale across the globe. We discuss the properties of the main dialects within a machine learning approach and find that varieties spoken in urban areas have an international character in contrast to country areas where dialects show a more regional uniformity.
37. The dynamics of information-driven coordination phenomena: a transfer entropy analysis
J. Borge-Holthoefer, N. Perra, B. Gonçalves, S. González-Bailón, A. Arenas, Y. Moreno, A. Vespignani
Science Advances 2, E1501158 (2016) arXiv: 1507.06106
Data from social media are providing unprecedented opportunities to investigate the processes that rule the dynamics of collective social phenomena. Here, we consider an information theoretical approach to define and measure the temporal and structural signatures typical of collective social events as they arise and gain prominence. We use the symbolic transfer entropy analysis of micro-blogging time series to extract directed networks of influence among geolocalized sub-units in social systems. This methodology captures the emergence of system-level dynamics close to the onset of socially relevant collective phenomena. The framework is validated against a detailed empirical analysis of five case studies. In particular, we identify a change in the characteristic time-scale of the information transfer that flags the onset of information-driven collective phenomena. Furthermore, our approach identifies an order-disorder transition in the directed network of influence between social sub-units. In the absence of a clear exogenous driving, social collective phenomena can be represented as endogenously-driven structural transitions of the information transfer network. This study provides results that can help define models and predictive algorithms for the analysis of societal events based on open source data.
36. Touristic site attractiveness seen through Twitter
A. Bassolas, M. Lenormand, A. Tugores, B. Gonçalves, J. J. Ramasco
EPJ Data Science 5, 12 (2016) arXiv: 1601.07741
Tourism is a significant contributor to medium and long range travels in an increasingly globalized world. Leisure travel has an important impact on the local and global economy and on the environment as well. The study of touristic trips is thus raising a considerable interest. In this work, we apply a method to assess the attractiveness of 20 of the most popular touristic sites worldwide using geolocated tweets as a proxy for human mobility. We first rank the touristic sites according to the spatial distribution of their visitors' place of residence. The Taj Mahal, the Pisa Tower and the Eiffel Tower appear consistently in the top 5 in these rankings. We then consider a coarser scale and classify the travelers by country of residence. Touristic sites' visiting figures are then studied by country of residence showing that the Eiffel Tower, Times Square and the London Tower welcome the majority of the visitors of each country. Finally, we build a network linking sites whenever a user has been detected in more than one site. This allows us to unveil relations between touristic sites and find which ones are more tightly interconnected.
35. Human diffusion and city influence
M. Lenormand, B. Gonçalves, A. Tugores, J. Ramasco
J. R. Soc. Interface 12, 20150473 (2015) arXiv: 1501.07788
Cities are characterized by concentrating population, economic activity and services. However, not all cities are equal and hierarchy in terms of influence at local, regional or global scales naturally emerges. Traditionally, there have been important efforts to describe this hierarchy by indirect measures such as the sharing of company headquarters, traffic by air, train or boats or economical exchanges. In this work, we take a different approach and introduce a method that uses geolocated Twitter information to quantify the impact of cities on rural or other urban areas. Since geolocated tweets are becoming a global phenomenon, the method can be applied at a world-wide scale. We focus on 58 cities and analyze the mobility patterns of people after visiting them for the first time. Cities such as Rome and Paris appear consistently as those with the largest area covered by Twitter users after their visit and as those attracting visitors most diverse in origin. The study is also performed discerning users' mobility by the contribution of locals and non-locals, which shows the relevance of the mixing ratio between them to have a global city. Finally, we focus on the mobility of users between cities and construct a network with the user flows between them. The network allows us to analyze centrality, defining it at a global and regional scale. The hierarchy of cities dramatically changes when referred only to urban users, with New York and London playing a predominant role.
34. Reply to Biersteker: When methods matter
S. Ronen, B. Gonçalves, K. Z. Hu, A. Vespignani, S. Pinker, C. Hidalgo
Proc. Natl. Acad. Sci. 112, E1815 (2015)
We appreciate Biersteker’s comments on our research. Moreover, we agree with many of her points so wholeheartedly that our paper addresses them in detail: We devote whole sections in the main text and supporting information to the incompleteness of the Index Translationum, the imperfect quality of the language detector, and the limitations of the Wikipedia dataset, among others.
33. Links that speak: the global language network and its association with global fame
S. Ronen, B. Gonçalves, K. Z. Hu, A. Vespignani, S. Pinker, C. Hidalgo
Proc. Natl. Acad. Sci. 111, E5616 (2014)
Languages vary enormously in global importance because of historical, demographic, political, and technological forces. Yet, beyond simple measures of population and economic power, there has been no rigorous quantitative way to define the global influence of languages. Here we use the structure of the network connecting multilingual speakers and translated texts, as expressed in book translations, multiple language editions of the Wikipedia, and Twitter, to provide a concept of language importance that goes beyond simple economic or demographic measures. We find that the structure of the three GLNs is centered on English as a global hub, and also, around a handful of intermediate hub languages, which include Spanish, German, French, Russian, Portuguese and Chinese. We validate the measure of a language’s centrality in the three GLNs by showing that they exhibit a strong correlation with two independent measures of the number of famous people born in the countries associated with that language. These results suggest that the position of a language in the Global Language Network contributes to the visibility of its speakers and the global popularity of the cultural content they produce.
32. Crowdsourcing Dialect Characterization Through Twitter
B. Gonçalves, D. Sánchez
PLoS One 9, E112074 (2014) arXiv: 1407.7094 
We perform a large-scale analysis of language diatopic variation using geotagged microblogging datasets. By collecting all Twitter messages written in Spanish over more than two years, we build a corpus from which a carefully selected list of concepts allows us to characterize Spanish varieties on a global scale. A cluster analysis proves the existence of well defined macroregions sharing common lexical properties. Remarkably enough, we find that Spanish language is split into two superdialects, namely, an urban speech used across major American and Spanish cities and a diverse form that encompasses rural areas and small towns. The latter can be further clustered into smaller varieties with a stronger regional character.
31. Entangling mobility and interactions in social media
P. A. Grabowicz, J. J. Ramasco, B. Gonçalves, V. M. Eguiluz
PLoS One 9, E92196 (2014) arXiv: 1307.5304
Daily interactions naturally define social circles. Individuals tend to be friends with the people they spend time with and they choose to spend time with their friends, inextricably entangling physical location and social relationships. As a result, it is possible to predict not only someone's location from their friends' locations but also friendship from spatial and temporal co-occurrence. While several models have been developed to separately describe mobility and the evolution of social networks, there is a lack of studies coupling social interactions and mobility. In this work, we introduce a new model that bridges this gap by explicitly considering the feedback of mobility on the formation of social ties. Data coming from three online social networks (Twitter, Gowalla and Brightkite) is used for validation. Our model reproduces various topological and physical properties of these networks such as: i) the size of the connected components, ii) the distance distribution between connected users, iii) the dependence of the reciprocity on the distance, iv) the variation of the social overlap and the clustering with the distance. Besides numerical simulations, a mean-field approach is also used to study analytically the main statistical features of the networks generated by the model. The robustness of the results to changes in the model parameters is explored, finding that a balance between friend visits and long-range random connections is essential to reproduce the geographical features of the empirical networks.
30. Human mobility and the worldwide impact of intentional localized highly pathogenic virus release
B. Gonçalves, D. Balcan, A. Vespignani
Nature Scientific Reports 3, 810 (2013)
The threat of bioterrorism and the possibility of accidental release have spawned a growth of interest in modeling the course of the release of a highly pathogenic agent. Studies focused on strategies to contain local outbreaks after their detection show that timely interventions with vaccination and contact tracing are able to halt transmission. However, such studies do not consider the effects of human mobility patterns. Using a large-scale structured metapopulation model to simulate the global spread of smallpox after an intentional release event, we show that index cases and potential outbreaks can occur in different continents even before the detection of the pathogen release. These results have two major implications: i) intentional release of a highly pathogenic agent within a country will have global effects; ii) the release event may trigger outbreaks in countries lacking the health infrastructure necessary for effective containment. The presented study provides data with potential uses in defining contingency plans at the National and International level.
29. The Twitter of Babel: Mapping World Languages through Microblogging Platforms
D. Mocanu, A. Baronchelli, N. Perra, B. Gonçalves, A. Vespignani
PLoS One 8, E61981 (2013) arXiv: 1212.5238 Supplementary Site
Large scale analysis and statistics of socio-technical systems that just a few short years ago would have required the use of consistent economic and human resources can nowadays be conveniently performed by mining the enormous amount of digital data produced by human activities. Although a characterization of several aspects of our societies is emerging from the data revolution, a number of questions concerning the reliability and the biases inherent to the big data "proxies" of social life are still open. Here, we survey worldwide linguistic indicators and trends through the analysis of a large-scale dataset of microblogging posts. We show that available data allow for the study of language geography at scales ranging from country-level aggregation to specific city neighborhoods. The high resolution and coverage of the data allows us to investigate different indicators such as the linguistic homogeneity of different countries, the touristic seasonal patterns within countries and the geographical distribution of different languages in multilingual regions. This work highlights the potential of geolocalized studies of open data sources to improve current analysis and develop indicators for major social phenomena in specific communities.
28. Characterizing scientific production and consumption in Physics
Q. Zhang, N. Perra, B. Gonçalves, F. Ciulla, A. Vespignani
Nature Scientific Reports 3, 1640 (2013) arXiv: 1302.6569
We analyze the entire publication database of the American Physical Society generating longitudinal (50 years) citation networks geolocalized at the level of single urban areas. We define the knowledge diffusion proxy, and scientific production ranking algorithms to capture the spatio-temporal dynamics of Physics knowledge worldwide. By using the knowledge diffusion proxy we identify the key cities in the production and consumption of knowledge in Physics as a function of time. The results from the scientific production ranking algorithm allow us to characterize the top cities for scholarly research in Physics. Although we focus on a single dataset concerning a specific field, the methodology presented here opens the path to comparative studies of the dynamics of knowledge across disciplines and research areas.
27. Emergence of influential spreaders in modified rumor models
J. Borge-Holthoefer, S. Meloni, B. Gonçalves, Y. Moreno
J. Stat. Phys. 151, 383 (2013) arXiv: 1209.1351
The burst in the use of online social networks over the last decade has provided evidence that current rumor spreading models miss some fundamental ingredients in order to reproduce how information is disseminated. In particular, recent literature has revealed that these models fail to reproduce the fact that some nodes in a network have an influential role when it comes to spreading a piece of information. In this work, we introduce two mechanisms with the aim of filling the gap between theoretical and experimental results. The first model introduces the assumption that spreaders are not always active whereas the second model considers the possibility that an ignorant is not interested in spreading the rumor. In both cases, results from numerical simulations show a higher adhesion to real data than classical rumor spreading models. Our results shed some light on the mechanisms underlying the spreading of information and ideas in large social systems and pave the way for more realistic diffusion models.
26. Walking and searching in time-varying networks
N. Perra, A. Baronchelli, D. Mocanu, B. Gonçalves, R. Pastor-Satorras, A. Vespignani
Phys. Rev. Lett. 109, 238701 (2012) arXiv: 1206.2858
The random walk process underlies the description of a large number of real world phenomena. Here we provide the study of random walk processes in time varying networks in the regime of time-scale mixing; i.e. when the network connectivity pattern and the random walk process dynamics are unfolding on the same time scale. We consider a model for time varying networks created from the activity potential of the nodes, and derive solutions of the asymptotic behavior of random walks and the mean first passage time in undirected and directed networks. Our findings show striking differences with respect to the well known results obtained in quenched and annealed networks, emphasizing the effects of dynamical connectivity patterns in the definition of proper strategies for search, retrieval and diffusion processes in time-varying networks.
25. Real time numerical forecast of global epidemic spreading: case study of 2009 A/H1N1pdm
M. Tizzoni, P. Bajardi, C. Poletto, J. J. Ramasco, D. Balcan, B. Gonçalves, N. Perra, V. Colizza, A. Vespignani
BMC Medicine 10, 165 (2012) 
Background: Mathematical and computational models for infectious diseases are increasingly used to support public-health decisions; however, their reliability is currently under debate. Real-time forecasts of epidemic spread using data-driven models have been hindered by the technical challenges posed by parameter estimation and validation. Data gathered for the 2009 H1N1 influenza crisis represent an unprecedented opportunity to validate real-time model predictions and define the main success criteria for different approaches.
Methods: We used the Global Epidemic and Mobility Model to generate stochastic simulations of epidemic spread worldwide, yielding (among other measures) the incidence and seeding events at a daily resolution for 3,362 subpopulations in 220 countries. Using a Monte Carlo Maximum Likelihood analysis, the model provided an estimate of the seasonal transmission potential during the early phase of the H1N1 pandemic and generated ensemble forecasts for the activity peaks in the northern hemisphere in the fall/winter wave. These results were validated against the real-life surveillance data collected in 48 countries, and their robustness assessed by focusing on 1) the peak timing of the pandemic; 2) the level of spatial resolution allowed by the model; and 3) the clinical attack rate and the effectiveness of the vaccine. In addition, we studied the effect of data incompleteness on the prediction reliability.
Results: Real-time predictions of the peak timing are found to be in good agreement with the empirical data, showing strong robustness to data that may not be accessible in real time (such as pre-exposure immunity and adherence to vaccination campaigns), but that affect the predictions for the attack rates. The timing and spatial unfolding of the pandemic are critically sensitive to the level of mobility data integrated into the model.
Conclusions: Our results show that large-scale models can be used to provide valuable real-time forecasts of influenza spreading, but they require high-performance computing. The quality of the forecast depends on the level of data integration, thus stressing the need for high-quality data in population-based models, and of progressive updates of validated available empirical knowledge to inform these models.
24. Beating the news using Social Media: the case study of American Idol
F. Ciulla, D. Mocanu, A. Baronchelli, B. Gonçalves, N. Perra, A. Vespignani
EPJ Data Science 1, 8 (2012) arXiv: 1205.4467
We present a contribution to the debate on the predictability of social events using big data analytics. We focus on the elimination of contestants in the American Idol TV shows as an example of a well defined electoral phenomenon that each week draws millions of votes in the USA. We provide evidence that Twitter activity during the time span defined by the TV show airing and the voting period following it correlates with the contestants' ranking and allows the anticipation of the voting outcome. Furthermore, the fraction of Tweets that contain geolocation information allows us to map the fanbase of each contestant, both within the US and abroad, showing that strong regional polarizations occur. Although American Idol voting is just a minimal and simplified version of complex societal phenomena such as political elections, this work shows that the volume of information available in online systems permits the real time gathering of quantitative indicators anticipating the future unfolding of opinion formation events.
23. Activity driven modeling of time varying networks
N. Perra, B. Gonçalves, R. Pastor-Satorras, A. Vespignani
Nature Scientific Reports 2, 469 (2012) arXiv: 1203.5351
Network modeling plays a critical role in identifying statistical regularities and structural principles common to many systems. The large majority of recent modeling approaches are connectivity driven. The structural patterns of the network are at the basis of the mechanisms ruling the network formation. Connectivity driven models necessarily provide a time-aggregated representation that may fail to describe the instantaneous and fluctuating dynamics of many networks. We address this challenge by defining the activity potential, a time invariant function characterizing the agents' interactions and constructing an activity driven model capable of encoding the instantaneous time description of the network dynamics. The model provides an explanation of structural features such as the presence of hubs, which simply originate from the heterogeneous activity of agents. Within this framework, highly dynamical networks can be described analytically, allowing a quantitative discussion of the biases induced by the time-aggregated representations in the analysis of dynamical processes.
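The idea of generating a network from node activity rather than from existing connectivity can be sketched in a few lines of Python. This is a hedged illustration under assumptions that the abstract itself does not spell out: at each time step node `i` becomes active with probability `activities[i]`, an active node creates `m` links to other nodes chosen uniformly at random, and the instantaneous network is discarded before the next step. The function names and parameter values are hypothetical.

```python
import random


def activity_driven_step(activities, m, rng):
    """Return the set of undirected edges created in one time step.

    Assumes m is at most len(activities) - 1, so each active node can
    pick m distinct partners.
    """
    n = len(activities)
    edges = set()
    for i, activity in enumerate(activities):
        if rng.random() < activity:  # node i fires with probability activity
            others = [j for j in range(n) if j != i]
            for j in rng.sample(others, m):
                edges.add((min(i, j), max(i, j)))
    return edges


def aggregate_degrees(activities, m, steps, seed=0):
    """Degree of each node in the network aggregated over all time steps."""
    rng = random.Random(seed)
    neighbours = [set() for _ in activities]
    for _ in range(steps):
        for i, j in activity_driven_step(activities, m, rng):
            neighbours[i].add(j)
            neighbours[j].add(i)
    return [len(partners) for partners in neighbours]


# Hypothetical example: one always-active node among nine inactive ones.
# In the time-aggregated view the active node ends up as the hub.
degrees = aggregate_degrees([1.0] + [0.0] * 9, m=2, steps=50)
```

Even in this small example the hub appears only because of heterogeneous activity: no preferential rule based on existing connectivity is involved, and a single time step contains just `m` links for the active node.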
22. Partisan Asymmetries in Online Political Activity
M. Conover, B. Gonçalves, A. Flammini, F. Menczer
EPJ Data Science 1, 6 (2012) arXiv: 1205.1010
We examine partisan differences in the behavior, communication patterns and social interactions of more than 18,000 politically-active Twitter users to produce evidence that points to changing levels of partisan engagement with the American online political landscape. Analysis of a network defined by the communication activity of these users in proximity to the 2010 midterm congressional elections reveals a highly segregated, well clustered partisan community structure. Using cluster membership as a high-fidelity (87% accuracy) proxy for political affiliation, we characterize a wide range of differences in the behavior, communication and social connectivity of left- and right-leaning Twitter users. We find that in contrast to the online political dynamics of the 2008 campaign, right-leaning Twitter users exhibit greater levels of political activity, a more tightly interconnected social structure, and a communication network topology that facilitates the rapid and broad dissemination of political information.
21. Towards a characterization of behavior-disease models
N. Perra, D. Balcan, B. Gonçalves, A. Vespignani
PLoS One 6, e23084 (2011) arXiv: 1107.0997
The last decade saw the advent of increasingly realistic epidemic models that leverage on the availability of highly detailed census and human mobility data. Data-driven models aim at a granularity down to the level of households or single individuals. However, relatively little systematic work has been done to provide coupled behavior-disease models able to close the feedback loop between behavioral changes triggered in the population by an individual's perception of the disease spread and the actual disease spread itself. While models lacking this coupling can be extremely successful in mild epidemics, they obviously will be of limited use in situations where social disruption or behavioral alterations are induced in the population by knowledge of the disease. Here we propose a characterization of a set of prototypical mechanisms for self-initiated social distancing induced by local and non-local prevalence-based information available to individuals in the population. We characterize the effects of these mechanisms in the framework of a compartmental scheme that enlarges the basic SIR model by considering separate behavioral classes within the population. The transition of individuals in/out of behavioral classes is coupled with the spreading of the disease and provides a rich phase space with multiple epidemic peaks and tipping points. The class of models presented here can be used in the case of data-driven computational approaches to analyze scenarios of social adaptation and behavioral change.
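For readers unfamiliar with the baseline that this abstract extends, the following Python sketch integrates the basic SIR model only. It does not include the behavioral classes proposed in the paper. The parameter names `beta` (transmission rate) and `mu` (recovery rate), the forward-Euler scheme and the numerical values are assumptions chosen for illustration.

```python
def simulate_sir(beta, mu, s0, i0, steps, dt=0.1):
    """Forward-Euler integration of the basic SIR equations.

    ds/dt = -beta * s * i
    di/dt =  beta * s * i - mu * i
    dr/dt =  mu * i

    s, i and r are population fractions that sum to 1.
    Returns a list of (s, i, r) tuples, one per time step.
    """
    s, i, r = s0, i0, 1.0 - s0 - i0
    trajectory = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i * dt
        new_recoveries = mu * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        trajectory.append((s, i, r))
    return trajectory


# Hypothetical parameters: basic reproduction number beta / mu = 2.5,
# with 1% of the population initially infectious.
trajectory = simulate_sir(beta=0.5, mu=0.2, s0=0.99, i0=0.01, steps=1000)
peak_infected = max(i for _, i, _ in trajectory)
```

A behavior-disease model of the kind described above would add further compartments (for example, susceptible individuals who have adopted social distancing) and transitions between them that depend on prevalence; that extension is not attempted in this sketch.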
20. Modeling Users’ Activity on Twitter Networks: Validation of Dunbar’s Number
B. Gonçalves, N. Perra, A. Vespignani
PLoS One 6, e22656 (2011) arXiv: 1105.5170
Modern society's increasing dependency on online tools for both work and recreation opens up unique opportunities for the study of social interactions. A large survey of online exchanges or conversations on Twitter, collected across six months and involving 1.7 million individuals, is presented here. We test the theoretical cognitive limit on the number of stable social relationships known as Dunbar's number. We find that users can entertain a maximum of 100-200 stable relationships in support of Dunbar's prediction. The "economy of attention" is limited in the online world by cognitive and biological constraints as predicted by Dunbar's theory. Inspired by this empirical evidence we propose a simple dynamical mechanism, based on finite priority queuing and time resources, that reproduces the observed social behavior.
19. Happiness is assortative in online social networks
J. Bollen, B. Gonçalves, G. Ruan, H. Mao
ALIFE 17, 237 (2011) arXiv: 1103.0784
Social networks tend to disproportionally favor connections between individuals with either similar or dissimilar characteristics. This propensity, referred to as assortative mixing or homophily, is expressed as the correlation between attribute values of nearest neighbour vertices in a graph. Recent results indicate that beyond demographic features such as age, sex and race, even psychological states such as "loneliness" can be assortative in a social network. In spite of the increasing societal importance of online social networks it is unknown whether assortative mixing of psychological states takes place in situations where social ties are mediated solely by online networking services in the absence of physical contact. Here, we show that general happiness or Subjective Well-Being (SWB) of Twitter users, as measured from a 6 month record of their individual tweets, is indeed assortative across the Twitter social network. To our knowledge this is the first result that shows assortative mixing in online networks at the level of SWB. Our results imply that online social networks may be equally subject to the social mechanisms that cause assortative mixing in real social networks and that such assortative mixing takes place at the level of SWB. Given the increasing prevalence of online social networks, their propensity to connect users with similar levels of SWB may be an important instrument in better understanding how both positive and negative sentiments spread through online social ties. Future research may focus on how event-specific mood states can propagate and influence user behavior in "real life".
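The definition used in this abstract, assortative mixing as the correlation between attribute values of neighbouring vertices, can be made concrete with a short Python sketch. It is an illustration under stated assumptions rather than the authors' pipeline: the function name `attribute_assortativity`, the edge-list input and the toy scores are hypothetical, and the score stands in for any per-user quantity such as an SWB estimate.

```python
from math import sqrt


def attribute_assortativity(edges, value):
    """Pearson correlation of an attribute across the two ends of each edge.

    edges: iterable of (u, v) pairs of an undirected graph.
    value: dict mapping each node to a numeric attribute.
    Each edge is counted in both directions so the measure is symmetric.
    """
    xs, ys = [], []
    for u, v in edges:
        xs += [value[u], value[v]]
        ys += [value[v], value[u]]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("attribute is constant across edge ends")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data: two pairs of users.
edges = [(0, 1), (2, 3)]

# Like connects with like: each pair shares the same score.
assortative = attribute_assortativity(edges, {0: 1.0, 1: 1.0, 2: 2.0, 3: 2.0})

# Opposites connect: each pair joins a low and a high score.
disassortative = attribute_assortativity(edges, {0: 1.0, 1: 2.0, 2: 1.0, 3: 2.0})
```

Values near +1 indicate that connected users have similar scores, values near -1 that they have opposite scores, and values near 0 that scores are unrelated to who is connected to whom.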

Access the recommendation on F1000Prime

18. The GLEaMviz computational tool, a publicly available software to explore realistic epidemic spreading scenarios at the global scale
W. Van den Broeck, C. Gioannini, B. Gonçalves, M. Quaggiotto, V. Colizza, A. Vespignani
BMC Infectious Diseases 11, 37 (2011) 
Background: Computational models play an increasingly important role in the assessment and control of public health crises, as demonstrated during the 2009 H1N1 influenza pandemic. Much research has been done in recent years in the development of sophisticated data-driven models for realistic computer-based simulations of infectious disease spreading. However, only a few computational tools are presently available for assessing scenarios, predicting epidemic evolutions, and managing health emergencies that can benefit a broad audience of users including policy makers and health institutions.
Results: We present “GLEaMviz”, a publicly available software system that simulates the spread of emerging human-to-human infectious diseases across the world. The GLEaMviz tool comprises three components: the client application, the proxy middleware, and the simulation engine. The latter two components constitute the GLEaMviz server. The simulation engine leverages on the Global Epidemic and Mobility (GLEaM) framework, a stochastic computational scheme that integrates worldwide high-resolution demographic and mobility data to simulate disease spread on the global scale. The GLEaMviz design aims at maximizing flexibility in defining the disease compartmental model and configuring the simulation scenario; it allows the user to set a variety of parameters including: compartment-specific features, transition values, and environmental effects. The output is a dynamic map and a corresponding set of charts that quantitatively describe the geo-temporal evolution of the disease. The software is designed as a client-server system. The multi-platform client, which can be installed on the user’s local machine, is used to set up simulations that will be executed on the server, thus avoiding specific requirements for large computational capabilities on the user side.
Conclusions: The user-friendly graphical interface of the GLEaMviz tool, along with its high level of detail and the realism of its embedded modeling approach, opens up the platform to simulate realistic epidemic scenarios. These features make the GLEaMviz computational tool a convenient teaching/training tool as well as a first step toward the development of a computational tool aimed at facilitating the use and exploitation of computational models for the policy making and scenario analysis of infectious disease outbreaks.
17. Modeling the spatial spread of infectious diseases: the GLobal Epidemic and Mobility computational model
D. Balcan, B. Gonçalves, H. Hu, V. Colizza, J. J. Ramasco, A. Vespignani
J. of Computational Science 1, 132 (2010) 
Here we present the Global Epidemic and Mobility (GLEaM) model that integrates sociodemographic and population mobility data in a spatially structured stochastic disease approach to simulate the spread of epidemics at the worldwide scale. We discuss the flexible structure of the model that is open to the inclusion of different disease structures and local intervention policies. This makes GLEaM suitable for the computational modeling and anticipation of the spatio-temporal patterns of global epidemic spreading, the understanding of historical epidemics, the assessment of the role of human mobility in shaping global epidemics, and the analysis of mitigation and containment scenarios.
16. Comparing large-scale computational approaches to epidemic modeling: Agent based versus structured metapopulation models
M. Ajelli, B. Gonçalves †, D. Balcan, V. Colizza, H. Hu, J. J. Ramasco, S. Merler, A. Vespignani
BMC Infectious Diseases 10, 190 (2010) 
15. Modeling the critical care demand and antibiotics resources needed during the Fall 2009 wave of influenza A(H1N1) pandemic.
D. Balcan, V. Colizza, A. C. Singer, C. Chouaid, H. Hu, B. Gonçalves, P. Bajardi, C. Poletto, J. J. Ramasco, N. Perra, M. Tizzoni, D. Paolotti, W. Van den Broeck, A. J. Valleron, A. Vespignani.
PLoS Currents: Influenza Dec 4, RRN1133 (2009) 
14. Multiscale mobility networks and the spatial spreading of infectious diseases.
D. Balcan, V. Colizza, B. Gonçalves, H. Hu, J. J. Ramasco, A. Vespignani
Proc. Natl. Acad. Sci. 106, 21484-21489 (2009), arXiv: 0907.3304
Among the realistic ingredients to be considered in the computational modeling of infectious diseases, human mobility represents a crucial challenge both on the theoretical side and in view of the limited availability of empirical data. In order to study the interplay between small-scale commuting flows and long-range airline traffic in shaping the spatio-temporal pattern of a global epidemic we i) analyze mobility data from 29 countries around the world and find a gravity model able to provide a global description of commuting patterns up to 300 kms; ii) integrate in a worldwide structured metapopulation epidemic model a time-scale separation technique for evaluating the force of infection due to multiscale mobility processes in the disease dynamics. Commuting flows are found, on average, to be one order of magnitude larger than airline flows. However, their introduction into the worldwide model shows that the large scale pattern of the simulated epidemic exhibits only small variations with respect to the baseline case where only airline traffic is considered. The presence of short range mobility increases however the synchronization of subpopulations in close proximity and affects the epidemic behavior at the periphery of the airline transportation infrastructure. The present approach outlines the possibility for the definition of layered computational approaches where different modeling assumptions and granularities can be used consistently in a unifying multi-scale framework.
13. Estimate of Novel Influenza A/H1N1 cases in Mexico at the early stage of the pandemic with a spatially structured epidemic model
V. Colizza, A. Vespignani, N. Perra, C. Poletto, B. Gonçalves, H. Hu, D. Balcan, D. Paolotti, W. Van den Broeck, M. Tizzoni, P. Bajardi, J.J. Ramasco
PLoS Currents: Influenza Nov 11, RRN1129 (2009) 
Access the recommendation on F1000Prime
12. Seasonal transmission potential and activity peaks of the new influenza A(H1N1): a Monte Carlo likelihood analysis based on human mobility
D. Balcan, H. Hu, B. Gonçalves†, P. Bajardi, C. Poletto, J. J. Ramasco, D. Paolotti, N. Perra, M. Tizzoni, W. Van den Broeck, V. Colizza, A. Vespignani
BMC Medicine 7, 45 (2009), arXiv: 0909.2417
On 11 June the World Health Organization officially raised the phase of pandemic alert (with regard to the new H1N1 influenza strain) to level 6. We use a global structured metapopulation model integrating mobility and transportation data worldwide. In order to estimate the transmission potential and the relevant model parameters, we used the data on the chronology of the 2009 novel influenza A(H1N1). The method is based on the maximum likelihood analysis of the arrival time distribution generated by the model in 12 countries seeded by Mexico by using 1M computationally simulated epidemics. An extended chronology including 93 countries worldwide seeded before 18 June was used to ascertain the seasonality effects. We found the best estimate R0 = 1.75 (95% CI 1.64 to 1.88) for the basic reproductive number. Correlation analysis allows the selection of the most probable seasonal behavior based on the observed pattern, leading to the identification of plausible scenarios for the future unfolding of the pandemic and the estimate of pandemic activity peaks in the different hemispheres. We provide estimates for the number of hospitalizations and the attack rate for the next wave as well as an extensive sensitivity analysis on the disease parameter values. We also studied the effect of systematic therapeutic use of antiviral drugs on the epidemic timeline. The analysis shows the potential for an early epidemic peak occurring in October/November in the Northern hemisphere, likely before large-scale vaccination campaigns could be carried out. We suggest that the planning of additional mitigation policies such as systematic antiviral treatments might be the key to delay the activity peak in order to restore the effectiveness of the vaccination programs.
11. Modeling vaccination campaigns and the Fall/Winter 2009 activity of the new A(H1N1) influenza in the Northern Hemisphere
P. Bajardi, C. Poletto, D. Balcan, H. Hu, B. Gonçalves, J. J. Ramasco, D. Paolotti, N. Perra, M. Tizzoni, W. Van den Broeck, V. Colizza, A. Vespignani
Emerging Health Threats Journal 2, e11 (2009), arXiv: 1002.0876
The unfolding of pandemic influenza A(H1N1) for Fall 2009 in the Northern Hemisphere is still uncertain. Plans for vaccination campaigns and vaccine trials are underway, with the first batches expected to be available early October. Several studies point to the possibility of an anticipated pandemic peak that could undermine the effectiveness of vaccination strategies. Here we use a structured global epidemic and mobility metapopulation model to assess the effectiveness of massive vaccination campaigns for the Fall/Winter 2009. Mitigation effects are explored depending on the interplay between the predicted pandemic evolution and the expected delivery of vaccines. The model is calibrated using recent estimates on the transmissibility of the new A(H1N1) influenza. Results show that if additional intervention strategies were not used to delay the time of pandemic peak, vaccination may not be able to considerably reduce the cumulative number of cases, even when the mass vaccination campaign is started as early as mid-October. Prioritized vaccination would be crucial in slowing down the pandemic evolution and reducing its burden.
10. The Peculiar Phase Structure of Random Graph Bisection
A. G. Percus, G. Istrate, B. Gonçalves, R. Z. Sumi, S. Boettcher
J. Math. Phys. 49, 125219 (2008), arXiv:0808.1549
The mincut graph bisection problem involves partitioning the n vertices of a graph into disjoint subsets, each containing exactly n/2 vertices, while minimizing the number of "cut" edges with an endpoint in each subset. When considered over sparse random graphs, the phase structure of the graph bisection problem displays certain familiar properties, but also some surprises. It is known that when the mean degree is below the critical value of 2 log 2, the cutsize is zero with high probability. We study how the minimum cutsize increases with mean degree above this critical threshold, finding a new analytical upper bound that improves considerably upon previous bounds. Combined with recent results on expander graphs, our bound suggests the unusual scenario that random graph bisection is replica symmetric up to and beyond the critical threshold, with a replica symmetry breaking transition possibly taking place above the threshold. An intriguing algorithmic consequence is that although the problem is NP-hard, we can find near-optimal cutsizes (whose ratio to the optimal value approaches 1 asymptotically) in polynomial time for typical instances near the phase transition.
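To make the problem definition in this abstract concrete, here is a minimal brute-force sketch that enumerates every balanced bisection of a small graph and returns the smallest cutsize. It is an exhaustive search for illustration only, usable on tiny graphs, and is not the analytical bound or the polynomial-time approach the paper refers to; the function name and the example graph are hypothetical.

```python
from itertools import combinations


def min_bisection_cutsize(n, edges):
    """Smallest number of cut edges over all splits of vertices 0..n-1
    into two halves of exactly n/2 vertices each (n must be even)."""
    if n < 2 or n % 2 != 0:
        raise ValueError("n must be a positive even number")
    best = None
    # Pin vertex 0 to the first half so each bisection is visited only once.
    for rest in combinations(range(1, n), n // 2 - 1):
        half = {0, *rest}
        # An edge is cut when exactly one of its endpoints lies in `half`.
        cut = sum((u in half) != (v in half) for u, v in edges)
        if best is None or cut < best:
            best = cut
    return best


# Hypothetical example: two triangles joined by a single bridge edge.
two_triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(min_bisection_cutsize(6, two_triangles))
```

The number of balanced bisections grows exponentially with n, which is why this kind of enumeration is only a way to check the definition on small instances.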
9. Anomalous Diffusion on the Hanoi Networks
S. Boettcher, B. Gonçalves
Euro Phys. Lett. 84, 30002 (2008), arXiv:0802.2757
Diffusion is modeled on the recently proposed Hanoi networks by studying the mean-square displacement of random walks with time, <r^2>~t^{2/d_w}. It is found that diffusion - the quintessential mode of transport throughout Nature - proceeds faster than ordinary, in one case with an exact, anomalous exponent d_w = 2-\log_2(\phi) = 1.30576... It is an instance of a physical exponent containing the "golden ratio" \phi=(1+\sqrt{5})/2 that is intimately related to Fibonacci sequences and since Euclid's time has been found to be fundamental throughout geometry, architecture, art, and Nature itself. It originates from a singular renormalization group fixed point with a subtle boundary layer, for whose resolution \phi is the main protagonist. The origin of this rare singularity is easily understood in terms of the physics of the process. Yet, the connection between network geometry and the emergence of \phi in this context remains elusive. These results provide an accurate test of recently proposed universal scaling forms for first passage times.
8. Geometry and Dynamics for Hierarchical Regular Networks
S. Boettcher, B. Gonçalves, J. Azaret
J. Phys. A: Math. Theor. 41, 335003 (2008), arXiv: 0805.3013
The recently introduced hierarchical regular networks HN3 and HN4 are analyzed in detail. We use renormalization group arguments to show that HN3, a 3-regular planar graph, has a diameter growing as \sqrt{N} with the system size, and random walks on HN3 exhibit super-diffusion with an anomalous exponent d_w = 2 - \log_2\phi = 1.306..., where \phi = (\sqrt{5} + 1)/2 = 1.618... is the "golden ratio." In contrast, HN4, a non-planar 4-regular graph, has a diameter that grows slower than any power of N, yet faster than any power of \ln N. In an annealed approximation we can show that diffusive transport on HN4 occurs ballistically (d_w = 1). Walkers on both graphs possess a first-return probability with a power law tail characterized by an exponent \mu = 2 - 1/d_w. It is shown explicitly that recurrence properties on HN3 depend on the starting site.
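The closed-form exponents quoted in this abstract can be checked numerically. The short sketch below simply evaluates the two formulas stated above, d_w = 2 - \log_2\phi and \mu = 2 - 1/d_w; the function names are hypothetical and no network simulation is involved.

```python
import math


def golden_ratio():
    """phi = (sqrt(5) + 1) / 2, as defined in the abstract."""
    return (math.sqrt(5) + 1) / 2


def hn3_walk_exponent():
    """Anomalous diffusion exponent d_w = 2 - log2(phi) quoted for HN3."""
    return 2 - math.log2(golden_ratio())


def first_return_exponent(d_w):
    """Tail exponent mu = 2 - 1/d_w of the first-return probability."""
    return 2 - 1 / d_w


d_w = hn3_walk_exponent()
print(f"phi = {golden_ratio():.3f}")  # 1.618
print(f"d_w = {d_w:.3f}")             # 1.306
print(f"mu (HN3) = {first_return_exponent(d_w):.3f}")
print(f"mu (HN4, d_w = 1) = {first_return_exponent(1.0):.3f}")
```

For the ballistic case d_w = 1 reported for HN4, the same formula gives \mu = 1.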
7. Hierarchical, Regular Small-World Networks
S. Boettcher, B. Gonçalves, H. Guclu
J. Phys. A: Math. Theor. 41, 252001 (2008), arXiv:0712.1259
Two new classes of networks are introduced that resemble small-world properties. These networks are recursively constructed but retain a fixed, regular degree. They consist of a one-dimensional lattice backbone overlaid by a hierarchical sequence of long-distance links. Both types of networks, one 3-regular and the other 4-regular, lead to distinct behaviors, as revealed by renormalization group studies. The 3-regular networks are planar, have a diameter growing as \sqrt{N} with the system size N, and lead to super-diffusion with an exact, anomalous exponent d_w=1.3057581..., but possess only a trivial fixed point T_c=0 for the Ising ferromagnet. In turn, the 4-regular networks are non-planar, have a diameter growing as ~2^[\sqrt(\log_2 N^2)], exhibit "ballistic" diffusion (d_w=1), and a non-trivial ferromagnetic transition, T_c>0. It suggests that the 3-regular networks are still quite "geometric", while the 4-regular networks qualify as true small-world networks with mean-field properties. As an example of an application we discuss synchronization of processors on these networks.
6. Human Dynamics Revealed Through Web Analytics
B. Gonçalves, J. J. Ramasco
Phys. Rev. E 78, 026123 (2008), arXiv:0803.4108
When the World Wide Web was first conceived as a way to facilitate the sharing of scientific information at the CERN (European Center for Nuclear Research) few could have imagined the role it would come to play in the following decades. Since then, the increasing ubiquity of Internet access and the frequency with which people interact with it raise the possibility of using the Web to better observe, understand, and monitor several aspects of human social behavior. Web sites with large numbers of frequently returning users are ideal for this task. If these sites belong to companies or universities, their usage patterns can furnish information about the working habits of entire populations. In this work, we analyze the properly anonymized logs detailing the access history to Emory University's Web site. Emory is a medium size university located in Atlanta, Georgia. We find interesting structure in the activity patterns of the domain and study in a systematic way the main forces behind the dynamics of the traffic. In particular, we show that both linear preferential linking and priority based queuing are essential ingredients to understand the way users navigate the Web.
5. Hysteretic Optimization In Spin Glasses
B. Gonçalves, S. Boettcher
J. Stat. Mech. P01003 (2008), arXiv:0710.2138
The recently proposed Hysteretic Optimization (HO) procedure is applied to the 1D Ising spin chain with long range interactions. To study its effectiveness, the quality of ground state energies found as a function of the distance dependence exponent, \sigma, is assessed. It is found that the transition from an infinite-range to a long-range interaction at \sigma=0.5 is accompanied by a sharp decrease in the performance. The transition is signaled by a change in the scaling behavior of the average avalanche size observed during the hysteresis process. This indicates that HO requires the system to be infinite-range, with a high degree of interconnectivity between variables leading to large avalanches, in order to function properly. An analysis of the way auto-correlations evolve during the optimization procedure confirms that the search of phase space is less efficient, with the system becoming effectively stuck in suboptimal configurations much earlier. These observations explain the poor performance that HO obtained for the Edwards-Anderson spin glass on finite-dimensional lattices, and suggest that its usefulness might be limited in many combinatorial optimization problems.
4. Magnetic Reversal Time in Open Long Range Systems
F. Borgonovi, G. L. Celardo, B. Gonçalves, L. Spadafora
Phys. Rev. E 77, 061119 (2008), arXiv:0710.3935
Topological phase space disconnection has been recently found to be a general phenomenon in isolated anisotropic spin systems. It sets a general framework to understand the emergence of ferromagnetism in finite magnetic systems starting from microscopic models without phenomenological on-site barriers. Here we study its relevance for finite systems with long range interacting potential in contact with a thermal bath. We show that, even in this case, the induced magnetic reversal time is exponentially large in the number of spins, thus determining stable (to any experimental observation time) ferromagnetic behavior. Moreover, the explicit temperature dependence of the magnetic reversal time obtained from the microcanonical results is found to be in good agreement with numerical simulations. Also, a simple and suggestive expression, indicating the Topological Energy Threshold at which the disconnection occurs, as a real energy barrier for many body systems, is obtained analytically for low temperature.
3. Transport on Weighted Networks: When Correlations are Independent of Degree
J. Ramasco, B. Gonçalves
Phys. Rev. E 76, 066106 (2007), arXiv:cond-mat/0609776
Most real-world networks are weighted graphs with the weight of the edges reflecting the relative importance of the connections. In this work, we study non degree dependent correlations between edge weights, generalizing thus the correlations beyond the degree dependent case. We propose a simple method to introduce weight-weight correlations in topologically uncorrelated graphs. This allows us to test different measures to discriminate between the different correlation types and to quantify their intensity. We also discuss here the effect of weight correlations on the transport properties of the networks, showing that positive correlations dramatically improve transport. Finally, we give two examples of real-world networks (social and transport graphs) in which weight-weight correlations are present.
2. Ensemble Inequivalence in Random Graphs
J. Barré, B. Gonçalves
Physica A 386, 212 (2007), arXiv:cond-mat/0705.2385
We present a complete analytical solution of a system of Potts spins on a random k-regular graph in both the canonical and microcanonical ensembles, using the Large Deviation Cavity Method (LDCM). The solution is shown to be composed of three different branches, resulting in a non-concave entropy function. The analytical solution is confirmed with numerical Metropolis and Creutz simulations and our results clearly demonstrate the presence of a region with negative specific heat and, consequently, ensemble inequivalence between the canonical and microcanonical ensembles.
1. Monte Carlo Study of the Elastic Interaction in Heteroepitaxial Growth
B. Gonçalves, J.F.F. Mendes
Phys. Rev. E 65, 061602 (2002), arXiv:cond-mat/0204253
We have studied the island size distribution and spatial correlation function of an island growth model under the effect of an elastic interaction of the form 1/r^3. The mass distribution P_n(t) that was obtained presents a pronounced peak that widens with the increase of the total coverage of the system, θ. The presence of this peak is an indication of the self-organization of the system, since it demonstrates that some sizes are more frequent than others. We have treated exactly the energy of the system using periodic boundary conditions which were used in the Monte-Carlo simulations. A discussion about the effect of different factors is presented.


13. Urban Pulse: Capturing the Rhythm of Cities
F. Miranda, H. Doraiswamy, M. Lage, K. Zhao, B. Gonçalves, L. Wilson, M. Hsieh, C. Silva
To Appear IEEE Transactions on Visualization and Computer Graphics (IEEE SciVis ’16), (2016) Video
Cities are inherently dynamic. Interesting patterns of behavior typically manifest at several key areas of a city over multiple temporal resolutions. Studying these patterns can greatly help a variety of experts ranging from city planners and architects to human behavioral experts. Recent technological innovations have enabled the collection of enormous amounts of data that can help in these studies. However, techniques using these data sets typically focus on understanding the data in the context of the city, thus failing to capture the dynamic aspects of the city. The goal of this work is to instead understand the city in the context of multiple urban data sets. To do so, we define the concept of an “urban pulse” which captures the spatio-temporal activity in a city across multiple temporal resolutions. The prominent pulses in a city are obtained using the topology of the data sets, and are characterized as a set of beats. The beats are then used to analyze and compare different pulses. We also design a visual exploration framework that allows users to explore the pulses within and across multiple cities under different conditions. Finally, we present three case studies carried out by experts from two different domains that demonstrate the utility of our framework.
12. Topical differences between Chinese language Twitter and Sina Weibo
Q. Zhang, B. Gonçalves
To Appear NLPIT'16, (2016) arXiv: 1512.07281
Sina Weibo, China's most popular microblogging platform, is currently used by over 500M users and is considered to be a proxy of Chinese social life. In this study, we contrast the discussions occurring on Sina Weibo and on Chinese language Twitter in order to observe two different strands of Chinese culture: people within China who use Sina Weibo with its government imposed restrictions and those outside that are free to speak completely anonymously. We first propose a simple ad-hoc algorithm to identify topics of Tweets and Weibo. Different from previous works on micro-message topic detection, our algorithm considers topics of the same contents but with different #tags. Our algorithm can also detect topics for Tweets and Weibos without any #tags. Using a large corpus of Weibo and Chinese language tweets, covering the period from January 1 to December 31, 2012, we obtain a list of topics using clustered #tags that we can then use to compare the two platforms. Surprisingly, we find that there are no common entries among the Top 100 most popular topics. Furthermore, only 9.2% of tweets correspond to the Top 1000 topics on the Sina Weibo platform, and conversely only 4.4% of weibos were found to discuss the most popular Twitter topics. Our results reveal significant differences in social attention on the two platforms, with most popular topics on Sina Weibo relating to entertainment while most tweets corresponded to cultural or political content that is practically nonexistent in Sina Weibo.
11. The Role of Information Diffusion in the Evolution of Social Networks
L. Weng, J. Ratkiewicz, N. Perra, B. Gonçalves, C. Castillo, F. Bonchi, R. Schifanella, F. Menczer, A. Flammini
KDD'13, 356 (2013), arXiv: 1302.6276
Every day millions of users are connected through online social networks, generating a rich trove of data that allows us to study the mechanisms behind human interactions. Triadic closure has been treated as the major mechanism for creating social links: if Alice follows Bob and Bob follows Charlie, Alice will follow Charlie. Here we present an analysis of longitudinal micro-blogging data, revealing a more nuanced view of the strategies employed by users when expanding their social circles. While the network structure affects the spread of information among users, the network is in turn shaped by this communication activity. This suggests a link creation mechanism whereby Alice is more likely to follow Charlie after seeing many messages by Charlie. We characterize users with a set of parameters associated with different link creation strategies, estimated by a Maximum-Likelihood approach. Triadic closure does have a strong effect on link formation, but shortcuts based on traffic are another key factor in interpreting network evolution. However, individual strategies for following other users are highly heterogeneous. Link creation behaviors can be summarized by classifying users in different categories with distinct structural and behavioral characteristics. Users who are popular, active, and influential tend to create traffic-based shortcuts, making the information diffusion process more efficient in the network.
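The triadic closure rule stated in this abstract (Alice follows Bob, Bob follows Charlie, so Alice follows Charlie) can be expressed as a small check on a follower graph. The sketch below is a hypothetical illustration of that rule only; the data layout and function names are assumptions, and the paper's Maximum-Likelihood strategy model and traffic-based shortcuts are not implemented here.

```python
def closes_triad(follows, a, c):
    """True if a new link a -> c would close a triad, i.e. a already
    follows some intermediary b who in turn follows c.

    follows: dict mapping each user to the set of users they follow
    """
    return any(
        c in follows.get(b, set())
        for b in follows.get(a, set())
        if b not in (a, c)
    )


def triadic_fraction(follows, new_links):
    """Share of new (a, c) links that are triadic closures in `follows`."""
    new_links = list(new_links)
    if not new_links:
        raise ValueError("new_links must not be empty")
    closed = sum(closes_triad(follows, a, c) for a, c in new_links)
    return closed / len(new_links)


# Hypothetical snapshot of who follows whom before the new links appear.
snapshot = {"alice": {"bob"}, "bob": {"charlie"}, "dave": set()}
print(closes_triad(snapshot, "alice", "charlie"))
print(triadic_fraction(snapshot, [("alice", "charlie"), ("dave", "charlie")]))
```

Links that fail this check are the ones that would need another explanation, such as the traffic-based shortcuts discussed in the abstract.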
10. Dynamical Classes of Collective Attention in Twitter
J. Lehmann, B. Gonçalves, J. J. Ramasco, C. Cattuto
WWW 2012, 251 (2012), arXiv: 1111.1896
Micro-blogging systems such as Twitter expose digital traces of social discourse with an unprecedented degree of resolution of individual behaviors. They offer an opportunity to investigate how a large-scale social system responds to exogenous or endogenous stimuli, and to disentangle the temporal, spatial and topical aspects of users' activity. Here we focus on spikes of collective attention in Twitter, and specifically on peaks in the popularity of hashtags. Users employ hashtags as a form of social annotation, to define a shared context for a specific event, topic, or meme. We analyze a large-scale record of Twitter activity and find that the evolution of hashtag popularity over time defines discrete classes of hashtags. We link these dynamical classes to the events the hashtags represent and use text mining techniques to provide a semantic characterization of the hashtag classes. Moreover, we track the propagation of hashtags in the Twitter social network and find that epidemic spreading plays a minor role in hashtag popularity, which is mostly driven by exogenous factors.
9. Predicting the Political Alignment of Twitter Users
M. Conover, B. Gonçalves, J. Ratkiewicz, A. Flammini, F. Menczer
Third IEEE International Conference on Social Computing, 192 (2011)
8. Detecting and Tracking Political Abuse in Social Media
J. Ratkiewicz, M. Conover, M. Meiss, B. Gonçalves, A. Flammini, F. Menczer
Fifth International AAAI Conference on Weblogs and Social Media, 297 (2011)
7. Political Polarization on Twitter
M. Conover, J. Ratkiewicz, J. M. Francisco, B. Gonçalves, A. Flammini, F. Menczer
Fifth International AAAI Conference on Weblogs and Social Media, 89 (2011)
6. Truthy: Mapping the Spread of Astroturf in Microblog Streams
J. Ratkiewicz, M. Conover, M. Meiss, B. Gonçalves, S. Patil, A. Flammini, F. Menczer
WWW 2011, 249 (2011), arXiv: 1011.3768 (Long Version)
Online social media are complementing and in some cases replacing person-to-person social interaction and redefining the diffusion of information. In particular, microblogs have become crucial grounds on which public relations, marketing, and political battles are fought. We introduce an extensible framework that will enable the real-time analysis of meme diffusion in social media by mining, visualizing, mapping, classifying, and modeling massive streams of public microblogging events. We describe a Web service that leverages this framework to track political memes in Twitter and help detect astroturfing, smear campaigns, and other misinformation in the context of U.S. political elections. We present some cases of abusive behaviors uncovered by our service. Finally, we discuss promising preliminary results on the detection of suspicious memes via supervised learning based on features extracted from the topology of the diffusion networks, sentiment analysis, and crowdsourced annotations.
5. Modeling Traffic on the Web Graph
M. Meiss, B. Gonçalves, J. J. Ramasco, A. Flammini, F. Menczer
7th Workshop on Algorithms and Models for the Web Graph, LNCS 6516, 50 (2010)
4. Agents, Bookmarks and Clicks: A topical model of Web traffic
M. Meiss, B. Gonçalves, J. J. Ramasco, A. Flammini, F. Menczer
Proceedings of the 21st ACM conference on Hypertext and hypermedia, 229 (2010), arXiv:1003.5327
Analysis of aggregate and individual Web traffic has shown that PageRank is a poor model of how people navigate the Web. Using the empirical traffic patterns generated by a thousand users, we characterize several properties of Web traffic that cannot be reproduced by Markovian models. We examine both aggregate statistics capturing collective behavior, such as page and link traffic, and individual statistics, such as entropy and session size. No model currently explains all of these empirical observations simultaneously. We show that all of these traffic patterns can be explained by an agent-based model that takes into account several realistic browsing behaviors. First, agents maintain individual lists of bookmarks (a non-Markovian memory mechanism) that are used as teleportation targets. Second, agents can retreat along visited links, a branching mechanism that also allows us to reproduce behaviors such as the use of a back button and tabbed browsing. Finally, agents are sustained by visiting novel pages of topical interest, with adjacent pages being more topically related to each other than distant ones. This modulates the probability that an agent continues to browse or starts a new session, allowing us to recreate heterogeneous session lengths. The resulting model is capable of reproducing the collective and individual behaviors we observe in the empirical data, reconciling the narrowly focused browsing patterns of individual users with the extreme heterogeneity of aggregate traffic measurements. This result allows us to identify a few salient features that are necessary and sufficient to interpret the browsing patterns observed in our data. In addition to the descriptive and explanatory power of such a model, our results may lead the way to more sophisticated, realistic, and effective ranking and crawling algorithms.
3. What's in a Session: Tracking Individual Behavior on the Web
M. Meiss, J. Duncan, B. Gonçalves, J. J. Ramasco, F. Menczer
Proceedings of the 20th ACM conference on Hypertext and hypermedia 173 (2009), arXiv:1003.5325
We examine the properties of all HTTP requests generated by a thousand undergraduates over a span of two months. Preserving user identity in the data set allows us to discover novel properties of Web traffic that directly affect models of hypertext navigation. We find that the popularity of Web sites -- the number of users who contribute to their traffic -- lacks any intrinsic mean and may be unbounded. Further, many aspects of the browsing behavior of individual users can be approximated by log-normal distributions even though their aggregate behavior is scale-free. Finally, we show that users' click streams cannot be cleanly segmented into sessions using timeouts, affecting any attempt to model hypertext navigation using statistics of individual sessions. We propose a strictly logical definition of sessions based on browsing activity as revealed by referrer URLs; a user may have several active sessions in their click stream at any one time. We demonstrate that applying a timeout to these logical sessions affects their statistics to a lesser extent than a purely timeout-based mechanism.
2. Remembering what we like: Toward an agent-based model of Web traffic
B. Gonçalves, M. R. Meiss, J. J. Ramasco, A. Flammini, F. Menczer
WSDM 2009 Late Breaking Results (2009), arXiv:0901.3839
Analysis of aggregate Web traffic has shown that PageRank is a poor model of how people actually navigate the Web. Using the empirical traffic patterns generated by a thousand users over the course of two months, we characterize the properties of Web traffic that cannot be reproduced by Markovian models, in which destinations are independent of past decisions. In particular, we show that the diversity of sites visited by individual users is smaller and more broadly distributed than predicted by the PageRank model; that link traffic is more broadly distributed than predicted; and that the time between consecutive visits to the same site by a user is less broadly distributed than predicted. To account for these discrepancies, we introduce a more realistic navigation model in which agents maintain individual lists of bookmarks that are used as teleportation targets. The model can also account for branching, a traffic property caused by browser features such as tabs and the back button. The model reproduces aggregate traffic patterns such as site popularity, while also generating more accurate predictions of diversity, link traffic, and return time distributions. This model for the first time allows us to capture the extreme heterogeneity of aggregate traffic measurements while explaining the more narrowly focused browsing patterns of individual users.
1. Towards the characterization of individual users through Web analytics
B. Gonçalves, J. Ramasco
Complex Sciences 2247 (2009), arXiv:0901.0498
We perform an analysis of the way individual users navigate the Web. We focus primarily on the temporal patterns of their returns to a given page. The return probability as a function of time and the distribution of time intervals between consecutive visits are measured and found to be independent of the level of activity of single users. The results indicate a rich variety of individual behaviors and seem to preclude the possibility of defining a characteristic frequency for each user in his/her visits to a single site.