An Algorithm for Representing a Large Online Relationship Network: Focusing on a Small Dynamic Activity Graph

Research output: Contribution to conference; Paper; peer-review

Abstract

To access and analyze a large online social network (OSN) with millions of users, such as a relationship network (e.g., Facebook), efficient computational methods must be utilized. However, network size or limited data accessibility (e.g., API limitations) makes analysis of a relationship network difficult, if not impossible. To cope with this problem, we propose an algorithm that analyzes the dynamic activity graph of online user interactions (e.g., content sharing or tweeting), which reflects the relationship network, instead of the large relationship network itself (represented as a graph). To make the computations required for OSN analysis feasible and manageable, the proposed algorithm generates a representative sub-graph of the large relationship network based on the dynamic activity graph. Thus, a complicated analysis of the large relationship graph is reduced to a simpler analysis of the representative sub-graph. Recently published research suggests the use of graph sampling algorithms to cope with analyses of large graphs. However, these algorithms assume that access to the relationship graph is feasible and, hence, that direct sampling is possible. In this research, instead of assuming that the relationship graph is accessible, our algorithm utilizes the smaller dynamic activity graph to generate a representative and unbiased sub-graph of the large relationship graph. The datasets used to evaluate the proposed algorithm are based on two Facebook (FB) networks and two Twitter (TW) networks. The first FB network describes a friendship relationship network, represented by a static directed graph with 63,731 nodes and 1,545,686 edges. In FB, a user can interact with friends by posting comments to their wall; the second FB network is thus an activity network that represents the dynamic wall activity, over 52 months, of users in the first FB network, with 13,478 nodes and 16,624 edges.
The first TW network is a static relationship network with 456,626 nodes and 14,855,842 edges, represented as a directed graph in which each node is a tweet author and each edge represents a follower or being-followed relationship. The second TW network is a dynamic activity network that describes retweet, mention, and reply interactions among users in the first TW network, with a total of 304,691 nodes and 461,192 edges, after splitting by one-hour intervals that yielded 168 observations. After defining a set of graph properties, the performance of the proposed algorithm was tested, for different parameters and varying sample sizes, in terms of preserving the following node average and distribution statistics: the degree distribution, the clustering coefficient, and the path length. Overall, forest fire sampling (FFS) performed best among all algorithms that must access the relationship graph (average D-statistic of 0.27 in TW and 0.29 in FB). Our algorithm closely followed FFS in TW (average D-statistic of 0.29, versus 0.27 for FFS) and outperformed FFS on FB (average D-statistic of 0.23, versus 0.29 for FFS).
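As an illustration of the kind of evaluation described above, the following minimal sketch samples a sub-graph with a forest-fire-style burn and then compares its degree distribution with that of the full graph. It is not the authors' implementation. It assumes that the D-statistic is the two-sample Kolmogorov-Smirnov statistic (the abstract does not define it), it uses a small undirected toy graph rather than the directed FB or TW data, and the function names and the `p_forward` value are hypothetical.

```python
import random
from bisect import bisect_right
from collections import deque


def forest_fire_sample(adj, target_size, p_forward=0.7, seed=0):
    """Return a set of `target_size` nodes chosen by a forest-fire-style burn."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    target_size = min(target_size, len(nodes))
    burned = set()
    while len(burned) < target_size:
        # (Re)ignite at a random node that has not burned yet.
        start = rng.choice([n for n in nodes if n not in burned])
        burned.add(start)
        queue = deque([start])
        while queue and len(burned) < target_size:
            node = queue.popleft()
            fresh = [n for n in sorted(adj[node]) if n not in burned]
            rng.shuffle(fresh)
            # Geometric number of neighbours to burn, mean p / (1 - p).
            k = 0
            while rng.random() < p_forward:
                k += 1
            for nbr in fresh[:k]:
                if len(burned) >= target_size:
                    break
                burned.add(nbr)
                queue.append(nbr)
    return burned


def degree_sequence(adj, keep=None):
    """Degrees in the sub-graph induced by `keep` (all nodes if None)."""
    keep = set(adj) if keep is None else set(keep)
    return [sum(1 for n in adj[v] if n in keep) for v in keep]


def ks_d_statistic(xs, ys):
    """Largest gap between the empirical CDFs of two samples."""
    xs, ys = sorted(xs), sorted(ys)
    d = 0.0
    for v in set(xs) | set(ys):
        fx = bisect_right(xs, v) / len(xs)
        fy = bisect_right(ys, v) / len(ys)
        d = max(d, abs(fx - fy))
    return d


# Toy undirected graph (an assumption for illustration): a ring with chords.
n = 200
adj = {i: set() for i in range(n)}
for i in range(n):
    for step in (1, 2, 7):
        j = (i + step) % n
        adj[i].add(j)
        adj[j].add(i)

sample = forest_fire_sample(adj, target_size=50)
d = ks_d_statistic(degree_sequence(adj), degree_sequence(adj, sample))
print(f"sampled {len(sample)} nodes, degree-distribution D = {d:.2f}")
```

A lower D means that the sampled sub-graph preserves the distribution more closely. The same comparison could be repeated for the clustering coefficient and path-length distributions.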
Original language: American English
State: Published - 28 Jun 2018
Event: XXXVIII Sunbelt 2018, Utrecht - Utrecht University, Utrecht, Netherlands
Duration: 27 Jun 2018 to 1 Jul 2018
https://na.eventscloud.com/ehome/288996/703997

Conference

Conference: XXXVIII Sunbelt 2018, Utrecht
Country/Territory: Netherlands
City: Utrecht
Period: 27/06/18 to 1/07/18
