TY - GEN

T1 - To sample or to smash? Estimating reachability in large time-varying graphs

AU - Basu, Prithwish

AU - Yu, Feng

AU - Bar-Noy, Amotz

AU - Rawitz, Dror

N1 - Publisher Copyright: Copyright © SIAM.

PY - 2014

Y1 - 2014

N2 - Time-varying graphs (T-graphs) consist of a time-evolving set of graph snapshots (or graphlets). A T-graph property with potential applications in both computer and social network forensics is T-reachability, which identifies the nodes reachable from a source node using the T-graph edges over a time period T. In this paper, we consider the problem of estimating the T-reachable set of a source node in two different settings: when the time evolution of a T-graph is specified by a probabilistic model, and when the actual T-graph snapshots are known and given to us offline (the "data-aware" setting). Since the value of T could be large in many applications, we propose two simple techniques, namely T-graph sampling and T-graph smashing, for significantly reducing the complexity of this computation while minimizing the estimation error. We show that for the data-aware case, both the T-graph sampling and smashing problems are NP-hard, but they are amenable to reasonably good approximations. We also show that for the probabilistic setting where each graphlet in a T-graph is an Erdős-Rényi random graph, sampling yields a loose lower bound for the T-reachable set, while different styles of smashing yield more useful upper and lower bounds. Finally, we show that our algorithms (both data-aware and data-oblivious) can estimate the T-reachable set in real-world time-varying networks within reasonable accuracy using less than 0.5% of the number of graphlets.

AB - Time-varying graphs (T-graphs) consist of a time-evolving set of graph snapshots (or graphlets). A T-graph property with potential applications in both computer and social network forensics is T-reachability, which identifies the nodes reachable from a source node using the T-graph edges over a time period T. In this paper, we consider the problem of estimating the T-reachable set of a source node in two different settings: when the time evolution of a T-graph is specified by a probabilistic model, and when the actual T-graph snapshots are known and given to us offline (the "data-aware" setting). Since the value of T could be large in many applications, we propose two simple techniques, namely T-graph sampling and T-graph smashing, for significantly reducing the complexity of this computation while minimizing the estimation error. We show that for the data-aware case, both the T-graph sampling and smashing problems are NP-hard, but they are amenable to reasonably good approximations. We also show that for the probabilistic setting where each graphlet in a T-graph is an Erdős-Rényi random graph, sampling yields a loose lower bound for the T-reachable set, while different styles of smashing yield more useful upper and lower bounds. Finally, we show that our algorithms (both data-aware and data-oblivious) can estimate the T-reachable set in real-world time-varying networks within reasonable accuracy using less than 0.5% of the number of graphlets.

UR - http://www.scopus.com/inward/record.url?scp=84959883523&partnerID=8YFLogxK

U2 - https://doi.org/10.1137/1.9781611973440.112

DO - https://doi.org/10.1137/1.9781611973440.112

M3 - Conference contribution

T3 - SIAM International Conference on Data Mining 2014, SDM 2014

SP - 983

EP - 991

BT - SIAM International Conference on Data Mining 2014, SDM 2014

A2 - Zaki, Mohammed

A2 - Obradovic, Zoran

A2 - Tan, Pang-Ning

A2 - Banerjee, Arindam

A2 - Kamath, Chandrika

A2 - Parthasarathy, Srinivasan

PB - Society for Industrial and Applied Mathematics Publications

T2 - 14th SIAM International Conference on Data Mining, SDM 2014

Y2 - 24 April 2014 through 26 April 2014

ER -