TY - GEN
T1 - To sample or to smash? Estimating reachability in large time-varying graphs
AU - Basu, Prithwish
AU - Yu, Feng
AU - Bar-Noy, Amotz
AU - Rawitz, Dror
N1 - Publisher Copyright: Copyright © SIAM.
PY - 2014
Y1 - 2014
N2 - Time-varying graphs (T-graphs) consist of a time-evolving set of graph snapshots (or graphlets). A T-graph property with potential applications in both computer and social network forensics is T-reachability, which identifies the nodes reachable from a source node using the T-graph edges over time period T. In this paper, we consider the problem of estimating the T-reachable set of a source node in two different settings: when the time-evolution of a T-graph is specified by a probabilistic model, and when the actual T-graph snapshots are known and given to us offline (the "data-aware" setting). Since the value of T could be large in many applications, we propose two simple techniques, namely T-graph sampling and T-graph smashing, for significantly reducing the complexity of this computation while minimizing the estimation error. We show that for the data-aware case, both the T-graph sampling and smashing problems are NP-hard, but they are amenable to reasonably good approximations. We also show that for the probabilistic setting where each graphlet in a T-graph is an Erdős-Rényi random graph, sampling yields a loose lower bound for the T-reachable set, while different styles of smashing yield more useful upper and lower bounds. Finally, we show that our algorithms (both data-aware and data-oblivious) can estimate the T-reachable set in real-world time-varying networks within reasonable accuracy using less than 0.5% of the number of graphlets.
AB - Time-varying graphs (T-graphs) consist of a time-evolving set of graph snapshots (or graphlets). A T-graph property with potential applications in both computer and social network forensics is T-reachability, which identifies the nodes reachable from a source node using the T-graph edges over time period T. In this paper, we consider the problem of estimating the T-reachable set of a source node in two different settings: when the time-evolution of a T-graph is specified by a probabilistic model, and when the actual T-graph snapshots are known and given to us offline (the "data-aware" setting). Since the value of T could be large in many applications, we propose two simple techniques, namely T-graph sampling and T-graph smashing, for significantly reducing the complexity of this computation while minimizing the estimation error. We show that for the data-aware case, both the T-graph sampling and smashing problems are NP-hard, but they are amenable to reasonably good approximations. We also show that for the probabilistic setting where each graphlet in a T-graph is an Erdős-Rényi random graph, sampling yields a loose lower bound for the T-reachable set, while different styles of smashing yield more useful upper and lower bounds. Finally, we show that our algorithms (both data-aware and data-oblivious) can estimate the T-reachable set in real-world time-varying networks within reasonable accuracy using less than 0.5% of the number of graphlets.
UR - http://www.scopus.com/inward/record.url?scp=84959883523&partnerID=8YFLogxK
U2 - 10.1137/1.9781611973440.112
DO - 10.1137/1.9781611973440.112
M3 - Conference contribution
T3 - SIAM International Conference on Data Mining 2014, SDM 2014
SP - 983
EP - 991
BT - SIAM International Conference on Data Mining 2014, SDM 2014
A2 - Zaki, Mohammed
A2 - Obradovic, Zoran
A2 - Tan, Pang-Ning
A2 - Banerjee, Arindam
A2 - Kamath, Chandrika
A2 - Parthasarathy, Srinivasan
PB - Society for Industrial and Applied Mathematics Publications
T2 - 14th SIAM International Conference on Data Mining, SDM 2014
Y2 - 24 April 2014 through 26 April 2014
ER -