Abstract
Target Oriented Network Intelligence Collection (TONIC) is a crawling process whose goal is to find social network profiles that contain information about a given target. Such profiles are called leads, and the TONIC problem is how to minimize the crawling costs incurred while finding them. We model this problem as a search problem in an unknown graph and present a best-first search approach for solving it. Three key challenges are (1) which profiles to consider crawling, (2) how to prioritize the crawling order, and (3) when to stop because additional crawling is no longer worthwhile. For the first challenge, we propose two frameworks: the Restricted TONIC Framework (RTF), which restricts the search to immediate neighbors of previously found leads, and the Extended TONIC Framework (ETF), which extends the scope of the search to a wider neighborhood. We provide guidelines for choosing between the two frameworks. For the second challenge, we propose a set of effective topology-based heuristics that guide the search towards profiles that are more likely to be leads. For the third challenge, we propose using data collected in previously executed crawls to learn when additional crawling is expected to be useful.
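The best-first search restricted to neighbors of known leads can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `rtf_crawl`, the callbacks `get_friends` and `is_lead`, the scoring rule (number of known leads adjacent to a candidate), and the fixed `budget` used as a stopping rule are all assumptions made for illustration. In particular, the abstract proposes learning when to stop from earlier crawls, which this sketch replaces with a simple cost cap.

```python
import heapq
from typing import Callable, Dict, Hashable, Iterable, List, Set, Tuple


def rtf_crawl(
    initial_leads: Iterable[Hashable],
    get_friends: Callable[[Hashable], Iterable[Hashable]],
    is_lead: Callable[[Hashable], bool],
    budget: int,
) -> Tuple[List[Hashable], int]:
    """Best-first crawl that only considers neighbors of known leads.

    Hypothetical sketch: candidates are ranked by how many known leads
    they are adjacent to, and crawling stops after `budget` profiles.
    Returns the newly found leads and the number of profiles crawled.
    """
    crawled: Set[Hashable] = set(initial_leads)
    score: Dict[Hashable, int] = {}  # candidate -> number of adjacent known leads
    heap: list = []                  # (-score, insertion order, profile)
    counter = 0

    def expand(lead: Hashable) -> None:
        # Add the lead's uncrawled friends as candidates, or raise their score.
        nonlocal counter
        for friend in get_friends(lead):
            if friend in crawled:
                continue
            score[friend] = score.get(friend, 0) + 1
            counter += 1
            heapq.heappush(heap, (-score[friend], counter, friend))

    for lead in list(crawled):
        expand(lead)

    found: List[Hashable] = []
    cost = 0
    while heap and cost < budget:
        neg_score, _, profile = heapq.heappop(heap)
        if profile in crawled or -neg_score != score[profile]:
            continue  # stale heap entry, superseded by a higher score
        crawled.add(profile)
        cost += 1  # each crawled profile costs one unit
        if is_lead(profile):
            found.append(profile)
            expand(profile)  # only leads contribute new candidates
    return found, cost


# Toy network: "a" is the initial lead; "b" and "c" are the hidden leads.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b", "x"],
    "d": ["b"],
    "x": ["c"],
}
true_leads = {"a", "b", "c"}
found, cost = rtf_crawl(["a"], graph.__getitem__, true_leads.__contains__, budget=3)
```

In this toy run, crawling `b` raises the score of `c` to two adjacent leads, so `c` is crawled before `d`. Widening `expand` to also add the friends of non-lead profiles would move the sketch towards the wider neighborhood that ETF considers, at the price of a larger candidate set.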
Original language | American English |
---|---|
Pages (from-to) | 1447-1480 |
Number of pages | 34 |
Journal | World Wide Web |
Volume | 22 |
Issue number | 4 |
State | Published - 15 Jul 2019 |
Keywords
- Artificial intelligence
- Heuristic search
- Online social networks
All Science Journal Classification (ASJC) codes
- Software
- Hardware and Architecture
- Computer Networks and Communications