This paper proposes a method and empirical pieces of evidence to investigate the claim commonly made that proxy services used by web scraping bots have millions of residential IPs at their disposal. Using a real-world setup, we have had access to the logs of close to 20 heavily targeted websites and have carried out an experiment over a two months period. Based on the gathered empirical pieces of evidence, we propose mathematical models that indicate that the amount of IPs is likely 2 to 3 orders of magnitude smaller than the one claimed. This finding suggests that an IP reputation-based blocking strategy could be effective, contrary to what operators of these websites think today.
|Original language||English (US)|
|Title of host publication||Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)|
|Publisher||Springer Science and Business Media Deutschland GmbH|
|Number of pages||16|
|State||Published - Jan 1 2021|