Abstract
This paper proposes a method, together with empirical evidence, to investigate the commonly made claim that the proxy services used by web scraping bots have millions of residential IPs at their disposal. Using a real-world setup, we had access to the logs of close to 20 heavily targeted websites and carried out an experiment over a two-month period. Based on the gathered empirical evidence, we propose mathematical models indicating that the number of IPs is likely two to three orders of magnitude smaller than claimed. This finding suggests that an IP reputation-based blocking strategy could be effective, contrary to what operators of these websites currently believe.
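The abstract does not detail the authors' models. As a rough illustration of how a pool size can be inferred from observed traffic, the sketch below uses a standard capture-recapture estimator (Chapman's bias-corrected Lincoln-Petersen estimator), which is a swapped-in technique and not the paper's method. It assumes that the distinct IPs seen in two observation windows are independent, uniform samples from the same fixed proxy pool. The function name and the sample addresses (drawn from the 192.0.2.0/24 documentation range) are hypothetical.

```python
from typing import Iterable


def chapman_estimate(first_window: Iterable[str], second_window: Iterable[str]) -> float:
    """Estimate the size of an IP pool from two observation windows.

    Hypothetical helper: applies Chapman's capture-recapture estimator,
    assuming both windows sample uniformly from one closed (unchanging) pool.
    """
    seen_first = set(first_window)    # distinct IPs "captured" in window 1
    seen_second = set(second_window)  # distinct IPs "captured" in window 2
    n1, n2 = len(seen_first), len(seen_second)
    m = len(seen_first & seen_second)  # IPs "recaptured" in both windows
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Illustrative data only, not taken from the study.
week_1 = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"]
week_2 = ["192.0.2.3", "192.0.2.4", "192.0.2.5", "192.0.2.6"]
print(chapman_estimate(week_1, week_2))
```

The intuition is that if a pool truly held millions of IPs, two windows of a few thousand observations each would be expected to share almost no addresses. A substantial overlap therefore points to a much smaller pool.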
Original language | English (US) |
---|---|
Title of host publication | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
Publisher | Springer Science and Business Media Deutschland GmbH |
Pages | 596-611 |
Number of pages | 16 |
ISBN (Print) | 9783030763510 |
State | Published - Jan 1 2021 |
Externally published | Yes |