Python Web Scraping Tools: A Survey

  • 2018-04-27 02:56 AM

Python offers myriad web scraping tools spanning a broad range of use cases. At the same time, there are many surprising gaps in coverage. Further complicating matters, differences that look innocuous in a browser can have an outsized impact on the design of an automated browsing system. In this talk, we survey a collection of common web scraping frameworks and work out a mapping from real-world use cases to packages. Along the way, we address common questions such as: How do I choose among content parsers? What if a page is dominated by JavaScript or HTML5? If I'm going to control a browser, which one should I choose? Can I run this in the cloud with no access to a display? Can I download files?