Selenium is a cool toolkit to drive the browser from your favorite programming language. Born for testing, it's perfect for scraping.
So, when I hit a dynamic page, this is what I do:
```python
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium import webdriver

# Start the WebDriver and load the page
# (URL is the address of the dynamic page you want to scrape)
wd = webdriver.Firefox()
wd.get(URL)

# Wait up to 10 seconds for the dynamically loaded elements to show up
WebDriverWait(wd, 10).until(
    EC.visibility_of_element_located((By.CLASS_NAME, "pricerow")))

# And grab the page HTML source
html_page = wd.page_source
wd.quit()

# Now you can use html_page as you like
from bs4 import BeautifulSoup
# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(html_page, "html.parser")
```
You could even do the scraping with Selenium, but I load the HTML into BeautifulSoup because:
- I'm a BeautifulSoup junkie
- The Selenium API is pragmatic, a bit too much, and not Pythonic at all. Yeah, you pass a tuple to
- Selenium docs are... umh, enterprisey
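
To give an idea of what the BeautifulSoup side can look like once Selenium has handed over the HTML, here is a minimal sketch. The only thing taken from the snippet above is the "pricerow" class name. The sample markup (a table with two cells per row) and the helper name extract_price_rows are my assumptions for illustration, since the real page structure will differ.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the html_page string returned by wd.page_source;
# the markup of a real target page will almost certainly look different.
html_page = """
<table>
  <tr class="header"><td>Name</td><td>Price</td></tr>
  <tr class="pricerow"><td>Widget</td><td>9.99</td></tr>
  <tr class="pricerow"><td>Gadget</td><td>24.50</td></tr>
</table>
"""


def extract_price_rows(html):
    """Return the cell texts of every element with class "pricerow"."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # class_ (with the trailing underscore) filters on the CSS class
    for row in soup.find_all(class_="pricerow"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        rows.append(cells)
    return rows


price_rows = extract_price_rows(html_page)
for name, price in price_rows:
    print(name, price)
```

Keeping the parsing in a plain function that takes a string means you can test it against saved HTML without starting a browser every time.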