Modifying scrapy-selenium to support docker-selenium

scrapy-selenium is a Scrapy downloader middleware that lets Scrapy fetch pages through a Selenium WebDriver. Out of the box, however, it only supports a locally installed WebDriver.

scrapy-selenium's session management around Selenium is already fairly complete and stable, so pointing it at a remote docker-selenium instance only takes a small subclass of SeleniumMiddleware that overrides the constructor and from_crawler:

from scrapy import signals
from scrapy_selenium import SeleniumMiddleware
from selenium import webdriver


class RemoteSeleniumMiddleware(SeleniumMiddleware):
    """SeleniumMiddleware variant that talks to a remote Selenium hub."""

    def __init__(self, command_executor, desired_capabilities):
        # Connect to the remote hub instead of launching a local driver.
        self.driver = webdriver.Remote(
            command_executor=command_executor,
            desired_capabilities=desired_capabilities)

    @classmethod
    def from_crawler(cls, crawler):
        command_executor = crawler.settings.get('SELENIUM_COMMAND_EXECUTOR')
        desired_capabilities = crawler.settings.get('SELENIUM_DESIRED_CAPABILITIES')

        middleware = cls(
            command_executor=command_executor,
            desired_capabilities=desired_capabilities,
        )

        # Reuse the parent class's spider_closed handler, which quits the driver.
        crawler.signals.connect(middleware.spider_closed, signals.spider_closed)

        return middleware
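For the subclass to take effect, it also has to be registered in settings.py; the dotted path below is a placeholder for wherever the class lives in your project:

```python
# settings.py -- 'myproject.middlewares' is an assumption; adjust to your layout.
DOWNLOADER_MIDDLEWARES = {
    'myproject.middlewares.RemoteSeleniumMiddleware': 800,
}
```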

Then, in settings.py, add the corresponding connection parameters:

from selenium.webdriver import DesiredCapabilities

SELENIUM_COMMAND_EXECUTOR = "http://docker-host:4444/wd/hub"

SELENIUM_DESIRED_CAPABILITIES = DesiredCapabilities.CHROME
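A docker-selenium container can take a moment to come up, so it may be worth polling the hub's /status endpoint before starting a crawl. A minimal sketch, assuming the hub URL from the settings above (the helper names are mine, not part of scrapy-selenium); both Selenium 3 and 4 hubs report readiness under the value.ready key of the status JSON:

```python
import json
from urllib.request import urlopen


def hub_is_ready(status_body):
    """Parse the JSON body returned by GET <hub>/status.

    Returns True when the hub reports it can accept new sessions.
    """
    data = json.loads(status_body)
    return bool(data.get("value", {}).get("ready", False))


def wait_for_hub(url="http://docker-host:4444/wd/hub/status"):
    # One-shot check; wrap in a retry loop if the container may still be booting.
    with urlopen(url) as resp:
        return hub_is_ready(resp.read())
```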
