practise_spider.py source file - Python code snippet


python


import scrapy


# Method of a scrapy.Spider subclass (the enclosing class is not shown here).
def start_requests(self):
    """Make the initial requests for the pages you want to scrape.

    Returns an iterable of Requests for the spider to crawl. Further
    requests are generated successively from these initial ones.
    """
    urls = [
        'https://www.dice.com/jobs/detail/Etl%26%2347Informatica-Production-Support-%26%2347Developer-Pyramid-Consulting%2C-Inc.-Bellevue-WA-98006/pyrmid/16-32835?icid=sr1-1p&q=pyramid&l=Seattle,%20WA',
    ]

    for url in urls:
        # Request each URL and pass the response object to parse().
        yield scrapy.Request(url=url, callback=self.parse)