design_topic_spider.py source file

python


Project: decoration-design-crawler  Author: imflyn

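def parse_list(self, response):
    """Parse a topic list page and schedule each new topic page for crawling."""
    selector = Selector(response)
    items_selector = selector.xpath('//div[@class="xgt_topic"]')
    for item_selector in items_selector:
        # Relative link to the topic page, e.g. /topic/7334.html
        hrefs = item_selector.xpath('div//a/@href').extract()
        if not hrefs:
            # No link in this block; skip it instead of raising IndexError
            continue
        href = hrefs[0].strip()
        # Absolute URL, e.g. http://xiaoguotu.to8to.com/topic/7334.html
        next_url = constant.PROTOCOL_HTTP + self.start_url_domain + href
        if self.design_topic_service.is_duplicate_url(next_url):
            # Already crawled; do not request it again
            continue
        yield scrapy.Request(next_url, self.parse_content)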
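
The URL-building and duplicate-skipping steps in parse_list can be tried without Scrapy. The following is only a minimal sketch, not the project's code: build_topic_urls is a hypothetical helper, the in-memory set seen_urls stands in for design_topic_service.is_duplicate_url (whose real storage is not shown here), and urllib.parse.urljoin replaces the string concatenation used above.

```python
from urllib.parse import urljoin


def build_topic_urls(hrefs, base_url, seen_urls):
    """Yield absolute topic URLs that are not yet in seen_urls.

    Hypothetical helper: seen_urls is a plain set used here as a
    stand-in for the project's duplicate-URL service.
    """
    for href in hrefs:
        href = href.strip()
        if not href:
            # Empty link text; nothing to request
            continue
        # Resolve the relative link against the site root
        next_url = urljoin(base_url, href)
        if next_url in seen_urls:
            # Already known; skip it
            continue
        # Assumption: remember the URL right away so repeats in the
        # same list are also skipped
        seen_urls.add(next_url)
        yield next_url


if __name__ == "__main__":
    seen = {"http://xiaoguotu.to8to.com/topic/1.html"}
    links = [" /topic/7334.html ", "/topic/1.html", "/topic/7334.html"]
    for url in build_topic_urls(links, "http://xiaoguotu.to8to.com", seen):
        print(url)
```

Unlike the original method, this sketch records each URL as soon as it is yielded; whether the project's service does the same at this point is not visible in the snippet.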
