lianjia_bj_zufang.py source file

python

Project: Crawler-Of-Lianjia  Author: tonywangcn
from scrapy.loader import ItemLoader
# LianjiaItem is defined in the project's items module (import path not shown here).

def parse(self, response):
    # Ported to Python 3: Scrapy already returns str, so the .encode('utf-8') calls are dropped.
    # Each XPath query runs once instead of being re-evaluated on every loop iteration.
    panels = response.xpath("//div[@class='info-panel']")
    titles = panels.xpath("./h2/a/text()").extract()
    regions = panels.xpath(".//span[@class='region']/text()").extract()
    layouts = panels.xpath(".//span[@class='zone']//text()").extract()
    areas = panels.xpath(".//span[@class='meters']/text()").extract()
    where_spans = panels.xpath(".//div[@class='where']//span/text()").extract()
    districts = panels.xpath(".//div[@class='con']/a/text()").extract()
    con_texts = panels.xpath(".//div[@class='con']//text()").extract()
    nums = panels.xpath(".//span[@class='num']//text()").extract()
    label_boxes = response.xpath("//div[@class='view-label left']")

    for i in range(len(titles)):
        l = ItemLoader(item=LianjiaItem(), response=response)
        l.add_value('info', titles[i])
        l.add_value('local', regions[i])
        l.add_value('house_layout', layouts[i])
        l.add_value('house_square', areas[i])
        # The flattened lists assume a fixed number of text nodes per listing:
        # 4 spans in 'where' (orientation is the 4th), 5 nodes in 'con', 2 'num' spans.
        l.add_value('house_orientation', where_spans[i * 4 + 3])
        # The original dropped the last 6 UTF-8 bytes, assumed here to be two CJK characters.
        l.add_value('district', districts[i][:-2])
        l.add_value('floor', con_texts[i * 5 + 2])
        l.add_value('building_year', con_texts[i * 5 + 4])
        l.add_value('price_month', nums[i * 2])
        l.add_value('person_views', nums[i * 2 + 1])
        l.add_value('tags', label_boxes[i].xpath(".//span//text()").extract())
        print(l)
        yield l.load_item()
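
The index arithmetic in `parse` (for example `(i + 1) * 5 - 3` for the floor) only works if every listing contributes the same number of text nodes to the flattened list. The sketch below illustrates that assumption with plain lists. The helper `group_by_listing` and the sample strings are hypothetical and not part of the original project; the helper simply fails loudly when the stride does not match, which a real page with missing fields would otherwise hide.

```python
def group_by_listing(flat_values, per_listing):
    """Split a flat list of text nodes into one sub-list per listing.

    Raises ValueError when the list length is not a multiple of
    per_listing, i.e. when the fixed-stride assumption is broken.
    """
    if per_listing <= 0:
        raise ValueError("per_listing must be positive")
    if len(flat_values) % per_listing != 0:
        raise ValueError(
            "expected a multiple of %d values, got %d"
            % (per_listing, len(flat_values))
        )
    return [
        flat_values[start:start + per_listing]
        for start in range(0, len(flat_values), per_listing)
    ]


# Made-up stand-in for the text nodes of the 'con' div of two listings
# (5 nodes each: district link, separator, floor, separator, building year).
con_texts = [
    "district A", "/", "floor 3", "/", "built 2005",
    "district B", "/", "floor 12", "/", "built 2010",
]

groups = group_by_listing(con_texts, 5)
floors = [group[2] for group in groups]  # same node as index (i + 1) * 5 - 3
years = [group[4] for group in groups]   # same node as index (i + 1) * 5 - 1
```

Grouping first and then indexing inside each group picks the same nodes as the stride formulas, but a listing with a missing field raises an error instead of silently shifting every later value.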