pys.py source code

python

Project: ProxyYourSpider  Author: rafacheng
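# Imports are not shown in the original snippet; these are inferred from usage.
import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent

def fillProxyPool(self):
    # `offset` is a module-level page cursor shared across calls.
    global offset
    ua = UserAgent()  # build once; each request still gets a random User-Agent
    while self.llen < self.size:
        url = self.url + '&offset=' + str(offset)
        offset += 50  # the listing is paged 50 rows at a time
        headers = {'User-Agent': ua.random}
        response = requests.get(url, headers=headers)
        soup = BeautifulSoup(response.text, 'lxml')
        rows = soup.find('tbody').find_all('tr')
        for row in rows:
            tds = row.find_all('td')
            # Remove all whitespace from the cell text.
            proxy = ''.join(tds[0].text.split())
            _type = ''.join(tds[1].text.split()).lower()
            if self.checkValidity(_type, proxy):
                # Valid proxies go into the Redis list named after their type.
                self.r.lpush(_type, proxy)
                print('1 proxy added: %s. http: %d; https: %d.'
                      % (proxy, self.r.llen('http'), self.r.llen('https')))
        # Assign rather than accumulate: the original `+=` re-added the full
        # Redis totals after every page, which overcounts the pool size.
        self.__class__.llen = self.r.llen('http') + self.r.llen('https')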
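
The loop above mixes paging, fetching, validation, and storage, so it is hard to try out without a live proxy listing and a Redis server. The following is a minimal, dependency-free sketch of the same paging idea, not the project's own code. The names `fill_pool`, `fetch_page`, `is_valid`, `page_step`, and `max_pages` are hypothetical: `fetch_page` stands in for the HTTP request plus HTML parsing, `is_valid` for `checkValidity`, and a plain list for the Redis lists. The `max_pages` cap is an added safeguard that the original does not have.

```python
def fill_pool(fetch_page, is_valid, size, page_step=50, max_pages=20):
    """Collect up to `size` valid (scheme, proxy) pairs, paging by offset.

    fetch_page(offset) -> list of (scheme, proxy) tuples (hypothetical helper)
    is_valid(scheme, proxy) -> bool (stands in for checkValidity)
    """
    pool = []
    offset = 0  # local cursor instead of a module-level global
    for _ in range(max_pages):  # hard cap so the loop always terminates
        if len(pool) >= size:
            break
        rows = fetch_page(offset)
        if not rows:  # an empty page means the source is exhausted
            break
        for scheme, proxy in rows:
            if is_valid(scheme, proxy):
                pool.append((scheme, proxy))
                if len(pool) >= size:
                    break
        offset += page_step
    return pool


# Usage with fake, in-memory pages (made-up addresses, for illustration only).
fake_pages = {
    0: [('http', '1.1.1.1:80'), ('https', '2.2.2.2:443')],
    50: [('http', '3.3.3.3:8080')],
}
result = fill_pool(
    fetch_page=lambda off: fake_pages.get(off, []),
    is_valid=lambda scheme, proxy: not proxy.startswith('2.'),
    size=2,
)
```

Passing the fetcher and validator in as callables keeps the loop testable offline. Replacing the global `offset` with a local cursor means every call starts from the first page, which may or may not match what the original author intended.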