craw_ptt.py source code

python

Project: Crawler_and_Share  Author: f496328mm
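import re

import requests
from bs4 import BeautifulSoup


def craw_last_index(ptt_class_name):
    """Return the number of the newest index page of a PTT board."""
    # ptt_class_name = 'Soft_Job'
    index_url = 'https://www.ptt.cc/bbs/' + ptt_class_name + '/index.html'
    res = requests.get(index_url, verify=True)
    soup3 = BeautifulSoup(res.text, "lxml")

    # Find the "previous page" button. The pattern was garbled to '??' in the
    # source (an invalid regex); '上頁' is assumed from PTT's button label.
    x = soup3('a', {'class': "btn wide"}, string=re.compile('上頁'))
    # Its href looks like /bbs/<board>/index<N>.html; the newest page is N + 1.
    last_index = x[0]['href']
    last_index = last_index.replace('/bbs/' + ptt_class_name + '/index', '')
    last_index = int(last_index.replace('.html', '')) + 1

    return last_index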
#---------------------------------------------------------------------------------
# [Comment text lost to encoding damage. The surviving fragments mention
#  Ubuntu `crontab -e`, PTT, the index, and data.]
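
The href-to-index step in craw_last_index can be tried without any network access. What follows is only a minimal sketch, assuming the "previous page" link has the form /bbs/<board>/index<N>.html. The helper name index_from_prev_href is hypothetical and not part of the original project, and a regular expression is swapped in for the original chained str.replace calls so that an unexpected href fails loudly.

```python
import re


def index_from_prev_href(href, board):
    """Return the newest index number, given the 'previous page' href."""
    # Hypothetical helper: assumes href looks like /bbs/<board>/index<N>.html
    pattern = r'/bbs/' + re.escape(board) + r'/index(\d+)\.html'
    match = re.fullmatch(pattern, href)
    if match is None:
        raise ValueError('unexpected href: ' + repr(href))
    # The previous page is N, so the newest page is N + 1.
    return int(match.group(1)) + 1


print(index_from_prev_href('/bbs/Soft_Job/index1234.html', 'Soft_Job'))  # 1235
```

Raising ValueError on a mismatch makes a layout change on the site visible immediately, whereas the replace-based version would fail later inside int().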