USTC_Today3.0.py source code

python

Project: USTC-Today  Author: HengRuiZ
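# Imports are assumed from usage; the excerpt does not show the top of the file.
import re
import urllib
import urllib2
from bs4 import BeautifulSoup

# tit, url, img, w_box, h_box, getimg() and resize() are defined elsewhere in the file.
def search(key_word):
    global x
    # 'key_word' in the URL is a placeholder that is replaced by the URL-encoded query.
    # key_word is expected to be a UTF-8 byte string.
    search_url = 'http://news.sogou.com/news?ie=utf8&p=40230447&interV=kKIOkrELjboMmLkEkLoTkKIMkLELjb8TkKIMkrELjboImLkEk74TkKILmrELjbgRmLkEkLY=_485898072&query=key_word&'
    req = urllib2.urlopen(search_url.replace('key_word', urllib.quote(key_word)))
    real_visited = 0
    html = req.read()
    soup = BeautifulSoup(html, 'html.parser')
    # Result links are <a> tags that carry href, data-click and target attributes.
    content = soup.findAll(name="a", attrs={"href": True, "data-click": True, "target": True})  # ResultSet
    num = len(content)
    # Take every second link, at most 9, without indexing past the result set.
    for i in range(min(9, (num + 1) // 2)):
        # Extract the title text and the URL of this result.
        p_str = content[2 * i]
        tit[i] = p_str.renderContents()
        tit[i] = tit[i].decode('utf-8', 'ignore')  # bytes -> unicode
        tit[i] = re.sub("<[^>]+>", "", tit[i])  # strip nested tags from the title
        print(tit[i])
        url[i] = str(p_str.get("href"))
        print(url[i])
        # Fetch the image for this URL and pass it to resize() with the w_box x h_box bounds.
        img[i] = getimg(url[i])
        w, h = img[i].size
        img[i] = resize(w, h, w_box, h_box, img[i])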
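
The excerpt calls resize(w, h, w_box, h_box, img[i]) but does not show that helper. The following is only a minimal sketch of the size calculation such a helper might perform, assuming it scales the image to fit inside a w_box x h_box bounding box while keeping the aspect ratio. The name fit_within_box is hypothetical and does not appear in the project.

```python
def fit_within_box(w, h, w_box, h_box):
    """Return (new_w, new_h) scaled to fit inside w_box x h_box, keeping aspect ratio.

    Hypothetical helper: the project's real resize() is not shown in the excerpt.
    """
    if w <= 0 or h <= 0:
        raise ValueError("image dimensions must be positive")
    # The smaller of the two ratios keeps both sides inside the box.
    factor = min(float(w_box) / w, float(h_box) / h)
    # Round to whole pixels and never go below 1 pixel.
    new_w = max(1, int(round(w * factor)))
    new_h = max(1, int(round(h * factor)))
    return new_w, new_h


print(fit_within_box(800, 600, 200, 200))  # (200, 150)
```

If img[i] is a Pillow image, which its .size attribute suggests but the excerpt does not confirm, the returned pair could be passed to Image.resize((new_w, new_h)). Note that this sketch also enlarges images that are smaller than the box.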