nhsWebScraper.py source code

python

Project: nhs-crawler  Author: snava10
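from elasticsearch import Elasticsearch


def run(args):
    # The Elasticsearch host is the first argument; default to a local node.
    elasticsearchServer = args[0] if len(args) else 'localhost:9200'
    indexName = 'nhs_conditions'
    docType = 'condition'

    es = Elasticsearch(elasticsearchServer)
    # Drop any existing index; ignoring 400/404 means a missing index is not an error.
    es.indices.delete(index=indexName, ignore=[400, 404])

    # Index each condition and mirror the documents to a local file as a JSON array.
    # get_pages_info_models is not shown in this snippet.
    with open('nhsPageContent', 'w') as f:
        f.write('[')
        first = True
        for model in get_pages_info_models('http://www.nhs.uk/Conditions/Pages/hub.aspx'):
            doc = model.to_json()  # not named `json`, to avoid shadowing the stdlib module
            es.index(index=indexName, doc_type=docType, body=doc)
            # Put the separator before every item except the first (no trailing comma).
            f.write(doc if first else ',\n' + doc)
            first = False
        f.write(']')

    # Make the newly indexed documents searchable.
    es.indices.refresh(index=indexName)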
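
The nhsPageContent file written by run is laid out as a JSON array of the indexed documents. Below is a minimal sketch of one way to write such an array from already-serialized JSON strings so that the result parses cleanly. The helper name write_json_array and the two sample documents are hypothetical and are not part of nhs-crawler.

```python
import io
import json


def write_json_array(out, json_docs):
    """Write pre-serialized JSON strings to `out` as a single valid JSON array."""
    out.write('[')
    for i, doc in enumerate(json_docs):
        if i:
            # Separator goes before every item except the first,
            # so the array never ends with a trailing comma.
            out.write(',\n')
        out.write(doc)
    out.write(']')


# Placeholder documents standing in for model.to_json() output.
buf = io.StringIO()
write_json_array(buf, ['{"name": "Acne"}', '{"name": "Asthma"}'])
parsed = json.loads(buf.getvalue())
```

Because the separator is written before each item rather than after it, the same helper also yields a valid empty array when no documents are produced.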