clean_text.py source code

python
Project: glassdoor-analysis  Author: THEdavehogue
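from multiprocessing import Pool, cpu_count


def multi_scrub_text(reviews):
    '''
    Lemmatize review texts in parallel using a multiprocessing pool.

    INPUT:
        reviews: array-like, e.g. a pandas DataFrame column containing review texts

    OUTPUT:
        lemmatized: list of cleaned texts, in the same order as reviews
    '''
    # Leave one core free for the rest of the system, but never drop below one worker
    # (Pool raises ValueError when processes is 0, which happens on a single-core machine).
    cpus = max(1, cpu_count() - 1)
    pool = Pool(processes=cpus)
    # lemmatize_text is not shown in this snippet; it must be a picklable,
    # module-level function. pool.map returns results in input order.
    lemmatized = pool.map(lemmatize_text, reviews)
    pool.close()
    pool.join()
    return lemmatized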
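
The same pattern can be sketched as a small self-contained script. This is only an illustration under stated assumptions: `normalize_text` is a hypothetical stand-in for `lemmatize_text` (which is not shown here), and `parallel_map` and its `processes` parameter are names invented for this example, not part of the glassdoor-analysis project.

```python
from multiprocessing import Pool, cpu_count


def normalize_text(text):
    # Hypothetical stand-in for lemmatize_text: lowercases and collapses whitespace.
    return " ".join(text.lower().split())


def parallel_map(func, items, processes=None):
    """Apply func to every item, using worker processes when more than one is requested."""
    items = list(items)
    if processes is None:
        # Same heuristic as above: leave one core free, but keep at least one worker.
        processes = max(1, cpu_count() - 1)
    if processes == 1 or len(items) < 2:
        # Serial fallback: avoids process start-up cost when a pool would not help.
        return [func(item) for item in items]
    with Pool(processes=processes) as pool:
        # Pool.map blocks until all results are ready and preserves input order.
        return pool.map(func, items)


if __name__ == "__main__":
    reviews = ["Great  Culture", "LONG hours,  good pay"]
    print(parallel_map(normalize_text, reviews, processes=2))
```

The `if __name__ == "__main__":` guard matters on platforms that start workers by spawning a fresh interpreter, since the module is re-imported in each worker. The worker function has to live at module level so it can be pickled and sent to the pool.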