SoftmaxRegression.py source file

python

Project: Text-Classifier  Author: daniellaah
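import os

import jieba


def words_extract(news_folder):
    """Segment every news file under news_folder into words.

    Args:
        news_folder: root directory with one subfolder per news category:
            news_folder/
                <category>/
                <category>/
                <category>/

    Returns:
        A list of ([word1, word2, ...], category) tuples, one per file.
    """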
    subfolder_list = [subfolder for subfolder in os.listdir(news_folder)
                      if os.path.isdir(os.path.join(news_folder, subfolder))]
    data_list = []  # element: ([word1, word2, ...], "category")

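    # Segment with 4 worker processes (jieba's parallel mode is not supported on Windows).
    jieba.enable_parallel(4)
    # Walk each category subfolder and segment every file in it.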
    for subfolder in subfolder_list:
        news_class = subfolder
        subfolder = os.path.join(news_folder, subfolder)
        news_list = [os.path.join(subfolder, news) for news in os.listdir(subfolder)
                     if os.path.isfile(os.path.join(subfolder, news))]
        for news in news_list:
            # Assumes the files are UTF-8 encoded; adjust if the corpus differs.
            with open(news, 'r', encoding='utf-8') as f:
                content = f.read()
            word_list = jieba.lcut(content)
            data_list.append((word_list, news_class))  # element: ([word1, word2, ...], "category")
    jieba.disable_parallel()
    return data_list
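
The list returned by words_extract pairs each file's word list with its category label. As a minimal sketch of how that structure might be consumed, the snippet below tallies word counts per category. The helper name count_words_by_class and the sample data are hypothetical and not part of the Text-Classifier project; the sample stands in for real words_extract output so the snippet runs without jieba or a news corpus.

```python
from collections import Counter, defaultdict


def count_words_by_class(data_list):
    """Tally word occurrences per category label.

    data_list uses the shape returned by words_extract:
    [([word1, word2, ...], "category"), ...]
    """
    counts = defaultdict(Counter)
    for word_list, news_class in data_list:
        counts[news_class].update(word_list)
    return counts


# Hand-written sample (an assumption) in place of real words_extract output.
sample = [
    (["stock", "market", "stock"], "finance"),
    (["match", "goal"], "sports"),
    (["bank", "stock"], "finance"),
]

counts = count_words_by_class(sample)
print(counts["finance"].most_common(1))  # [('stock', 3)]
```

Keeping one Counter per category makes it easy to inspect the most frequent words in each class before building features for a classifier.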