word_segment.py source code

python

Project: finance_news_analysis  Author: pskun
def word_segment(line, stop=False, remain_number=True):
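    '''
    Segment a line of text into words with jieba.
    stop: if True, drop words found in STOP_WORDS.
    remain_number: if False, drop tokens that util_func.atof parses as numbers.
    '''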
    if STOP_WORDS is None:
        load_stopwords()
    seg_list = jieba.cut(line, HMM=True)
    sl = []
    for word in seg_list:
        word = word.strip()
        # Skip empty tokens and punctuation.
        if not word or word in PUNCT:
            continue
        # Optionally drop stop words.
        if stop and word in STOP_WORDS:
            continue
        # Optionally drop tokens for which util_func.atof returns a non-None value.
        if not remain_number and util_func.atof(word) is not None:
            continue
        sl.append(word)
    return sl
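
The filtering step of word_segment depends on module-level names (STOP_WORDS, PUNCT, load_stopwords, util_func) that are not shown on this page. The sketch below isolates that filtering logic so it can be run without jieba or the rest of the project. The names filter_tokens and atof are hypothetical: atof here is only an assumed stand-in for util_func.atof (returning a float, or None when the text is not numeric), and the stop-word and punctuation sets are passed in explicitly rather than read from globals.

```python
def atof(text):
    """Hypothetical stand-in for util_func.atof: float value, or None if not numeric."""
    try:
        return float(text)
    except ValueError:
        return None


def filter_tokens(tokens, stop_words=frozenset(), punct=frozenset(),
                  stop=False, remain_number=True):
    """Apply the same filters as word_segment to an already segmented token list."""
    kept = []
    for word in tokens:
        word = word.strip()
        # Skip empty tokens and punctuation.
        if not word or word in punct:
            continue
        # Optionally drop stop words.
        if stop and word in stop_words:
            continue
        # Optionally drop numeric tokens.
        if not remain_number and atof(word) is not None:
            continue
        kept.append(word)
    return kept


# Tokens as a segmenter might return them (made-up sample input).
tokens = ["股价", " ", "上涨", "3.5", "，", "的", "市场"]
print(filter_tokens(tokens, stop_words={"的"}, punct={"，"},
                    stop=True, remain_number=False))
```

With the defaults (stop=False, remain_number=True) only empty tokens and punctuation are removed, which matches the default behavior of word_segment.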