RussianTextPreprocessing.py source code

python


Project: keras-textgen Author: kenoma

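import nltk

def get_continuous_chunks(self, text):
    # Tokenize, POS-tag, and run NLTK's named-entity chunker over the text.
    chunked = nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(text)))
    continuous_chunk = []  # unique named entities, in order of first appearance
    current_chunk = []     # pieces of the entity currently being assembled
    for node in chunked:
        if isinstance(node, nltk.Tree):
            # A subtree is a named-entity chunk; join its tokens.
            current_chunk.append(" ".join(token for token, pos in node.leaves()))
        elif current_chunk:
            # A plain (token, tag) pair ends the current run of entity chunks.
            named_entity = " ".join(current_chunk)
            if named_entity not in continuous_chunk:
                continuous_chunk.append(named_entity)
            current_chunk = []  # reset even when the entity is a duplicate
    # Flush an entity that runs to the end of the text.
    if current_chunk:
        named_entity = " ".join(current_chunk)
        if named_entity not in continuous_chunk:
            continuous_chunk.append(named_entity)
    return continuous_chunk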
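
The grouping step of get_continuous_chunks can be tried without NLTK's tokenizer, tagger, and chunker models by feeding it a hand-built sequence. The following is only a minimal sketch under stated assumptions: `Chunk` and `group_entities` are hypothetical names, not part of keras-textgen or NLTK, and `Chunk` imitates nothing of `nltk.Tree` beyond its `leaves()` method. The sketch flushes an entity that ends the input and resets its buffer even when the entity is a duplicate. The sample data is made up for illustration.

```python
from typing import List, Tuple, Union

Tagged = Tuple[str, str]  # a (token, POS tag) pair, as produced by nltk.pos_tag


class Chunk:
    """Hypothetical stand-in for nltk.Tree: only leaves() is imitated."""

    def __init__(self, label: str, leaves: List[Tagged]) -> None:
        self.label = label
        self._leaves = leaves

    def leaves(self) -> List[Tagged]:
        return self._leaves


def group_entities(chunked: List[Union[Chunk, Tagged]]) -> List[str]:
    """Collect unique named entities, merging adjacent chunks into one entity."""
    entities: List[str] = []  # unique entities, in order of first appearance
    current: List[str] = []   # pieces of the entity being assembled

    def flush() -> None:
        # Close the entity in progress, skipping it if already recorded.
        if current:
            name = " ".join(current)
            if name not in entities:
                entities.append(name)
            current.clear()

    for node in chunked:
        if isinstance(node, Chunk):
            current.append(" ".join(token for token, _ in node.leaves()))
        else:
            flush()  # a plain tagged token ends the current run of chunks
    flush()  # handle an entity that runs to the end of the input
    return entities


# Hand-made input standing in for the output of nltk.ne_chunk.
sample = [
    Chunk("PERSON", [("Leo", "NNP")]),
    Chunk("PERSON", [("Tolstoy", "NNP")]),
    ("visited", "VBD"),
    Chunk("GPE", [("Moscow", "NNP")]),
    ("and", "CC"),
    Chunk("GPE", [("Moscow", "NNP")]),
]
print(group_entities(sample))  # ['Leo Tolstoy', 'Moscow']
```

Adjacent chunks are merged into one entity ("Leo Tolstoy"), and the second "Moscow" is dropped as a duplicate even though it is the last item in the input.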
