araby.py source file

python


Project: tashaphyne  Author: linuxscout

def tokenize(text = u""):
    """
    Tokenize text into words
    @param text: the input text.
    @type text: unicode.
    @return: list of words.
    @rtype: list.
    """
    if text == u'':
        return []
    else:
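        # TOKEN_PATTERN and TOKEN_REPLACE are compiled regexes defined
        # elsewhere in araby.py (not shown in this excerpt).
        # Split the text into candidate tokens.
        mylist = TOKEN_PATTERN.split(text)
        # Strip the characters matched by TOKEN_REPLACE from each token;
        # newlines (\n) are not removed.
        mylist = [TOKEN_REPLACE.sub('', x) for x in mylist if x]
        # Drop tokens that became empty after stripping.
        mylist = [x for x in mylist if x]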
        return mylist
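
The excerpt above depends on two module-level regexes, TOKEN_PATTERN and TOKEN_REPLACE, whose definitions are not shown. The following is a minimal, self-contained sketch of the same split-then-clean approach. The two patterns here are assumptions made for illustration, not the definitions used in tashaphyne: a capturing group around `\w+` so that `split` keeps the words, and a character class covering space, tab, and the other non-newline whitespace characters. The library's own pattern very likely also treats Arabic diacritics as part of a word, which this simplified version does not.

```python
import re

# ASSUMPTION: stand-in patterns for illustration only; the real
# definitions live elsewhere in araby.py and may differ.
# The capturing group makes re.split() return the matched words too.
TOKEN_PATTERN = re.compile(r"(\w+)", re.UNICODE)
# Whitespace to strip from tokens; "\n" is deliberately left out.
TOKEN_REPLACE = re.compile(r"[ \t\r\f\v]")


def tokenize(text=u""):
    """Split text into words, punctuation runs, and newlines."""
    if text == u"":
        return []
    # Split into words and the separators between them.
    tokens = TOKEN_PATTERN.split(text)
    # Strip spaces, tabs, etc. from every non-empty piece.
    tokens = [TOKEN_REPLACE.sub("", tok) for tok in tokens if tok]
    # Drop pieces that were whitespace only.
    return [tok for tok in tokens if tok]


if __name__ == "__main__":
    print(tokenize(u"hi, there!"))
    print(tokenize(u"السلام عليكم"))
    print(tokenize(u"line one\nline two"))
```

With these stand-in patterns, punctuation and newlines come back as separate tokens, while plain spaces and tabs disappear. That matches the "don't remove newline" remark in the original comments.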
