util.py source code

python

Project: oadoi  Author: Impactstory
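import re

from unidecode import unidecode  # third-party package: pip install unidecode

# clean_html and remove_everything_but_alphas are helper functions
# that are not shown in this excerpt


def normalize_title(title):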
    if not title:
        return ""

    # just first n characters
    response = title[0:500]

    # lowercase
    response = response.lower()

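    # transliterate non-ASCII characters to their closest ASCII equivalents
    # (Python 3 strings are already unicode, so no unicode() call is needed)
    response = unidecode(response)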

    # strip HTML before the non-alpha characters are removed, otherwise tag
    # names would survive as letters; titles only contain simple tags such
    # as <i>, so a simple cleaner is enough
    response = clean_html(response)

    # remove articles and common prepositions
    response = re.sub(r"\b(the|a|an|of|to|in|for|on|by|with|at|from)\b", "", response)

    # remove everything except alphas
    response = remove_everything_but_alphas(response)

    return response
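
The function above depends on two helpers and a third-party package that this excerpt does not show. The following is a minimal, self-contained Python 3 sketch of the same pipeline, written only to illustrate what each step does to a title. The bodies of `clean_html` and `remove_everything_but_alphas` are hypothetical stand-ins, not the project's actual implementations, and `to_ascii` is a standard-library approximation that is used here in place of `unidecode`.

```python
import re
import unicodedata


def to_ascii(text):
    # Assumption: stdlib approximation of unidecode. It decomposes accented
    # characters and drops whatever cannot be encoded as ASCII.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def clean_html(text):
    # Hypothetical stand-in: remove anything that looks like a tag (<i>, </i>).
    return re.sub(r"<[^>]+>", "", text)


def remove_everything_but_alphas(text):
    # Hypothetical stand-in: keep letters only, dropping spaces, digits
    # and punctuation.
    return "".join(ch for ch in text if ch.isalpha())


def normalize_title(title):
    if not title:
        return ""

    # truncate, lowercase, then fold to ASCII
    response = to_ascii(title[0:500].lower())

    # strip tags before letters are filtered, so tag names do not leak in
    response = clean_html(response)

    # remove articles and common prepositions
    response = re.sub(
        r"\b(the|a|an|of|to|in|for|on|by|with|at|from)\b", "", response
    )

    return remove_everything_but_alphas(response)


print(normalize_title("The <i>Origin</i> of Species"))  # originspecies
print(normalize_title("Étude on a Théorème"))           # etudetheoreme
```

Because the result contains only lowercase letters, two titles that differ in case, markup, accents, punctuation, or small function words normalize to the same string, which makes the output usable as a rough matching key.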