treeBinarizer.py source code

python

Project: koalaNLP    Author: yuchenz
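import nltk

# Note: this code uses the NLTK 2.x Tree API (nltk.Tree(string), .node, .pprint()).
# In NLTK 3 the equivalents are nltk.Tree.fromstring(string), .label() and .pformat().
# rightBinarize, leftBinarize, vvBinarize and vvTags are not shown here;
# they are presumably defined elsewhere in treeBinarizer.py.

def binarize(line, lan="en"):
    """Binarize the bracketed parse tree in `line` and return it on one line."""
    assert lan in ['en', 'ch'], "illegal language (en or ch): %s" % lan

    root = nltk.Tree(line)
    stack = [root]  # depth-first traversal over internal nodes
    while stack:
        curNode = stack.pop()
        # Only NP and VP nodes with more than two children are binarized.
        if len(curNode) > 2:
            if curNode.node == 'NP':
                rightBinarize(curNode)
            elif curNode.node == 'VP':
                if lan == 'en':
                    vvBinarize(curNode)
                elif lan == 'ch':
                    # For Chinese, the direction depends on where the verb tag sits.
                    if curNode[0].node in vvTags:
                        leftBinarize(curNode)
                    elif curNode[-1].node in vvTags:
                        rightBinarize(curNode)
                    else:
                        vvBinarize(curNode)

        # Push only children above the preterminal (POS tag) level.
        for child in curNode:
            if child.height() > 2:
                stack.append(child)

    # Collapse the pretty-printed tree onto a single line.
    return ' '.join(root.pprint().split()) + '\n'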
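
The helpers rightBinarize and leftBinarize are called above but not shown. The following is a minimal sketch of what right and left binarization usually mean. It is not the koalaNLP implementation: the Node class and the snake_case function names are hypothetical stand-ins for the NLTK tree, and reusing the parent label for the new intermediate nodes is an assumption.

```python
class Node:
    """Hypothetical stand-in for a parse-tree node: a label plus children."""

    def __init__(self, label, children):
        self.label = label
        self.children = list(children)  # Node objects or leaf strings

    def __repr__(self):
        parts = [c if isinstance(c, str) else repr(c) for c in self.children]
        return "(%s %s)" % (self.label, " ".join(parts))


def right_binarize(node):
    """In place: (X a b c d) -> (X a (X b (X c d)))."""
    while len(node.children) > 2:
        # Merge the last two children under a new node.
        # Assumption: the new node reuses the parent's label.
        merged = Node(node.label, node.children[-2:])
        node.children[-2:] = [merged]


def left_binarize(node):
    """In place: (X a b c d) -> (X (X (X a b) c) d)."""
    while len(node.children) > 2:
        # Merge the first two children under a new node.
        merged = Node(node.label, node.children[:2])
        node.children[:2] = [merged]


def make_np():
    """Build a flat four-child NP used in the usage example."""
    return Node("NP", [Node("DT", ["the"]), Node("JJ", ["big"]),
                       Node("JJ", ["red"]), Node("NN", ["dog"])])


if __name__ == "__main__":
    right = make_np()
    right_binarize(right)
    print(right)  # (NP (DT the) (NP (JJ big) (NP (JJ red) (NN dog))))

    left = make_np()
    left_binarize(left)
    print(left)   # (NP (NP (NP (DT the) (JJ big)) (JJ red)) (NN dog))
```

Right binarization keeps the first child at the top and nests the rest to the right. Left binarization mirrors this. In binarize, this choice is what differs between the NP branch and the Chinese VP branches.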