NLTK将实体识别命名为Python列表

发布于 2021-01-29 19:29:16

我使用NLTKne_chunk从文本中提取命名实体:

my_sent = "WASHINGTON -- In the wake of a string of abuses by New York police officers in the 1990s, Loretta E. Lynch, the top federal prosecutor in Brooklyn, spoke forcefully about the pain of a broken trust that African-Americans felt and said the responsibility for repairing generations of miscommunication and mistrust fell to law enforcement."


nltk.ne_chunk(my_sent, binary=True)

但是我不知道如何将这些实体保存到列表中?例如–

print Entity_list
('WASHINGTON', 'New York', 'Loretta', 'Brooklyn', 'African')

谢谢。

关注者
0
被浏览
102
1 个回答
  • 面试哥
    面试哥 2021-01-29
    为面试而生,有面试问题,就找面试哥。

    nltk.ne_chunk返回嵌套nltk.tree.Tree对象,因此您必须遍历该Tree对象才能到达网元。

    看看带有正则表达式的命名实体识别:NLTK

    >>> from nltk import ne_chunk, pos_tag, word_tokenize
    >>> from nltk.tree import Tree
    >>> 
    >>> def get_continuous_chunks(text):
    ...     chunked = ne_chunk(pos_tag(word_tokenize(text)))
    ...     continuous_chunk = []
    ...     current_chunk = []
    ...     for i in chunked:
    ...             if type(i) == Tree:
    ...                     current_chunk.append(" ".join([token for token, pos in i.leaves()]))
    ...             if current_chunk:
    ...                     named_entity = " ".join(current_chunk)
    ...                     if named_entity not in continuous_chunk:
    ...                             continuous_chunk.append(named_entity)
    ...                             current_chunk = []
    ...             else:
    ...                     continue
    ...     return continuous_chunk
    ... 
    >>> my_sent = "WASHINGTON -- In the wake of a string of abuses by New York police officers in the 1990s, Loretta E. Lynch, the top federal prosecutor in Brooklyn, spoke forcefully about the pain of a broken trust that African-Americans felt and said the responsibility for repairing generations of miscommunication and mistrust fell to law enforcement."
    >>> get_continuous_chunks(my_sent)
    ['WASHINGTON', 'New York', 'Loretta E. Lynch', 'Brooklyn']
    
    
    >>> my_sent = "How's the weather in New York and Brooklyn"
    >>> get_continuous_chunks(my_sent)
    ['New York', 'Brooklyn']
    


知识点
面圈网VIP题库

面圈网VIP题库全新上线,海量真题题库资源。 90大类考试,超10万份考试真题开放下载啦

去下载看看