pos_tagging_data.py source code (Python code snippet)

Project: Deep-Learning-with-Keras  Author: PacktPublishing

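import os
from glob import glob


# ReutersParser is not shown in this snippet; it is assumed to be defined
# elsewhere in pos_tagging_data.py.
def stream_reuters_documents(reuters_dir):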
    """ Iterate over documents of the Reuters dataset.

    Every `*.sgm` file found in `reuters_dir` is parsed in turn. The Reuters
    archive must already be downloaded and uncompressed into that directory;
    this function does not fetch it.

    Documents are represented as dictionaries with 'body' (str),
    'title' (str), 'topics' (list(str)) keys.

    """
    parser = ReutersParser()
    for filename in glob(os.path.join(reuters_dir, "*.sgm")):
        # Open in binary mode and close each file once it has been parsed.
        with open(filename, 'rb') as sgm_file:
            for doc in parser.parse(sgm_file):
                yield doc
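
The docstring above describes each document as a dictionary with 'body', 'title' and 'topics' keys. A minimal sketch of how such a stream might be consumed lazily is given here. `sample_documents` and `count_topics` are hypothetical names that do not appear in the original file; `sample_documents` only stands in for `stream_reuters_documents(reuters_dir)` so the sketch runs without the Reuters files, and its contents are made-up illustrations, not real dataset entries.

```python
from collections import Counter
from itertools import islice


def sample_documents():
    """Hypothetical stand-in for stream_reuters_documents(reuters_dir).

    Yields dictionaries with the same 'title', 'body' and 'topics' keys,
    so the consuming code below can run without the Reuters files.
    """
    yield {"title": "Grain exports rise", "body": "Wheat shipments grew.",
           "topics": ["grain", "wheat"]}
    yield {"title": "Oil prices steady", "body": "Crude held its level.",
           "topics": ["crude"]}
    yield {"title": "Untitled wire", "body": "No topic assigned.",
           "topics": []}
    yield {"title": "Wheat harvest", "body": "Harvest estimates were cut.",
           "topics": ["wheat"]}


def count_topics(documents, limit=None):
    """Count topic labels over at most `limit` documents from an iterator.

    With limit=None the whole stream is consumed; documents are read one
    at a time, so the full collection is never held in memory.
    """
    counts = Counter()
    for doc in islice(documents, limit):
        counts.update(doc["topics"])
    return counts


# Count over the whole stream, then over only the first two documents.
topic_counts = count_topics(sample_documents())
first_two = count_topics(sample_documents(), limit=2)
```

With the real generator, the call would presumably look like `count_topics(stream_reuters_documents(reuters_dir), limit=1000)`, which stops reading files as soon as the limit is reached.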


##################### main ######################