Splitting a text file into sections with a special delimiter line (Python)

Posted 2021-01-29 14:10:05

I have an input file like this:

This is a text block start
This is the end

And this is another
with more than one line
and another line.

The task is to read the file in sections separated by a special line, which in this case is an empty line, e.g. [out]:

[['This is a text block start', 'This is the end'],
['And this is another','with more than one line', 'and another line.']]

I have been getting the desired output by doing this:

def per_section(it):
    """ Read a file and yield sections using empty line as delimiter """
    section = []
    for line in it:
        if line.strip('\n'):
            section.append(line)
        else:
            yield ''.join(section)
            section = []
    # yield any remaining lines as a section too
    if section:
        yield ''.join(section)
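
For reference, here is a minimal sketch of how the function above behaves on the sample input. `io.StringIO` stands in for the real file and the names `sample` and `as_lists` are made up for illustration. Note that the function yields each section as one joined string, so an extra `splitlines()` step is needed to reach the list-of-lists shape shown in [out].

```python
import io

def per_section(it):
    """Copy of the blank-line version above: yields each section as one string."""
    section = []
    for line in it:
        if line.strip('\n'):
            section.append(line)
        else:
            yield ''.join(section)
            section = []
    # yield any remaining lines as a section too
    if section:
        yield ''.join(section)

# Stand-in for the input file (an assumption for this sketch).
sample = io.StringIO(
    "This is a text block start\n"
    "This is the end\n"
    "\n"
    "And this is another\n"
    "with more than one line\n"
    "and another line.\n"
)

sections = list(per_section(sample))
# Each section is a single string that still contains its newlines,
# so split it back into lines to get a list per section.
as_lists = [s.splitlines() for s in sections]
print(as_lists)
```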

However, if the special line is one that starts with #, as in the following:

# Some comments, maybe the title of the following section
This is a text block start
This is the end
# Some other comments and also the title
And this is another
with more than one line
and another line.

I have to do this:

def per_section(it):
    """ Read a file and yield sections using lines starting with '#' as delimiter """
    section = []
    for line in it:
        if line[0] != "#":
            section.append(line)
        else:
            yield ''.join(section)
            section = []
    # yield any remaining lines as a section too
    if section:
        yield ''.join(section)
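
One detail worth checking with this version: because the sample file opens with a delimiter line, the first section yielded is an empty string. The sketch below, which again uses `io.StringIO` as a stand-in for the file and the hypothetical name `non_empty`, shows this and one possible way to filter it out.

```python
import io

def per_section(it):
    """Copy of the '#'-delimited version above."""
    section = []
    for line in it:
        if line[0] != "#":
            section.append(line)
        else:
            yield ''.join(section)
            section = []
    # yield any remaining lines as a section too
    if section:
        yield ''.join(section)

# Stand-in for the input file (an assumption for this sketch).
sample = io.StringIO(
    "# Some comments, maybe the title of the following section\n"
    "This is a text block start\n"
    "This is the end\n"
    "# Some other comments and also the title\n"
    "And this is another\n"
    "with more than one line\n"
    "and another line.\n"
)

sections = list(per_section(sample))
# sections[0] is '' because a delimiter was seen before any content.
non_empty = [s for s in sections if s]
print(non_empty)
```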

If I allow per_section() to take a delimiter parameter, I could try the following:

def per_section(it, delimiter='\n'):
    """ Read a file and yield sections using an empty line or a '#' line as delimiter """
    section = []
    for line in it:
        if line.strip('\n') and delimiter == '\n':
            section.append(line)
        elif delimiter == '#' and line[0] != "#":
            section.append(line)
        else:
            yield ''.join(section)
            section = []
    # yield any remaining lines as a section too
    if section:
        yield ''.join(section)

But is there a way to avoid hard-coding every possible delimiter?

1 answer
  • 面试哥
    面试哥 2021-01-29

    How about passing a predicate?

    def per_section(it, is_delimiter=lambda x: x.isspace()):
        ret = []
        for line in it:
            if is_delimiter(line):
                if ret:
                    yield ret  # OR  ''.join(ret)
                    ret = []
            else:
                ret.append(line.rstrip())  # OR  ret.append(line)
        if ret:
            yield ret
    

    Usage:

    with open('/path/to/file.txt') as f:
        sections = list(per_section(f))  # default delimiter
    
    with open('/path/to/file.txt.txt') as f:
        sections = list(per_section(f, lambda line: line.startswith('#'))) # comment
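
To try the predicate approach without touching the filesystem, the following sketch feeds the two sample inputs from the question through `io.StringIO`. It also shows, as an assumption about how one might extend the idea, that any callable works as the predicate, for example the `match` method of a compiled regular expression that treats both blank lines and `#` lines as delimiters. The names `expected`, `either` and `mixed` are made up for illustration.

```python
import io
import re

def per_section(it, is_delimiter=lambda x: x.isspace()):
    """Same generator as in the answer: yields each section as a list of lines."""
    ret = []
    for line in it:
        if is_delimiter(line):
            if ret:
                yield ret
                ret = []
        else:
            ret.append(line.rstrip())
    if ret:
        yield ret

blank_separated = io.StringIO(
    "This is a text block start\n"
    "This is the end\n"
    "\n"
    "And this is another\n"
    "with more than one line\n"
    "and another line.\n"
)
hash_separated = io.StringIO(
    "# Some comments, maybe the title of the following section\n"
    "This is a text block start\n"
    "This is the end\n"
    "# Some other comments and also the title\n"
    "And this is another\n"
    "with more than one line\n"
    "and another line.\n"
)

expected = [
    ["This is a text block start", "This is the end"],
    ["And this is another", "with more than one line", "and another line."],
]

# Default predicate: whitespace-only lines are delimiters.
blank_sections = list(per_section(blank_separated))
# Custom predicate: lines starting with '#' are delimiters.
hash_sections = list(per_section(hash_separated, lambda line: line.startswith('#')))
print(blank_sections == expected, hash_sections == expected)

# Hypothetical combined predicate: a '#' line OR a blank line ends a section.
either = re.compile(r'#|\s*$').match
mixed = io.StringIO("# title\nfirst\n\nsecond\nthird\n# other\nfourth\n")
mixed_sections = list(per_section(mixed, either))
print(mixed_sections)
```

Because the generator only ever calls `is_delimiter(line)`, new delimiter rules can be added at the call site without changing `per_section` itself.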
    

