Get the last n lines of a file with Python, similar to tail

Posted 2021-02-02 23:20:26

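I'm writing a log file viewer for a web application, and for that I want to paginate through the lines of the log file. The items in the file are line-based, with the newest item at the bottom.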

So I need a tail() method that can read n lines from the bottom and supports an offset. This is what I came up with:

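def tail(f, n, offset=0):
    """Reads n lines from f with an offset of offset lines.

    f must be opened in binary mode ('rb'), because Python 3 text files
    cannot seek relative to the end. Lines are returned as bytes.
    """
    avg_line_length = 74
    to_read = n + offset
    while True:
        try:
            # seek() needs an int; avg_line_length becomes a float below
            f.seek(-int(avg_line_length * to_read), 2)
        except OSError:
            # whoops. apparently the file is smaller than what we want
            # to step back, go to the beginning instead
            f.seek(0)
        pos = f.tell()
        lines = f.read().splitlines()
        # Ask for one line more than needed unless we reached the start of
        # the file, because the first line read may be cut off mid-line.
        if len(lines) > to_read or pos == 0 or to_read == 0:
            return lines[-to_read:offset and -offset or None]
        avg_line_length *= 1.3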

Is this a reasonable approach? What is the recommended way to tail log files with offsets?

Answers
  • 面试哥
    面试哥 2021-02-02
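    def tail(f, n, offset=None):
        """Reads n lines from f with an offset of offset lines.  The return
        value is a tuple in the form ``(lines, has_more)`` where `has_more` is
        an indicator that is `True` if there are more lines in the file.

        f must be opened in binary mode ('rb'), because Python 3 text files
        cannot seek relative to the end.  Lines are returned as bytes.
        """
        avg_line_length = 74
        to_read = n + (offset or 0)
    
        while True:
            try:
                # seek() needs an int; avg_line_length becomes a float below
                f.seek(-int(avg_line_length * to_read), 2)
            except OSError:
                # whoops.  apparently the file is smaller than what we want
                # to step back, go to the beginning instead
                f.seek(0)
            pos = f.tell()
            lines = f.read().splitlines()
            # Ask for one line more than needed unless we reached the start
            # of the file: the first line read may be cut off mid-line.
            if len(lines) > to_read or pos == 0 or to_read == 0:
                return lines[-to_read:offset and -offset or None], \
                       len(lines) > to_read or pos > 0
            avg_line_length *= 1.3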
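
The slice in the return statement carries all of the paging logic in one expression, which makes it easy to misread. As a minimal sketch of what it selects (the helper name `page_slice` and the sample list are made up for illustration and are not part of the answer):

```python
def page_slice(lines, n, offset=0):
    """Hypothetical helper: the n lines that end `offset` lines above the bottom."""
    to_read = n + offset
    # Same result as the `offset and -offset or None` idiom:
    # an offset of 0 must become None, because lines[-n:-0] would be empty.
    end = -offset if offset else None
    return lines[-to_read:end]


log = ["line %d" % i for i in range(1, 11)]

print(page_slice(log, 3))            # ['line 8', 'line 9', 'line 10']
print(page_slice(log, 3, offset=3))  # ['line 5', 'line 6', 'line 7']
```

With `n` as the page size, page k of the viewer (counting from 0 at the bottom) would then correspond to `offset = k * n`.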
    


  • 面试哥
    面试哥 2021-02-02

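    This may be quicker than yours. It makes no assumptions about line length. It backs through the file one block at a time until it has found the right number of '\n' characters.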

    def tail(f, lines=20):
        """Return the last `lines` lines of f as bytes.

        f must be opened in binary mode ('rb'), because Python 3 text files
        cannot seek relative to the end.
        """
        total_lines_wanted = lines
        if total_lines_wanted <= 0:
            return b''
    
        BLOCK_SIZE = 1024
        f.seek(0, 2)
        block_end_byte = f.tell()
        # look for one extra newline so the first returned line is complete
        lines_to_go = total_lines_wanted + 1
        block_number = -1
        blocks = []  # blocks of size BLOCK_SIZE, in reverse order starting
                     # from the end of the file
        while lines_to_go > 0 and block_end_byte > 0:
            if block_end_byte - BLOCK_SIZE > 0:
                # read the last block we haven't yet read
                f.seek(block_number * BLOCK_SIZE, 2)
                blocks.append(f.read(BLOCK_SIZE))
            else:
                # file too small, start from the beginning
                f.seek(0, 0)
                # only read what was not read
                blocks.append(f.read(block_end_byte))
            lines_found = blocks[-1].count(b'\n')
            lines_to_go -= lines_found
            block_end_byte -= BLOCK_SIZE
            block_number -= 1
        all_read_text = b''.join(reversed(blocks))
        return b'\n'.join(all_read_text.splitlines()[-total_lines_wanted:])
    

    I don't like tricky assumptions about line length when, as a practical matter, you can never know things like that.

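    Generally, this will locate the last 20 lines on the first or second pass through the loop. If your 74-character figure is actually accurate, make the block size 2048 and you'll tail 20 lines almost immediately.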
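
The arithmetic behind that estimate can be checked quickly. This is only a rough sketch: the function name `blocks_needed` is hypothetical, and it assumes every line is exactly the average length plus one newline byte, which a real log will not satisfy.

```python
import math


def blocks_needed(n_lines, avg_line_length, block_size):
    """Hypothetical estimate of how many blocks hold the last n_lines."""
    bytes_needed = n_lines * (avg_line_length + 1)  # +1 for each newline
    return math.ceil(bytes_needed / block_size)


print(blocks_needed(20, 74, 1024))  # 2
print(blocks_needed(20, 74, 2048))  # 1
```

Twenty lines of 74 characters come to about 1500 bytes, so they span two 1024-byte blocks but fit inside a single 2048-byte block.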

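    Also, I don't burn a lot of brain calories trying to finesse alignment with physical OS blocks. With these high-level I/O packages, I doubt you'll see any performance consequence from trying to align on OS block boundaries. If you use lower-level I/O, then you might see a speedup.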
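
The question also asks for an offset, which the block-reading function above does not take. One possible way to combine the two ideas is sketched below. The name `tail_with_offset`, its signature, and the `block_size` parameter are assumptions made for this illustration, not part of either answer, and the file is assumed to be opened in binary mode.

```python
import io


def tail_with_offset(f, n, offset=0, block_size=1024):
    """Return up to n lines (as bytes) ending `offset` lines above the end.

    Illustrative sketch only: `f` must be a seekable file opened in
    binary mode, and `offset` is assumed to be >= 0.
    """
    to_read = n + offset
    if n <= 0:
        return []
    f.seek(0, io.SEEK_END)
    pos = f.tell()
    blocks = []
    newlines = 0
    # Read backwards until one newline more than needed has been seen,
    # so that the first line we keep is known to be complete.
    while pos > 0 and newlines <= to_read:
        step = min(block_size, pos)
        pos -= step
        f.seek(pos)
        block = f.read(step)
        newlines += block.count(b"\n")
        blocks.append(block)
    lines = b"".join(reversed(blocks)).splitlines()
    # Clamp the end index so an offset past the start yields an empty page.
    end = max(len(lines) - offset, 0)
    return lines[-to_read:end]


sample = io.BytesIO(b"".join(b"line %d\n" % i for i in range(1, 101)))
print(tail_with_offset(sample, 3))            # [b'line 98', b'line 99', b'line 100']
print(tail_with_offset(sample, 3, offset=3))  # [b'line 95', b'line 96', b'line 97']
```

Seeking to an absolute position instead of a multiple of the block size from the end avoids the special case for files smaller than one block. Lines come back as bytes, so a viewer would still need to decode them with the log's encoding.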


