How do I find what is using memory in a Python process in a production system?

Posted on 2021-01-29 15:10:29

My production system occasionally exhibits a memory leak that I have not been able to reproduce in the development environment. I have used a Python memory profiler (specifically, Heapy) with some success in development, but it cannot help me with things I cannot reproduce, and I am reluctant to instrument our production system with Heapy because it takes a while to do its thing and its threaded remote interface does not work well in our server.

What I think I want is a way to dump a snapshot of the production Python process (or at least of gc.get_objects), and then analyze the snapshot offline to see where it is using memory.
How do I get a core dump of a Python process like that?
Once I have one, how do I do something useful with it?
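(For the gc.get_objects part, something like the following sketch is what I have in mind: summarise live objects by type and write the counts to a file for offline diffing. The helper name and output path are just for illustration.)

    import collections
    import gc
    import json

    def dump_object_counts(path='objects.json'):
        # Count live GC-tracked objects by fully-qualified type name
        counts = collections.Counter(
            f'{type(o).__module__}.{type(o).__qualname__}'
            for o in gc.get_objects()
        )
        with open(path, 'w') as f:
            json.dump(counts.most_common(), f, indent=2)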

Followers: 0 · Viewed: 88

1 Answer
  • 面试哥 · 2021-01-29

    I will expand on Brett's answer from my recent experience. The Dozer package
    is well maintained, and despite advancements like the addition of tracemalloc
    to the stdlib in Python 3.4, its gc.get_objects counting chart is my go-to
    tool to tackle memory leaks. Below I use dozer > 0.7, which had not been
    released at the time of writing (well, because I recently contributed a couple
    of fixes there).
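    (As an aside, since tracemalloc was just mentioned: here is a minimal sketch
    of that stdlib approach, for comparison. The simulated leak is purely
    illustrative and not part of this walkthrough.)

    import tracemalloc

    tracemalloc.start()                      # begin tracing allocations
    baseline = tracemalloc.take_snapshot()

    leaked = [bytearray(1024) for _ in range(10_000)]  # stand-in for a real leak

    snapshot = tracemalloc.take_snapshot()
    # Top 5 source lines by allocation growth since the baseline
    for stat in snapshot.compare_to(baseline, 'lineno')[:5]:
        print(stat)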

    Let's take a look at a non-trivial memory leak. I'll use Celery 4.4 here and
    will eventually uncover a feature that causes the leak (and because it's a
    bug/feature kind of thing, it can be called mere misconfiguration, caused by
    ignorance). So there's a Python 3.6 venv where I pip install celery < 4.5,
    with the following module.

    demo.py

    import time

    import celery


    redis_dsn = 'redis://localhost'
    app = celery.Celery('demo', broker=redis_dsn, backend=redis_dsn)

    @app.task
    def subtask():
        pass

    @app.task
    def task():
        # Schedule 10k subtasks at roughly 100 per second
        for i in range(10_000):
            subtask.delay()
            time.sleep(0.01)


    if __name__ == '__main__':
        task.delay().get()


    Basically it's a task that schedules a bunch of subtasks. What can go wrong here?

    I will use procpath to analyse the Celery node's memory consumption (pip install procpath). I have 4 terminals:

    1. procpath record -d celery.sqlite -i1 "$..children[?('celery' in @.cmdline)]" to record the Celery node's process tree statistics
    2. docker run --rm -it -p 6379:6379 redis to run Redis, which serves as the Celery broker and result backend
    3. celery -A demo worker --concurrency 2 to run the node with 2 workers
    4. python demo.py to finally run the example

    (4) will take less than 2 minutes to finish.

    Then I use sqliteviz (pre-built version) to visualise what procpath has
    recorded. I drop the celery.sqlite there and use this query:

    SELECT datetime(ts, 'unixepoch', 'localtime') ts, stat_pid, stat_rss / 256.0 rss
    FROM record
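
    (Side note: stat_rss is recorded in memory pages, typically 4 KiB, so dividing
    by 256 converts it to MiB. If you want to sanity-check the recording without
    sqliteviz, here is a minimal sketch using the stdlib sqlite3 module against
    the same celery.sqlite:)

    import sqlite3

    conn = sqlite3.connect('celery.sqlite')
    query = (
        "SELECT datetime(ts, 'unixepoch', 'localtime') ts, "
        "stat_pid, stat_rss / 256.0 rss FROM record"
    )
    # Print the first few samples: timestamp, process ID, RSS in MiB
    for ts, pid, rss in conn.execute(query).fetchmany(10):
        print(f'{ts}  pid={pid}  rss={rss:.1f} MiB')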
    

    And in sqliteviz I create a line chart trace with X=ts, Y=rss, and add a
    split transform By=stat_pid. The resulting chart is:

    [chart: Celery node leak]

    This shape is likely pretty familiar to anyone who has fought memory leaks.

    Finding leaking objects

    Now it's time for dozer. I'll show the non-instrumented case (and you can
    instrument your code in a similar way if you can). To inject the Dozer server
    into the target process I'll use Pyrasite. There are two things to know
    about it:

    • To run it, ptrace has to be configured as "classic ptrace permissions": echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope, which may be a security risk
    • There are non-zero chances that your target Python process will crash

    With that caveat I:

    • pip install https://github.com/mgedmin/dozer/archive/3ca74bd8.zip (that's the to-be 0.8 I mentioned above)
    • pip install pillow (which dozer uses for charting)
    • pip install pyrasite

    After that I can get a Python shell in the target process:

    pyrasite-shell 26572
    

    And inject the following, which will run Dozer's WSGI application using the
    stdlib's wsgiref server.

    import threading
    import wsgiref.simple_server
    
    import dozer
    
    
    def run_dozer():
        # Serve Dozer's WSGI app on port 8000 inside the target process
        app = dozer.Dozer(app=None, path='/')
        with wsgiref.simple_server.make_server('', 8000, app) as httpd:
            print('Serving Dozer on port 8000...')
            httpd.serve_forever()

    # Daemon thread, so it won't block the target process from exiting
    threading.Thread(target=run_dozer, daemon=True).start()
    

    Opening http://localhost:8000 in a browser, you should see something like:

    [screenshot: Dozer]

    After that I run python demo.py from (4) again and wait for it to finish.
    Then in Dozer I set “Floor” to 5000, and here’s what I see:

    [chart: Dozer shows the Celery leak]

    Two types related to Celery grow as the subtasks are scheduled:

    • celery.result.AsyncResult
    • vine.promises.promise

    weakref.WeakMethod has the same shape and numbers and must be caused by the
    same thing.

    Finding root cause

    At this point, from the leaking types and the trends, it may already be clear
    what's going on in your case. If it's not, Dozer has a "TRACE" link per type,
    which allows tracing the chosen object's referrers (gc.get_referrers) and
    referents (gc.get_referents), e.g. to see the object's attributes, and
    continuing the process by traversing the reference graph.
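    (The same walk can also be scripted by hand in the Pyrasite shell. A minimal
    sketch, with hypothetical helper names:)

    import gc

    def show_referrers(obj, limit=10):
        # Objects that hold a reference to obj, i.e. what keeps it alive
        for ref in gc.get_referrers(obj)[:limit]:
            print(type(ref).__name__, repr(ref)[:120])

    def show_referents(obj, limit=10):
        # Objects that obj itself references
        for ref in gc.get_referents(obj)[:limit]:
            print(type(ref).__name__, repr(ref)[:120])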

    But a picture is worth a thousand words, right? So I'll show how to use
    objgraph to render the chosen object's dependency graph.

    • pip install objgraph
    • apt-get install graphviz

    Then:

    • I run python demo.py from (4) again
    • in Dozer I set Floor=0 and Filter=AsyncResult
    • and click "TRACE", which should yield:

    [screenshot: trace]

    Then in the Pyrasite shell I run:

    import objgraph
    objgraph.show_backrefs([objgraph.at(140254427663376)], filename='backref.png')
    

    The PNG file should contain:

    [image: backref chart]

    Basically there's some Context object containing a list called _children,
    which in turn contains many instances of celery.result.AsyncResult that leak.
    Changing Filter to celery.*context in Dozer, here's what I see:

    [chart: Celery context]

    So the culprit is celery.app.task.Context. Searching for that type would
    certainly lead you to the Celery task page. Quickly searching for "children"
    there, here's what it says:

    trail = True

    If enabled the request will keep track of subtasks started by this task, and
    this information will be sent with the result (result.children).

    Disabling the trail by setting trail=False like:

    @app.task(trail=False)  # stop tracking subtasks started by this task
    def task():
        for i in range(10_000):
            subtask.delay()
            time.sleep(0.01)
    

    Then restarting the Celery node from (3) and running python demo.py from (4)
    yet again shows this memory consumption:

    [chart: solved]

    Problem solved!


