Python - Using a numpy array in shared memory for multiprocessing

Posted on 2021-02-02 23:20:08

I would like to use a numpy array in shared memory for use with the multiprocessing module. The difficulty is using it like a numpy array, and not just as a ctypes array.

from multiprocessing import Process, Array
import numpy as np

def f(a):
    a[0] = -a[0]

if __name__ == '__main__':
    # Create the array
    N = 10
    unshared_arr = np.random.rand(N)
    arr = Array('d', unshared_arr)
    print "Originally, the first two elements of arr = %s"%(arr[:2])

    # Create, start, and finish the child processes
    p = Process(target=f, args=(arr,))
    p.start()
    p.join()

    # Printing out the changed values
    print "Now, the first two elements of arr = %s"%arr[:2]

This produces output like the following:

Originally, the first two elements of arr = [0.3518653236697369, 0.517794725524976]
Now, the first two elements of arr = [-0.3518653236697369, 0.517794725524976]

The array can be accessed in a ctypes manner, e.g. arr[i] makes sense. However, it is not a numpy array, and I cannot perform operations such as -1*arr or arr.sum(). I suppose a solution would be to convert the ctypes array into a numpy array. However (besides not being able to make this work), I don't believe it would still be shared.
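
For illustration, the conversion I was attempting looks roughly like this (just a sketch; np.frombuffer is supposed to give a view rather than a copy, although I am not sure whether that view stays backed by the shared buffer):

import numpy as np
from multiprocessing import Array

arr = Array('d', range(10))
np_view = np.frombuffer(arr.get_obj())   # intended: wrap the ctypes buffer without copying
np_view[0] = -np_view[0]                 # numpy-style indexing/arithmetic
total = np_view.sum()                    # ...and numpy methods like sum()
# The question: do changes made through np_view still show up in arr
# (and in child processes that receive arr)?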

There would seem to be a standard solution to what has to be a common problem.

1 Answer
  • 面试哥 2021-02-02

    Adding to the answers from @unutbu (not available anymore) and @Henry Gomersall: you could use shared_arr.get_lock() to synchronize access when needed:

    
    shared_arr = mp.Array(ctypes.c_double, N)
    # ...
    def f(i): # could be anything numpy accepts as an index, such as another numpy array
        with shared_arr.get_lock(): # synchronize access
            arr = np.frombuffer(shared_arr.get_obj()) # no data copying
            arr[i] = -arr[i]
    

    import ctypes
    import logging
    import multiprocessing as mp
    
    from contextlib import closing
    
    import numpy as np
    
    info = mp.get_logger().info
    
    def main():
        logger = mp.log_to_stderr()
        logger.setLevel(logging.INFO)
    
        # create shared array
        N, M = 100, 11
        shared_arr = mp.Array(ctypes.c_double, N)
        arr = tonumpyarray(shared_arr)
    
        # fill with random values
        arr[:] = np.random.uniform(size=N)
        arr_orig = arr.copy()
    
        # write to arr from different processes
        with closing(mp.Pool(initializer=init, initargs=(shared_arr,))) as p:
            # many processes access the same slice
            stop_f = N // 10
            p.map_async(f, [slice(stop_f)]*M)
    
            # many processes access different slices of the same array
            assert M % 2 # odd
            step = N // 10
            p.map_async(g, [slice(i, i + step) for i in range(stop_f, N, step)])
        p.join()
        assert np.allclose(((-1)**M)*tonumpyarray(shared_arr), arr_orig)
    
    def init(shared_arr_):
        global shared_arr
        shared_arr = shared_arr_ # must be inherited, not passed as an argument
    
    def tonumpyarray(mp_arr):
        return np.frombuffer(mp_arr.get_obj())
    
    def f(i):
        """synchronized."""
        with shared_arr.get_lock(): # synchronize access
            g(i)
    
    def g(i):
        """no synchronization."""
        info("start %s" % (i,))
        arr = tonumpyarray(shared_arr)
        arr[i] = -1 * arr[i]
        info("end   %s" % (i,))
    
    if __name__ == '__main__':
        mp.freeze_support()
        main()
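
    As a side note (not from the original answer): np.frombuffer defaults to float64 and returns a flat 1-D view, so if the shared buffer holds a different ctype, or should be viewed as a matrix, the tonumpyarray helper can take an explicit dtype and shape. A minimal sketch with illustrative sizes:

    import ctypes
    import multiprocessing as mp
    import numpy as np

    def tonumpyarray(mp_arr, dtype=np.float64, shape=None):
        """Wrap the shared buffer as a numpy view (no copy), optionally reshaped."""
        arr = np.frombuffer(mp_arr.get_obj(), dtype=dtype)
        return arr if shape is None else arr.reshape(shape)

    rows, cols = 4, 25                                   # illustrative dimensions
    shared_arr = mp.Array(ctypes.c_double, rows * cols)
    mat = tonumpyarray(shared_arr, shape=(rows, cols))   # 2-D view of the same memory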
    

    If you don't need synchronized access, or if you create your own locks, then mp.Array() is unnecessary. In that case you could use mp.sharedctypes.RawArray.
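
    For completeness, a minimal sketch of that lock-free variant (names and sizes are illustrative, not from the original answer). Note that a RawArray has no get_obj() or get_lock(); np.frombuffer wraps it directly:

    import ctypes
    import numpy as np
    from multiprocessing.sharedctypes import RawArray

    N = 100
    raw = RawArray(ctypes.c_double, N)      # shared memory, no lock attached
    arr = np.frombuffer(raw)                # direct numpy view, no copy
    arr[:] = np.random.uniform(size=N)      # fill in place
    # If synchronization is ever needed, pair this with a multiprocessing.Lock().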


