Removing the elements of one array from another array

Posted 2021-01-29 19:06:27

Say I have these 2D arrays A and B.

How can I remove from A the rows that also appear in B? (The complement in set theory: A - B)

A=np.asarray([[1,1,1], [1,1,2], [1,1,3], [1,1,4]])
B=np.asarray([[0,0,0], [1,0,2], [1,0,3], [1,0,4], [1,1,0], [1,1,1], [1,1,4]])
#output = [[1,1,2], [1,1,3]]

More precisely, I would like to do something like this.

data = some numpy array
label = some numpy array
A = np.argwhere(label==0) #[[1 1 1], [1 1 2], [1 1 3], [1 1 4]]
B = np.argwhere(data>1.5) #[[0 0 0], [1 0 2], [1 0 3], [1 0 4], [1 1 0], [1 1 1], [1 1 4]]
out = np.argwhere(label==0 and data>1.5) #[[1 1 2], [1 1 3]]
1 Answer
  • 面试哥 2021-01-29

    Based on this solution to "Find the row indexes of several values in a numpy array", here is a NumPy-based solution with a smaller memory footprint, which could be beneficial when working with large arrays:

    dims = np.maximum(B.max(0),A.max(0))+1
    out = A[~np.in1d(np.ravel_multi_index(A.T,dims),np.ravel_multi_index(B.T,dims))]
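
To make the row-to-scalar step easier to follow, here is a minimal sketch on the arrays from the question. It assumes every entry is a non-negative integer (which `np.ravel_multi_index` requires) and uses `np.isin`, the newer spelling of `np.in1d` for 1-D input. The names `A_ids` and `B_ids` are illustrative only and are not part of the original answer.

```python
import numpy as np

A = np.asarray([[1, 1, 1], [1, 1, 2], [1, 1, 3], [1, 1, 4]])
B = np.asarray([[0, 0, 0], [1, 0, 2], [1, 0, 3], [1, 0, 4],
                [1, 1, 0], [1, 1, 1], [1, 1, 4]])

# Per-column extent that is large enough for every value in A and B.
dims = np.maximum(B.max(0), A.max(0)) + 1

# Each row (i, j, k) is mapped to one integer, as if it were an index
# into an array of shape dims. Distinct rows get distinct integers.
A_ids = np.ravel_multi_index(A.T, dims)
B_ids = np.ravel_multi_index(B.T, dims)

# Keep the rows of A whose integer id does not occur among B's ids.
out = A[~np.isin(A_ids, B_ids)]
print(out)
```

Because the comparison runs on two 1-D integer arrays, no intermediate of size `len(A) * len(B)` is created.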
    

    Sample run:

    In [38]: A
    Out[38]: 
    array([[1, 1, 1],
           [1, 1, 2],
           [1, 1, 3],
           [1, 1, 4]])
    
    In [39]: B
    Out[39]: 
    array([[0, 0, 0],
           [1, 0, 2],
           [1, 0, 3],
           [1, 0, 4],
           [1, 1, 0],
           [1, 1, 1],
           [1, 1, 4]])
    
    In [40]: out
    Out[40]: 
    array([[1, 1, 2],
           [1, 1, 3]])
    

    Runtime test on larger arrays:

    In [107]: def in1d_approach(A,B):
         ...:     dims = np.maximum(B.max(0),A.max(0))+1
         ...:     return A[~np.in1d(np.ravel_multi_index(A.T,dims),\
         ...:                     np.ravel_multi_index(B.T,dims))]
         ...:
    
    In [108]: # Setup arrays with B as large array and A contains some of B's rows
         ...: B = np.random.randint(0,9,(1000,3))
         ...: A = np.random.randint(0,9,(100,3))
         ...: A_idx = np.random.choice(np.arange(A.shape[0]),size=10,replace=0)
         ...: B_idx = np.random.choice(np.arange(B.shape[0]),size=10,replace=0)
         ...: A[A_idx] = B[B_idx]
         ...:
    

    Timings for the broadcasting-based solutions:

    In [109]: %timeit A[np.all(np.any((A-B[:, None]), axis=2), axis=0)]
    100 loops, best of 3: 4.64 ms per loop # @Kasramvd's soln
    
    In [110]: %timeit A[~((A[:,None,:] == B).all(-1)).any(1)]
    100 loops, best of 3: 3.66 ms per loop
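
For readers unfamiliar with the second broadcasting one-liner timed above, the following sketch splits it into named steps on the small arrays from the question. The intermediate names `eq`, `row_match` and `out_bc` are illustrative and not from the original answer.

```python
import numpy as np

A = np.asarray([[1, 1, 1], [1, 1, 2], [1, 1, 3], [1, 1, 4]])
B = np.asarray([[0, 0, 0], [1, 0, 2], [1, 0, 3], [1, 0, 4],
                [1, 1, 0], [1, 1, 1], [1, 1, 4]])

# eq[i, j, k] is True where A[i, k] == B[j, k].
# Shape: (len(A), len(B), number of columns).
eq = A[:, None, :] == B

# row_match[i, j] is True where row i of A equals row j of B in every column.
row_match = eq.all(-1)

# Keep the rows of A that match no row of B.
out_bc = A[~row_match.any(1)]
print(out_bc)
```

The boolean intermediate `eq` holds `len(A) * len(B) * A.shape[1]` elements, which is where the extra memory use of the broadcasting solutions comes from.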
    

    Timing for the solution with the smaller memory footprint:

    In [111]: %timeit in1d_approach(A,B)
    1000 loops, best of 3: 231 µs per loop
    

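    Further performance boost

    in1d_approach reduces each row to a single scalar by treating the row as an indexing tuple. The same reduction can be done a bit more efficiently with matrix multiplication via np.dot, like so: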

    def in1d_dot_approach(A,B):
        cumdims = (np.maximum(A.max(),B.max())+1)**np.arange(B.shape[1])
        return A[~np.in1d(A.dot(cumdims),B.dot(cumdims))]
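
As a worked illustration of the dot-product reduction, the sketch below prints the weights and the resulting keys for the arrays from the question. It assumes non-negative integer entries. The overflow guard and the names `base`, `A_keys` and `B_keys` are additions for illustration, not part of the original answer.

```python
import numpy as np

A = np.asarray([[1, 1, 1], [1, 1, 2], [1, 1, 3], [1, 1, 4]])
B = np.asarray([[0, 0, 0], [1, 0, 2], [1, 0, 3], [1, 0, 4],
                [1, 1, 0], [1, 1, 1], [1, 1, 4]])

# One common base that is larger than every entry in A and B.
base = np.maximum(A.max(), B.max()) + 1
# Positional weights base**0, base**1, base**2, ...
cumdims = base ** np.arange(B.shape[1])

# Added guard (assumption, not in the original): the largest key must
# fit in the integer dtype, otherwise distinct rows could collide.
assert float(base) ** B.shape[1] < np.iinfo(cumdims.dtype).max

# Each row becomes the number whose base-`base` digits are the row entries.
A_keys = A.dot(cumdims)
B_keys = B.dot(cumdims)

out_dot = A[~np.isin(A_keys, B_keys)]
print(cumdims, A_keys, out_dot)
```

With a single shared base the keys grow as `base ** ncols`, so for arrays with many columns or large values the per-column `dims` used by `in1d_approach` may be the safer choice.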
    

    Let's test it against the previous version on much larger arrays:

    In [251]: # Setup arrays with B as large array and A contains some of B's rows
         ...: B = np.random.randint(0,9,(10000,3))
         ...: A = np.random.randint(0,9,(1000,3))
         ...: A_idx = np.random.choice(np.arange(A.shape[0]),size=10,replace=0)
         ...: B_idx = np.random.choice(np.arange(B.shape[0]),size=10,replace=0)
         ...: A[A_idx] = B[B_idx]
         ...:
    
    In [252]: %timeit in1d_approach(A,B)
    1000 loops, best of 3: 1.28 ms per loop
    
    In [253]: %timeit in1d_dot_approach(A, B)
    1000 loops, best of 3: 1.2 ms per loop
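
For the specific situation described in the question, where A and B both come from `np.argwhere` on arrays of the same shape, it may be simpler to combine the boolean masks before calling `np.argwhere`, so that no row comparison is needed at all. The sketch below uses made-up `label` and `data` arrays, constructed only so that they reproduce the A and B from the question; the real arrays are not shown in the original post.

```python
import numpy as np

A = np.asarray([[1, 1, 1], [1, 1, 2], [1, 1, 3], [1, 1, 4]])
B = np.asarray([[0, 0, 0], [1, 0, 2], [1, 0, 3], [1, 0, 4],
                [1, 1, 0], [1, 1, 1], [1, 1, 4]])

# Hypothetical inputs: label is 0 exactly at the positions listed in A,
# and data exceeds 1.5 exactly at the positions listed in B.
shape = (2, 2, 5)
label = np.ones(shape, dtype=int)
data = np.zeros(shape)
label[tuple(A.T)] = 0
data[tuple(B.T)] = 2.0

# Element-wise logic needs & and ~ with parentheses; Python's `and`
# raises an error on arrays with more than one element.
out_mask = np.argwhere((label == 0) & ~(data > 1.5))
print(out_mask)
```

This only applies when both index sets are derived from masks of one common shape. For arbitrary row arrays, the `in1d`-based functions above remain the general approach.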
    

