Setting values in a numpy array to NaN by index

Posted on 2021-01-29 15:00:59

I want to set specific values in a numpy array to NaN (to exclude them from a row-wise mean calculation).

I tried:

import numpy

x = numpy.array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]])
cutoff = [5, 7]
for i in range(len(x)):
    x[i][0:cutoff[i]:1] = numpy.nan

Looking at x, I only see -9223372036854775808 where I expect NaN.

I thought of an alternative:

for i in range(len(x)):
    for k in range(cutoff[i]):
        x[i][k] = numpy.nan

Nothing happened. What am I doing wrong?
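Both attempts run into the same underlying issue: `x` is built from Python ints, so it gets an integer dtype, and NaN is a floating-point value. The minimal check below is a sketch assuming a reasonably recent NumPy. It shows that assigning NaN to a single element of an integer array (as the second loop does) raises a `ValueError`, while a float copy stores NaN without trouble. The name `xf` is illustrative only.

```python
import numpy

x = numpy.array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
                 [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]])

# Built from Python ints, so the dtype is an integer type
print(x.dtype.kind)  # prints i

# Assigning NaN to one element of an integer array raises ValueError
try:
    x[0, 0] = numpy.nan
    raised = False
except ValueError:
    raised = True
print(raised)  # prints True

# A float copy can hold NaN; set the first 5 entries of row 0
xf = x.astype(float)
xf[0, 0:5] = numpy.nan
print(numpy.isnan(xf[0]).sum())  # prints 5
```

The slice assignment in the first attempt does not raise. It silently casts NaN to an integer instead, and the resulting value depends on the platform, so the sketch does not rely on it.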

1 answer
    面试哥 2021-01-29

    Vectorized approach to set the appropriate elements to NaN

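    @unutbu's solution should get rid of the value error you were getting. The root cause is that x has an integer dtype, and an integer array cannot store NaN, so x has to be converted to a float dtype first. If you are looking to vectorize for performance, you can use boolean indexing like so -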

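    import numpy as np
    
    # NaN is a float value, so x needs a float dtype to hold it
    x = x.astype(float)
    
    # Mask of positions in x where NaNs are to be put:
    # True wherever the column index is below that row's cutoff
    mask = np.asarray(cutoff)[:,None] > np.arange(x.shape[1])
    
    # Put NaNs into the masked region of x for the desired output
    x[mask] = np.nan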
    

    Sample run -

    In [92]: x = np.random.randint(0,9,(4,7)).astype(float)
    
    In [93]: x
    Out[93]: 
    array([[ 2.,  1.,  5.,  2.,  5.,  2.,  1.],
           [ 2.,  5.,  7.,  1.,  5.,  4.,  8.],
           [ 1.,  1.,  7.,  4.,  8.,  3.,  1.],
           [ 5.,  8.,  7.,  5.,  0.,  2.,  1.]])
    
    In [94]: cutoff = [5,3,0,6]
    
    In [95]: x[np.asarray(cutoff)[:,None] > np.arange(x.shape[1])] = np.nan
    
    In [96]: x
    Out[96]: 
    array([[ nan,  nan,  nan,  nan,  nan,   2.,   1.],
           [ nan,  nan,  nan,   1.,   5.,   4.,   8.],
           [  1.,   1.,   7.,   4.,   8.,   3.,   1.],
           [ nan,  nan,  nan,  nan,  nan,  nan,   1.]])
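Applied to the question's own data, the steps above can be combined into one short sketch (assuming NumPy is installed). It converts x to float, applies the same boolean mask, and then uses `np.nanmean`, which skips NaN entries, to get the row-wise mean the question is after. The names `xf` and `row_means` are illustrative only.

```python
import numpy as np

# Data and cutoffs from the question
x = np.array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
              [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]])
cutoff = [5, 7]

# An integer array cannot store NaN, so work on a float copy
xf = x.astype(float)

# True where the column index is below that row's cutoff
mask = np.asarray(cutoff)[:, None] > np.arange(xf.shape[1])
xf[mask] = np.nan

# np.nanmean ignores the NaN entries when averaging each row
row_means = np.nanmean(xf, axis=1)
print(row_means)
```

Row 0 averages columns 5 to 9 (giving 7.0) and row 1 averages columns 7 to 9 (giving 1/3). If a cutoff covered an entire row, `np.nanmean` would return NaN for that row and emit a RuntimeWarning.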
    

    Vectorized approach to directly compute the row-wise mean of the appropriate elements

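    If you are looking to get the masked mean values, you can modify the vectorized approach proposed earlier to avoid dealing with NaNs altogether and, more importantly, to keep x with integer values. Here's the modified approach -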

    # Get array version of cutoff
    cutoff_arr = np.asarray(cutoff)
    
    # Mask of positions in x which are to be considered for row-wise mean calculations
    mask1 = cutoff_arr[:,None] <= np.arange(x.shape[1])
    
    # Mask x, calculate the corresponding sum and thus mean values for each row
    masked_mean_vals = (mask1*x).sum(1)/(x.shape[1] -  cutoff_arr)
    

    Here's a sample run for this solution -

    In [61]: x = np.random.randint(0,9,(4,7))
    
    In [62]: x
    Out[62]: 
    array([[5, 0, 1, 2, 4, 2, 0],
           [3, 2, 0, 7, 5, 0, 2],
           [7, 2, 2, 3, 3, 2, 3],
           [4, 1, 2, 1, 4, 6, 8]])
    
    In [63]: cutoff = [5,3,0,6]
    
    In [64]: cutoff_arr = np.asarray(cutoff)
    
    In [65]: mask1 = cutoff_arr[:,None] <= np.arange(x.shape[1])
    
    In [66]: mask1
    Out[66]: 
    array([[False, False, False, False, False,  True,  True],
           [False, False, False,  True,  True,  True,  True],
           [ True,  True,  True,  True,  True,  True,  True],
           [False, False, False, False, False, False,  True]], dtype=bool)
    
    In [67]: masked_mean_vals = (mask1*x).sum(1)/(x.shape[1] -  cutoff_arr)
    
    In [68]: masked_mean_vals
    Out[68]: array([ 1.        ,  3.5       ,  3.14285714,  8.        ])
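The masked-mean formula divides by `x.shape[1] - cutoff_arr`. A cutoff equal to the row length would therefore divide by zero, and a larger cutoff would divide by a negative count. Below is a hypothetical helper, `masked_row_mean`, that wraps the same boolean-mask idea and guards against those cases. The clipping and the NaN result for empty rows are assumptions added in this sketch, not part of the answer above.

```python
import numpy as np

def masked_row_mean(x, cutoff):
    """Mean of x[i, cutoff[i]:] for each row i (hypothetical helper).

    Rows with no elements left after the cutoff get NaN instead of
    triggering a division-by-zero warning.
    """
    x = np.asarray(x)
    # Clip so cutoffs beyond the row length don't produce negative counts
    cutoff_arr = np.clip(np.asarray(cutoff), 0, x.shape[1])
    # True for the columns that take part in each row's mean
    keep = cutoff_arr[:, None] <= np.arange(x.shape[1])
    sums = (keep * x).sum(axis=1)
    counts = x.shape[1] - cutoff_arr
    # Divide only where count > 0; other rows keep the NaN fill value
    out = np.full(x.shape[0], np.nan)
    np.divide(sums, counts, out=out, where=counts > 0)
    return out
```

With the integer data from the sample run above, `masked_row_mean(x, [5, 3, 0, 6])` reproduces the values shown. A row whose cutoff is 7 or more comes back as NaN.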
    

