Fill a 1D NumPy array from arrays of indices

Posted on 2021-01-29 15:05:18

Background

I have a one-dimensional NumPy array initialized with zeros.

import numpy as np
import pandas as pd

section = np.zeros(13000)  # must be long enough to cover the largest end index (12000)

Then I have a Pandas DataFrame whose two columns hold start and end indices:

d= {'start': {0: 7200, 1: 7500, 2: 7560, 3: 8100, 4: 11400},
    'end': {0: 10800, 1: 8100, 2: 8100, 3: 8150, 4: 12000}}

df = pd.DataFrame(data=d, columns=['start', 'end'])

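For each pair of indices, I want to set the values at the corresponding positions in the NumPy array to True.

My current solution

I can do this by applying a function to the DataFrame: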

def fill_array(row):
    section[row.start:row.end] = True

df.apply(fill_array, axis=1)

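I would like to vectorize this operation

This works as I expect, but for the fun of it, I would like to vectorize the operation. I'm not very proficient with this, and my searching online has not put me on the right track.

I would really appreciate any suggestions on how to turn this into a vectorized operation, if that is possible at all.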


1 Answer

面试哥 2021-01-29


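The trick is to put a +1 at every start index and a -1 at every end index in a zero-initialized int array. The real trick comes next: taking the cumulative sum gives non-zero values at exactly the positions covered by the start-end pairs (each end index is exclusive, which matches slice semantics). The final step is to test for non-zero values to get the output as a boolean array. This gives two vectorized solutions, implemented as follows -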

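def filled_array(start, end, length):
    # Assumes start and end are integer arrays with all values < length
    out = np.zeros(length, dtype=int)
    np.add.at(out, start, 1)   # +1 at each start (repeated indices accumulate)
    np.add.at(out, end, -1)    # -1 at each end (exclusive)
    return out.cumsum() > 0    # True wherever at least one range is open

def filled_array_v2(start, end, length):  # Using @Daniel's suggestion
    # bincount counts how many ranges start/end at each position
    out = np.bincount(start, minlength=length) - np.bincount(end, minlength=length)
    return out.cumsum().astype(bool)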
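
To make the three steps more concrete, here is a minimal sketch that traces the intermediate arrays for the sample inputs used in this answer. The names `markers`, `coverage` and `mask` are illustrative only and do not appear in the original answer.

```python
import numpy as np

# Same inputs as the sample run in this answer
start = np.array([4, 7, 5, 15])
end = np.array([12, 12, 7, 17])
length = 20

# Step 1: +1 at every start, -1 at every end.
# np.add.at accumulates correctly when an index is repeated (12 appears twice in end).
markers = np.zeros(length, dtype=int)
np.add.at(markers, start, 1)
np.add.at(markers, end, -1)

# Step 2: the running sum is the number of ranges covering each position
coverage = markers.cumsum()

# Step 3: a position is filled if at least one range covers it
mask = coverage > 0

print(markers)
print(coverage)
print(mask.nonzero()[0])
```

Note that index 7 is both a start and an end here, so its +1 and -1 cancel in `markers` while `coverage` stays at 2. The positions that come out True are 4 through 11 and 15 through 16.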

Sample run -

In [2]: start
Out[2]: array([ 4,  7,  5, 15])

In [3]: end
Out[3]: array([12, 12,  7, 17])

In [4]: out = filled_array(start, end, length=20)

In [7]: pd.DataFrame(out) # print as dataframe for easy verification
Out[7]: 
        0
0   False
1   False
2   False
3   False
4    True
5    True
6    True
7    True
8    True
9    True
10   True
11   True
12  False
13  False
14  False
15   True
16   True
17  False
18  False
19  False
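
As a rough check against the question's data, the sketch below applies the same cumulative-sum idea to the DataFrame from the question and compares it with a plain loop. The helper `fill_from_ranges` is a hypothetical variant, not the answer's function: it allocates one extra slot so that an end index equal to `length` stays in bounds. The array length of 13000 is an assumption; it only needs to be at least the largest end index.

```python
import numpy as np
import pandas as pd

def fill_from_ranges(start, end, length):
    # Hypothetical variant: one extra slot so that end == length is a valid index
    markers = np.zeros(length + 1, dtype=int)
    np.add.at(markers, start, 1)
    np.add.at(markers, end, -1)
    # Drop the extra slot, then mark positions covered by at least one range
    return markers[:-1].cumsum() > 0

# DataFrame from the question
d = {'start': {0: 7200, 1: 7500, 2: 7560, 3: 8100, 4: 11400},
     'end': {0: 10800, 1: 8100, 2: 8100, 3: 8150, 4: 12000}}
df = pd.DataFrame(data=d, columns=['start', 'end'])

length = 13000  # assumed; must be >= the largest end index (12000)

vectorized = fill_from_ranges(df['start'].to_numpy(), df['end'].to_numpy(), length)

# Reference result built with an explicit loop over the rows
expected = np.zeros(length, dtype=bool)
for s, e in zip(df['start'], df['end']):
    expected[s:e] = True

print(np.array_equal(vectorized, expected))
print(vectorized.sum())
```

The result is a boolean array, whereas the question's `section` is a float array that stores True as 1.0; `vectorized.astype(float)` would give the same representation if that matters.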
