Vectorizing `numpy.random.choice` for a given 2D array of probabilities along an axis

Posted 2021-01-29 17:25:10

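NumPy has the random.choice function, which lets you sample from a categorical distribution. How would you repeat this over an axis? To illustrate what I mean, here is my current code: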

import numpy as np

categorical_distributions = np.array([
    [.1, .3, .6],
    [.2, .4, .4],
])
_, n = categorical_distributions.shape
np.array([np.random.choice(n, p=row)
          for row in categorical_distributions])

Ideally, I would like to eliminate the for loop.

1 answer
  • 面试哥
    面试哥 2021-01-29

    Here is a vectorized way to get one random index per row, where a is the 2D array of probabilities:

    (a.cumsum(1) > np.random.rand(a.shape[0])[:,None]).argmax(1)
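
To see why the one-liner works, it can help to split it into its three steps. The sketch below uses the probabilities from the question and a hypothetical fixed vector u in place of the random draws (an assumption made only so the intermediate values are reproducible); the names cdf, mask and idx are illustrative and not from the original answer.

```python
import numpy as np

# Probabilities from the question: one categorical distribution per row.
a = np.array([
    [.1, .3, .6],
    [.2, .4, .4],
])

# Hypothetical fixed stand-in for np.random.rand(a.shape[0]), one draw per row.
u = np.array([0.05, 0.75])

# Step 1: running totals along each row (an empirical CDF per distribution).
cdf = a.cumsum(axis=1)

# Step 2: compare each row's running totals with that row's draw.
# u[:, None] has shape (2, 1), so it broadcasts across the columns.
mask = cdf > u[:, None]

# Step 3: argmax on a boolean row returns the position of the first True,
# i.e. the first bin whose running total exceeds the draw.
idx = mask.argmax(axis=1)

print(idx)
```

With these draws, the first row picks index 0 (0.05 falls inside the first bin) and the second row picks index 2 (0.75 lies past the running total of the first two bins).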
    

    Generalizing to cover both the rows and the columns of a 2D array:

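    def random_choice_prob_index(a, axis=1):
        # One uniform draw per distribution, shaped to broadcast against a along axis
        r = np.expand_dims(np.random.rand(a.shape[1-axis]), axis=axis)
        # The first cumulative sum that exceeds the draw marks the sampled index
        return (a.cumsum(axis=axis) > r).argmax(axis=axis)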
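
One caveat with the cumulative-sum comparison: floating-point rounding can leave the last running total slightly below 1, in which case a draw very close to 1 exceeds every entry and argmax on an all-False row returns 0. A possible variant, sketched below, rescales the running totals by their last value and takes an optional np.random.Generator for reproducible draws. The name random_choice_prob_index_rng and the rng parameter are assumptions for illustration, not part of the original answer, and the sketch assumes every distribution has a positive sum.

```python
import numpy as np

def random_choice_prob_index_rng(a, axis=1, rng=None):
    """Hypothetical variant of random_choice_prob_index for a 2D array a."""
    # Fall back to a fresh Generator when the caller does not supply one.
    rng = np.random.default_rng() if rng is None else rng
    a = np.asarray(a, dtype=float)

    # Running totals along the chosen axis.
    cdf = a.cumsum(axis=axis)

    # Divide by the final running total so the last entry is exactly 1.0;
    # draws lie in [0, 1), so at least one comparison is then True.
    last = np.take(cdf, [cdf.shape[axis] - 1], axis=axis)
    cdf = cdf / last

    # One uniform draw per distribution, shaped to broadcast along axis.
    r = np.expand_dims(rng.random(a.shape[1 - axis]), axis=axis)
    return (cdf > r).argmax(axis=axis)


a = np.array([
    [.1, .3, .6],
    [.2, .4, .4],
])

# Rows as distributions (axis=1), seeded for repeatability.
row_idx = random_choice_prob_index_rng(a, axis=1, rng=np.random.default_rng(0))

# Columns as distributions (axis=0), using the transposed array.
col_idx = random_choice_prob_index_rng(a.T, axis=0, rng=np.random.default_rng(0))
```

Because both calls consume the same seeded draws, row_idx and col_idx should agree here; a side effect of the rescaling is that unnormalized weights are accepted as well.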
    

    Let's verify against the given sample by running it a million times:

    In [589]: a = np.array([
         ...:     [.1, .3, .6],
         ...:     [.2, .4, .4],
         ...: ])
    
    In [590]: choices = [random_choice_prob_index(a)[0] for i in range(1000000)]
    
    # This should be close to first row of given sample
    In [591]: np.bincount(choices)/float(len(choices))
    Out[591]: array([ 0.099781,  0.299436,  0.600783])
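
The check above only looks at the first row and calls the function once per sample. A rough sketch of the same check for every row at once follows, stacking copies of a so that a single vectorized call produces all draws. The sample size n_draws and the fixed seed are assumptions chosen for illustration, and the function is repeated so the block is self-contained.

```python
import numpy as np

def random_choice_prob_index(a, axis=1):
    r = np.expand_dims(np.random.rand(a.shape[1-axis]), axis=axis)
    return (a.cumsum(axis=axis) > r).argmax(axis=axis)

a = np.array([
    [.1, .3, .6],
    [.2, .4, .4],
])

n_draws = 200_000   # assumed sample size; larger values tighten the estimate
np.random.seed(0)   # fixed seed only to make the run repeatable

# np.tile stacks n_draws copies of a, so rows alternate: row 0, row 1, row 0, ...
stacked = np.tile(a, (n_draws, 1))

# One call samples every stacked row; after the reshape, column j holds
# all draws that came from distribution j.
samples = random_choice_prob_index(stacked).reshape(n_draws, a.shape[0])

# Empirical frequency of each index, one row per distribution.
freqs = np.stack([
    np.bincount(samples[:, j], minlength=a.shape[1])
    for j in range(a.shape[0])
]) / n_draws

print(freqs)
```

Each row of freqs should land close to the corresponding row of a.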
    

    Runtime test

    The original loop-based approach:

    def loopy_app(categorical_distributions):
        m, n = categorical_distributions.shape
        out = np.empty(m, dtype=int)
        for i,row in enumerate(categorical_distributions):
            out[i] = np.random.choice(n, p=row)
        return out
    

    Timings on a bigger array:

    In [593]: a = np.array([
         ...:     [.1, .3, .6],
         ...:     [.2, .4, .4],
         ...: ])
    
    In [594]: a_big = np.repeat(a,100000,axis=0)
    
    In [595]: %timeit loopy_app(a_big)
    1 loop, best of 3: 2.54 s per loop
    
    In [596]: %timeit random_choice_prob_index(a_big)
    100 loops, best of 3: 6.44 ms per loop
    

