Summing values across rows that share the same value
Posted on 2021-01-29 16:37:24
I have a pandas dataset like this:
city difference
NY 6
SF 8
LA 8
NY 9
SF 10
I want to sum the values of the difference column, grouped by the city column, so that the final dataset looks like:
city difference total difference
NY 6 15
NY 9
LA 8 8
SF 10 10
I tried:
df['total difference'] = df.groupby('city')['difference'].sum()
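But this didn't work. I even tried the approach from "How do I sum values of specific rows in pandas?", but got NaN for the values of the new column. Please help!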
1 answer
I think you need transform:

    df['total difference'] = df.groupby('city')['difference'].transform(sum)
    print (df)
      city  difference  total difference
    0   NY           6                15
    1   SF           8                18
    2   LA           8                 8
    3   NY           9                15
    4   SF          10                18
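As a minimal sketch of why the assignment in the question produced NaN while transform works: `sum()` returns one row per city, indexed by the city labels, and pandas aligns on index labels when assigning a Series to a column. The variable names `totals`, `misaligned`, and `via_map` are illustrative only, and the `map` variant at the end is an alternative not mentioned in the answer itself.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["NY", "SF", "LA", "NY", "SF"],
    "difference": [6, 8, 8, 9, 10],
})

# sum() collapses each group to a single row, indexed by city label.
totals = df.groupby("city")["difference"].sum()
print(totals.index.tolist())  # ['LA', 'NY', 'SF']

# Column assignment aligns on index labels. The labels 'LA', 'NY', 'SF'
# never match the row labels 0..4, so every value becomes NaN.
misaligned = df.assign(**{"total difference": totals})
print(misaligned["total difference"].isna().all())  # True

# transform('sum') broadcasts each group's sum back onto the original
# row index, so the result lines up with df row by row.
df["total difference"] = df.groupby("city")["difference"].transform("sum")
print(df["total difference"].tolist())  # [15, 18, 8, 15, 18]

# Alternative (an assumption, not from the answer): look up each row's
# city in the aggregated totals.
via_map = df["city"].map(totals)
print(via_map.tolist())  # [15, 18, 8, 15, 18]
```

Either form keeps one total per original row, which is what the desired output in the question requires.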
And if you also need the rows sorted by city:

    df['total difference'] = df.groupby('city')['difference'].transform('sum')
    df = df.sort_values('city')
    print (df)
      city  difference  total difference
    2   LA           8                 8
    0   NY           6                15
    3   NY           9                15
    1   SF           8                18
    4   SF          10                18
I was curious whether the choice of function makes a difference; the timings are very similar:

    import numpy as np
    import pandas as pd

    # [10000000 rows x 2 columns]
    np.random.seed(100)
    df = pd.DataFrame(np.random.randint(1000, size=(10000000,2)), columns=['city','difference'])
    #print (df)

    In [293]: %timeit (df.groupby('city')['difference'].transform('sum'))
    1 loop, best of 3: 570 ms per loop

    In [294]: %timeit (df.groupby('city')['difference'].transform(sum))
    1 loop, best of 3: 567 ms per loop

    In [295]: %timeit (df.groupby('city')['difference'].transform(np.sum))
    1 loop, best of 3: 561 ms per loop
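Outside IPython, a comparison like the one above could be sketched with the standard-library `timeit` module. This is only an illustration under assumptions: the frame is shrunk to 100,000 rows so the script finishes quickly, the names `frame`, `grouped`, `by_name`, and `by_numpy` are hypothetical, and the builtin `sum` variant is left out because recent pandas versions may emit a FutureWarning when a builtin or NumPy reducer is passed instead of the string alias. No timings are quoted here, since they depend on the machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Assumption: 100,000 rows instead of the 10,000,000 used above.
rng = np.random.default_rng(100)
frame = pd.DataFrame(
    rng.integers(1000, size=(100_000, 2)),
    columns=["city", "difference"],
)
grouped = frame.groupby("city")["difference"]

# Both spellings should produce the same per-row group totals.
by_name = grouped.transform("sum")
by_numpy = grouped.transform(np.sum)
print((by_name == by_numpy).all())

# Time each variant a few times and report the average per call.
for label, func in [("'sum'", "sum"), ("np.sum", np.sum)]:
    seconds = timeit.timeit(lambda: grouped.transform(func), number=3)
    print(f"transform({label}): {seconds / 3:.4f} s per call")
```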