熊猫:过去n天的平均值
我有一个这样的Pandas
数据框:
test = pd.DataFrame({ 'Date' : ['2016-04-01','2016-04-01','2016-04-02',
'2016-04-02','2016-04-03','2016-04-04',
'2016-04-05','2016-04-06','2016-04-06'],
'User' : ['Mike','John','Mike','John','Mike','Mike',
'Mike','Mike','John'],
'Value' : [1,2,1,3,4.5,1,2,3,6]
})
如下所示,数据集不一定每天都有观察值:
Date User Value
0 2016-04-01 Mike 1.0
1 2016-04-01 John 2.0
2 2016-04-02 Mike 1.0
3 2016-04-02 John 3.0
4 2016-04-03 Mike 4.5
5 2016-04-04 Mike 1.0
6 2016-04-05 Mike 2.0
7 2016-04-06 Mike 3.0
8 2016-04-06 John 6.0
我想添加一个新列,该列显示至少n天中每个用户过去n天(在这种情况下,n =
2)的nan
平均值,否则它将有价值。例如,在2016-04-06
John上得到一个,nan
因为他没有2016-04-05
和的数据2016-04-04
。因此结果将是这样的:
Date User Value Value_Average_Past_2_days
0 2016-04-01 Mike 1.0 NaN
1 2016-04-01 John 2.0 NaN
2 2016-04-02 Mike 1.0 1.00
3 2016-04-02 John 3.0 2.00
4 2016-04-03 Mike 4.5 1.00
5 2016-04-04 Mike 1.0 2.75
6 2016-04-05 Mike 2.0 2.75
7 2016-04-06 Mike 3.0 1.50
8 2016-04-06 John 6.0 NaN
在阅读了论坛中的几篇文章之后,我似乎应该结合使用group_by
和自定义rolling_mean
,但是我还不太清楚该怎么做。
-
我认为您可以使用第一个convert列
Date
to_datetime
,然后Days
通过groupby
withresample
和last查找丢失apply
rolling
test['Date'] = pd.to_datetime(test['Date']) df = test.groupby('User').apply(lambda x: x.set_index('Date').resample('1D').first()) print df User Value User Date John 2016-04-01 John 2.0 2016-04-02 John 3.0 2016-04-03 NaN NaN 2016-04-04 NaN NaN 2016-04-05 NaN NaN 2016-04-06 John 6.0 Mike 2016-04-01 Mike 1.0 2016-04-02 Mike 1.0 2016-04-03 Mike 4.5 2016-04-04 Mike 1.0 2016-04-05 Mike 2.0 df1 = df.groupby(level=0)['Value'] .apply(lambda x: x.shift().rolling(min_periods=1,window=2).mean()) .reset_index(name='Value_Average_Past_2_days') print df1 User Date Value_Average_Past_2_days 0 John 2016-04-01 NaN 1 John 2016-04-02 2.00 2 John 2016-04-03 2.50 3 John 2016-04-04 3.00 4 John 2016-04-05 NaN 5 John 2016-04-06 NaN 6 Mike 2016-04-01 NaN 7 Mike 2016-04-02 1.00 8 Mike 2016-04-03 1.00 9 Mike 2016-04-04 2.75 10 Mike 2016-04-05 2.75 11 Mike 2016-04-06 1.50 print pd.merge(test, df1, on=['Date', 'User'], how='left') Date User Value Value_Average_Past_2_days 0 2016-04-01 Mike 1.0 NaN 1 2016-04-01 John 2.0 NaN 2 2016-04-02 Mike 1.0 1.00 3 2016-04-02 John 3.0 2.00 4 2016-04-03 Mike 4.5 1.00 5 2016-04-04 Mike 1.0 2.75 6 2016-04-05 Mike 2.0 2.75 7 2016-04-06 Mike 3.0 1.50 8 2016-04-06 John 6.0 NaN