pandas concat produces NaN values
Posted on 2021-01-29 19:04:04
I am curious why simply concatenating two data frames in pandas:
shape: (66441, 1)
dtypes: prediction int64
dtype: object
isnull().sum(): prediction 0
dtype: int64
shape: (66441, 1)
CUSTOMER_ID int64
dtype: object
isnull().sum() CUSTOMER_ID 0
dtype: int64
which have the same shape and contain no NaN values
foo = pd.concat([initId, ypred], join='outer', axis=1)
print(foo.shape)
print(foo.isnull().sum())
results in a lot of NaN values once they are joined:
(83384, 2)
CUSTOMER_ID 16943
prediction 16943
How can I fix this and prevent NaN values from being introduced?
I tried to reproduce it like this:
import pandas as pd
aaa = pd.DataFrame([0,1,0,1,0,0], columns=['prediction'])
print(aaa)
bbb = pd.DataFrame([0,0,1,0,1,1], columns=['groundTruth'])
print(bbb)
pd.concat([aaa, bbb], axis=1)
but the reproduction fails: no NaN values are introduced, so this example works fine.
1 Answer
I think the problem is that the index values differ, so concat cannot align the rows and fills the gaps with NaN:
aaa = pd.DataFrame([0,1,0,1,0,0], columns=['prediction'], index=[4,5,8,7,10,12])
print(aaa)
    prediction
4            0
5            1
8            0
7            1
10           0
12           0
bbb = pd.DataFrame([0,0,1,0,1,1], columns=['groundTruth'])
print(bbb)
   groundTruth
0            0
1            0
2            1
3            0
4            1
5            1
print(pd.concat([aaa, bbb], axis=1))
    prediction  groundTruth
0          NaN          0.0
1          NaN          0.0
2          NaN          1.0
3          NaN          0.0
4          0.0          1.0
5          1.0          1.0
7          1.0          NaN
8          0.0          NaN
10         0.0          NaN
12         0.0          NaN
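To check whether mismatched indexes are really the cause in a case like the question's, the two indexes can be compared before concatenating. The following is a minimal sketch, not the asker's data: the frames `left` and `right` and their values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for the question's frames: same length, different index labels.
left = pd.DataFrame({"CUSTOMER_ID": [101, 102, 103]}, index=[0, 1, 2])
right = pd.DataFrame({"prediction": [1, 0, 1]}, index=[1, 2, 5])

# False means concat(axis=1) has to build the result from the union of the labels.
same_index = left.index.equals(right.index)
print(same_index)

# Labels that exist in only one frame; each one yields a row containing NaN.
only_left = left.index.difference(right.index)
only_right = right.index.difference(left.index)
print(list(only_left), list(only_right))

combined = pd.concat([left, right], axis=1)
print(combined.shape)
print(combined.isnull().sum())
```

The numbers in the question fit this pattern: 83384 - 66441 = 16943 extra rows, which equals the NaN count in each column, so it looks as if 16943 labels occur in only one of the two frames.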
The solution is reset_index, if the index values are not needed:
aaa.reset_index(drop=True, inplace=True)
bbb.reset_index(drop=True, inplace=True)
print(aaa)
   prediction
0           0
1           1
2           0
3           1
4           0
5           0
print(bbb)
   groundTruth
0            0
1            0
2            1
3            0
4            1
5            1
print(pd.concat([aaa, bbb], axis=1))
   prediction  groundTruth
0           0            0
1           1            0
2           0            1
3           1            0
4           0            1
5           0            1
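The same idea can be applied without modifying the original frames by resetting the index inside the concat call. This is a sketch under assumptions: `initId` and `ypred` are the names used in the question, but the small frames below are invented stand-ins for the real data.

```python
import pandas as pd

# Invented stand-in data; in the question these frames have 66441 rows each.
initId = pd.DataFrame({"CUSTOMER_ID": [11, 12, 13]}, index=[7, 3, 9])
ypred = pd.DataFrame({"prediction": [0, 1, 0]})

# reset_index(drop=True) returns a new frame with a fresh 0..n-1 index,
# so both inputs align row by row and the originals stay untouched.
foo = pd.concat(
    [initId.reset_index(drop=True), ypred.reset_index(drop=True)],
    axis=1,
)
print(foo.shape)
print(foo.isnull().sum())
```

Note that this pairs rows purely by position, so it only gives a meaningful result if both frames are already in the same row order.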