Collapse rows in a Pandas DataFrame if each column has only one value across all rows
Posted on 2021-01-29 18:19:39
I have the following DF
col1 | col2 | col3 | col4 | col5 | col6
0 - | 15.0 | - | - | - | -
1 - | - | - | - | - | US
2 - | - | - | Large | - | -
3 ABC1 | - | - | - | - | -
4 - | - | 24RA | - | - | -
5 - | - | - | - | 345 | -
I want to collapse the rows into a single row, as follows
output DF:
col1 | col2 | col3 | col4 | col5 | col6
0 ABC1 | 15.0 | 24RA | Large | 345 | US
I don't want to iterate over the columns; I would like to use pandas to achieve this.
1 Answer
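Option 0
Super simple

    import pandas as pd

    # drop the nulls in each column, then glue the leftovers back together side by side
    pd.concat([pd.Series(df[c].dropna().values, name=c) for c in df], axis=1)

       col1  col2  col3   col4   col5 col6
    0  ABC1  15.0  24RA  Large  345.0   US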
Can it handle more than one value per column?
Sure we can!

    df.loc[2, 'col3'] = 'Test'

       col1  col2  col3   col4   col5 col6
    0  ABC1  15.0  Test  Large  345.0   US
    1   NaN   NaN  24RA    NaN    NaN  NaN
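For anyone who wants to run Option 0 end to end, here is a minimal sketch. It assumes the blank cells in the question are NaN (which is what `dropna` needs); the helper names `make_df` and `collapse_dropna` are made up for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd


def make_df():
    """Rebuild the question's frame, assuming blank cells are NaN."""
    nan = np.nan
    return pd.DataFrame({
        "col1": [nan, nan, nan, "ABC1", nan, nan],
        "col2": [15.0, nan, nan, nan, nan, nan],
        "col3": [nan, nan, nan, nan, "24RA", nan],
        "col4": [nan, nan, "Large", nan, nan, nan],
        "col5": [nan, nan, nan, nan, nan, 345],
        "col6": [nan, "US", nan, nan, nan, nan],
    })


def collapse_dropna(df):
    """Drop the nulls in every column and realign what is left on 0..n-1."""
    return pd.concat(
        [pd.Series(df[c].dropna().values, name=c) for c in df], axis=1
    )


df = make_df()
one_row = collapse_dropna(df)    # one value per column: a single row

df.loc[2, "col3"] = "Test"       # col3 now holds two values
two_rows = collapse_dropna(df)   # shorter columns are padded with NaN

print(one_row)
print(two_rows)
```

Because every per-column Series gets a fresh 0..n-1 index, `pd.concat` aligns them row by row, and columns with fewer values end up padded with NaN.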
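Option 1
Using np.where
A generalized solution, done like a surgeon

    import numpy as np
    import pandas as pd

    v = df.values
    # positions of the non-null cells; pd.notna also works on object arrays,
    # where np.isnan raises a TypeError (for literal '-' blanks use v != '-')
    i, j = np.where(pd.notna(v))
    s = pd.Series(v[i, j], df.columns[j])

    # number the values within each column, then pivot the columns back out
    c = s.groupby(level=0).cumcount()
    s.index = [c, s.index]
    s.unstack(fill_value='-')  # <-- don't fill to get NaN

       col1  col2  col3   col4 col5 col6
    0  ABC1  15.0  24RA  Large  345   US

With more than one value in a column:

    df.loc[2, 'col3'] = 'Test'

    v = df.values
    i, j = np.where(pd.notna(v))
    s = pd.Series(v[i, j], df.columns[j])

    c = s.groupby(level=0).cumcount()
    s.index = [c, s.index]
    s.unstack(fill_value='-')  # <-- don't fill to get NaN

       col1  col2  col3   col4 col5 col6
    0  ABC1  15.0  Test  Large  345   US
    1     -     -  24RA      -    -    -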
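A self-contained sketch of Option 1, for reference. It assumes the blanks are NaN and uses `pd.notna` to locate the non-null cells, since that test also works on the object array returned by `df.values`; that choice, the explicit `pd.MultiIndex.from_arrays` call, and the helper names `make_df` and `collapse_where` are assumptions made here for illustration.

```python
import numpy as np
import pandas as pd


def make_df():
    """Rebuild the question's frame, assuming blank cells are NaN."""
    nan = np.nan
    return pd.DataFrame({
        "col1": [nan, nan, nan, "ABC1", nan, nan],
        "col2": [15.0, nan, nan, nan, nan, nan],
        "col3": [nan, nan, nan, nan, "24RA", nan],
        "col4": [nan, nan, "Large", nan, nan, nan],
        "col5": [nan, nan, nan, nan, nan, 345],
        "col6": [nan, "US", nan, nan, nan, nan],
    })


def collapse_where(df, fill_value="-"):
    """Collect the non-null cells and pivot them back out by column."""
    v = df.values
    i, j = np.where(pd.notna(v))           # row/column positions of real values
    s = pd.Series(v[i, j], index=df.columns[j])
    c = s.groupby(level=0).cumcount()      # 0, 1, ... within each column label
    s.index = pd.MultiIndex.from_arrays([c.to_numpy(), s.index])
    return s.unstack(fill_value=fill_value)


df = make_df()
result_one = collapse_where(df)

df.loc[2, "col3"] = "Test"
result_two = collapse_where(df)

print(result_one)
print(result_two)
```

The `cumcount` step is what lets a column carry several values: the first value found in a column lands in row 0, the second in row 1, and so on, while `fill_value` pads the shorter columns.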
Option 2
Use mask to turn the '-' cells into nulls, then stack to get rid of them.

    # This should work even if the `'-'` cells are NaN,
    # but in that case you can skip the `.mask(df == '-')`
    s = df.mask(df == '-').stack().reset_index(0, drop=True)

    c = s.groupby(level=0).cumcount()
    s.index = [c, s.index]
    s.unstack(fill_value='-')

       col1  col2  col3   col4 col5 col6
    0  ABC1  15.0  Test  Large  345   US
    1     -     -  24RA      -    -    -
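Here is a runnable sketch of Option 2, this time assuming the blanks are literal '-' strings, as the comment in the answer suggests. The explicit `.dropna()` after `stack()` is an addition of this sketch, not part of the original answer: it is there as a precaution in case the installed pandas version keeps nulls when stacking. The helper names `make_dash_df` and `collapse_stack` are hypothetical.

```python
import pandas as pd


def make_dash_df():
    """The question's frame with literal '-' placeholders and a second col3 value."""
    return pd.DataFrame({
        "col1": ["-", "-", "-", "ABC1", "-", "-"],
        "col2": [15.0, "-", "-", "-", "-", "-"],
        "col3": ["-", "-", "Test", "-", "24RA", "-"],
        "col4": ["-", "-", "Large", "-", "-", "-"],
        "col5": ["-", "-", "-", "-", "-", 345],
        "col6": ["-", "US", "-", "-", "-", "-"],
    })


def collapse_stack(df, placeholder="-"):
    """Mask placeholders to null, stack, drop nulls, then pivot back by column."""
    s = (
        df.mask(df == placeholder)         # placeholder cells become null
          .stack()
          .dropna()                        # precaution, see the note above
          .reset_index(level=0, drop=True) # keep only the column-name level
    )
    c = s.groupby(level=0).cumcount()      # 0, 1, ... within each column label
    s.index = pd.MultiIndex.from_arrays([c.to_numpy(), s.index])
    return s.unstack(fill_value=placeholder)


result = collapse_stack(make_dash_df())
print(result)
```

If the frame already holds NaN instead of '-', the mask step simply finds nothing to replace, so the same function still applies.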