Python Pandas - Find the difference between two data frames

Posted on 2022-07-28 23:01:44

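I have two data frames, df1 and df2, where df2 is a subset of df1. How do I get a new data frame (df3) that is the difference between the two?

In other words, a data frame that contains all the rows/columns of df1 that are not in df2?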

1 Answer

面试哥 2022-07-28

Using drop_duplicates:

pd.concat([df1,df2]).drop_duplicates(keep=False)
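A minimal, self-contained sketch of the one-liner above, assuming two hypothetical frames in which no row is repeated inside either frame. Because `keep=False` discards every copy of a repeated row, any row present in both df1 and df2 vanishes and only the rows unique to one of the frames remain. Strictly speaking this is a symmetric difference, so rows that exist only in df2 would also survive; that does not matter here because df2 is a subset of df1.

```python
import pandas as pd

# Hypothetical example frames: no duplicate rows inside either frame
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [2, 3, 4]})
df2 = pd.DataFrame({'A': [1], 'B': [2]})

# Stack both frames, then drop every row that occurs more than once.
# keep=False removes all copies of a duplicated row, not just the extras,
# so a row shared by df1 and df2 disappears entirely.
df3 = pd.concat([df1, df2]).drop_duplicates(keep=False)
print(df3)  # keeps df1's rows with index 1 and 2
```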

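Update:

The above method only works when neither data frame already contains duplicate rows of its own. Since keep=False drops every copy of a repeated row, a row that is duplicated within df1 is discarded even if it never appears in df2. For example: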

import pandas as pd

df1=pd.DataFrame({'A':[1,2,3,3],'B':[2,3,4,4]})
df2=pd.DataFrame({'A':[1],'B':[2]})

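This produces the output below, which is wrong: the row (3, 4) is dropped even though it does not appear in df2.

Wrong output: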

pd.concat([df1, df2]).drop_duplicates(keep=False)
Out[655]: 
   A  B
1  2  3

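The correct output should keep rows 1, 2, and 3 of df1, since only row 0 also appears in df2.

How can this be done?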

Method 1: use isin with tuple

df1[~df1.apply(tuple,1).isin(df2.apply(tuple,1))]
Out[657]: 
   A  B
1  2  3
2  3  4
3  3  4
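A possible variant of Method 1, sketched under the assumption that df1 and df2 share the same columns in the same order. Instead of converting each row with `apply(tuple, 1)`, which loops over rows in Python, it builds a `MultiIndex` from each frame (one level per column) and tests membership with `MultiIndex.isin`. Like Method 1, it keeps df1's original index and its duplicate rows.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3, 3], 'B': [2, 3, 4, 4]})
df2 = pd.DataFrame({'A': [1], 'B': [2]})

# One MultiIndex key per row: each key is the tuple of that row's values.
keys1 = pd.MultiIndex.from_frame(df1)
keys2 = pd.MultiIndex.from_frame(df2)

# isin returns a boolean array aligned with df1's rows; negating it keeps
# the rows of df1 that never appear in df2, duplicates included.
df3 = df1[~keys1.isin(keys2)]
print(df3)
```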

Method 2: merge with indicator

df1.merge(df2,indicator = True, how='left').loc[lambda x : x['_merge']!='both']
Out[421]: 
   A  B     _merge
1  2  3  left_only
2  3  4  left_only
3  3  4  left_only
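Method 2 can be wrapped in a small reusable function that also removes the helper `_merge` column. This is a sketch only: `rows_not_in` is a hypothetical name, and it assumes `right` contains every column of `left`. Note that `merge` returns a fresh integer index rather than df1's original labels; calling `reset_index()` on `left` beforehand would be one way to keep them.

```python
import pandas as pd

def rows_not_in(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Return the rows of `left` that have no identical row in `right` (hypothetical helper)."""
    merged = left.merge(
        right.drop_duplicates(),  # stop matched rows from multiplying when right has repeats
        how='left',
        on=list(left.columns),    # compare on every column of left
        indicator=True,           # adds a '_merge' column: 'left_only' or 'both'
    )
    # Keep rows found only in left, then drop the helper column.
    return merged[merged['_merge'] == 'left_only'].drop(columns='_merge')

df1 = pd.DataFrame({'A': [1, 2, 3, 3], 'B': [2, 3, 4, 4]})
df2 = pd.DataFrame({'A': [1], 'B': [2]})
df3 = rows_not_in(df1, df2)
print(df3)
```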
