How to drop rows of a Pandas DataFrame whose value in a certain column is NaN

Posted on 2021-02-02 23:19:11

I have this DataFrame and want only the records whose EPS column is not NaN:

>>> df
                 STK_ID  EPS  cash
STK_ID RPT_Date                   
601166 20111231  601166  NaN   NaN
600036 20111231  600036  NaN    12
600016 20111231  600016  4.3   NaN
601009 20111231  601009  NaN   NaN
601939 20111231  601939  2.5   NaN
000001 20111231  000001  NaN   NaN

...that is, something like df.drop(....) to get this resulting DataFrame:

                  STK_ID  EPS  cash
STK_ID RPT_Date                   
600016 20111231  600016  4.3   NaN
601939 20111231  601939  2.5   NaN

How do I do that?

2 answers
  • 面试哥
    面试哥 2021-02-02

    Don't drop. Just take the rows where EPS is finite:

    import numpy as np
    
    df = df[np.isfinite(df['EPS'])]
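
As a quick check, the filter above can be tried on a frame rebuilt from the question's printout. This is only a sketch: the string dtype of the STK_ID and RPT_Date labels is an assumption, and `Series.notna()` is shown as an alternative that is not part of the original answer. The two masks agree here, but they differ on infinite values, and `np.isfinite` may raise a `TypeError` on non-numeric (object) columns.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame (label dtypes are assumed to be strings).
stk_ids = ["601166", "600036", "600016", "601009", "601939", "000001"]
index = pd.MultiIndex.from_tuples(
    [(stk, "20111231") for stk in stk_ids],
    names=["STK_ID", "RPT_Date"],
)
df = pd.DataFrame(
    {
        "STK_ID": stk_ids,
        "EPS": [np.nan, np.nan, 4.3, np.nan, 2.5, np.nan],
        "cash": [np.nan, 12, np.nan, np.nan, np.nan, np.nan],
    },
    index=index,
)

# The answer's approach: keep rows whose EPS is a finite number.
finite = df[np.isfinite(df["EPS"])]

# Alternative mask: keep rows whose EPS is not missing.
not_null = df[df["EPS"].notna()]

print(finite)

# The two masks differ only for infinite values:
# isfinite rejects inf, while notna keeps it.
s = pd.Series([1.0, np.inf, np.nan])
print(np.isfinite(s).tolist())  # [True, False, False]
print(s.notna().tolist())       # [True, True, False]
```

Both `finite` and `not_null` contain only the rows for 600016 and 601939, which is the result the question asks for.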
    


  • 面试哥
    面试哥 2021-02-02

    This question is already resolved, but...

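    ...also consider the solution suggested by Wouter in his original comment. The ability to handle missing data, including dropna(), is built into pandas explicitly. Aside from potentially better performance than doing it manually, these functions also come with a variety of options that may be useful.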

    In [24]: df = pd.DataFrame(np.random.randn(10,3))
    
    In [25]: df.iloc[::2,0] = np.nan; df.iloc[::4,1] = np.nan; df.iloc[::3,2] = np.nan;
    
    In [26]: df
    Out[26]:
              0         1         2
    0       NaN       NaN       NaN
    1  2.677677 -1.466923 -0.750366
    2       NaN  0.798002 -0.906038
    3  0.672201  0.964789       NaN
    4       NaN       NaN  0.050742
    5 -1.250970  0.030561 -2.678622
    6       NaN  1.036043       NaN
    7  0.049896 -0.308003  0.823295
    8       NaN       NaN  0.637482
    9 -0.310130  0.078891       NaN
    
    In [27]: df.dropna()     #drop all rows that have any NaN values
    Out[27]:
              0         1         2
    1  2.677677 -1.466923 -0.750366
    5 -1.250970  0.030561 -2.678622
    7  0.049896 -0.308003  0.823295
    
    In [28]: df.dropna(how='all')     #drop only if ALL columns are NaN
    Out[28]:
              0         1         2
    1  2.677677 -1.466923 -0.750366
    2       NaN  0.798002 -0.906038
    3  0.672201  0.964789       NaN
    4       NaN       NaN  0.050742
    5 -1.250970  0.030561 -2.678622
    6       NaN  1.036043       NaN
    7  0.049896 -0.308003  0.823295
    8       NaN       NaN  0.637482
    9 -0.310130  0.078891       NaN
    
    In [29]: df.dropna(thresh=2)   #Drop row if it does not have at least two values that are **not** NaN
    Out[29]:
              0         1         2
    1  2.677677 -1.466923 -0.750366
    2       NaN  0.798002 -0.906038
    3  0.672201  0.964789       NaN
    5 -1.250970  0.030561 -2.678622
    7  0.049896 -0.308003  0.823295
    9 -0.310130  0.078891       NaN
    
    In [30]: df.dropna(subset=[1])   #Drop only if NaN in specific column (as asked in the question)
    Out[30]:
              0         1         2
    1  2.677677 -1.466923 -0.750366
    2       NaN  0.798002 -0.906038
    3  0.672201  0.964789       NaN
    5 -1.250970  0.030561 -2.678622
    6       NaN  1.036043       NaN
    7  0.049896 -0.308003  0.823295
    9 -0.310130  0.078891       NaN
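
Applied to the question's data, the `subset` form might look like the sketch below. The frame is a simplified copy of the question's printout (a default integer index instead of the original MultiIndex, which is an assumption made to keep the example short), and the second call with `how="all"` is an extra illustration that the question does not ask for.

```python
import numpy as np
import pandas as pd

# Simplified copy of the question's data, without the MultiIndex.
df = pd.DataFrame(
    {
        "STK_ID": ["601166", "600036", "600016", "601009", "601939", "000001"],
        "EPS": [np.nan, np.nan, 4.3, np.nan, 2.5, np.nan],
        "cash": [np.nan, 12.0, np.nan, np.nan, np.nan, np.nan],
    }
)

# Drop rows where EPS is NaN; NaN in other columns is ignored.
eps_present = df.dropna(subset=["EPS"])

# Drop rows only when EPS and cash are both NaN.
any_present = df.dropna(subset=["EPS", "cash"], how="all")

print(eps_present)
print(any_present)
```

`dropna` returns a new frame and leaves `df` unchanged, so the result has to be assigned back if the original name should hold the filtered data.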
    

    There are other options as well (see the docs at http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html), including dropping columns instead of rows.
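
For dropping columns rather than rows, a minimal sketch is shown below. The small frame and its column names `a`, `b`, `c` are made up for illustration and do not come from the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 'a' is complete, 'b' has one NaN, 'c' is entirely NaN.
df = pd.DataFrame(
    {
        "a": [1.0, 2.0, 3.0],
        "b": [np.nan, 5.0, 6.0],
        "c": [np.nan, np.nan, np.nan],
    }
)

# Drop every column that contains at least one NaN.
complete_cols = df.dropna(axis="columns")

# Drop only the columns in which every value is NaN.
non_empty_cols = df.dropna(axis="columns", how="all")

print(list(complete_cols.columns))   # ['a']
print(list(non_empty_cols.columns))  # ['a', 'b']
```

The `how`, `thresh`, and `subset` options work with `axis="columns"` as well; in that case `subset` takes row labels instead of column names.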


