Python

Convert a pandas.Series from dtype object to float, coercing errors to NaN

Posted on 2021-01-29 17:49:06

Consider the following situation:

In [2]: a = pd.Series([1,2,3,4,'.'])

In [3]: a
Out[3]: 
0    1
1    2
2    3
3    4
4    .
dtype: object

In [8]: a.astype('float64', raise_on_error = False)
Out[8]: 
0    1
1    2
2    3
3    4
4    .
dtype: object

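I would have expected an option that lets the conversion proceed while turning invalid values (such as that `.`) into NaN. Is there a way to achieve this?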

1 Answer

面试哥 2021-01-29

Use [`pd.to_numeric`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_numeric.html) with `errors='coerce'`.

# Setup
import pandas as pd

s = pd.Series(['1', '2', '3', '4', '.'])
s

0    1
1    2
2    3
3    4
4    .
dtype: object



pd.to_numeric(s, errors='coerce')

0    1.0
1    2.0
2    3.0
3    4.0
4    NaN
dtype: float64
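
Coercion silently discards whatever could not be parsed, so it can help to see which entries were affected. The following is a minimal sketch, not part of the original answer; the names `converted`, `coerced_mask`, and `bad_values` are only illustrative.

```python
import pandas as pd

# Same example data as above: four numeric strings and one invalid entry.
s = pd.Series(['1', '2', '3', '4', '.'])

converted = pd.to_numeric(s, errors='coerce')

# An entry was coerced if it is NaN after conversion but was not missing before.
coerced_mask = converted.isna() & s.notna()

# Original values that could not be parsed as numbers.
bad_values = s[coerced_mask]
print(bad_values.tolist())
```

Inspecting `bad_values` before moving on is a cheap way to confirm that only genuine placeholders were turned into NaN.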

If you need the NaNs filled in, use Series.fillna.

pd.to_numeric(s, errors='coerce').fillna(0, downcast='infer')

0    1
1    2
2    3
3    4
4    0
dtype: int64

Note that downcast='infer' will attempt to downcast floats to integers where possible. Remove the argument if you do not want that.
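
The `downcast` argument of `fillna` appears to be deprecated in recent pandas releases, so an explicit cast may be the safer choice there. This is a sketch under that assumption, not the original answer's method; the name `filled` is illustrative.

```python
import pandas as pd

s = pd.Series(['1', '2', '3', '4', '.'])

# Coerce invalid entries to NaN, replace them with 0,
# then cast explicitly to a 64-bit integer dtype.
filled = pd.to_numeric(s, errors='coerce').fillna(0).astype('int64')
print(filled.dtype)
```

Unlike downcast='infer', the explicit `astype('int64')` always produces integers, so any fractional values would be truncated. Use it only when the column is known to hold whole numbers.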

From v0.24 onward, pandas provides a nullable integer type that allows integers to coexist with NaNs. If your column contains integers, you can use:

pd.__version__
# '0.24.1'

pd.to_numeric(s, errors='coerce').astype('Int32')

0      1
1      2
2      3
3      4
4    NaN
dtype: Int32

There are other options to choose from as well; read the docs for more information.
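
As a small illustration of the nullable integer route, the sketch below uses `'Int64'` instead of `'Int32'` (both are nullable integer dtypes) and checks that the missing entry survives the cast. The variable names are illustrative, and newer pandas versions may display the missing value as `<NA>` rather than `NaN`.

```python
import pandas as pd

s = pd.Series(['1', '2', '3', '4', '.'])

# Coerce to float first, then cast to the nullable integer dtype.
nullable = pd.to_numeric(s, errors='coerce').astype('Int64')

# The column stays integer-typed and the invalid entry stays missing.
print(nullable.dtype)
print(nullable.isna().tolist())

# Reductions skip missing values by default.
total = nullable.sum()
print(total)
```

The cast is expected to fail if the coerced floats are not whole numbers, so this fits columns that are meant to be integers.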

Extension for `DataFrames`

If you need to extend this to DataFrames, you will need to apply it to each column. You can do this using DataFrame.apply.

# Setup.
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame({
    'A' : np.random.choice(10, 5), 
    'C' : np.random.choice(10, 5), 
    'B' : ['1', '###', '...', 50, '234'], 
    'D' : ['23', '1', '...', '268', '$$']}
)[list('ABCD')]
df

   A    B  C    D
0  5    1  9   23
1  0  ###  3    1
2  3  ...  5  ...
3  3   50  2  268
4  7  234  4   $$

df.dtypes

A     int64
B    object
C     int64
D    object
dtype: object



df2 = df.apply(pd.to_numeric, errors='coerce')
df2

   A      B  C      D
0  5    1.0  9   23.0
1  0    NaN  3    1.0
2  3    NaN  5    NaN
3  3   50.0  2  268.0
4  7  234.0  4    NaN

df2.dtypes

A      int64
B    float64
C      int64
D    float64
dtype: object

You can also do this with DataFrame.transform, although my tests indicate that it is marginally slower:

df.transform(pd.to_numeric, errors='coerce')

   A      B  C      D
0  5    1.0  9   23.0
1  0    NaN  3    1.0
2  3    NaN  5    NaN
3  3   50.0  2  268.0
4  7  234.0  4    NaN

If you have many columns (some numeric, some not), you can improve performance by applying pd.to_numeric to the non-numeric columns only.

df.dtypes.eq(object)

A    False
B     True
C    False
D     True
dtype: bool

cols = df.columns[df.dtypes.eq(object)]
# Actually, `cols` can be any list of columns you need to convert.
cols
# Index(['B', 'D'], dtype='object')

df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')
# Alternatively,
# for c in cols:
#     df[c] = pd.to_numeric(df[c], errors='coerce')

df

   A      B  C      D
0  5    1.0  9   23.0
1  0    NaN  3    1.0
2  3    NaN  5    NaN
3  3   50.0  2  268.0
4  7  234.0  4    NaN

For long DataFrames, applying pd.to_numeric along the columns (i.e. axis=0, the default) should be slightly faster.
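
The column-selection steps above can be wrapped in a small helper. This is a sketch only: `coerce_object_columns` is a hypothetical name, not a pandas function, and the sample frame is made up for illustration with its text columns built explicitly as object dtype.

```python
import pandas as pd


def coerce_object_columns(df):
    """Return a copy of df with object-dtype columns passed through pd.to_numeric.

    Hypothetical helper: values that cannot be parsed become NaN.
    """
    out = df.copy()
    cols = out.columns[out.dtypes.eq(object)]
    for c in cols:
        out[c] = pd.to_numeric(out[c], errors='coerce')
    return out


# Small made-up frame; B and D are created as object dtype on purpose.
df = pd.DataFrame({
    'A': [5, 0, 3],
    'B': pd.Series(['1', '###', 50], dtype=object),
    'D': pd.Series(['23', '...', '$$'], dtype=object),
})

result = coerce_object_columns(df)
print(result.dtypes)
```

Working on a copy leaves the original frame untouched, which makes it easy to compare before and after. Assign the result back to `df` if in-place behaviour is preferred.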
