
Python - How to apply a function to two columns of a Pandas DataFrame

Posted on 2021-02-02 23:16:11

Suppose I have a DataFrame df with the columns 'ID', 'col_1' and 'col_2'. I define a function:

```
f = lambda x, y: my_function_expression
```

Now I want to apply f to the two columns 'col_1' and 'col_2' of df to compute a new column 'col_3' element-wise, something like:

```
df['col_3'] = df[['col_1','col_2']].apply(f)
# Pandas gives: TypeError: ('<lambda>() takes exactly 2 arguments (1 given)'
```

How can I do this?

A detailed sample is added below:

```
import pandas as pd

df = pd.DataFrame({'ID': ['1', '2', '3'], 'col_1': [0, 2, 3], 'col_2': [1, 4, 5]})
mylist = ['a', 'b', 'c', 'd', 'e', 'f']

def get_sublist(sta, end):
    # Return mylist[sta] through mylist[end], inclusive
    return mylist[sta:end+1]

# df['col_3'] = df[['col_1','col_2']].apply(get_sublist, axis=1)
# (raises TypeError: each row is passed to get_sublist as a single Series)
# expect the line above to output df as below
```

```
  ID  col_1  col_2            col_3
0  1      0      1       ['a', 'b']
1  2      2      4  ['c', 'd', 'e']
2  3      3      5  ['d', 'e', 'f']
```

1 Answer

Answered by 面试哥, 2021-02-02

Here is an example of calling apply on a DataFrame with axis=1.

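Note the difference: instead of trying to pass two values to the function f, rewrite the function to accept a pandas Series object, then index the Series to get the values you need.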
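To see why the call in the question fails, here is a small sketch (reusing the question's sample df) that records what DataFrame.apply actually hands to the function. The helper name record and the lists seen, by_column and by_row are illustrative only. Each call receives exactly one argument, a Series: one per column with the default axis=0, one per row with axis=1. A two-argument function like f therefore cannot be passed directly.

```python
import pandas as pd

df = pd.DataFrame({'ID': ['1', '2', '3'], 'col_1': [0, 2, 3], 'col_2': [1, 4, 5]})

seen = []

def record(s):
    # apply passes a single positional argument: a pandas Series.
    # Copy its label and values immediately, since pandas may reuse the object.
    seen.append((s.name, s.tolist()))
    return len(s)

# Default axis=0: one call per column; s.name is the column label
df[['col_1', 'col_2']].apply(record)
by_column = list(seen)

# axis=1: one call per row; s.name is the row's index label
seen.clear()
df[['col_1', 'col_2']].apply(record, axis=1)
by_row = list(seen)

print(by_column)
print(by_row)
```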
```
In [49]: df
Out[49]: 
          0         1
0  1.000000  0.000000
1 -0.494375  0.570994
2  1.000000  0.000000
3  1.876360 -0.229738
4  1.000000  0.000000

In [50]: def f(x):    
   ....:  return x[0] + x[1]  
   ....:  

In [51]: df.apply(f, axis=1) #passes a Series object, row-wise
Out[51]: 
0    1.000000
1    0.076619
2    1.000000
3    1.646622
4    1.000000
```
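Applied to the question's own sample, the same idea is a sketch like the following: the lambda takes one row (a Series) and pulls out 'col_1' and 'col_2' by label before calling the unchanged get_sublist. A plain list comprehension over zip of the two columns is shown as an alternative that avoids apply; the variable name col_3_zip is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'ID': ['1', '2', '3'], 'col_1': [0, 2, 3], 'col_2': [1, 4, 5]})
mylist = ['a', 'b', 'c', 'd', 'e', 'f']

def get_sublist(sta, end):
    # Return mylist[sta] through mylist[end], inclusive
    return mylist[sta:end + 1]

# axis=1: the lambda receives one row as a Series and unpacks the two columns by label.
# Because get_sublist returns lists, apply produces a Series of lists.
df['col_3'] = df.apply(lambda row: get_sublist(row['col_1'], row['col_2']), axis=1)

# Alternative without apply: walk the two columns in parallel
col_3_zip = [get_sublist(sta, end) for sta, end in zip(df['col_1'], df['col_2'])]

print(df)
```

Both approaches produce ['a', 'b'], ['c', 'd', 'e'] and ['d', 'e', 'f'] for the three rows, matching the expected output in the question.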
Depending on your use case, it is sometimes helpful to create a pandas groupby object and then call apply on the groups.
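A minimal sketch of the groupby variant, assuming a hypothetical grouping column named 'group' (it is not part of the original question) and a hypothetical per-group function weighted_total. With groupby, apply passes each group's sub-DataFrame, so the function can use both columns of that group at once. Selecting only 'col_1' and 'col_2' after groupby keeps the grouping key out of the sub-DataFrames.

```python
import pandas as pd

# 'group' is a hypothetical key added for illustration
df = pd.DataFrame({
    'group': ['x', 'x', 'y'],
    'col_1': [0, 2, 3],
    'col_2': [1, 4, 5],
})

def weighted_total(g):
    # g is the sub-DataFrame for one group, containing col_1 and col_2
    return (g['col_1'] * g['col_2']).sum()

# One scalar per group -> a Series indexed by the group key
totals = df.groupby('group')[['col_1', 'col_2']].apply(weighted_total)
print(totals)
```

Here group 'x' gives 0*1 + 2*4 = 8 and group 'y' gives 3*5 = 15.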
