Apply function returns a DataFrame in pandas

Question

I have two dataframes, one with columns [a,b,c] and the other with columns [a,b,d], as follows:

import pandas as pd

matrix = [(222, 34, 23),
         (333, 31, 11),
         (444, 16, 21),
         (555, 32, 22),
         (666, 33, 27),
         (777, 35, 11)
         ]

# Create a DataFrame object
dfObj = pd.DataFrame(matrix, columns=list('abc'))

print(dfObj)



     a  b   c
0   222 34  23
1   333 31  11
2   444 16  21
3   555 32  22
4   666 33  27
5   777 35  11


matrix = [(222, 34, 5),
         (333, 31, 6),
         (444, 16, 7),
         (555, 32, 8),
         (666, 33, 9),
         (777, 35, 10)
         ]

# Create a DataFrame object
dfObj1 = pd.DataFrame(matrix, columns=list('abd'))

I want to construct a new dataframe with the columns [a,b,c,d], as follows:

def test_func(x):
    return dfObj1.d[dfObj1['a'].isin([x['a']])]
dfObj['d'] = dfObj.apply(test_func, axis = 1)

However, the output of dfObj.apply(test_func, axis = 1) is a dataframe, as shown below:

    1   2   3   4   5
1   6.0 NaN NaN NaN NaN
2   NaN 7.0 NaN NaN NaN
3   NaN NaN 8.0 NaN NaN
4   NaN NaN NaN 9.0 NaN
5   NaN NaN NaN NaN 10.0

I was expecting the following output: [6,7,8,9,10].

I know there are several ways to achieve this, but I am just trying to find out what I am doing wrong in this approach.

Answer 1

This works if the function returns a NumPy array via .values and you also pass the result_type='expand' parameter to DataFrame.apply:

def test_func(x):
    return  dfObj1.loc[dfObj1['a'].isin([x['a']]), 'd'].values

dfObj['d'] = dfObj.apply(test_func, axis = 1, result_type='expand')
print(dfObj)
     a   b   c   d
0  222  34  23   5
1  333  31  11   6
2  444  16  21   7
3  555  32  22   8
4  666  33  27   9
5  777  35  11  10

Another option, if you need a scalar that falls back to a missing value when nothing matches, is to use next with iter:

import numpy as np

def test_func(x):
    # first matching value of 'd', or NaN if no row of dfObj1 matches
    return next(iter(dfObj1.loc[dfObj1['a'].isin([x['a']]), 'd']), np.nan)

dfObj['d'] = dfObj.apply(test_func, axis = 1)
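To illustrate the fallback, here is a minimal sketch with a key that has no match. The frames `left` and `right`, the key 999, and the helper name `lookup_d` are made up for this example and are not part of the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical data: key 999 exists only in `left`
left = pd.DataFrame({"a": [222, 333, 999], "c": [23, 11, 40]})
right = pd.DataFrame({"a": [222, 333], "d": [5, 6]})

def lookup_d(row):
    # Select the matching 'd' values; this Series may be empty
    matches = right.loc[right["a"] == row["a"], "d"]
    # Take the first match, or NaN when there is none
    return next(iter(matches), np.nan)

left["d"] = left.apply(lookup_d, axis=1)
print(left)
```

Because the function always returns a scalar, apply builds a plain Series, and the unmatched row ends up with NaN instead of raising an error.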

But a better and faster approach is to use DataFrame.merge:

dfObj= dfObj.merge(dfObj1[['a','d']], on='a', how='left')
print(dfObj)
     a   b   c   d
0  222  34  23   5
1  333  31  11   6
2  444  16  21   7
3  555  32  22   8
4  666  33  27   9
5  777  35  11  10
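If some keys might be missing from the second frame, a left merge keeps those rows and fills d with NaN. The sketch below uses made-up frames (`left`, `right`, key 999) and adds the `indicator` and `validate` options of merge as one possible way to check the join; whether you need them depends on your data.

```python
import pandas as pd

# Hypothetical data: key 999 exists only in `left`
left = pd.DataFrame({"a": [222, 333, 999], "c": [23, 11, 40]})
right = pd.DataFrame({"a": [222, 333], "d": [5, 6]})

merged = left.merge(
    right[["a", "d"]],
    on="a",
    how="left",              # keep every row of `left`
    validate="many_to_one",  # raise if `right` has duplicate keys
    indicator=True,          # add a `_merge` column describing each row's origin
)

# Keys from `left` that found no partner in `right`
unmatched = merged.loc[merged["_merge"] == "left_only", "a"].tolist()
print(merged)
print(unmatched)
```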

Or use Series.map:

dfObj['d'] = dfObj['a'].map(dfObj1.set_index('a')['d'])
print(dfObj)
     a   b   c   d
0  222  34  23   5
1  333  31  11   6
2  444  16  21   7
3  555  32  22   8
4  666  33  27   9
5  777  35  11  10
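The map approach looks values up by index label, so it is easiest to reason about when each key appears once in the lookup Series. A small sketch, assuming you want to keep the first occurrence of a repeated key; the frames and the duplicated key 222 are invented for illustration.

```python
import pandas as pd

# Hypothetical data: key 222 appears twice in `right`, key 999 not at all
left = pd.DataFrame({"a": [222, 333, 999]})
right = pd.DataFrame({"a": [222, 222, 333], "d": [5, 50, 6]})

# Keep one row per key so the lookup is unambiguous
lookup = right.drop_duplicates(subset="a", keep="first").set_index("a")["d"]

# Keys without an entry in `lookup` become NaN
left["d"] = left["a"].map(lookup)
print(left)
```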

Answer 2

Your function returns a Series rather than a scalar, and when apply assembles the results the index of each Series matters. For example, the row at index 1 returns a one-element Series whose index is 1, so its value is placed in column 1 of row 1, and the combined result looks like a matrix. (apply concatenates the results, and each input row contributes a Series with a different index, like a small dataframe.)

def test_func(x):
    return type(dfObj1.d[dfObj1['a'].isin([x['a']])])
dfObj.apply(test_func, axis = 1)
Out[48]: 
0    <class 'pandas.core.series.Series'>
1    <class 'pandas.core.series.Series'>
2    <class 'pandas.core.series.Series'>
3    <class 'pandas.core.series.Series'>
4    <class 'pandas.core.series.Series'>
5    <class 'pandas.core.series.Series'>
dtype: object
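The same effect can be reproduced on a smaller frame. This is a reduced sketch with three made-up rows and a hypothetical helper `lookup_series`, intended only to show how the returned index labels turn into columns.

```python
import pandas as pd

left = pd.DataFrame({"a": [222, 333, 444]})
right = pd.DataFrame({"a": [222, 333, 444], "d": [5, 6, 7]})

def lookup_series(row):
    # Boolean indexing keeps the original index label of the matching row,
    # so each call returns a one-element Series with a different label
    return right.loc[right["a"] == row["a"], "d"]

result = left.apply(lookup_series, axis=1)
print(result)
```

Each returned Series carries its own label, so apply aligns them into separate columns: the values sit on the diagonal and every other cell is NaN.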

Remove the influence of the index to fix the output, for example by taking the first matching value with .iloc[0]:

def test_func(x):
    return dfObj1.d[dfObj1['a'].isin([x['a']])].iloc[0]
dfObj.apply(test_func, axis = 1)
Out[49]: 
0     5
1     6
2     7
3     8
4     9
5    10
dtype: int64


2 answers

Answer 1
2, accepted, 2019-10-29 13:56:07

Answer 2
1, 2019-10-29 14:00:01
