apply function returns a dataframe in pandas
I have two DataFrames, one with columns [a,b,c] and the other with [a,b,d], as follows:
import pandas as pd

matrix = [(222, 34, 23),
          (333, 31, 11),
          (444, 16, 21),
          (555, 32, 22),
          (666, 33, 27),
          (777, 35, 11)
          ]
# Create a DataFrame object
dfObj = pd.DataFrame(matrix, columns=list('abc'))
print(dfObj)
a b c
0 222 34 23
1 333 31 11
2 444 16 21
3 555 32 22
4 666 33 27
5 777 35 11
matrix = [(222, 34, 5),
(333, 31, 6),
(444, 16, 7),
(555, 32, 8),
(666, 33, 9),
(777, 35, 10)
]
# Create a DataFrame object
dfObj1 = pd.DataFrame(matrix, columns=list('abd'))
I want to construct a new matrix with the columns [a,b,c,d] as follows:
def test_func(x):
    return dfObj1.d[dfObj1['a'].isin([x['a']])]
dfObj['d'] = dfObj.apply(test_func, axis = 1)
However, the output of dfObj.apply(test_func, axis = 1) is a DataFrame, as shown below:
1 2 3 4 5
1 6.0 NaN NaN NaN NaN
2 NaN 7.0 NaN NaN NaN
3 NaN NaN 8.0 NaN NaN
4 NaN NaN NaN 9.0 NaN
5 NaN NaN NaN NaN 10.0
I was expecting the following output: [6,7,8,9,10].
I know there are several other ways to achieve this, but I am just trying to find out what I am doing wrong in this approach.
This works if the function returns a NumPy array via .values and you also pass the result_type='expand' parameter to DataFrame.apply:
def test_func(x):
    return dfObj1.loc[dfObj1['a'].isin([x['a']]), 'd'].values
dfObj['d'] = dfObj.apply(test_func, axis = 1, result_type='expand')
print(dfObj)
a b c d
0 222 34 23 5
1 333 31 11 6
2 444 16 21 7
3 555 32 22 8
4 666 33 27 9
5 777 35 11 10
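To see what result_type='expand' hands back before the assignment, here is a minimal sketch on a smaller, made-up pair of frames (left, right and lookup are illustrative names, not from the question). It assumes every row has exactly one match, so each returned array has length 1.

```python
import pandas as pd

# Hypothetical, cut-down versions of dfObj and dfObj1
left = pd.DataFrame({"a": [222, 333, 444], "c": [23, 11, 21]})
right = pd.DataFrame({"a": [222, 333, 444], "d": [5, 6, 7]})

def lookup(row):
    # .values drops the index, leaving a plain one-element NumPy array
    return right.loc[right["a"] == row["a"], "d"].values

# With result_type="expand", each array is spread across columns,
# so one-element arrays give a DataFrame with a single column labelled 0
expanded = left.apply(lookup, axis=1, result_type="expand")
print(expanded.shape)

# Selecting that column explicitly makes the assignment unambiguous
left["d"] = expanded[0]
print(left)
```

Because the arrays carry no index, there is nothing for pandas to align on, and the values stay in row order.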
Another option, if you need a scalar and want a missing value when nothing matches, is to use next with iter:
import numpy as np

def test_func(x):
    # first matching value of d, or NaN if there is no match
    return next(iter(dfObj1.loc[dfObj1['a'].isin([x['a']]), 'd']), np.nan)
dfObj['d'] = dfObj.apply(test_func, axis = 1)
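The fallback only matters when a key has no partner, which the sample data never exercises. A small sketch with a made-up key (999) that is absent from the second frame; left, right and first_match are illustrative names:

```python
import numpy as np
import pandas as pd

# Hypothetical frames: 999 has no counterpart in right
left = pd.DataFrame({"a": [222, 333, 999]})
right = pd.DataFrame({"a": [222, 333], "d": [5, 6]})

def first_match(row):
    matches = right.loc[right["a"] == row["a"], "d"]
    # next() takes the first matching value, or np.nan if the selection is empty
    return next(iter(matches), np.nan)

left["d"] = left.apply(first_match, axis=1)
print(left)
```

Since the function now returns a scalar for every row, apply builds an ordinary Series and the column assignment works directly. The NaN forces the column to a float dtype.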
But a better and faster approach is DataFrame.merge:
dfObj = dfObj.merge(dfObj1[['a','d']], on='a', how='left')
print(dfObj)
a b c d
0 222 34 23 5
1 333 31 11 6
2 444 16 21 7
3 555 32 22 8
4 666 33 27 9
5 777 35 11 10
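If you are not sure the keys line up, merge can report that for you. A hedged sketch using the indicator and validate parameters of DataFrame.merge on made-up frames (the 999 key and the names left and right are assumptions for illustration):

```python
import pandas as pd

# Hypothetical frames: 999 is missing from right
left = pd.DataFrame({"a": [222, 333, 999], "c": [23, 11, 21]})
right = pd.DataFrame({"a": [222, 333], "d": [5, 6]})

merged = left.merge(
    right[["a", "d"]],
    on="a",
    how="left",              # keep every row of left, matched or not
    validate="many_to_one",  # raise if a key is duplicated in right
    indicator=True,          # add a _merge column describing each row's origin
)
print(merged)
```

Rows that found no partner get NaN in d and 'left_only' in _merge, so they are easy to filter out or inspect.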
Or Series.map:
dfObj['d'] = dfObj['a'].map(dfObj1.set_index('a')['d'])
print(dfObj)
a b c d
0 222 34 23 5
1 333 31 11 6
2 444 16 21 7
3 555 32 22 8
4 666 33 27 9
5 777 35 11 10
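One caveat with map: the lookup Series is matched by its index, and as far as I know that index has to be unique. A minimal sketch, on made-up data with a duplicated key, that deduplicates first (keeping the first occurrence is an arbitrary choice here):

```python
import pandas as pd

# Hypothetical frames: key 222 appears twice in right, 444 not at all
left = pd.DataFrame({"a": [222, 333, 444]})
right = pd.DataFrame({"a": [222, 222, 333], "d": [5, 50, 6]})

# Keep the first row per key so the lookup index is unique
lookup = right.drop_duplicates(subset="a").set_index("a")["d"]

# Keys missing from the lookup map to NaN
left["d"] = left["a"].map(lookup)
print(left)
```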
Your function returns a Series, and its index matters when the results are put together. For example, the row at index 1 returns a one-element Series whose index is 1, so its value lands in column 1 of the result, which is why the output looks like a diagonal matrix. (apply concatenates the per-row results; each input row produces a Series with a different index, so together they behave like a small DataFrame.)
def test_func(x):
    return type(dfObj1.d[dfObj1['a'].isin([x['a']])])
dfObj.apply(test_func, axis = 1)
Out[48]:
0 <class 'pandas.core.series.Series'>
1 <class 'pandas.core.series.Series'>
2 <class 'pandas.core.series.Series'>
3 <class 'pandas.core.series.Series'>
4 <class 'pandas.core.series.Series'>
5 <class 'pandas.core.series.Series'>
dtype: object
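The alignment described above can be reproduced on a three-row example. This is only a sketch with made-up names (left, right, lookup); it shows each returned Series becoming one row, with its index label turned into a column:

```python
import pandas as pd

# Hypothetical, cut-down versions of dfObj and dfObj1
left = pd.DataFrame({"a": [222, 333, 444], "c": [23, 11, 21]})
right = pd.DataFrame({"a": [222, 333, 444], "d": [5, 6, 7]})

def lookup(row):
    # Boolean selection keeps right's index label on the one-element Series
    return right.loc[right["a"] == row["a"], "d"]

wide = left.apply(lookup, axis=1)
print(wide)

# Row i only has a value in the column named after the matching index label,
# so the real values sit on the diagonal and everything else is NaN
diagonal = [wide.iloc[i, i] for i in range(len(wide))]
print(diagonal)
```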
Remove the effect of the index to fix the output:
def test_func(x):
    return dfObj1.d[dfObj1['a'].isin([x['a']])].iloc[0]
dfObj.apply(test_func, axis = 1)
Out[49]:
0 5
1 6
2 7
3 8
4 9
5 10
dtype: int64
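Note that .iloc[0] assumes every row finds at least one match. A short sketch of what happens otherwise, using a made-up key (999) that is not in the lookup frame:

```python
import pandas as pd

# Hypothetical lookup frame
right = pd.DataFrame({"a": [222, 333], "d": [5, 6]})

# 999 does not occur in right["a"], so the selection is an empty Series
no_match = right.loc[right["a"] == 999, "d"]

try:
    no_match.iloc[0]
    outcome = "value returned"
except IndexError:
    # positional access on an empty Series is out of bounds
    outcome = "IndexError"

print(len(no_match), outcome)
```

If unmatched keys are possible in your data, check the length first or fall back to a default value.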