Groupby a column and then compare two other columns and return a value in a different column
I have a dataframe similar to this:
import numpy as np
import pandas as pd

data={'COMB':["PNR1", "PNR1", "PNR11", "PNR2", "PNR2"],
'FROM':["MAA", "BLR", "DEL", "TRV", "HYD"],
'TO':["BLR", "MAA", "MAA", "HYD", "TRV"]}
md=pd.DataFrame(data)
md
What I want to do is create another column based on this condition: if the FROM of one row is equal to the TO of the next row, it should return "R"; otherwise it should return "O" in the new column. My final output should look like this.
Can anyone help me do this in Python? I tried the following method, but it gives me an error:
md_merged=(md>>
group_by('COMB')>>
mutate(TYPE=np.where(md['FROM'].isin(md['TO']),"R","O"))>>
ungroup)
ValueError: Length of values does not match length of index
Please help.
This solution compares all values within each group, not only the previous and next rows.
You can use a custom lambda function in GroupBy.apply to build a boolean mask. To avoid a MultiIndex, group_keys=False is passed to DataFrame.groupby. Finally, the new values are set with numpy.where:
mask = md.groupby('COMB', group_keys=False).apply(lambda x: x['FROM'].isin(x['TO']))
md = md.assign(Type=np.where(mask,"R","O"))
print (md)
COMB FROM TO Type
0 PNR1 MAA BLR R
1 PNR1 BLR MAA R
2 PNR11 DEL MAA O
3 PNR2 TRV HYD R
4 PNR2 HYD TRV R
This solution compares the previous and next rows within each group:
Another idea is to use DataFrameGroupBy.shift, which should be faster than groupby.apply:
mask = (md.groupby('COMB')['FROM'].shift().eq(md['TO']) |
md.groupby('COMB')['TO'].shift(-1).eq(md['FROM']))
md = md.assign(Type=np.where(mask,"R","O"))
print (md)
COMB FROM TO Type
0 PNR1 MAA BLR R
1 PNR1 BLR MAA R
2 PNR11 DEL MAA O
3 PNR2 TRV HYD R
4 PNR2 HYD TRV R
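On the question's data both solutions give the same result, because every match happens to sit on adjacent rows. The difference only shows up when matching rows in a group are not neighbours. Below is a minimal sketch on hypothetical data (the `PNR3` group and the `Type_isin` / `Type_shift` column names are made up for illustration). The group-wide test is written as a plain loop over groups rather than GroupBy.apply, which is a substitution for readability and not the code from the answer above.

```python
import pandas as pd

# Hypothetical data: rows 0 and 2 mirror each other but are not adjacent.
df = pd.DataFrame({
    "COMB": ["PNR3", "PNR3", "PNR3"],
    "FROM": ["A", "C", "B"],
    "TO":   ["B", "D", "A"],
})

# Group-wide test: does FROM appear anywhere among the TO values of the same group?
anywhere = pd.concat(
    [g["FROM"].isin(g["TO"]) for _, g in df.groupby("COMB")]
).reindex(df.index)

# Adjacent-only test: the previous row's FROM equals this row's TO,
# or the next row's TO equals this row's FROM.
grouped = df.groupby("COMB")
adjacent = (grouped["FROM"].shift().eq(df["TO"])
            | grouped["TO"].shift(-1).eq(df["FROM"]))

result = df.assign(
    Type_isin=anywhere.map({True: "R", False: "O"}),
    Type_shift=adjacent.map({True: "R", False: "O"}),
)
print(result)
```

Here the group-wide test marks rows 0 and 2 as "R", while the adjacent-only test marks every row as "O", so which one to use depends on whether a return leg must directly follow the outbound leg.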
Play with numpy. Convert md to a numpy array, sort the values of the columns other than COMB within each row, and find all duplicated rows. Then label the duplicated rows conditionally.
s = md.to_numpy()
# sort FROM/TO within each row so that reversed pairs become identical
s[:, 1:3] = np.sort(s[:, 1:3])
md['Type'] = np.where(pd.DataFrame(s).duplicated(keep=False), 'R', 'O')
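For reference, here is a self-contained sketch of the same sort-and-find-duplicates idea on the question's data. It keeps the sorted pairs in a separate array instead of writing back into the converted one; the `pairs` and `keys` variables and the placeholder column names `A` and `B` are assumptions introduced for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "COMB": ["PNR1", "PNR1", "PNR11", "PNR2", "PNR2"],
    "FROM": ["MAA", "BLR", "DEL", "TRV", "HYD"],
    "TO":   ["BLR", "MAA", "MAA", "HYD", "TRV"],
})

# Sort each (FROM, TO) pair alphabetically so MAA->BLR and BLR->MAA look identical.
pairs = np.sort(df[["FROM", "TO"]].to_numpy(dtype=object), axis=1)

# "A" and "B" are placeholder names for the two sorted endpoints.
keys = pd.DataFrame(pairs, columns=["A", "B"], index=df.index)
keys.insert(0, "COMB", df["COMB"])

# Rows sharing the same COMB and the same unordered pair are marked "R".
df["Type"] = np.where(keys.duplicated(keep=False), "R", "O")
print(df)
```

Note that this marks mirrored rows anywhere within the same COMB, not only rows that sit next to each other, and it would also mark two identical legs in the same direction.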
Compare consecutive values and use np.where to set the Type. Code below. Worked for me.
md['Type'] =np.where(md.groupby('COMB',as_index=False).apply(lambda x: (x['FROM']==x['TO'].shift())|(x['FROM'].shift(-1)==x['TO'])),'R','O')
COMB FROM TO Type
0 PNR1 MAA BLR R
1 PNR1 BLR MAA R
2 PNR11 DEL MAA O
3 PNR2 TRV HYD R
4 PNR2 HYD TRV R