
Groupby a column and then compare two other columns and return a value in a different column

I have a dataframe similar to this:

import pandas as pd

data = {'COMB': ["PNR1", "PNR1", "PNR11", "PNR2", "PNR2"],
        'FROM': ["MAA", "BLR", "DEL", "TRV", "HYD"],
        'TO': ["BLR", "MAA", "MAA", "HYD", "TRV"]}
md = pd.DataFrame(data)
md

What I want to do is create another column based on this condition: if the FROM of one row is equal to the TO of the next row, the new column should contain "R"; otherwise it should contain "O". My final output should look like this:

[image: expected output]

Can anyone help me do this in Python? I tried the following method, but it gives me an error:

md_merged=(md>>
            group_by('COMB')>>
            mutate(TYPE=np.where(md['FROM'].isin(md['TO']),"R","O"))>>
           ungroup)

ValueError: Length of values does not match length of index

Please help.

This solution compares all values within each group, not only the previous and next rows:

You can use a custom lambda function in GroupBy.apply to build a boolean mask. To avoid a MultiIndex, group_keys=False is passed to DataFrame.groupby. Finally, the new values are set with numpy.where:

import numpy as np

mask = md.groupby('COMB', group_keys=False).apply(lambda x: x['FROM'].isin(x['TO']))
md = md.assign(Type=np.where(mask,"R","O"))
print(md)
    COMB FROM   TO Type
0   PNR1  MAA  BLR    R
1   PNR1  BLR  MAA    R
2  PNR11  DEL  MAA    O
3   PNR2  TRV  HYD    R
4   PNR2  HYD  TRV    R
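
Note that np.where returns a plain array, so the Type column is assigned by position and the index of mask is ignored. If the rows of md are not already ordered by group, the mask built with apply may come back in a different row order. Below is a minimal sketch of an order-independent variant. It is not from the answer above: the unsorted sample data are made up, and to_pairs is just an illustrative name. It looks up each (COMB, FROM) pair in the set of (COMB, TO) pairs:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: same rows as in the question, but not ordered by COMB.
md = pd.DataFrame({
    "COMB": ["PNR1", "PNR2", "PNR1", "PNR11", "PNR2"],
    "FROM": ["MAA", "TRV", "BLR", "DEL", "HYD"],
    "TO":   ["BLR", "HYD", "MAA", "MAA", "TRV"],
})

# All (group, destination) pairs that occur anywhere in the frame.
to_pairs = set(zip(md["COMB"], md["TO"]))

# True where the row's FROM appears among the TO values of the same group.
mask = np.array([(comb, origin) in to_pairs
                 for comb, origin in zip(md["COMB"], md["FROM"])])

md = md.assign(Type=np.where(mask, "R", "O"))
print(md)
```

Because the mask is built row by row in the original order, no realignment is needed afterwards.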

This solution compares only the previous and next rows within each group:

Another idea is to use DataFrameGroupBy.shift, which should be faster than groupby.apply:

mask = (md.groupby('COMB')['FROM'].shift().eq(md['TO']) |
        md.groupby('COMB')['TO'].shift(-1).eq(md['FROM']))

md = md.assign(Type=np.where(mask,"R","O"))
print(md)
    COMB FROM   TO Type
0   PNR1  MAA  BLR    R
1   PNR1  BLR  MAA    R
2  PNR11  DEL  MAA    O
3   PNR2  TRV  HYD    R
4   PNR2  HYD  TRV    R
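
On the sample data both solutions give the same result, so here is a small sketch of a case where they differ. The three-row group PNR3 and the column names Type_any and Type_adjacent are made up for illustration. The "all values in the group" test is written as a plain loop over the groups here, only to keep the example independent of how apply arranges its output:

```python
import numpy as np
import pandas as pd

# Hypothetical group in which the matching rows are NOT adjacent.
md = pd.DataFrame({
    "COMB": ["PNR3", "PNR3", "PNR3"],
    "FROM": ["MAA", "DEL", "BLR"],
    "TO":   ["BLR", "HYD", "MAA"],
})

# Variant 1: FROM appears anywhere among the TO values of the same group.
within_group = pd.Series(False, index=md.index)
for _, g in md.groupby("COMB"):
    within_group.loc[g.index] = g["FROM"].isin(g["TO"])

# Variant 2: only the previous and next rows of the same group are compared.
by_comb = md.groupby("COMB")
adjacent = (by_comb["FROM"].shift().eq(md["TO"]) |
            by_comb["TO"].shift(-1).eq(md["FROM"]))

md["Type_any"] = np.where(within_group, "R", "O")
md["Type_adjacent"] = np.where(adjacent, "R", "O")
print(md)
```

Rows 0 and 2 form a return pair but are separated by row 1, so only the first variant marks them "R". Which behaviour is wanted depends on how the question's condition is read.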

Another option is to work in NumPy: convert md to a NumPy array, sort the values of the columns other than COMB within each row, and find all duplicated rows. Then label the duplicates conditionally.

s = md.to_numpy(copy=True)  # copy, so that md itself is not modified
s[:, 1:3] = np.sort(s[:, 1:3])  # sort FROM/TO within each row
md['Type'] = np.where(pd.DataFrame(s).duplicated(keep=False), 'R', 'O')
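
One thing to be aware of with the sorting approach: it marks every row whose COMB and unordered (FROM, TO) pair occur more than once, so an exact repeat of the same leg is also labelled "R". The sketch below shows this on made-up data (the groups PNR4 and PNR5 and the helper names pair and keys are assumptions for illustration). It uses np.sort on a separate array, so md is left untouched:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: PNR1 is a real return trip, PNR4 repeats the same leg.
md = pd.DataFrame({
    "COMB": ["PNR1", "PNR1", "PNR4", "PNR4", "PNR5"],
    "FROM": ["MAA", "BLR", "DEL", "DEL", "TRV"],
    "TO":   ["BLR", "MAA", "MAA", "MAA", "HYD"],
})

# Sort FROM/TO within each row so that MAA->BLR and BLR->MAA look identical.
pair = np.sort(md[["FROM", "TO"]].to_numpy(dtype=object), axis=1)

# Rebuild a frame of comparison keys: group plus the direction-free pair.
keys = pd.DataFrame({
    "COMB": md["COMB"].to_numpy(),
    "A": pair[:, 0],
    "B": pair[:, 1],
})

# keep=False flags every member of a duplicated set, not just the repeats.
md["Type"] = np.where(keys.duplicated(keep=False), "R", "O")
print(md)
```

If repeated legs in the same direction should not count as returns, one of the shift-based comparisons is probably the safer choice.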

Alternatively, compare consecutive values within each group and use np.where to assign the Type. The code below worked for me:

md['Type'] = np.where(
    md.groupby('COMB', as_index=False).apply(
        lambda x: (x['FROM'] == x['TO'].shift()) | (x['FROM'].shift(-1) == x['TO'])
    ),
    'R', 'O')

    COMB FROM   TO Type
0   PNR1  MAA  BLR    R
1   PNR1  BLR  MAA    R
2  PNR11  DEL  MAA    O
3   PNR2  TRV  HYD    R
4   PNR2  HYD  TRV    R


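Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.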
