
Create column with new IDs based on multiple IF Conditions

I am trying to assign values in new columns based on value changes in the rows of other columns. Please refer to the dataset given below.

ID1: based on the diff column. Whenever the value of diff is NOT equal to 1, the row should get a new ID, equal to the ID of the row above plus one.

ID2: assign a new ID whenever Region changes within the same ID1.

ID3: number the rows consecutively within each combination of ID1 and ID2.

All three IDs should restart from 1 when the Indv column changes to a new value.



import pandas as pd

# Initialise the data as a dict of lists.
data = {'Indv': [1,1,1,1,1,1,1,1,1,1,1,2,2,2],
        'Region': ['A','A','A','A','B','B','B','C','C','C','D','A','A','C'],
        'diff': [1,1,10,1,1,1,1,10,1,1,1,-11,1,1],
}

# Create the DataFrame
df = pd.DataFrame(data)

# Creating ID1 (the first row always gets 1)
df['ID1'] = 1

# Code only for ID1
for j in range(1, len(df)):
    i = j - 1
    if df.loc[i, 'Indv'] != df.loc[j, 'Indv']:
        # New individual: restart the ID at 1
        df.loc[j, 'ID1'] = 1
    elif df.loc[j, 'diff'] == 1:
        # diff is 1: keep the ID of the row above
        df.loc[j, 'ID1'] = df.loc[i, 'ID1']
    else:
        # diff is not 1: start a new ID
        df.loc[j, 'ID1'] = df.loc[i, 'ID1'] + 1

Dataset with the expected outcome (the ID1, ID2 and ID3 columns need to be generated):

Indv, Region, diff, ID1, ID2, ID3
1, A, 1, 1, 1, 1
1, A, 1, 1, 1, 2
1, A, 10, 2, 1, 1
1, A, 1, 2, 1, 2
1, B, 1, 2, 2, 1
1, B, 1, 2, 2, 2
1, B, 1, 2, 2, 3
1, C, 10, 3, 1, 1
1, C, 1, 3, 1, 2
1, C, 1, 3, 1, 3
1, D, 1, 3, 2, 1
2, A, -11, 1, 1, 1
2, A, 1, 1, 1, 2
2, C, 1, 1, 2, 1

This is my solution:

  1. Create the DataFrame
data={'Indv':[1,1,1,1,1,1,1,1,1,1,1,2,2,2],
'Region1':['A','A','A','A','B','B','B','C','C','C','D','A','A','C'],
'diff':[ 1,1,10,1,1,1,1,10,1,1,1,-11,1,1]
}
df = pd.DataFrame(data)
  2. Declare the functions used to find id1 and id2:
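def createId1(group):
    # Running count of rows whose diff is not 1; each such row starts a new id.
    cumsum = group.ne(1).cumsum()
    # If the first diff is 1 the count starts at 0, so shift it to start at 1.
    if cumsum.iloc[0] == 0:
        return cumsum + 1
    return cumsum

def createId2(group):
    # The first row is compared with the NaN produced by shift, so ids start at 1.
    return group.ne(group.shift(1)).cumsum()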
  3. Create the id columns
df["id1"] = df.groupby(["Indv"])["diff"].transform(createId1)
df["id2"] = df.groupby(["Indv", "id1"])["Region1"].transform(createId2)
df["id3"] = df.groupby(["Indv", "id1", "id2"]).cumcount() + 1

Output:

print(df.to_string())

    Indv Region1  diff  id1  id2  id3
0      1       A     1    1    1    1
1      1       A     1    1    1    2
2      1       A    10    2    1    1
3      1       A     1    2    1    2
4      1       B     1    2    2    1
5      1       B     1    2    2    2
6      1       B     1    2    2    3
7      1       C    10    3    1    1
8      1       C     1    3    1    2
9      1       C     1    3    1    3
10     1       D     1    3    2    1
11     2       A   -11    1    1    1
12     2       A     1    1    1    2
13     2       C     1    1    2    1
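
As a cross-check, here is a minimal sketch of a variant that is not the method above: it builds a boolean "a new ID starts here" flag per row and takes a grouped cumulative sum, so no helper functions are needed. The flag names (first_of_indv, new_id1, new_id2) and the expected dictionary are illustrative assumptions. The sketch also assumes that rows of the same Indv are contiguous, as they are in the sample data.

```python
import pandas as pd

data = {
    "Indv": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2],
    "Region1": ["A", "A", "A", "A", "B", "B", "B", "C", "C", "C", "D", "A", "A", "C"],
    "diff": [1, 1, 10, 1, 1, 1, 1, 10, 1, 1, 1, -11, 1, 1],
}
df = pd.DataFrame(data)

# Hypothetical flag: True on the first row of each individual
# (assumes rows of one Indv are contiguous).
first_of_indv = df["Indv"].ne(df["Indv"].shift(1))

# A new id1 starts when diff is not 1 or when a new individual begins.
new_id1 = df["diff"].ne(1) | first_of_indv
df["id1"] = new_id1.astype(int).groupby(df["Indv"]).cumsum()

# A new id2 starts when id1 starts or when the region changes.
new_id2 = new_id1 | df["Region1"].ne(df["Region1"].shift(1))
df["id2"] = new_id2.astype(int).groupby([df["Indv"], df["id1"]]).cumsum()

# id3 is the 1-based position of the row inside its (Indv, id1, id2) group.
df["id3"] = df.groupby(["Indv", "id1", "id2"]).cumcount() + 1

# Expected columns copied from the table in the question.
expected = {
    "id1": [1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 1, 1, 1],
    "id2": [1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 1, 1, 2],
    "id3": [1, 2, 1, 2, 1, 2, 3, 1, 2, 3, 1, 1, 2, 1],
}
matches = all(df[col].tolist() == values for col, values in expected.items())
print(matches)  # True
```

Because every group's first row is flagged, the cumulative sum starts at 1 in each group and no "+ 1" correction is needed.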

Documentation:

DataFrame.groupby: group rows based on a mapper (here, one or several columns).

GroupBy.transform: apply a function to each group (GroupBy.apply would have worked too).

Series.ne: return a boolean Series that is True wherever an element is not equal to the given value.

Series.shift: shift the values of a Series by a given number of periods (with the default arguments, the index stays in place).

Series.cumsum: return the cumulative sum of the Series. Applied to a boolean Series, it returns the running count of True values encountered so far.

GroupBy.cumcount: number each item in a group, starting at 0.
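
To see how these pieces behave on their own, here is a small sketch on toy data. The Series and the variable names are made up for illustration and are not the dataset from the question.

```python
import pandas as pd

# Series.ne + cumsum: running count of values that are not 1.
diff = pd.Series([1, 1, 10, 1, -11, 1])
not_one = diff.ne(1)
running = not_one.cumsum()
print(not_one.tolist())   # [False, False, True, False, True, False]
print(running.tolist())   # [0, 0, 1, 1, 2, 2]

# Series.shift + ne + cumsum: a new id each time the value changes.
# The first element is compared with NaN, so it always counts as a change.
region = pd.Series(["A", "A", "B", "B", "C"])
changed = region.ne(region.shift(1))
block_id = changed.cumsum()
print(changed.tolist())   # [True, False, True, False, True]
print(block_id.tolist())  # [1, 1, 2, 2, 3]

# GroupBy.cumcount: 0-based position of each row inside its group.
toy = pd.DataFrame({"key": ["x", "x", "y", "x"]})
position = toy.groupby("key").cumcount()
print(position.tolist())  # [0, 1, 0, 2]
```

The first example starts at 0 because the first value is 1, which is why createId1 adds 1 in that case. The second always starts at 1, so createId2 needs no correction.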
