Create column with new IDs based on multiple IF conditions
I am trying to assign values in new columns based on value changes in the rows of other columns. Please refer to the dataset given.
ID1: based on the diff column. Whenever the value is NOT equal to 1, the row should get a new ID, equal to the ID in the row above plus one.
ID2: assign IDs when Region changes within ID1.
ID3: number the rows within each combination of ID1 and ID2.
All three IDs should restart from 1 when the Indv column changes to a new value.
import pandas as pd

# Initialise the data as a dict of lists.
data = {'Indv': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2],
        'Region': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'D', 'A', 'A', 'C'],
        'diff': [1, 1, 10, 1, 1, 1, 1, 10, 1, 1, 1, -11, 1, 1],
        }

# Create the DataFrame.
df = pd.DataFrame(data)
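# Creating ID1 (this code covers ID1 only).
df['ID1'] = 1
for i in range(len(df) - 1):
    j = i + 1
    if df.loc[i, 'Indv'] != df.loc[j, 'Indv']:
        # New individual: restart the ID at 1.
        df.loc[j, 'ID1'] = 1
    elif df.loc[j, 'diff'] == 1:
        # Same run: carry the previous ID forward.
        df.loc[j, 'ID1'] = df.loc[i, 'ID1']
    else:
        # diff is not 1: start a new ID.
        df.loc[j, 'ID1'] = df.loc[i, 'ID1'] + 1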
Expected output:
Indv, Region, diff, ID1, ID2, ID3
1, A, 1, 1, 1, 1
1, A, 1, 1, 1, 2
1, A, 10, 2, 1, 1
1, A, 1, 2, 1, 2
1, B, 1, 2, 2, 1
1, B, 1, 2, 2, 2
1, B, 1, 2, 2, 3
1, C, 10, 3, 1, 1
1, C, 1, 3, 1, 2
1, C, 1, 3, 1, 3
1, D, 1, 3, 2, 1
2, A, -11, 1, 1, 1
2, A, 1, 1, 1, 2
2, C, 1, 1, 2, 1
This is my solution:
data = {'Indv': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2],
        'Region1': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'D', 'A', 'A', 'C'],
        'diff': [1, 1, 10, 1, 1, 1, 1, 10, 1, 1, 1, -11, 1, 1]
        }
df = pd.DataFrame(data)

def createId1(group):
    # Running count of rows whose diff is not 1.
    cumsum = group.ne(1).cumsum()
    # If the group starts with diff == 1 the count starts at 0, so shift it to start at 1.
    if cumsum.iloc[0] == 0:
        return cumsum + 1
    return cumsum

def createId2(group):
    # A row starts a new ID when its value differs from the previous row in the group.
    return group.ne(group.shift(1)).cumsum()

df["id1"] = df.groupby(["Indv"])["diff"].transform(createId1)
df["id2"] = df.groupby(["Indv", "id1"])["Region1"].transform(createId2)
# cumcount starts at 0, so add 1 to number rows from 1.
df["id3"] = df.groupby(["Indv", "id1", "id2"]).cumcount() + 1
Output:
print(df.to_string())
    Indv Region1  diff  id1  id2  id3
0      1       A     1    1    1    1
1      1       A     1    1    1    2
2      1       A    10    2    1    1
3      1       A     1    2    1    2
4      1       B     1    2    2    1
5      1       B     1    2    2    2
6      1       B     1    2    2    3
7      1       C    10    3    1    1
8      1       C     1    3    1    2
9      1       C     1    3    1    3
10     1       D     1    3    2    1
11     2       A   -11    1    1    1
12     2       A     1    1    1    2
13     2       C     1    1    2    1
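For comparison, here is an alternative sketch that avoids the special case for a group's first row. It is not the method used above: it marks every row that should start a new ID with a boolean flag and takes a grouped cumulative sum of those flags. The names `first_row`, `new_id1` and `new_id2` are made up for this illustration, and the sketch assumes the rows are already sorted so that each individual's rows are contiguous and in order.

```python
import pandas as pd

df = pd.DataFrame({
    "Indv": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2],
    "Region1": list("AAAABBBCCCDAAC"),
    "diff": [1, 1, 10, 1, 1, 1, 1, 10, 1, 1, 1, -11, 1, 1],
})

# True on the first row of every individual (hypothetical helper name).
first_row = df.groupby("Indv").cumcount().eq(0)

# A new ID1 starts on the first row of an individual or wherever diff is not 1.
new_id1 = first_row | df["diff"].ne(1)
df["id1"] = new_id1.astype(int).groupby(df["Indv"]).cumsum()

# A new ID2 starts wherever ID1 starts or the region differs from the row above.
# Assumption: rows of each (Indv, id1) group are contiguous.
new_id2 = new_id1 | df["Region1"].ne(df["Region1"].shift())
df["id2"] = new_id2.astype(int).groupby([df["Indv"], df["id1"]]).cumsum()

# Number the rows inside each (Indv, id1, id2) group from 1.
df["id3"] = df.groupby(["Indv", "id1", "id2"]).cumcount() + 1

print(df.to_string())
```

Because the first row of each individual is always flagged, the cumulative sum starts at 1 in every group, which is the job done by the `cumsum.iloc[0] == 0` branch in `createId1`.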
Documentation:
DataFrame.groupby : group rows based on a mapper (here I used one or several columns).
GroupBy.transform : apply a function to each group ( GroupBy.apply would have worked too).
Series.ne : return a boolean Series that is True wherever an element is not equal to the given value.
Series.shift : shift the values of a Series by a given number of steps.
Series.cumsum : return the cumulative sum of the Series. When applied to a boolean Series, it returns the running count of True values encountered.
GroupBy.cumcount : number each item in a group starting at 0 (hence the + 1 in the code above).
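To see how the functions listed above fit together, here is a minimal sketch on a small, made-up Series (the data and the variable names `changed`, `run_id` and `position` are illustrative only). Comparing a Series with its shifted copy flags the rows where the value changes, and the cumulative sum of those flags labels each run of equal values.

```python
import pandas as pd

regions = pd.Series(["A", "A", "B", "B", "A"])

# True where a value differs from the one in the previous row.
# The first row is compared with NaN, so it is always True.
changed = regions.ne(regions.shift())

# Running count of changes: every run of equal values gets its own label.
run_id = changed.cumsum()

# 1-based position of each row inside its run (cumcount itself starts at 0).
position = regions.groupby(run_id).cumcount() + 1

print(changed.tolist())   # [True, False, True, False, True]
print(run_id.tolist())    # [1, 1, 2, 2, 3]
print(position.tolist())  # [1, 2, 1, 2, 1]
```

Note that the final "A" gets a new label (3) rather than reusing label 1: the labels identify consecutive runs, not distinct values.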