Add column with unique identifiers based on values from other columns in pandas
I have the following dataframe:
Cnt Year JD Min_Temp
S 2000 1 277.139
S 2000 2 274.725
S 2001 1 270.945
S 2001 2 271.505
N 2000 1 257.709
N 2000 2 254.533
N 2000 3 258.472
N 2001 1 255.763
N 2001 2 265.714
N 2001 3 267.943
I would like to add a new column where each separate row for a given 'Cnt' is given a unique identifier (1, 2, 3, ...). So, the result should look like this:
Cnt Year JD Min_Temp unq
S 2000 1 277.139 1
S 2000 2 274.725 2
S 2001 1 270.945 3
S 2001 2 271.505 4
N 2000 1 257.709 1
N 2000 2 254.533 2
N 2000 3 258.472 3
N 2001 1 255.763 4
N 2001 2 265.714 5
N 2001 3 267.943 6
Here, each row with the same value in the 'Cnt' column gets its own unique identifier, and the numbering restarts at 1 for each distinct 'Cnt' value.
Currently, all I can do is add a new column with increasing values: df['unq'] = numpy.arange(1, len(df) + 1)
You could use groupby with cumcount:
>>> df["unq"] = df.groupby("Cnt").cumcount() + 1
>>> df
Cnt Year JD Min_Temp unq
0 S 2000 1 277.139 1
1 S 2000 2 274.725 2
2 S 2001 1 270.945 3
3 S 2001 2 271.505 4
4 N 2000 1 257.709 1
5 N 2000 2 254.533 2
6 N 2000 3 258.472 3
7 N 2001 1 255.763 4
8 N 2001 2 265.714 5
9 N 2001 3 267.943 6
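For anyone who wants to reproduce this outside the interpreter session, here is a minimal self-contained sketch. The way the dataframe is constructed is an assumption, since the question only shows the printed table; the groupby/cumcount line is the same as above.

```python
import pandas as pd

# Rebuild the sample data from the question (construction is assumed;
# the question only shows the printed frame).
df = pd.DataFrame({
    "Cnt": ["S"] * 4 + ["N"] * 6,
    "Year": [2000, 2000, 2001, 2001, 2000, 2000, 2000, 2001, 2001, 2001],
    "JD": [1, 2, 1, 2, 1, 2, 3, 1, 2, 3],
    "Min_Temp": [277.139, 274.725, 270.945, 271.505, 257.709,
                 254.533, 258.472, 255.763, 265.714, 267.943],
})

# cumcount() numbers the rows within each group starting at 0,
# so add 1 to get identifiers that start at 1.
df["unq"] = df.groupby("Cnt").cumcount() + 1
print(df)
```

cumcount counts rows in the order they appear within each group, so the identifiers follow the existing row order rather than any sort on Year or JD.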
Note that because the groups are based on the Cnt column values and not on contiguity, if you have a second group of S below the group of N, the first unq value in that group will be 5.
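To illustrate that point, the sketch below uses a small hypothetical frame (not the data from the question) with a second block of S rows after the N rows. It also shows one possible way to restart the numbering for each contiguous block instead, by building a run id from shift and cumsum; the names run_id and unq_run are made up for this example.

```python
import pandas as pd

# Hypothetical data: a second block of S rows below the N rows.
df = pd.DataFrame({"Cnt": ["S"] * 4 + ["N"] * 2 + ["S"] * 2})

# Grouping by value: the second S block continues where the first stopped.
df["unq"] = df.groupby("Cnt").cumcount() + 1

# Grouping by contiguous run: a new run starts wherever Cnt differs
# from the previous row, and cumsum() turns those starts into run ids.
run_id = (df["Cnt"] != df["Cnt"].shift()).cumsum()
df["unq_run"] = df.groupby(run_id).cumcount() + 1
print(df)
```

With value-based grouping the last two S rows get 5 and 6, while the run-based variant gives them 1 and 2. Which behaviour you want depends on whether repeated blocks of the same Cnt should count as one group or as separate ones.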