Add column with number for each unique element in pandas
I have a dataframe such as:
Groups species Numbers
G1 sp1 1
G1 sp2 2
G1 sp3 3
G1 sp4 4
G1 sp4 5
G1 sp5 6
G2 sp3 1
G2 sp3 2
G2 sp2 3
G3 sp1 1
G3 sp3 1
G4 sp3 1
G5 sp3 1
G5 sp3 2
G5 sp3 3
G5 sp1 4
List_groups =["G1","G5"]
and the idea is to replace the column Numbers only for the Groups within List_groups, assigning one number to each unique species within a group.
Then I should get the following output:
Groups species Numbers
G1 sp1 1
G1 sp2 2
G1 sp3 3
G1 sp4 4
G1 sp4 4
G1 sp5 5
G2 sp3 1
G2 sp3 2
G2 sp2 3
G3 sp1 1
G3 sp3 1
G4 sp3 1
G5 sp3 1
G5 sp3 1
G5 sp3 1
G5 sp1 2
Here is one possibility I tried, but since the dataframe is quite long, it takes too much time:
# boolean mask of the rows whose group is in List_groups
m = df['Groups'].isin(List_groups)
df.loc[m, 'Numbers'] = df[m].groupby(['Groups', 'species']).ngroup() + 1
Here is the dataframe in dict format:
{'Groups': {0: 'G1', 1: 'G1', 2: 'G1', 3: 'G1', 4: 'G1', 5: 'G1', 6: 'G2', 7: 'G2', 8: 'G2', 9: 'G3', 10: 'G3', 11: 'G4', 12: 'G5', 13: 'G5', 14: 'G5', 15: 'G5'}, 'species': {0: 'sp1', 1: 'sp2', 2: 'sp3', 3: 'sp4', 4: 'sp4', 5: 'sp5', 6: 'sp3', 7: 'sp3', 8: 'sp2', 9: 'sp1', 10: 'sp3', 11: 'sp3', 12: 'sp3', 13: 'sp3', 14: 'sp3', 15: 'sp1'}, 'Numbers': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 1, 7: 2, 8: 3, 9: 1, 10: 1, 11: 1, 12: 1, 13: 2, 14: 3, 15: 4}}
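To reproduce the examples, the dict can be loaded back into a dataframe. This is a minimal sketch; the names `data` and `df` are chosen here for illustration, and the integer keys of the nested dicts become the row index.

```python
import pandas as pd

# The dict from the question, reformatted over several lines
data = {
    'Groups': {0: 'G1', 1: 'G1', 2: 'G1', 3: 'G1', 4: 'G1', 5: 'G1',
               6: 'G2', 7: 'G2', 8: 'G2', 9: 'G3', 10: 'G3', 11: 'G4',
               12: 'G5', 13: 'G5', 14: 'G5', 15: 'G5'},
    'species': {0: 'sp1', 1: 'sp2', 2: 'sp3', 3: 'sp4', 4: 'sp4', 5: 'sp5',
                6: 'sp3', 7: 'sp3', 8: 'sp2', 9: 'sp1', 10: 'sp3', 11: 'sp3',
                12: 'sp3', 13: 'sp3', 14: 'sp3', 15: 'sp1'},
    'Numbers': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 1, 7: 2, 8: 3,
                9: 1, 10: 1, 11: 1, 12: 1, 13: 2, 14: 3, 15: 4},
}

# Outer keys become columns, inner integer keys become the index
df = pd.DataFrame(data)
print(df.head())
```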
You can use pandas.CategoricalIndex and pandas.CategoricalIndex.codes to assign a number to each unique species within the group:
List_groups = ["G1", "G5"]
filters = df["Groups"].isin(List_groups)
# change Numbers for rows in the listed groups only; other groups are not affected
df.loc[filters, "Numbers"] = (
    df[filters].groupby(["Groups"])["species"].transform(lambda x: pd.CategoricalIndex(x).codes + 1)
)  # add 1 since codes start from 0
| | Groups | species | Numbers |
|---|---|---|---|
| 0 | G1 | sp1 | 1 |
| 1 | G1 | sp2 | 2 |
| 2 | G1 | sp3 | 3 |
| 3 | G1 | sp4 | 4 |
| 4 | G1 | sp4 | 4 |
| 5 | G1 | sp5 | 5 |
| 6 | G2 | sp3 | 1 |
| 7 | G2 | sp3 | 2 |
| 8 | G2 | sp2 | 3 |
| 9 | G3 | sp1 | 1 |
| 10 | G3 | sp3 | 1 |
| 11 | G4 | sp3 | 1 |
| 12 | G5 | sp3 | 2 |
| 13 | G5 | sp3 | 2 |
| 14 | G5 | sp3 | 2 |
| 15 | G5 | sp1 | 1 |
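Note that categorical codes follow the sorted order of the species, so in G5 this gives sp1 the number 1 and sp3 the number 2, whereas the output requested in the question numbers species in order of first appearance (sp3 is 1, sp1 is 2). If that ordering matters, one option is pd.factorize, which numbers values in the order they are first seen. The following is only a sketch under that assumption; the helper name `number_by_first_appearance` is made up for illustration.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Groups": ["G1"] * 6 + ["G2"] * 3 + ["G3"] * 2 + ["G4"] + ["G5"] * 4,
    "species": ["sp1", "sp2", "sp3", "sp4", "sp4", "sp5",
                "sp3", "sp3", "sp2", "sp1", "sp3", "sp3",
                "sp3", "sp3", "sp3", "sp1"],
    "Numbers": [1, 2, 3, 4, 5, 6, 1, 2, 3, 1, 1, 1, 1, 2, 3, 4],
})
List_groups = ["G1", "G5"]


def number_by_first_appearance(frame, groups):
    """Hypothetical helper: renumber species per group, in order of appearance."""
    out = frame.copy()  # leave the caller's dataframe untouched
    mask = out["Groups"].isin(groups)
    codes = (
        out.loc[mask]
        .groupby("Groups")["species"]
        # factorize returns (codes, uniques); codes start at 0, hence + 1
        .transform(lambda s: pd.factorize(s)[0] + 1)
        .astype("int64")
    )
    out.loc[mask, "Numbers"] = codes  # aligned on the original index
    return out


result = number_by_first_appearance(df, List_groups)
print(result)
```

Rows outside List_groups keep their original Numbers, because only the masked rows are assigned.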
List_groups =["G1","G5"]
You can create a custom function for this:
def getgroup(List_groups):
    lst = []
    for x in List_groups:
        m = df['Groups'].eq(x)
        if m.any():
            # number each species in this group in order of appearance
            lst.append(df[m].groupby(['Groups', 'species'], sort=False).ngroup() + 1)
    return pd.concat(lst)

# Finally:
df['Numbers'] = pd.Series(df.index.map(getgroup(List_groups))).fillna(df['Numbers']).astype(int)
You can cast species to the category dtype and use its .cat.codes accessor:
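m = df["Groups"].isin(List_groups)
c = (
    df[m]
    # group_keys=False keeps the original row index so the assignment below aligns
    .groupby("Groups", group_keys=False)["species"]
    .apply(lambda x: x.astype("category").cat.codes)
    + 1  # codes start from 0
)
df.loc[m, "Numbers"] = c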
Groups species Numbers
0 G1 sp1 1
1 G1 sp2 2
2 G1 sp3 3
3 G1 sp4 4
4 G1 sp4 4
5 G1 sp5 5
6 G2 sp3 1
7 G2 sp3 2
8 G2 sp2 3
9 G3 sp1 1
10 G3 sp3 1
11 G4 sp3 1
12 G5 sp3 2
13 G5 sp3 2
14 G5 sp3 2
15 G5 sp1 1
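Since the question mentions that the dataframe is long, a variant without a Python-level lambda may be worth trying. The sketch below is not benchmarked and is not taken from the answers above. It relies on ngroup with sort=True numbering the (Groups, species) pairs in sorted key order, so that subtracting each group's smallest id leaves a per-group numbering in sorted species order, the same ordering as the categorical codes. The names `sub`, `pair_id` and `numbers` are illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Groups": ["G1"] * 6 + ["G2"] * 3 + ["G3"] * 2 + ["G4"] + ["G5"] * 4,
    "species": ["sp1", "sp2", "sp3", "sp4", "sp4", "sp5",
                "sp3", "sp3", "sp2", "sp1", "sp3", "sp3",
                "sp3", "sp3", "sp3", "sp1"],
    "Numbers": [1, 2, 3, 4, 5, 6, 1, 2, 3, 1, 1, 1, 1, 2, 3, 4],
})
List_groups = ["G1", "G5"]

m = df["Groups"].isin(List_groups)
sub = df[m]

# Global id of each (Groups, species) pair, numbered in sorted key order,
# so the ids belonging to one group form a contiguous run
pair_id = sub.groupby(["Groups", "species"], sort=True).ngroup()

# Shift each group's run so that it starts at 1
numbers = pair_id - pair_id.groupby(sub["Groups"]).transform("min") + 1

df.loc[m, "Numbers"] = numbers
print(df)
```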