Add column with number for each unique element in pandas

I have a dataframe such as:

Groups species Numbers
G1 sp1 1
G1 sp2 2
G1 sp3 3
G1 sp4 4
G1 sp4 5
G1 sp5 6
G2 sp3 1
G2 sp3 2
G2 sp2 3
G3 sp1 1
G3 sp3 1
G4 sp3 1
G5 sp3 1
G5 sp3 2
G5 sp3 3
G5 sp1 4
List_groups =["G1","G5"]

The idea is to replace the Numbers column only for the Groups listed in List_groups, giving each unique species within such a group its own number.

I should then get the following output:

Groups species Numbers
G1 sp1 1
G1 sp2 2
G1 sp3 3
G1 sp4 4
G1 sp4 4
G1 sp5 5
G2 sp3 1
G2 sp3 2
G2 sp2 3
G3 sp1 1
G3 sp3 1
G4 sp3 1
G5 sp3 1
G5 sp3 1
G5 sp3 1
G5 sp1 2
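The rule behind this output: inside each listed group, a species gets the number of its first appearance, counting distinct species from 1, while rows of the other groups keep their original Numbers. A minimal sketch of that rule for a single group, using `pd.factorize` (which assigns codes in order of first appearance); the helper name `number_by_first_appearance` is hypothetical:

```python
import pandas as pd


def number_by_first_appearance(species):
    """Hypothetical helper: number each distinct value 1, 2, ... in order of first appearance."""
    codes, _uniques = pd.factorize(pd.Series(species))  # codes start at 0
    return (codes + 1).tolist()


# Species of G1 and G5 from the example above
print(number_by_first_appearance(["sp1", "sp2", "sp3", "sp4", "sp4", "sp5"]))  # [1, 2, 3, 4, 4, 5]
print(number_by_first_appearance(["sp3", "sp3", "sp3", "sp1"]))  # [1, 1, 1, 2]
```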

Here is one possibility, but since the dataframe is quite long, it takes too much time:

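m = df['Groups'].isin(List_groups)  # boolean mask of the rows whose group is in List_groups
# note: ngroup() numbers the (Groups, species) pairs across all selected groups
df.loc[m, 'Numbers'] = df[m].groupby(['Groups', 'species']).ngroup() + 1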

Here is the dataframe in dict format:

{'Groups': {0: 'G1', 1: 'G1', 2: 'G1', 3: 'G1', 4: 'G1', 5: 'G1', 6: 'G2', 7: 'G2', 8: 'G2', 9: 'G3', 10: 'G3', 11: 'G4', 12: 'G5', 13: 'G5', 14: 'G5', 15: 'G5'}, 'species': {0: 'sp1', 1: 'sp2', 2: 'sp3', 3: 'sp4', 4: 'sp4', 5: 'sp5', 6: 'sp3', 7: 'sp3', 8: 'sp2', 9: 'sp1', 10: 'sp3', 11: 'sp3', 12: 'sp3', 13: 'sp3', 14: 'sp3', 15: 'sp1'}, 'Numbers': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 1, 7: 2, 8: 3, 9: 1, 10: 1, 11: 1, 12: 1, 13: 2, 14: 3, 15: 4}}
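To make the numbering issue reproducible, here is a sketch that loads the same data (as lists equivalent to the dict above) and runs the attempt with the mask taken as `df['Groups'].isin(List_groups)`, which is an assumption about what the attempt intends. Grouping by both columns numbers the (Groups, species) pairs across all selected rows in sorted order, so G5 does not restart at 1:

```python
import pandas as pd

# Same data as the dict above
df = pd.DataFrame({
    "Groups": ["G1"] * 6 + ["G2"] * 3 + ["G3"] * 2 + ["G4"] + ["G5"] * 4,
    "species": ["sp1", "sp2", "sp3", "sp4", "sp4", "sp5", "sp3", "sp3", "sp2",
                "sp1", "sp3", "sp3", "sp3", "sp3", "sp3", "sp1"],
    "Numbers": [1, 2, 3, 4, 5, 6, 1, 2, 3, 1, 1, 1, 1, 2, 3, 4],
})
List_groups = ["G1", "G5"]

m = df["Groups"].isin(List_groups)
# One number per (Groups, species) pair, counted over G1 and G5 together
numbers = df[m].groupby(["Groups", "species"]).ngroup() + 1
print(numbers.tolist())  # [1, 2, 3, 4, 4, 5, 7, 7, 7, 6]
```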

You can use pandas.CategoricalIndex and its codes attribute (pandas.CategoricalIndex.codes) to number each unique species within the group. The categories are sorted, so the numbering follows the alphabetical order of the species:

import pandas as pd

List_groups = ["G1", "G5"]
filters = df["Groups"].isin(List_groups)

# change Numbers only for rows in the listed groups; other groups are not affected
df.loc[filters, "Numbers"] = (
    df[filters].groupby(["Groups"])["species"].transform(lambda x: pd.CategoricalIndex(x).codes + 1)
)  # add 1 since codes start from 0
   Groups species  Numbers
0      G1     sp1        1
1      G1     sp2        2
2      G1     sp3        3
3      G1     sp4        4
4      G1     sp4        4
5      G1     sp5        5
6      G2     sp3        1
7      G2     sp3        2
8      G2     sp2        3
9      G3     sp1        1
10     G3     sp3        1
11     G4     sp3        1
12     G5     sp3        2
13     G5     sp3        2
14     G5     sp3        2
15     G5     sp1        1
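With sorted categories, G5 gets sp1 = 1 and sp3 = 2, the reverse of the requested output. A sketch of one way to follow the order of appearance instead, assuming it is acceptable to pass the categories explicitly via `pd.unique` (which returns values in order of first appearance):

```python
import pandas as pd

df = pd.DataFrame({
    "Groups": ["G1"] * 6 + ["G2"] * 3 + ["G3"] * 2 + ["G4"] + ["G5"] * 4,
    "species": ["sp1", "sp2", "sp3", "sp4", "sp4", "sp5", "sp3", "sp3", "sp2",
                "sp1", "sp3", "sp3", "sp3", "sp3", "sp3", "sp1"],
    "Numbers": [1, 2, 3, 4, 5, 6, 1, 2, 3, 1, 1, 1, 1, 2, 3, 4],
})
List_groups = ["G1", "G5"]
filters = df["Groups"].isin(List_groups)

# Categories given in order of first appearance, so the codes follow that order too
df.loc[filters, "Numbers"] = df[filters].groupby("Groups")["species"].transform(
    lambda x: pd.CategoricalIndex(x, categories=pd.unique(x)).codes + 1
)
print(df["Numbers"].tolist())
```

Only the `categories=` argument differs from the answer above; everything else is unchanged.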
List_groups =["G1","G5"]

You can create a custom function for this:

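import pandas as pd

def getgroup(List_groups):
    lst = []
    for x in List_groups:
        m = df['Groups'].eq(x)  # rows of group x
        if m.any():
            # sort=False numbers the species in order of first appearance (from 0, hence + 1)
            lst.append(df[m].groupby(['Groups', 'species'], sort=False).ngroup() + 1)
    return pd.concat(lst)  # Series indexed by the original row labels

# Finally (this line assumes df has the default RangeIndex):
df['Numbers'] = pd.Series(df.index.map(getgroup(List_groups))).fillna(df['Numbers']).astype(int)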
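Because `getgroup` returns a Series indexed by the original row labels, a shorter final step can assign it with `.loc`, which also does not depend on `df` having a default `RangeIndex`. This is a sketch of an alternative to the `index.map`/`fillna` line, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({
    "Groups": ["G1"] * 6 + ["G2"] * 3 + ["G3"] * 2 + ["G4"] + ["G5"] * 4,
    "species": ["sp1", "sp2", "sp3", "sp4", "sp4", "sp5", "sp3", "sp3", "sp2",
                "sp1", "sp3", "sp3", "sp3", "sp3", "sp3", "sp1"],
    "Numbers": [1, 2, 3, 4, 5, 6, 1, 2, 3, 1, 1, 1, 1, 2, 3, 4],
})
List_groups = ["G1", "G5"]


def getgroup(List_groups):
    # Same function as in the answer above
    lst = []
    for x in List_groups:
        m = df["Groups"].eq(x)
        if m.any():
            lst.append(df[m].groupby(["Groups", "species"], sort=False).ngroup() + 1)
    return pd.concat(lst)


# .loc aligns the new numbers on the row labels; other rows keep their values
new_numbers = getgroup(List_groups)
df.loc[new_numbers.index, "Numbers"] = new_numbers
print(df["Numbers"].tolist())
```

If none of the listed groups is present, `pd.concat` receives an empty list and raises `ValueError`, so a guard may be needed for that case.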

You can cast species to the category dtype and use Series.cat.codes (the categories are sorted here as well, so the numbering is alphabetical):

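import pandas as pd

m = df["Groups"].isin(List_groups)
c = (
    df[m]
    .groupby("Groups")["species"]
    # transform keeps the original row index, so the assignment below aligns
    # (with apply, recent pandas versions add the group key to the index)
    .transform(lambda x: x.astype("category").cat.codes)
    + 1
)
df.loc[m, "Numbers"] = c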

   Groups species  Numbers
0      G1     sp1        1
1      G1     sp2        2
2      G1     sp3        3
3      G1     sp4        4
4      G1     sp4        4
5      G1     sp5        5
6      G2     sp3        1
7      G2     sp3        2
8      G2     sp2        3
9      G3     sp1        1
10     G3     sp3        1
11     G4     sp3        1
12     G5     sp3        2
13     G5     sp3        2
14     G5     sp3        2
15     G5     sp1        1
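Since the question is about speed on a long dataframe, a sketch that avoids calling a Python function per group may help. It uses only vectorized pandas operations (`duplicated`, a grouped `cumsum`, and `transform("first")`) and numbers species in order of first appearance, as in the requested output. It has not been benchmarked here, so any speedup is an assumption to verify on the real data:

```python
import pandas as pd

df = pd.DataFrame({
    "Groups": ["G1"] * 6 + ["G2"] * 3 + ["G3"] * 2 + ["G4"] + ["G5"] * 4,
    "species": ["sp1", "sp2", "sp3", "sp4", "sp4", "sp5", "sp3", "sp3", "sp2",
                "sp1", "sp3", "sp3", "sp3", "sp3", "sp3", "sp1"],
    "Numbers": [1, 2, 3, 4, 5, 6, 1, 2, 3, 1, 1, 1, 1, 2, 3, 4],
})
List_groups = ["G1", "G5"]

m = df["Groups"].isin(List_groups)
sub = df.loc[m, ["Groups", "species"]]

# True on the first row of each (Groups, species) pair
is_first = ~sub.duplicated(["Groups", "species"])
# Running count of distinct species seen so far inside each group
running = is_first.astype(int).groupby(sub["Groups"]).cumsum()
# Every row of a pair takes the number assigned at the pair's first occurrence
numbers = running.groupby([sub["Groups"], sub["species"]]).transform("first")

df.loc[m, "Numbers"] = numbers
print(df["Numbers"].tolist())
```

The `transform("first")` step matters when a species reappears after other species: the running count has grown by then, but the repeated row should keep the number from its first occurrence.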
