Add a column with a number for each unique element in pandas

Question

I have a dataframe such as:

Groups species Numbers
G1 sp1 1
G1 sp2 2
G1 sp3 3
G1 sp4 4
G1 sp4 5
G1 sp5 6
G2 sp3 1
G2 sp3 2
G2 sp2 3
G3 sp1 1
G3 sp3 1
G4 sp3 1
G5 sp3 1
G5 sp3 2
G5 sp3 3
G5 sp1 4
List_groups =["G1","G5"]

The idea is to replace the Numbers column only for the Groups listed in List_groups, assigning one number to each unique species within the group.

I should then get the following output:

Groups species Numbers
G1 sp1 1
G1 sp2 2
G1 sp3 3
G1 sp4 4
G1 sp4 4
G1 sp5 5
G2 sp3 1
G2 sp3 2
G2 sp2 3
G3 sp1 1
G3 sp3 1
G4 sp3 1
G5 sp3 1
G5 sp3 1
G5 sp3 1
G5 sp1 2

Here is one possibility, but since the dataframe is quite long, it takes too much time...

m = df['Groups'].isin(List_groups)
df.loc[m, 'Numbers'] = df[m].groupby(['Groups', 'species']).ngroup() + 1

Here is the dataframe in dict format:

{'Groups': {0: 'G1', 1: 'G1', 2: 'G1', 3: 'G1', 4: 'G1', 5: 'G1', 6: 'G2', 7: 'G2', 8: 'G2', 9: 'G3', 10: 'G3', 11: 'G4', 12: 'G5', 13: 'G5', 14: 'G5', 15: 'G5'}, 'species': {0: 'sp1', 1: 'sp2', 2: 'sp3', 3: 'sp4', 4: 'sp4', 5: 'sp5', 6: 'sp3', 7: 'sp3', 8: 'sp2', 9: 'sp1', 10: 'sp3', 11: 'sp3', 12: 'sp3', 13: 'sp3', 14: 'sp3', 15: 'sp1'}, 'Numbers': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 1, 7: 2, 8: 3, 9: 1, 10: 1, 11: 1, 12: 1, 13: 2, 14: 3, 15: 4}}
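To reproduce the examples, the dict above can be loaded back into a dataframe. This is a minimal sketch; the variable name `data` is only chosen for illustration and is not part of the question.

```python
import pandas as pd

# The dict from the question: {column: {row index: value}}.
data = {
    "Groups": {0: "G1", 1: "G1", 2: "G1", 3: "G1", 4: "G1", 5: "G1",
               6: "G2", 7: "G2", 8: "G2", 9: "G3", 10: "G3", 11: "G4",
               12: "G5", 13: "G5", 14: "G5", 15: "G5"},
    "species": {0: "sp1", 1: "sp2", 2: "sp3", 3: "sp4", 4: "sp4", 5: "sp5",
                6: "sp3", 7: "sp3", 8: "sp2", 9: "sp1", 10: "sp3", 11: "sp3",
                12: "sp3", 13: "sp3", 14: "sp3", 15: "sp1"},
    "Numbers": {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 1, 7: 2, 8: 3,
                9: 1, 10: 1, 11: 1, 12: 1, 13: 2, 14: 3, 15: 4},
}

# Outer keys become columns, inner keys become the row index.
df = pd.DataFrame(data)
List_groups = ["G1", "G5"]
print(df)
```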

Answer 1

You can use pandas.CategoricalIndex and pandas.CategoricalIndex.codes to assign a number to each unique species within a group.

import pandas as pd

List_groups = ["G1", "G5"]
filters = df["Groups"].isin(List_groups)

# Change Numbers only for rows in the listed groups; other groups are not affected.
# Add 1 because the codes start from 0.
df.loc[filters, "Numbers"] = (
    df[filters].groupby(["Groups"])["species"].transform(lambda x: pd.CategoricalIndex(x).codes + 1)
)

   Groups species  Numbers
0      G1     sp1        1
1      G1     sp2        2
2      G1     sp3        3
3      G1     sp4        4
4      G1     sp4        4
5      G1     sp5        5
6      G2     sp3        1
7      G2     sp3        2
8      G2     sp2        3
9      G3     sp1        1
10     G3     sp3        1
11     G4     sp3        1
12     G5     sp3        2
13     G5     sp3        2
14     G5     sp3        2
15     G5     sp1        1
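Note that category codes follow the sorted order of the species, which is why G5 ends up with sp3 = 2 and sp1 = 1 here, whereas the expected output in the question has sp3 = 1 and sp1 = 2. If numbering by order of first appearance is what is wanted, one possible variation (a sketch, not part of the original answer) swaps in pd.factorize, which assigns codes in the order values are first seen:

```python
import pandas as pd

# Same data as in the question, written with lists for brevity.
df = pd.DataFrame({
    "Groups": ["G1"] * 6 + ["G2"] * 3 + ["G3"] * 2 + ["G4"] + ["G5"] * 4,
    "species": ["sp1", "sp2", "sp3", "sp4", "sp4", "sp5", "sp3", "sp3", "sp2",
                "sp1", "sp3", "sp3", "sp3", "sp3", "sp3", "sp1"],
    "Numbers": [1, 2, 3, 4, 5, 6, 1, 2, 3, 1, 1, 1, 1, 2, 3, 4],
})
List_groups = ["G1", "G5"]
filters = df["Groups"].isin(List_groups)

# pd.factorize returns (codes, uniques); codes start at 0 and follow
# the order of first appearance within each group.
df.loc[filters, "Numbers"] = (
    df[filters].groupby("Groups")["species"].transform(lambda x: pd.factorize(x)[0] + 1)
)
print(df)
```

With this variation the G5 rows become 1, 1, 1, 2, matching the output requested in the question.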

Answer 2

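You can create a custom function for this:

import pandas as pd

List_groups = ["G1", "G5"]

def getgroup(List_groups):
    # Number the species of each listed group separately,
    # in order of first appearance (sort=False).
    lst = []
    for x in List_groups:
        m = df["Groups"].eq(x)
        if m.any():
            lst.append(df[m].groupby(["Groups", "species"], sort=False).ngroup() + 1)
    return pd.concat(lst)

# Finally, map the new numbers onto the index and keep the old values elsewhere:
df["Numbers"] = pd.Series(df.index.map(getgroup(List_groups))).fillna(df["Numbers"]).astype(int)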
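The sort=False argument is what makes this answer number species in order of first appearance. A small sketch on the G5 rows alone illustrates the difference; the names `g5`, `by_appearance` and `by_sorted_key` are made up for this illustration.

```python
import pandas as pd

# Only the G5 rows from the question.
g5 = pd.DataFrame({
    "Groups": ["G5", "G5", "G5", "G5"],
    "species": ["sp3", "sp3", "sp3", "sp1"],
})

# sort=False: groups are numbered in the order they first appear.
by_appearance = g5.groupby(["Groups", "species"], sort=False).ngroup() + 1

# Default sort=True: groups are numbered in sorted key order.
by_sorted_key = g5.groupby(["Groups", "species"]).ngroup() + 1

print(by_appearance.tolist())  # [1, 1, 1, 2]
print(by_sorted_key.tolist())  # [2, 2, 2, 1]
```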

Answer 3

You can cast species to the category dtype and use its .cat.codes accessor:

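m = df["Groups"].isin(List_groups)
# group_keys=False keeps the original row index on the result,
# so the assignment below aligns by index.
c = (
    df[m]
    .groupby("Groups", group_keys=False)["species"]
    .apply(lambda x: x.astype("category").cat.codes)
    + 1
)
df.loc[m, "Numbers"] = c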

   Groups species  Numbers
0      G1     sp1        1
1      G1     sp2        2
2      G1     sp3        3
3      G1     sp4        4
4      G1     sp4        4
5      G1     sp5        5
6      G2     sp3        1
7      G2     sp3        2
8      G2     sp2        3
9      G3     sp1        1
10     G3     sp3        1
11     G4     sp3        1
12     G5     sp3        2
13     G5     sp3        2
14     G5     sp3        2
15     G5     sp1        1
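Since the question mentions a long dataframe, another option is to avoid the per-group lambda entirely. The following is a sketch under that assumption, not something taken from the answers: it numbers each unique (Groups, species) pair once with drop_duplicates and cumcount, then merges the numbers back. The names `pairs`, `merged` and the helper column `new` are hypothetical, and whether this is actually faster would need to be measured on the real data.

```python
import pandas as pd

df = pd.DataFrame({
    "Groups": ["G1"] * 6 + ["G2"] * 3 + ["G3"] * 2 + ["G4"] + ["G5"] * 4,
    "species": ["sp1", "sp2", "sp3", "sp4", "sp4", "sp5", "sp3", "sp3", "sp2",
                "sp1", "sp3", "sp3", "sp3", "sp3", "sp3", "sp1"],
    "Numbers": [1, 2, 3, 4, 5, 6, 1, 2, 3, 1, 1, 1, 1, 2, 3, 4],
})
List_groups = ["G1", "G5"]
m = df["Groups"].isin(List_groups)

# One row per unique (Groups, species) pair, in order of first appearance.
pairs = df.loc[m, ["Groups", "species"]].drop_duplicates()

# Number the pairs 1, 2, 3, ... within each group.
pairs["new"] = pairs.groupby("Groups").cumcount() + 1

# A left merge keeps the row order of the left frame, so the result
# can be written back positionally onto the selected rows.
merged = df.loc[m, ["Groups", "species"]].merge(pairs, on=["Groups", "species"], how="left")
df.loc[m, "Numbers"] = merged["new"].to_numpy()
print(df)
```

Like the sort=False approach in Answer 2, this numbers species by first appearance, so G5 comes out as 1, 1, 1, 2.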

3 answers

Answer 1
4 2021-07-18 09:07:59

Answer 2
3 accepted 2021-07-18 08:48:58

Answer 3
3 2021-07-18 09:09:40
