Create a group column based on duplicated elements within 3 columns in pandas

I have a dataframe such as:

COL1 COL2 COL3
G1   SP1  A
G1   SP1  A
G1   SP2  B
G2   SP1  C
G2   SP2  C
G3   SP1  D
G3   SP1  D
G3   SP1  D

I would simply like to add a new Groups column that labels each distinct combination of COL1, COL2 and COL3 values, and a Nb_dup column with the number of rows in each group, such as:

COL1 COL2 COL3 Groups Nb_dup
G1   SP1  A    Group1      2
G1   SP1  A    Group1      2
G1   SP2  B    Group2      1
G2   SP1  C    Group3      1
G2   SP2  C    Group4      1
G3   SP1  D    Group5      3
G3   SP1  D    Group5      3
G3   SP1  D    Group5      3

So far I tried:

key_set = set(df[['COL1','COL2','COL3']])
df_a = pd.DataFrame(list(key_set))
df_a['Groups'] = df_a.index
result = pd.merge(tab,df_a,left_on=['COL1','COL2','COL3'],right_on=0,how='left')

Here is the df in dict format, if it helps:

{'COL1': {0: 'G1', 1: 'G1', 2: 'G1', 3: 'G2', 4: 'G2', 5: 'G3', 6: 'G3', 7: 'G3'}, 'COL2': {0: 'SP1', 1: 'SP1', 2: 'SP2', 3: 'SP1', 4: 'SP2', 5: 'SP1', 6: 'SP1', 7: 'SP1'}, 'COL3': {0: 'A', 1: 'A', 2: 'B', 3: 'C', 4: 'C', 5: 'D', 6: 'D', 7: 'D'}}
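A note on the attempt above: iterating over a DataFrame yields its column labels, so set(df[['COL1','COL2','COL3']]) contains the three column names rather than the row combinations, and the merge has nothing useful to match on. A minimal sketch of how the same merge idea could be made to work, assuming the lookup table is built with drop_duplicates() (the names keys and result are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({
    'COL1': ['G1', 'G1', 'G1', 'G2', 'G2', 'G3', 'G3', 'G3'],
    'COL2': ['SP1', 'SP1', 'SP2', 'SP1', 'SP2', 'SP1', 'SP1', 'SP1'],
    'COL3': ['A', 'A', 'B', 'C', 'C', 'D', 'D', 'D'],
})
cols = ['COL1', 'COL2', 'COL3']

# Iterating a DataFrame yields column labels, so this set holds names, not rows
key_set = set(df[cols])

# Keep one row per distinct (COL1, COL2, COL3) combination instead
keys = df[cols].drop_duplicates().reset_index(drop=True)

# Label the combinations in order of first appearance, starting at 1
keys['Groups'] = [f'Group{i}' for i in range(1, len(keys) + 1)]

# A left merge on the three key columns attaches the label to every row
result = df.merge(keys, on=cols, how='left')
```

This covers only the Groups column; Nb_dup would still need a separate count per group.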

Let's try:

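cols = ['COL1', 'COL2', 'COL3']

# ngroup() numbers each distinct (COL1, COL2, COL3) combination from 0, in sorted key order
df['Groups'] = 'Group' + df.groupby(cols).ngroup().add(1).astype(str)
# count the rows in each group and broadcast that count back to every row
df['Nb_dup'] = df.groupby('Groups')['Groups'].transform('count')
print(df)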

  COL1 COL2 COL3  Groups  Nb_dup
0   G1  SP1    A  Group1       2
1   G1  SP1    A  Group1       2
2   G1  SP2    B  Group2       1
3   G2  SP1    C  Group3       1
4   G2  SP2    C  Group4       1
5   G3  SP1    D  Group5       3
6   G3  SP1    D  Group5       3
7   G3  SP1    D  Group5       3
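
For anyone who wants to reproduce this from scratch, here is a self-contained sketch of a slight variation. It assumes you would rather number the groups in order of first appearance (sort=False) and take the row count straight from the key columns with transform('size'), which counts rows regardless of missing values in the counted column. The names grouped, group_id and group_size are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'COL1': ['G1', 'G1', 'G1', 'G2', 'G2', 'G3', 'G3', 'G3'],
    'COL2': ['SP1', 'SP1', 'SP2', 'SP1', 'SP2', 'SP1', 'SP1', 'SP1'],
    'COL3': ['A', 'A', 'B', 'C', 'C', 'D', 'D', 'D'],
})
cols = ['COL1', 'COL2', 'COL3']

# sort=False numbers the groups in the order they first appear in df
grouped = df.groupby(cols, sort=False)

# ngroup() is 0-based, so shift by one to start at Group1
group_id = grouped.ngroup() + 1

# size of each group, aligned back to the original rows
group_size = grouped['COL1'].transform('size')

df['Groups'] = 'Group' + group_id.astype(str)
df['Nb_dup'] = group_size
print(df)
```

On the sample data both orderings give the same labels, because the rows are already sorted by the three key columns.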
