
How to assign a unique ID to detect repeated rows in a pandas dataframe?

I am working with a large pandas dataframe that has several columns, much like this:

A      B         C    D   

John   Tom       0    1
Homer  Bart      2    3
Tom    Maggie    1    4 
Lisa   John      5    0
Homer  Bart      2    3
Lisa   John      5    0
Homer  Bart      2    3
Homer  Bart      2    3
Tom    Maggie    1    4

How can I assign a unique ID to each repeated row? For example:

A      B         C    D      new_id

John   Tom       0    1.2      1
Homer  Bart      2    3.0      2
Tom    Maggie    1    4.2      3
Lisa   John      5    0        4
Homer  Bart      2    3        5
Lisa   John      5    0        4
Homer  Bart      2    3.0      2
Homer  Bart      2    3.0      2
Tom    Maggie    1    4.1      6

I know that I can use duplicated to detect the duplicated rows; however, I cannot see where those rows repeat. I tried:

df.assign(id=(df.columns).astype('category').cat.codes)
df

However, this is not working. How can I get a unique ID for detecting groups of duplicated rows?

Group by the columns in which you are looking for duplicates and use ngroup:

df['new_id'] = df.groupby(['A','B','C','D']).ngroup()
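
To make the one-liner above easy to try, here is a minimal, self-contained sketch. The dataframe is rebuilt by hand from the first table in the question, so the construction code is an assumption about how the data is stored (plain strings and integers).

```python
import pandas as pd

# Sample data copied from the first table in the question.
df = pd.DataFrame({
    "A": ["John", "Homer", "Tom", "Lisa", "Homer", "Lisa", "Homer", "Homer", "Tom"],
    "B": ["Tom", "Bart", "Maggie", "John", "Bart", "John", "Bart", "Bart", "Maggie"],
    "C": [0, 2, 1, 5, 2, 5, 2, 2, 1],
    "D": [1, 3, 4, 0, 3, 0, 3, 3, 4],
})

# Rows with identical values in A, B, C and D fall into the same group;
# ngroup() labels every row with the number of its group.
df["new_id"] = df.groupby(["A", "B", "C", "D"]).ngroup()

print(df)
```

With the default sort=True, the ids start at 0 and follow the sorted order of the group keys rather than the order in which rows appear, so the Homer/Bart rows get 0 here and the John/Tom row gets 1.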

For small dataframes, you can convert your rows to tuples, which are hashable, and then use pd.factorize:

df['new_id'] = pd.factorize(df.apply(tuple, axis=1))[0] + 1
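
As a rough illustration of the tuple approach, the sketch below applies it to the question's sample data (rebuilt by hand here, which is an assumption about the real dtypes). The intermediate name row_keys is introduced only for this example.

```python
import pandas as pd

# Sample data copied from the first table in the question.
df = pd.DataFrame({
    "A": ["John", "Homer", "Tom", "Lisa", "Homer", "Lisa", "Homer", "Homer", "Tom"],
    "B": ["Tom", "Bart", "Maggie", "John", "Bart", "John", "Bart", "Bart", "Maggie"],
    "C": [0, 2, 1, 5, 2, 5, 2, 2, 1],
    "D": [1, 3, 4, 0, 3, 0, 3, 3, 4],
})

# Turn every row into a hashable tuple, e.g. ("John", "Tom", 0, 1).
# row_keys is a hypothetical helper name used only in this sketch.
row_keys = df.apply(tuple, axis=1)

# factorize returns (codes, uniques); codes number the distinct tuples
# in order of first appearance, starting at 0, so add 1 for 1-based ids.
codes, _ = pd.factorize(row_keys)
df["new_id"] = codes + 1

print(df)
```

Because df.apply with axis=1 calls a Python function once per row, this is likely to be slow on large frames, which is why it is suggested for small ones only.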

groupby is more efficient for larger dataframes:

df['new_id'] = df.groupby(df.columns.tolist(), sort=False).ngroup() + 1
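
The sketch below runs the groupby version on the question's sample data (rebuilt by hand, so the dtypes are an assumption) and then uses the new id to flag which rows actually repeat. The columns group_size and is_repeated are hypothetical names added for illustration; they are not part of the original answer.

```python
import pandas as pd

# Sample data copied from the first table in the question.
df = pd.DataFrame({
    "A": ["John", "Homer", "Tom", "Lisa", "Homer", "Lisa", "Homer", "Homer", "Tom"],
    "B": ["Tom", "Bart", "Maggie", "John", "Bart", "John", "Bart", "Bart", "Maggie"],
    "C": [0, 2, 1, 5, 2, 5, 2, 2, 1],
    "D": [1, 3, 4, 0, 3, 0, 3, 3, 4],
})

# Group on every column. sort=False keeps the groups in order of first
# appearance, and + 1 makes the ids start at 1 instead of 0.
df["new_id"] = df.groupby(df.columns.tolist(), sort=False).ngroup() + 1

# Hypothetical follow-up: count how many rows share each id and flag
# the rows whose id occurs more than once.
df["group_size"] = df.groupby("new_id")["new_id"].transform("size")
df["is_repeated"] = df["group_size"] > 1

print(df)
```

Sorting the result with df.sort_values("new_id") places each group of identical rows next to each other, which may help when inspecting where the repeats occur.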
