
Create unique identifier in dataframe based on combination of columns

I have the following dataframe:

id      Lat        Lon        Year   Area   State
50319   -36.0629   -62.3423   2019   90     Iowa
18873   -36.0629   -62.3423   2017   90     Iowa
18876   -36.0754   -62.327    2017   124    Illinois
18878   -36.0688   -62.3353   2017   138    Kansas

I want to create a new column that assigns a unique identifier to rows whose Lat, Lon and Area columns have the same values. E.g., in this case rows 1 and 2 have the same values in those columns and will be given the same unique identifier 0_Iowa, where Iowa comes from the State column. I tried using a for loop, but is there a more Pythonic way to do it?

id      Lat        Lon        Year   Area   State      unique_id
50319   -36.0629   -62.3423   2019   90     Iowa       0_Iowa
18873   -36.0629   -62.3423   2017   90     Iowa       0_Iowa
18876   -36.0754   -62.327    2017   124    Illinois   1_Illinois
18878   -36.0688   -62.3353   2017   138    Kansas     2_Kansas

I'd use groupby.ngroup with sort=False for the grouping, so that groups are numbered in order of first appearance, and then str.cat to join the result with State using a separator:

# Number each (Lat, Lon, Area) group, then append the State name
df['unique_id'] = (df.groupby(['Lat', 'Lon', 'Area'], sort=False)
                     .ngroup()
                     .astype(str)
                     .str.cat(df['State'], sep='_'))

print(df)

      id      Lat      Lon  Year  Area     State   unique_id
0  50319 -36.0629 -62.3423  2019    90      Iowa      0_Iowa
1  18873 -36.0629 -62.3423  2017    90      Iowa      0_Iowa
2  18876 -36.0754 -62.3270  2017   124  Illinois  1_Illinois
3  18878 -36.0688 -62.3353  2017   138    Kansas    2_Kansas
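
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The dataframe is rebuilt by hand from the values in the question, and the intermediate name group_number is only there for illustration; it is not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": [50319, 18873, 18876, 18878],
    "Lat": [-36.0629, -36.0629, -36.0754, -36.0688],
    "Lon": [-62.3423, -62.3423, -62.3270, -62.3353],
    "Year": [2019, 2017, 2017, 2017],
    "Area": [90, 90, 124, 138],
    "State": ["Iowa", "Iowa", "Illinois", "Kansas"],
})

# ngroup() labels every row with the integer number of its group;
# sort=False numbers the groups in order of first appearance.
group_number = df.groupby(["Lat", "Lon", "Area"], sort=False).ngroup()

# Convert the group number to text and join it to State with "_".
df["unique_id"] = group_number.astype(str).str.cat(df["State"], sep="_")

print(df["unique_id"].tolist())
# ['0_Iowa', '0_Iowa', '1_Illinois', '2_Kansas']
```

Rows that share Lat, Lon and Area get the same number, so the first two rows both end up as 0_Iowa.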

You can use groupby.ngroup and append the State column:

df['unique_id'] = (df.groupby(['Lat', 'Lon', 'Area'], sort=False).ngroup().astype(str)
                   + '_' + df['State'])
print(df)
      id      Lat      Lon  Year  Area     State   unique_id
0  50319 -36.0629 -62.3423  2019    90      Iowa      0_Iowa
1  18873 -36.0629 -62.3423  2017    90      Iowa      0_Iowa
2  18876 -36.0754 -62.3270  2017   124  Illinois  1_Illinois
3  18878 -36.0688 -62.3353  2017   138    Kansas    2_Kansas
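
Both answers pass sort=False, and it is worth seeing why. The sketch below compares the group numbers with and without it on the question's data. The variable names (keys, by_appearance, by_sorted_key) are made up for this illustration.

```python
import pandas as pd

# Only the columns needed for the comparison, values from the question
df = pd.DataFrame({
    "Lat": [-36.0629, -36.0629, -36.0754, -36.0688],
    "Lon": [-62.3423, -62.3423, -62.3270, -62.3353],
    "Area": [90, 90, 124, 138],
    "State": ["Iowa", "Iowa", "Illinois", "Kansas"],
})
keys = ["Lat", "Lon", "Area"]

# sort=False: groups are numbered in the order they first appear
by_appearance = df.groupby(keys, sort=False).ngroup()

# default sort=True: groups are numbered by sorted key order,
# so the smallest Lat (-36.0754, Illinois) becomes group 0
by_sorted_key = df.groupby(keys).ngroup()

print(by_appearance.tolist())  # [0, 0, 1, 2]
print(by_sorted_key.tolist())  # [2, 2, 0, 1]
```

Either way the identifiers are unique per combination; sort=False is only needed if the numbering should follow row order, as in the expected output of the question.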


