
Assign unique id to columns pandas data frame

Hello, I have the following dataframe:

df = 
A      B   
John   Tom
Homer  Bart
Tom    Maggie
Lisa   John 

I would like to assign a unique ID to each name and return:

df = 
A      B         C    D

John   Tom       0    1
Homer  Bart      2    3
Tom    Maggie    1    4 
Lisa   John      5    0

What I have done is the following:

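import pandas as pd

# Stack both name columns into one Series and collect the unique names
LL1 = pd.concat([df.A, df.B], ignore_index=True)
nameun = pd.unique(LL1)
NN = list(nameun)
LLout = df.copy()
LLout['C'] = 0
LLout['D'] = 0
# Look up each name's position one row at a time (slow on large data)
for i in range(len(LLout)):
    LLout.loc[i, 'C'] = NN.index(LLout.A[i])
    LLout.loc[i, 'D'] = NN.index(LLout.B[i])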

But since I have a very large dataset, this process is very slow.

Here's one way. First get the array of unique names:

In [11]: df.values.ravel()
Out[11]: array(['John', 'Tom', 'Homer', 'Bart', 'Tom', 'Maggie', 'Lisa', 'John'], dtype=object)

In [12]: pd.unique(df.values.ravel())
Out[12]: array(['John', 'Tom', 'Homer', 'Bart', 'Maggie', 'Lisa'], dtype=object)

and make this a Series, mapping names to their respective numbers:

In [13]: names = pd.unique(df.values.ravel())

In [14]: names = pd.Series(np.arange(len(names)), names)

In [15]: names
Out[15]:
John      0
Tom       1
Homer     2
Bart      3
Maggie    4
Lisa      5
dtype: int64

Now use applymap and names.get to look up these numbers:

In [16]: df.applymap(names.get)
Out[16]:
   A  B
0  0  1
1  2  3
2  1  4
3  5  0

and assign it to the correct columns:

In [17]: df[["C", "D"]] = df.applymap(names.get)

In [18]: df
Out[18]:
       A       B  C  D
0   John     Tom  0  1
1  Homer    Bart  2  3
2    Tom  Maggie  1  4
3   Lisa    John  5  0

Note: This assumes that all the values are names to begin with; you may want to restrict this to some columns only:

df[['A', 'B']].values.ravel()
...
df[['A', 'B']].applymap(names.get)
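
The steps above can be pulled together into one self-contained sketch. The names cols, unique_names, name_to_id and result are illustrative and not from the original answer, and the sample frame is rebuilt from the question. It uses Series.map per column instead of applymap, which recent pandas releases deprecate in favor of DataFrame.map.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "A": ["John", "Homer", "Tom", "Lisa"],
    "B": ["Tom", "Bart", "Maggie", "John"],
})

cols = ["A", "B"]  # assumption: only these columns hold names

# Unique names in order of first appearance, reading row by row
unique_names = pd.unique(df[cols].to_numpy().ravel())

# Series whose index is the name and whose value is the ID
name_to_id = pd.Series(np.arange(len(unique_names)), index=unique_names)

# Look up every name column against the mapping
ids = df[cols].apply(lambda col: col.map(name_to_id))
ids.columns = ["C", "D"]

result = df.join(ids)
print(result)
```

Because the lookup is vectorized per column, this avoids the row-by-row list.index calls from the question.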

(Note: I'm assuming you don't care about the precise details of the mapping -- which number John becomes, for example -- but only that there is one.)

Method #1: you could use a Categorical object as an intermediary:

>>> ranked = pd.Categorical(df.stack()).codes.reshape(df.shape)
>>> df.join(pd.DataFrame(ranked, columns=["C", "D"]))
       A       B  C  D
0   John     Tom  2  5
1  Homer    Bart  1  0
2    Tom  Maggie  5  4
3   Lisa    John  3  2

It feels like you should be able to treat a Categorical as providing an encoding dictionary somehow (whether directly or by generating a Series) but I can't see a convenient way to do it.
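
One possible way to get such an encoding dictionary, as a sketch rather than anything the answer itself proposes: the position of each entry in the Categorical's categories is its code, so enumerating the categories gives the mapping. The names cat, encoding and codes are illustrative, and the values are flattened with to_numpy().ravel() here instead of df.stack(), which yields the same row-by-row order for this frame.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "A": ["John", "Homer", "Tom", "Lisa"],
    "B": ["Tom", "Bart", "Maggie", "John"],
})

# Flatten row by row and let Categorical infer the sorted categories
cat = pd.Categorical(df.to_numpy().ravel())

# A category's position in cat.categories is the code assigned to it
encoding = {name: code for code, name in enumerate(cat.categories)}

# The same codes, laid out in the shape of the original frame
codes = cat.codes.reshape(df.shape)

# The dictionary can also be applied to a single column
a_ids = df["A"].map(encoding)
print(encoding)
```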

Method #2: you could use rank(method="dense"), which generates an increasing number for each value in order:

>>> ranked = df.stack().rank(method="dense").values.reshape(df.shape).astype(int) - 1
>>> df.join(pd.DataFrame(ranked, columns=["C", "D"]))
       A       B  C  D
0   John     Tom  2  5
1  Homer    Bart  1  0
2    Tom  Maggie  5  4
3   Lisa    John  3  2
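
Both methods above number the names in sorted order, so John becomes 2 rather than 0. If the numbering from the question (order of first appearance) matters, pd.factorize is one alternative; it is not part of the answers above and is shown only as a sketch, with illustrative variable names.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "A": ["John", "Homer", "Tom", "Lisa"],
    "B": ["Tom", "Bart", "Maggie", "John"],
})

# factorize returns integer codes in order of first appearance,
# plus the array of unique values those codes index into
codes, uniques = pd.factorize(df.to_numpy().ravel())

# Reshape the flat codes back to one ID per original cell
ids = pd.DataFrame(codes.reshape(df.shape), columns=["C", "D"], index=df.index)

result = df.join(ids)
print(result)
```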
