Pandas: Create dataframe column based on other dataframe
Suppose I have two dataframes like these:
import pandas as pd
df1 = pd.DataFrame({'Type':list('AABAC')})
df2 = pd.DataFrame({'Type':list('ABCDEF'), 'Value':[1,2,3,4,5,6]})
df1:
  Type
0    A
1    A
2    B
3    A
4    C

df2:
  Type  Value
0    A      1
1    B      2
2    C      3
3    D      4
4    E      5
5    F      6
I would like to add a column to df1 based on the values in df2. df2 contains only unique values, whereas df1 has multiple entries of each value. So the resulting df1 should look like this:
  Type  Value
0    A      1
1    A      1
2    B      2
3    A      1
4    C      3
My actual dataframe df1 is quite long, so I need something efficient (I tried a loop, but it takes forever).
You could create a dict from df2 with the to_dict method and then map the result onto the Type column of df1:
replace_dict = dict(df2.to_dict('split')['data'])
In [50]: replace_dict
Out[50]: {'A': 1, 'B': 2, 'C': 3, 'D': 4, 'E': 5, 'F': 6}
df1['Value'] = df1['Type'].map(replace_dict)
In [52]: df1
Out[52]:
  Type  Value
0    A      1
1    A      1
2    B      2
3    A      1
4    C      3
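A slightly more direct way to build the same lookup dict, assuming the Type and Value column names from the question, is to zip the two columns together; Series.map then looks up each Type in that dict. This is a sketch of an equivalent alternative to the to_dict('split') step, not a different algorithm:

```python
import pandas as pd

df1 = pd.DataFrame({'Type': list('AABAC')})
df2 = pd.DataFrame({'Type': list('ABCDEF'), 'Value': [1, 2, 3, 4, 5, 6]})

# Pair each Type with its Value; same result as dict(df2.to_dict('split')['data'])
replace_dict = dict(zip(df2['Type'], df2['Value']))

# Look up every Type of df1 in the dict, without an explicit Python loop
df1['Value'] = df1['Type'].map(replace_dict)
print(df1)
```

Both versions produce a plain dict keyed by Type, so the choice is mostly about readability.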
As requested, here is a solution that uses map without creating a temporary dict:
In[3]:
df1['Value'] = df1['Type'].map(df2.set_index('Type')['Value'])
df1
Out[3]:
  Type  Value
0    A      1
1    A      1
2    B      2
3    A      1
4    C      3
This relies on the keys in df2 being unique: if df2 has duplicate Type entries, the lookup Series has a non-unique index (set_index itself does not complain) and map raises InvalidIndexError: Reindexing only valid with uniquely valued Index objects. Keys in df1 that are missing from df2 do not raise an error; map fills them with NaN, which also turns the Value column into a float column.
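The two caveats above can be checked directly. This is a small sketch assuming the df2 from the question; the unmatched key 'Z' and the duplicated frame are made-up inputs for illustration, and catching pd.errors.InvalidIndexError assumes pandas 1.1 or later, where that exception is exposed:

```python
import pandas as pd

df2 = pd.DataFrame({'Type': list('ABCDEF'), 'Value': [1, 2, 3, 4, 5, 6]})
lookup = df2.set_index('Type')['Value']

# 'Z' has no match in df2: map fills it with NaN instead of raising,
# so the result is upcast from int64 to float64
missing = pd.Series(['A', 'Z']).map(lookup)
print(missing)

# Duplicate keys make the lookup ambiguous, so map raises
dup = pd.DataFrame({'Type': ['A', 'A'], 'Value': [1, 2]}).set_index('Type')['Value']
raised = False
try:
    pd.Series(['A']).map(dup)
except pd.errors.InvalidIndexError as exc:
    raised = True
    print('InvalidIndexError:', exc)
```

If NaN for unmatched keys is not acceptable, one option is to check df1['Type'].isin(df2['Type']).all() before mapping.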
Another way to do this is with the label-based indexer loc. First make the Type column the index with .set_index, then index into it with the Type column of df1, and finally restore a default integer index with .reset_index:
df2.set_index('Type').loc[df1['Type'],:].reset_index()
Either use this as your new df1 or extract the Value column:
df1['Value'] = df2.set_index('Type').loc[df1['Type'],:].reset_index()['Value']
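The reset_index()['Value'] step works because the reset result gets a default 0..n-1 index, which happens to line up with df1's index in the example; if df1 had a custom index, the assignment would align on labels and could misplace values or produce NaN. A minimal sketch that avoids this, assuming the question's frames, assigns the underlying array instead. For comparison it also shows DataFrame.merge with how='left', a different technique not used in the answers above, but a common way to do this kind of lookup:

```python
import pandas as pd

df1 = pd.DataFrame({'Type': list('AABAC')})
df2 = pd.DataFrame({'Type': list('ABCDEF'), 'Value': [1, 2, 3, 4, 5, 6]})

# loc-based lookup: one row of df2 per row of df1, in df1's order
looked_up = df2.set_index('Type').loc[df1['Type'], 'Value']

# Assign raw values so df1's own index does not matter for alignment
df1['Value'] = looked_up.to_numpy()

# Alternative technique: a left merge keeps every row of df1 in its original order
merged = df1[['Type']].merge(df2, on='Type', how='left')
print(merged)
```

Note that, unlike map, the loc lookup raises a KeyError when a Type in df1 is missing from df2, while the left merge fills it with NaN.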