Pandas: Create dataframe column based on other dataframe

If I have two dataframes like these:

import pandas as pd

df1 = pd.DataFrame({'Type':list('AABAC')})
df2 = pd.DataFrame({'Type':list('ABCDEF'), 'Value':[1,2,3,4,5,6]})

  Type
0    A
1    A
2    B
3    A
4    C

  Type  Value
0    A      1
1    B      2
2    C      3
3    D      4
4    E      5
5    F      6

I would like to add a column to df1 based on the values in df2. df2 contains each Type value only once, whereas df1 can contain the same Type value many times. So the resulting df1 should look like this:

  Type  Value
0    A      1
1    A      1
2    B      2
3    A      1
4    C      3

My actual dataframe df1 is quite long, so I need something efficient (I tried doing it in a loop, but that takes forever).

You could create a dict from df2 with the to_dict method and then map it onto the Type column of df1:

replace_dict = dict(df2.to_dict('split')['data'])

In [50]: replace_dict
Out[50]: {'A': 1, 'B': 2, 'C': 3, 'D': 4, 'E': 5, 'F': 6}

df1['Value'] = df1['Type'].map(replace_dict)

In [52]: df1
Out[52]:
  Type  Value
0    A      1
1    A      1
2    B      2
3    A      1
4    C      3
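
A shorter way to build the same lookup dict is to zip the two columns together. This is only a sketch of an alternative, not part of the answer above; it assumes the Type values in df2 are unique, since later duplicates would silently overwrite earlier ones in the dict.

```python
import pandas as pd

df1 = pd.DataFrame({'Type': list('AABAC')})
df2 = pd.DataFrame({'Type': list('ABCDEF'), 'Value': [1, 2, 3, 4, 5, 6]})

# Pair each Type with its Value and turn the pairs into a dict.
replace_dict = dict(zip(df2['Type'], df2['Value']))

# Look up every Type in df1 in one vectorized call.
df1['Value'] = df1['Type'].map(replace_dict)
```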

As requested, I am posting a solution that uses map without the need to create a temporary dict:

In[3]:
df1['Value'] = df1['Type'].map(df2.set_index('Type')['Value'])
df1

Out[3]: 
  Type  Value
0    A      1
1    A      1
2    B      2
3    A      1
4    C      3

This relies on two things. First, every key being looked up should exist in df2; in current pandas versions a key that is missing shows up as NaN in the result rather than raising a KeyError. Second, df2 must not contain duplicate Type entries, otherwise the map call raises InvalidIndexError: Reindexing only valid with uniquely valued Index objects
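
Here is a minimal sketch of how both situations could be handled. The sample data is made up for illustration ('Z' is absent from df2 and 'A' appears twice in it), and keeping the first duplicate is just one possible policy. In the pandas versions I am aware of, a key that is missing from the lookup Series comes back as NaN, so it can be detected with isna.

```python
import pandas as pd

# Hypothetical data: 'Z' has no match in df2, and 'A' is duplicated in df2.
df1 = pd.DataFrame({'Type': list('AABAZ')})
df2 = pd.DataFrame({'Type': list('ABCA'), 'Value': [1, 2, 3, 9]})

# Keep only the first row per Type so the lookup index is unique.
lookup = df2.drop_duplicates('Type').set_index('Type')['Value']

df1['Value'] = df1['Type'].map(lookup)

# Rows whose Type was not found in the lookup end up as NaN.
missing = df1['Value'].isna()
```

Note that the Value column becomes float once it contains NaN.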

Another way to do this is with the label-based indexer loc. First make the Type column of df2 the index with .set_index, then select rows using the df1 column, and finally restore the default integer index with .reset_index:

df2.set_index('Type').loc[df1['Type'],:].reset_index()

Either use this as your new df1 or extract the Value column:

df1['Value'] = df2.set_index('Type').loc[df1['Type'],:].reset_index()['Value']
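
One thing to watch with the assignment above: pandas aligns the right-hand Series with df1 on the index, so it works as written when df1 has the default 0..n-1 index. If df1 has some other index, a sketch like the following assigns the looked-up values by position instead. The custom index here is an assumption made up for the example.

```python
import pandas as pd

# Hypothetical df1 with a non-default index.
df1 = pd.DataFrame({'Type': list('AABAC')}, index=[10, 11, 12, 13, 14])
df2 = pd.DataFrame({'Type': list('ABCDEF'), 'Value': [1, 2, 3, 4, 5, 6]})

# Select the Value for each Type in df1, in df1's row order.
looked_up = df2.set_index('Type').loc[df1['Type'], 'Value']

# Convert to a plain array so the values are assigned by position,
# not aligned on index labels.
df1['Value'] = looked_up.to_numpy()
```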
