Create a new column in the original dataframe if the column from another dataframe and a column from original dataframe have matching values

I have two dataframes in Python. One has more than 90,000 rows. I would like to add a new column to the original dataframe, taking its values from the second dataframe wherever a column of the second dataframe matches a column of the original.

For example, if I'm given two DataFrames like this:

    import pandas as pd

    countries = {'Country': ['India', 'South Korea', 'France', 'Austria', 'India', 'Spain',
                             'France', 'Algeria', 'Angola', 'Spain', 'Belgium', 'Austria'],
                 'Capital': ['Delhi', 'Seoul', 'Paris', 'Vienna', 'Delhi', 'Madrid', 'Paris',
                             'Algiers', 'Luanda', 'Madrid', 'Brussels', 'Vienna'],
                 'Landmark': ['TajMahal', 'Seoul Tower', 'EiffelTower', 'Belvedere Palace', 'TajMahal',
                              'La Sagrada', 'EiffelTower', 'Algiers Memorial', 'Ruacana Falls',
                              'La Sagrada', 'Grand Place', 'Belvedere Palace']}

    language = {'Country': ['India', 'South Korea', 'France', 'Algeria', 'Angola', 'Spain',
                            'Belgium', 'Austria'],
                'Language': ['Hindi', 'Korean', 'French', 'Arabic', 'Portuguese', 'Spanish',
                             'Dutch', 'German']}

    # Build the two DataFrames shown below from the dictionaries
    df1 = pd.DataFrame(countries)
    df2 = pd.DataFrame(language)

>>>df1

        Country   Capital          Landmark
0         India     Delhi          TajMahal
1   South Korea     Seoul       Seoul Tower
2        France     Paris       EiffelTower
3       Austria    Vienna  Belvedere Palace
4         India     Delhi          TajMahal
5         Spain    Madrid        La Sagrada
6        France     Paris       EiffelTower
7       Algeria   Algiers  Algiers Memorial
8        Angola    Luanda     Ruacana Falls
9         Spain    Madrid        La Sagrada
10      Belgium  Brussels       Grand Place
11      Austria    Vienna  Belvedere Palace

>>>df2

       Country    Language
0        India       Hindi
1  South Korea      Korean
2       France      French
3      Algeria      Arabic
4       Angola  Portuguese
5        Spain     Spanish
6      Belgium       Dutch
7      Austria      German

I would like to get a result like this:

>>>df1

        Country   Capital          Landmark    Language
0         India     Delhi          TajMahal       Hindi
1   South Korea     Seoul       Seoul Tower      Korean
2        France     Paris       EiffelTower      French
3       Austria    Vienna  Belvedere Palace      German
4         India     Delhi          TajMahal       Hindi
5         Spain    Madrid        La Sagrada     Spanish
6        France     Paris       EiffelTower      French
7       Algeria   Algiers  Algiers Memorial      Arabic
8        Angola    Luanda     Ruacana Falls  Portuguese
9         Spain    Madrid        La Sagrada     Spanish
10      Belgium  Brussels       Grand Place       Dutch
11      Austria    Vienna  Belvedere Palace      German

I've tried using nested for loops, but my Python code went into an infinite loop and I had to kill the program to get out of it. This is the error message I'm getting:

ValueError                                Traceback (most recent call last)
<ipython-input-13-c4d8473be816> in <module>
----> 1 df2['Countrylanguage'] = languages

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/frame.py in __setitem__(self, key, value)
   3368         else:
   3369             # set column
-> 3370             self._set_item(key, value)
   3371 
   3372     def _setitem_slice(self, key, value):

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/frame.py in _set_item(self, key, value)
   3443 
   3444         self._ensure_valid_index(value)
-> 3445         value = self._sanitize_column(key, value)
   3446         NDFrame._set_item(self, key, value)
   3447 

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/frame.py in _sanitize_column(self, key, value, broadcast)
   3628 
   3629             # turn me into an ndarray
-> 3630             value = sanitize_index(value, self.index, copy=False)
   3631             if not isinstance(value, (np.ndarray, Index)):
   3632                 if isinstance(value, list) and len(value) > 0:

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/internals/construction.py in sanitize_index(data, index, copy)
    517 
    518     if len(data) != len(index):
--> 519         raise ValueError('Length of values does not match length of index')
    520 
    521     if isinstance(data, ABCIndexClass) and not copy:

ValueError: Length of values does not match length of index
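
The `ValueError` itself is easy to reproduce in isolation. The sketch below assumes that the list built by the loops ended up with a different number of items than the DataFrame has rows; the loop code is not shown in the question, so this is only a guess at the cause. The names `df` and `languages` and the shortened data are illustrative.

```python
import pandas as pd

# Illustrative data: a frame with 3 rows and a list with only 2 values.
df = pd.DataFrame({'Country': ['India', 'Spain', 'India']})
languages = ['Hindi', 'Spanish']

message = None
try:
    # A plain list is assigned positionally, so its length must equal len(df).
    df['Language'] = languages
except ValueError as err:
    message = str(err)
    print(message)
```

A plain list is matched to rows by position only, so pandas never compares country names here. That is why a key-based lookup is usually the safer route.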

What is the right way of adding a new column to the original DataFrame?

Thank you for your help!

There are many ways to do that, including merge, join and map. Here's one of them:

# merge returns a new DataFrame, so assign the result back
df1 = df1.merge(df2, on='Country')
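
`merge` does not modify `df1` in place, so the result needs to be assigned. Here is a minimal sketch on a shortened copy of the question's data. The `how='left'` and `validate='many_to_one'` arguments are my additions rather than part of the answer: the first keeps every row of `df1` even when a country has no match, and the second raises an error if `df2` lists a country twice. The `join` call shows the other merge-style option the answer names.

```python
import pandas as pd

# Shortened copies of the question's data (same column names).
df1 = pd.DataFrame({
    'Country': ['India', 'South Korea', 'France', 'India'],
    'Capital': ['Delhi', 'Seoul', 'Paris', 'Delhi'],
})
df2 = pd.DataFrame({
    'Country': ['India', 'South Korea', 'France'],
    'Language': ['Hindi', 'Korean', 'French'],
})

# Left merge: one output row per df1 row, in df1's order.
merged = df1.merge(df2, on='Country', how='left', validate='many_to_one')

# join() matches df1['Country'] against the index of the other frame.
joined = df1.join(df2.set_index('Country'), on='Country')

print(merged)
```

With 90,000 rows this remains a single vectorized operation, so no Python-level loop is needed.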

Alternatively, I would recommend creating the following dictionary and using map:

language = {'India': 'Hindi',
            'South Korea': 'Korean',
            'France': 'French',
            'Algeria': 'Arabic',
            'Angola': 'Portuguese',
            'Spain': 'Spanish',
            'Belgium': 'Dutch',
            'Austria': 'German'}

df1['Language'] = df1['Country'].map(language)
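
If the second DataFrame already exists, the lookup does not have to be typed by hand. As a sketch, assuming each country appears only once in `df2`, a Series indexed by country can be passed to `map` in place of the dictionary. The data is shortened, and the `'Peru'` row is a made-up entry added only to show what happens when there is no match.

```python
import pandas as pd

# 'Peru' is a hypothetical row with no counterpart in df2.
df1 = pd.DataFrame({'Country': ['India', 'Spain', 'India', 'Austria', 'Peru']})
df2 = pd.DataFrame({
    'Country': ['India', 'Spain', 'Austria'],
    'Language': ['Hindi', 'Spanish', 'German'],
})

# Series of languages indexed by country; requires unique countries in df2.
lookup = df2.set_index('Country')['Language']

# map() looks up each country; countries missing from the lookup become NaN.
df1['Language'] = df1['Country'].map(lookup)
print(df1)
```

Unlike `merge`, this adds the column to `df1` directly and leaves its index and row order untouched.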


