Create a new column in the original dataframe if a column from another dataframe and a column from the original dataframe have matching values

I have two dataframes in Python. One has more than 90,000 rows. I would like to create a new column in the original dataframe from another dataframe wherever the column values of the second dataframe match values in the original dataframe.

For example, given two DataFrames like these:

import pandas as pd

# Sample data from the question
countries = {'Country': ['India', 'South Korea', 'France', 'Austria', 'India', 'Spain',
                         'France', 'Algeria', 'Angola', 'Spain', 'Belgium', 'Austria'],
             'Capital': ['Delhi', 'Seoul', 'Paris', 'Vienna', 'Delhi', 'Madrid', 'Paris',
                         'Algiers', 'Luanda', 'Madrid', 'Brussels', 'Vienna'],
             'Landmark': ['TajMahal', 'Seoul Tower', 'EiffelTower', 'Belvedere Palace', 'TajMahal',
                          'La Sagrada', 'EiffelTower', 'Algiers Memorial', 'Ruacana Falls',
                          'La Sagrada', 'Grand Place', 'Belvedere Palace']}

language = {'Country': ['India', 'South Korea', 'France', 'Algeria', 'Angola', 'Spain',
                        'Belgium', 'Austria'],
            'Language': ['Hindi', 'Korean', 'French', 'Arabic', 'Portuguese', 'Spanish',
                         'Dutch', 'German']}

df1 = pd.DataFrame(countries)
df2 = pd.DataFrame(language)

>>>df1

         Country   Capital          Landmark
0         India     Delhi          TajMahal
1   South Korea     Seoul       Seoul Tower
2        France     Paris       EiffelTower
3       Austria    Vienna  Belvedere Palace
4         India     Delhi          TajMahal
5         Spain    Madrid        La Sagrada
6        France     Paris       EiffelTower
7       Algeria   Algiers  Algiers Memorial
8        Angola    Luanda     Ruacana Falls
9         Spain    Madrid        La Sagrada
10      Belgium  Brussels       Grand Place
11      Austria    Vienna  Belvedere Palace

>>>df2

      Country   Language
0        India      Hindi
1  South Korea     Korean
2       France     French
3      Algeria     Arabic
4       Angola  Portuguese
5        Spain    Spanish
6      Belgium      Dutch
7      Austria     German

I would like to get a result like this:

>>>df1

        Country   Capital          Landmark    Language
0         India     Delhi          TajMahal       Hindi
1   South Korea     Seoul       Seoul Tower      Korean
2        France     Paris       EiffelTower      French
3       Austria    Vienna  Belvedere Palace      German
4         India     Delhi          TajMahal       Hindi
5         Spain    Madrid        La Sagrada     Spanish
6        France     Paris       EiffelTower      French
7       Algeria   Algiers  Algiers Memorial      Arabic
8        Angola    Luanda     Ruacana Falls  Portuguese
9         Spain    Madrid        La Sagrada     Spanish
10      Belgium  Brussels       Grand Place       Dutch
11      Austria    Vienna  Belvedere Palace      German

I tried using nested for loops, but my Python code went into an infinite loop and I had to kill the program to get out of it. This is the error message I got:

ValueError                                Traceback (most recent call last)
<ipython-input-13-c4d8473be816> in <module>
----> 1 df2['Countrylanguage'] = languages

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/frame.py in __setitem__(self, key, value)
   3368         else:
   3369             # set column
-> 3370             self._set_item(key, value)
   3371 
   3372     def _setitem_slice(self, key, value):

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/frame.py in _set_item(self, key, value)
   3443 
   3444         self._ensure_valid_index(value)
-> 3445         value = self._sanitize_column(key, value)
   3446         NDFrame._set_item(self, key, value)
   3447 

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/frame.py in _sanitize_column(self, key, value, broadcast)
   3628 
   3629             # turn me into an ndarray
-> 3630             value = sanitize_index(value, self.index, copy=False)
   3631             if not isinstance(value, (np.ndarray, Index)):
   3632                 if isinstance(value, list) and len(value) > 0:

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/internals/construction.py in sanitize_index(data, index, copy)
    517 
    518     if len(data) != len(index):
--> 519         raise ValueError('Length of values does not match length of index')
    520 
    521     if isinstance(data, ABCIndexClass) and not copy:

ValueError: Length of values does not match length of index
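For context on the error itself: pandas raises this ValueError whenever a list assigned as a column has a different length from the DataFrame it is assigned to. The traceback shows the list going into df2, which has only 8 rows. The loop is not shown, so the sketch below assumes that languages ended up with one entry per row of df1 (12 values); the list it builds is a hypothetical stand-in for the loop's output.

```python
import pandas as pd

# Only the columns needed to reproduce the error are included here.
df1 = pd.DataFrame({'Country': ['India', 'South Korea', 'France', 'Austria', 'India', 'Spain',
                                'France', 'Algeria', 'Angola', 'Spain', 'Belgium', 'Austria']})
df2 = pd.DataFrame({'Country': ['India', 'South Korea', 'France', 'Algeria', 'Angola', 'Spain',
                                'Belgium', 'Austria'],
                    'Language': ['Hindi', 'Korean', 'French', 'Arabic', 'Portuguese', 'Spanish',
                                 'Dutch', 'German']})

# Assumption: `languages` holds one language per row of df1 (12 values).
# This list is a hypothetical stand-in for whatever the question's loop produced.
lookup = dict(zip(df2['Country'], df2['Language']))
languages = [lookup[country] for country in df1['Country']]

error_message = None
try:
    # df2 has only 8 rows, so a 12-element list cannot become one of its columns.
    df2['Countrylanguage'] = languages
except ValueError as exc:
    error_message = str(exc)
    print('ValueError:', error_message)

# Assigning the same list to df1 works, because its length equals len(df1).
df1['Language'] = languages
```

The exact wording of the message varies between pandas versions (newer releases include the two lengths), but the cause is the same: the column has to be added to the frame whose rows the values were built for.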

What is the correct way to add the new column to the original DataFrame?

Thank you for your help!

There are many ways to do this, including merge, join and map. Here is one of them:

df1.merge(df2)
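A sketch of the merge and join routes on the sample data, assuming each country appears only once in df2. Unlike the bare df1.merge(df2) call, it names the key and the join type explicitly and keeps the returned frame, since merge does not modify df1 in place.

```python
import pandas as pd

df1 = pd.DataFrame({'Country': ['India', 'South Korea', 'France', 'Austria', 'India', 'Spain',
                                'France', 'Algeria', 'Angola', 'Spain', 'Belgium', 'Austria'],
                    'Capital': ['Delhi', 'Seoul', 'Paris', 'Vienna', 'Delhi', 'Madrid', 'Paris',
                                'Algiers', 'Luanda', 'Madrid', 'Brussels', 'Vienna']})
df2 = pd.DataFrame({'Country': ['India', 'South Korea', 'France', 'Algeria', 'Angola', 'Spain',
                                'Belgium', 'Austria'],
                    'Language': ['Hindi', 'Korean', 'French', 'Arabic', 'Portuguese', 'Spanish',
                                 'Dutch', 'German']})

# merge returns a new DataFrame, so keep the result (or assign it back to df1).
# how='left' keeps every row of df1 in its original order; the default how='inner'
# would drop rows whose country is missing from df2, while 'left' fills them with NaN.
# validate='many_to_one' makes pandas raise if a country is duplicated in df2.
merged = df1.merge(df2, on='Country', how='left', validate='many_to_one')

# join does the same left lookup once df2's key column is moved into its index.
joined = df1.join(df2.set_index('Country'), on='Country')

print(merged)
```

To overwrite the original frame, write df1 = df1.merge(df2, on='Country', how='left').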

Alternatively, I would suggest creating the following dictionary and using map:

language = {'India': 'Hindi',
            'South Korea': 'Korean',
            'France': 'French',
            'Algeria': 'Arabic',
            'Angola': 'Portuguese',
            'Spain': 'Spanish',
            'Belgium': 'Dutch',
            'Austria': 'German'}

df1['Language'] = df1['Country'].map(language)
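Typing the dictionary by hand does not scale well; for a 90,000-row frame the lookup can be built straight from df2 instead. A minimal sketch, assuming the Country values in df2 are unique:

```python
import pandas as pd

df1 = pd.DataFrame({'Country': ['India', 'South Korea', 'France', 'Austria', 'India', 'Spain',
                                'France', 'Algeria', 'Angola', 'Spain', 'Belgium', 'Austria']})
df2 = pd.DataFrame({'Country': ['India', 'South Korea', 'France', 'Algeria', 'Angola', 'Spain',
                                'Belgium', 'Austria'],
                    'Language': ['Hindi', 'Korean', 'French', 'Arabic', 'Portuguese', 'Spanish',
                                 'Dutch', 'German']})

# A Series indexed by country serves as the lookup table for map,
# so the dictionary does not have to be written out by hand.
country_to_language = df2.set_index('Country')['Language']

# Each country in df1 is looked up in that Series; countries absent from df2 become NaN.
df1['Language'] = df1['Country'].map(country_to_language)

print(df1)
```

Because map is vectorized, this avoids the nested Python loops entirely and adds the column to df1 in place.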

