Create a new column in the original DataFrame from another DataFrame where a shared column has matching values

I have two DataFrames in Python. One has more than 90,000 rows. I would like to add a new column to the original DataFrame, taking its values from the second DataFrame wherever a column in the second DataFrame matches a column in the original.

For example, if I'm given two DataFrames like this:

    import pandas as pd

    countries = {'Country': ['India', 'South Korea', 'France', 'Austria', 'India', 'Spain',
                             'France', 'Algeria', 'Angola', 'Spain', 'Belgium', 'Austria'],
                 'Capital': ['Delhi', 'Seoul', 'Paris', 'Vienna', 'Delhi', 'Madrid', 'Paris',
                             'Algiers', 'Luanda', 'Madrid', 'Brussels', 'Vienna'],
                 'Landmark': ['TajMahal', 'Seoul Tower', 'EiffelTower', 'Belvedere Palace', 'TajMahal',
                              'La Sagrada', 'EiffelTower', 'Algiers Memorial', 'Ruacana Falls',
                              'La Sagrada', 'Grand Place', 'Belvedere Palace']
                 }

    language = {'Country': ['India', 'South Korea', 'France', 'Algeria', 'Angola', 'Spain',
                            'Belgium', 'Austria'],
                'Language': ['Hindi', 'Korean', 'French', 'Arabic', 'Portuguese', 'Spanish',
                             'Dutch', 'German']
                }

    # Build the two DataFrames shown below from the dictionaries
    df1 = pd.DataFrame(countries)
    df2 = pd.DataFrame(language)

>>>df1

        Country   Capital          Landmark
0         India     Delhi          TajMahal
1   South Korea     Seoul       Seoul Tower
2        France     Paris       EiffelTower
3       Austria    Vienna  Belvedere Palace
4         India     Delhi          TajMahal
5         Spain    Madrid        La Sagrada
6        France     Paris       EiffelTower
7       Algeria   Algiers  Algiers Memorial
8        Angola    Luanda     Ruacana Falls
9         Spain    Madrid        La Sagrada
10      Belgium  Brussels       Grand Place
11      Austria    Vienna  Belvedere Palace

>>>df2

       Country    Language
0        India       Hindi
1  South Korea      Korean
2       France      French
3      Algeria      Arabic
4       Angola  Portuguese
5        Spain     Spanish
6      Belgium       Dutch
7      Austria      German

I would like to get a result like this:

>>>df1

        Country   Capital          Landmark    Language
0         India     Delhi          TajMahal       Hindi
1   South Korea     Seoul       Seoul Tower      Korean
2        France     Paris       EiffelTower      French
3       Austria    Vienna  Belvedere Palace      German
4         India     Delhi          TajMahal       Hindi
5         Spain    Madrid        La Sagrada     Spanish
6        France     Paris       EiffelTower      French
7       Algeria   Algiers  Algiers Memorial      Arabic
8        Angola    Luanda     Ruacana Falls  Portuguese
9         Spain    Madrid        La Sagrada     Spanish
10      Belgium  Brussels       Grand Place       Dutch
11      Austria    Vienna  Belvedere Palace      German

I've tried using nested for loops, but my Python code goes into an infinite loop and I have to kill the program to get out of it. This is the error message I'm getting:

ValueError                                Traceback (most recent call last)
<ipython-input-13-c4d8473be816> in <module>
----> 1 df2['Countrylanguage'] = languages

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/frame.py in __setitem__(self, key, value)
   3368         else:
   3369             # set column
-> 3370             self._set_item(key, value)
   3371 
   3372     def _setitem_slice(self, key, value):

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/frame.py in _set_item(self, key, value)
   3443 
   3444         self._ensure_valid_index(value)
-> 3445         value = self._sanitize_column(key, value)
   3446         NDFrame._set_item(self, key, value)
   3447 

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/frame.py in _sanitize_column(self, key, value, broadcast)
   3628 
   3629             # turn me into an ndarray
-> 3630             value = sanitize_index(value, self.index, copy=False)
   3631             if not isinstance(value, (np.ndarray, Index)):
   3632                 if isinstance(value, list) and len(value) > 0:

/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/internals/construction.py in sanitize_index(data, index, copy)
    517 
    518     if len(data) != len(index):
--> 519         raise ValueError('Length of values does not match length of index')
    520 
    521     if isinstance(data, ABCIndexClass) and not copy:

ValueError: Length of values does not match length of index

What is the right way of adding a new column to the original DataFrame?

Thank you for your help!

There are many ways to do this, including merge, join, and map. Here is one of them:

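# merge returns a new DataFrame, so assign the result back;
# with no arguments it joins on the shared 'Country' column
df1 = df1.merge(df2)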
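A slightly more defensive variant of the merge above, as a minimal sketch. It assumes you want to keep every row of df1 even when a country has no entry in df2, and that each country should appear only once in df2. The small frames and the 'Atlantis' row are made up for illustration and are not from the question.

```python
import pandas as pd

# Trimmed-down stand-ins for the question's df1 and df2.
# 'Atlantis' is a hypothetical country with no match in df2.
df1 = pd.DataFrame({
    'Country': ['India', 'South Korea', 'France', 'Austria', 'India', 'Atlantis'],
    'Capital': ['Delhi', 'Seoul', 'Paris', 'Vienna', 'Delhi', 'Unknown'],
})
df2 = pd.DataFrame({
    'Country': ['India', 'South Korea', 'France', 'Austria'],
    'Language': ['Hindi', 'Korean', 'French', 'German'],
})

# how='left' keeps every row of df1 in its original order;
# unmatched countries get NaN in the new column.
# validate='many_to_one' raises an error if 'Country' is duplicated in df2,
# which would otherwise silently multiply rows in the result.
merged = df1.merge(df2, on='Country', how='left', validate='many_to_one')
print(merged)
```

With the default inner merge, the unmatched row would be dropped instead of kept with a missing Language, so which `how` to use depends on what you want to happen to countries that are absent from df2.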

Alternatively, I would recommend creating the following dictionary and using map:

language = {'India': 'Hindi',
            'South Korea': 'Korean',
            'France': 'French',
            'Algeria': 'Arabic',
            'Angola': 'Portuguese',
            'Spain': 'Spanish',
            'Belgium': 'Dutch',
            'Austria': 'German'}

df1['Language'] = df1['Country'].map(language)
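If typing the dictionary by hand is impractical, the lookup can probably be built from df2 itself. The following is a sketch under the assumption that each country appears only once in df2; the name `lookup` and the shortened frames are illustrative, not from the question.

```python
import pandas as pd

# Shortened stand-ins for the question's df1 and df2
df1 = pd.DataFrame({'Country': ['India', 'Spain', 'France', 'Spain']})
df2 = pd.DataFrame({'Country': ['India', 'France', 'Spain'],
                    'Language': ['Hindi', 'French', 'Spanish']})

# A Series of languages indexed by country; map() looks up
# each value of df1['Country'] in this index.
lookup = df2.set_index('Country')['Language']

# map() returns one value per row of df1, so the lengths always match
df1['Language'] = df1['Country'].map(lookup)
print(df1)
```

Because map returns a Series with exactly one entry per row of df1, the assignment cannot fail with the "Length of values does not match length of index" error from the question, which is raised when a list of a different length is assigned as a column. Countries missing from the lookup come back as NaN.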
