I have two DataFrames in Python. One has more than 90,000 rows. I would like to add a new column to the original DataFrame, taking its values from the second DataFrame wherever a column value in the second DataFrame matches a value in the original.
For example, given two DataFrames like this:
import pandas as pd

countries = {'Country': ['India', 'South Korea', 'France', 'Austria', 'India', 'Spain',
                         'France', 'Algeria', 'Angola', 'Spain', 'Belgium', 'Austria'],
             'Capital': ['Delhi', 'Seoul', 'Paris', 'Vienna', 'Delhi', 'Madrid', 'Paris',
                         'Algiers', 'Luanda', 'Madrid', 'Brussels', 'Vienna'],
             'Landmark': ['TajMahal', 'Seoul Tower', 'EiffelTower', 'Belvedere Palace', 'TajMahal',
                          'La Sagrada', 'EiffelTower', 'Algiers Memorial', 'Ruacana Falls',
                          'La Sagrada', 'Grand Place', 'Belvedere Palace']
             }
language = {'Country': ['India', 'South Korea', 'France', 'Algeria', 'Angola', 'Spain',
                        'Belgium', 'Austria'],
            'Language': ['Hindi', 'Korean', 'French', 'Arabic', 'Portuguese', 'Spanish',
                         'Dutch', 'German']
            }
# Build the two DataFrames shown below from the dictionaries above
df1 = pd.DataFrame(countries)
df2 = pd.DataFrame(language)
>>> df1
        Country   Capital          Landmark
0         India     Delhi          TajMahal
1   South Korea     Seoul       Seoul Tower
2        France     Paris       EiffelTower
3       Austria    Vienna  Belvedere Palace
4         India     Delhi          TajMahal
5         Spain    Madrid        La Sagrada
6        France     Paris       EiffelTower
7       Algeria   Algiers  Algiers Memorial
8        Angola    Luanda     Ruacana Falls
9         Spain    Madrid        La Sagrada
10      Belgium  Brussels       Grand Place
11      Austria    Vienna  Belvedere Palace
>>> df2
       Country    Language
0        India       Hindi
1  South Korea      Korean
2       France      French
3      Algeria      Arabic
4       Angola  Portuguese
5        Spain     Spanish
6      Belgium       Dutch
7      Austria      German
I would like to get a result like this:
>>> df1
        Country   Capital          Landmark    Language
0         India     Delhi          TajMahal       Hindi
1   South Korea     Seoul       Seoul Tower      Korean
2        France     Paris       EiffelTower      French
3       Austria    Vienna  Belvedere Palace      German
4         India     Delhi          TajMahal       Hindi
5         Spain    Madrid        La Sagrada     Spanish
6        France     Paris       EiffelTower      French
7       Algeria   Algiers  Algiers Memorial      Arabic
8        Angola    Luanda     Ruacana Falls  Portuguese
9         Spain    Madrid        La Sagrada     Spanish
10      Belgium  Brussels       Grand Place       Dutch
11      Austria    Vienna  Belvedere Palace      German
When I tried to assign the new column directly, I got the following error:
ValueError Traceback (most recent call last)
<ipython-input-13-c4d8473be816> in <module>
----> 1 df2['Countrylanguage'] = languages
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/frame.py in __setitem__(self, key, value)
3368 else:
3369 # set column
-> 3370 self._set_item(key, value)
3371
3372 def _setitem_slice(self, key, value):
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/frame.py in _set_item(self, key, value)
3443
3444 self._ensure_valid_index(value)
-> 3445 value = self._sanitize_column(key, value)
3446 NDFrame._set_item(self, key, value)
3447
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/frame.py in _sanitize_column(self, key, value, broadcast)
3628
3629 # turn me into an ndarray
-> 3630 value = sanitize_index(value, self.index, copy=False)
3631 if not isinstance(value, (np.ndarray, Index)):
3632 if isinstance(value, list) and len(value) > 0:
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/internals/construction.py in sanitize_index(data, index, copy)
517
518 if len(data) != len(index):
--> 519 raise ValueError('Length of values does not match length of index')
520
521 if isinstance(data, ABCIndexClass) and not copy:
ValueError: Length of values does not match length of index
What is the right way of adding a new column to the original DataFrame?
Thank you for your help!
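There are many ways to do this, including merge, join, and map. Here is one of them. A left merge on 'Country' keeps every row of df1, and the result has to be assigned back because merge returns a new DataFrame:
df1 = df1.merge(df2, on='Country', how='left')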
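As a rough, self-contained sketch of the merge approach: the small frames below are cut-down stand-ins for the question's df1 and df2, and the result name `merged` is only illustrative. It assumes each country appears once in df2; `validate='many_to_one'` is optional and is there to raise an error if that assumption does not hold.

```python
import pandas as pd

# Cut-down stand-ins for the question's df1 and df2 (assumed sample rows).
df1 = pd.DataFrame({
    'Country': ['India', 'France', 'India', 'Spain'],
    'Capital': ['Delhi', 'Paris', 'Delhi', 'Madrid'],
})
df2 = pd.DataFrame({
    'Country': ['India', 'France', 'Spain'],
    'Language': ['Hindi', 'French', 'Spanish'],
})

# how='left' keeps every row of df1 in its original order.
# validate='many_to_one' raises MergeError if df2 repeats a Country.
merged = df1.merge(df2, on='Country', how='left', validate='many_to_one')
print(merged)
```

With a left merge, rows of df1 whose country is missing from df2 stay in the result with NaN in the Language column, so the row count of the large frame does not change.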
Alternatively, I would recommend creating the following dictionary and using map:
language = {'India': 'Hindi',
            'South Korea': 'Korean',
            'France': 'French',
            'Algeria': 'Arabic',
            'Angola': 'Portuguese',
            'Spain': 'Spanish',
            'Belgium': 'Dutch',
            'Austria': 'German'}
# Look up each row's country in the dictionary
df1['Language'] = df1['Country'].map(language)
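If the second DataFrame already holds the pairs, the lookup can probably be built from it rather than typed by hand. This is a minimal sketch under assumptions: the frames are cut-down stand-ins, the name `lookup` is illustrative, 'Peru' is a made-up unmatched row, and each country is assumed to appear only once in df2.

```python
import pandas as pd

# Cut-down stand-ins; 'Peru' is a hypothetical country missing from df2.
df1 = pd.DataFrame({'Country': ['India', 'France', 'India', 'Peru']})
df2 = pd.DataFrame({
    'Country': ['India', 'France', 'Spain'],
    'Language': ['Hindi', 'French', 'Spanish'],
})

# Build the lookup from df2: a Series of languages indexed by country.
lookup = df2.set_index('Country')['Language']

# map matches each country against the lookup's index;
# countries that are not in df2 come back as NaN.
df1['Language'] = df1['Country'].map(lookup)
print(df1)
```

Because map returns one value per row of df1, the new column always has the same length as df1, which avoids the length mismatch that direct assignment of a shorter list runs into.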