Create a new column in the original dataframe if a column from another dataframe and a column from the original dataframe have matching values
I have two dataframes in Python. One has more than 90,000 rows. I would like to add a new column to the original dataframe, taken from a second dataframe, wherever a column value in the second dataframe matches a value in the original dataframe.
For example, given two DataFrames like these:
import pandas as pd

countries = {'Country': ['India', 'South Korea', 'France', 'Austria', 'India', 'Spain',
                         'France', 'Algeria', 'Angola', 'Spain', 'Belgium', 'Austria'],
             'Capital': ['Delhi', 'Seoul', 'Paris', 'Vienna', 'Delhi', 'Madrid', 'Paris',
                         'Algiers', 'Luanda', 'Madrid', 'Brussels', 'Vienna'],
             'Landmark': ['TajMahal', 'Seoul Tower', 'EiffelTower', 'Belvedere Palace', 'TajMahal',
                          'La Sagrada', 'EiffelTower', 'Algiers Memorial', 'Ruacana Falls',
                          'La Sagrada', 'Grand Place', 'Belvedere Palace']
             }
language = {'Country': ['India', 'South Korea', 'France', 'Algeria', 'Angola', 'Spain',
                        'Belgium', 'Austria'],
            'Language': ['Hindi', 'Korean', 'French', 'Arabic', 'Portuguese', 'Spanish',
                         'Dutch', 'German']
            }

# Build the two frames shown below from the dictionaries above.
df1 = pd.DataFrame(countries)
df2 = pd.DataFrame(language)
>>>df1
Country Capital Landmark
0 India Delhi TajMahal
1 South Korea Seoul Seoul Tower
2 France Paris EiffelTower
3 Austria Vienna Belvedere Palace
4 India Delhi TajMahal
5 Spain Madrid La Sagrada
6 France Paris EiffelTower
7 Algeria Algiers Algiers Memorial
8 Angola Luanda Ruacana Falls
9 Spain Madrid La Sagrada
10 Belgium Brussels Grand Place
11 Austria Vienna Belvedere Palace
>>>df2
Country Language
0 India Hindi
1 South Korea Korean
2 France French
3 Algeria Arabic
4 Angola Portuguese
5 Spain Spanish
6 Belgium Dutch
7 Austria German
I would like to get a result like this:
>>>df1
Country Capital Landmark Language
0 India Delhi TajMahal Hindi
1 South Korea Seoul Seoul Tower Korean
2 France Paris EiffelTower French
3 Austria Vienna Belvedere Palace German
4 India Delhi TajMahal Hindi
5 Spain Madrid La Sagrada Spanish
6 France Paris EiffelTower French
7 Algeria Algiers Algiers Memorial Arabic
8 Angola Luanda Ruacana Falls Portuguese
9 Spain Madrid La Sagrada Spanish
10 Belgium Brussels Grand Place Dutch
11 Austria Vienna Belvedere Palace German
I tried assigning the language values directly as a new column, but got this error:
ValueError Traceback (most recent call last)
<ipython-input-13-c4d8473be816> in <module>
----> 1 df2['Countrylanguage'] = languages
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/frame.py in __setitem__(self, key, value)
3368 else:
3369 # set column
-> 3370 self._set_item(key, value)
3371
3372 def _setitem_slice(self, key, value):
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/frame.py in _set_item(self, key, value)
3443
3444 self._ensure_valid_index(value)
-> 3445 value = self._sanitize_column(key, value)
3446 NDFrame._set_item(self, key, value)
3447
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/frame.py in _sanitize_column(self, key, value, broadcast)
3628
3629 # turn me into an ndarray
-> 3630 value = sanitize_index(value, self.index, copy=False)
3631 if not isinstance(value, (np.ndarray, Index)):
3632 if isinstance(value, list) and len(value) > 0:
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/internals/construction.py in sanitize_index(data, index, copy)
517
518 if len(data) != len(index):
--> 519 raise ValueError('Length of values does not match length of index')
520
521 if isinstance(data, ABCIndexClass) and not copy:
ValueError: Length of values does not match length of index
What is the correct way to add the new column to the original DataFrame?
Thank you for your help!
There are many ways to do this, including merge, join, and map. Here is one of them:
df1 = df1.merge(df2, on='Country')  # merge returns a new DataFrame, so assign it back
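As a hedged sketch of the merge approach: a left join on Country keeps every row of the original frame in its original order, and validate='many_to_one' asks pandas to raise an error if the lookup frame lists the same country twice. The abbreviated sample data and the name `result` are assumptions for illustration, not the full frames from the question.

```python
import pandas as pd

# Abbreviated sample data, assumed to mirror the shape of the question's frames.
df1 = pd.DataFrame({
    'Country': ['India', 'South Korea', 'France', 'India'],
    'Capital': ['Delhi', 'Seoul', 'Paris', 'Delhi'],
})
df2 = pd.DataFrame({
    'Country': ['India', 'South Korea', 'France'],
    'Language': ['Hindi', 'Korean', 'French'],
})

# Left join: every row of df1 is kept, in order. A country missing
# from df2 would get NaN in the Language column instead of being dropped.
# validate='many_to_one' raises MergeError if df2 has duplicate countries.
result = df1.merge(df2, on='Country', how='left', validate='many_to_one')
print(result)
```

Because the row count of the result equals that of df1 under these assumptions, the Language column lines up with the original rows, which avoids the length mismatch behind the ValueError in the question.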
Alternatively, I suggest creating the following dictionary and using map:
language = {'India': 'Hindi',
'South Korea': 'Korean',
'France': 'French',
'Algeria': 'Arabic',
'Angola': 'Portuguese',
'Spain': 'Spanish',
'Belgium': 'Dutch',
'Austria': 'German'}
df1['Language'] = df1['Country'].map(language)
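With a large second frame, typing the dictionary by hand may not be practical. A minimal sketch, assuming the second frame has exactly one row per country: build the lookup from that frame and pass it to map. The abbreviated data, the extra 'Peru' row, the name `lookup`, and the 'Unknown' fill value are all hypothetical, chosen only to show what happens to an unmatched country.

```python
import pandas as pd

# Abbreviated sample data; 'Peru' is a hypothetical country with no match in df2.
df1 = pd.DataFrame({'Country': ['India', 'Spain', 'France', 'Spain', 'Peru']})
df2 = pd.DataFrame({
    'Country': ['India', 'France', 'Spain'],
    'Language': ['Hindi', 'French', 'Spanish'],
})

# Series indexed by Country; map() looks each df1 value up in this index.
# This assumes the Country values in df2 are unique.
lookup = df2.set_index('Country')['Language']
df1['Language'] = df1['Country'].map(lookup)

# Countries absent from df2 come back as NaN; fill them if that is desired.
df1['Language'] = df1['Language'].fillna('Unknown')
print(df1)
```

Since map returns one value per row of df1, the new column always has the same length as the frame it is assigned to.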