How to do a left join using pandas
I have 2 dataframes that look like this. DF1:
Product, Region, ProductScore
AAA, R1,100
AAA, R2,100
BBB, R2,200
BBB, R3,200
DF2:
Region, RegionScore
R1,1
R2,2
How can I join these 2 into 1 dataframe? The result should look like this:
Product, Region, ProductScore, RegionScore
AAA, R1,100,1
AAA, R2,100,2
BBB, R2,200,2
Thanks a lot!
EDIT1:
I used df.merge(df_new) and received this error message:
File "C:\Python34\lib\site-packages\pandas\core\frame.py", line 4071, in merge
suffixes=suffixes, copy=copy)
File "C:\Python34\lib\site-packages\pandas\tools\merge.py", line 37, in merge
copy=copy)
File "C:\Python34\lib\site-packages\pandas\tools\merge.py", line 183, in __init__
self.join_names) = self._get_merge_keys()
File "C:\Python34\lib\site-packages\pandas\tools\merge.py", line 318, in _get_merge_keys
self._validate_specification()
File "C:\Python34\lib\site-packages\pandas\tools\merge.py", line 409, in _validate_specification
if not self.right.columns.is_unique:
AttributeError: 'list' object has no attribute 'is_unique'
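EDIT2: I realized that my df_new is a Series (created by using groupby) rather than a DataFrame. I have now converted it to a DataFrame; here is the info:
print(df.info())
Int64Index: 1111 entries, 0 to 1110
Data columns (total 8 columns):
product 1111 non-null object
reviewuserId 1111 non-null object
reviewprofileName 1111 non-null object
reviewelpfulness 881 non-null float64
reviewscore 1111 non-null float64
reviewtime 1111 non-null int64
reviewsummary 1111 non-null object
reviewtext 1111 non-null object
dtypes: float64(2), int64(1), object(5)
memory usage: 56.4+ KB
None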
print(df_new_2.info())
<class 'pandas.core.frame.DataFrame'>
Index: 1089 entries, A100Y8WSLFJN7Q to AZWBQPQN96SS6
Data columns (total 1 columns):
reviewelpfulnessbyuserid 864 non-null float64
dtypes: float64(1)
memory usage: 12.8+ KB
None
print(df.head())
product reviewuserId reviewprofileName \
0 B003AI2VGA A141HP4LYPWMSR Brian E. Erland "Rainbow Sphinx"
1 B003AI2VGA A328S9RN3U5M68 Grady Harp
2 B003AI2VGA A1I7QGUDP043DG Chrissy K. McVay "Writer"
3 B003AI2VGA A1M5405JH9THP9 golgotha.gov
4 B003AI2VGA ATXL536YX71TR KerrLines ""MoviesMusicTheatre""
reviewelpfulness reviewscore reviewtime \
0 1.0 3 1182729600
1 1.0 3 1181952000
2 0.8 5 1164844800
3 1.0 3 1197158400
4 1.0 3 1188345600
reviewsummary \
0 There Is So Much Darkness Now ~ Come For The M...
1 Worthwhile and Important Story Hampered by Poo...
2 This movie needed to be made.
3 distantly based on a real tragedy
4 What's going on down in Juarez and shining a l...
reviewtext
0 Synopsis: On the daily trek from Juarez Mexico...
1 THE VIRGIN OF JUAREZ is based on true events s...
2 The scenes in this film can be very disquietin...
3 THE VIRGIN OF JUAREZ (2006)<br />directed by K...
4 Informationally this SHOWTIME original is esse...
print(df_new_2.head())
reviewelpfulnessbyuserid
reviewuserId
A100Y8WSLFJN7Q NaN
A103VZ3KDF2RT5 0.555556
A1041HQGJDKFG5 0.000000
A10FBJXMQPI0LL 0.333333
A10LIHFA4SSK3F 0.000000
Now the error message looks like this:
File "pandas\hashtable.pyx", line 694, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12245)
KeyError: 'reviewuserId'
After printing this information, I solved the problem by simply adding the following:
df_new_2 = df_new.to_frame().reset_index()
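For anyone hitting the same KeyError, here is a minimal sketch of why that line helps. The data is made up and only the column names come from the post; the assumption is that df_new was built with a groupby on reviewuserId, which leaves reviewuserId in the index rather than as a column.

```python
import pandas as pd

# Hypothetical stand-in for the review data; only the column names come from the post.
df = pd.DataFrame({
    "reviewuserId": ["A1", "A1", "A2", "A3"],
    "reviewelpfulness": [1.0, 0.5, 0.0, None],
})

# Selecting one column after groupby yields a Series indexed by reviewuserId.
df_new = (
    df.groupby("reviewuserId")["reviewelpfulness"]
    .mean()
    .rename("reviewelpfulnessbyuserid")
)

# to_frame() turns the Series into a DataFrame;
# reset_index() moves reviewuserId from the index into a regular column.
df_new_2 = df_new.to_frame().reset_index()

# With reviewuserId present as a column on both sides, the merge key can be found.
merged = df.merge(df_new_2, on="reviewuserId", how="left")
print(merged)
```

Passing on="reviewuserId" explicitly is optional here, since it is the only shared column, but it makes the join key obvious.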
What you want is not a left merge, because you are skipping the row for R3; you just want to perform an inner merge:
In [120]:
df.merge(df1)
Out[120]:
Product Region ProductScore RegionScore
0 AAA R1 100 1
1 AAA R2 100 2
2 BBB R2 200 2
A left merge would give the following result:
In [121]:
df.merge(df1, how='left')
Out[121]:
Product Region ProductScore RegionScore
0 AAA R1 100 1
1 AAA R2 100 2
2 BBB R2 200 2
3 BBB R3 200 NaN
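As a self-contained sketch of both merges, the frames from the question can be rebuilt by hand as below. The variable names inner, left and unmatched are my own, and indicator=True is an optional extra that is not part of the session above; it adds a _merge column showing which rows found no match.

```python
import pandas as pd

# The two frames from the question, rebuilt by hand.
df = pd.DataFrame({
    "Product": ["AAA", "AAA", "BBB", "BBB"],
    "Region": ["R1", "R2", "R2", "R3"],
    "ProductScore": [100, 100, 200, 200],
})
df1 = pd.DataFrame({"Region": ["R1", "R2"], "RegionScore": [1, 2]})

# Inner merge (the default): only rows whose Region exists in both frames.
inner = df.merge(df1, on="Region", how="inner")

# Left merge: every row of df is kept; RegionScore is NaN where no match exists.
left = df.merge(df1, on="Region", how="left", indicator=True)

# Rows of df that had no partner in df1 (here, the R3 row).
unmatched = left[left["_merge"] == "left_only"]

print(inner)
print(left)
print(unmatched)
```

If the R3 row should be kept with a default score instead of NaN, calling fillna on the RegionScore column after the left merge is one option.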