Pandas merge doesn't retain as many rows as I would think
Consider the following two data frames:
df1 = pd.DataFrame({'a': ['foo', 'bar'], 'b': [1, 2]})
df2 = pd.DataFrame({'a': ['foo', 'baz'], 'c': [3, 4]})
Running
df3 = pd.merge(df1, df2, on='a')
yields
     a  b  c
0  foo  1  3
But why not the following?
     a  b  c
0  foo  1  3
1  bar  2  -
1  baz  -  4
What do I need to tell Python to get it to output all of these rows?
By default, a pandas merge performs an inner join, if you are familiar with database joins. That means it only returns the rows that have a matching entry in both the left and the right dataframe. For you, that is just 'foo'.
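As a quick check of the point above, the sketch below compares the default call with an explicit how='inner' on the frames from the question. The variable names default_merge, inner_merge and shared_keys are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'a': ['foo', 'bar'], 'b': [1, 2]})
df2 = pd.DataFrame({'a': ['foo', 'baz'], 'c': [3, 4]})

# Omitting `how` is the same as asking for an inner join.
default_merge = pd.merge(df1, df2, on='a')
inner_merge = pd.merge(df1, df2, on='a', how='inner')
print(default_merge.equals(inner_merge))

# Only keys present in both frames survive an inner join.
shared_keys = set(df1['a']) & set(df2['a'])
print(shared_keys)
```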
You can change that by setting the how argument. If you want all rows from both the left and the right frame, set it to outer; if you want to keep all rows from the left frame, set it to left; and if you want to keep all rows from the right frame, set it to right.
pd.merge(df1, df2, on='a', how='outer') will join on matching keys, and every non-matching key is returned as a new row with NaN filling in the blanks.
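A minimal sketch of the how options on the frames from the question follows. The indicator=True call is an addition not mentioned in the answer: it adds a _merge column recording which frame each row came from. Note that columns which receive NaN will typically be converted to a float dtype.

```python
import pandas as pd

df1 = pd.DataFrame({'a': ['foo', 'bar'], 'b': [1, 2]})
df2 = pd.DataFrame({'a': ['foo', 'baz'], 'c': [3, 4]})

# Keep every key from both frames; missing values become NaN.
outer = pd.merge(df1, df2, on='a', how='outer')

# Keep every row of df1 ('foo', 'bar'); 'bar' has no match, so c is NaN.
left = pd.merge(df1, df2, on='a', how='left')

# Keep every row of df2 ('foo', 'baz'); 'baz' has no match, so b is NaN.
right = pd.merge(df1, df2, on='a', how='right')

# Optional: label each row as 'both', 'left_only' or 'right_only'.
outer_labeled = pd.merge(df1, df2, on='a', how='outer', indicator=True)

print(outer)
print(left)
print(right)
print(outer_labeled)
```

The outer result is the one the question asks for: three rows, with NaN where the original sketch used a dash.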
Try here for an overview of the different types of SQL-style joins that merge uses as its basis.