Merge pandas dataframes with unequal length
I have two Pandas dataframes that I would like to merge into one. They have unequal length, but contain some of the same information.
Here is the first dataframe:
BOROUGH TYPE TCOUNT
MAN SPORT 5
MAN CONV 3
MAN WAGON 2
BRO SPORT 2
BRO CONV 3
Here the first column (A, BOROUGH) specifies a location, the second (B, TYPE) a category, and the third (C, TCOUNT) a count.
And the second:
BOROUGH CAUSE CCOUNT
MAN ALCOHOL 5
MAN SIZE 3
BRO ALCOHOL 2
Here A (BOROUGH) is again the same location as in the other dataframe, but D (CAUSE) is another category, and E (CCOUNT) is the count for D in that location.
What I want (and haven't been able to do) is to get the following:
BOROUGH TYPE  TCOUNT CAUSE   CCOUNT
MAN     SPORT 5      ALCOHOL 5
MAN     CONV  3      SIZE    3
MAN     WAGON 2      NaN     NaN
BRO     SPORT 2      ALCOHOL 2
BRO     CONV  3      NaN     NaN
The missing entries can be anything, preferably a string saying "Nothing". If they default to NaN values, I guess it's just a matter of replacing those with a string.
EDIT:
Output:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 233 entries, 0 to 232
Data columns (total 3 columns):
BOROUGH 233 non-null object
CONTRIBUTING FACTOR VEHICLE 1 233 non-null object
RCOUNT 233 non-null int64
dtypes: int64(1), object(2)
memory usage: 7.3+ KB
None
<class 'pandas.core.frame.DataFrame'>
Int64Index: 83 entries, 0 to 82
Data columns (total 3 columns):
BOROUGH 83 non-null object
VEHICLE TYPE CODE 1 83 non-null object
VCOUNT 83 non-null int64
dtypes: int64(1), object(2)
memory usage: 2.6+ KB
None
Perform a left type merge on columns 'A','B' for the lhs and 'A','D' for the rhs, as these are your key columns:
In [16]:
df.merge(df1, left_on=['A','B'], right_on=['A','D'], how='left')
Out[16]:
A B C D E
0 1 1 3 1 5
1 1 2 2 2 3
2 1 3 1 NaN NaN
3 2 1 1 1 2
4 2 2 4 NaN NaN
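To reproduce the result above, here is a minimal sketch. The answer does not show the contents of df and df1, so the two frames below are an assumption, reconstructed from the printed output.

```python
import pandas as pd

# Assumed inputs, reconstructed from Out[16]; the original frames are not shown.
df = pd.DataFrame({"A": [1, 1, 1, 2, 2],
                   "B": [1, 2, 3, 1, 2],
                   "C": [3, 2, 1, 1, 4]})
df1 = pd.DataFrame({"A": [1, 1, 2],
                    "D": [1, 2, 1],
                    "E": [5, 3, 2]})

# Keep every row of df; pull D and E from df1 where (A, B) equals (A, D).
result = df.merge(df1, left_on=["A", "B"], right_on=["A", "D"], how="left")
print(result)
```

Rows of df with no match in df1 get NaN in D and E, which is also why those two columns come back as floats.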
EDIT
Your question has changed, but essentially here you can use combine_first:
In [26]:
merged = df.combine_first(df1)
merged
Out[26]:
BOROUGH CAUSE CCOUNT TCOUNT TYPE
0 MAN ALCOHOL 5 5 SPORT
1 MAN SIZE 3 3 CONV
2 MAN ALCOHOL 2 2 WAGON
3 BRO NaN NaN 2 SPORT
4 BRO NaN NaN 3 CONV
The NaN you see for 'CAUSE' and 'CCOUNT' marks missing values; we can use fillna to replace these values:
In [27]:
merged['CAUSE'] = merged['CAUSE'].fillna('Nothing')
merged['CCOUNT'] = merged['CCOUNT'].fillna(0)
merged
Out[27]:
BOROUGH CAUSE CCOUNT TCOUNT TYPE
0 MAN ALCOHOL 5 5 SPORT
1 MAN SIZE 3 3 CONV
2 MAN ALCOHOL 2 2 WAGON
3 BRO Nothing 0 2 SPORT
4 BRO Nothing 0 3 CONV