Merging pandas dataframes of unequal length

Question

I have two Pandas dataframes that I would like to merge into one. They have unequal length, but contain some of the same information.

Here is the first dataframe:

BOROUGH  TYPE  TCOUNT
  MAN    SPORT   5
  MAN    CONV    3
  MAN    WAGON   2
  BRO    SPORT   2
  BRO    CONV    3

Where column A (BOROUGH) specifies a location, B (TYPE) a category and C (TCOUNT) a count.

And the second:

BOROUGH  CAUSE  CCOUNT
  MAN   ALCOHOL   5
  MAN     SIZE    3
  BRO   ALCOHOL   2

Here A (BOROUGH) is again the same location as in the other dataframe, but D (CAUSE) is another category, and E (CCOUNT) is the count for D in that location.

What I want (and haven't been able to do) is to get the following:

BOROUGH   TYPE   TCOUNT  CAUSE  CCOUNT
  MAN    SPORT     5    ALCOHOL    5
  MAN    CONV      3      SIZE     3
  MAN    WAGON     2      NaN     NaN
  BRO    SPORT     2    ALCOHOL    2
  BRO    CONV      3      NaN     NaN

The missing entries (shown as NaN above) can be anything, preferably the string "Nothing". If they default to NaN values, I guess it's just a matter of replacing those with a string.

EDIT:
Output:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 233 entries, 0 to 232
Data columns (total 3 columns):
BOROUGH                          233 non-null object
CONTRIBUTING FACTOR VEHICLE 1    233 non-null object
RCOUNT                           233 non-null int64
dtypes: int64(1), object(2)
memory usage: 7.3+ KB
None

<class 'pandas.core.frame.DataFrame'>
Int64Index: 83 entries, 0 to 82
Data columns (total 3 columns):
BOROUGH                83 non-null object
VEHICLE TYPE CODE 1    83 non-null object
VCOUNT                 83 non-null int64
dtypes: int64(1), object(2)
memory usage: 2.6+ KB
None

Answer 1

Perform a left merge on columns 'A','B' for the lhs and 'A','D' for the rhs, as these are your key columns:

In [16]:
df.merge(df1, left_on=['A','B'], right_on=['A','D'], how='left')

Out[16]:
   A  B  C   D   E
0  1  1  3   1   5
1  1  2  2   2   3
2  1  3  1 NaN NaN
3  2  1  1   1   2
4  2  2  4 NaN NaN
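
For anyone who wants to run the merge above, here is a minimal self-contained sketch. The contents of df and df1 are reconstructed from Out[16] and are an assumption about the original frames, not data given in the question.

```python
import pandas as pd

# Assumed inputs, reconstructed from the Out[16] table above.
df = pd.DataFrame({
    "A": [1, 1, 1, 2, 2],
    "B": [1, 2, 3, 1, 2],
    "C": [3, 2, 1, 1, 4],
})
df1 = pd.DataFrame({
    "A": [1, 1, 2],
    "D": [1, 2, 1],
    "E": [5, 3, 2],
})

# Left merge: every row of df is kept; rows with no (A, D) match
# in df1 get NaN in the D and E columns.
result = df.merge(df1, left_on=["A", "B"], right_on=["A", "D"], how="left")
print(result)
```

Because how='left' keeps all rows of the left frame, the result has the same number of rows as df here, and the unmatched rows show up as NaN in D and E.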

EDIT

Your question has changed but essentially here you can use combine_first. Note that combine_first aligns the two frames on their index rather than on BOROUGH, so the pairing of rows follows row position:

In [26]:
merged = df.combine_first(df1)
merged

Out[26]:
  BOROUGH    CAUSE  CCOUNT  TCOUNT   TYPE
0     MAN  ALCOHOL       5       5  SPORT
1     MAN     SIZE       3       3   CONV
2     MAN  ALCOHOL       2       2  WAGON
3     BRO      NaN     NaN       2  SPORT
4     BRO      NaN     NaN       3   CONV
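
In the output above, row 2 carries ALCOHOL/2 while the BRO rows are empty, which differs from the layout requested in the question. A possible alternative, not part of the original answer, is to number the rows within each borough and left-merge on that number. The sketch below assumes the sample frames from the question; the helper column name "pos" is hypothetical.

```python
import pandas as pd

# Sample frames copied from the question.
df = pd.DataFrame({
    "BOROUGH": ["MAN", "MAN", "MAN", "BRO", "BRO"],
    "TYPE": ["SPORT", "CONV", "WAGON", "SPORT", "CONV"],
    "TCOUNT": [5, 3, 2, 2, 3],
})
df1 = pd.DataFrame({
    "BOROUGH": ["MAN", "MAN", "BRO"],
    "CAUSE": ["ALCOHOL", "SIZE", "ALCOHOL"],
    "CCOUNT": [5, 3, 2],
})

# combine_first matches rows by index label, so df1's third row
# (BRO / ALCOHOL / 2) lands on df's third row (MAN / WAGON).
by_index = df.combine_first(df1)

# Alternative: "pos" (hypothetical name) is the 0-based position of
# each row within its borough; merging on it pairs the n-th TYPE row
# of a borough with the n-th CAUSE row of the same borough.
left = df.assign(pos=df.groupby("BOROUGH").cumcount())
right = df1.assign(pos=df1.groupby("BOROUGH").cumcount())
paired = left.merge(right, on=["BOROUGH", "pos"], how="left").drop(columns="pos")
print(paired)
```

With this pairing, the WAGON row and the second BRO row have no partner in df1 and end up with NaN in CAUSE and CCOUNT.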

The NaN you see for 'CAUSE' and 'CCOUNT' are missing values, so we can use fillna to replace them:

In [27]:
merged['CAUSE'] = merged['CAUSE'].fillna('Nothing')
merged['CCOUNT'] = merged['CCOUNT'].fillna(0)
merged

Out[27]:
  BOROUGH    CAUSE  CCOUNT  TCOUNT   TYPE
0     MAN  ALCOHOL       5       5  SPORT
1     MAN     SIZE       3       3   CONV
2     MAN  ALCOHOL       2       2  WAGON
3     BRO  Nothing       0       2  SPORT
4     BRO  Nothing       0       3   CONV
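
The two fillna calls can also be written as a single call with a dict of per-column fill values. This is a small sketch on a hand-built frame in the layout the question asks for; the frame itself is an assumption for illustration, not the answer's merged.

```python
import pandas as pd

# Hand-built frame in the layout requested in the question;
# None becomes a missing value (NaN) in both columns.
merged = pd.DataFrame({
    "BOROUGH": ["MAN", "MAN", "MAN", "BRO", "BRO"],
    "TYPE": ["SPORT", "CONV", "WAGON", "SPORT", "CONV"],
    "TCOUNT": [5, 3, 2, 2, 3],
    "CAUSE": ["ALCOHOL", "SIZE", None, "ALCOHOL", None],
    "CCOUNT": [5, 3, None, 2, None],
})

# One fill value per column: a label for the text column,
# zero for the count column.
filled = merged.fillna({"CAUSE": "Nothing", "CCOUNT": 0})

# The NaN forced CCOUNT to float; cast back once nothing is missing.
filled["CCOUNT"] = filled["CCOUNT"].astype(int)
print(filled)
```

Filling the count with 0 is a choice; if "no cause recorded" should stay distinguishable from "zero occurrences", leaving CCOUNT as NaN may be preferable.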
