Python - Pandas: combining two dataframes that provide different values

Question

I have two different dataframes, each with two columns, and I want to merge them and get the sum of Column B (the strings joined together). The problem is that dataframe 1 has some data that I want to keep. Here is an example so it makes sense:

Dataframe 1

Column A  Column B
House     walls,doors,rooms
Animal    Legs,nose,eyes
car       tires,engine

Dataframe 2

Column A  Column B
House     windows,kitchen
Bike      wheels,bicycle chain

Desired result

Column A  Column B
House     walls,doors,rooms,windows,kitchen
Animal    Legs,nose,eyes
Car       tires,engine
Bike      wheels,bicycle chain

The merge function doesn't help. I also tried pd.concat followed by some kind of aggregation, but that didn't work either. Does anyone have an idea of how to solve this?

Answer 1

pd.concat([df1, df2]).groupby("Column A")["Column B"].apply(', '.join).reset_index()

After concatenating your dataframes, group the rows by Column A, then use apply to join the grouped strings in Column B, and finally turn Column A back into a regular column with reset_index().
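
The steps above can be sketched end to end as follows. This is only an illustration: the sample frames are rebuilt from the question (with "Car" capitalized as in the desired result), the name df3 and the sort=False argument are additions that are not part of the one-liner above, and the separator is "," rather than ", " so that the output matches the desired result.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed to be plain strings).
df1 = pd.DataFrame({
    "Column A": ["House", "Animal", "Car"],
    "Column B": ["walls,doors,rooms", "Legs,nose,eyes", "tires,engine"],
})
df2 = pd.DataFrame({
    "Column A": ["House", "Bike"],
    "Column B": ["windows,kitchen", "wheels,bicycle chain"],
})

# Stack the frames, group rows that share a key, and join their strings.
# sort=False (an addition here) keeps the keys in first-seen order
# instead of sorting them alphabetically.
df3 = (
    pd.concat([df1, df2])
    .groupby("Column A", sort=False)["Column B"]
    .apply(",".join)
    .reset_index()
)
print(df3)
```

Without sort=False, groupby sorts the keys, so the rows would come back in alphabetical order rather than in the order shown in the desired result.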

Edit: expansion on comments

To remove duplicates, you can use the set data structure, which keeps only a single copy of each element you put into it. For each row x, split the string into words, then convert the list of words into a set:

df4 = df3["Column B"].apply(lambda x: set(x.split(", "))).reset_index()

Note that after this, your Column B will contain sets. I'll let you figure out how to convert a set back to a string using a similar pattern.
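
One possible way to finish that last step, sketched under some assumptions: the items are separated by ", " as in the join above, the small df3 below is a made-up stand-in for the grouped frame, and dedupe_items is a hypothetical helper name. It uses dict.fromkeys instead of set, because a set does not preserve the order of the items while a dict keeps them in first-seen order.

```python
import pandas as pd

# Hypothetical stand-in for the grouped frame (called df3 in the answer).
df3 = pd.DataFrame({
    "Column A": ["House", "Bike"],
    "Column B": ["walls, doors, walls, kitchen", "wheels, wheels"],
})

def dedupe_items(text, sep=", "):
    """Drop repeated items from a separated string, keeping first-seen order."""
    # dict.fromkeys keeps insertion order, unlike set, whose order is arbitrary.
    return sep.join(dict.fromkeys(text.split(sep)))

# Overwrite Column B in a copy so that Column A stays in the result.
df4 = df3.copy()
df4["Column B"] = df4["Column B"].apply(dedupe_items)
print(df4)
```

If the column already holds sets, something like ", ".join(sorted(x)) applied to each row gives a deterministic string, at the cost of alphabetical rather than original order.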

Solution 1
4, accepted 2020-05-03 20:26:03
