Compare multiple DataFrames and add new columns filled with matching binary values

Question

Let's say I have 2 dataframes. One is a merged dataframe containing all instances, and the other contains only the unique instances of the id column.

df1 looks something like this:

|    id    |    category_name
|  459291  |    c1
|  349532  |    c1
|  459291  |    c2
|  719300  |    c1
|  349532  |    c3
|  459291  |    c4
|  649202  |    c2
|  459291  |    c5

df2 looks something like this:

|    id    |    category_name
|  459291  |    c1
|  349532  |    c1
|  719300  |    c1
|  649202  |    c2

What I want to do is create a new column on df2 for each value in the 'category_name' column, holding 1 if that unique 'id' has the matching 'category_name' and 0 otherwise. I would then drop the 'category_name' column. So the expected output I'm looking for would be something like this:

|    id    |  c1  |  c2  |  c3  |  c4  |
|  459291  |  1   |  1   |  1   |  1   |
|  349532  |  1   |  1   |  0   |  0   |
|  719300  |  1   |  0   |  0   |  0   |
|  649202  |  0   |  1   |  0   |  0   |

I feel like this could possibly be done using just the merged dataframe as well, but I'm not sure how I would drop the duplicates while keeping the new column values for each unique ID. Any help is greatly appreciated!

Answer 1

Here is one way to do it with pivot_table(). For some reason I couldn't find a way around adding the helper aux column:

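import pandas as pd

# Sample data: same ids as df1 in the question, with 'playlist' standing in for 'category_name'
df = pd.DataFrame({'id':[459291,349532,459291,719300,349532,459291,649202,459291],
                   'playlist':['new','new','top','new','top','old','top','workout']})
# Helper column of ones, so pivot_table has a value to count
df['aux'] = 1
# One row per id, one column per playlist; missing pairs become 0
new_df = pd.pivot_table(df,index='id',columns=['playlist'],aggfunc='count',values='aux').fillna(0).astype(int)
print(new_df)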

Output:

playlist  new  old  top  workout
id                              
349532      1    0    1        0
459291      1    1    1        1
649202      0    0    1        0
719300      1    0    0        0
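
If the aux column is the only sticking point, pd.crosstab() may be worth a try, since it counts (id, category) pairs directly. The following is only a sketch: it assumes the merged dataframe is called df1 and uses the column names from the question; the names flags and result are made up for illustration.

```python
import pandas as pd

# Data copied from df1 in the question.
df1 = pd.DataFrame({
    'id': [459291, 349532, 459291, 719300, 349532, 459291, 649202, 459291],
    'category_name': ['c1', 'c1', 'c2', 'c1', 'c3', 'c4', 'c2', 'c5'],
})

# crosstab counts how often each (id, category_name) pair occurs;
# clip(upper=1) caps the counts at 1 so repeated pairs still give a 0/1 flag.
flags = pd.crosstab(df1['id'], df1['category_name']).clip(upper=1)

# Restore the order in which each id first appears, then make 'id' a column again.
result = flags.reindex(df1['id'].drop_duplicates()).reset_index()
result.columns.name = None  # drop the leftover 'category_name' axis label
print(result)
```

Because this works from the merged dataframe alone, the duplicates collapse into one row per id without needing df2. With the df1 data as posted, the result also gets a c5 column, since id 459291 appears with c5.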

Solution 1
2 · Accepted · 2020-02-14 00:55:41
