
Compare Multiple DataFrames and Add New Columns Filled With Binary Values for Matches

Let's say I have two DataFrames: one is a merged DataFrame containing all instances, and the other contains only the unique instances of the column id.

df1 looks something like this:

|    id    |    category_name
|  459291  |    c1
|  349532  |    c1
|  459291  |    c2
|  719300  |    c1
|  349532  |    c3
|  459291  |    c4
|  649202  |    c2
|  459291  |    c5

df2 looks something like this:

|    id    |    category_name
|  459291  |    c1
|  349532  |    c1
|  719300  |    c1
|  649202  |    c2

What I want to do is create a new column on df2 for each value in the column 'category_name', holding 1 if the unique 'id' in that row has that 'category_name' and 0 otherwise. I would then drop the column 'category_name'. So the expected output I'm looking for would be something like this:

|    id    |  c1  |  c2  |  c3  |  c4  |
|  459291  |  1   |  1   |  1   |  1   |
|  349532  |  1   |  1   |  0   |  0   |
|  719300  |  1   |  0   |  0   |  0   |
|  649202  |  0   |  1   |  0   |  0   |

I feel like this could possibly be done using just the merged DataFrame as well, but I'm not sure how I would drop the duplicates while keeping the new column values for each unique ID. Any help is greatly appreciated!

Here is one way to do it with pivot_table(). I could not find a way around adding the helper column aux, which gives pivot_table() a value to count:

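import pandas as pd

# One row per (id, playlist) pair; 'playlist' plays the role of 'category_name'
df = pd.DataFrame({'id':[459291,349532,459291,719300,349532,459291,649202,459291],
                   'playlist':['new','new','top','new','top','old','top','workout']})
# Helper column so that pivot_table() has a value to count
df['aux'] = 1
# Count rows per (id, playlist); pairs that never occur come out as NaN, then 0
new_df = pd.pivot_table(df,index='id',columns=['playlist'],aggfunc='count',values='aux').fillna(0).astype(int)
print(new_df)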

Output:

playlist  new  old  top  workout
id                              
349532      1    0    1        0
459291      1    1    1        1
649202      0    0    1        0
719300      1    0    0        0
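
As an alternative sketch (not part of the original answer), pd.crosstab() can build the same kind of table without the aux column. The code below assumes df1 has exactly the columns shown in the question. The names flags and result are only illustrative. The clip(upper=1) call is an added safeguard that keeps the values at 0 or 1 even if an (id, category_name) pair appears more than once, and the reindex() step is an assumption that the ids should come back in their order of first appearance, as in df2.

```python
import pandas as pd

# df1 from the question: the merged frame, one row per (id, category_name) pair
df1 = pd.DataFrame({
    "id": [459291, 349532, 459291, 719300, 349532, 459291, 649202, 459291],
    "category_name": ["c1", "c1", "c2", "c1", "c3", "c4", "c2", "c5"],
})

# crosstab counts how often each (id, category_name) pair occurs;
# clip(upper=1) turns those counts into 0/1 flags even if a pair repeats
flags = pd.crosstab(df1["id"], df1["category_name"]).clip(upper=1)

# crosstab sorts its index, so restore the first-appearance order of the ids
# (the order df2 uses in the question), then make 'id' a regular column again
order = df1["id"].drop_duplicates().tolist()
result = flags.reindex(order).rename_axis("id").reset_index()
result.columns.name = None  # drop the leftover 'category_name' axis label

print(result)
```

Because the table is built directly from the merged frame, there is no need to drop duplicates first or to remove 'category_name' afterwards: each unique id ends up on one row with one 0/1 column per category.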
