Fill in missing boolean rows in Pandas

I have a MySQL query that does a groupby and returns data in the following form:

ID | Boolean | Count

Sometimes the table has no data for one of the boolean states, so the data for a single ID might come back like this:

1234 | 0 | 10

However, I need it in this form for downstream analysis:

1234 | 0 | 10
1234 | 1 | 0

with an index on [ID, Boolean].

From searching Google and SO, it seems that getting MySQL to do this transform is a bit of a pain. Is there a simple way to do this in Pandas? I haven't been able to find anything useful in the docs or the Pandas cookbook.

You can assume that I've already loaded the data into a Pandas dataframe with no indexes.

Thanks.

I would set the index of your dataframe to the ID and Boolean columns, and then construct a new index from the Cartesian product of the unique values.

That would look like this:

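import pandas

# assumes `querytext` (the SQL string) and `engine` (a database connection)
# are already defined
indexcols = ['ID', 'Boolean']

# read_sql_query takes the SQL first, then the connection
data = pandas.read_sql_query(querytext, engine)

# every (ID, Boolean) combination that should appear in the result
full_index = pandas.MultiIndex.from_product(
    [data['ID'].unique(), [0, 1]],
    names=indexcols
)

data = (
    data.set_index(indexcols)   # index on the two key columns
        .reindex(full_index)    # missing combinations are added as NaN rows
        .fillna(0)              # missing counts become 0
        .reset_index()          # drop this step to keep the [ID, Boolean] index
)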
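
To try the approach without a database, here is a minimal self-contained sketch. The sample rows are made up for illustration and stand in for the query result. It also uses the `fill_value` argument of `reindex` instead of a separate `fillna(0)`, which is a small variation on the snippet above rather than the same code: filling during the reindex should keep Count as an integer column, and leaving out `reset_index()` keeps the [ID, Boolean] index that the question asks for.

```python
import pandas as pd

# Hypothetical sample standing in for the query result: ID 1234 has no
# Boolean == 1 row and ID 5678 has no Boolean == 0 row.
data = pd.DataFrame({
    "ID": [1234, 5678, 9012, 9012],
    "Boolean": [0, 1, 0, 1],
    "Count": [10, 4, 7, 3],
})

indexcols = ["ID", "Boolean"]

# Every (ID, Boolean) pair that should exist in the result.
full_index = pd.MultiIndex.from_product(
    [data["ID"].unique(), [0, 1]],
    names=indexcols,
)

# fill_value=0 fills the missing rows during the reindex itself, so no
# NaN values are introduced along the way.
filled = data.set_index(indexcols).reindex(full_index, fill_value=0)

print(filled)
```

If a flat dataframe is needed instead, `filled.reset_index()` turns the two index levels back into ordinary columns.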
