
Fill in missing boolean rows in Pandas

I have a MySQL query that is doing a groupby and returning data in the following form:

ID | Boolean | Count

Sometimes there isn't data in the table for one of the boolean states, so data for a single ID might be returned like this:

1234 | 0 | 10

However, I need it in this form for downstream analysis:

1234 | 0 | 10
1234 | 1 | 0

with an index on [ID, Boolean].

From querying Google and SO, it seems like getting MySQL to do this transform is a bit of a pain. Is there a simple way to do this in Pandas? I haven't been able to find anything useful in the docs or the Pandas cookbook.

You can assume that I've already loaded the data into a Pandas dataframe with no indexes.

Thanks.

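I would set the index of your dataframe to the ID and Boolean columns, and then construct a new index from the Cartesian product of the unique IDs and the two boolean values. Reindexing against that full index adds the missing rows, which you can then fill with 0.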

That would look like this:

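import pandas

indexcols = ['ID', 'Boolean']

# read_sql_query takes the SQL text first, then the connection/engine
data = pandas.read_sql_query(querytext, engine)

# every (ID, Boolean) pair that should exist: each unique ID crossed with 0 and 1
full_index = pandas.MultiIndex.from_product(
    [data['ID'].unique(), [0, 1]],
    names=indexcols
)

data = (
    data.set_index(indexcols)
        .reindex(full_index)   # adds the missing pairs as NaN rows
        .fillna(0)             # a missing count means zero
        .reset_index()         # omit this step to keep the [ID, Boolean] index
)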
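
To try the same idea without a database, here is a minimal sketch on an in-memory frame. The sample rows and the helper name `fill_missing_boolean_rows` are made up for illustration and are not part of the original answer. It also passes `fill_value=0` to `reindex` instead of calling `fillna(0)` afterwards, which should avoid the detour through NaN, and it leaves the result indexed on [ID, Boolean] as the question asks.

```python
import pandas as pd


def fill_missing_boolean_rows(df, id_col="ID", bool_col="Boolean"):
    """Return df indexed on [id_col, bool_col] with a row for both 0 and 1 per ID.

    Hypothetical helper: pairs absent from df get 0 in every remaining column.
    """
    index_cols = [id_col, bool_col]
    # Cartesian product of the IDs that actually occur and the two boolean states
    full_index = pd.MultiIndex.from_product(
        [df[id_col].unique(), [0, 1]],
        names=index_cols,
    )
    # Rows missing from df are created and filled with 0 in one step
    return df.set_index(index_cols).reindex(full_index, fill_value=0)


# Made-up sample: ID 1234 has no row for Boolean == 1
sample = pd.DataFrame(
    {"ID": [1234, 5678, 5678], "Boolean": [0, 0, 1], "Count": [10, 3, 7]}
)

filled = fill_missing_boolean_rows(sample)
print(filled)
```

If you need flat columns again, call `.reset_index()` on the result.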
