
Python Pandas groupby removes columns

data_c["dropoff_district"] = "default value"  # placeholder column for the drop-off district
data_c["distance"] = "default value"          # placeholder column for the geocoder distance
data_c["time_of_day"] = "default value"       # placeholder column for the time of day from the timestamps

I create these columns at the start of the project for plotting and data manipulation. After editing them and filling them with real values, I wanted to perform a groupby operation on data_c.

avg_d = data_c.groupby(by = 'distance').sum().reset_index()

However, when I perform the groupby on data_c, the 'time_of_day' and 'dropoff_district' columns are missing from avg_d. How can I solve this issue?

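The problem is that pandas doesn't know how to add date/time objects together. So when you tell pandas to group and then sum, it throws out the columns it can't sum. (That silent dropping is what pandas versions before 2.0 do; as far as I know, from 2.0 on sum() raises a TypeError on a datetime column unless you pass numeric_only=True.) For example: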

import pandas as pd

df = pd.DataFrame([['2019-01-01', 2, 3], ['2019-02-02', 2, 4], ['2019-02-03', 3, 5]],
                  columns=['day', 'distance', 'duration'])
df['day'] = pd.to_datetime(df['day'])  # parse the date strings into datetime64 values

If I just run your query, I get:

>>> df.groupby('distance').sum()
          duration
distance          
2                7
3                5

You can fix this by telling pandas to do something different with those columns, for example taking the first value in each group:

df.groupby('distance').agg({
    'duration': 'sum',
    'day': 'first'
})

which brings the column back:

          duration        day
distance                     
2                7 2019-01-01
3                5 2019-02-03
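
Taking the first value is only one option. Here is a rough sketch of the same idea on something shaped like your data_c, using named aggregation so each output column gets its own rule. The 'fare' column and all the values are made up for illustration, and joining the distinct districts into one string is just one possible choice, not necessarily what you want.

```python
import pandas as pd

# Hypothetical stand-in for data_c; the 'fare' column and all values are invented.
data_c = pd.DataFrame({
    "distance": [2, 2, 3],
    "time_of_day": ["morning", "evening", "night"],
    "dropoff_district": ["A", "B", "A"],
    "fare": [10.0, 12.0, 7.0],
})

# One aggregation rule per output column: sum the numbers,
# keep the first time_of_day, and join the distinct districts.
avg_d = data_c.groupby("distance").agg(
    total_fare=("fare", "sum"),
    first_time_of_day=("time_of_day", "first"),
    districts=("dropoff_district", lambda s: ", ".join(sorted(s.unique()))),
).reset_index()

print(avg_d)
```

Every column you name in agg() survives, because you have told pandas explicitly how to reduce it.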

groupby does not remove your columns; the sum() call does. Columns that are not numeric are not retained after sum().
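
To see in advance which columns are affected, you can list the non-numeric ones before summing. This is a minimal sketch on a small made-up frame (the 'district' column is hypothetical); passing numeric_only=True makes the "numeric columns only" behaviour explicit rather than relying on the default of whatever pandas version you have.

```python
import pandas as pd

# Small invented frame: one datetime column, one string column, two numeric ones.
df = pd.DataFrame({
    "day": pd.to_datetime(["2019-01-01", "2019-02-02", "2019-02-03"]),
    "district": ["A", "B", "A"],  # hypothetical column
    "distance": [2, 2, 3],
    "duration": [3, 4, 5],
})

# Columns that a numeric-only sum cannot keep.
non_numeric = df.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)

# Explicitly sum only the numeric columns within each distance group.
numeric_sum = df.groupby("distance").sum(numeric_only=True)
print(numeric_sum)
```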

So how would you like to retain the 'time_of_day' and 'dropoff_district' columns? Assuming you want to keep every distinct combination of them, put them into the groupby keys:

data_c.groupby(['distance','time_of_day','dropoff_district']).sum().reset_index()

Otherwise you will have several different 'time_of_day' values for the same 'distance', and you need to massage your data first so that each group reduces to a single value.
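
One way to check whether that is the case is to count the distinct values per 'distance' before deciding how to group. The sketch below uses invented data (the 'fare' column is hypothetical) and then groups on all three keys, so each distinct combination gets its own row.

```python
import pandas as pd

# Hypothetical stand-in for data_c; the 'fare' column and all values are invented.
data_c = pd.DataFrame({
    "distance": [2, 2, 3, 3],
    "time_of_day": ["morning", "evening", "morning", "morning"],
    "dropoff_district": ["A", "A", "B", "B"],
    "fare": [10.0, 12.0, 7.0, 8.0],
})

# How many distinct values does each distance have in the two columns?
# Anything above 1 means 'distance' alone does not determine that column.
distinct_per_distance = (
    data_c.groupby("distance")[["time_of_day", "dropoff_district"]].nunique()
)
print(distinct_per_distance)

# Grouping on all three keys keeps one row per observed combination.
grouped = (
    data_c.groupby(["distance", "time_of_day", "dropoff_district"])["fare"]
    .sum()
    .reset_index()
)
print(grouped)
```

Here distance 2 has two different 'time_of_day' values, so it is split into two rows, while distance 3 collapses into one.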


 