Pandas groupby does not return the expected output

I have a program that applies DataFrame.groupby().agg('sum') to a number of pandas.DataFrame objects, all in the same format. The code works on every dataframe except this one (picture: df1), which produces an odd result (picture: result1).

I tried:

df = df.groupby('Mapping')[list(df)].agg('sum')

This code works for df2 but not for df1.

df1

result1

The code works fine for other dataframes (pictures: df2, result2)

df2 result2

Could somebody tell me why it turned out that way for df1?

The problem in the first dataframe is the commas in columns that should be numeric; I think pandas is not recognizing those columns as numeric. Did you try removing the commas?

It seems that in df1, most of the numeric columns are actually stored as strings (str). You can tell by the commas (,) that delimit thousands. Try:

cols = df.columns[1:]  # every column after the first
df[cols] = df[cols].apply(lambda s: s.astype(str).str.replace(",", "", regex=False))
df[cols] = df[cols].apply(pd.to_numeric)

The first line selects the second, third, etc. columns. The second line removes the commas from each value in those columns, and the third converts the columns to numeric data types. The last two steps could be combined into one expression, but I kept them separate for readability's sake.

Once this is done, you can try your groupby code.
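As a rough end-to-end illustration of the steps above, here is a minimal sketch. The original df1 is only available as a picture, so the column names `Sales` and `Cost` and all the values are made up; only the `Mapping` grouping column comes from the question.

```python
import pandas as pd

# Hypothetical stand-in for df1: the numeric columns arrived as strings
# containing thousands separators. Column names and values are invented.
df = pd.DataFrame({
    "Mapping": ["A", "B", "A", "B"],
    "Sales": ["1,200", "350", "2,800", "1,050"],
    "Cost": ["900", "1,100", "1,500", "75"],
})

value_cols = list(df.columns[1:])  # everything except the grouping column

# Strip the commas element-wise, then convert each column to a numeric dtype.
for col in value_cols:
    cleaned = df[col].astype(str).str.replace(",", "", regex=False)
    df[col] = pd.to_numeric(cleaned)

# With numeric columns, the sum is an arithmetic sum per group.
result = df.groupby("Mapping")[value_cols].sum()
print(result)
```

Assigning each column back by name replaces the string column with the converted one, so the dtype changes along with the values.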

It's good practice to check the data types of your columns as soon as you load them. You can do so with df1.dtypes.
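If the data comes from a CSV file (an assumption; the question does not say how df1 was loaded), the dtype check and the fix can both happen at load time. The sketch below uses made-up CSV content and the `thousands` parameter of `pd.read_csv`, which tells the parser to treat the given character as a thousands separator.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for the file behind df1.
csv_text = (
    "Mapping,Sales,Cost\n"
    'A,"1,200",900\n'
    'B,350,"1,100"\n'
)

# Default parsing: values such as "1,200" keep these columns non-numeric.
as_loaded = pd.read_csv(io.StringIO(csv_text))
print(as_loaded.dtypes)

# Declaring the thousands separator lets the parser produce numeric columns.
parsed = pd.read_csv(io.StringIO(csv_text), thousands=",")
print(parsed.dtypes)
```

Checking `dtypes` right after loading shows the problem before any groupby runs, and fixing it in the reader avoids a separate cleaning step.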
