
Pandas groupby std returning an empty dataframe

I have a pandas dataframe with the following columns of interest - productcode and price. I would like to see the standard deviation for products with the same code.

df.price = pd.to_numeric(df.price, errors='raise')

len(df[df.price.isna()])
Out: 0

df.groupby(['productcode'])['price'].describe()

             count  unique  top  freq
productcode
T1H5T            1       1   38     1
T1J0T            1       1   11     1
T1L6E            1       1   24     1
T1H0G9           1       1   69     1

As you can see, most of the product codes appear only once. When I run describe, metrics such as std and mean do not appear for some reason.

When I specifically request the standard deviation or the mean, I get the following:

df.groupby(['productcode'])['price'].std(ddof=0)
Out: _

df[['productcode', 'price']].groupby(['productcode']).mean()
Out: DataError: No numeric types to aggregate

After going through my code several times, the problem turned out to be that to_numeric, with errors set to either 'raise' or 'coerce', apparently wasn't changing the data type of the column; it remained classified as object. Using

df.price = df.price.astype(float)

fixed the problem. That is also why describe() listed only the metrics that apply to categorical variables. I greatly appreciate your answers, @Laurent and @jezrael!
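A minimal sketch of the diagnosis described above, using made-up product codes and prices rather than the original data. It assumes the prices are stored as strings in an object column, and shows that describe() reports count/unique/top/freq for such a column but switches to numeric statistics once the column is cast to float.

```python
import pandas as pd

# Hypothetical data: prices stored as strings in an object-dtype column.
df = pd.DataFrame(
    {
        "productcode": ["A1", "A1", "B2"],
        "price": pd.Series(["10", "14", "7"], dtype=object),
    }
)

print(df["price"].dtype)  # object
object_stats = df.groupby("productcode")["price"].describe()
print(list(object_stats.columns))  # categorical-style metrics only

# Cast to float, as in the fix above, and describe again.
df["price"] = df["price"].astype(float)
print(df["price"].dtype)  # float64
numeric_stats = df.groupby("productcode")["price"].describe()
print(list(numeric_stats.columns))  # now includes mean, std, min, max, ...
```

Checking df.dtypes before aggregating is a quick way to catch this kind of problem early.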

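With errors='raise', to_numeric raises an exception as soon as it meets a non-numeric value, so the column is never converted and stays non-numeric.

Use errors='coerce' instead, which turns values that cannot be parsed into NaN: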

df.price = pd.to_numeric(df.price, errors='coerce')
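To illustrate the difference between the two settings, here is a small sketch on an invented Series (the values are not from the question) that contains one entry that cannot be parsed as a number.

```python
import pandas as pd

# Hypothetical column with one value that cannot be parsed as a number.
raw = pd.Series(["38", "11", "abc"], dtype=object)

# errors="raise" stops with a ValueError at the bad value.
try:
    pd.to_numeric(raw, errors="raise")
    raised = False
except ValueError:
    raised = True
print(raised)

# errors="coerce" converts what it can and replaces the rest with NaN.
coerced = pd.to_numeric(raw, errors="coerce")
print(coerced.dtype)
print(coerced.isna().sum())
```

After coercing, counting the NaN values shows how many entries failed to parse, which is worth checking before dropping or filling them.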

So, given the following toy dataframe:

import pandas as pd

df = pd.DataFrame(
    {
        "productcode": {
            0: "T1H 4K3",
            1: "T1H6X",
            2: "T1H4K",
            3: "T1H4K",
            4: "T1H6X",
            5: "T1H 4K3",
        },
        "price": {0: "47", 1: "28", 2: "47", 3: "25", 4: "19", 5: "47"},
    }
)
print(df)
# Outputs
  productcode price
0     T1H 4K3    47
1       T1H6X    28
2       T1H4K    47
3       T1H4K    25
4       T1H6X    19
5     T1H 4K3    47

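Once the price column has been converted to a numeric dtype, you can get the standard deviation for products with the same code like this:

# The prices above are strings, so convert them before aggregating.
df["price"] = pd.to_numeric(df["price"], errors="coerce")
print(df.groupby("productcode").std())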
# Outputs
                 price
productcode
T1H 4K3       0.000000
T1H4K        15.556349
T1H6X         6.363961
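Since most product codes in the question appear only once, it may also help to see how the ddof argument affects single-row groups. The sketch below uses invented values: with the default ddof=1 a one-row group has no sample standard deviation and yields NaN, while ddof=0 (the population standard deviation) yields 0.0.

```python
import pandas as pd

# Hypothetical prices: "T1H4K" appears twice, "T1J0T" only once.
df = pd.DataFrame(
    {
        "productcode": ["T1H4K", "T1H4K", "T1J0T"],
        "price": [47.0, 25.0, 11.0],
    }
)

grouped = df.groupby("productcode")["price"]
sample_std = grouped.std()            # default ddof=1: NaN for one-row groups
population_std = grouped.std(ddof=0)  # ddof=0: 0.0 for one-row groups

print(sample_std)
print(population_std)
```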
