
How can I fill the missing values of each group in a DataFrame using Python?

Here is an example of the DataFrame:

df =
     Name         Type               price
0    gg         apartment            8
1    hh         apartment            4
2    tty        apartment            0
3    ttyt       None                 6
4    re         house                6
5    ew         house                2
6    rr         house                0
7    tr         None                 5
8    mm         None                 0

I converted the missing values ("None") in "Type" to "NoInfo":

import pandas as pd
import numpy as np
from scipy.stats import zscore

df = pd.read_csv("C:/Users/User/Desktop/properties.csv")

# Replace missing values in "Type" with the label "NoInfo"
df['Type'] = df['Type'].fillna('NoInfo')
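A minimal, self-contained sketch of this step, using a small inline DataFrame typed in from the example above (the inline data stands in for properties.csv). It assigns the result back instead of calling `fillna(..., inplace=True)` on `df['Type']`: that inplace form on a selected column is chained assignment, which recent pandas versions warn about and which does not modify `df` under Copy-on-Write.

```python
import pandas as pd

# Inline stand-in for properties.csv; None marks a missing "Type"
df = pd.DataFrame({
    "Name": ["gg", "hh", "tty", "ttyt", "re", "ew", "rr", "tr", "mm"],
    "Type": ["apartment", "apartment", "apartment", None,
             "house", "house", "house", None, None],
    "price": [8, 4, 0, 6, 6, 2, 0, 5, 0],
})

# Assign back to the column rather than filling in place on df["Type"]
df["Type"] = df["Type"].fillna("NoInfo")
print(df["Type"].value_counts())
```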

After this step, the DataFrame looks like this:

df = 
     Name         Type               price 

0    gg         apartment            8   
1    hh         apartment            4
2    tty        apartment            0
3    ttyt       NoInfo               6
4    re         house                6 
5    ew         house                2
6    rr         house                0
7    tr         NoInfo               5
8    mm         NoInfo               0

Next, I replaced the 0 values in "price" with the average price of each group ("apartment", "house" and "NoInfo") and computed the z-score of each group:

df['price'] = df['price'].replace(0, np.nan)

df['price'] = pd.to_numeric(df.price, errors='coerce')

df['price'] = df.groupby('Type')['price'].transform(lambda x : x.mean())

df['price_zscore'] = df[['price']].apply(zscore)

After running this code, every price in every group has been replaced by its group average (not just the zeros), and every value in the new column 'price_zscore' is NaN.
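The overwriting comes from what the transform returns. A small sketch on the example data (typed inline, assuming "Type" has already been filled with "NoInfo"): a transform that returns the group mean, such as the question's `lambda x: x.mean()` or the equivalent string `"mean"`, broadcasts that mean to every row of the group, while wrapping it in `fillna` only touches the NaN rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Type": ["apartment", "apartment", "apartment", "NoInfo",
             "house", "house", "house", "NoInfo", "NoInfo"],
    "price": [8, 4, 0, 6, 6, 2, 0, 5, 0],
})
df["price"] = df["price"].replace(0, np.nan)  # treat 0 as missing
grouped = df.groupby("Type")["price"]

# Returns the group mean for EVERY row, so all prices get overwritten
overwritten = grouped.transform("mean")

# Keeps existing prices and fills only the NaN rows with the group mean
filled = grouped.transform(lambda s: s.fillna(s.mean()))

print(pd.DataFrame({"overwritten": overwritten, "filled": filled}))
```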

What I want is to replace each 0 in the 'price' column with the average price of its own property group in "Type" (apartment, house).

For example, a 0 in "price" for a row of type "apartment" should be replaced with the average price of the "apartment" group (computed over its non-zero prices), a 0 in a "house" row with the average of the "house" group, and a 0 in a "NoInfo" row with the average of the "NoInfo" group. The expected result is:

df =
     Name         Type               price
0    gg         apartment            8   
1    hh         apartment            4
2    tty        apartment            6   # (8+4)/2 = 6
3    ttyt       NoInfo               6
4    re         house                6 
5    ew         house                2
6    rr         house                4  # (6+2)/2 = 4
7    tr         NoInfo               5
8    mm         NoInfo               5.5  # (6+5)/2 = 5.5
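The arithmetic in the comments can be checked with a short sketch: average only the non-zero prices of each group, then use `Series.mask` to swap each 0 for its group's mean (looked up with `Series.map`). This is an alternative to the groupby/transform approach, run on the example data typed inline; applying the same rule to the "NoInfo" group gives (6 + 5)/2 = 5.5.

```python
import pandas as pd

df = pd.DataFrame({
    "Type": ["apartment", "apartment", "apartment", "NoInfo",
             "house", "house", "house", "NoInfo", "NoInfo"],
    "price": [8, 4, 0, 6, 6, 2, 0, 5, 0],
})

# Mean of the non-zero prices in each group (0 is treated as "missing")
group_means = df.loc[df["price"] != 0].groupby("Type")["price"].mean()
print(group_means)

# Where price == 0, substitute the mean of that row's group
df["price"] = df["price"].mask(df["price"] == 0, df["Type"].map(group_means))
print(df)
```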

Then I want the z-score computed within each property group (one for "apartment", one for "house" and one for "NoInfo"), with all of the per-group z-scores stored in a single column, 'price_zscore'.
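Within a group, the z-score is z = (x - group mean) / group standard deviation. A plain-pandas sketch of that formula, assuming the zeros have already been filled as in the expected table (NoInfo's 0 taken as 5.5). It uses the population standard deviation (`ddof=0`), which is also the default of `scipy.stats.zscore`.

```python
import pandas as pd

df = pd.DataFrame({
    "Type": ["apartment", "apartment", "apartment", "NoInfo",
             "house", "house", "house", "NoInfo", "NoInfo"],
    # prices after replacing each 0 with its group mean
    "price": [8.0, 4.0, 6.0, 6.0, 6.0, 2.0, 4.0, 5.0, 5.5],
})

grouped = df.groupby("Type")["price"]
group_mean = grouped.transform("mean")
group_std = grouped.transform(lambda s: s.std(ddof=0))  # population std, like scipy's zscore

# One column holding every row's z-score relative to its own group
df["price_zscore"] = (df["price"] - group_mean) / group_std
print(df)
```

Each group here happens to have the same shape (one value above the mean, one below, one at it), so every group yields z-scores of about 1.22, -1.22 and 0.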

I would really appreciate help fixing the code above.

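In pandas you can mark the 0 placeholders as missing by replacing them with NaN using replace(), and then impute them with the mean of their group. Finally, you can compute the z-score of the price with the zscore function from scipy.stats; applying it through groupby('Type')['price'].transform(zscore) gives one z-score per property group.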

Here is the code:

import numpy as np
import pandas as pd
from scipy.stats import zscore


df = pd.read_csv('./data.csv')

# Treat 0 as a missing price
df['price'] = df['price'].replace(0, np.nan)
# Fill each missing price with the mean of its own "Type" group only
df['price'] = df.groupby('Type')['price'].transform(lambda x: x.fillna(x.mean()))

# Standardize prices within each "Type" group (zscore uses ddof=0 by default)
df['price_zscore'] = df.groupby('Type')['price'].transform(zscore)
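Note that scipy's `zscore` defaults to `nan_policy='propagate'`: when it is applied to a whole column at once, as in the question's `df[['price']].apply(zscore)`, a single NaN left anywhere (for example from a group whose prices are all 0, so its mean is NaN) makes every output NaN. A minimal sketch, assuming a SciPy version whose `zscore` accepts `nan_policy`:

```python
import numpy as np
from scipy.stats import zscore

prices = np.array([8.0, 4.0, np.nan])

# Default policy: the NaN propagates into the mean and std, so every entry is NaN
propagated = zscore(prices)

# 'omit' computes mean/std from the non-NaN values; the NaN position stays NaN
omitted = zscore(prices, nan_policy="omit")

print(propagated)
print(omitted)
```

With `nan_policy="omit"`, 8 and 4 (mean 6, population std 2) map to 1 and -1 while the NaN entry is left as NaN.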

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM