
How can I fill in the missing values of each group in a DataFrame using Python?

Here is an example of the DataFrame:

df = 

     Name         Type               price 

0    gg         apartment            8   
1    hh         apartment            4
2    tty        apartment            0
3    ttyt       None                 6
4    re         house                6 
5    ew         house                2
6    rr         house                0
7    tr         None                 5
8    mm         None                 0

I converted the unknown (missing) values in "Type" to "NoInfo":

import numpy as np
import pandas as pd
from scipy.stats import zscore

df = pd.read_csv("C:/Users/User/Desktop/properties.csv")

# Label every missing property type as 'NoInfo'
df['Type'] = df['Type'].fillna('NoInfo')
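To reproduce this without the CSV file, here is a minimal sketch that rebuilds the example table in memory. It assumes the missing types are stored as real missing values (None/NaN) rather than the literal string "None", and that the price column is named 'price':

```python
import pandas as pd

# Rebuild the example table in memory instead of reading properties.csv
# (assumption: missing types are real missing values, not the string "None")
df = pd.DataFrame({
    "Name": ["gg", "hh", "tty", "ttyt", "re", "ew", "rr", "tr", "mm"],
    "Type": ["apartment", "apartment", "apartment", None,
             "house", "house", "house", None, None],
    "price": [8, 4, 0, 6, 6, 2, 0, 5, 0],
})

# Label every missing property type as 'NoInfo'
df["Type"] = df["Type"].fillna("NoInfo")
print(df)
```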

After that, the DataFrame looks like this:

df = 
     Name         Type               price 

0    gg         apartment            8   
1    hh         apartment            4
2    tty        apartment            0
3    ttyt       NoInfo               6
4    re         house                6 
5    ew         house                2
6    rr         house                0
7    tr         NoInfo               5
8    mm         NoInfo               0

Next, I tried to replace the 0 values with the average price of each group ("apartment", "house" and "NoInfo") and then compute the z-score of each group:

df['price'] = df['price'].replace(0, np.nan)
df['price'] = pd.to_numeric(df.price, errors='coerce')
df['price'] = df.groupby('Type')['price'].transform(lambda x: x.mean())
df['price_zscore'] = df[['price']].apply(zscore)
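For reference, here is a minimal sketch, on a reduced hypothetical subset of the example data, of what the transform line does: transform(lambda x: x.mean()) broadcasts the group mean to every row of the group, whereas transform(lambda x: x.fillna(x.mean())) keeps the known prices and only fills the missing rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Type": ["apartment", "apartment", "apartment", "house", "house", "house"],
    "price": [8, 4, 0, 6, 2, 0],
})
df["price"] = df["price"].replace(0, np.nan)

# transform(x.mean()) returns the group mean for EVERY row of the group,
# so assigning it back overwrites the known prices too
overwritten = df.groupby("Type")["price"].transform(lambda x: x.mean())

# transform(x.fillna(x.mean())) keeps known prices and only fills the NaN rows
filled = df.groupby("Type")["price"].transform(lambda x: x.fillna(x.mean()))

print(overwritten.tolist())  # [6.0, 6.0, 6.0, 4.0, 4.0, 4.0]
print(filled.tolist())       # [8.0, 4.0, 6.0, 6.0, 2.0, 4.0]
```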

After running this code, every price in each property group is overwritten with that group's mean, and every value in the new column 'price_zscore' is NaN.
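One possible cause of the all-NaN z-scores (an assumption, since the full properties.csv isn't shown): scipy.stats.zscore defaults to nan_policy='propagate', so if any NaN is left in the column, its mean and standard deviation become NaN and every z-score is NaN. The prices array below is hypothetical:

```python
import numpy as np
from scipy.stats import zscore

prices = np.array([8.0, 4.0, np.nan, 6.0])

# Default nan_policy='propagate': a single NaN makes the mean and the
# standard deviation NaN, so every z-score comes out NaN
print(zscore(prices))

# nan_policy='omit' ignores the NaN when computing the mean and std;
# the NaN entry itself stays NaN
print(zscore(prices, nan_policy="omit"))
```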

What I want is the average price of each property group in "Type" (apartment, house), and to replace each 0 in the 'price' column with the average of its own property group.

For example, a 0 in 'price' for the "apartment" group in 'Type' should be replaced with the average price of the "apartment" group, a 0 in the "house" group with the average price of the "house" group, and a 0 in the "NoInfo" group with the average price of the "NoInfo" group. The expected result is:

df =
     Name         Type               price

0    gg         apartment            8   
1    hh         apartment            4
2    tty        apartment            6   # (8+4)/2 = 6
3    ttyt       NoInfo               6
4    re         house                6 
5    ew         house                2
6    rr         house                4  # (6+2)/2 = 4
7    tr         NoInfo               5
8    mm         NoInfo               5.5  # (6+5)/2 = 5.5

After that, I want to compute the z-score within each property group: the z-scores of the "apartment" prices relative to the apartment group, of the "house" prices relative to the house group, and of the "NoInfo" prices relative to the NoInfo group, and store all of them in a single column 'price_zscore'.

I would really appreciate help fixing the code above.

In pandas, you can turn the 0 placeholders into missing values by replacing them with NaN using replace(). Then you can impute them with the group mean. Finally, you can compute the z-score of the price using the function zscore from the stats module of scipy.

Here is the code:

import numpy as np
import pandas as pd
from scipy.stats import zscore


df = pd.read_csv('./data.csv')

# Treat 0 as a missing price
df['price'] = df['price'].replace(0, np.nan)
# Fill each missing price with the mean price of its own 'Type' group
df['price'] = df.groupby('Type')['price'].transform(lambda x: x.fillna(x.mean()))

# z-score of the whole 'price' column (not per group)
df['price_zscore'] = df[['price']].apply(zscore)
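The code above standardizes the whole 'price' column at once. Since the question asks for a z-score within each property group, here is a self-contained sketch that rebuilds the example data in memory (instead of reading data.csv) and uses groupby(...).transform(zscore) for the per-group step. Like the answer, it assumes every 0 in 'price' means "missing".

```python
import numpy as np
import pandas as pd
from scipy.stats import zscore

# Example data from the question, with missing types already labelled 'NoInfo'
df = pd.DataFrame({
    "Name": ["gg", "hh", "tty", "ttyt", "re", "ew", "rr", "tr", "mm"],
    "Type": ["apartment", "apartment", "apartment", "NoInfo",
             "house", "house", "house", "NoInfo", "NoInfo"],
    "price": [8, 4, 0, 6, 6, 2, 0, 5, 0],
})

# Treat 0 as missing, then fill each gap with the mean of its own group
df["price"] = df["price"].replace(0, np.nan)
df["price"] = df.groupby("Type")["price"].transform(lambda x: x.fillna(x.mean()))

# Standardize each price against its own group's mean and population std
df["price_zscore"] = df.groupby("Type")["price"].transform(zscore)
print(df)
```

zscore uses ddof=0 (population standard deviation) by default. A group containing a single price has a standard deviation of 0, so its z-score comes out NaN.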

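Statement: The technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.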

 