How to put F-statistics and P-values into a table?

Question

How can I simplify this code into a for loop and create a table showing the F-statistic and P-value for each feature?

print(scipystats.f_oneway(df_data.loc[df_data["SaleCondition"] == 'Normal'].SalePrice, 
                          df_data.loc[df_data["SaleCondition"] == 'Abnorml'].SalePrice,
                          df_data.loc[df_data["SaleCondition"] == 'Partial'].SalePrice, 
                          df_data.loc[df_data["SaleCondition"] == 'AdjLand'].SalePrice, 
                          df_data.loc[df_data["SaleCondition"] == 'Alloca'].SalePrice, 
                          df_data.loc[df_data["SaleCondition"] == 'Family'].SalePrice))


>>>F_onewayResult(statistic=45.57842830969571, pvalue=7.988268404991176e-44)

print(scipystats.f_oneway(df_data.loc[df_data["Fence"] == 'MnPrv'].SalePrice,
                          df_data.loc[df_data["Fence"] == 'GdWo'].SalePrice,
                          df_data.loc[df_data["Fence"] == 'GdPrv'].SalePrice,
                          df_data.loc[df_data["Fence"] == 'MnWw'].SalePrice))
>>>F_onewayResult(statistic=4.948158647146986, pvalue=0.002312645635631918)
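The two calls above differ only in the column name and its categories, so the list of groups can be built by grouping on the column instead of typing each category by hand. A minimal sketch, assuming a DataFrame shaped like df_data (the small frame below is made up; only the column names come from the question):

```python
import pandas as pd
import scipy.stats

# Hypothetical stand-in for df_data; the values are invented for illustration
df_data = pd.DataFrame({
    "SaleCondition": ["Normal", "Normal", "Abnorml", "Abnorml", "Partial", "Partial"],
    "Fence": ["MnPrv", "GdWo", "MnPrv", "GdWo", "MnPrv", "GdWo"],
    "SalePrice": [200, 210, 150, 160, 300, 320],
})

results = {}
for col in ["SaleCondition", "Fence"]:
    # One array of SalePrice values per category of col (groupby skips NaN keys by default)
    groups = [g["SalePrice"].to_numpy() for _, g in df_data.groupby(col)]
    results[col] = scipy.stats.f_oneway(*groups)
    print(col, results[col])
```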

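How can I create the table and extract the F-statistic and P-value as the inputs for the corresponding columns? And how can I sort the variables so that the one with the highest F-statistic comes first?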
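One way to get that layout, sketched under the assumption that each feature's result has already been collected into a dict (the two entries below are the outputs shown above): build a DataFrame with one row per feature, then sort it with sort_values, where ascending=False puts the largest F-statistic first.

```python
import pandas as pd

# F-statistics and p-values copied from the two f_oneway outputs above
results = {
    "Fence": (4.948158647146986, 0.002312645635631918),
    "SaleCondition": (45.57842830969571, 7.988268404991176e-44),
}

# One row per feature, one column per quantity
table = pd.DataFrame.from_dict(results, orient="index", columns=["F-statistics", "P-value"])
# Largest F-statistic first
table = table.sort_values("F-statistics", ascending=False)
print(table)
```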

(Image link: the table to be created)

Edit: which result is more accurate?

Results from my approach:

               F-statistics        P-value
ExterQual        443.334831  1.439551e-204
KitchenQual      407.806352  3.032213e-192
BsmtQual         392.913506  9.610615e-186
GarageFinish     250.962467   1.199117e-93
MasVnrType       111.672380   4.793331e-65
Foundation       100.253851   5.791895e-91
CentralAir        98.305344   1.809506e-22
HeatingQC         88.394462   2.667062e-67
Neighborhood      71.784865  1.558600e-225
GarageType        71.522123   1.247154e-66
BsmtExposure      70.887984   1.022671e-42
BsmtFinType1      67.602175   1.807731e-63
SaleCondition     45.578428   7.988268e-44
MSZoning          43.840282   8.817634e-35
PavedDrive        42.024179   1.803569e-18
LotShape          40.132852   6.447524e-25
Alley             35.562060   4.899826e-08
SaleType          28.863054   5.039767e-42
FireplaceQu       24.398929   5.016300e-19
Electrical        23.067673   1.663249e-18
HouseStyle        19.595001   3.376777e-25
Exterior1st       18.611743   2.586089e-43
RoofStyle         17.805497   3.653523e-17
Exterior2nd       17.500840   4.842186e-43
BsmtCond          14.030600   5.136901e-09
BldgType          13.011077   2.056736e-10
LandContour       12.850188   2.742217e-08
GarageQual         9.570389   1.240803e-07
GarageCond         9.541161   1.309714e-07
ExterCond          8.798714   5.106681e-07
LotConfig          7.809954   3.163167e-06
RoofMatl           6.727305   7.231445e-08
Condition1         6.118017   8.904549e-08
Fence              4.948159   2.312646e-03
Heating            4.259819   7.534721e-04
Functional         4.057875   4.841697e-04
BsmtFinType2       2.702450   1.941009e-02
Street             2.459290   1.170486e-01
MiscFeature        2.157324   1.047276e-01
Condition2         2.073899   4.342566e-02
LandSlope          1.958817   1.413964e-01
PoolQC             1.627469   3.039853e-01
Utilities          0.298804   5.847168e-01
MSSubClass              NaN            NaN
MoSold                  NaN            NaN
YrSold                  NaN            NaN

Results from @kitman0804's approach:

import pandas as pd
import scipy.stats

def anova(data, x, y):
    # One group per unique value of column x (NaN is not dropped here)
    x_val = data[x].unique()
    # Note: the outcome y is read from df_data, not from data
    fstat = scipy.stats.f_oneway(*[df_data[y][data[x].isin([x_v])] for x_v in x_val])
    tbl = pd.DataFrame({'F-statistics': [fstat.statistic], 'P-value': [fstat.pvalue]})
    tbl.index = [x]
    return tbl

f2_table = pd.concat([anova(categorical_data, x, 'SalePrice') for x in categorical_data.columns])

               F-statistics        P-value
ExterQual        443.334831  1.439551e-204
KitchenQual      407.806352  3.032213e-192
BsmtQual         316.148635  8.158548e-196
GarageFinish     213.867028  6.228747e-115
FireplaceQu      121.075121  2.971217e-107
Foundation       100.253851   5.791895e-91
CentralAir        98.305344   1.809506e-22
HeatingQC         88.394462   2.667062e-67
MasVnrType        84.672201   1.054025e-64
GarageType        80.379992   6.117026e-87
Neighborhood      71.784865  1.558600e-225
BsmtFinType1      64.688200   2.386358e-71
BsmtExposure      63.939761   7.557758e-50
SaleCondition     45.578428   7.988268e-44
MSZoning          43.840282   8.817634e-35
PavedDrive        42.024179   1.803569e-18
LotShape          40.132852   6.447524e-25
MSSubClass        33.732076   8.662166e-79
SaleType          28.863054   5.039767e-42
GarageQual        25.776093   5.388762e-25
GarageCond        25.750153   5.711746e-25
BsmtCond          19.708139   8.195794e-16
HouseStyle        19.595001   3.376777e-25
Exterior1st       18.611743   2.586089e-43
Electrical        18.460192   8.226925e-18
RoofStyle         17.805497   3.653523e-17
Exterior2nd       17.500840   4.842186e-43
Alley             15.176614   2.996380e-07
Fence             13.433276   9.379977e-11
BldgType          13.011077   2.056736e-10
LandContour       12.850188   2.742217e-08
PoolQC            10.509853   7.700989e-07
ExterCond          8.798714   5.106681e-07
LotConfig          7.809954   3.163167e-06
BsmtFinType2       7.565378   5.225649e-08
RoofMatl           6.727305   7.231445e-08
Condition1         6.118017   8.904549e-08
Heating            4.259819   7.534721e-04
Functional         4.057875   4.841697e-04
MiscFeature        2.593622   3.500367e-02
Street             2.459290   1.170486e-01
Condition2         2.073899   4.342566e-02
LandSlope          1.958817   1.413964e-01
MoSold             0.957865   4.833523e-01
YrSold             0.645525   6.300888e-01
Utilities          0.298804   5.847168e-01

Answer 1

The F-statistic and the P-value are stored in the attributes statistic and pvalue, respectively, of the returned <class 'scipy.stats.stats.F_onewayResult'> object.
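For example, with two small made-up samples, either the attributes or plain tuple unpacking give the two numbers that go into the table:

```python
import scipy.stats

# Two small made-up samples
group_a = [1.0, 2.0, 3.0]
group_b = [4.0, 5.0, 6.0]

result = scipy.stats.f_oneway(group_a, group_b)
f_stat = result.statistic   # F-statistic
p_value = result.pvalue     # P-value

# The result can also be unpacked like a tuple
f_stat2, p_value2 = result
print(f_stat, p_value)
```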

You can simply extract those values and build the table from them. Here is a quick example:

import scipy.stats
import pandas as pd

tillamook = [0.0571, 0.0813, 0.0831, 0.0976, 0.0817, 0.0859, 0.0735, 0.0659, 0.0923, 0.0836]
newport = [0.0873, 0.0662, 0.0672, 0.0819, 0.0749, 0.0649, 0.0835, 0.0725]
petersburg = [0.0974, 0.1352, 0.0817, 0.1016, 0.0968, 0.1064, 0.105]
magadan = [0.1033, 0.0915, 0.0781, 0.0685, 0.0677, 0.0697, 0.0764, 0.0689]
tvarminne = [0.0703, 0.1026, 0.0956, 0.0973, 0.1039, 0.1045]

fstat = scipy.stats.f_oneway(tillamook, newport, petersburg, magadan, tvarminne)
tbl = pd.DataFrame({'F-statistics': [fstat.statistic], 'P-value': [fstat.pvalue]})
tbl.index = ['OverallQual']

print(tbl)
#              F-statistics   P-value
# OverallQual      7.121019  0.000281

If you have several F-tests to run, you can wrap this in a function and use a for loop. Here is an example:

df = pd.DataFrame({'a': [0,0,0,1,1,1,2,2,2], 'b': [0,1,1,0,0,1,1,0,0], 'outcome': [1,2,3,4,5,6,7,8,9]})

def anova(data, x, y, drop_nan=True):
    # Unique values in the column
    if drop_nan:
        x_val = data[x].dropna().unique()
    else:
        x_val = data[x].unique()
    # F-test
    fstat = scipy.stats.f_oneway(*[data[y][data[x].isin([x_v])] for x_v in x_val])
    # Tabulate the results
    tbl = pd.DataFrame({'F-statistics': [fstat.statistic], 'P-value': [fstat.pvalue]})
    tbl.index = ['{:}~{:}'.format(y, x)]
    return tbl

f_table = pd.concat([anova(df, x, 'outcome') for x in ['a', 'b']])
print(f_table)

#            F-statistics   P-value
# outcome~a     27.000000  0.001000
# outcome~b      0.216495  0.655852
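The drop_nan argument matters for columns with missing values: with drop_nan=False, data[x].unique() keeps NaN, and pandas isin also matches NaN, so the missing rows enter the test as an extra group. This may account for part of the difference between the two tables in the question, since the adapted function there calls unique() without dropna(), and features that differ include Fence, Alley and PoolQC, which contain missing values in that dataset. A sketch on made-up data (column c is hypothetical; the function is repeated so the block runs on its own):

```python
import numpy as np
import pandas as pd
import scipy.stats

def anova(data, x, y, drop_nan=True):
    # Same function as above
    if drop_nan:
        x_val = data[x].dropna().unique()
    else:
        x_val = data[x].unique()
    fstat = scipy.stats.f_oneway(*[data[y][data[x].isin([x_v])] for x_v in x_val])
    tbl = pd.DataFrame({'F-statistics': [fstat.statistic], 'P-value': [fstat.pvalue]})
    tbl.index = ['{:}~{:}'.format(y, x)]
    return tbl

# Made-up data: column 'c' is missing in the last three rows
df = pd.DataFrame({'c': ['u', 'u', 'u', 'v', 'v', 'v', np.nan, np.nan, np.nan],
                   'outcome': [1, 2, 3, 4, 5, 6, 7, 8, 9]})

with_nan = anova(df, 'c', 'outcome', drop_nan=False)    # NaN rows form a third group
without_nan = anova(df, 'c', 'outcome', drop_nan=True)  # NaN rows are left out
print(with_nan)
print(without_nan)
```

Which variant is "right" depends on whether a missing value should be treated as its own category for that feature.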

Solution 1
0 2019-08-01 06:36:07