ERROR: ValueError: Must pass DataFrame with boolean values only when I try to do a scatter plot using statsmodels

I'm new to Python, and the related questions I read didn't make much sense to me. I want to use Python to do multiple regression and I am trying statsmodels. In this case I want to do a scatter plot.

Sample of my data:

ID  order  V1     V2    E1  E2  E3   M
103  1    ECA    TEXT    7   3   5   7
105  1    ECA    TEXT    3   7   4   5
107  1    ECA    TEXT    7   7   7   4
109  1    ECA    TEXT    6   6   6   3

I want to do a multiple regression with E1-E3 as my IVs and the mean score of M as my DV.

This is how I loaded my data.

myRegressionData = pd.read_csv('C:/Users/user/Desktop/Folder 1/Python/Regression data file.csv')

These are my x and y:

X_sk = myRegressionData[[col for col in myRegressionData.columns if col[:8] == 'E']]

Y = myRegressionData[['M{}'.format(ii) for ii in range(1, 19)]]
y = np.mean(Y, axis=1)

and this is the code where I get the error:

myRegressionData.plot(kind='scatter',x = X_sk, y=np.mean(Y, axis=1))

returns

ValueError: Must pass DataFrame with boolean values only

myRegressionData.info() 

returns

RangeIndex: 90 entries, 0 to 89
Columns: 146 entries, IDOpenEndedResponse to EngagingAA
dtypes: float64(10), int64(134), object(2)
memory usage: 102.7+ KB

In the following:

myRegressionData.plot(kind='scatter',x = X_sk, y=np.mean(Y, axis=1))

x and y expect column names or indices, but X_sk and np.mean(Y, axis=1) are data. Supply the column names, or call your plotting library directly.


Example where we use matplotlib:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Small stand-in for the real data: 'a*' columns play the role of M, 'b*' the role of E
myRegressionData = pd.DataFrame([
    {'a0': 4, 'a1': 3, 'b0': 2, 'b1': 1},
    {'a0': 3, 'a1': 1, 'b0': 4, 'b1': 1},
    {'a0': 1, 'a1': 2, 'b0': 3, 'b1': 1}
])

# Select the X columns by prefix and the Y columns by name
X_sk = myRegressionData[[col for col in myRegressionData.columns if col[:1] == 'b']]
Y = myRegressionData[['a{}'.format(ii) for ii in range(0,2)]]
# plt.scatter takes the data itself: one X column against the row-wise mean of Y
plt.scatter(X_sk['b0'], np.mean(Y, axis=1))
plt.show()

The example should be a simplified version of what you're doing.


If you insist on using the pandas DataFrame plotter, you can do something like this:

# Wrap the row-wise mean in a one-column DataFrame named 'y'
y = pd.DataFrame(np.mean(Y, axis=1), columns=['y'])
# Put X and y in one frame so that both can be referenced by column name
df = pd.concat([X_sk, y], axis=1)
df.plot(kind='scatter', x='b0', y='y')

If you have many X columns but only one y, you can plot them together and tell them apart by color:

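X_sk = myRegressionData[[col for col in myRegressionData.columns if col[:1] == 'b']]
Y = myRegressionData[['a{}'.format(ii) for ii in range(0,2)]]
y = np.mean(Y, axis=1)
# Draw one series per X column against the same y, each in its own color,
# so that every x value stays paired with the y value from its own row
for col, color in zip(X_sk.columns, ['b', 'r']):
    plt.scatter(X_sk[col], y, c=color, label=col)
plt.legend()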

Final alternative using scatter_matrix:

from pandas.plotting import scatter_matrix

y = pd.DataFrame(np.mean(Y, axis=1), columns=['y'])
df = pd.concat([X_sk, y], axis=1)
# Grid of pairwise scatter plots for every column in df
scatter_matrix(df, alpha=0.2, figsize=(6, 6))
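
Applied to column names like those in the question, the column-name approach might look like the sketch below. It assumes predictors whose names start with E and several outcome columns whose names start with M; the M2 column, the M_mean column name, and the M values are invented for illustration and are not taken from the asker's file.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative data: E values copy the sample rows, M1/M2 are hypothetical.
data = pd.DataFrame({
    "E1": [7, 3, 7, 6],
    "E2": [3, 7, 7, 6],
    "E3": [5, 4, 7, 6],
    "M1": [7, 5, 4, 3],
    "M2": [5, 3, 4, 1],
})

predictors = [col for col in data.columns if col.startswith("E")]
outcomes = [col for col in data.columns if col.startswith("M")]

# Store the row-wise mean as a real column so it can be referenced by name.
data["M_mean"] = data[outcomes].mean(axis=1)

# One scatter panel per predictor; x and y are column names, not data.
fig, axes = plt.subplots(1, len(predictors), figsize=(9, 3), sharey=True)
for ax, col in zip(axes, predictors):
    data.plot(kind="scatter", x=col, y="M_mean", ax=ax)
fig.tight_layout()
```

In an interactive session, drop the matplotlib.use("Agg") line and call plt.show() at the end to see the panels.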
