How do I create a scatterplot matrix of a sliced CSV file?

I'm pretty new to Python and pandas. I'm trying to learn machine learning, mostly as a hobby. This is what I have so far.

I can't figure out how to stop the code and put the error I get.

With this code I keep getting this error. How can I fix it?

I'm working through the exercises in Introduction to Statistical Learning, but I'm using Python instead of R, if that helps.

  File "college.py", line 12, in <module>
    pd.plotting.scatter_matrix(data1)
  File "/Users//Library/Python/3.7/lib/python/site-packages/pandas/plotting/_misc.py", line 134, in scatter_matrix
    **kwargs,
  File "/Users//Library/Python/3.7/lib/python/site-packages/pandas/plotting/_matplotlib/misc.py", line 30, in scatter_matrix
    fig, axes = _subplots(naxes=naxes, figsize=figsize, ax=ax, squeeze=False)
  File "/Users//Library/Python/3.7/lib/python/site-packages/pandas/plotting/_matplotlib/tools.py", line 231, in _subplots
    ax0 = fig.add_subplot(nrows, ncols, 1, **subplot_kw)
  File "/Users//Library/Python/3.7/lib/python/site-packages/matplotlib/figure.py", line 1414, in add_subplot
    a = subplot_class_factory(projection_class)(self, *args, **kwargs)
  File "/Users//Library/Python/3.7/lib/python/site-packages/matplotlib/axes/_subplots.py", line 59, in __init__
    f"num must be 1 <= num <= {rows*cols}, not {num}")
ValueError: num must be 1 <= num <= 0, not 1

import matplotlib.pyplot as plt 
import pandas as pd 

data = pd.read_csv('college.csv', index_col = 0) 

# Summarize Dataset
print(data.describe())

# Plot first 10 columns into scatterplot matrix 
data1 = data.iloc[0:10]
pd.plotting.scatter_matrix(data1)

You can use the kind argument of the pandas plot function:

data1.plot(kind='scatter',x='x_var',y='y_var')

It is hard to answer this without seeing your full code. However, here is a simple example using similar data that might help.

import matplotlib.pyplot as plt
import pandas as pd

# Load the data, using the first CSV column as the row index
data = pd.read_csv('college.csv', index_col=0)

# Keep only the first 10 rows and inspect the result
data = data[:10]
print(data)
print(data.index)
print(data.columns)

# Build a frame restricted to the columns of interest
columns = ['Private', 'Apps', 'Accept', 'Enroll', 'Top10perc', 'Top25perc',
           'F.Undergrad', 'P.Undergrad', 'Outstate', 'Room.Board', 'Books',
           'Personal', 'PhD', 'Terminal', 'S.F.Ratio', 'perc.alumni', 'Expend',
           'Grad.Rate']
dataFrame = pd.DataFrame(data=data, columns=columns)

# Scatter plot of one pair of columns
dataFrame.plot.scatter(x='Apps', y='Expend',
                       title="Scatter plot between two columns of a multi-column DataFrame")

# Block until the plot window is closed
plt.show(block=True)

Link to fully worked example
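
For the scatterplot matrix itself, two things in the question are worth checking. First, data.iloc[0:10] selects the first 10 rows, not the first 10 columns; column slicing by position is data.iloc[:, 0:10]. Second, scatter_matrix draws an n-by-n grid from the numeric columns only, so "1 <= num <= 0" suggests it found no numeric columns in the frame it was given. Below is a minimal sketch of both points, not a diagnosis of the asker's file: it uses a small made-up DataFrame as a stand-in for college.csv (the values are random and only the column names are borrowed), selects the Agg backend so it runs without a display, and writes to a hypothetical output file name.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for college.csv: one text column plus four numeric ones.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(50, 4)),
                    columns=["Apps", "Accept", "Enroll", "Expend"])
data.insert(0, "Private", ["Yes", "No"] * 25)

rows = data.iloc[0:10]     # first 10 rows, all columns
cols = data.iloc[:, 0:3]   # all rows, first 3 columns

# scatter_matrix only plots numeric columns, so check that some exist first.
numeric = cols.select_dtypes(include="number")
if numeric.shape[1] == 0:
    raise ValueError("no numeric columns to plot; check data.dtypes")

axes = pd.plotting.scatter_matrix(numeric, figsize=(6, 6))
plt.savefig("scatter_matrix.png")  # hypothetical output file name
```

With the real file, printing data.dtypes right after read_csv should show whether the columns were parsed as numbers; in an interactive session, plt.show() can replace the savefig call.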
