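Selecting columns for scatter plot from csv
I'm trying to make a scatter matrix from a large csv. The scatter matrix works fine on a smaller file, but when I apply it to the larger file I can't get the right columns in the output: it only returns one of the 2 scatter plots, and when I try to increase the column number where the selection starts, it raises this error: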
Traceback (most recent call last):
File "C:\Users\Kinkerman\Downloads\Uni\Data Analytics\Python\Python files wk 5\scatter_matrix2.py", line 8, in <module>
scatter_matrix(data.loc[:, "V5":"V8"], diagonal="kde")
File "C:\Users\Kinkerman\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\plotting\_misc.py", line 128, in scatter_matrix
return plot_backend.scatter_matrix(
File "C:\Users\Kinkerman\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\plotting\_matplotlib\misc.py", line 50, in scatter_matrix
fig, axes = create_subplots(naxes=naxes, figsize=figsize, ax=ax, squeeze=False)
File "C:\Users\Kinkerman\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\plotting\_matplotlib\tools.py", line 267, in create_subplots
ax0 = fig.add_subplot(nrows, ncols, 1, **subplot_kw)
File "C:\Users\Kinkerman\AppData\Local\Programs\Python\Python310\lib\site-packages\matplotlib\figure.py", line 772, in add_subplot
ax = subplot_class_factory(projection_class)(self, *args, **pkw)
File "C:\Users\Kinkerman\AppData\Local\Programs\Python\Python310\lib\site-packages\matplotlib\axes\_subplots.py", line 36, in __init__
self.set_subplotspec(SubplotSpec._from_subplot_args(fig, args))
File "C:\Users\Kinkerman\AppData\Local\Programs\Python\Python310\lib\site-packages\matplotlib\gridspec.py", line 597, in _from_subplot_args
gs = GridSpec._check_gridspec_exists(figure, rows, cols)
File "C:\Users\Kinkerman\AppData\Local\Programs\Python\Python310\lib\site-packages\matplotlib\gridspec.py", line 225, in _check_gridspec_exists
return GridSpec(nrows, ncols, figure=figure)
File "C:\Users\Kinkerman\AppData\Local\Programs\Python\Python310\lib\site-packages\matplotlib\gridspec.py", line 385, in __init__
super().__init__(nrows, ncols,
File "C:\Users\Kinkerman\AppData\Local\Programs\Python\Python310\lib\site-packages\matplotlib\gridspec.py", line 52, in __init__
raise ValueError(
ValueError: Number of columns must be a positive integer, not 0
This is my code so far:
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix
#read from file and rename columns "Vi"
data = pd.read_csv('principal_offence_category_april_2014.csv')
data.columns = ["V"+str(i) for i in range(1, len(data.columns)+1)]
#select columns and plot in the scatter matrix
scatter_matrix(data.loc[:, "V5":"V8"], diagonal="kde")
plt.tight_layout()
plt.show()
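This is the file I'm using >>> https://drive.google.com/file/d/1X6PN_EtMpspazVxI6d3h3J08NWQIb11Y/view?usp=sharing
You can use the thousands parameter of read_csv to specify that "," is used as the thousands separator, and use na_values to convert "-" entries to NaN values. Then you need a way to convert the "%" entries to floats. For example: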
data['V5'] = data['V5'].str.rstrip('%').astype('float')
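As a minimal sketch of what those two read_csv parameters do, the snippet below parses a small made-up CSV held in memory. The column names (area, count, share) and the values are assumptions for illustration only and are not taken from the real file.

```python
import io
import pandas as pd

# Made-up sample that mimics the formatting described above (not the real file):
# a quoted number with a thousands separator, "-" for missing values, "%" suffixes.
sample = io.StringIO(
    'area,count,share\n'
    'A,"1,234",12.5%\n'
    'B,-,-\n'
    'C,567,3.0%\n'
)

# thousands=',' lets "1,234" parse as a number; na_values=['-'] turns "-" into NaN.
df = pd.read_csv(sample, thousands=',', na_values=['-'])

# The percentage column is still text, so strip the "%" and convert to float.
df['share'] = df['share'].str.rstrip('%').astype('float')

print(df.dtypes)
print(df)
```

Printing the dtypes is a quick way to confirm that the columns you intend to plot are numeric before passing them to scatter_matrix.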
The same rstrip('%') conversion can be applied to several percentage columns:
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix
#read from file and rename columns "Vi"
data = pd.read_csv('principal_offence_category_april_2014.csv', thousands=',', na_values=['-'])
data.columns = [f"V{i}" for i in range(1, len(data.columns)+1)]
cols = [f'V{col}' for col in range(3, 12, 2)] # e.g. percentage columns V3, V5, V7, V9, V11
for col in cols:
data[col] = data[col].str.rstrip('%').astype('float')
#select columns and plot in the scatter matrix
scatter_matrix(data[cols], diagonal="kde")
plt.tight_layout()
plt.show()
For your file, this produces a scatter matrix of the selected percentage columns.
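The ValueError in the traceback is consistent with scatter_matrix finding no numeric columns in the selection: columns that still hold strings such as "12.5%" are not plotted, which leaves a grid of 0 columns. One way to check this before plotting is sketched below. The helper name plottable_columns and the sample data are hypothetical and only for illustration.

```python
import pandas as pd


def plottable_columns(frame: pd.DataFrame) -> list:
    """Return the names of the numeric columns in frame (hypothetical helper)."""
    return list(frame.select_dtypes(include="number").columns)


# Made-up data: both columns hold strings, as they would without any conversion.
raw = pd.DataFrame({"V5": ["12.5%", "3.0%"], "V6": ["7.5%", "1.0%"]})
print(plottable_columns(raw))  # empty list: nothing numeric to plot

# After stripping "%" and casting to float, the columns become plottable.
converted = raw.apply(lambda s: s.str.rstrip("%").astype("float"))
print(plottable_columns(converted))
```

If the list comes back empty for the slice you pass to scatter_matrix, the columns need converting first, as in the answer's code.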