
Selecting columns for scatter plot from csv

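I am trying to make a scatter matrix from a large csv. The scatter matrix works fine on a smaller file, but when I apply it to the larger file I cannot get the correct columns in the output: it only returns 1 of the 2 scatter plots, and when I try to increase the column number to start selecting from, it throws this error: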

    Traceback (most recent call last):
  File "C:\Users\Kinkerman\Downloads\Uni\Data Analytics\Python\Python files wk 5\scatter_matrix2.py", line 8, in <module>
    scatter_matrix(data.loc[:, "V5":"V8"], diagonal="kde")
  File "C:\Users\Kinkerman\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\plotting\_misc.py", line 128, in scatter_matrix
    return plot_backend.scatter_matrix(
  File "C:\Users\Kinkerman\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\plotting\_matplotlib\misc.py", line 50, in scatter_matrix
    fig, axes = create_subplots(naxes=naxes, figsize=figsize, ax=ax, squeeze=False)
  File "C:\Users\Kinkerman\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\plotting\_matplotlib\tools.py", line 267, in create_subplots
    ax0 = fig.add_subplot(nrows, ncols, 1, **subplot_kw)
  File "C:\Users\Kinkerman\AppData\Local\Programs\Python\Python310\lib\site-packages\matplotlib\figure.py", line 772, in add_subplot
    ax = subplot_class_factory(projection_class)(self, *args, **pkw)
  File "C:\Users\Kinkerman\AppData\Local\Programs\Python\Python310\lib\site-packages\matplotlib\axes\_subplots.py", line 36, in __init__
    self.set_subplotspec(SubplotSpec._from_subplot_args(fig, args))
  File "C:\Users\Kinkerman\AppData\Local\Programs\Python\Python310\lib\site-packages\matplotlib\gridspec.py", line 597, in _from_subplot_args
    gs = GridSpec._check_gridspec_exists(figure, rows, cols)
  File "C:\Users\Kinkerman\AppData\Local\Programs\Python\Python310\lib\site-packages\matplotlib\gridspec.py", line 225, in _check_gridspec_exists
    return GridSpec(nrows, ncols, figure=figure)
  File "C:\Users\Kinkerman\AppData\Local\Programs\Python\Python310\lib\site-packages\matplotlib\gridspec.py", line 385, in __init__
    super().__init__(nrows, ncols,
  File "C:\Users\Kinkerman\AppData\Local\Programs\Python\Python310\lib\site-packages\matplotlib\gridspec.py", line 52, in __init__
    raise ValueError(
    ValueError: Number of columns must be a positive integer, not 0

Here is my code so far:

    import pandas as pd
    import matplotlib.pyplot as plt
    from pandas.plotting import scatter_matrix

    #read from file and rename columns "Vi"
    data = pd.read_csv('principal_offence_category_april_2014.csv')
    data.columns = ["V"+str(i) for i in range(1, len(data.columns)+1)]

    #select columns and plot in the scatter matrix
    scatter_matrix(data.loc[:, "V5":"V8"], diagonal="kde")
    plt.tight_layout()
    plt.show()

This is the file I am using: https://drive.google.com/file/d/1X6PN_EtMpspazVxI6d3h3J08NWQIb11Y/view?usp=sharing

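You can use the thousands parameter of read_csv to specify that "," is used as the thousands separator, and na_values to convert "-" entries to NaN values. Then you need a way to convert the "%" entries to floats. For example: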

    data['V5'] = data['V5'].str.rstrip('%').astype('float')
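To see these parsing steps in isolation, here is a minimal sketch on a tiny inline CSV. The column names (`Area`, `Count`, `Share`) and the values are made up for illustration and are not taken from the real file.

```python
import io

import pandas as pd

# Hypothetical sample data (assumption): invented rows that only mimic
# the kinds of entries described above ("1,234", "-", "12.5%").
csv_text = (
    'Area,Count,Share\n'
    'A,"1,234",12.5%\n'
    'B,-,-\n'
    'C,56,3.0%\n'
)

# thousands="," lets "1,234" parse as a number; na_values=["-"] turns "-" into NaN.
sample = pd.read_csv(io.StringIO(csv_text), thousands=",", na_values=["-"])

# "Share" is still text at this point, so strip the "%" and convert to float.
sample["Share"] = sample["Share"].str.rstrip("%").astype("float")

print(sample.dtypes)
print(sample)
```

After these steps both `Count` and `Share` should be numeric, with NaN where the file had "-".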

The same conversion can be applied to several of the percentage columns:

    import pandas as pd
    import matplotlib.pyplot as plt
    from pandas.plotting import scatter_matrix

    # read from file and rename columns "Vi"
    data = pd.read_csv('principal_offence_category_april_2014.csv', thousands=',', na_values=['-'])
    data.columns = [f"V{i}" for i in range(1, len(data.columns)+1)]
    cols = [f'V{col}' for col in range(3, 12, 2)]  # e.g. percentage columns V3, V5, V7, V9, V11

    # strip the trailing "%" and convert each percentage column to float
    for col in cols:
        data[col] = data[col].str.rstrip('%').astype('float')

    # select columns and plot in the scatter matrix
    scatter_matrix(data[cols], diagonal="kde")
    plt.tight_layout()
    plt.show()

For your file, this gives the following output:

(image: scatter matrix)
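As far as I can tell, the likely reason for the original ValueError is that scatter_matrix only plots numeric columns: if every selected column was read as text (because of the "%", "," or "-" entries), nothing is left to plot and matplotlib is asked for a grid with 0 columns. The sketch below checks this on a small made-up frame, without plotting. The values are hypothetical, and `pd.to_numeric(..., errors="coerce")` is a swapped-in alternative to the `astype('float')` call used above.

```python
import pandas as pd

# Hypothetical frame (assumption): mimics columns that were read as text.
frame = pd.DataFrame({
    "V5": ["12.5%", "3.0%", "7.1%"],
    "V6": ["1,234", "56", "-"],
})

# Count how many columns pandas considers numeric before any cleaning.
numeric_before = frame.select_dtypes(include="number")
print("numeric columns before cleaning:", numeric_before.shape[1])

# Clean each column; errors="coerce" turns anything unparseable ("-") into NaN.
cleaned = pd.DataFrame({
    "V5": pd.to_numeric(frame["V5"].str.rstrip("%"), errors="coerce"),
    "V6": pd.to_numeric(frame["V6"].str.replace(",", "", regex=False), errors="coerce"),
})

numeric_after = cleaned.select_dtypes(include="number")
print("numeric columns after cleaning:", numeric_after.shape[1])
```

Running the same `select_dtypes(include="number")` check on `data.loc[:, "V5":"V8"]` before calling scatter_matrix is a quick way to confirm whether the selection actually contains numeric columns.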

