CSV file error in Python

Question

When I run my Python code, I get the following error:

df = pd.DataFrame(desm)
scaler = StandardScaler()
scaler.fit(df)


ValueError                                Traceback (most recent call last)
<ipython-input-32-266a989a8af0> in <module>()
      1 scaler = StandardScaler()
----> 2 scaler.fit(df)

C:\Users\VILLAFAÑE\Anaconda3\lib\site-packages\sklearn\preprocessing\data.py in fit(self, X, y)
    555         # Reset internal state before fitting
    556         self._reset()
--> 557         return self.partial_fit(X, y)
    558 
    559     def partial_fit(self, X, y=None):

C:\Users\VILLAFAÑE\Anaconda3\lib\site-packages\sklearn\preprocessing\data.py in partial_fit(self, X, y)
    578         X = check_array(X, accept_sparse=('csr', 'csc'), copy=self.copy,
    579                         ensure_2d=False, warn_on_dtype=True,
--> 580                         estimator=self, dtype=FLOAT_DTYPES)
    581 
    582         if X.ndim == 1:

C:\Users\VILLAFAÑE\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    371                                       force_all_finite)
    372     else:
--> 373         array = np.array(array, dtype=dtype, order=order, copy=copy)
    374 
    375         if ensure_2d:

ValueError: could not convert string to float: 'PMP'

My Python code is:

import pandas as pd
desm = pd.read_csv("G:/BASES DE DATOS/desm4.csv")

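I know this has something to do with the contents of the CSV file, but I don't know how to fix it. Please help! Here is a link to the CSV file for more information: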

https://drive.google.com/file/d/0B7tO-O0lx79FSnR0cVA3MDhrTG8/view?usp=sharing

Answer 1

You are trying to scale a dataset whose first column contains strings rather than floats.
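To confirm which column is responsible, one can inspect the column dtypes before scaling. The sketch below uses a small made-up CSV (the column names site, temp and ph and all values are assumptions, not the contents of desm4.csv) and reproduces the conversion failure with NumPy, which is the call that fails at the bottom of the traceback.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for desm4.csv: one string column, two numeric ones.
csv_text = "site,temp,ph\nPMP,21.5,7.1\nABC,19.0,6.8\nXYZ,23.2,7.4\n"

df = pd.read_csv(io.StringIO(csv_text))

# Columns that pandas could not parse as numbers have a non-numeric dtype.
non_numeric = df.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)

# Converting the whole frame to float fails on the string values.
try:
    np.array(df, dtype=float)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

Any column listed in non_numeric has to be moved into the index, dropped, or encoded before the frame is passed to a scaler.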

You need to read the DataFrame as follows:

import pandas as pd
from sklearn.preprocessing import StandardScaler
df = pd.read_csv('desm4.csv',index_col=0)
scaler = StandardScaler()
scaler.fit(df)

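Try the code above and let me know if you run into any problems. It uses the site column as the index (the row ID for each row). StandardScaler is not applied to the index, so no error is raised.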
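A minimal sketch of why this works, on a made-up CSV (the column names and values are assumptions). Once site is the index, every remaining column is numeric. The standardization is written out by hand as (x - mean) / std with the population standard deviation, which is the quantity StandardScaler computes, so the sketch depends only on pandas.

```python
import io

import pandas as pd

# Hypothetical stand-in for desm4.csv.
csv_text = "site,temp,ph\nPMP,21.5,7.1\nABC,19.0,6.8\nXYZ,23.2,7.4\n"

# index_col=0 turns the first column (site) into row labels, not data.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(df.index.tolist())
print(df.dtypes)

# Column-wise standardization; ddof=0 gives the population standard deviation.
scaled = (df - df.mean()) / df.std(ddof=0)
print(scaled.round(3))
```

Each column of scaled has mean 0 and population standard deviation 1, and the site labels are carried along untouched in the index.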

Also, you don't need to do

df = pd.DataFrame(desm)

pd.read_csv reads a CSV file and already returns a DataFrame.
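A quick check of that point, using an in-memory CSV with made-up contents in place of the real file:

```python
import io

import pandas as pd

# Hypothetical two-row CSV; the real file is not needed for this check.
csv_text = "site,temp\nPMP,21.5\nABC,19.0\n"

desm = pd.read_csv(io.StringIO(csv_text))

# read_csv already returns a DataFrame ...
print(type(desm))

# ... so wrapping it in pd.DataFrame() yields an equal frame and adds nothing.
df = pd.DataFrame(desm)
print(df.equals(desm))
```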

Answer 2

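Your problem comes from the index_col parameter of read_csv(), which defaults to None. With that default, pandas normally does not use any column as the index, so the string-valued first column stays in the DataFrame as ordinary data and StandardScaler tries to convert it to float. The documentation describes the parameter as follows:

index_col : int or sequence or False, default None

Column to use as the row labels of the DataFrame.
If a sequence is given, a MultiIndex is used.
If you have a malformed file with delimiters at the end of each line,
you might consider index_col=False to force pandas to _not_ use the first
column as the index (row names)

So try passing index_col explicitly, so that the string column becomes the row labels:

import pandas as pd
desm = pd.read_csv("G:/BASES DE DATOS/desm4.csv", index_col=0)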

I hope this works. Let me know if there are any problems. Happy coding. Cheers!
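For comparison, the sketch below reads the same made-up CSV (column names and values are assumptions) with three index_col settings. For a well-formed file like this one, None and False both leave site as an ordinary data column, and only index_col=0 moves it into the index.

```python
import io

import pandas as pd

# Hypothetical stand-in for desm4.csv.
csv_text = "site,temp,ph\nPMP,21.5,7.1\nABC,19.0,6.8\n"

# Collect (setting, column names, index labels) for each index_col value.
results = []
for index_col in (None, False, 0):
    df = pd.read_csv(io.StringIO(csv_text), index_col=index_col)
    results.append((repr(index_col), list(df.columns), df.index.tolist()))

for setting, columns, index in results:
    print(f"index_col={setting}: columns={columns}, index={index}")
```

The results are kept in a list rather than a dict keyed by the setting, because False and 0 compare equal in Python and would collide as dictionary keys.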

2 solutions

Solution 1
1 2017-06-01 06:22:55

Solution 2
0 Accepted 2017-06-01 06:18:46
