ValueError: could not convert string to float: 'A1' using np.loadtxt

I have a program that needs to process a CSV file and convert it into a dataset. The example I am working with comes from the popular Python tutorial that uses the iris data set. I am trying to replace datasets.load_iris() with code that reads the CSV file 'A1-dm.csv'.

Expected: The program processes the CSV and loads the data.

Actual:

Traceback (most recent call last):
  File ".\example.py", line 38, in <module>
    main()
  File ".\example.py", line 11, in main
    dataset = np.loadtxt(fname = 'A1-dm.csv', delimiter = ',')
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\numpy\lib\npyio.py", line 1134, in loadtxt
    for x in read_data(_loadtxt_chunksize):
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\numpy\lib\npyio.py", line 1061, in read_data
    items = [conv(val) for (conv, val) in zip(converters, vals)]
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\numpy\lib\npyio.py", line 1061, in <listcomp>
    items = [conv(val) for (conv, val) in zip(converters, vals)]
  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\lib\site-packages\numpy\lib\npyio.py", line 768, in floatconv
    return float(x)
ValueError: could not convert string to float: 'A1'
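The failure can be reproduced without the file. By default np.loadtxt converts every token to float, and the first token it meets is the header text A1. The sketch below is a minimal reproduction under one assumption: the first lines of the sample CSV shown later in the question are embedded with io.StringIO instead of reading A1-dm.csv from disk.

```python
import io

import numpy as np

# First lines of the sample CSV from the question, held in memory
# so the example runs without A1-dm.csv on disk (assumption for illustration).
CSV_TEXT = """A1,A2,A3,Class
2,0.4631338,1.5,3
8,0.7460648,3.0,3
"""


def load_without_skipping_header():
    # Same call pattern as the question: no skiprows, so the header is parsed as data.
    return np.loadtxt(io.StringIO(CSV_TEXT), delimiter=",")


try:
    load_without_skipping_header()
except ValueError as exc:
    # The exact wording differs between NumPy versions, but 'A1' is the token that fails.
    print("ValueError:", exc)
```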

The code for this implementation is

import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from MDLP import MDLP_Discretizer

def main():

    ######### USE-CASE EXAMPLE #############

    #read dataset
    dataset = np.loadtxt(fname = 'A1-dm.csv', delimiter = ',')
    X, y = dataset['A1'], dataset['Class']
    # feature_names, class_names = dataset['feature_names'], dataset['target_names']
    # numeric_features = np.arange(X.shape[1])  # all features in this dataset are numeric. These will be discretized

    # #Split between training and test
    # X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)

    # #Initialize discretizer object and fit to training data
    # discretizer = MDLP_Discretizer(features=numeric_features)
    # discretizer.fit(X_train, y_train)
    # X_train_discretized = discretizer.transform(X_train)

    # #apply same discretization to test set
    # X_test_discretized = discretizer.transform(X_test)

    # #Print a slice of original and discretized data
    # print('Original dataset:\n%s' % str(X_train[0:5]))
    # print('Discretized dataset:\n%s' % str(X_train_discretized[0:5]))

    # #see how feature 0 was discretized
    # print('Feature: %s' % feature_names[0])
    # print('Interval cut-points: %s' % str(discretizer._cuts[0]))
    # print('Bin descriptions: %s' % str(discretizer._bin_descriptions[0]))

if __name__ == '__main__':
    main()
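Even once the header is skipped, dataset['A1'] would not work: np.loadtxt returns a plain 2-D float array that is indexed by position, not by column name. One way to keep name-based access is np.genfromtxt with names=True, which reads the header line into the field names of a structured array. This is a swapped-in approach rather than the question's own code. The sketch assumes the four-column layout of the sample (A1, A2, A3 as features, Class as the label) and uses an in-memory copy of a few sample rows.

```python
import io

import numpy as np

# A few rows of the question's sample CSV, kept in memory for the example.
CSV_TEXT = """A1,A2,A3,Class
2,0.4631338,1.5,3
8,0.7460648,3.0,3
6,0.264391038,2.5,2
"""


def load_named(source):
    # names=True consumes the header line and uses it as field names,
    # so the returned structured array can be indexed as data['A1'].
    return np.genfromtxt(source, delimiter=",", names=True)


data = load_named(io.StringIO(CSV_TEXT))
feature_names = ["A1", "A2", "A3"]  # assumed: every column except Class is a feature
# Stack the named fields into an (n_samples, n_features) float matrix, like iris.data.
X = np.column_stack([data[name] for name in feature_names])
y = data["Class"].astype(int)
print(X.shape, y)
```

With the real file, the call would be load_named('A1-dm.csv'); load_named is a hypothetical helper name used only for this sketch.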

A sample of the csv is

A1,A2,A3,Class
2,0.4631338,1.5,3
8,0.7460648,3.0,3
6,0.264391038,2.5,2
5,0.4406713,2.3,1
2,0.410438159,1.5,3
2,0.302901816,1.5,2
6,0.275869396,2.5,3
8,0.084782428,3.0,3
2,0.53226533,1.5,2

Any help processing this would be greatly appreciated. Thank you.

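The first line of your CSV file is a header containing text (the column names A1,A2,A3,Class). np.loadtxt tries to convert every value to float, so 'A1' fails. Skip that line, for example with skiprows=1, so that only the numeric rows are converted.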
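A minimal sketch of that fix, assuming (as in the sample) that the last column is Class and all other columns are features. The data is an in-memory copy of some sample rows, and load_features_and_labels is a hypothetical helper name.

```python
import io

import numpy as np

# Some rows of the question's sample CSV, kept in memory for the example.
CSV_TEXT = """A1,A2,A3,Class
2,0.4631338,1.5,3
8,0.7460648,3.0,3
6,0.264391038,2.5,2
5,0.4406713,2.3,1
"""


def load_features_and_labels(source):
    # skiprows=1 drops the 'A1,A2,A3,Class' header before float conversion.
    data = np.loadtxt(source, delimiter=",", skiprows=1)
    X = data[:, :-1]              # all columns except the last: A1, A2, A3
    y = data[:, -1].astype(int)   # last column: Class labels
    return X, y


X, y = load_features_and_labels(io.StringIO(CSV_TEXT))
numeric_features = np.arange(X.shape[1])  # as in the question's commented-out code
print(X.shape, y, numeric_features)
```

For the real file, pass 'A1-dm.csv' instead of the StringIO object. Splitting by position keeps X two-dimensional, which is what np.arange(X.shape[1]) and the later train_test_split call in the question expect.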

Please check this out: numpy loadtxt skip first row
