Read CSV file with semicolon as delimiter

Question

I have a NumPy array of shape (4898,) in which the values of each row are separated by semicolons but are stored in a single column rather than in multiple columns (the desired outcome). How do I split each row at every semicolon? I have written the following Python script to do so, but it throws an error.

stochastic_gradient_descent_winequality.py

import numpy
import pandas

if __name__ == '__main__' :

    with open('winequality-white.csv', 'r') as f_0 :
        with open('winequality-white-updated.csv', 'w') as f_1 :
            next(f_0)  # skip the header line; works in Python 2.6+ and Python 3
            for line in f_0 :
                f_1.write(line)


    wine_data = pandas.read_csv('winequality-white-updated.csv', sep = ',', header = None)
    wine_data_ = wine_data
    wine_data = numpy.array([x.split(';') for x in wine_data_], dtype = numpy.float)

    print (numpy.shape(wine_data))

Errors

Traceback (most recent call last):
  File "stochastic_gradient_descent_winequality.py", line 16, in <module>
    wine_data = numpy.array([x.split(';') for x in wine_data_], dtype = numpy.float)
AttributeError: 'numpy.int64' object has no attribute 'split'

Answer 1

If you're using semicolons (;) as your CSV separator instead of commas (,), you can adjust that first line:

wine_data = pandas.read_csv('winequality-white-updated.csv', sep = ';', header = None)

The problem with your list comprehension is that [x.split(';') for x in wine_data_] iterates over the column names, not the rows. With header = None the column names are integers, which have no split() method.

That being the case, you have no need for the line with the list comprehension. You can read in your data and be done.

wine_data = pandas.read_csv('winequality-white-updated.csv', sep = ';', header = None)
print (numpy.shape(wine_data))
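
To make the effect of the separator concrete, here is a minimal sketch that reads the same text twice. It assumes a made-up three-row sample held in memory instead of the real winequality-white-updated.csv; the names sample, one_column and split_columns are only for illustration.

```python
import io

import pandas

# Hypothetical sample standing in for the real file (values are made up).
sample = "7.0;0.27;6\n6.3;0.30;6\n8.1;0.28;6\n"

# With sep=',' no separator is found, so each line ends up in one column.
one_column = pandas.read_csv(io.StringIO(sample), sep=',', header=None)

# With sep=';' every field gets its own column.
split_columns = pandas.read_csv(io.StringIO(sample), sep=';', header=None)

print(one_column.shape)
print(split_columns.shape)
```

The first shape has a single column and the second has three, which is the difference between the array in the question and the desired outcome.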

Answer 2

Suppose your csv file is like this:

2.12;5.12;3.12
3.1233;4;2
4;4.9696;3
2;5.0344;3
3.59595;4;2
4;4;3.59595
...

Then change your code like this:

import pandas, numpy
wine_data = pandas.read_csv('test.csv', sep = ',', header = None)
wine_data_ = wine_data
wine_data = numpy.array([x.split(';') for x in wine_data_[0]], dtype = float)
wine_data

The wine_data will be:

array([[ 2.12   ,  5.12   ,  3.12   ],
       [ 3.1233 ,  4.     ,  2.     ],
       [ 4.     ,  4.9696 ,  3.     ],
       [ 2.     ,  5.0344 ,  3.     ],
       [ 3.59595,  4.     ,  2.     ],
       [ 4.     ,  4.     ,  3.59595]])

A more efficient approach is to let pandas split on the semicolons directly:

import pandas, numpy
wine_data = pandas.read_csv('test.csv', sep = ';', header = None)
wine_data = numpy.array(wine_data, dtype = float)
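
If the file still has its header row, as the original wine quality file does, the copy-without-first-line step from the question is not needed. The following is a rough sketch under that assumption, using a made-up in-memory sample with hypothetical column names a, b and c. It uses the built-in float rather than numpy.float, because that alias was removed in NumPy 1.24.

```python
import io

import numpy
import pandas

# Hypothetical sample with a header line (column names are made up).
sample = '"a";"b";"c"\n2.12;5.12;3.12\n3.1233;4;2\n'

# header=0 tells pandas to use the first line as column names.
frame = pandas.read_csv(io.StringIO(sample), sep=';', header=0)
wine_data = frame.to_numpy(dtype=float)

# NumPy alone can do the same: skip one header line and split on ';'.
wine_data_np = numpy.loadtxt(io.StringIO(sample), delimiter=';', skiprows=1)

print(wine_data.shape)
print(numpy.allclose(wine_data, wine_data_np))
```

Both routes give a two-dimensional float array. pandas is more forgiving with mixed or missing values, while numpy.loadtxt avoids the extra dependency when the file is purely numeric.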

Answer 3

In this

x.split(';') for x in wine_data_

the x you are getting is not a string. Only strings have split(). Calling it on any other type gives this error:

object has no attribute 'split'

Check your x value.
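
One way to check x is to look at what the loop actually yields. This is a small sketch, assuming a made-up two-row sample read with sep=',' as in the question; the names label_types and rows are only for illustration.

```python
import io

import pandas

# Hypothetical sample; read with sep=',' so each line stays one string.
sample = "2.12;5.12;3.12\n3.1233;4;2\n"
wine_data_ = pandas.read_csv(io.StringIO(sample), sep=',', header=None)

# Iterating over the DataFrame itself yields column labels, not row values.
label_types = [type(x).__name__ for x in wine_data_]
print(label_types)

# Iterating over column 0 yields the row strings, which do have split().
rows = [x.split(';') for x in wine_data_[0]]
print(rows)
```

The labels are integers, which explains the AttributeError, while the values in column 0 are strings and split as expected.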

Answer 4

You can try something like this:

def get_y(r):
    return str(r['label']).split(' ')

Result:
   (PILImage mode=RGB size=800x800, TensorMultiCategory([0., 0., 0., 1., 0., 0.]))

4 answers

solution1
8 ACCEPTED 2017-05-26 07:11:08

solution2
3 2017-05-26 07:17:15

solution3
1 2017-05-26 07:03:26

solution4
0 2020-06-24 20:27:48
