
Read CSV file with semicolon as delimiter

I have a numpy 2D array of shape (4898, ), where the elements in each row are separated by semicolons but are still stored in a single column rather than in multiple columns (the desired outcome). How do I split each row of the array at every semicolon? I have written the following Python script to do so, but it throws errors.

stochastic_gradient_descent_winequality.py

import numpy
import pandas

if __name__ == '__main__' :

    with open('winequality-white.csv', 'r') as f_0 :
        with open('winequality-white-updated.csv', 'w') as f_1 :
            f_0.next()
            for line in f_0 :
                f_1.write(line)


    wine_data = pandas.read_csv('winequality-white-updated.csv', sep = ',', header = None)
    wine_data_ = wine_data
    wine_data = numpy.array([x.split(';') for x in wine_data_], dtype = numpy.float)

    print (numpy.shape(wine_data))

Errors

Traceback (most recent call last):
  File "stochastic_gradient_descent_winequality.py", line 16, in <module>
    wine_data = numpy.array([x.split(';') for x in wine_data_], dtype = numpy.float)
AttributeError: 'numpy.int64' object has no attribute 'split'

If you're using semicolons ( ; ) as your CSV file separator instead of commas ( , ), you can adjust the read_csv line:

wine_data = pandas.read_csv('winequality-white-updated.csv', sep = ';', header = None)

The problem with your list comprehension is that [x.split(';') for x in wine_data_] iterates over the column names, not the rows. With header = None those names are integers, which is why the error says a 'numpy.int64' object has no attribute 'split'.

That being the case, you have no need for the line with the list comprehension. You can read in your data and be done.

wine_data = pandas.read_csv('winequality-white-updated.csv', sep = ';', header = None)
print (numpy.shape(wine_data))
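
To see why the list comprehension in the question fails, here is a minimal sketch. It assumes a made-up two-row sample read from an in-memory buffer instead of the real file; the names `sample`, `wrong`, and `right` are for illustration only. It shows that iterating over a DataFrame yields column labels rather than rows, and that the separator decides how many columns you get.

```python
import io

import pandas

# Hypothetical two-row sample standing in for winequality-white-updated.csv.
sample = "7.0;0.27;0.36\n6.3;0.30;0.34\n"

# With the wrong separator, every row stays in a single column.
wrong = pandas.read_csv(io.StringIO(sample), sep=',', header=None)
labels = list(wrong)  # iterating a DataFrame yields its column labels
print(labels)         # integer labels, so calling .split on them fails
print(wrong.shape)

# With sep=';' the fields land in separate columns.
right = pandas.read_csv(io.StringIO(sample), sep=';', header=None)
print(right.shape)
```

To iterate over the row strings instead, you would have to select the column first, for example `wrong[0]`.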

Suppose your CSV file looks like this:

2.12;5.12;3.12
3.1233;4;2
4;4.9696;3
2;5.0344;3
3.59595;4;2
4;4;3.59595
...

Then change your code like this:

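import pandas, numpy
# sep = ',' leaves each semicolon-separated row as one string in column 0
wine_data = pandas.read_csv('test.csv', sep = ',', header = None)
wine_data_ = wine_data
# split the strings in column 0; numpy.float was removed in NumPy 1.24, so use float
wine_data = numpy.array([x.split(';') for x in wine_data_[0]], dtype = float)
wine_data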

wine_data will then be:

array([[ 2.12   ,  5.12   ,  3.12   ],
       [ 3.1233 ,  4.     ,  2.     ],
       [ 4.     ,  4.9696 ,  3.     ],
       [ 2.     ,  5.0344 ,  3.     ],
       [ 3.59595,  4.     ,  2.     ],
       [ 4.     ,  4.     ,  3.59595]])

A more efficient approach:

import pandas, numpy
wine_data = pandas.read_csv('test.csv', sep = ';', header = None)
wine_data = numpy.array(wine_data, dtype = float)
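
As a self-contained check of the approach above, this sketch reads the sample rows from an in-memory buffer rather than from `test.csv` (the buffer and the names `csv_text` and `frame` are assumptions for illustration). It converts with `DataFrame.to_numpy`, which here has the same effect as wrapping the frame in `numpy.array`.

```python
import io

import pandas

# The sample rows shown earlier, held in memory instead of in test.csv.
csv_text = """2.12;5.12;3.12
3.1233;4;2
4;4.9696;3
2;5.0344;3
3.59595;4;2
4;4;3.59595
"""

# Let pandas split on ';' directly, then convert to a float array.
frame = pandas.read_csv(io.StringIO(csv_text), sep=';', header=None)
wine_data = frame.to_numpy(dtype=float)

print(wine_data.shape)
print(wine_data)
```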

In this expression:

x.split(';') for x in wine_data_  

whatever x you are getting is not a string. Only strings have split(). If x is anything other than a string, you will get this error:

object has no attribute 'split'

Check your x value.
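
One way to check the value is sketched below, using a hypothetical helper named `split_fields` that is not part of the original code. It tests the type before splitting so that the error names the offending value.

```python
def split_fields(x, sep=';'):
    """Split x on sep, raising a descriptive error if x is not a string."""
    if not isinstance(x, str):
        raise TypeError(f"expected str, got {type(x).__name__}: {x!r}")
    return x.split(sep)


print(split_fields("7.0;0.27;0.36"))

try:
    split_fields(0)  # an integer column label, as in the question
except TypeError as exc:
    print(exc)
```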

You can try something like this:

def get_y(r): 
    return str(r['label']).split(' ')

result :
   (PILImage mode=RGB size=800x800, TensorMultiCategory([0., 0., 0., 1., 0., 0.]))
