python csv file reading: turning the first row into column headers, next(reader) returns unwanted characters

Question

I'm currently writing some code to read CSV files with pandas, and I need the first row of the file as a list so I can use it for some descriptives (see code Part1). I can use the pandas.read_csv parameter header=0, which reads the column headers automatically, but afaik it does not return a list. In the commented-out line inside print(), names is the list I used to pass the column headers to pandas.read_csv manually, but I'd like that to be automatic (so when I add/delete columns I don't have to edit the list of names by hand).

So, to work around this, I came up with the idea to just separately read in the first row using csv.reader and get a list with column names that I can use in pandas.read_csv that way (see code Part2).

Part1 pandas csv reading and printing descriptives of the data

import pandas as pd
filename = 'test.csv'
dataheadsize = 10
data = pd.read_csv(filename, sep=";", header=0, decimal=",")  # used to pass list of names here instead of header=0

print('Descriptives:\n', data.describe(), '\n\n',
      'Datasheet (', dataheadsize, 'rows shown):\n', data.head(dataheadsize),
      #'Count per class:\n',data.groupby(names[0]).size(),'\n\n',
      )

Part2 trying to get the first row of the csv to be read into a list

import csv
file = open(filename, 'r')
reader = csv.reader(file, delimiter=';')
names = next(reader)
print(names)

This gives me the list that I need, but for some reason it reads in some additional unwanted characters at index [0]. This is what print() returns:

['ï»¿VAR00001', 'VAR00002', 'VAR00003']

As you can see, the ' ï»¿ ' characters are returned as well, which I don't want, and I wonder what the best way to avoid that is. I'd like it to be as automatic as possible for future use, so I don't want to just slice the characters off: I don't know whether they change depending on the csv file, whether the number of them changes, etc.

For reference, these are the first 5 rows of the .csv file:

VAR00001;VAR00002;VAR00003
1;2;4
1;2;4
0;5;4
0;1;4

As you can probably tell by now, I'm not the most experienced coder, so if there's a way to skip the whole 'separately reading in the csv just to get the column names into a list' part, please do let me know, because I couldn't figure that out!

Answer 1

I don't know why those characters are being added, but why not try:

list(data.keys())

Answer 2

If all else fails, you can remove it manually.

def FixHeader(headerArr, n=1):
    # n is how many chars you want to remove from the first entry.
    # Note: the 'ï»¿' prefix shown in the question is 3 chars long.
    newHeaderArr = []
    for i, name in enumerate(headerArr):
        if i == 0:
            newHeaderArr.append(name[n:])
        else:
            newHeaderArr.append(name)
    #print(newHeaderArr)
    return newHeaderArr
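
Since the question asks for something that does not depend on a fixed character count, here is a minimal sketch of a prefix-based variant. The name strip_bom and the tuple KNOWN_BOM_FORMS are hypothetical, and the sketch assumes the unwanted prefix is a UTF-8 byte order mark that was either decoded as UTF-8 (one character, '\ufeff') or mis-decoded as latin-1/cp1252 (three characters, 'ï»¿').

```python
# Hypothetical helper: the two forms a UTF-8 byte order mark usually takes
# after decoding. "\xef\xbb\xbf" is the three-character string 'ï»¿'.
KNOWN_BOM_FORMS = ("\ufeff", "\xef\xbb\xbf")


def strip_bom(header):
    """Return a copy of header with a leading BOM removed from the first entry."""
    if not header:
        return []
    first = header[0]
    for marker in KNOWN_BOM_FORMS:
        if first.startswith(marker):
            # Remove exactly the marker that was found, whatever its length.
            first = first[len(marker):]
            break
    return [first] + list(header[1:])


print(strip_bom(["\xef\xbb\xbfVAR00001", "VAR00002", "VAR00003"]))
```

A header without any such prefix is returned unchanged, so the helper can be applied to every file regardless of how it was saved.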

Answer 3

You can use the nrows argument to pd.read_csv to read in column labels separately:

# read in column labels as list
cols = pd.read_csv('file.csv', nrows=0).columns.tolist()

# read in data; use default pd.RangeIndex, i.e. 0, 1, 2, etc., as columns
data = pd.read_csv('file.csv', header=None, skiprows=[0])

If you need to specify an encoding, you can do so via the encoding argument, e.g. encoding='latin-1'.
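
For the csv.reader approach from the question, the same idea applies to open(). The sketch below assumes the stray characters are a UTF-8 byte order mark (BOM), which is what 'ï»¿' usually indicates; Python's 'utf-8-sig' codec skips a leading BOM if there is one. The temporary file only stands in for the asker's test.csv so that the example is self-contained.

```python
import csv
import os
import tempfile

# Build a small sample file that starts with a UTF-8 BOM, mimicking the
# layout shown in the question (this stands in for the real test.csv).
sample = "VAR00001;VAR00002;VAR00003\n1;2;4\n0;5;4\n"
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"\xef\xbb\xbf" + sample.encode("utf-8"))
    path = tmp.name

# 'utf-8-sig' drops a leading BOM if present and otherwise acts like 'utf-8'.
with open(path, "r", encoding="utf-8-sig", newline="") as f:
    reader = csv.reader(f, delimiter=";")
    names = next(reader)  # first row only: the column headers

os.remove(path)
print(names)
```

The with block also closes the file, which the open() call in the question never does.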

Answer 4

Thanks for the rapid replies guys!

Just fyi, when I change the encoding to utf-8 I get this list:

['\ufeffVAR00001', 'VAR00002', 'VAR00003']

and when I use latin-1 it doesn't change anything compared to the list I originally posted. I'm sure this would work, though, once I figure out the correct encoding.

However, I'm using list(data.keys()) as it was suggested and that works like a charm while also completely removing the need to read in anything separately. Thanks a bunch to everyone who responded!
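
For anyone wondering why the different encodings give different results: a short sketch, assuming the file starts with the three-byte UTF-8 byte order mark EF BB BF (an assumption, since the raw bytes of the file were not posted). It decodes the same bytes three ways.

```python
# The UTF-8 byte order mark followed by the first header name.
raw = b"\xef\xbb\xbf" + b"VAR00001"

# latin-1 maps each byte to one character, so the BOM becomes three
# visible characters.
as_latin1 = raw.decode("latin-1")

# utf-8 decodes the three bytes into the single character U+FEFF.
as_utf8 = raw.decode("utf-8")

# utf-8-sig recognises the BOM and discards it.
as_utf8_sig = raw.decode("utf-8-sig")

print(repr(as_latin1))
print(repr(as_utf8))
print(repr(as_utf8_sig))
```

Only the last variant yields the clean name, which suggests 'utf-8-sig' is the encoding to try for files like this one.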
