简体   繁体   中英

A Pythonic way to read CSV with row and column headers

Let's have a CSV table with row and column headers, eg:

, "Car", "Bike", "Boat", "Plane", "Shuttle"
"Red", 1, 7, 3, 0, 0
"Green", 5, 0, 0, 0, 0
"Blue", 1, 1, 4, 0, 1

I want to get row and column headers, ie:

col_headers = ["Car", "Bike", "Boat", "Plane", "Shuttle"]
row_headers = ["Red", "Green", "Blue"]
data = [[1, 7, 3, 0, 0],
        [5, 0, 0, 0, 0],
        [1, 1, 4, 0, 1]]

Of course I can do something like

import csv
with open("path/to/file.csv", "r") as f:
    csvraw = list(csv.reader(f))
col_headers = csvraw[1][1:]
row_headers = [row[0] for row in csvraw[1:]]
data = [row[1:] for row in csvraw[1:]]

...but it does not look Pythonic enough.

Is there a neater way for this natural operation?

Take a look at csv.DictReader .

If the fieldnames parameter is omitted, the values in the first row of the csvfile will be used as the fieldnames.

Then you can just do reader.fieldnames . This, of course, only gives you column headers. You would still have to parse the row headers manually.

I think your original solution is pretty good, however.

Now I see that what I want is the easiest (and the most robust) to accomplish with Pandas .

import pandas as pd
df = pd.read_csv('foo.csv', index_col=0)

And if I want, it is easy to extract:

col_headers = list(df.columns)
row_headers = list(df.index)

Otherwise, in the "raw" Python, it seems that the method I wrote in the question is "good enough".

I am aware that this solution gives you another output format than the requested, but it is very convenient. This reads the csv line into a dictionary:

reader = csv.reader(open(parameters_file), dialect)

keys = [key.lower() for key in reader.next()]
for line in reader:
    parameter = dict(zip(keys, cells))

Without 3rd party libs (and if you can live with results being tuples from the zip op):

import csv

with open('your_csv_file') as fin:
    csvin = csv.reader(fin, skipinitialspace=True)
    col_header = next(csvin, [])[1:]
    row_header, data = zip(*((row[0], row[1:]) for row in csvin))

Gives you for col_header , row_header and data :

['Bike', 'Boat', 'Plane', 'Shuttle']
('Red', 'Green', 'Blue')
(['1', '7', '3', '0', '0'], ['5', '0', '0', '0', '0'], ['1', '1', '4', '0', '1'])

Agree, pandas is the best I found too. I'm interested into read specific values of my dataframe. Here is what I did:

import pandas as pd
d=pd.read_csv(pathToFile+"easyEx.csv")
print(d)
print(d.index.values)
print(d.index.values[2])
print(d.columns.values)
print(d.columns.values[2])
print(pd.DataFrame(d,index=['Blue'],columns=['Boat'])+0.333)

And this is what it returns:

         Car  Bike  Boat  Plane  Shuttle                                                                                                            
Red      1     7     3      0        0                                                                                                            
Green    5     0     0      0        0                                                                                                            
Blue     1     1     4      0        1                                                                                                            
['Red' 'Green' 'Blue']                                                                                                                            
Blue                                                                                                                                              
['Car' 'Bike' 'Boat' 'Plane' 'Shuttle']                                                                                                           
Boat                                                                                                                                              
      Boat                                                                                                                                        
Blue     4.333                              

Note that I can check row names with 'index' and column names with 'columns'. Also note that I can read a specific element of the 'dataframe' base on its row and column names, and that the values are still numeric, that why I added '+0.333' in the last print.

I chaged the data file, I removed the quote chars (" ") and spaces after commas in the first row. Here you have 'easyEx.csv' file:

Car,Bike,Boat,Plane,Shuttle
Red, 1, 7, 3, 0, 0
Green, 5, 0, 0, 0, 0
Blue, 1, 1, 4, 0, 1

Hope this can help =)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM