Reading a CSV file using Python 2

I'm running Python 2.7 and I'm very new to Python. I'm trying to read a CSV file (the values are separated by spaces) and separate the values inside based on the header above the coordinates. The format of the file isn't what I'm used to, and I'm having trouble getting the values to read correctly. Even if I could get them to read correctly, I don't understand how to put them in a list.

Here is what the CSV file looks like:

# image name
1.png
# probe locations
100 100
200 100
100 200
300 300

# another image name
2.png
100 200
200 100
300 300
135 322

# end

Here's the code I am playing with:

class CommentedFile:
    def __init__(self, f, commentstring="#"):
        self.f = f
        self.commentstring = commentstring
    def next(self):
        line = self.f.next()
        while line.startswith(self.commentstring):
            line = self.f.next()
        return line
    def __iter__(self):
        return self

#I did this in order to ignore the comments in the CSV file

tsv_file = csv.reader(CommentedFile(open("test.exp", "rb")),
                  delimiter=' ')


for row in tsv_file:
    if row != int:
        next(tsv_file)
    if row:
        print row

The code prints out:

['100', '100']
['100', '200']
['100', '200']
['300', '300']
Traceback (most recent call last):
  File "the path", line 57, in <module>
next(tsv_file)
StopIteration

So I'm trying to get the program to separate the coordinates based on the header and then put them into separate lists. Thank you for your help!

Take a look at pandas. It has a DataFrame object which can hold your data and lets you manipulate it in an intuitive way. It also has a read_csv function which takes a lot of the hassle out of dealing with CSV files.

For example:

import pandas as pd

# Reads your CSV file and returns a DataFrame object, as mentioned above.
df = pd.read_csv("your_csv.csv", sep=' ', names=['co_a','co_b'], header=None, skiprows=2)

# Extracts your coordinates into separate lists.
# Series.tolist() is used because it is available in older pandas releases too.
list1 = df.co_a.tolist()
list2 = df.co_b.tolist()

You can use df or df.head() to see your DataFrame and how your data is organized. It's also worth mentioning that df.co_a is a Series object (think of a super list/dict), and you can probably do your analysis or manipulation right from there.

Also, if you show me what the comments look like in the CSV file, I can show you how to ignore them with read_csv.

I know you were looking for an answer that uses the csv module, but pandas is a much more advanced tool and might help you out in the long run.

Hope it helps!

Your code actually worked for me. I don't know why you're getting the traceback.
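
One possible explanation, offered as a guess rather than a diagnosis: `row != int` compares a list with the type `int`, so it is always true, and `next(tsv_file)` then runs on every pass of the loop. Each call throws away the following row, and when the reader has an odd number of rows left, the last call has nothing to consume and raises StopIteration. The sketch below reproduces that pattern on plain lists; the helper name `every_other` is made up for illustration and is not part of the code in the question.

```python
def every_other(items):
    """Mimic the question's loop: call next() on the iterator being looped over."""
    it = iter(items)
    seen = []
    for item in it:
        seen.append(item)
        # Discards the item that follows; raises StopIteration if none is left.
        next(it)
    return seen


# Even number of items: every second item is silently skipped.
print(every_other(["a", "b", "c", "d"]))

# Odd number of items: the inner next() runs out of data.
try:
    every_other(["a", "b", "c"])
except StopIteration:
    print("StopIteration raised")
```

If that is what is happening, whether the traceback appears would depend only on how many rows (including blank lines) the reader yields, which could differ between two copies of the same file.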

tmp.csv

# image name
1.png
# probe locations
100 100
200 100
100 200
300 300

# another image name
2.png
100 200
200 100
300 300
135 322

# end

tmp.py

import csv

class CommentedFile:
    def __init__(self, f, commentstring="#"):
        self.f = f
        self.commentstring = commentstring
    def next(self):
        line = self.f.next()
        while line.startswith(self.commentstring):
            line = self.f.next()
        return line
    def __iter__(self):
        return self

#I did this in order to ignore the comments in the CSV file

tsv_file = csv.reader(CommentedFile(open("tmp.csv", "rb")),
                  delimiter=' ')


for row in tsv_file:
    if row != int:
        next(tsv_file)
    if row:
        print row

Shell output

tmp$python tmp.py 
['1.png']
['200', '100']
['300', '300']
['2.png']
['200', '100']
['135', '322']
tmp$uname -mprsv
Darwin 12.4.0 Darwin Kernel Version 12.4.0: Wed May  1 17:57:12 PDT 2013; root:xnu-2050.24.15~1/RELEASE_X86_64 x86_64 i386
tmp$python --version
Python 2.7.2
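
Note that the output above contains only every second row, so it does not yet give the per-image lists the question asks for. Here is a minimal sketch of one way to get them with the csv module alone. It assumes that any non-comment line with a single field is an image name and any line with two fields is an x/y pair belonging to the most recent image. The function name `read_probes` and the `SAMPLE` string are made up for illustration.

```python
import csv

# Inline copy of the data from the question, so the sketch runs on its own.
SAMPLE = """# image name
1.png
# probe locations
100 100
200 100
100 200
300 300

# another image name
2.png
100 200
200 100
300 300
135 322

# end
"""


def read_probes(lines, comment="#"):
    """Return a list of (image_name, [(x, y), ...]) tuples."""
    groups = []
    # Drop comment lines before csv.reader ever sees them.
    data_lines = (line for line in lines if not line.startswith(comment))
    for row in csv.reader(data_lines, delimiter=" "):
        if not row:
            continue  # blank line
        if len(row) == 1:
            # Assumption: a single field is an image name and starts a new group.
            groups.append((row[0], []))
        elif len(row) == 2 and groups:
            # Two fields: a coordinate pair for the most recent image.
            x, y = row
            groups[-1][1].append((int(x), int(y)))
    return groups


for name, coords in read_probes(SAMPLE.splitlines()):
    print("%s: %s" % (name, coords))
```

To read the real file instead of the inline sample, pass an open file object, for example `read_probes(open("tmp.csv"))`. There is no `next()` call inside the loop, so no rows are skipped.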
