Accessing column data from a CSV file in Python

I have a CSV file with columns A, B, C, D and N rows. The problem is that the values in these columns do not all have the same length, i.e. some are 4.5 and some are 4.52.

My question is in two parts:

How do I access these columns from the CSV file? I've used this code

    import csv
    with open('file.csv','rb') as f:
        reader = csv.reader(f)
        for row in reader:
            print row

to print the rows in the CSV file, and then I replaced

    print row 

with

    z = row
    z.append(z)

to save the data into an array.

But z is a 1-D array, and the data is of type string. When I try an operation such as np.median(z), it gives me an error. Also, I cannot do

    z.append(float(z))

This gives me a TypeError.

Also, is there any way to truncate the values and set them to a certain precision while importing them from the CSV file? For example, if the file has values like 4.3, 4.56, 4.299, ..., I want to constrain what I finally import to just one decimal place.

This SE question, "Python - CSV: Large file with rows of different lengths", is the closest to answering my second question, but I do not understand it. If any of you can help me with this, I'd be thankful.

EDIT 1: @Richie: here's a sample data set - http://goo.gl/io8Az . It links to a Google doc. Regarding your comment, this was the outcome when I ran your code on my CSV file:

     ValueError: could not convert string to float: plate

@Pieters: z = row, z.append(z) created this - ['3836', '55302', '402', '22.945717', '22.771544', '23.081865', '22.428421', '21.78294', '164.40663689', '-1.25641627', '1.780485', '1237674648848106129', [...]].

I should've mentioned that I've just started using Python and I'm learning things on a need-to-know basis! I'm improvising with bits and pieces of code I find on the web.

EDIT 2: I've heard about pandas. I guess I should start using it.

@Khalid - I've run your code and I'm able to retrieve the column I want. Instead of printing each row, can I access the column directly, as a static array?

EDIT 3: @richie: the first time I ran your code, this showed up:

    Traceback (most recent call last):
      File "", line 4, in
    ValueError: could not convert string to float: plate

Well, I realized that the first row, which contains the column names, is the cause. So I removed the first row, saved the result as a new file, and ran the code on that file, and it worked perfectly fine.

But if I remove the first line, which contains the column identifiers, I cannot use the method mentioned by Khalid below. I am looking at pandas in the meantime.

Thanks for everything, guys :)

EDIT 4: Lesson learnt. Pandas is awesome. Job done :)...

A few things, depending on what you want to do. Here is a simple approach to get them referenced by column name:

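    import csv

    # newline='' is what the csv module expects for files opened in text mode
    with open('file.csv', newline='') as f:
        reader = csv.DictReader(f, delimiter=',')
        rows = list(reader)  # one dict per data row, keyed by the header names

    for row in rows:
        print(row['plate'])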

If you want to convert them to floats or ints, you can use map. However, I suspect you want to do some calculations in the end, and for that it's better to use pandas.
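As a minimal sketch of the map conversion (Python 3), the snippet below pulls one named column out as a list of floats, which also gives you the column as a plain array-like object instead of printed rows. The inline sample data and the `column_as_floats` helper are hypothetical stand-ins for illustration; only the `plate` column name comes from the question.

```python
import csv
import io

# Hypothetical sample standing in for file.csv; only 'plate' is from the question.
SAMPLE = "plate,mjd\n3836,55302\n3837,55303.5\n"

def column_as_floats(f, name):
    """Return one named column of a CSV file object as a list of floats."""
    reader = csv.DictReader(f)
    # DictReader consumes the header row, so only data values reach float().
    return list(map(float, (row[name] for row in reader)))

plates = column_as_floats(io.StringIO(SAMPLE), "plate")
print(plates)
```

With a real file, pass `open('file.csv', newline='')` in place of the `StringIO` object.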

As an added bonus, pandas will give you a 2D grid representation of your file called a DataFrame.

Try this:

    import csv
    import numpy as np

    class onefloat(float):
        # float subclass that displays with one decimal place
        def __repr__(self):
            return "%0.1f" % self

    with open('file.csv', newline='') as f:
        reader = csv.reader(f)
        for row in reader:
            # your issue of 1 decimal point is taken care of here
            print(list(map(onefloat, row)))
            # in case you want this too to be of 1 decimal point
            print('{:.1f}'.format(np.median(list(map(float, row)))))
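The loop above calls float on every row, so a header row of column names raises the ValueError reported in the question. One possible way around it, sketched here for Python 3, is to consume the header with `next(reader)` and collect the remaining rows into a 2-D NumPy array. The sample data and the `load_numeric` helper are hypothetical, and `np.round` rounds rather than truncates, which is assumed to be acceptable here.

```python
import csv
import io
import numpy as np

# Hypothetical sample; a real file would be opened with open('file.csv', newline='').
SAMPLE = "a,b\n4.3,4.56\n4.299,1.31\n5.0,2.0\n"

def load_numeric(f):
    """Read a CSV file object with one header row into (names, 2-D float array)."""
    reader = csv.reader(f)
    header = next(reader)  # consume the column names so float() never sees them
    data = np.array([[float(v) for v in row] for row in reader])
    return header, data

header, data = load_numeric(io.StringIO(SAMPLE))
medians = np.median(data, axis=0)  # one median per column
rounded = np.round(data, 1)        # every value rounded to one decimal place
print(header, medians)
```

Keeping the header in a separate list means a column can still be looked up by name, for example `data[:, header.index('a')]`.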

And this is how it is done using pandas:

    import pandas as pd

    # read_csv treats the first row as column names by default
    data = pd.read_csv('richards_quasar_outliers.csv')
    print(data['plate'].median())
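For the second part of the question (limiting values to one decimal place), a rough sketch with pandas could look like the following. The inline sample and the `mag` column name are hypothetical (the numbers are taken from the row shown in the question), and `DataFrame.round` rounds rather than truncates.

```python
import io
import pandas as pd

# Hypothetical sample; 'mag' is an invented column name for illustration.
SAMPLE = "plate,mag\n3836,22.945717\n3837,22.771544\n3838,23.081865\n"

data = pd.read_csv(io.StringIO(SAMPLE))  # the header row becomes the column names
data = data.round(1)                     # float columns rounded to one decimal place
plate = data["plate"].to_numpy()         # a single column as a NumPy array
print(data["mag"].median())
```

Integer columns such as `plate` are left unchanged by the rounding step.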
