Reading a CSV file twice in Python

Here is my Python code:

import csv

# Reading
ordersFile = open('orders.csv', 'rb')
ordersR = csv.reader(ordersFile, delimiter=',')

# Find order employeeID=5, shipCountry="Brazil"
print "Find order employeeID=5, shipCountry=\"Brazil\""
for order in ordersR:
    if order[2] == '5' and order[13] == 'Brazil':
        print order
# Find order employeeID=5
print "Find order employeeID=5"
for order in ordersR:
    if order[2] == '5':
        print order
ordersFile.close()

The first loop (employeeID=5, shipCountry="Brazil") prints results, but the second loop (employeeID=5) prints nothing. How can I read (select) rows from the same CSV file more than once?

You're reading straight through your CSV file once, but if you want to work on the data in multiple passes, you should read the contents into a variable. Then you don't have to re-read the file every time you need to do something with it.

import csv

# Read order rows into our list
# Here I use a context manager so that the file is automatically
# closed upon exit
with open('orders.csv') as orders_file:
    reader = csv.reader(orders_file, delimiter=',')
    orders = list(reader)

# Find order employeeID=5, shipCountry="Brazil"
print "Find order employeeID=5, shipCountry=\"Brazil\""
for order in orders:
    if order[2] == '5' and order[13] == 'Brazil':
        print order

# Find order employeeID=5
print "Find order employeeID=5"
for order in orders:
    if order[2] == '5':
        print order

If your CSV file is too large to fit into memory (or you don't want to read it all into memory for some reason), then you'll need a different approach. If you need that, please leave a comment.

What you can do is simply convert the reader object into a list:

with open('orders.csv', 'rb') as ordersFile:
    ordersR = list(csv.reader(ordersFile, delimiter=','))

The reader object is like a generator: once you have iterated over its values, you cannot start a second loop to read them again.
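The exhaustion described above can be seen in a minimal sketch. It is written for Python 3 and uses an in-memory `io.StringIO` with made-up sample rows in place of `orders.csv`; both are assumptions made only for illustration.

```python
import csv
import io

# Hypothetical sample data standing in for orders.csv.
sample = io.StringIO("1,A,5\n2,B,7\n3,C,5\n")

reader = csv.reader(sample)

first_pass = [row for row in reader]   # consumes every row
second_pass = [row for row in reader]  # the reader is already exhausted

print(len(first_pass))   # 3
print(len(second_pass))  # 0
```

Wrapping the reader in `list(...)` before the first loop avoids this, because a list can be iterated any number of times.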

If you do not want to store all your data in a list, here is an iterator-based approach to iterate over your CSV file twice, using itertools.tee. Note that tee buffers the rows that one iterator has consumed and the other has not, so running the two loops one after the other still holds every row in memory.

from csv import reader
from itertools import tee

with open('orders.csv', 'r') as file:
    rows0, rows1 = tee(reader(file, delimiter=','))

    for row in rows0:
        print(row)  # search for something...

    print()

    for row in rows1:
        print(row)  # search for a different thing...

It's better to read through files once because I/O is likely to be the slowest part of your program.

If you need to re-read the file, you can either close it and re-open it, or seek() to the beginning, i.e. add ordersFile.seek(0) between your loops.
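A minimal sketch of the seek(0) approach, written for Python 3. The temporary file, its three sample rows, and the column positions are assumptions standing in for the real orders.csv.

```python
import csv
import os
import tempfile

# Hypothetical sample file standing in for orders.csv.
path = os.path.join(tempfile.mkdtemp(), "orders_sample.csv")
with open(path, "w", newline="") as f:
    csv.writer(f).writerows([["1", "A", "5"], ["2", "B", "7"], ["3", "C", "5"]])

with open(path, newline="") as orders_file:
    reader = csv.reader(orders_file)

    # First pass: rows whose third column is "5".
    first_pass = [row for row in reader if row[2] == "5"]

    # Rewind the underlying file so the reader can start over.
    orders_file.seek(0)

    # Second pass: rows whose second column is "B".
    second_pass = [row for row in reader if row[1] == "B"]

print(first_pass)
print(second_pass)
```

This keeps memory use low, at the cost of reading the file from disk once per pass.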

This is a good case for using the pandas module (you need to install it: pip install pandas).

After that, you read the file just once and can perform any kind of filtering easily.

For instance, to filter the file's contents more than once, follow this example:

import pandas as pd

# read csv into a dataframe
df = pd.read_csv('orders.csv', delimiter=',')

# get the data that has employeeID == 5
df1 = df[df["employeeID"] == 5]
print(df1)

# get the data that has employeeID == 5 and shipCountry == "Brazil"
df2 = df[(df["employeeID"] == 5) & (df["shipCountry"] == "Brazil")]
print(df2)

As @Nick T mentioned above, I/O is expensive compared to RAM access, so if you need to iterate over your file more than once, it is better to save its contents to a variable.

You can also combine multiple conditions in a single for loop, so it performs faster (a single iteration):

with open('orders.csv', 'rb') as ordersFile:
    orders = list(csv.reader(ordersFile, delimiter=','))

# Collect both result sets in a single pass
emp = []      # orders with employeeID=5
country = []  # orders with employeeID=5 and shipCountry="Brazil"
for order in orders:
    if order[2] == '5':
        emp.append(order)
        if order[13] == 'Brazil':
            country.append(order)

print 'emp id=5 and shippingcountry=Brazil: {}'.format(country)
print 'emp id=5: {}'.format(emp)

Note that this isn't scalable: you probably don't want to add any more if logic to this block, as it quickly becomes unreadable.
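One possible way around the readability problem, offered as a sketch and not as part of the original answer, is to keep each query as a named predicate and test every row against all of them in one pass. The code is written for Python 3; the `filters` and `matches` names, the in-memory sample data, and the two-column layout (column 0 = employeeID, column 1 = shipCountry) are assumptions for illustration only.

```python
import csv
import io

# Hypothetical sample rows: column 0 = employeeID, column 1 = shipCountry.
sample = io.StringIO("5,Brazil\n5,France\n3,Brazil\n")

# One named predicate per query; add a new entry instead of a new `if`.
filters = {
    "employeeID=5": lambda row: row[0] == "5",
    "employeeID=5, shipCountry=Brazil": lambda row: row[0] == "5" and row[1] == "Brazil",
}

matches = {name: [] for name in filters}
for row in csv.reader(sample):  # single pass over the data
    for name, predicate in filters.items():
        if predicate(row):
            matches[name].append(row)

for name, rows in matches.items():
    print(name, rows)
```

Adding a third query then means adding one dictionary entry, while the loop itself stays unchanged.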
