Reading a CSV file twice in Python

Question

Here is my Python code:

import csv

# Reading
ordersFile = open('orders.csv', 'rb')
ordersR = csv.reader(ordersFile, delimiter=',')

# Find order employeeID=5, shipCountry="Brazil"
print "Find order employeeID=5, shipCountry=\"Brazil\""
for order in ordersR:
    if order[2] == '5' and order[13] == 'Brazil':
        print order
# Find order employeeID=5
print "Find order employeeID=5"
for order in ordersR:
    if order[2] == '5':
        print order
ordersFile.close()

The first loop (# Find order employeeID=5, shipCountry="Brazil") prints matching rows, but the second loop (# Find order employeeID=5) prints nothing. How can I read (select) rows from the same CSV file more than once?
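The behaviour can be reproduced with a small in-memory sketch (Python 3). The three-column sample rows are hypothetical and stand in for the real 14-column file, with column 1 playing the role of employeeID. csv.reader returns an iterator, so the first loop consumes every row and the second loop finds nothing left to read.

```python
import csv
import io

# Hypothetical sample data; column 1 plays the role of employeeID here
data = io.StringIO("1,5,Brazil\n2,5,France\n3,7,Brazil\n")
reader = csv.reader(data)

first_pass = [row for row in reader if row[1] == '5']
second_pass = [row for row in reader if row[1] == '5']  # reader is already exhausted

print(first_pass)   # [['1', '5', 'Brazil'], ['2', '5', 'France']]
print(second_pass)  # []
```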

Answer 1

You're reading straight through your CSV file, which leaves the reader exhausted after the first loop. If you want to work on the data in multiple passes, read the contents into a variable. Then you don't have to re-read the file every time you need to do something with it.

import csv

# Read order rows into our list
# Here I use a context manager so that the file is automatically
# closed upon exit
with open('orders.csv') as orders_file:
    reader = csv.reader(orders_file, delimiter=',')
    orders = list(reader)

# Find order employeeID=5, shipCountry="Brazil"
print "Find order employeeID=5, shipCountry=\"Brazil\""
for order in orders:
    if order[2] == '5' and order[13] == 'Brazil':
        print order

# Find order employeeID=5
print "Find order employeeID=5"
for order in orders:
    if order[2] == '5':
        print order

If your CSV file is too large to fit into memory (or you don't want to load it all for some other reason), you'll need a different approach. If you need that, please leave a comment.
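For very large files, one possible approach, sketched here under the assumption that reading the file from disk twice is acceptable, is to wrap the opening of the file in a generator function. Every call then starts a fresh streaming pass, and only one row is held in memory at a time while scanning. The helper names iter_rows and find_orders are hypothetical; the column indices 2 and 13 come from the question (Python 3).

```python
import csv

def iter_rows(path):
    """Hypothetical helper: every call re-opens the file and streams its rows."""
    with open(path, newline='') as f:
        for row in csv.reader(f):
            yield row

def find_orders(path):
    """Two streaming passes over the same file, one per query."""
    brazil = [row for row in iter_rows(path) if row[2] == '5' and row[13] == 'Brazil']
    # A new generator re-opens the file, so this pass starts from the first row again
    emp5 = [row for row in iter_rows(path) if row[2] == '5']
    return brazil, emp5
```

Only the matching rows end up in the result lists; the whole file is never loaded at once.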

Answer 2

What you can do is simply convert the reader object into a list:

import csv

with open('orders.csv', 'rb') as ordersFile:
    ordersR = list(csv.reader(ordersFile, delimiter=','))

The reader object is like a generator: once you have iterated over its values, a second loop cannot read them again, because the reader is already exhausted.
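The 'rb' mode is the Python 2 convention for the csv module; in Python 3 the csv documentation asks for the file to be opened in text mode with newline='' instead. A minimal Python 3 sketch of the same list-based idea follows. io.StringIO and the three-column rows are hypothetical stand-ins so the snippet runs on its own; with the real file you would use open('orders.csv', newline='').

```python
import csv
import io

# Stand-in for open('orders.csv', newline=''); sample rows are hypothetical
orders_file = io.StringIO("1,5,Brazil\n2,5,France\n3,7,Brazil\n")
orders = list(csv.reader(orders_file, delimiter=','))

# A list can be iterated as many times as needed
brazil = [row for row in orders if row[1] == '5' and row[2] == 'Brazil']
emp5 = [row for row in orders if row[1] == '5']

print(len(brazil), len(emp5))  # 1 2
```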

Answer 3

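If you do not want to build a list yourself, here is a generator-based approach to iterating over your CSV file twice, using itertools.tee. Be aware, though, that tee has to buffer every row that one iterator has consumed and the other has not yet, so when rows0 is exhausted before rows1 starts (as below), all rows are held in memory anyway; in that situation the itertools documentation notes that list() is faster than tee().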

from csv import reader
from itertools import tee

with open('orders.csv', newline='') as orders_file:
    rows0, rows1 = tee(reader(orders_file, delimiter=','))

    for row in rows0:
        print(row)  # search for something...

    print()

    for row in rows1:
        print(row)  # search for a different thing...

Answer 4

It's better to read through a file only once, because I/O is likely to be the slowest part of your program.

If you need to re-read the file, you can either close it and re-open it, or seek() back to the beginning, i.e. add ordersFile.seek(0) between your loops.
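A sketch of the seek(0) variant in Python 3, assuming a seekable file object. io.StringIO stands in for the opened CSV file so the example is self-contained, and the rows are hypothetical. A new csv.reader is created after rewinding, which keeps the two passes clearly separate.

```python
import csv
import io

# Stand-in for open('orders.csv', newline=''); sample rows are hypothetical
orders_file = io.StringIO("1,5,Brazil\n2,5,France\n3,7,Brazil\n")

# First pass over the file
brazil = [row for row in csv.reader(orders_file) if row[1] == '5' and row[2] == 'Brazil']

# Rewind to the start of the file, then read it a second time
orders_file.seek(0)
emp5 = [row for row in csv.reader(orders_file) if row[1] == '5']
```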

Answer 5

This is a good case for using the pandas module (you need to install it first: pip install pandas).

With pandas you read the file only once and can then apply any kind of filtering easily.

For instance, to filter the data more than once (this assumes the CSV has a header row with employeeID and shipCountry columns), follow this example:

import pandas as pd 

# read csv into a dataframe 
df = pd.read_csv('orders.csv', delimiter=',') 

# get the data that has employeeID == 5
df1 = df[df["employeeID"] == 5]
print(df1) 

# get the data that has employeeID == 5 and shipCountry == "Brazil"
df2 = df[(df["employeeID"] == 5) & (df["shipCountry"] == "Brazil")]
print(df2)
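If you would rather not add the pandas dependency, the standard library's csv.DictReader offers a similar lookup by column name; this is a swapped-in alternative, not part of the pandas answer. Like the pandas code, it assumes a header row with employeeID and shipCountry columns; the orderID column and the sample rows are hypothetical. DictReader keeps every value as a string, so the comparison is against "5", not 5 (Python 3).

```python
import csv
import io

# Stand-in for open('orders.csv', newline=''); header and rows are hypothetical
orders_file = io.StringIO(
    "orderID,employeeID,shipCountry\n"
    "1,5,Brazil\n"
    "2,5,France\n"
    "3,7,Brazil\n"
)

# Read once into a list of dicts keyed by the header names
orders = list(csv.DictReader(orders_file))

# Filter as many times as needed
emp5 = [o for o in orders if o["employeeID"] == "5"]
emp5_brazil = [o for o in emp5 if o["shipCountry"] == "Brazil"]

print([o["orderID"] for o in emp5])         # ['1', '2']
print([o["orderID"] for o in emp5_brazil])  # ['1']
```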

Answer 6

As @Nick T mentioned above, I/O is expensive compared to RAM access, so if you need to iterate over your file more than once, it is better to save its contents to a variable.

You can also combine multiple conditions in a single for loop, so the data is traversed only once:

import csv

with open('orders.csv', 'rb') as ordersFile:
    orders = list(csv.reader(ordersFile, delimiter=','))

# Collect both result sets in a single pass over the rows
emp = []      # all orders with employeeID=5
country = []  # orders with employeeID=5 and shipCountry="Brazil"
for order in orders:
    if order[2] == '5':
        emp.append(order)
        if order[13] == 'Brazil':
            country.append(order)

print 'emp id=5 and shipCountry=Brazil: {}'.format(country)
print 'emp id=5: {}'.format(emp)

Note that this doesn't scale well: you probably don't want to add more if logic to this block, as it quickly becomes unreadable.
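One way to keep a single pass readable as the number of queries grows, sketched here as an assumption rather than the answer's own method, is to describe each query as a named predicate in a dict and collect the matches per name. The names QUERIES, run_queries and make_row and the sample rows are hypothetical; the column indices 2 and 13 come from the question. The reader is consumed directly, so the rows are not stored in a list first (Python 3).

```python
import csv
import io

# Hypothetical queries: name -> predicate on a CSV row (indices from the question)
QUERIES = {
    "emp5": lambda row: row[2] == '5',
    "emp5_brazil": lambda row: row[2] == '5' and row[13] == 'Brazil',
}

def run_queries(rows, queries):
    """Single pass over rows; each row is tested against every predicate."""
    results = {name: [] for name in queries}
    for row in rows:
        for name, matches in queries.items():
            if matches(row):
                results[name].append(row)
    return results

def make_row(order_id, employee_id, country):
    """Build a hypothetical 14-column row so that indices 2 and 13 exist."""
    row = [''] * 14
    row[0], row[2], row[13] = order_id, employee_id, country
    return row

# In-memory stand-in for the CSV file
sample = io.StringIO()
csv.writer(sample).writerows([make_row('1', '5', 'Brazil'),
                              make_row('2', '5', 'France'),
                              make_row('3', '7', 'Brazil')])
sample.seek(0)

results = run_queries(csv.reader(sample), QUERIES)
print({name: [r[0] for r in rows] for name, rows in results.items()})
# {'emp5': ['1', '2'], 'emp5_brazil': ['1']}
```

Adding a query then means adding one entry to the dict rather than another nested if.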

Answers: 6

Answer 1: 5, 2017-09-06 20:13:20
Answer 2: 1, 2017-09-06 20:14:11
Answer 3: 1, 2017-09-06 20:32:31
Answer 4: 0, 2017-09-06 20:13:39
Answer 5: 0, 2017-09-06 20:14:15
Answer 6: 0, 2017-09-06 20:41:06
