How to iterate through two columns in Python?
I'm trying to iterate through two columns of a CSV file using Python. I heard that you have to import pandas for this, but I'm struggling with the coding part.
import csv as csv
import numpy as np
import pandas as pd

csv_file_object = csv.reader(open('train.csv', 'rb'))  # Load in the csv file
header = csv_file_object.next()  # Skip the first line as it is a header
data = []  # Create a variable to hold the data
for row in csv_file_object:  # Step through each row in the csv file,
    data.append(row[0:])  # adding each row to the data variable
data = np.array(data)

def number_of_female_in_class_3(data):
    for row in data.iterow:
        if row[2] == 'female' and row[4] == '3':
            sum += 1
The problem is the function number_of_female_in_class_3. I want to go through two columns: column 2, to check whether the row contains the string 'female', and column 4, to check whether the status is '3'. If both are true, I want to add 1 to sum.
I was wondering if someone could post some simple code showing how to accomplish this?
Here is the train.csv file I'm trying to read:
**PassengerID** | **Survived** | **Pclass** | **Name** | **Sex** |
1 | 0 | 3 | mary | Female |
2 | 1 | 2 | james | Male |
3 | 1 | 3 | Tanya | Female |
Thank you
Indeed, pandas can help you here.
I'm starting with a cleaner CSV:
PassengerID,Survived,Pclass,Name,Sex
1,0,3,mary,female
2,1,2,james,male
3,1,3,tanya,female
If your CSV actually looks like what you posted (not really a CSV), then you will have some wrangling to do (see below). But if you can get pandas to eat it:
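>>> import pandas as pd
>>> # DataFrame.from_csv no longer exists in current pandas; read_csv with index_col=0 does the same job.
>>> df = pd.read_csv('data.csv', index_col=0)
>>> result = df[(df.Sex == 'female') & (df.Survived == 0)]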
Results in a new DataFrame:
>>> result
             Survived  Pclass  Name     Sex
PassengerID
1                   0       3  mary  female
You can do len(result) to get the count you're after.
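The filter above selects female passengers who did not survive, whereas the question asks for female passengers in class 3. A minimal sketch of that variant follows, assuming the cleaned-up CSV shown earlier. The inline CSV_TEXT string and the names mask and female_in_class_3 are only illustrative, used so the example runs without a file on disk.

```python
import io

import pandas as pd

# Inline copy of the cleaned-up CSV, so the sketch runs without a file.
CSV_TEXT = """PassengerID,Survived,Pclass,Name,Sex
1,0,3,mary,female
2,1,2,james,male
3,1,3,tanya,female
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), index_col=0)

# Boolean mask: True for rows where both conditions hold.
mask = (df["Sex"] == "female") & (df["Pclass"] == 3)

# Summing a boolean Series counts its True values.
female_in_class_3 = int(mask.sum())
print(female_in_class_3)
```

Summing the mask avoids building an intermediate DataFrame when only the count is needed; len(df[mask]) should give the same number.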
If you're stuck with that nasty CSV, you can get your df like so:
# Load using a different delimiter.
df = pd.read_csv('data.csv', sep="|", index_col=0)
# Rename the index.
df.index.names = ['PassID']
# Rename the columns, using X for the bogus one.
df.columns = ['Survived', 'Pclass', 'Name', 'Sex', 'X']
# Remove the 'extra' column.
del df['X']
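Loading with a different delimiter still leaves the spaces around each value, so a comparison such as Sex == 'female' may not match. One possible way to handle that, sketched under the assumption that the file looks like the table in the question (without the bold markers) and has a trailing '|' on every line, is to read everything as text and strip it afterwards. RAW_TEXT, raw and clean are illustrative names, not part of the original answer.

```python
import io

import pandas as pd

# Hypothetical stand-in for the pipe-delimited file, trailing '|' included.
RAW_TEXT = """PassengerID | Survived | Pclass | Name | Sex |
1 | 0 | 3 | mary | Female |
2 | 1 | 2 | james | Male |
3 | 1 | 3 | Tanya | Female |
"""

# Read every field as text so no value is converted before it is stripped.
raw = pd.read_csv(io.StringIO(RAW_TEXT), sep="|", dtype=str)

# The trailing '|' produces an extra, completely empty column; drop it.
raw = raw.dropna(axis="columns", how="all")

# Strip the padding from the column names and from every value.
raw.columns = raw.columns.str.strip()
clean = raw.apply(lambda col: col.str.strip())

# Normalise case so 'Female' and 'female' compare equal.
clean["Sex"] = clean["Sex"].str.lower()

# Pclass is still text here, so compare against the string '3'.
count = int(((clean["Sex"] == "female") & (clean["Pclass"] == "3")).sum())
print(count)
```

Keeping Pclass as text is a deliberate simplification; converting it with astype(int) after stripping would allow numeric comparisons instead.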
I think this is what you need:
import csv

def number_of_female_in_class_3(data):
    # initialize sum variable
    sum = 0
    for row in data:
        if row[4] == 'Female' and row[2] == '3':
            # match
            sum += 1
    # return the result
    return sum

# Load in the csv file (Python 3: text mode with newline='')
with open('train.csv', newline='') as f:
    csv_file_object = csv.reader(f, delimiter='|')
    # skip the header
    header = next(csv_file_object)
    data = []
    for row in csv_file_object:
        # add each row of data to the data list, stripping excess whitespace
        data.append(list(map(str.strip, row)))

# print the result
print(number_of_female_in_class_3(data))
Some explanation:
First of all, in your file you have Female with an uppercase F. Secondly, you had your column numbers backwards: gender is in column 5 (index 4) and class is in column 3 (index 2). You need to initialize the sum variable to 0 before you start incrementing it. numpy and pandas are not needed here, although you do need to apply the strip function to every element in each row to remove excess spaces ( map(str.strip, row) ) and also pass delimiter='|' into csv.reader, because the default delimiter is a comma. Lastly, you need to return sum at the end of your function.
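Since the column mix-up came from counting positions by hand, one option is to look the positions up from the header row instead. The following is only a sketch using the standard csv module; the function name count_matching, its parameters and the inline RAW_TEXT sample are assumptions made for illustration, and the sample assumes the file looks like the table in the question without the bold markers.

```python
import csv
import io

# Hypothetical stand-in for train.csv, trailing '|' included.
RAW_TEXT = """PassengerID | Survived | Pclass | Name | Sex |
1 | 0 | 3 | mary | Female |
2 | 1 | 2 | james | Male |
3 | 1 | 3 | Tanya | Female |
"""

def count_matching(lines, sex, pclass):
    """Count rows whose Sex and Pclass columns match the given values."""
    reader = csv.reader(lines, delimiter="|")
    # Find the column positions from the header instead of hard-coding them.
    header = [name.strip() for name in next(reader)]
    sex_idx = header.index("Sex")
    pclass_idx = header.index("Pclass")

    count = 0
    for row in reader:
        if not row:
            continue  # ignore blank lines
        fields = [field.strip() for field in row]
        # Compare case-insensitively so 'Female' matches 'female'.
        if fields[sex_idx].lower() == sex.lower() and fields[pclass_idx] == str(pclass):
            count += 1
    return count

result = count_matching(io.StringIO(RAW_TEXT), "female", 3)
print(result)
```

To run it against a real file, an open file object (for example from open('train.csv', newline='')) can be passed in place of the io.StringIO wrapper.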