Python conditional filtering in a CSV file

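Please help! I've tried different things/packages to write a program that takes 4 inputs and returns the writing-score statistics for a group, based on the combination of those inputs, from a CSV file. This is my first project, so any help is appreciated!

Here is a sample of the CSV (there are 200 rows in total):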

id   gender  ses     schtyp  prog      write
70   male    low     public  general   52
121  female  middle  public  vocation  68
86   male    high    public  general   33
141  male    high    public  vocation  63
172  male    middle  public  academic  47
113  male    middle  public  academic  44
50   male    middle  public  general   59
11   male    middle  public  academic  34
84   male    middle  public  general   57
48   male    middle  public  academic  57
75   male    middle  public  vocation  60
60   male    middle  public  academic  57

Here is what I have so far:

import csv
import numpy

# read the file (Python 3: next() and input() replace .next() and raw_input())
with open('scores.csv', newline='') as f:
    csv_file_object = csv.reader(f)
    header = next(csv_file_object)  # skip the header row
    data = []  # load the rows into a list for processing
    for row in csv_file_object:
        data.append(row)
data = numpy.array(data)

# ask for the four inputs
gender = input('Enter gender [male/female]: ')
schtyp = input('Enter school type [public/private]: ')
ses = input('Enter socioeconomic status [low/middle/high]: ')
prog = input('Enter program status [general/vocation/academic]: ')

# normalize the inputs to lower case (input() already returns strings)
prog = prog.lower()
gender = gender.lower()
schtyp = schtyp.lower()
ses = ses.lower()

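What I'm missing is how to filter and get statistics for one specific group only. For example, say I enter male, public, middle, and academic: I'd want the average writing score for that subset. I tried the groupby function from pandas, but that only gave me statistics for broad groups (e.g. public vs. private). I also tried a pandas DataFrame, but that only let me filter on one input, and I wasn't sure how to get the writing scores. Any hints would be greatly appreciated!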

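Agree with Ramon: pandas is definitely the way to go, and it has extraordinary filtering/subsetting capability once you get used to it. But it can be tough to wrap your head around at first (or at least it was for me), so I dug up some examples of the subsetting you need from some of my old code. The variable itu below is a pandas DataFrame with data on various countries over time.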

# Subsetting by using True/False:
subset = itu['CntryName'] == 'Albania'  # returns True/False values
itu[subset]  # returns 1x144 DataFrame of only data for Albania
itu[itu['CntryName'] == 'Albania']  # one-line command, equivalent to the above two lines

# Pandas has many built-in functions like .isin() to provide params to filter on    
itu[itu.cntrycode.isin(['USA','FRA'])]  # returns where itu['cntrycode'] is 'USA' or 'FRA'
itu[itu.year.isin([2000,2001,2002])]  # Returns all of itu for only years 2000-2002
# Advanced subsetting can include logical operations:
itu[itu.cntrycode.isin(['USA','FRA']) & itu.year.isin([2000,2001,2002])]  # Both of above at same time

# Use .loc with two elements to simultaneously select by row/index & column:
itu.loc['USA','CntryName']
itu.iloc[204,0]
itu.loc[['USA','BHS'], ['CntryName', 'Year']]
itu.iloc[[204, 13], [0, 1]]

# Can do many operations at once, but this reduces "readability" of the code
itu[itu.cntrycode.isin(['USA','FRA']) & 
    itu.year.isin([2000,2001,2002])].loc[:, ['cntrycode','cntryname','year','mpen','fpen']]

# Finally, if you're comfortable with using map() and list comprehensions,
# you can do some advanced subsetting that includes evaluations & functions
# to determine what elements you want to select from the whole, such as all
# countries whose name begins with "United":
criterion = itu['CntryName'].map(lambda x: x.startswith('United'))
itu[criterion]['CntryName']  # gives us UAE, UK, & US
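
Applied to the question's data, the same boolean-mask idea can combine all four inputs with `&` and then take statistics on the `write` column. This is only a minimal sketch: it assumes the column names match the sample header, the inline sample stands in for scores.csv, and the helper name `subset_scores` is made up for illustration.

```python
import io
import pandas as pd

# Stand-in for scores.csv: the 12 sample rows shown in the question
sample = io.StringIO("""id gender ses schtyp prog write
70 male low public general 52
121 female middle public vocation 68
86 male high public general 33
141 male high public vocation 63
172 male middle public academic 47
113 male middle public academic 44
50 male middle public general 59
11 male middle public academic 34
84 male middle public general 57
48 male middle public academic 57
75 male middle public vocation 60
60 male middle public academic 57
""")
scores = pd.read_csv(sample, sep=r"\s+")


def subset_scores(df, gender, schtyp, ses, prog):
    """Return the 'write' values for rows matching all four inputs."""
    # Each comparison yields a True/False Series; & keeps rows where all hold.
    # The parentheses are required because & binds tighter than ==.
    mask = (
        (df["gender"] == gender)
        & (df["schtyp"] == schtyp)
        & (df["ses"] == ses)
        & (df["prog"] == prog)
    )
    return df.loc[mask, "write"]


writes = subset_scores(scores, "male", "public", "middle", "academic")
print(writes.mean())      # average writing score for the subset
print(writes.describe())  # count, mean, std, min, quartiles, max
```

With the real file, the four arguments would come from the lower-cased user inputs. If no row matches, the result is an empty Series and `mean()` returns NaN, so it may be worth checking `writes.empty` first.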

Use pandas. I think it will shorten your CSV parsing work and give you the subsetting functionality you're asking for...

import pandas as pd
data = pd.read_csv('fileName.txt', sep=r'\s+')  # whitespace-delimited; replaces the deprecated delim_whitespace=True

#get all of the male students
data[data['gender'] == 'male']
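
groupby is not limited to one broad grouping either: passing all four column names groups by every combination at once, and a single combination can then be looked up by its tuple of values. A rough sketch, assuming the column names from the sample header; the inline sample again stands in for the real file, and the variable names are only illustrative.

```python
import io
import pandas as pd

# Stand-in for the real file: the 12 sample rows shown in the question
sample = io.StringIO("""id gender ses schtyp prog write
70 male low public general 52
121 female middle public vocation 68
86 male high public general 33
141 male high public vocation 63
172 male middle public academic 47
113 male middle public academic 44
50 male middle public general 59
11 male middle public academic 34
84 male middle public general 57
48 male middle public academic 57
75 male middle public vocation 60
60 male middle public academic 57
""")
data = pd.read_csv(sample, sep=r"\s+")

# Group by every combination of the four inputs and summarise 'write'
keys = ["gender", "schtyp", "ses", "prog"]
stats = data.groupby(keys)["write"].agg(["count", "mean", "min", "max"])
print(stats)

# Look up one combination; the tuple order must match the order of `keys`
row = stats.loc[("male", "public", "middle", "academic")]
print(row["mean"])
```

A combination that never occurs in the data is absent from `stats`, so the lookup raises a KeyError; wrapping it in try/except is one way to report "no matching students".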

