How to filter a CSV in Python
I have a CSV file named film.csv. The column headers are as follows (with a few sample rows):
Year;Length;Title;Subject;Actor;Actress;Director;Popularity;Awards;*Image
1990;111;Tie Me Up! Tie Me Down!;Comedy;Banderas, Antonio;Abril, Victoria;Almodóvar, Pedro;68;No;NicholasCage.png
1991;113;High Heels;Comedy;Bosé, Miguel;Abril, Victoria;Almodóvar, Pedro;68;No;NicholasCage.png
1983;104;Dead Zone, The;Horror;Walken, Christopher;Adams, Brooke;Cronenberg, David;79;No;NicholasCage.png
1979;122;Cuba;Action;Connery, Sean;Adams, Brooke;Lester, Richard;6;No;seanConnery.png
1978;94;Days of Heaven;Drama;Gere, Richard;Adams, Brooke;Malick, Terrence;14;No;NicholasCage.png
1983;140;Octopussy;Action;Moore, Roger;Adams, Maud;Glen, John;68;No;NicholasCage.png
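I need to parse this CSV with basic commands (without using Pandas).
How would I extract all movie titles where the actor's first name is Richard, the movie was made before 1985, and Awards = Yes? (I have been able to get it to show the list where awards == yes, but not the rest.)
How can I count how many times any given actor appears in the list?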
file_name = "film.csv"
print('loading file')
lines = (line for line in open(file_name, encoding='cp1252'))  # generator to capture lines
print('removing ;')
lists = (s.rstrip().split(";") for s in lines)  # generator of lists containing the values from each line
print('2-filter by awards')
sel = input()
if sel == '2':
    cols = next(lists)  # obtains only the header
    print(cols)
    # browse lists and index them per header values, then filter all movies
    # that have been awarded, using a new generator object
    collections = (dict(zip(cols, data)) for data in lists)
    filtered = (col["Title"] for col in collections if col["Awards"][0] == "Y")
    for item in filtered:
        print(item)
    # input()
else:
    pass  # nothing implemented for other selections yet
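To read and filter the data, you can use the following example (I am using award == No, because your sample contains no movie with award == Yes that also meets the other conditions):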
import csv
from collections import Counter

with open("data.csv", "r") as f_in:
    reader = csv.DictReader(f_in, delimiter=";")
    data = list(reader)

# extract all movie titles with the actor first name = Richard, made before year 1985, and award = No
for d in data:
    if (
        d["Actor"].split(", ")[-1] == "Richard"
        and int(d["Year"]) < 1985
        and d["Awards"] == "No"
    ):
        print(d)
Prints:
{
"Year": "1978",
"Length": "94",
"Title": "Days of Heaven",
"Subject": "Drama",
"Actor": "Gere, Richard",
"Actress": "Adams, Brooke",
"Director": "Malick, Terrence",
"Popularity": "14",
"Awards": "No",
"*Image": "NicholasCage.png",
}
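The same filter can also be wrapped in a small function so that the first name, the year cutoff, and the award value become parameters. The following is only a sketch under assumptions: `titles_matching` and `SAMPLE` are hypothetical names that do not appear in the answer, and the rows are copied from the question and held in memory so the code runs without film.csv.

```python
import csv
import io

# A few rows copied from the question, kept in memory for illustration.
SAMPLE = """Year;Length;Title;Subject;Actor;Actress;Director;Popularity;Awards;*Image
1983;104;Dead Zone, The;Horror;Walken, Christopher;Adams, Brooke;Cronenberg, David;79;No;NicholasCage.png
1979;122;Cuba;Action;Connery, Sean;Adams, Brooke;Lester, Richard;6;No;seanConnery.png
1978;94;Days of Heaven;Drama;Gere, Richard;Adams, Brooke;Malick, Terrence;14;No;NicholasCage.png
"""


def titles_matching(rows, first_name, before_year, awards):
    """Return the titles of rows matching all three conditions (hypothetical helper)."""
    titles = []
    for row in rows:
        # Actor is stored as "Last, First"; take the part after the last ", ".
        actor_first = row["Actor"].split(", ")[-1]
        if (
            actor_first == first_name
            and int(row["Year"]) < before_year
            and row["Awards"].strip().lower() == awards.lower()
        ):
            titles.append(row["Title"])
    return titles


rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter=";"))
print(titles_matching(rows, "Richard", 1985, "No"))   # ['Days of Heaven']
print(titles_matching(rows, "Richard", 1985, "Yes"))  # []
```

With the real file, `rows` would come from `csv.DictReader(f_in, delimiter=";")` as in the answer, and passing `"Yes"` gives the titles the question asks for.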
To count how often each actor appears, you can use collections.Counter:
cnt = Counter(d["Actor"] for d in data)
print(cnt)
Prints:
Counter(
{
"Banderas, Antonio": 1,
"Bosé, Miguel": 1,
"Walken, Christopher": 1,
"Connery, Sean": 1,
"Gere, Richard": 1,
"Moore, Roger": 1,
}
)
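Since the question asks about any given actor, it may help to note that a Counter can be indexed like a dict, and that a missing key yields 0 rather than raising KeyError. A minimal sketch, assuming the Actor values from the sample rows; the list `actors` and the name "Cage, Nicolas" are placeholders used only for illustration.

```python
from collections import Counter

# Actor column values from the sample rows in the question.
actors = [
    "Banderas, Antonio",
    "Bosé, Miguel",
    "Walken, Christopher",
    "Connery, Sean",
    "Gere, Richard",
    "Moore, Roger",
]

cnt = Counter(actors)

# Look up a single actor; a name that never occurs returns 0.
print(cnt["Gere, Richard"])  # 1
print(cnt["Cage, Nicolas"])  # 0

# Count by first name only: take the part after the last ", ".
first_names = Counter(name.split(", ")[-1] for name in actors)
print(first_names["Richard"])  # 1
```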
This prints the titles of all movies where the actor's first name is Richard, the movie was made before 1985, and awards == Yes:
with open('test.csv', 'r') as f:
    lines = f.readlines()
columns = lines[0].strip().split(';')
lines.pop(0)  # drop the header so only data rows remain
for i in lines:
    x = i.strip().split(';')
    # Checking if the movie was made before 1985
    if int(x[columns.index('Year')]) < 1985:
        # Checking if the actor's first name is Richard
        if x[columns.index('Actor')].split(', ')[1] == 'Richard':
            # Checking if awards == Yes
            if x[columns.index('Awards')] == 'Yes':
                # Printing out the title of the movie
                print(x[columns.index('Title')])
To count how many times a given actor appears in the list:
name = "Gere, Richard"  # Given actor name
count = 0
for i in lines:
    x = i.strip().split(';')
    # Checking if the actor's name is the given name
    if x[columns.index('Actor')] == name:
        # If it is, add 1 to the count
        count += 1
print("Count:", count)
Output: Count: 1
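For comparison, the generator style used in the question can carry all three conditions without the csv module. This is a sketch, not taken from either answer: `parse_rows`, `sample_lines`, and `wanted_awards` are hypothetical names, and the rows are copied from the question.

```python
def parse_rows(lines, sep=";"):
    """Lazily yield one dict per data line, keyed by the header fields."""
    fields = (line.rstrip("\r\n").split(sep) for line in lines if line.strip())
    header = next(fields, None)
    if header is None:  # empty input: nothing to yield
        return
    for values in fields:
        yield dict(zip(header, values))


# Rows copied from the question; with the real file, pass the open file object instead.
sample_lines = [
    "Year;Length;Title;Subject;Actor;Actress;Director;Popularity;Awards;*Image\n",
    "1979;122;Cuba;Action;Connery, Sean;Adams, Brooke;Lester, Richard;6;No;seanConnery.png\n",
    "1978;94;Days of Heaven;Drama;Gere, Richard;Adams, Brooke;Malick, Terrence;14;No;NicholasCage.png\n",
    "1983;140;Octopussy;Action;Moore, Roger;Adams, Maud;Glen, John;68;No;NicholasCage.png\n",
]

wanted_awards = "No"  # these sample rows contain no "Yes"; use "Yes" with the real file

matches = [
    row["Title"]
    for row in parse_rows(sample_lines)
    if row["Actor"].split(", ")[-1] == "Richard"
    and int(row["Year"]) < 1985
    and row["Awards"] == wanted_awards
]
print(matches)  # ['Days of Heaven']
```

Because `parse_rows` accepts any iterable of lines, something like `with open("film.csv", encoding="cp1252") as f: rows = parse_rows(f)` should work for the file itself, provided the rows are consumed while the file is still open.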