
How to filter a CSV in Python

I have a CSV file named film.csv. The column headers are as follows (with a few example rows):

Year;Length;Title;Subject;Actor;Actress;Director;Popularity;Awards;*Image
1990;111;Tie Me Up! Tie Me Down!;Comedy;Banderas, Antonio;Abril, Victoria;Almodóvar, Pedro;68;No;NicholasCage.png
1991;113;High Heels;Comedy;Bosé, Miguel;Abril, Victoria;Almodóvar, Pedro;68;No;NicholasCage.png
1983;104;Dead Zone, The;Horror;Walken, Christopher;Adams, Brooke;Cronenberg, David;79;No;NicholasCage.png
1979;122;Cuba;Action;Connery, Sean;Adams, Brooke;Lester, Richard;6;No;seanConnery.png
1978;94;Days of Heaven;Drama;Gere, Richard;Adams, Brooke;Malick, Terrence;14;No;NicholasCage.png
1983;140;Octopussy;Action;Moore, Roger;Adams, Maud;Glen, John;68;No;NicholasCage.png

I need to parse this CSV with basic commands (not using Pandas).

  1. How would I extract all movie titles where the actor's first name is Richard, the movie was made before 1985, and award = yes? (I have been able to get it to show the list where awards == yes, but not the rest.)

  2. How can I count how many times any given actor appears in the list?

file_name = "film.csv"
print('loading file')
lines = (line for line in open(file_name,encoding='cp1252')) #generator to capture lines
print('removing ;')
lists = (s.rstrip().split(";") for s in lines) #generators to capture lists containing values from lines

print('2-filter by awards')
sel = input()

if sel == '2': 
    cols = next(lists)  # consumes only the header row
    print(cols)
    collections = (dict(zip(cols,data)) for data in lists)
    
    filtered = (col["Title"] for col in collections if col["Awards"][0]== "Y")
    for item in filtered:
        print(item)
    #   input()

        
#browse lists and index them per header values, then filter all movies that have been awarded
#using a new generator object
else:
    pass  # other menu options are not implemented yet

To read and filter the data you can use the following example (I'm using award == No, because your sample data has no movie with award == Yes that also meets the other criteria):

import csv
from collections import Counter

with open("data.csv", "r") as f_in:
    reader = csv.DictReader(f_in, delimiter=";")
    data = list(reader)

# extract all movie titles with the actor first name = Richard , made before year 1985 , and award = No

for d in data:
    if (
        d["Actor"].split(", ")[-1] == "Richard"
        and int(d["Year"]) < 1985
        and d["Awards"] == "No"
    ):
        print(d)

Prints:

{
    "Year": "1978",
    "Length": "94",
    "Title": "Days of Heaven",
    "Subject": "Drama",
    "Actor": "Gere, Richard",
    "Actress": "Adams, Brooke",
    "Director": "Malick, Terrence",
    "Popularity": "14",
    "Awards": "No",
    "*Image": "NicholasCage.png",
}
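
Since the question asks for titles only, the same conditions can be wrapped in a small generator. This is a minimal sketch, not part of the original answer: `matching_titles` is a hypothetical helper name, and the sample rows are inlined through `io.StringIO` so the snippet runs without a file on disk.

```python
import csv
import io

# Three rows copied from the question; a real script would open film.csv instead.
SAMPLE = """Year;Length;Title;Subject;Actor;Actress;Director;Popularity;Awards;*Image
1983;104;Dead Zone, The;Horror;Walken, Christopher;Adams, Brooke;Cronenberg, David;79;No;NicholasCage.png
1979;122;Cuba;Action;Connery, Sean;Adams, Brooke;Lester, Richard;6;No;seanConnery.png
1978;94;Days of Heaven;Drama;Gere, Richard;Adams, Brooke;Malick, Terrence;14;No;NicholasCage.png
"""


def matching_titles(rows, first_name, before_year, awards):
    """Yield the Title of every row that meets all three conditions (hypothetical helper)."""
    for row in rows:
        # Actor is stored as "Last, First"; take the part after the last ", ".
        actor_first = row["Actor"].split(", ")[-1]
        if (
            actor_first == first_name
            and int(row["Year"]) < before_year
            and row["Awards"] == awards
        ):
            yield row["Title"]


reader = csv.DictReader(io.StringIO(SAMPLE), delimiter=";")
titles = list(matching_titles(reader, "Richard", 1985, "No"))
print(titles)  # ['Days of Heaven']
```

Passing "Yes" as the last argument gives the filter from the question; with these sample rows it would yield nothing.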

To count how many times each actor appears, you can use collections.Counter:

cnt = Counter(d["Actor"] for d in data)
print(cnt)

Prints:

Counter(
    {
        "Banderas, Antonio": 1,
        "Bosé, Miguel": 1,
        "Walken, Christopher": 1,
        "Connery, Sean": 1,
        "Gere, Richard": 1,
        "Moore, Roger": 1,
    }
)
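
To get the count for one given actor rather than the whole table, the Counter can be indexed like a dictionary. A short sketch, using the Actor values from the sample rows as a plain list (the name "Nobody, Someone" is made up to show the missing-key case):

```python
from collections import Counter

# Actor column values from the six sample rows in the question.
actors = [
    "Banderas, Antonio",
    "Bosé, Miguel",
    "Walken, Christopher",
    "Connery, Sean",
    "Gere, Richard",
    "Moore, Roger",
]

cnt = Counter(actors)

print(cnt["Gere, Richard"])    # 1
# A Counter returns 0 for a key it has never seen instead of raising KeyError.
print(cnt["Nobody, Someone"])  # 0
```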

This will print the titles of all movies where the actor's first name is Richard, the movie was made before 1985, and awards == Yes:

# Read all lines, closing the file when done
with open('test.csv', 'r') as f:
    lines = f.readlines()
columns = lines[0].strip().split(';')

lines.pop(0)

for i in lines:
    x = i.strip().split(';')
    # Checking if the movie was made before 1985
    if int(x[columns.index('Year')]) < 1985:
        # Checking if the actor's first name is Richard
        if x[columns.index('Actor')].split(', ')[1] == 'Richard':
            # Checking if awards == Yes
            if x[columns.index('Awards')] == 'Yes':
                # Printing out the title of the movie
                print(x[columns.index('Title')])
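
One caveat with `split(', ')[1]`: it raises IndexError when an Actor value contains no comma. Whether the full film.csv has such rows is an assumption, since only a few sample rows are shown. A defensive variant could look like this (`first_name` is a hypothetical helper, not part of the answer above):

```python
def first_name(actor):
    """Return the first name from a "Last, First" string, or "" if there is no ", "."""
    # str.partition never raises: when ", " is absent, the third item is "".
    _, _, first = actor.partition(", ")
    return first.strip()


print(first_name("Gere, Richard"))  # Richard
print(repr(first_name("Madonna")))  # ''
```

The check in the loop would then read `first_name(x[columns.index('Actor')]) == 'Richard'`.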

Counting how many times a given actor appears in the list:

name = "Gere, Richard" #   Given actor name

count = 0
for i in lines:
    x = i.strip().split(';')
    # Checking if the actor's name is the given name
    if x[columns.index('Actor')] == name:
        # If it is, add 1 to the count
        count += 1

print('count:', count)

Output: count: 1
