Count multiple occurrences in imported .csv file

Starting from a large imported data set, I am trying to identify and print each line that corresponds to a city with at least 2 unique colleges/universities.

So far (the relevant code):

for line in file:

    fields = line.split(",")
    ID, name, city = fields[0], fields[1], fields[3]
    count = line.count()

if line.count(city) >= 2:
    if line.count(ID) < 2:
    print "ID:", ID, "Name: ", name, "City: ", city

In other words, I want to be able to eliminate 1) any duplicate school listings (by ID; many institutions appear repeatedly in this file), and 2) any cities that do not have two or more institutions.

Thank you!

dicts come in handy when you want to organize data by some key. In your case, nested dicts that index first by city and then by ID should do the trick: storing each record under its ID drops duplicate listings, and the number of IDs stored under a city is the number of unique institutions there.

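# will hold cities[city][ID] = [ID, name, city]
cities = {}

for line in file:
    # same comma-separated layout as in the question: ID, name and city are columns 0, 1 and 3
    fields = line.strip().split(",")
    ID, name, city = fields[0], fields[1], fields[3]
    # key by city first, then by ID, so a repeated ID overwrites itself instead of adding a duplicate
    cities.setdefault(city, {})[ID] = [ID, name, city]

# each value in 'cities' maps IDs to records for one city; keep the records of cities with at least 2 IDs
multi_schooled_cities = [list(ids_by_city.values()) for ids_by_city in cities.values() if len(ids_by_city) >= 2]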
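
For reference, the same nested-dict idea could be sketched end to end roughly as follows. This is only an illustration under assumptions: it is written for Python 3 (the code in the question uses the Python 2 print statement), it assumes ID, name and city sit in columns 0, 1 and 3 as in the question, and the function name `cities_with_multiple_schools`, the `min_schools` parameter and the sample rows are all made up for the example. It uses the standard `csv` module rather than `split(",")` so that quoted fields containing commas are parsed correctly.

```python
import csv
import io


def cities_with_multiple_schools(rows, min_schools=2):
    """Group rows by city, de-duplicate by ID, keep cities with enough schools."""
    cities = {}  # cities[city][school_id] = (school_id, name, city)
    for fields in rows:
        if len(fields) < 4:
            continue  # skip blank or short lines
        # Assumed column layout, taken from the question: 0 = ID, 1 = name, 3 = city
        school_id, name, city = fields[0], fields[1], fields[3]
        # The inner setdefault keeps the first record seen for each ID in a city
        cities.setdefault(city, {}).setdefault(school_id, (school_id, name, city))
    return {
        city: list(schools.values())
        for city, schools in cities.items()
        if len(schools) >= min_schools
    }


# Hypothetical sample data; the third column is invented filler
sample = io.StringIO(
    "1,Alpha College,XX,Springfield\n"
    "1,Alpha College,XX,Springfield\n"
    "2,Beta University,XX,Springfield\n"
    "3,Gamma Institute,YY,Shelbyville\n"
)

result = cities_with_multiple_schools(csv.reader(sample))
for city, schools in result.items():
    for school_id, name, _ in schools:
        print("ID:", school_id, "Name:", name, "City:", city)
```

With the sample rows above, the duplicated listing for ID 1 is stored once, and Shelbyville is dropped because it has only one institution. For a real file, an open file object (opened with `newline=""`, as the `csv` documentation recommends) can be passed to `csv.reader` in place of the `StringIO` object.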
