
Python dictionary ordering and printing

I have a CSV file listing mountains, their heights, and the mountain ranges they belong to. From this file I have to extract each mountain range together with the heights of its mountains (several mountains share the same range), and I also have to count how many times each range appears. Then my program should print the first two mountain ranges and their heights.

Here are the guidelines:

Rewrite your function to use the collections module's Counter to count how many times each mountain range is mentioned. Each row contains a mountain, its height, and the range it is part of. The ranges are still in the 3rd column of the mountains.csv file. You can use more than one function if you want.

        with open('mountains.csv', 'r') as handle:
            for line in handle:

Shishapangma,8027,Himalayas,

I tried to modify your code as little as possible based on the other comments. This should achieve what you are looking for using your existing function/code.

from collections import Counter
from collections import defaultdict
from statistics import mean

def mountain_ranges(filename):
    ranges = Counter()             # range name -> number of mountains
    heights = defaultdict(list)    # range name -> list of heights
    msg = "The average height of {} is {} meters."
    err_msg = "Error: File doesn't exist or is unreadable."

    try:
        # open the file passed in rather than a hard-coded name
        with open(filename, 'r') as handle:
            for line in handle:
                # each row ends with a trailing comma, hence the extra field
                mnt, height, range_, _ = line.strip().split(',')
                ranges[range_] += 1
                heights[range_].append(int(height))

    except IOError:
        print(err_msg)
        return  # nothing to report if the file could not be read

    print("The two most common ranges are:\n")
    print([r[0] for r in ranges.most_common(2)])

    for range_ in heights:
        print(msg.format(range_, mean(heights[range_])))

Every time you handle a line of the csv file, you add all key/value pairs in dictionary to your ranges and heights variables.

Try it like this:

with open('mountains.csv', 'r') as handle:
    for line in handle:
        key = line.replace('"', '').strip()
        k = key.split(",")
        # The original code always overwrote the value here:
        # dictionary[k[2]] = k[1]
        if k[2] not in dictionary:
            dictionary[k[2]] = []
        dictionary[k[2]].append(k[1])
    # These loops were moved out of the for-loop above
    for key, value in dictionary.items():
        ranges[key] += len(value)    # one count per mountain in the range
    for key, value in dictionary.items():
        heights[key].extend(value)   # add the heights themselves, not a nested list

To get rid of that key-existence check, use defaultdict from collections.
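A minimal sketch of that idea, assuming rows laid out like the sample line in the question (three fields plus a trailing comma). The `rows` list and the name `heights_by_range` are illustrative only and do not come from the original code:

```python
from collections import defaultdict

# Illustrative sample rows (assumption: same layout as mountains.csv)
rows = [
    "Mount Everest,8848,Himalayas,",
    "K2,8611,Karakoram,",
    "Shishapangma,8027,Himalayas,",
]

# defaultdict(list) creates the empty list on first access,
# so no "if key not in dictionary" check is needed
heights_by_range = defaultdict(list)
for line in rows:
    fields = line.strip().split(",")
    heights_by_range[fields[2]].append(int(fields[1]))

print(dict(heights_by_range))
# prints {'Himalayas': [8848, 8027], 'Karakoram': [8611]}
```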

It is easier to use the csv module to read data like this. Since order is meaningful, use a list:

import csv
with open('mountains.csv') as f:
    # rng instead of range, to avoid shadowing the built-in
    data=[[mtn,int(height),rng] for (mtn,height,rng,_) in csv.reader(f)]

# the extra _ is because you have an extra blank field in your example
# from the trailing ,
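To see where that extra blank field comes from, here is a small sketch that feeds `csv.reader` two rows from an in-memory string instead of the real file. The `sample` string is an illustrative stand-in that assumes every row ends with a trailing comma, as in the question's sample line:

```python
import csv
import io

# Illustrative stand-in for the file (assumption: rows end with a trailing comma)
sample = io.StringIO("Mount Everest,8848,Himalayas,\nK2,8611,Karakoram,\n")

rows = list(csv.reader(sample))
for row in rows:
    # the trailing comma yields a fourth, empty field
    print(row)

# prints ['Mount Everest', '8848', 'Himalayas', '']
# prints ['K2', '8611', 'Karakoram', '']
```

Note that every field comes back as a string, which is why the height is passed through `int()` when building `data`.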

Or, if you cannot use csv, in this simple case you can replicate its behavior like this:

with open('mountains.csv') as f:
    # strip() drops the newline so the last field is the empty string
    data=[[mtn,int(height),rng]
             for (mtn,height,rng,_) in (line.strip().split(',') for line in f)]

That is it. You now have the data needed to answer all the other questions:

print(data)

[['Mount Everest', 8848, 'Himalayas'], ['K2', 8611, 'Karakoram'], 
 ['Kangchenjunga', 8586, 'Himalayas'], ['Lhotse', 8516, 'Himalayas'], 
 ['Makalu', 8485, 'Himalayas'], ['Cho Oyu', 8201, 'Himalayas'], 
 ['Dhaulagiri', 8167, 'Himalayas'], ['Manaslu', 8163, 'Himalayas'], 
 ['Nanga Parbat', 8126, 'Himalayas'], ['Annapurna', 8091, 'Himalayas'], 
 ['Gasherbrum I', 8080, 'Karakoram'], ['Broad Peak', 8051, 'Karakoram'], 
 ['Gasherbrum II', 8035, 'Karakoram'], ['Shishapangma', 8027, 'Himalayas']]

from collections import Counter 
print(Counter(row[2] for row in data))

# prints Counter({'Himalayas': 10, 'Karakoram': 4})

print(f'Average height of all mountains: {sum(row[1] for row in data)/len(data)}')

# prints "Average height of all mountains: 8284.785714285714"

And by range:

for rng, cnt in (Counter(row[2] for row in data).items()):
    print(f'Average for {rng}: {sum(row[1] for row in data if row[2]==rng)/cnt}')   

Prints:

Average for Himalayas: 8321.0
Average for Karakoram: 8194.25

You could try something like this:

from collections import Counter
from collections import defaultdict
from collections import OrderedDict
from statistics import mean   # this also exists in numpy if you prefer
# define your dicts inside the function, so they can be re-used each time it is called.


def mountain_ranges(filename):
    ranges = Counter()
    heights = defaultdict(list)
    err_msg = "Error: File doesn't exist or is unreadable."

    temp_list = []
    try:
        # open the file passed in rather than a hard-coded name
        with open(filename, 'r') as handle:
            for row in handle:
                temp_list.append(row.split(','))

            for item in temp_list:
                ranges[item[2].rstrip()] += 1
                heights[item[2].rstrip()].append(item[1])

    except IOError:
        print(err_msg)
        return  # nothing to report if the file could not be read
    ranges = OrderedDict(ranges.most_common(2))
    print(ranges)
    print(heights)

Output:

OrderedDict([('Himalayas', 10), ('Karakoram', 4)])
defaultdict(<class 'list'>, {'Himalayas': ['8848', '8586', '8516', '8485', '8201', '8167', '8163', '8126', '8091', '8027'], 'Karakoram': ['8611', '8080', '8051', '8035']})
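The question also asks for the first two ranges together with their heights. One possible way to combine the two results is sketched below; it assumes a `heights` mapping shaped like the one printed above (shortened here for illustration, with heights still stored as strings), and the name `top_two` is hypothetical:

```python
from collections import Counter

# Shape of the `heights` mapping printed above, shortened for illustration
heights = {
    "Himalayas": ["8848", "8586", "8516"],
    "Karakoram": ["8611", "8080"],
}

# Count mountains per range, then keep the two most frequent ranges
ranges = Counter({name: len(values) for name, values in heights.items()})

top_two = {
    name: [int(h) for h in heights[name]]   # convert the string heights to int
    for name, _count in ranges.most_common(2)
}

for name, values in top_two.items():
    print(name, values)

# prints Himalayas [8848, 8586, 8516]
# prints Karakoram [8611, 8080]
```

Converting to `int` at this point keeps the heights usable for arithmetic such as an average.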
