
Binary search through nested list in Python

I have a homework question asking:

Write a function called readCountries that reads a file and returns a list of countries. The countries should be read from this file (countries.txt), which contains an incomplete list of countries with their area and population. Each line in this file represents one country in the following format:

 name, area(in km2), population

When opening the file your function should handle any exceptions that may occur. Your function should completely read in the file, and separate the data into a 2-dimensional list. You may need to split and strip the data as appropriate. Numbers should be converted to their correct types. Your function should return this list so that you can use it in the remaining questions.

I have a text file called "countries.txt" that lists a number of countries with their area and population.

Sample of "countries.txt":

Afghanistan,    647500.0,   25500100
Albania,    28748.0,    2821977
Algeria,    2381740.0,  38700000

This is the code I have, and it works:

def readCountries(filename):
    '''read a file and print it to the screen'''
    countryList = []
    for line in open(filename):
        with open(filename) as aFile:
            countries = aFile.read()
            countryList.append(line.strip().split())
    aFile.close()

    return countryList 

Sample of the output when I ran it:

>>> countryList = readCountries("countries.txt")
>>> countryList
[['Afghanistan,', '647500.0,', '25500100'], ['Albania,', '28748.0,', '2821977'], ['Algeria,', '2381740.0,', '38700000']

The next question asks:

Write a function called printCountry that takes a string representing a country name as a parameter. First call your answer from question 1 to get the list of countries, then do a binary search through the list and print the country's information if found. It should print out:

 printCountry("Canada")
 Canada, Area: 9976140.0, Population: 35295770
 printCountry("Winterfell")
 I'm sorry, could not find Winterfell in the country list.

But I can't figure it out.

When I tried to write the code for this question, I typed:

countryList = readCountries("countries.txt")  
def printCountry(name):
    lo, hi = 0, len(countryList) - 1
    while lo <= hi:
        mid = lo + (hi - lo) // 2
        country = countryList[mid]
        test_name = country[0]
        if name > test_name:
            lo = mid + 1
        elif name < test_name:
            hi = mid - 1
        else:
            return country[0] + ", Area: " + str(country[1]) + ",    Population: " + str(country[2])
    return "I'm sorry can not find " + str(name)

and the result was:

>>> printCountry("Canada")
'Sorry can not find Canada'

even though Canada is in the text file. Where did I go wrong?

Your binary search code is (mostly) OK, but there are a couple of problems in the code that reads in the list of countries.

Your file opening and reading code is strange. It's as if you've combined two different approaches to reading data, so you end up opening the file multiple times.

Fortunately, these lines:

with open(filename) as aFile:
    countries = aFile.read()

don't affect the output of the readCountries function, because you never do anything else with countries.

Also, the description of your assignment says to "strip the data as appropriate. Numbers should be converted to their correct types", which your code doesn't do. As my hint above implied, that means the country names in your list still have the commas attached to them, so the binary search can't find them (unless you include the comma in the search name).
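To see why the lookup fails, it helps to look at what split() produces for a single line of the file. The snippet below is only an illustration using one sample line; the variable names fields and cleaned are not part of the original code.

```python
# One line of countries.txt, as it would be read from the file
line = "Canada,     9976140.0,  35295770\n"

# split() with no argument splits on whitespace, so the commas stay attached
fields = line.strip().split()
print(fields)                   # ['Canada,', '9976140.0,', '35295770']

# The stored name is 'Canada,' rather than 'Canada', so the comparison fails
print(fields[0] == "Canada")    # False

# Removing the trailing comma from each field fixes the comparison
cleaned = [s.rstrip(',') for s in fields]
print(cleaned[0] == "Canada")   # True
```

Because "Canada" sorts before "Canada,", the search keeps narrowing the range until it is empty and then reports that the name was not found.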

Anyway, here's a cleaned-up version that's designed to run on Python 2.6 or later.

from __future__ import print_function

def readCountries(filename):
    countryList = []
    with open(filename) as aFile:
        for line in aFile:
            line = line.strip().split()
            # Remove any trailing commas from each field
            line = [s.rstrip(',') for s in line]
            # Convert area to float and population to int
            line = [line[0], float(line[1]), int(line[2])]
            # print(line)
            countryList.append(line)
    return countryList

countryList = readCountries("countries.txt")

def printCountry(name):
    lo, hi = 0, len(countryList) - 1
    while lo <= hi:
        mid = lo + (hi - lo) // 2
        country = countryList[mid]
        test_name = country[0]
        if name > test_name:
            lo = mid + 1
        elif name < test_name:
            hi = mid - 1
        else:
            print('  {0}, Area: {1}, Population: {2}'.format(*country))
            break
    else:
        print("  I'm sorry, could not find {0} in the country list.".format(name))

#tests
printCountry("Canada")
printCountry("Winterfell")

print('- ' * 20)

#make sure we can find the first & last countries.
printCountry("Afghanistan")
printCountry("Nowhere")

Here's the data file I ran it on:

countries.txt

Afghanistan,    647500.0,   25500100
Albania,    28748.0,    2821977
Algeria,    2381740.0,  38700000
Canada,     9976140.0,  35295770
Nowhere,    1000.0,     2345678

And this is the output it produced:

  Canada, Area: 9976140.0, Population: 35295770
  I'm sorry, could not find Winterfell in the country list.
- - - - - - - - - - - - - - - - - - - - 
  Afghanistan, Area: 647500.0, Population: 25500100
  Nowhere, Area: 1000.0, Population: 2345678
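The assignment also asks the function to handle any exceptions raised when opening the file, which the cleaned-up version does not attempt. One possible way to add that is sketched below. It assumes that printing a message and returning an empty list is an acceptable response to an unreadable file (the assignment does not say), and the file name in the final call is hypothetical. The parsing itself is unchanged.

```python
def readCountries(filename):
    """Read countries from filename; return [] if the file cannot be opened."""
    countryList = []
    try:
        with open(filename) as aFile:
            for line in aFile:
                # Split on whitespace, then drop the trailing commas
                fields = [s.rstrip(',') for s in line.strip().split()]
                # Convert area to float and population to int
                countryList.append([fields[0], float(fields[1]), int(fields[2])])
    except IOError as err:
        # IOError is raised by open() on Python 2 and is an alias of
        # OSError on Python 3, so this covers a missing or unreadable file.
        # Returning an empty list here is an assumption, not a requirement.
        print("Could not read {0}: {1}".format(filename, err))
    return countryList

# Hypothetical file name that is assumed not to exist
print(readCountries("no_such_file.txt"))   # prints a message, then []
```

With an empty list, the binary search loop never runs, so printCountry simply reports that the country was not found.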

Make sure the list is sorted before you run a binary search on it:

countryList.sort()
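Sorting a nested list this way works because Python compares lists element by element, so the rows end up ordered by country name. As a rough illustration, the sketch below sorts some made-up rows and then looks names up with the standard library's bisect module instead of a hand-written loop. The findCountry helper and the sample data are assumptions for this example, and an assignment that asks you to write the binary search yourself may not accept bisect.

```python
import bisect

# Sample rows in the shape readCountries returns, deliberately out of order
countryList = [
    ["Canada", 9976140.0, 35295770],
    ["Afghanistan", 647500.0, 25500100],
    ["Algeria", 2381740.0, 38700000],
    ["Albania", 28748.0, 2821977],
]

# Lists compare element by element, so this orders the rows by name
countryList.sort()

# Parallel list of names for bisect to search
names = [row[0] for row in countryList]

def findCountry(name):
    """Return the row for name, or None if it is not in the list."""
    i = bisect.bisect_left(names, name)
    if i < len(names) and names[i] == name:
        return countryList[i]
    return None

print(findCountry("Canada"))       # ['Canada', 9976140.0, 35295770]
print(findCountry("Winterfell"))   # None
```

Note that string comparison is case-sensitive, so the search name has to match the capitalisation used in the file.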
