Reading statistics from a .txt file and outputting them

I am supposed to get certain information from a .txt file and output it. This is the information I need:

  • State with the maximum population
  • State with the minimum population
  • Average state population
  • State of Texas population

The data looks like:

Alabama
AL
4802982
Alaska
AK
721523
Arizona
AZ
6412700
Arkansas
AR
2926229
California
CA
37341989

This is my code, which does not really do anything I need it to do:

def main():
    # Open the StateCensus2010.txt file.
    census_file = open('StateCensus2010.txt', 'r')
    # Read the state name
    state_name = census_file.readline()

    while state_name != '':
        state_abv = census_file.readline()
        population = int(census_file.readline())

        state_name = state_name.rstrip('\n')
        state_abv = state_abv.rstrip('\n')

        print('State Name: ', state_name)
        print('State Abv.: ', state_abv)
        print('Population: ', population)
        print()

        state_name = census_file.readline()
    census_file.close()
main()

All I have it doing is reading the state name and abbreviation and converting the population into an int. I don't need it to do any of that, but I'm unsure how to do what the assignment is asking. Any hints would definitely be appreciated! I've been trying things for the past few hours to no avail.

Update:

This is my updated code; however, I'm receiving the following error:

Traceback (most recent call last):
  File "main.py", line 13, in <module>
    if population > max_population:
TypeError: unorderable types: str() > int()

Code:

with open('StateCensus2010.txt', 'r') as census_file:
    while True:
        try:
            state_name = census_file.readline()
            state_abv = census_file.readline()
            population = int(census_file.readline())
        except IOError:
            break

        # data processing here
        max_population = 0
        for population in census_file:
          if population > max_population:
            max_population = population

        print(max_population)
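
The TypeError in the traceback comes from the inner `for population in census_file:` loop: iterating over a file yields each line as a string, so `population` is a str when it is compared with the integer `max_population`. Note also that `readline()` signals end of file by returning an empty string rather than raising IOError. A minimal sketch of one way around both issues follows, assuming the three-line-per-state layout shown earlier; `io.StringIO` stands in for the real file here, and the sample values are three of the states from the question.

```python
import io

# Stand-in for open('StateCensus2010.txt'); same three-line layout per state.
census_file = io.StringIO(
    "Alabama\nAL\n4802982\n"
    "Alaska\nAK\n721523\n"
    "Arizona\nAZ\n6412700\n"
)

max_population = 0
max_state = ''

while True:
    state_name = census_file.readline().rstrip('\n')
    if state_name == '':          # readline() returns '' at end of file
        break
    state_abv = census_file.readline().rstrip('\n')
    population = int(census_file.readline())   # convert to int before comparing

    if population > max_population:
        max_population = population
        max_state = state_name

print(max_state, max_population)
```

The key change is that every population is converted with `int()` before any comparison, and the maximum is tracked in the same loop that reads the records rather than in a second loop over the file.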

The data is in a consistent order: state name, state abbreviation, population. So you only need to read the lines once, taking them three at a time, and you can display all three pieces of information for each state. Below is the sample code.

average = 0.0
total = 0.0
state_min = 999999999999
state_max = 0
statename_min = ''
statename_max = ''
texas_population = 0
with open('StateCensus2010.txt','r') as file:
    # split the text into lines; splitlines() avoids the extra empty
    # string that split('\n') leaves when the file ends with a newline

    data = file.read().splitlines()

    # get the length of the data by using len() method
    # there are 50 states in the text file
    # each state has 3 pieces of information stored:
    # state name, state abbreviation, population
    # so the length of data is 150, and 150 // 3 = 50 states
    state_total = len(data) // 3


    # this count is used as an index for the list 
    count = 0
    for i in range(int(state_total)):

        statename = data[count]
        state_abv = data[count+1]
        population = int(data[count+2])

        print('Statename : ',statename)
        print('State Abv : ',state_abv)
        print('Population: ',population)
        print()

        # sum all states population
        total += population

        if population > state_max:
            state_max = population
            statename_max = statename

        if population < state_min:
            state_min = population
            statename_min = statename

        if statename == 'Texas':
            texas_population = population


        # add 3 because we want to jump to next state
        # for example the first three lines is Alabama info
        # the next three lines is Alaska info and so on
        count += 3


    # divide the total population with number of states 
    average = total/state_total
    print(str(average))

    print('Lowest population state :', statename_min)
    print('Highest population state :', statename_max)
    print('Texas population :', texas_population)
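
The same single-pass idea can be written more compactly by first grouping the lines into (name, abbreviation, population) records. The sketch below is one possible variant, not the code from the answer above; the helper names `parse_records` and `population_of` are made up for illustration, and the inline sample is the five states from the question (so a lookup for Texas finds nothing). Blank lines are skipped so that a trailing newline does not skew the count.

```python
def parse_records(lines):
    """Group non-blank lines into (name, abbreviation, population) tuples."""
    cleaned = [line.strip() for line in lines if line.strip()]
    return [
        (cleaned[i], cleaned[i + 1], int(cleaned[i + 2]))
        for i in range(0, len(cleaned) - 2, 3)
    ]


def population_of(records, name):
    """Return the population for the given state name, or None if absent."""
    for state_name, _abbrev, population in records:
        if state_name == name:
            return population
    return None


# Sample data copied from the question; a real run would pass the open file.
sample = """Alabama
AL
4802982
Alaska
AK
721523
Arizona
AZ
6412700
Arkansas
AR
2926229
California
CA
37341989
""".splitlines()

records = parse_records(sample)
largest = max(records, key=lambda r: r[2])    # record with the highest population
smallest = min(records, key=lambda r: r[2])   # record with the lowest population
average = sum(r[2] for r in records) / len(records)

print('Highest population state :', largest[0])
print('Lowest population state  :', smallest[0])
print('Average population       :', average)
print('Texas population         :', population_of(records, 'Texas'))
```

With the real file, `parse_records(open('StateCensus2010.txt'))` would work the same way, since a file object iterates over its lines.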

This problem is pretty easy using pandas.

Code:

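import pandas as pd

# `data` is an iterator over the file's lines (see the test data below);
# each pass of the loop consumes three lines: name, abbreviation, population
states = []
for line in data:
    states.append(
        dict(state=line.strip(),
             abbrev=next(data).strip(),
             pop=int(next(data)),
             )
    )

df = pd.DataFrame(states)
print(df)

# idxmax()/idxmin() return the row label; .loc selects that row
print('\nmax population:\n', df.loc[df['pop'].idxmax()])
print('\nmin population:\n', df.loc[df['pop'].idxmin()])
print('\navg population:\n', df['pop'].mean())
print('\nAZ population:\n', df[df.abbrev == 'AZ'])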

Test Data:

from io import StringIO
data = StringIO(u'\n'.join([x.strip() for x in """
    Alabama
    AL
    4802982
    Alaska
    AK
    721523
    Arizona
    AZ
    6412700
    Arkansas
    AR
    2926229
    California
    CA
    37341989
""".split('\n')[1:-1]]))

Results:

  abbrev       pop       state
0     AL   4802982     Alabama
1     AK    721523      Alaska
2     AZ   6412700     Arizona
3     AR   2926229    Arkansas
4     CA  37341989  California

max population:
abbrev            CA
pop         37341989
state     California
Name: 4, dtype: object

min population:
abbrev        AK
pop       721523
state     Alaska
Name: 1, dtype: object

avg population:
10441084.6

AZ population:
  abbrev      pop    state
2     AZ  6412700  Arizona

Please try this; the earlier code was not Python 3 compatible. It only supported Python 2.7.

def extract_data(state):
    total_population = 0
    highest = lowest = None
    highest_state = lowest_state = None

    for abbrev, stats in state.items():
        population = stats.get('population')

        total_population += population

        # Track the state with the largest population seen so far.
        if highest is None or population > highest:
            highest = population
            highest_state = abbrev

        # Track the state with the smallest population seen so far.
        if lowest is None or population < lowest:
            lowest = population
            lowest_state = abbrev

    print(highest_state, highest)
    print(lowest_state, lowest)
    print(len(state))
    print(int(total_population / len(state)))
    print(state.get('TX').get('population'))

def main():
    # Open the census data file (named states.txt here).
    census_file = open('states.txt', 'r')
    # Read the state name
    state_name = census_file.readline()
    state = {}


    while state_name != '':
        state_abv = census_file.readline()
        population = int(census_file.readline())
        state_name = state_name.rstrip('\n')
        state_abv = state_abv.rstrip('\n')

        if state_abv in state:
            state[state_abv].update({'population': population, 'state_name': state_name})
        else:
            state.setdefault(state_abv,{'population': population, 'state_name': state_name})

        state_name = census_file.readline()        
    census_file.close()
    return state

state=main()
extract_data(state)
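
With the data held in a dictionary keyed by abbreviation, as `main()` returns above, the built-in `max` and `min` with a `key` function can replace the manual tracking. The following is only a sketch under that assumption: the function name `summarize` and the small sample dictionary are hypothetical, and the values are taken from the data in the question. It returns the results instead of printing them, and reports None for Texas when 'TX' is missing.

```python
def summarize(state):
    """Return max/min/average statistics for a dict keyed by abbreviation."""

    def population_of(abv):
        return state[abv]['population']

    highest = max(state, key=population_of)   # abbreviation with largest population
    lowest = min(state, key=population_of)    # abbreviation with smallest population
    total = sum(stats['population'] for stats in state.values())
    texas = state.get('TX')                   # None when Texas is not in the data

    return {
        'highest': (highest, population_of(highest)),
        'lowest': (lowest, population_of(lowest)),
        'average': total / len(state),
        'texas': texas['population'] if texas else None,
    }


# Hypothetical sample in the same layout that main() builds.
sample_state = {
    'AL': {'population': 4802982, 'state_name': 'Alabama'},
    'AK': {'population': 721523, 'state_name': 'Alaska'},
    'CA': {'population': 37341989, 'state_name': 'California'},
}

summary = summarize(sample_state)
print(summary['highest'])
print(summary['lowest'])
print(summary['texas'])
```

Returning a dictionary keeps the calculation separate from the output, so the caller can format or test the numbers as needed.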

Another pandas solution, from the interpreter:

>>> import pandas as pd
>>>
>>> records = [line.strip() for line in open('./your.txt', 'r')]
>>>
>>> df = pd.DataFrame([records[i:i+3] for i in range(0, len(records), 3)], 
...     columns=['State', 'Code', 'Pop']).dropna()
>>>
>>> df['Pop'] = df['Pop'].astype(int)
>>>
>>> df
        State Code       Pop
0     Alabama   AL   4802982
1      Alaska   AK    721523
2     Arizona   AZ   6412700
3    Arkansas   AR   2926229
4  California   CA  37341989
>>>
>>> df.loc[df['Pop'].idxmax()]
State    California
Code             CA
Pop        37341989
Name: 4, dtype: object
>>>
>>> df.loc[df['Pop'].idxmin()]
State    Alaska
Code         AK
Pop      721523
Name: 1, dtype: object
>>>
>>> df['Pop'].mean()
10441084.6
>>>
>>> df.loc[df['Code'] == 'AZ']
     State Code      Pop
2  Arizona   AZ  6412700
