
How to create a frequency table from a file as a list in Python 3?

I am trying to process the results of a survey stored in a txt file. The file includes results about major, age, sex, whether they work, whether they have children, and whether they own a PC (those last three are represented as 0 and 1 in the data file for no and yes). I want to calculate relative frequency lists by age group and draw a bar chart for each of them. I need to read the file into memory and break up the lines into fields, then create three lists, one each for Work, Have Children, and Own PC. In each list the age group is a string; I need a count "column" that counts the number of yes answers, a column holding the total responses for each age group, and a percent column for each age group, which is count/total*100. The ideal output should look like this:

[["<=20",6,7,85.71429],["21-23",5,6,83.33333],...,[">=30",5,12,41.66667]]

I have written code for this, but it's not returning anything at all, and I'm not sure whether the way I went about it is correct (I am a beginner and am trying to learn by myself).

def process_file(filename,index):
    infile=open(filename, "r")
    frequencyTable = [["<=20", 0,0,0,],["21-23",0,0,0],["24-26",0,0,0],["27-29",0,0,0],[">=30",0,0,0]]

    firstLine = True
    for line in filename:
        if firstLine == True:
            firstLine = False
        continue
        columns = line.split(',')
        columns[-1] = columns[-1].split("\n")[0]
        if int(columns[1] <= 20):
            Table[0][1] +=int(columns(ColumnNum))
            Table[0][2] += 1
            Table[0][3] = (Table[0][1]/Table[0][2]) *100
        elif int(columns[1]) >=21 and int(columns[1]) <=23:
            Table[1][1] +=int(columns[index])
            Table[1][2] += 1
            Table[1][3] = (Table[1][1]/Table[1][2]) *100
        elif int(columns[1]) >=24 and int(columns[1]) <=26:
            Table[2][1] +=int(columns[index])
            Table[2][2] += 1
            Table[2][3] = (Table[2][1]/Table[2][2]) *100
        elif int(columns[1]) >=27 and int(columns[1]) <=29:
            Table[3][1] +=int(columns[index])
            Table[3][2] += 1
            Table[3][3] = (Table[3][1]/Table[3][2]) *100
        elif int(columns[1]) >=30:
            Table[4][1] +=int(columns[index])
            Table[4][2] += 1
            Table[4][3] = (Table[4][1]/Table[4][2]) *100
            firstline = False
    return Table
    infile.close()

If anyone has any input on this, any and all help is appreciated!

I won't rewrite your entire code, but you might want to try using a dictionary:

results={'major': [],'age': [],'sex': [],'work': [],'kids': [],'pc': []}

then using something like:

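freq = {}

# .items() yields (key, list) pairs; iterating the dict alone yields only keys
for k, v in results.items():
    # count how often each distinct value occurs in this column
    freq[k] = {i: v.count(i) for i in set(v)}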
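As a rough sketch of how the `results` dictionary might be filled and then counted, assuming a comma-separated file with a header row and six columns in the order major, age, sex, work, kids, pc. The column order, the helper name `load_columns`, and the sample data are assumptions made for illustration; the question does not show the real file. `collections.Counter` is swapped in here for the `count` loop above.

```python
import io
from collections import Counter

# Assumed column order; the question does not show the real file layout.
FIELDS = ["major", "age", "sex", "work", "kids", "pc"]

def load_columns(lines):
    """Collect each column of a CSV-like iterable of lines into a list."""
    results = {name: [] for name in FIELDS}
    first = True
    for line in lines:
        if first:            # skip the header row
            first = False
            continue
        values = line.strip().split(",")
        if len(values) != len(FIELDS):
            continue         # ignore blank or malformed lines
        for name, value in zip(FIELDS, values):
            results[name].append(value)
    return results

# Made-up sample data standing in for the survey file.
sample = io.StringIO(
    "major,age,sex,work,kids,pc\n"
    "CS,19,f,1,0,1\n"
    "Math,22,m,0,0,1\n"
    "CS,31,m,1,1,0\n"
)
results = load_columns(sample)

# One Counter per column: maps each distinct value to how often it occurs.
freq = {name: Counter(values) for name, values in results.items()}
print(freq["work"])
```

With a real file, `load_columns(open(path))` would take the place of the `io.StringIO` sample, ideally inside a `with` block so the file gets closed.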

Three main things to look at:

  • Print. If you want to see any output, you need to print it.
  • Return. A return statement must be inside a function; it does not work otherwise.
  • Indenting. Indentation is meaningful in Python, so you need to be careful with it.
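To make the three points concrete, here is one way the function from the question might look once the loop reads from the file object rather than the filename string, the header is skipped without a mis-indented `continue`, and one table name is used throughout. It is a sketch under assumptions: the file has a header line, the age is in column 1, and `index` points at one of the 0/1 columns. The name `AGE_GROUPS` and the sample data are introduced here for illustration only.

```python
import os
import tempfile

# Upper age bound and label for each group (illustrative helper, not from the question).
AGE_GROUPS = [(20, "<=20"), (23, "21-23"), (26, "24-26"), (29, "27-29"), (float("inf"), ">=30")]

def process_file(filename, index):
    # One row per age group: [label, yes count, total responses, percent]
    table = [[label, 0, 0, 0.0] for _, label in AGE_GROUPS]
    with open(filename, "r") as infile:    # the file is closed automatically
        next(infile, None)                 # skip the header line
        for line in infile:                # iterate over the file, not the filename
            columns = line.strip().split(",")
            if len(columns) <= max(1, index):
                continue                   # skip blank or short lines
            age = int(columns[1])
            for row, (upper, _) in zip(table, AGE_GROUPS):
                if age <= upper:           # first group whose upper bound fits
                    row[1] += int(columns[index])
                    row[2] += 1
                    break
    for row in table:
        if row[2]:                         # avoid dividing by zero for empty groups
            row[3] = row[1] / row[2] * 100
    return table                           # return comes last; nothing after it would run

# Made-up sample data, written to a temporary file so the sketch can be run as-is.
sample = (
    "major,age,sex,work,kids,pc\n"
    "CS,19,f,1,0,1\n"
    "Math,22,m,0,0,1\n"
    "CS,31,m,1,1,0\n"
    "Bio,20,f,0,0,1\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
work_table = process_file(tmp.name, 3)     # column 3 is "work" in this sample
os.remove(tmp.name)
print(work_table)                          # without print, nothing is displayed
```

Computing the percentage once after the loop, rather than on every line, keeps the per-line work to two additions and makes the empty-group case explicit.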

Now, I couldn't quite work out the structure of your text file from your question. If you give me a direct example, I can show you how to get that code working precisely. However, I've added a simple example:

Given a text file like:

50,m,1,0
14,f,0,1
30,f,1,1
90,f,1,0

Each line represents age, gender, has children, and has PC.

def process_file(filename):
    # Each row is [age group, men, women, has kids, has pc]
    table = [["<=20", 0, 0, 0, 0], [">20", 0, 0, 0, 0]]
    with open(filename, "r") as infile:  # the file is closed automatically
        for line in infile:
            columns = line.strip().split(',')
            # Pick the row that matches this respondent's age group
            row = table[0] if int(columns[0]) <= 20 else table[1]
            if columns[1] == "m":
                row[1] += 1
            else:
                row[2] += 1
            row[3] += int(columns[2])
            row[4] += int(columns[3])
    return table

print(process_file("test.txt"))

Gives an output of:

[['<=20', 0, 1, 0, 1], ['>20', 1, 2, 3, 1]]

Meaning:

[Age group, men, women, has kids, has pc]

Improving beyond the basic functionality.

While both my example and your program are very simple, it would be considered good design to think about improving your abstractions. Things like the dictionary that has already been mentioned will help with this. So would things like classes and objects.

_AGES = (
    (20, "<= 20"),
    (23, "21-23"),
    (26, "24-26"),
    (29, "27-29"),
    (9999, ">= 30"),
)

def age_group(age:int) -> str:
    if age < 0:
        raise ValueError("Age must be a non-negative number")

    # _AGES is sorted by upper bound, so the first match is the right group
    for ai in _AGES:
        if age <= ai[0]:
            return ai[1]

    raise ValueError("Age {} is completely out of range".format(age))

def process_file(path, column):

    tabulation = {}

    with open(path, 'r') as infile:

        infile.readline()

        for line in infile:
            columns = line.strip().split(',')
            group = age_group(int(columns[1]))

            if group not in tabulation:
                tabulation[group] = {
                    "count": 0,
                    "sum": 0,
                    "percentage": 0,
                }

            tabgroup = tabulation[group]
            tabgroup['count'] += 1
            tabgroup['sum'] += int(columns[column])
            tabgroup['percentage'] = (100.0 * tabgroup['sum'] / tabgroup['count'])

    return tabulation
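The question asked for a list of lists rather than a dictionary. One possible conversion is sketched below. `to_rows`, `text_bars`, and `GROUP_ORDER` are names made up for this example, and the sample dictionary only mimics the shape `process_file` returns, using the two complete rows from the question's ideal output. Age groups with no responses never appear in the dictionary, so they are filled in with zeros here. The text bars are a stand-in for a real bar chart, which would need a plotting library.

```python
# Labels in display order, matching the second element of each _AGES entry.
GROUP_ORDER = ["<= 20", "21-23", "24-26", "27-29", ">= 30"]

def to_rows(tabulation):
    """Turn {group: {"count", "sum", "percentage"}} into [group, yes, total, percent] rows."""
    rows = []
    for group in GROUP_ORDER:
        # Groups with no responses are missing from the dict, so default to zeros.
        stats = tabulation.get(group, {"count": 0, "sum": 0, "percentage": 0})
        # "sum" holds the number of yes answers, "count" the total responses.
        rows.append([group, stats["sum"], stats["count"], stats["percentage"]])
    return rows

def text_bars(rows, width=40):
    """Draw a crude text bar chart of the percent column, one line per group."""
    lines = []
    for group, _, _, percent in rows:
        bar = "#" * round(percent / 100 * width)
        lines.append("{:<6} {:6.2f}% {}".format(group, percent, bar))
    return "\n".join(lines)

# Example dictionary in the shape returned by process_file above.
example = {
    "<= 20": {"count": 7, "sum": 6, "percentage": 100.0 * 6 / 7},
    ">= 30": {"count": 12, "sum": 5, "percentage": 100.0 * 5 / 12},
}
rows = to_rows(example)
print(rows)
print(text_bars(rows))
```

Keeping the dictionary as the working structure and converting only at the end means the counting code does not need to know the output layout.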
