How to work with data from an xlsx file in Python

Question

These are the named ranges in an uploaded xlsx sheet. The titles are unwieldy, and I wanted to assign them to short names so they are easier to refer to throughout the code.

I'm fairly new to this and unsure how to make the code below cleaner and more efficient, especially if I were to add more named ranges.

VIC_Male = 'Estimated Resident Population ;  Male ;  Victoria ;'
QL_Male = 'Estimated Resident Population ;  Male ;  Queensland ;'
SA_Male = 'Estimated Resident Population ;  Male ;  South Australia ;'
WA_Male = 'Estimated Resident Population ;  Male ;  Western Australia ;'
TAS_Male = 'Estimated Resident Population ;  Male ;  Tasmania ;'
NT_Male = 'Estimated Resident Population ;  Male ;  Northern Territory ;'
ACT_Male = 'Estimated Resident Population ;  Male ;  Australian Capital Territory ;'
TOTAL_Male = 'Estimated Resident Population ;  Male ;  Australia ;'
NSW_Female = 'Estimated Resident Population ;  Female ;  New South Wales ;'
VIC_Female = 'Estimated Resident Population ;  Female ;  Victoria ;'
QL_Female = 'Estimated Resident Population ;  Female ;  Queensland ;'
SA_Female = 'Estimated Resident Population ;  Female ;  South Australia ;'
WA_Female = 'Estimated Resident Population ;  Female ;  Western Australia ;'
TAS_Female = 'Estimated Resident Population ;  Female ;  Tasmania ;'
NT_Female = 'Estimated Resident Population ;  Female ;  Northern Territory ;'
ACT_Female = 'Estimated Resident Population ;  Female ;  Australian Capital Territory ;'
TOTAL_Female = 'Estimated Resident Population ;  Female ;  Australia ;'
NSW_Persons = 'Estimated Resident Population ;  Persons ;  New South Wales ;'
VIC_Persons = 'Estimated Resident Population ;  Persons ;  Victoria ;'
QL_Persons = 'Estimated Resident Population ;  Persons ;  Queensland ;'
SA_Persons = 'Estimated Resident Population ;  Persons ;  South Australia ;'
WA_Persons = 'Estimated Resident Population ;  Persons ;  Western Australia ;'
TAS_Persons = 'Estimated Resident Population ;  Persons ;  Tasmania ;'
NT_Persons = 'Estimated Resident Population ;  Persons ;  Northern Territory ;'
ACT_Persons = 'Estimated Resident Population ;  Persons ;  Australian Capital Territory ;'
TOTAL_Persons = 'Estimated Resident Population ;  Persons ;  Australia ;'

Answer 1

Let's say you have this csv file (I added column titles in the first line, but the file can also come without them; in the code below I commented the part you can remove if there is no title line):

"ResultType;Gender;Country
Estimated Resident Population ;  Male ;  Victoria ;
Estimated Resident Population ;  Male ;  Queensland ;
Estimated Resident Population ;  Male ;  South Australia ;
Estimated Resident Population ;  Male ;  Western Australia ;
Estimated Resident Population ;  Male ;  Tasmania ;
Estimated Resident Population ;  Male ;  Northern Territory ;
"

You can begin by creating a data structure that corresponds to your data:


class Record():
    def __init__(self, ResultType, Gender, Country):
        self.ResultType = ResultType
        self.Gender = Gender
        self.Country = Country
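
As a possible alternative, the same structure can be written with the standard-library dataclasses module, which generates __init__ and a readable repr automatically. This is only a sketch; the field names simply mirror the class above, and the example values come from the sample file.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Field names mirror the hand-written class above
    ResultType: str
    Gender: str
    Country: str


# Example values taken from the first data line of the sample file
example = Record("Estimated Resident Population", "Male", "Victoria")
print(example)
```

Either version works with the rest of the code, since both are constructed with the same three positional arguments.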

Then create an empty list:

My_records = []

Then open the csv file with the csv library and, for each line, create an instance of your data structure (here, the Record class).

with open('records.txt') as csv_file:

    csv_reader = csv.reader(csv_file, delimiter=';')
    line_count = 0
    for row in csv_reader:
        # You can remove this check if your csv file has no column-name line
        if line_count == 0:
            print(f'Column names are {", ".join(row)}')
            line_count += 1
        else:
            instance = Record(row[0], row[1], row[2])
            My_records.append(instance)

All in one:


import csv

class Record():
    def __init__(self, ResultType, Gender, Country):
        self.ResultType = ResultType
        self.Gender = Gender
        self.Country = Country
My_records = []
with open('records.txt') as csv_file:

    csv_reader = csv.reader(csv_file, delimiter=';')
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            print(f'Column names are {", ".join(row)}')
            line_count += 1
        else:
            instance = Record(row[0], row[1], row[2])
            My_records.append(instance)

Now the My_records list contains each line of your CSV file as an instance of the Record class, so you can manipulate it as you wish.

For example:

All_countries = set([record.Country.strip() for record in My_records])
print(All_countries)

OUTPUT (all unique countries present in your data):

{'Northern Territory', 'Tasmania', 'South Australia', 'Queensland', 'Western Australia', 'Australia', 'Australian Capital Territory', 'New South Wales', 'Victoria'}
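
A similar grouping can also be done directly while reading the file. The sketch below collects the regions per gender; it assumes the same semicolon-separated layout as the sample file, uses io.StringIO as a stand-in for the real file, and regions_by_gender is a name made up for this example.

```python
import csv
import io
from collections import defaultdict

# In-memory stand-in for records.txt (assumed layout, header line included)
sample = io.StringIO(
    "ResultType;Gender;Country\n"
    "Estimated Resident Population ;  Male ;  Victoria ;\n"
    "Estimated Resident Population ;  Male ;  Queensland ;\n"
    "Estimated Resident Population ;  Female ;  Victoria ;\n"
)

reader = csv.reader(sample, delimiter=';')
next(reader)  # skip the header line; drop this call if the file has none

regions_by_gender = defaultdict(list)
for row in reader:
    if len(row) < 3:
        continue  # ignore blank or incomplete lines
    gender = row[1].strip()  # fields carry padding spaces, so strip them
    region = row[2].strip()
    regions_by_gender[gender].append(region)

print(dict(regions_by_gender))
# {'Male': ['Victoria', 'Queensland'], 'Female': ['Victoria']}
```

Stripping the fields once at read time avoids having to call strip() at every later lookup.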

Of course, there are many useful libraries, such as pandas, that aim to deal with this kind of thing, but here I gave you an example in plain Python (albeit using the built-in csv library).

By the way, if your file is an Excel file (xls or xlsx), libraries like pandas have tools for reading it (but you'll have to pip install pandas first):


import pandas as pd
dfs = pd.read_excel("record.xlsx", sheet_name="sheet1")

This code replaces the following lines in the example above:


with open('records.txt') as csv_file:

    csv_reader = csv.reader(csv_file, delimiter=';')
    ...

The rest is essentially the same, except that you loop over the rows of the DataFrame rather than over the csv reader.

Answer 2

Maybe something like this:

VIC_Male       = 'Estimated Resident Population ;  Male ;  Victoria ;'
QL_Male        = 'Estimated Resident Population ;  Male ;  Queensland ;'
SA_Male        = 'Estimated Resident Population ;  Male ;  South Australia ;'
WA_Male        = 'Estimated Resident Population ;  Male ;  Western Australia ;'
TAS_Male       = 'Estimated Resident Population ;  Male ;  Tasmania ;'
NT_Male        = 'Estimated Resident Population ;  Male ;  Northern Territory ;'
ACT_Male       = 'Estimated Resident Population ;  Male ;  Australian Capital Territory ;'
TOTAL_Male     = 'Estimated Resident Population ;  Male ;  Australia ;'
NSW_Female     = 'Estimated Resident Population ;  Female ;  New South Wales ;'
VIC_Female     = 'Estimated Resident Population ;  Female ;  Victoria ;'
QL_Female      = 'Estimated Resident Population ;  Female ;  Queensland ;'
SA_Female      = 'Estimated Resident Population ;  Female ;  South Australia ;'
WA_Female      = 'Estimated Resident Population ;  Female ;  Western Australia ;'
TAS_Female     = 'Estimated Resident Population ;  Female ;  Tasmania ;'
NT_Female      = 'Estimated Resident Population ;  Female ;  Northern Territory ;'
ACT_Female     = 'Estimated Resident Population ;  Female ;  Australian Capital Territory ;'
TOTAL_Female   = 'Estimated Resident Population ;  Female ;  Australia ;'
NSW_Persons    = 'Estimated Resident Population ;  Persons ;  New South Wales ;'
VIC_Persons    = 'Estimated Resident Population ;  Persons ;  Victoria ;'
QL_Persons     = 'Estimated Resident Population ;  Persons ;  Queensland ;'
SA_Persons     = 'Estimated Resident Population ;  Persons ;  South Australia ;'
WA_Persons     = 'Estimated Resident Population ;  Persons ;  Western Australia ;'
TAS_Persons    = 'Estimated Resident Population ;  Persons ;  Tasmania ;'
NT_Persons     = 'Estimated Resident Population ;  Persons ;  Northern Territory ;'
ACT_Persons    = 'Estimated Resident Population ;  Persons ;  Australian Capital Territory ;'
TOTAL_Persons  = 'Estimated Resident Population ;  Persons ;  Australia ;'
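
If the goal is to avoid maintaining one variable per title, one option is to generate the titles from their parts and look them up in a dictionary. This is a minimal sketch, not the only way to do it: the names REGIONS, GENDERS, TITLE and titles are made up for illustration, the region codes follow the variable names in the question, and it assumes every region/gender combination (including NSW Male, which is not in the list above) exists in the sheet.

```python
# Region codes mapped to the names used in the column titles.
# Codes follow the question's variable names; add entries as needed.
REGIONS = {
    "NSW": "New South Wales",
    "VIC": "Victoria",
    "QL": "Queensland",
    "SA": "South Australia",
    "WA": "Western Australia",
    "TAS": "Tasmania",
    "NT": "Northern Territory",
    "ACT": "Australian Capital Territory",
    "TOTAL": "Australia",
}
GENDERS = ("Male", "Female", "Persons")

# Template reproducing the exact spacing of the titles in the question
TITLE = "Estimated Resident Population ;  {gender} ;  {region} ;"

# One entry per (region code, gender) pair
titles = {
    (code, gender): TITLE.format(gender=gender, region=name)
    for code, name in REGIONS.items()
    for gender in GENDERS
}

print(titles[("VIC", "Male")])
# Estimated Resident Population ;  Male ;  Victoria ;
```

Adding a new region or category then means adding one entry to REGIONS or GENDERS rather than a new line per combination.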

2 answers

Answer 1
2 2020-07-15 07:16:14

Answer 2
0 2020-07-15 06:22:35
