Python: How do I style data scraping from an xlsx file?

Currently I am scraping some data from an xlsx file. My code works, but it looks like a mess, at least to me. So I am unsure whether my code is OK according to PEP8.

from openpyxl import load_workbook
[...]
        for row in sheet.iter_rows():
            id = row[0].value
            name = row[1].value
            second_name = row[2].value
            # ignore the following
            # middle_name = row[3].value
            city = row[4].value
            address = row[5].value
            field_x = row[7].value
            field_y = row[10].value
            some_function_to_save_to_database(id, name, second_name, ...)

etc. (Please note that for some of those values I do extra validation.) So it works, but it feels a bit "clunky". Obviously I could pass the values directly to the function, making it some_function_to_save_to_database(row[0].value, row[1].value, ...), but is that any better? It feels like I lose a lot of readability that way.

So my question is as follows: is this a good approach, or should I map the field names to row order? What is the proper way to style this kind of scraping?

Your code does not violate PEP8. However, it's a little cumbersome, and it's not easy to maintain if the layout of the data changes. Maybe you can try:

# Field name -> zero-based column index within a row.
DATA_INDEX_MAP = {
    'id': 0,
    'name': 1,
    'second_name': 2,
    'city': 4,
    'address': 5,
    'field_x': 7,
    'field_y': 10,
}

def get_data_from_row(row):
    # Build a dict of field name -> cell value for one row.
    return {key: row[index].value for key, index in DATA_INDEX_MAP.items()}

for row in sheet.iter_rows():
    data = get_data_from_row(row)
    some_function_to_save_to_database(**data)

Then, when the column layout changes, all you need to do is modify DATA_INDEX_MAP.
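The question mentions extra validation for some values. One way that could sit alongside the mapping is sketched below. This is only an illustration under assumptions: FakeCell stands in for an openpyxl cell (only its .value attribute is used), and VALIDATORS with its clean-up functions is hypothetical, not part of the original code.

```python
from collections import namedtuple

# Stand-in for an openpyxl cell; only the .value attribute matters here.
FakeCell = namedtuple("FakeCell", "value")

# Field name -> zero-based column index (shortened for the example).
DATA_INDEX_MAP = {"id": 0, "name": 1, "city": 4}

# Hypothetical per-field clean-up steps; fields not listed pass through unchanged.
VALIDATORS = {"id": int, "name": str.strip}


def get_data_from_row(row):
    """Return a dict of field name -> validated cell value for one row."""
    data = {}
    for key, index in DATA_INDEX_MAP.items():
        value = row[index].value
        validate = VALIDATORS.get(key)
        data[key] = validate(value) if validate else value
    return data


# A made-up row with five columns; columns 2 and 3 are ignored by the map.
row = tuple(FakeCell(v) for v in ("7", " Ann ", "Lee", None, "Oslo"))
record = get_data_from_row(row)
print(record)
```

Keeping the validators in their own dict means the column positions and the clean-up rules can each be changed without touching the loop.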

A lighter alternative to the dict in LiuChang's answer:

from operator import itemgetter

get_data = itemgetter(0, 1, 2, 4, 5, 7, 10)
for row in sheet.iter_rows():
    data = [x.value for x in get_data(row)]
    some_function_to_save_to_database(*data)
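With itemgetter the values arrive as a plain positional sequence, so the field names are no longer visible at the call site. If that matters, the selected values could be wrapped in a namedtuple. The following is a rough sketch, assuming a hypothetical Record type and a FakeCell stand-in for openpyxl cells:

```python
from collections import namedtuple
from operator import itemgetter

# Stand-in for an openpyxl cell; only the .value attribute matters here.
FakeCell = namedtuple("FakeCell", "value")

# Hypothetical record type; the field order must match the itemgetter indices.
Record = namedtuple("Record", "id name city")
get_cells = itemgetter(0, 1, 4)

# A made-up row with five columns; columns 2 and 3 are skipped.
row = tuple(FakeCell(v) for v in (7, "Ann", "Lee", None, "Oslo"))
record = Record(*(cell.value for cell in get_cells(row)))
print(record.name, record.city)
```

The record can still be unpacked positionally with *record, or turned into keyword arguments with **record._asdict().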
