Python: How should I style data scraping from an xlsx file?
Currently I am scraping some data from an xlsx file. My code works, but it looks like a mess, at least to me, so I am unsure whether it is OK according to PEP8.
from openpyxl import load_workbook
[...]
for row in sheet.iter_rows():
    id = row[0].value
    name = row[1].value
    second_name = row[2].value
    # ignore the following
    # middle_name = row[3].value
    city = row[4].value
    address = row[5].value
    field_x = row[7].value
    field_y = row[10].value
    some_function_to_save_to_database(id, name, second_name, ...)
etc. (Please note that for some of those values I do extra validation etc.) So it works, but it feels a bit "clunky". Obviously I could pass the values directly to the function, making it
some_function_to_save_to_database(row[0].value, row[1].value, ...)
but is that any better? It feels like I lose a lot of readability that way.
So my question is as follows: is this a good approach, or should I map the field names to the row order? What is the proper way to style this kind of scraping?
Your code does not violate PEP8. However, it is a little cumbersome, and it is not easy to maintain if the layout of the data changes. Maybe you can try:
# Field name -> column index in the sheet (column 3, middle_name, is skipped).
DATA_INDEX_MAP = {
    'id': 0,
    'name': 1,
    'second_name': 2,
    'city': 4,
    'address': 5,
    'field_x': 7,
    'field_y': 10,
}


def get_data_from_row(row):
    return {key: row[index].value for key, index in DATA_INDEX_MAP.items()}


for row in sheet.iter_rows():
    data = get_data_from_row(row)
    some_function_to_save_to_database(**data)
Then, if the column layout changes, all you need to do is modify DATA_INDEX_MAP.
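To see what the mapping approach produces without opening a real workbook, here is a minimal self-contained sketch. The `Cell` namedtuple, the sample row, and `save_stub` are hypothetical stand-ins: `Cell` only mimics the `.value` attribute of an openpyxl cell, and `save_stub` just returns its arguments instead of writing to a database.

```python
from collections import namedtuple

# Hypothetical stand-in for an openpyxl cell: only `.value` is mimicked.
Cell = namedtuple("Cell", ["value"])

# Field name -> column index, as in the answer above.
DATA_INDEX_MAP = {
    "id": 0,
    "name": 1,
    "second_name": 2,
    "city": 4,
    "address": 5,
    "field_x": 7,
    "field_y": 10,
}


def get_data_from_row(row):
    """Pick the mapped columns out of a row and key them by field name."""
    return {key: row[index].value for key, index in DATA_INDEX_MAP.items()}


def save_stub(id, name, second_name, city, address, field_x, field_y):
    """Hypothetical placeholder for some_function_to_save_to_database."""
    return (id, name, second_name, city, address, field_x, field_y)


# A made-up row with 11 columns; column n simply holds the string "col<n>".
sample_row = tuple(Cell("col%d" % n) for n in range(11))

data = get_data_from_row(sample_row)
saved = save_stub(**data)
```

Because the values are passed as keyword arguments, the keys of DATA_INDEX_MAP have to match the parameter names of the save function; a renamed or missing key raises a TypeError at the call.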
A lighter alternative to the dict in LiuChang's answer:
from operator import itemgetter

# Column indexes to keep, in the order the save function expects them.
get_data = itemgetter(0, 1, 2, 4, 5, 7, 10)

for row in sheet.iter_rows():
    data = [x.value for x in get_data(row)]
    some_function_to_save_to_database(*data)
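As a quick illustration of what itemgetter does here, the sketch below runs it on a made-up row. The `Cell` namedtuple and the sample row are hypothetical stand-ins for openpyxl cells (only `.value` is mimicked), so this shows the selection logic rather than real workbook access.

```python
from collections import namedtuple
from operator import itemgetter

# Hypothetical stand-in for an openpyxl cell: only `.value` is mimicked.
Cell = namedtuple("Cell", ["value"])

# With several indexes, itemgetter returns a tuple of the selected items.
get_data = itemgetter(0, 1, 2, 4, 5, 7, 10)

# A made-up row with 11 columns; column n simply holds the string "col<n>".
sample_row = tuple(Cell("col%d" % n) for n in range(11))

picked = get_data(sample_row)           # tuple of the 7 selected cells
data = [cell.value for cell in picked]  # their values, in the same order
```

The trade-off compared with the dict is that the values are passed positionally, so the order of the indexes in itemgetter has to match the parameter order of the save function, and the field names no longer appear anywhere in the code.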