Iterate through a list of .docx files to extract and process tables
I am facing 3000 docx files spread across several directories and subdirectories. I have to prepare a list that consists of each file name and the information extracted from the tables in that docx. I have successfully added all the docx files to the list `targets_in_dir`, separating them from non-relevant files.
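The question does not show how `targets_in_dir` was built. One way to collect every .docx under a root folder, including subdirectories, is sketched below with `pathlib`. The function name `find_docx_files` is hypothetical, and treating Word's `~$` lock files as the "non-relevant files" is an assumption.

```python
from pathlib import Path


def find_docx_files(root):
    """Return sorted paths of all .docx files under root, recursing into subdirectories.

    Assumption: Word lock files such as "~$report.docx" are the
    non-relevant files to skip, since they are not real documents.
    """
    return sorted(
        path
        for path in Path(root).rglob("*.docx")  # recursive glob over all subfolders
        if path.is_file() and not path.name.startswith("~$")
    )


# Hypothetical usage: targets_in_dir = find_docx_files("C:/reports")
```

On most non-Windows systems the `*.docx` pattern is matched case-sensitively, so files named `*.DOCX` may be missed there.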
Question: I would like to iterate through `targets_in_dir` and extract all tables from each docx:
    len_target =len(targets_in_dir)
    file_processed=[]
    string_tables=[]
    for i in len_target:
        doc = docx.Document(targets_in_dir[i])
        file_processed.append(targets_ind[i])
        for table in doc.tables:
            for row in table.rows:
                for cell in row.cells:
                    str.split('MANUFACTURER')
                    string_tables.append(cell.text)
I get the error `'int' object is not iterable`:
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-39-4847866a9234> in <module>
          4 string_tables=[]
          5 
    ----> 6 for i in len_target:
          7 
          8     doc = docx.Document(targets_in_dir[i])

    TypeError: 'int' object is not iterable
What am I doing wrong?
It looks like you are trying to iterate through `len_target = len(targets_in_dir)`, which is an `int`. Because an `int` is not an iterable object, your `for` loop fails.
You need to iterate through an iterable object for the `for` loop to work.
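A minimal illustration of the difference, using hypothetical file names: looping over the integer returned by `len()` raises the same `TypeError`, while `range()` turns that integer into an iterable sequence of indices.

```python
targets_in_dir = ["a.docx", "b.docx", "c.docx"]  # hypothetical file names
len_target = len(targets_in_dir)  # an int: 3

error_message = None
try:
    for i in len_target:  # fails: an int is not iterable
        pass
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # 'int' object is not iterable

indices = list(range(len_target))  # range(3) is iterable
print(indices)  # [0, 1, 2]
```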
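Changing it to

    for i in range(len_target):
        ...  # do stuff

or

    for i in targets_in_dir:
        ...  # do stuff

is a good place to start. Note that in the second form `i` is the file path itself rather than an index, so the loop body should use `docx.Document(i)` instead of `docx.Document(targets_in_dir[i])`.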
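Putting the fix together, a minimal sketch of the corrected loop could look like the following. It assumes the `python-docx` package (imported as `docx`), iterates over the paths directly, and returns (file name, cell texts) pairs so each file stays matched with its extracted data instead of living in two parallel lists. The function name `extract_cell_texts` and its `open_document` parameter are hypothetical; the parameter exists only so the function can be exercised without real .docx files.

```python
def extract_cell_texts(targets_in_dir, open_document=None):
    """Collect the text of every table cell in each .docx file.

    Returns a list of (file name, [cell texts]) pairs, one per document.
    open_document is a hypothetical hook; by default python-docx is used.
    """
    if open_document is None:
        import docx  # the python-docx package (pip install python-docx)

        open_document = docx.Document

    results = []
    for path in targets_in_dir:  # iterate over the paths themselves, not an int
        doc = open_document(path)
        cell_texts = [
            cell.text
            for table in doc.tables
            for row in table.rows
            for cell in row.cells
        ]
        results.append((str(path), cell_texts))
    return results
```

Called as `extract_cell_texts(targets_in_dir)`, it opens each file with `docx.Document`. In python-docx a merged cell may be returned once for every grid column it spans, so repeated texts within a row can be expected.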
Also, your `file_processed.append(targets_ind[i])` has a typo: `targets_ind` should be `targets_in_dir`.
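Separately, the line `str.split('MANUFACTURER')` in the question never touches the cell: it calls `split()` on the string literal `'MANUFACTURER'` itself and discards the result. If the intent is to pull out the value that follows a "MANUFACTURER" label inside a cell (an assumption about the table layout), a sketch along these lines could work; `split_on_manufacturer` is a hypothetical helper name.

```python
def split_on_manufacturer(cell_text):
    """Return the text after the first 'MANUFACTURER' label, or None if absent.

    Assumption: cells look like "MANUFACTURER: <name>".
    """
    _before, sep, after = cell_text.partition("MANUFACTURER")
    if not sep:  # label not present in this cell
        return None
    return after.strip(" :\t\n")  # drop the separator and surrounding whitespace


print(str.split("MANUFACTURER"))  # ['MANUFACTURER'], the literal split on whitespace
print(split_on_manufacturer("MANUFACTURER: Acme Corp"))  # Acme Corp
print(split_on_manufacturer("MODEL: X-100"))  # None
```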