Iterate through a list of docx files to extract and process tables

Question

I have 3000 .docx files spread across several directories and subdirectories. I have to prepare a list that consists of each filename and the information extracted from the tables in that docx. I have already collected all the docx files into the list targets_in_dir, separating them from the non-relevant files.
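For reference, one way to build a list like targets_in_dir from a folder tree is a recursive search with the standard-library pathlib module. This is a minimal sketch under two assumptions: all the files sit under a single top-level folder, and the extension may appear in any case (".docx" or ".DOCX"). It also skips Word's temporary owner files (names starting with "~$"), which are not real documents and would fail to open later. The function name find_docx_files is hypothetical.

```python
from pathlib import Path


def find_docx_files(root):
    """Return a sorted list of .docx paths anywhere under root (hypothetical helper).

    - rglob("*") walks root and every subdirectory.
    - The suffix check is case-insensitive, so "report.DOCX" is included.
    - Word lock files such as "~$report.docx" are skipped.
    """
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file()
        and path.suffix.lower() == ".docx"
        and not path.name.startswith("~$")
    )
```

Usage would look like targets_in_dir = find_docx_files("path/to/top/folder"), where the folder path is a placeholder.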

Question: I would like to iterate through targets_in_dir and extract all tables from each docx:

len_target =len(targets_in_dir)
file_processed=[]
string_tables=[]

for i in len_target:

    doc = docx.Document(targets_in_dir[i])
    file_processed.append(targets_ind[i])

    for table in doc.tables:
        for row in table.rows:
            for cell in row.cells:
                str.split('MANUFACTURER')
                string_tables.append(cell.text)

I get the error 'int' object is not iterable:

 ---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-39-4847866a9234> in <module>
      4 string_tables=[]
      5 
----> 6 for i in len_target:
      7 
      8     doc = docx.Document(targets_in_dir[i])

TypeError: 'int' object is not iterable

What am I doing wrong?

Answer 1

It looks like you are trying to iterate through len_target = len(targets_in_dir), which is an int. Because an int is not an iterable object, your for loop fails.
You need to iterate over an iterable object (such as a list or a range) for the for loop to work.
Changing it to

for i in range(len_target):
    # do stuff

or

for i in targets_in_dir:
    # do stuff

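is a good place to start. Note that in the second form i is the file path itself rather than an index, so inside the loop you would open it with docx.Document(i) instead of docx.Document(targets_in_dir[i]).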
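To make the difference concrete, here is a small pure-Python sketch using a hypothetical three-item list in place of the real file paths. It reproduces the TypeError from the question, then shows the two fixes above, plus enumerate(), which gives both the index and the item when you need a counter.

```python
targets_in_dir = ["a.docx", "b.docx", "c.docx"]  # hypothetical file names
len_target = len(targets_in_dir)  # an int (3), not a sequence

# Looping over an int fails exactly as in the question's traceback.
try:
    for i in len_target:
        pass
except TypeError as exc:
    error_message = str(exc)

# Fix 1: range() turns the count into the indices 0, 1, ..., len_target - 1.
by_index = [targets_in_dir[i] for i in range(len_target)]

# Fix 2: iterate over the list directly; the loop variable is the path itself.
direct = [path for path in targets_in_dir]

# enumerate() yields (index, item) pairs when both are needed.
numbered = [(i, path) for i, path in enumerate(targets_in_dir)]

print(error_message)
print(by_index == direct)
```

Iterating directly (fix 2) is usually the more idiomatic choice in Python, since it avoids index bookkeeping and typos like the one mentioned below.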

Also, your file_processed.append(targets_ind[i]) has a typo: targets_ind should be targets_in_dir.
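Putting the fixes together, the loop from the question could look like the sketch below. It walks doc.tables, table.rows, row.cells and cell.text, the same python-docx attributes the question already uses. Two things differ from the original and are choices, not requirements: string_tables holds one list of cell texts per file (so string_tables[k] lines up with file_processed[k]), and the function that opens a document is passed in as a parameter (open_document, a hypothetical name) so the loop can be tried without real .docx files. Separately, the line str.split('MANUFACTURER') in the question splits the literal string 'MANUFACTURER' and discards the result, so it has no effect; if the intent was to split each cell's text on that word, the call would be cell.text.split('MANUFACTURER'), which is an assumption about what you want.

```python
def extract_cell_texts(doc):
    """Return the text of every cell in every table of an opened document.

    Expects python-docx style objects: doc.tables -> table.rows -> row.cells -> cell.text.
    """
    texts = []
    for table in doc.tables:
        for row in table.rows:
            for cell in row.cells:
                texts.append(cell.text)
    return texts


def process_files(paths, open_document):
    """Open each path with open_document (e.g. docx.Document) and collect results.

    Returns (file_processed, string_tables), mirroring the lists in the question;
    string_tables[k] holds the cell texts extracted from file_processed[k].
    """
    file_processed = []
    string_tables = []
    for path in paths:  # iterate over the paths themselves, no index needed
        doc = open_document(path)
        file_processed.append(path)
        string_tables.append(extract_cell_texts(doc))
    return file_processed, string_tables
```

With python-docx installed, the call would be file_processed, string_tables = process_files(targets_in_dir, docx.Document) after import docx. Calling docx.Document(path) directly inside the loop works just as well; passing it in only makes the loop easier to test.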

Answer 1: accepted (score 0), 2021-03-12 14:58:30