How to extract a Word table from multiple docx files using python docx
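I have a lot of Word files with the same table structure, and I need to extract the tables and save them to CSV/Excel, with a separate sheet (in .xls) for each Word .docx file.
The code below only extracts the first table and does not loop through the whole docx. Is there a way to loop through the entire document and through all the files in the folder?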
```python
import os
from docx import Document
import pandas as pd

folder = 'C:/Users/trans/downloads/test'
file_names = [f for f in os.listdir(folder) if f.endswith(".docx")]
file_names = [os.path.join(folder, file) for file in file_names]
print(file_names)

tables = []
for file in file_names:
    document = Document(file)
    for table in document.tables:
        df = [['' for i in range(len(table.columns))] for j in range(len(table.rows))]
        for i, row in enumerate(table.rows):
            for j, cell in enumerate(row.cells):
                if cell.text:
                    df[i][j] = cell.text
        tables.append(pd.DataFrame(df))
        print(df)

for nr, i in enumerate(tables):
    i.to_csv('C:/Users/trans/downloads/test/'"table_" + str(nr) + ".csv")
```
All you need to do is install the "docx2txt" library and import it, then follow the instructions below. Go to this link
```python
import glob
from docx import Document
import pandas as pd

folder = 'C:/Users/trans/downloads/test'
# glob returns full paths, so no os.path.join step is needed
file_names = glob.glob(folder + '/*.docx')

tables = []
for file in file_names:
    document = Document(file)
    for table in document.tables:
        # Pre-fill a rows x columns grid with empty strings
        df = [['' for i in range(len(table.columns))] for j in range(len(table.rows))]
        for i, row in enumerate(table.rows):
            for j, cell in enumerate(row.cells):
                if cell.text:
                    df[i][j] = cell.text
        # Append once per table, after all of its cells have been copied
        tables.append(pd.DataFrame(df))

# Write each collected table to its own CSV file
for index, table in enumerate(tables):
    table.to_csv('C:/Users/trans/downloads/test/table_' + str(index) + ".csv")
```
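The question also asks for one Excel sheet per Word file, which the CSV loop does not cover. Here is a rough sketch of one way to do that with `pd.ExcelWriter`, not a tested drop-in solution. The function names `table_to_rows`, `safe_sheet_name` and `export_folder` are made up for this example, as is the `tables.xlsx` output name. It assumes `openpyxl` is installed so pandas can write `.xlsx`, and that stacking all tables of one document on the same sheet, separated by a blank row, is acceptable.

```python
import re
from pathlib import Path


def table_to_rows(table):
    """Return the cell text of a python-docx table as a list of rows."""
    return [[cell.text for cell in row.cells] for row in table.rows]


def safe_sheet_name(stem, used):
    """Turn a file stem into a sheet name that is unique within `used`."""
    # Excel rejects these characters in sheet names and caps names at 31 chars.
    base = re.sub(r"[\[\]:*?/\\]", "_", stem)[:31] or "sheet"
    name, n = base, 1
    while name in used:
        suffix = f"_{n}"
        name = base[:31 - len(suffix)] + suffix
        n += 1
    used.add(name)
    return name


def export_folder(folder, output_path):
    """Write all tables of every .docx in `folder` to one workbook, one sheet per file."""
    # Third-party imports are local so the helpers above work without them installed.
    import pandas as pd
    from docx import Document

    sheets, used = {}, set()
    for path in sorted(Path(folder).glob("*.docx")):
        if path.name.startswith("~$"):
            continue  # skip the lock files Word creates while a document is open
        tables = [table_to_rows(t) for t in Document(str(path)).tables]
        if tables:
            sheets[safe_sheet_name(path.stem, used)] = tables
    if not sheets:
        return 0  # nothing found, so do not create an empty workbook

    with pd.ExcelWriter(output_path) as writer:
        for name, tables in sheets.items():
            start = 0
            for rows in tables:
                pd.DataFrame(rows).to_excel(
                    writer, sheet_name=name, startrow=start, header=False, index=False
                )
                start += len(rows) + 1  # leave one blank row between tables
    return len(sheets)
```

With the folder from the question, a call might look like `export_folder('C:/Users/trans/downloads/test', 'C:/Users/trans/downloads/test/tables.xlsx')`. It returns the number of sheets written. The sheet-name cleanup is deliberately simple and may not cover every Excel naming rule.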