Write tables from Word (.docx) to Excel (.xlsx) using xlsxwriter
I am trying to parse a Word (.docx) file for tables, then copy these tables over to Excel using xlsxwriter. This is my code:
from docx.api import Document
import xlsxwriter

document = Document('/Users/xxx/Documents/xxx/Clauses Sample - Copy v1 - for merge.docx')
tables = document.tables

wb = xlsxwriter.Workbook('C:/Users/xxx/Documents/xxx/test clause retrieval.xlsx')
Sheet1 = wb.add_worksheet("Compliance")
index_row = 0

print(len(tables))

for table in document.tables:
    data = []
    keys = None
    for i, row in enumerate(table.rows):
        text = (cell.text for cell in row.cells)
        if i == 0:
            keys = tuple(text)
            continue
        row_data = dict(zip(keys, text))
        data.append(row_data)
        #print (data)
        #big_data.append(data)
        Sheet1.write(index_row, 0, str(row_data))
        index_row = index_row + 1

print(row_data)
wb.close()
This is my desired output:
However, here is my actual output:
I am aware that my current code writes each row as a single string instead.
Is there any way I can get my desired output using xlsxwriter? Any help is greatly appreciated.
I would use the pandas package instead of xlsxwriter, as follows:
from docx.api import Document
import pandas as pd

document = Document("D:/tmp/test.docx")

# Collect the cell text of every row from every table in the document
rows = []
for table in document.tables:
    for row in table.rows:
        rows.append([cell.text for cell in row.cells])

# Assumes every table has exactly two columns
df = pd.DataFrame(rows, columns=["Column1", "Column2"])
df.to_excel("D:/tmp/test.xlsx")
print(df)
This prints the following, which is also what gets written to the Excel file:
>>>
  Column1 Column2
0   Hello    TEST
1     Est    Ting
2      Gg      ff
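If you would rather stay with xlsxwriter, as the question asks, here is a minimal sketch. It assumes the desired output is one Word table cell per spreadsheet cell (the screenshots from the question are not available, so this is a guess). The helper names write_tables and convert, the gap parameter, and the blank row left between tables are my own choices for illustration, not part of either library. The only library calls used are python-docx's Document, tables, rows, cells and text, and xlsxwriter's Workbook, add_worksheet, write_row and close.

```python
def write_tables(worksheet, tables, start_row=0, gap=1):
    """Write every table row to its own worksheet row, one cell per column.

    `worksheet` only needs a write_row(row, col, values) method, which
    xlsxwriter worksheets provide. `tables` is an iterable of python-docx
    style tables (objects with .rows, whose rows have .cells with .text).
    Returns the index of the next free worksheet row.
    """
    row_idx = start_row
    for table in tables:
        for row in table.rows:
            values = [cell.text for cell in row.cells]
            # write_row puts values[0] in column 0, values[1] in column 1, ...
            worksheet.write_row(row_idx, 0, values)
            row_idx += 1
        # Hypothetical layout choice: leave `gap` blank rows between tables
        row_idx += gap
    return row_idx


def convert(docx_path, xlsx_path, sheet_name="Compliance"):
    """Copy all tables from a .docx file into one sheet of a new .xlsx file."""
    # Imported here so write_tables can be tried out without either package
    from docx import Document
    import xlsxwriter

    document = Document(docx_path)
    workbook = xlsxwriter.Workbook(xlsx_path)
    worksheet = workbook.add_worksheet(sheet_name)
    write_tables(worksheet, document.tables)
    workbook.close()
```

You would call it as convert("input.docx", "output.xlsx"), where both paths are placeholders. The difference from the code in the question is that each cell's text is written to its own column with write_row, rather than the whole row being converted to one string with str(row_data) and written to column 0.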