Writing tables from Word (.docx) to Excel (.xlsx) using xlsxwriter

Question

I am trying to parse a Word (.docx) document for tables, then copy these tables over to Excel using xlsxwriter. This is my code:

from docx.api import Document
import xlsxwriter

document = Document('/Users/xxx/Documents/xxx/Clauses Sample - Copy v1 - for merge.docx')
tables = document.tables

wb = xlsxwriter.Workbook('C:/Users/xxx/Documents/xxx/test clause retrieval.xlsx')
Sheet1 = wb.add_worksheet("Compliance")
index_row = 0

print(len(tables))

for table in document.tables:
    data = []
    keys = None
    for i, row in enumerate(table.rows):
        text = (cell.text for cell in row.cells)

        if i == 0:
            keys = tuple(text)
            continue
        row_data = dict(zip(keys, text))
        data.append(row_data)
        #print (data)
        #big_data.append(data)
        Sheet1.write(index_row, 0, str(row_data))
        index_row = index_row + 1

print(row_data)

wb.close()

This is my desired output: [image]

However, here is my actual output: [image]

I am aware that my current output produces a list of strings instead.

Is there any way I can get my desired output using xlsxwriter? Any help is greatly appreciated.
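For context, the one-string-per-row result follows from passing str(row_data) to a single write call. Here is a minimal sketch of the difference, using made-up header and cell values (the names "Clause" and "Status" are assumptions, not taken from the actual document):

```python
# Hypothetical header row and data row, standing in for the cell texts of a table.
keys = ("Clause", "Status")
text = ("1.1", "Compliant")

row_data = dict(zip(keys, text))

# What the question's code stores: the whole dict rendered as one string in column 0.
as_single_cell = str(row_data)

# What a one-cell-per-value layout needs: the individual values, one per column.
as_separate_cells = list(row_data.values())

print(as_single_cell)      # {'Clause': '1.1', 'Status': 'Compliant'}
print(as_separate_cells)   # ['1.1', 'Compliant']
```

So getting one value per Excel cell means calling write once per cell (with its own column index) rather than once per row.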

Answer 1

I would use the pandas package instead of xlsxwriter, as follows:

from docx.api import Document
import pandas as pd

document = Document("D:/tmp/test.docx")

# Collect every row of every table as a list of cell texts.
rows = []
for table in document.tables:
    for row in table.rows:
        rows.append([cell.text for cell in row.cells])

# DataFrame.append was removed in pandas 2.0, so build the frame in one step.
df = pd.DataFrame(rows)
df.columns = ["Column1", "Column2"]  # assumes every table has exactly two columns
df.to_excel("D:/tmp/test.xlsx")
print(df)

This prints the following, which is also what gets written to the Excel file:

>>> 
  Column1 Column2
0   Hello    TEST
1     Est    Ting
2      Gg      ff
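If the column names should come from the document rather than being hard-coded, one option is to treat the first row of each table as its header. The sketch below is only an illustration under that assumption; split_header and docx_tables_to_excel are hypothetical helper names, and writing each table to its own sheet ("Table1", "Table2", ...) is a layout choice, not something the question asked for.

```python
def split_header(rows):
    """Split a list of rows into (header, body), taking the first row as header."""
    if not rows:
        return [], []
    return list(rows[0]), [list(r) for r in rows[1:]]


def docx_tables_to_excel(docx_path, xlsx_path):
    """Write every table in a .docx file to its own sheet of an .xlsx file."""
    # Imported here so split_header can be used without these packages installed.
    import pandas as pd
    from docx import Document

    document = Document(docx_path)
    if not document.tables:
        raise ValueError("the document contains no tables")

    # engine="xlsxwriter" keeps xlsxwriter as the backend pandas writes with.
    with pd.ExcelWriter(xlsx_path, engine="xlsxwriter") as writer:
        for n, table in enumerate(document.tables, start=1):
            rows = [[cell.text for cell in row.cells] for row in table.rows]
            header, body = split_header(rows)
            frame = pd.DataFrame(body, columns=header)
            # index=False drops the 0, 1, 2, ... index column from the sheet.
            frame.to_excel(writer, sheet_name=f"Table{n}", index=False)
```

Calling docx_tables_to_excel("D:/tmp/test.docx", "D:/tmp/test.xlsx") would then be the equivalent of the script above, assuming every table really does start with a header row.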

Answer 2

This is the portion of my updated code that gave me the output I wanted:

for row in block.rows:
    for x, cell in enumerate(row.cells):
        print(cell.text)
        Sheet1.write(index_row, x, cell.text)
    index_row += 1

Output: [image]
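Since only a fragment is shown, here is how the whole thing might look when put together. This is a sketch rather than the exact script: write_table and docx_tables_to_xlsx are hypothetical helper names, the loop runs over document.tables as in the question (the fragment's block variable is not defined in the post), and leaving a blank row between tables is an assumption about the desired layout.

```python
def write_table(worksheet, rows, start_row=0):
    """Write each cell text into its own cell; return the next free row index.

    `worksheet` only needs a write(row, col, value) method, which is the
    signature xlsxwriter worksheets provide.
    """
    for r, row in enumerate(rows):
        for c, text in enumerate(row):
            worksheet.write(start_row + r, c, text)
    return start_row + len(rows)


def docx_tables_to_xlsx(docx_path, xlsx_path, sheet_name="Compliance"):
    """Copy every table of a .docx file into one worksheet, one cell per value."""
    # Imported here so write_table can be exercised without these packages.
    import xlsxwriter
    from docx import Document

    document = Document(docx_path)
    workbook = xlsxwriter.Workbook(xlsx_path)
    worksheet = workbook.add_worksheet(sheet_name)

    next_row = 0
    for table in document.tables:
        rows = [[cell.text for cell in row.cells] for row in table.rows]
        next_row = write_table(worksheet, rows, next_row)
        next_row += 1  # assumed layout: one blank row between tables

    workbook.close()
```

Keeping the per-cell loop in its own function makes it easy to check the row and column indices with a stand-in worksheet object before touching real files.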


Answer 1
1 2020-05-27 18:32:28

Answer 2
1 Accepted 2020-05-29 06:05:28
