
python-docx: extracting a table from a Word docx file

I know this question has been asked before, but the other answers did not work for me. I have a Word file that consists of one table, and I want that table as the output of my Python program. I'm using Python 3.6 and have installed python-docx. Here is my code for the data extraction:

from docx.api import Document

document = Document('test_word.docx')
table = document.tables[0]

data = []

keys = None
for i, row in enumerate(table.rows):
    text = (cell.text for cell in row.cells)

    if i == 0:
        keys = tuple(text)
        continue
    row_data = dict(zip(keys, text))
    data.append(row_data)

# Print once, after every row has been collected
print(data)

I want the output to look exactly like the table in the Word file. Thanks in advance.

Your code works fine for me. How about loading the result into a pandas DataFrame?

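import pandas as pd
from docx.api import Document

document = Document('test_word.docx')
table = document.tables[0]  # first table in the document

data = []

keys = None
for i, row in enumerate(table.rows):
    # cell.text is always a string
    text = (cell.text for cell in row.cells)

    if i == 0:
        # The first row supplies the column names
        keys = tuple(text)
        continue
    # Pair each remaining row with the column names
    row_data = dict(zip(keys, text))
    data.append(row_data)

# Print once, after every row has been collected
print(data)

df = pd.DataFrame(data)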
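
Because cell.text always returns a string, every value in the DataFrame above is text, even when the cell holds a number. If numeric values are wanted, one option is to convert the cells before building the records. The following is only a minimal sketch: to_number and rows_to_records are hypothetical helper names, not part of python-docx or pandas, and the input is assumed to be a list of rows where each row is a list of cell strings.

```python
from typing import Dict, List, Union

Value = Union[int, float, str]


def to_number(text: str) -> Value:
    """Return text as int or float when it parses, otherwise unchanged."""
    stripped = text.strip()
    for cast in (int, float):
        try:
            return cast(stripped)
        except ValueError:
            continue
    return text


def rows_to_records(rows: List[List[str]]) -> List[Dict[str, Value]]:
    """Use the first row as keys and convert the remaining cells to numbers."""
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, (to_number(cell) for cell in row))) for row in body]
```

With python-docx the rows could be collected as [[cell.text for cell in row.cells] for row in table.rows], and the result passed to pd.DataFrame(rows_to_records(rows)). The column names stay strings in this sketch; only the body cells are converted.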

How can I display a particular row and column in that table?

You can extract rows and columns by position with iloc:

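# iloc[rows, columns]; values are strings because cell.text returns str
df.iloc[0, :].tolist()   # ['5', '6', '7', '8']    - row at position 0
df.iloc[:, 0].tolist()   # ['5', '9', '13', '17']  - column at position 0
df.iloc[0, 0]            # '5'                     - cell (0, 0)
df.iloc[1:, 2].tolist()  # ['11', '15', '19']      - column at position 2, skipping the first row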

and so on...

However, if your columns have names (in this case the names are the header cells "1" to "4", read from the document as strings), you can select a column by name:

# df["name"].tolist()
df["1"].tolist()  # ['5', '9', '13', '17'] - column named "1"

print(df)

prints the following, which is how the table looks in my sample document:

    1   2   3   4
0   5   6   7   8
1   9   10  11  12
2   13  14  15  16
3   17  18  19  20
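
If the goal is only to print something that resembles the table in Word, pandas is not strictly required. The sketch below is one possible approach, not the only one: read_table_rows and format_grid are hypothetical helper names introduced here for illustration. The first reads the cell text with python-docx, and the second draws a bordered, column-aligned text grid. Line breaks inside a cell are collapsed to spaces so that each table row stays on one line.

```python
from typing import List


def read_table_rows(path: str, table_index: int = 0) -> List[List[str]]:
    """Return the cell text of one table as a list of rows."""
    # Imported here so format_grid can be used without python-docx installed.
    from docx import Document

    table = Document(path).tables[table_index]
    # Collapse internal whitespace and newlines so each cell is a single line.
    return [[" ".join(cell.text.split()) for cell in row.cells] for row in table.rows]


def format_grid(rows: List[List[str]]) -> str:
    """Render rows as a bordered, column-aligned text grid."""
    if not rows:
        return ""
    n_cols = max(len(row) for row in rows)
    # Pad short rows so every row has the same number of columns.
    padded = [list(row) + [""] * (n_cols - len(row)) for row in rows]
    widths = [max(len(row[c]) for row in padded) for c in range(n_cols)]
    border = "+" + "+".join("-" * (w + 2) for w in widths) + "+"
    lines = [border]
    for row in padded:
        cells = " | ".join(cell.ljust(w) for cell, w in zip(row, widths))
        lines.append("| " + cells + " |")
        lines.append(border)
    return "\n".join(lines)
```

Assuming the file from the question, print(format_grid(read_table_rows('test_word.docx'))) would print the header row and every data row inside the same grid.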
