
Extracting tables from a Word doc

Is there any tool to extract all tables from a Word document and convert them to a CSV file or any Excel file format, using Python or VBA?

Note that the Word file contains both text and tables.

You can use pandas with python-docx. Per this answer, you can extract all tables from a document and put them in a list:

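from docx import Document
import pandas as pd

document = Document('test.docx')

# document.tables lists only top-level tables, in document order;
# paragraphs of ordinary text are skipped automatically.
tables = []
for table in document.tables:
    # Pre-fill a rows x columns grid with empty strings
    data = [['' for _ in range(len(table.columns))] for _ in range(len(table.rows))]
    for i, row in enumerate(table.rows):
        for j, cell in enumerate(row.cells):
            if cell.text:
                data[i][j] = cell.text
    tables.append(pd.DataFrame(data))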
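
The DataFrames built this way have numeric column labels, and any header row in the Word table is just the first data row. If you know the first row of a table holds the column titles, you could promote it to the header. This is only a minimal sketch under that assumption; promote_header is a hypothetical helper name, not part of pandas or python-docx, and the sample data stands in for an extracted table:

```python
import pandas as pd

def promote_header(df):
    """Return a new DataFrame that uses the first row of df as column labels."""
    if df.empty:
        return df.copy()
    # Drop the first row from the body and renumber the remaining rows
    body = df.iloc[1:].reset_index(drop=True)
    # Use the values of the first row as the column names
    body.columns = df.iloc[0].tolist()
    return body

# Stand-in for one table extracted from a document (all cells are strings)
raw = pd.DataFrame([["name", "qty"], ["apple", "3"], ["pear", "5"]])
print(promote_header(raw))
```

Tables without a header row should be left as they are, so this step is best applied per table rather than to the whole list.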

You can then save the tables to CSV files by looping through the list:

# Pass index=False, header=False to to_csv to omit pandas' automatic row and column numbers
for nr, df in enumerate(tables):
    df.to_csv(f"table_{nr}.csv")
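
The question also asks about Excel output. One possible approach, sketched here rather than taken from the original answer, is to write every table to its own sheet of a single workbook with pandas. The function name save_tables_to_excel and the sheet names are assumptions made for illustration, and writing .xlsx files assumes an Excel engine such as openpyxl is installed:

```python
import pandas as pd

def save_tables_to_excel(tables, path):
    """Write each DataFrame in tables to its own sheet of one workbook."""
    if not tables:
        # A workbook needs at least one sheet, so write nothing for an empty list
        return
    with pd.ExcelWriter(path) as writer:
        for nr, df in enumerate(tables):
            # index=False and header=False keep only the cell contents
            df.to_excel(writer, sheet_name=f"table_{nr}", index=False, header=False)
```

With the list built earlier, the call would look like save_tables_to_excel(tables, "tables.xlsx").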
