How to read tables in multiple docx files in the same folder with Python

I have a folder called "Test_Plan". It contains multiple docx files, and each docx file has multiple tables. How can I read all of the docx files and print a summary for each one? For example, for a single docx file I currently produce output like this:

Total Number of Tables: 52
Total Number of YES Automations: 6
Total Number of NO Automations: 5

I need to automate this for all of the files in the "Test_Plan" folder. I hope the question is clear.

My code for reading the tables from a single docx file:

# Import the module used to read Word documents

from docx import Document
doc = Document("sample2.docx")


# Count the tables in this docx (rows whose first cell is "ID")

i = 0
for t in doc.tables:
    for ro in t.rows:
        if ro.cells[0].text=="ID" :
            i=i+1
print("Total Number of Tables: ", i)


# Count the Automation values
# Count how many "yes" automations there are

j=0
for table in doc.tables:
    for ro in table.rows:
        if ro.cells[0].text=="Automated Test Case" and (ro.cells[2].text=="yes" or ro.cells[2].text=="Yes"):
            j=j+1
print("Total Number of YES Automations: ", j)


# Count how many "no" automations there are

k = 0
for t in doc.tables:
    for ro in t.rows:
        if ro.cells[0].text=="Automated Test Case" and (ro.cells[2].text=="no" or ro.cells[2].text=="No"):
            k=k+1
print("Total Number of NO Automations: ", k)

You can use glob to find all your files, e.g.:

import glob
from docx import Document

for name in glob.glob('Test_Plan/*.docx'):
    doc = Document(name)
    ...

glob.glob returns a list of the file names that match the given pattern. You can loop through that list, as the for loop above does, and open each file in turn. After opening a file you can plug in your own code. Of course, you will have to initialize your counter variables before the loop (or reset them inside the loop if you want separate counts for each file).
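Putting the loop and the counting code from the question together could look roughly like the sketch below. The function names count_in_document and summarize_folder are made up for illustration, and the three loops from the question are merged into a single pass. It also assumes that rows with fewer than three cells should be skipped, whereas the original code would raise an IndexError on them.

```python
import glob
import os.path


def count_in_document(doc):
    """Return (tables, yes, no) for one document, using the question's rules."""
    tables = yes = no = 0
    for table in doc.tables:
        for row in table.rows:
            cells = row.cells
            if not cells:
                continue
            first = cells[0].text
            if first == "ID":
                # The question counts one table per row that starts with "ID".
                tables += 1
            elif first == "Automated Test Case" and len(cells) > 2:
                # Assumption: rows that are too short are skipped, not an error.
                answer = cells[2].text
                if answer in ("yes", "Yes"):
                    yes += 1
                elif answer in ("no", "No"):
                    no += 1
    return tables, yes, no


def summarize_folder(folder="Test_Plan"):
    """Print the counts for every .docx file in `folder` and return the totals."""
    # Imported here so count_in_document can be used without python-docx installed.
    from docx import Document

    totals = [0, 0, 0]
    for name in sorted(glob.glob(os.path.join(folder, "*.docx"))):
        counts = count_in_document(Document(name))
        print(name)
        print("Total Number of Tables: ", counts[0])
        print("Total Number of YES Automations: ", counts[1])
        print("Total Number of NO Automations: ", counts[2])
        # Accumulate folder-wide totals alongside the per-file output.
        totals = [t + c for t, c in zip(totals, counts)]
    return tuple(totals)
```

Calling summarize_folder("Test_Plan") should print the three lines from the question once per file, and the returned tuple holds the totals over the whole folder.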

To split the file names, I would suggest the following approach:

import os.path

# name is one of the file names returned by glob.glob
path, filename = os.path.split(name)
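As a small illustration of what the split produces, here is a sketch using a hypothetical path of the kind glob.glob would return for this folder. The extra os.path.splitext call is not part of the answer above; it is only shown in case the name without the ".docx" extension is wanted.

```python
import os.path

# Hypothetical file name, standing in for one entry from glob.glob.
name = "Test_Plan/sample2.docx"

# Separate the directory part from the file name.
path, filename = os.path.split(name)
print(path)      # Test_Plan
print(filename)  # sample2.docx

# Optionally separate the extension from the rest of the file name.
stem, extension = os.path.splitext(filename)
print(stem)       # sample2
print(extension)  # .docx
```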
