How to read tables in multiple docx files in the same folder with Python
I have a folder called "Test_Plan". It contains multiple docx files, and each docx file has multiple tables. My question is: how can I read all of the docx files and print the output for each one? For example, if I pick one docx file, the output should look like this:
Total Number of Tables: 52
Total Number of YES Automations: 6
Total Number of NO Automations: 5
I need to automate this for every file in the "Test_Plan" folder. I hope my question is clear.
My code for reading tables from a single docx file:
# Module to retrieve the Word documents
from docx import Document

doc = Document("sample2.docx")

# Count the tables in this docx (each table has one row whose first cell is "ID")
i = 0
for t in doc.tables:
    for ro in t.rows:
        if ro.cells[0].text == "ID":
            i = i + 1
print("Total Number of Tables: ", i)

# Count the Automation values
# This counts how many "yes" automations there are
j = 0
for table in doc.tables:
    for ro in table.rows:
        if ro.cells[0].text == "Automated Test Case" and (ro.cells[2].text == "yes" or ro.cells[2].text == "Yes"):
            j = j + 1
print("Total Number of YES Automations: ", j)

# This part counts the "no" automation values
k = 0
for t in doc.tables:
    for ro in t.rows:
        if ro.cells[0].text == "Automated Test Case" and (ro.cells[2].text == "no" or ro.cells[2].text == "No"):
            k = k + 1
print("Total Number of NO Automations: ", k)
Output:
You can use glob to find all your files, e.g.:
import glob
from docx import Document

# Open every .docx file in the Test_Plan folder in turn
for name in glob.glob('Test_Plan/*.docx'):
    doc = Document(name)
    ...
glob returns a list of file names that match the given pattern. You can loop through that list, as the for loop above does, and open each file in turn. After opening a file you can just plug in your existing code. Of course, you will have to initialize your variables before the loop.
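Putting the pieces together, a minimal sketch could look like the following. It is not the only way to do it, and it makes a few assumptions of its own: the helper names (`count_tables`, `read_tables`, `summarize_folder`) are made up for illustration, the counters are reset for each file because the question asks for per-file output, and the yes/no comparison is made case-insensitive and whitespace-tolerant, which is slightly looser than the original `"yes"`/`"Yes"` check. If you want one total for the whole folder instead, keep a single set of counters outside the loop.

```python
import glob
import os.path


def count_tables(tables):
    """Count "ID" tables and yes/no automation rows.

    `tables` is a list of tables; each table is a list of rows and each
    row is a list of cell strings (plain data, no python-docx objects).
    """
    counts = {"tables": 0, "yes": 0, "no": 0}
    for rows in tables:
        for cells in rows:
            if not cells:
                continue
            first = cells[0].strip()
            if first == "ID":
                counts["tables"] += 1
            elif first == "Automated Test Case" and len(cells) > 2:
                # Assumption: treat "yes", "Yes", "YES " etc. the same way
                answer = cells[2].strip().lower()
                if answer in ("yes", "no"):
                    counts[answer] += 1
    return counts


def read_tables(path):
    """Return the tables of one .docx file as nested lists of cell text."""
    # python-docx is imported here so count_tables can be used without it
    from docx import Document

    doc = Document(path)
    return [[[cell.text for cell in row.cells] for row in table.rows]
            for table in doc.tables]


def summarize_folder(folder="Test_Plan"):
    """Print the three totals for every .docx file in `folder`."""
    for name in sorted(glob.glob(os.path.join(folder, "*.docx"))):
        counts = count_tables(read_tables(name))
        print(os.path.basename(name))
        print("Total Number of Tables: ", counts["tables"])
        print("Total Number of YES Automations: ", counts["yes"])
        print("Total Number of NO Automations: ", counts["no"])


if __name__ == "__main__":
    summarize_folder("Test_Plan")
```

Each file is read once and all three counts are collected in a single pass over its rows, instead of looping over `doc.tables` three times.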
To split the file names, I would suggest the following approach:
import os.path

# name is a path returned by glob, e.g. the loop variable above
path, filename = os.path.split(name)
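As a small illustration of what the split gives you, here is a sketch using a hypothetical path (the file name is only an example, reusing "sample2.docx" from the question). `os.path.splitext` is added here as an optional extra step in case you also want the name without its extension.

```python
import os.path

# Hypothetical example path, as glob might return it
name = "Test_Plan/sample2.docx"

# Split into the directory part and the file name
path, filename = os.path.split(name)

# Optionally split the file name into its stem and extension
stem, ext = os.path.splitext(filename)

print(path)      # Test_Plan
print(filename)  # sample2.docx
print(stem)      # sample2
print(ext)       # .docx
```

If you only need the file name, `os.path.basename(name)` returns the same value as `filename` here.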