How can I write a Python script using pandas to iterate over Excel .xlsx files with multiple sheets?
I have some Excel .xlsx files. Each file contains multiple sheets. I have used the following code to read and extract data from the files:
import pandas as pd
file = pd.ExcelFile('my_file.xlsx')
file.sheet_names #Displays the sheet names
df = file.parse('Sheet1') #To parse Sheet1
df.columns #To list columns
I am interested in the email column in each sheet. I have been doing this almost manually with the code above. I need code that automatically iterates over the sheets and extracts all the emails. Help!
You can go through all files and all sheets with a for loop:
import os
import pandas as pd

emails = []
files_dir = "/your_path_to_the_xlsx_files"
for file in os.listdir(files_dir):
    # Skip anything in the directory that is not an .xlsx workbook
    if not file.endswith('.xlsx'):
        continue
    excel = pd.ExcelFile(os.path.join(files_dir, file))
    for sheet in excel.sheet_names:
        df = excel.parse(sheet)
        # Skip sheets that have no 'email' column
        if 'email' not in df.columns:
            continue
        emails.extend(df['email'].tolist())
Now you have all the emails in the emails list.
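As a variation on the loop above, here is a minimal sketch that reads all sheets of a workbook in one call with pd.read_excel(..., sheet_name=None), which returns a dict mapping sheet names to DataFrames. The helper names emails_from_sheets and emails_from_dir are made up for illustration. The sketch also assumes you want to match the column name case-insensitively, drop empty cells, and skip the "~$" lock files Excel creates while a workbook is open. None of that is in the original answer, so adjust as needed.

```python
from pathlib import Path

import pandas as pd


def emails_from_sheets(sheets, column="email"):
    """Collect non-empty values of `column` from a {sheet_name: DataFrame} dict."""
    found = []
    for df in sheets.values():
        # Match the header case-insensitively, ignoring surrounding spaces
        # (an assumption; `column` itself is expected in lowercase).
        matches = [c for c in df.columns if str(c).strip().lower() == column]
        for col in matches:
            # Drop empty cells and trim whitespace around each value.
            found.extend(df[col].dropna().astype(str).str.strip().tolist())
    return found


def emails_from_dir(files_dir, column="email"):
    """Gather the email column from every sheet of every .xlsx file in files_dir."""
    emails = []
    for path in sorted(Path(files_dir).glob("*.xlsx")):
        # Skip the temporary lock files Excel leaves next to open workbooks.
        if path.name.startswith("~$"):
            continue
        # sheet_name=None makes read_excel return a dict of all sheets.
        sheets = pd.read_excel(path, sheet_name=None)
        emails.extend(emails_from_sheets(sheets, column))
    return emails
```

Calling emails_from_dir("/your_path_to_the_xlsx_files") should give the same kind of list as the loop above, minus the empty cells. Reading .xlsx files with pandas typically needs the openpyxl package installed.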