Select multiple columns using openpyxl

I am new to Python and am trying to read an Excel file using openpyxl.

My Excel file has 20 columns and I need to extract only 5 of them.

My Excel file:

    CustNumber  CustName  Serial Number  CreateDate  ModifiedDate  Organization  CustAddress    Phone
    1           XYZ       1010           01-01-2021  01-10-2021    test1         101 parklane   234
    2           ABC       1012           01-01-2021  01-10-2021    test2         102 texchlane  234
    3           CDF       1010           01-01-2021  01-10-2021    test1         101 parklane   234
    4           ASC       1012           01-01-2021  01-10-2021    test2         102 texchlane  234

Expected output:

    CustNumber  CustName  CreateDate  ModifiedDate  Organization  CustAddress
    1           XYZ       01-01-2021  01-10-2021    test1         101 parklane
    3           CDF       01-01-2021  01-10-2021    test1         101 parklane

I need to select some columns from the Excel file and keep only the records where Organization = test1.

I want to do this with openpyxl, not pandas. I am able to read one column, but I am not sure how to read multiple columns and then filter the rows to extract only the test1 records.

My code:

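  import openpyxl

  book = openpyxl.load_workbook('Book1.xlsx')
  sheet = book['SSH_CERT']
  column_name = 'Description'

  # Each item yielded by iter_cols is a tuple holding the cells of one column.
  for column_cell in sheet.iter_cols(1, sheet.max_column):
      # The first cell of the column is its header.
      if column_cell[0].value == column_name:
          # Print every value below the header, then stop searching.
          for data in column_cell[1:]:
              print(data.value)
          break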

Thanks

This would be one line of code with pandas, but since you want an openpyxl solution, here it is:

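import openpyxl

book = openpyxl.load_workbook('Book1.xlsx')
sheet = book['SSH_CERT']

# Names of the columns to remove; every other column is kept.
# 'Serial Number' and 'Phone' are the two columns absent from the expected output.
column_names = ['Serial Number', 'Phone']

# Walk the header row from right to left so that deleting a column
# does not shift the columns that still have to be checked.
for cell in reversed(sheet[1]):
    if cell.value in column_names:
        sheet.delete_cols(cell.column, 1)

# Save under a new name so the original workbook is left untouched.
book.save(filename='book1_res.xlsx')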

This searches the header row and removes every column whose name is listed in column_names.
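The snippet above only drops columns; it does not filter rows. If the goal is to keep a chosen set of columns and only the rows where Organization is test1, one possible approach is to read the sheet as plain tuples with `iter_rows(values_only=True)` and copy the matching rows into a new workbook. The following is a minimal sketch, not a definitive implementation. The function names `select_and_filter` and `filter_workbook` are hypothetical, and it assumes the header is in the first row, the header names are unique, and the cell values match the filter value exactly (no stray whitespace).

```python
def select_and_filter(rows, wanted, filter_column, filter_value):
    """Yield the header and the matching rows, restricted to the wanted columns.

    rows is any iterable of row tuples whose first item is the header row,
    for example sheet.iter_rows(values_only=True).
    """
    rows = iter(rows)
    header = next(rows, None)
    if header is None:  # empty input: nothing to yield
        return

    # Map each header name to its position in the row tuple.
    index = {name: i for i, name in enumerate(header)}
    keep = [index[name] for name in wanted]
    filter_at = index[filter_column]

    yield tuple(header[i] for i in keep)
    for row in rows:
        if row[filter_at] == filter_value:
            yield tuple(row[i] for i in keep)


def filter_workbook(src, dst, sheet_name, wanted, filter_column, filter_value):
    """Copy the selected columns of the matching rows into a new workbook."""
    # Imported here so select_and_filter stays usable on plain tuples.
    import openpyxl

    book = openpyxl.load_workbook(src)
    sheet = book[sheet_name]

    out_book = openpyxl.Workbook()
    out_sheet = out_book.active
    for row in select_and_filter(sheet.iter_rows(values_only=True),
                                 wanted, filter_column, filter_value):
        out_sheet.append(row)
    out_book.save(dst)
```

With the file from the question, a call might look like `filter_workbook('Book1.xlsx', 'book1_res.xlsx', 'SSH_CERT', ['CustNumber', 'CustName', 'CreateDate', 'ModifiedDate', 'Organization', 'CustAddress'], 'Organization', 'test1')`. Writing to a new workbook rather than deleting rows in place avoids the index shifting that comes with removing rows while iterating.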
