
In Python, removing rows from an Excel file using xlrd, xlwt, and xlutils

Hello everyone, and thank you in advance.

I have a Python script that opens a template Excel file, adds data (while preserving the style), and saves it again. I would like to remove the rows that I did not edit before saving the new xls file. My template xls file has a footer, so I want to delete the extra rows before the footer.

Here is how I am loading the xls template:

self.inBook = xlrd.open_workbook(file_path, formatting_info=True)
self.outBook = xlutils.copy.copy(self.inBook)
self.outBookCopy = xlutils.copy.copy(self.inBook)

I then write the info to outBook while grabbing the style from outBookCopy and applying it to each row that I modify in outBook.

So how do I delete rows from outBook before writing it? Thanks everyone!

I achieved this using the pandas package:

import pandas as pd

# Read from Excel
xl = pd.ExcelFile("test.xls")

# Parse the first sheet into a DataFrame
dfs = xl.parse(xl.sheet_names[0])

# Update the DataFrame as required
# (here: drop rows whose "Name" column is blank; empty cells are read as NaN)
dfs = dfs[dfs['Name'].notna() & (dfs['Name'] != '')]

# Write the updated DataFrame back to the Excel file
# (this rewrites the sheet from the DataFrame, so template cell styles are not kept)
dfs.to_excel("test.xls", sheet_name='Sheet1', index=False)

xlwt does not provide a simple interface for doing this, but I've had success with a somewhat similar problem (inserting multiple copies of a row into a copied workbook) by directly changing the worksheet's rows attribute and the row numbers on the row and cell objects.

Given the number of rows you want to delete and the index of the first row you want to keep, something like this might work:

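# worksheet.rows is a dict keyed by 0-based row index
first_deleted_row = first_kept_row - number_to_delete
# drop the rows being deleted
for row_number in range(first_deleted_row, first_kept_row):
    worksheet.rows.pop(row_number, None)
# move the remaining rows up, lowest index first so nothing is overwritten
for old_row_number in sorted(n for n in worksheet.rows if n >= first_kept_row):
    row = worksheet.rows.pop(old_row_number)
    new_row_number = old_row_number - number_to_delete
    row._Row__idx = new_row_number
    for cell in row._Row__cells.values():
        if cell:
            cell.rowx = new_row_number
    worksheet.rows[new_row_number] = row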
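
The two inputs for the snippet above can be derived from where your data ends and where the footer begins. This is a minimal sketch; `plan_row_deletion`, `last_edited_row` and `footer_start_row` are hypothetical names for illustration and are not part of xlwt, and all indices are assumed to be 0-based:

```python
def plan_row_deletion(last_edited_row, footer_start_row):
    """Return (first_kept_row, number_to_delete) for the unused rows
    that sit between the last edited row and the footer."""
    first_unused_row = last_edited_row + 1
    # Nothing to delete if the data already reaches the footer.
    number_to_delete = max(footer_start_row - first_unused_row, 0)
    # The footer's first row is the first row that must be kept.
    return footer_start_row, number_to_delete


first_kept_row, number_to_delete = plan_row_deletion(
    last_edited_row=11, footer_start_row=40
)
```

With these example values, rows 12 through 39 are treated as unused and the footer starting at row 40 moves up to row 12.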

Do you have merged ranges in the rows you want to delete, or below them? If so, you'll also need to run through the worksheet's merged_ranges attribute and update the rows for them. Also, if you have more rows to delete than rows in your footer, you'll need to
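
For the merged ranges, here is a rough sketch of the adjustment. It assumes each entry in `merged_ranges` is a `(r1, r2, c1, c2)` tuple with inclusive, 0-based bounds (the argument order of xlwt's `write_merge`); `shift_merged_ranges` is a hypothetical helper, and dropping ranges that overlap the deleted rows is just one possible policy:

```python
def shift_merged_ranges(merged_ranges, first_kept_row, number_to_delete):
    """Return a new list of (r1, r2, c1, c2) ranges adjusted for deleted rows."""
    first_deleted_row = first_kept_row - number_to_delete
    shifted = []
    for r1, r2, c1, c2 in merged_ranges:
        if r2 < first_deleted_row:
            # Entirely above the deleted block: unchanged.
            shifted.append((r1, r2, c1, c2))
        elif r1 >= first_kept_row:
            # Entirely below the deleted block: move up.
            shifted.append((r1 - number_to_delete, r2 - number_to_delete, c1, c2))
        # Ranges touching the deleted rows are dropped in this sketch.
    return shifted


example = [(0, 1, 0, 3), (5, 6, 0, 1), (10, 11, 0, 2)]
print(shift_merged_ranges(example, first_kept_row=10, number_to_delete=5))
# [(0, 1, 0, 3), (5, 6, 0, 2)]
```

If the worksheet exposes its merged ranges as a mutable list, the result could be written back in place with slice assignment, but check that against your xlwt version first.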

As a side note, I was able to write text to my worksheet and preserve the predefined style like this:

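def write_with_style(ws, row, col, value):
    # ws.rows and each row's cells are dicts, so use .get() for missing entries
    row_obj = ws.rows.get(row)
    old_cell = row_obj._Row__cells.get(col) if row_obj else None
    ws.write(row, col, value)
    if old_cell is not None:
        # restore the style (XF index) the cell had before being overwritten
        ws.rows[row]._Row__cells[col].xf_idx = old_cell.xf_idx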

That might let you skip having two copies of your spreadsheet open at once.

For those of us still stuck with xlrd / xlwt / xlutils, here's a filter you could use:

from xlutils.filter import BaseFilter

class RowFilter(BaseFilter):
    rows_to_exclude: "Iterable[int]"
    _next_output_row: int

    def __init__(
            self,
            rows_to_exclude: "Iterable[int]",
    ):
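        self.rows_to_exclude = rows_to_exclude
        self._next_output_row = -1

    def sheet(self, rdsheet, wtsheet_name):
        # Restart output row numbering at the beginning of each sheet
        self._next_output_row = -1
        self.next.sheet(rdsheet, wtsheet_name)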

    def _should_include_row(self, rdrowx):
        return rdrowx not in self.rows_to_exclude

    def row(self, rdrowx, wtrowx):
        if self._should_include_row(rdrowx):
            # Proceed with writing out the row to the output file
            self._next_output_row += 1
            self.next.row(
                rdrowx, self._next_output_row,
            )

    # After `row()` has been called, `cell()` is called for each cell of the row
    def cell(self, rdrowx, rdcolx, wtrowx, wtcolx):
        if self._should_include_row(rdrowx):
            self.next.cell(
                rdrowx, rdcolx, self._next_output_row, wtcolx,
            )

Then put it to use with, e.g.:

from xlrd import open_workbook
from xlutils.filter import DirectoryWriter, XLRDReader, process

process(
    # XLRDReader takes the opened workbook and the name to give the output file
    XLRDReader(open_workbook("input_filename.xls"), "output_filename.xls"),
    RowFilter([3, 4, 5]),
    DirectoryWriter("output_dir"),
)
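
To see what the filter does to row numbers, here is a small stand-alone sketch that mimics its counter without touching xlrd or xlutils. `output_row_map` is a hypothetical helper written only for illustration; it mirrors the `_next_output_row` logic for a single sheet:

```python
def output_row_map(row_count, rows_to_exclude):
    """Map each kept input row index to the output row index it is written to."""
    excluded = set(rows_to_exclude)
    mapping = {}
    next_output_row = -1
    for rdrowx in range(row_count):
        if rdrowx not in excluded:
            # Kept rows are packed together, so the counter only
            # advances for rows that are actually written.
            next_output_row += 1
            mapping[rdrowx] = next_output_row
    return mapping


print(output_row_map(8, [3, 4, 5]))
# {0: 0, 1: 1, 2: 2, 6: 3, 7: 4}
```

Rows 3, 4 and 5 never reach the writer, and every later row moves up by three, which is how a footer ends up directly below the data.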
