簡體   English   中英

在 Python 中使用 openpyxl 將行插入 Excel 電子表格

[英]Insert row into Excel spreadsheet using openpyxl in Python

我正在尋找使用 openpyxl 將行插入電子表格的最佳方法。

實際上,我有一個電子表格(Excel 2007),它有一個標題行,后面是(最多)幾千行數據。 我希望將該行作為實際數據的第一行插入,所以在標題之后。 我的理解是 append 函數適用於在文件末尾添加內容。

閱讀 openpyxl 和 xlrd(和 xlwt)的文檔,除了手動循環內容並插入新工作表(在插入所需的行之后)之外,我找不到任何明確的方法來做到這一點。

鑒於我迄今為止對 Python 的有限經驗,我試圖了解這是否確實是最好的選擇(最 Pythonic!),如果是這樣,有人可以提供一個明確的例子。 具體來說,我可以使用 openpyxl 讀寫行還是必須訪問單元格? 另外我可以(過)寫相同的文件(名稱)嗎?

== 根據此處的反饋更新到功能齊全的版本:groups.google.com/forum/#!topic/openpyxl-users/wHGecdQg3Iw。 ==

正如其他人所指出的, openpyxl不提供此功能,但我已經擴展了Worksheet類,如下所示以實現插入行。 希望這對其他人有用。

def insert_rows(self, row_idx, cnt, above=False, copy_style=True, fill_formulae=True):
    """Inserts new (empty) rows into worksheet at specified row index.

    :param row_idx: Row index specifying where to insert new rows.
    :param cnt: Number of rows to insert.
    :param above: Set True to insert rows above specified row index.
    :param copy_style: Set True if new rows should copy style of immediately above row.
    :param fill_formulae: Set True if new rows should take on formula from immediately above row, filled with references new to rows.

    Usage:

    * insert_rows(2, 10, above=True, copy_style=False)

    """
    CELL_RE  = re.compile("(?P<col>\$?[A-Z]+)(?P<row>\$?\d+)")

    row_idx = row_idx - 1 if above else row_idx

    def replace(m):
        row = m.group('row')
        prefix = "$" if row.find("$") != -1 else ""
        row = int(row.replace("$",""))
        row += cnt if row > row_idx else 0
        return m.group('col') + prefix + str(row)

    # First, we shift all cells down cnt rows...
    old_cells = set()
    old_fas   = set()
    new_cells = dict()
    new_fas   = dict()
    for c in self._cells.values():

        old_coor = c.coordinate

        # Shift all references to anything below row_idx
        if c.data_type == Cell.TYPE_FORMULA:
            c.value = CELL_RE.sub(
                replace,
                c.value
            )
            # Here, we need to properly update the formula references to reflect new row indices
            if old_coor in self.formula_attributes and 'ref' in self.formula_attributes[old_coor]:
                self.formula_attributes[old_coor]['ref'] = CELL_RE.sub(
                    replace,
                    self.formula_attributes[old_coor]['ref']
                )

        # Do the magic to set up our actual shift    
        if c.row > row_idx:
            old_coor = c.coordinate
            old_cells.add((c.row,c.col_idx))
            c.row += cnt
            new_cells[(c.row,c.col_idx)] = c
            if old_coor in self.formula_attributes:
                old_fas.add(old_coor)
                fa = self.formula_attributes[old_coor].copy()
                new_fas[c.coordinate] = fa

    for coor in old_cells:
        del self._cells[coor]
    self._cells.update(new_cells)

    for fa in old_fas:
        del self.formula_attributes[fa]
    self.formula_attributes.update(new_fas)

    # Next, we need to shift all the Row Dimensions below our new rows down by cnt...
    for row in range(len(self.row_dimensions)-1+cnt,row_idx+cnt,-1):
        new_rd = copy.copy(self.row_dimensions[row-cnt])
        new_rd.index = row
        self.row_dimensions[row] = new_rd
        del self.row_dimensions[row-cnt]

    # Now, create our new rows, with all the pretty cells
    row_idx += 1
    for row in range(row_idx,row_idx+cnt):
        # Create a Row Dimension for our new row
        new_rd = copy.copy(self.row_dimensions[row-1])
        new_rd.index = row
        self.row_dimensions[row] = new_rd
        for col in range(1,self.max_column):
            col = get_column_letter(col)
            cell = self.cell('%s%d'%(col,row))
            cell.value = None
            source = self.cell('%s%d'%(col,row-1))
            if copy_style:
                cell.number_format = source.number_format
                cell.font      = source.font.copy()
                cell.alignment = source.alignment.copy()
                cell.border    = source.border.copy()
                cell.fill      = source.fill.copy()
            if fill_formulae and source.data_type == Cell.TYPE_FORMULA:
                s_coor = source.coordinate
                if s_coor in self.formula_attributes and 'ref' not in self.formula_attributes[s_coor]:
                    fa = self.formula_attributes[s_coor].copy()
                    self.formula_attributes[cell.coordinate] = fa
                # print("Copying formula from cell %s%d to %s%d"%(col,row-1,col,row))
                cell.value = re.sub(
                    "(\$?[A-Z]{1,3}\$?)%d"%(row - 1),
                    lambda m: m.group(1) + str(row),
                    source.value
                )   
                cell.data_type = Cell.TYPE_FORMULA

    # Check for Merged Cell Ranges that need to be expanded to contain new cells
    for cr_idx, cr in enumerate(self.merged_cell_ranges):
        self.merged_cell_ranges[cr_idx] = CELL_RE.sub(
            replace,
            cr
        )

Worksheet.insert_rows = insert_rows

添加適用於openpyxl的最新版本 v2.5+ 的答案:

現在有一個insert_rows()insert_cols()

insert_rows(idx, amount=1)

在 row==idx 之前插入一行或多行

用我現在用來達到預期結果的代碼來回答這個問題。 請注意,我在位置 1 處手動插入行,但這應該很容易根據特定需求進行調整。 您還可以輕松地對其進行調整以插入多行,並簡單地從相關位置開始填充其余數據。

另外,請注意,由於下游依賴性,我們手動指定“Sheet1”中的數據,並且數據被復制到插入工作簿開頭的新工作表中,同時將原始工作表重命名為“Sheet1.5” .

編輯:我還(稍后)添加了對 format_code 的更改,以解決此處默認復制操作刪除所有格式的問題: new_cell.style.number_format.format_code = 'mm/dd/yyyy' 我找不到任何可以設置的文檔,這更像是一個反復試驗的案例!

最后,不要忘記這個例子是在原來的基礎上保存。 您可以更改適用的保存路徑以避免這種情況。

    import openpyxl

    wb = openpyxl.load_workbook(file)
    old_sheet = wb.get_sheet_by_name('Sheet1')
    old_sheet.title = 'Sheet1.5'
    max_row = old_sheet.get_highest_row()
    max_col = old_sheet.get_highest_column()
    wb.create_sheet(0, 'Sheet1')

    new_sheet = wb.get_sheet_by_name('Sheet1')

    # Do the header.
    for col_num in range(0, max_col):
        new_sheet.cell(row=0, column=col_num).value = old_sheet.cell(row=0, column=col_num).value

    # The row to be inserted. We're manually populating each cell.
    new_sheet.cell(row=1, column=0).value = 'DUMMY'
    new_sheet.cell(row=1, column=1).value = 'DUMMY'

    # Now do the rest of it. Note the row offset.
    for row_num in range(1, max_row):
        for col_num in range (0, max_col):
            new_sheet.cell(row = (row_num + 1), column = col_num).value = old_sheet.cell(row = row_num, column = col_num).value

    wb.save(file)

Openpyxl 工作表在執行行或列級別操作時功能有限。 Worksheet 與行/列相關的唯一屬性是屬性row_dimensionscolumn_dimensions ,它們分別為每一行和每一列存儲“RowDimensions”和“ColumnDimensions”對象。 這些字典也用於get_highest_row()get_highest_column()等函數。

其他一切都在單元格級別上運行,在字典_cells中跟蹤 Cell 對象(以及在字典_styles中跟蹤它們的樣式)。 大多數看起來像是在行或列級別上執行任何操作的函數實際上是在一系列單元格上運行的(例如前面提到的append() )。

最簡單的做法就是您的建議:創建一個新工作表,附加標題行,附加新數據行,附加舊數據行,刪除舊工作表,然后將新工作表重命名為舊工作表。 這種方法可能出現的問題是行/列維度屬性和單元格樣式的丟失,除非您也專門復制它們。

或者,您可以創建自己的插入行或列的函數。

我有大量非常簡單的工作表,需要從中刪除列。 由於您要求提供明確的示例,因此我將提供我快速組合在一起的功能:

from openpyxl.cell import get_column_letter

def ws_delete_column(sheet, del_column):

    for row_num in range(1, sheet.get_highest_row()+1):
        for col_num in range(del_column, sheet.get_highest_column()+1):

            coordinate = '%s%s' % (get_column_letter(col_num),
                                   row_num)
            adj_coordinate = '%s%s' % (get_column_letter(col_num + 1),
                                       row_num)

            # Handle Styles.
            # This is important to do if you have any differing
            # 'types' of data being stored, as you may otherwise get
            # an output Worksheet that's got improperly formatted cells.
            # Or worse, an error gets thrown because you tried to copy
            # a string value into a cell that's styled as a date.

            if adj_coordinate in sheet._styles:
                sheet._styles[coordinate] = sheet._styles[adj_coordinate]
                sheet._styles.pop(adj_coordinate, None)
            else:
                sheet._styles.pop(coordinate, None)

            if adj_coordinate in sheet._cells:
                sheet._cells[coordinate] = sheet._cells[adj_coordinate]
                sheet._cells[coordinate].column = get_column_letter(col_num)
                sheet._cells[coordinate].row = row_num
                sheet._cells[coordinate].coordinate = coordinate

                sheet._cells.pop(adj_coordinate, None)
            else:
                sheet._cells.pop(coordinate, None)

        # sheet.garbage_collect()

我將我正在使用的工作表以及我想要刪除的列號傳遞給它,然后它就消失了。 我知道這不是您想要的,但我希望這些信息對您有所幫助!

編輯:注意到有人給了另一個投票,並認為我應該更新它。 Openpyxl 中的坐標系統在過去的幾年中經歷了一些變化,為_cell中的項目引入了coordinate屬性。 這也需要編輯,否則行將留空(而不是刪除),Excel 將拋出有關文件問題的錯誤。 這適用於 Openpyxl 2.2.3(未經更高版本測試)

從 openpyxl 1.5 開始,您現在可以使用 .insert_rows(idx, row_qty)

from openpyxl import load_workbook
wb = load_workbook('excel_template.xlsx')
ws = wb.active
ws.insert_rows(14, 10)

如果您在 Excel 中手動執行此操作,它將不會獲取 idx 行的格式。 之后您將應用正確的格式,即單元格顏色。

在 Python 中使用 openpyxl 將行插入 Excel 電子表格

下面的代碼可以幫助你: -

import openpyxl

file = "xyz.xlsx"
#loading XL sheet bassed on file name provided by user
book = openpyxl.load_workbook(file)
#opening sheet whose index no is 0
sheet = book.worksheets[0]

#insert_rows(idx, amount=1) Insert row or rows before row==idx, amount will be no of 
#rows you want to add and it's optional
sheet.insert_rows(13)

對於插入列 openpyxl 也有類似的功能 ieinsert_cols(idx, amount=1)

我編寫了一個函數,它可以使用 openpyxl 在電子表格或整個 2D 表中的任何位置插入整行。

函數的每一行都用注釋解釋,但如果你只想插入一行,只需讓你的行等於 [row]。 即如果 row = [1,2,3,4,5] 然后將您的輸入設置為 [[1,2,3,4,5]]。 如果您希望將此行插入電子表格的第一行 (A1),則 Start = [1,1]。

您確實可以覆蓋文件名,如底部的示例所示。

def InputList(Start, List): #This function is to input an array/list from a input start point; len(Start) must equal 2, where Start = [1,1] is cell 1A. List must be a two dimensional array; if you wish to input a single row then this can be done where len(List) == 1, e.g. List = [[1,2,3,4]]
    x = 0 #Sets up a veriable to go through List columns
    y = 0 #Sets up a veriable to go through List rows
    l = 0 #Sets up a veriable to count addional columns against Start[1] to allow for column reset on each new row
    for row in List: #For every row in List
        l = 0 #Set additonal columns to zero
        for cell in row: #For every cell in row
            ws.cell(row=Start[0], column=Start[1]).value = List[y][x] #Set value for current cell
            x = x + 1 #Move to next data input (List) column
            Start[1] = Start[1] + 1 #Move to next Excel column
            l = l + 1 #Count addional row length
        y = y + 1 #Move to next Excel row
        Start[0] = Start[0] + 1 #Move to next Excel row
        x = 0 #Move back to first column of input data (ready for next row)
        Start[1] = Start[1] - l #Reset Excel column back to orignal start column, ready to write next row

在第 7 行開頭插入單行的示例:

from openpyxl import load_workbook
wb = load_workbook('New3.xlsx')
ws = wb.active

def InputList(Start, List): #This function is to input an array/list from a input start point; len(Start) must equal 2, where Start = [1,1] is cell 1A. List must be a two dimensional array; if you wish to input a single row then this can be done where len(List) == 1, e.g. List = [[1,2,3,4]]
    x = 0 #Sets up a veriable to go through List columns
    y = 0 #Sets up a veriable to go through List rows
    l = 0 #Sets up a veriable to count addional columns against Start[1] to allow for column reset on each new row
    for row in List: #For every row in List
        l = 0 #Set additonal columns to zero
        for cell in row: #For every cell in row
            ws.cell(row=Start[0], column=Start[1]).value = List[y][x] #Set value for current cell
            x = x + 1 #Move to next data input (List) column
            Start[1] = Start[1] + 1 #Move to next Excel column
            l = l + 1 #Count addional row length
        y = y + 1 #Move to next Excel row
        Start[0] = Start[0] + 1 #Move to next Excel row
        x = 0 #Move back to first column of input data (ready for next row)
        Start[1] = Start[1] - l #Reset Excel column back to orignal start column, ready to write next row

test = [[1,2,3,4]]
InputList([7,1], test)

wb.save('New3.xlsx')

我采用了達拉斯解決方案並添加了對合並單元格的支持:

    def insert_rows(self, row_idx, cnt, above=False, copy_style=True, fill_formulae=True):
        skip_list = []
        try:
            idx = row_idx - 1 if above else row_idx
            for (new, old) in zip(range(self.max_row+cnt,idx+cnt,-1),range(self.max_row,idx,-1)):
                for c_idx in range(1,self.max_column):
                  col = self.cell(row=1, column=c_idx).column #get_column_letter(c_idx)
                  print("Copying %s%d to %s%d."%(col,old,col,new))
                  source = self["%s%d"%(col,old)]
                  target = self["%s%d"%(col,new)]
                  if source.coordinate in skip_list:
                      continue

                  if source.coordinate in self.merged_cells:
                      # This is a merged cell
                      for _range in self.merged_cell_ranges:
                          merged_cells_list = [x for x in cells_from_range(_range)][0]
                          if source.coordinate in merged_cells_list:
                              skip_list = merged_cells_list
                              self.unmerge_cells(_range)
                              new_range = re.sub(str(old),str(new),_range)
                              self.merge_cells(new_range)
                              break

                  if source.data_type == Cell.TYPE_FORMULA:
                    target.value = re.sub(
                      "(\$?[A-Z]{1,3})%d"%(old),
                      lambda m: m.group(1) + str(new),
                      source.value
                    )
                  else:
                    target.value = source.value
                  target.number_format = source.number_format
                  target.font   = source.font.copy()
                  target.alignment = source.alignment.copy()
                  target.border = source.border.copy()
                  target.fill   = source.fill.copy()
            idx = idx + 1
            for row in range(idx,idx+cnt):
                for c_idx in range(1,self.max_column):
                  col = self.cell(row=1, column=c_idx).column #get_column_letter(c_idx)
                  #print("Clearing value in cell %s%d"%(col,row))
                  cell = self["%s%d"%(col,row)]
                  cell.value = None
                  source = self["%s%d"%(col,row-1)]
                  if copy_style:
                    cell.number_format = source.number_format
                    cell.font      = source.font.copy()
                    cell.alignment = source.alignment.copy()
                    cell.border    = source.border.copy()
                    cell.fill      = source.fill.copy()
                  if fill_formulae and source.data_type == Cell.TYPE_FORMULA:
                    #print("Copying formula from cell %s%d to %s%d"%(col,row-1,col,row))
                    cell.value = re.sub(
                      "(\$?[A-Z]{1,3})%d"%(row - 1),
                      lambda m: m.group(1) + str(row),
                      source.value
                    )

編輯尼克的解決方案,此版本采用起始行、要插入的行數和文件名,並插入必要數量的空白行。

#! python 3

import openpyxl, sys

my_start = int(sys.argv[1])
my_rows = int(sys.argv[2])
str_wb = str(sys.argv[3])

wb = openpyxl.load_workbook(str_wb)
old_sheet = wb.get_sheet_by_name('Sheet')
mcol = old_sheet.max_column
mrow = old_sheet.max_row
old_sheet.title = 'Sheet1.5'
wb.create_sheet(index=0, title='Sheet')

new_sheet = wb.get_sheet_by_name('Sheet')

for row_num in range(1, my_start):
    for col_num in range(1, mcol + 1):
        new_sheet.cell(row = row_num, column = col_num).value = old_sheet.cell(row = row_num, column = col_num).value

for row_num in range(my_start + my_rows, mrow + my_rows):
    for col_num in range(1, mcol + 1):
        new_sheet.cell(row = (row_num + my_rows), column = col_num).value = old_sheet.cell(row = row_num, column = col_num).value

wb.save(str_wb)

這對我有用:

    openpyxl.worksheet.worksheet.Worksheet.insert_rows(wbs,idx=row,amount=2)

在 row==idx 之前插入 2 行

請參閱:http: //openpyxl.readthedocs.io/en/stable/api/openpyxl.worksheet.worksheet.html

我已經成功地使用了達拉斯的答案,盡管對 openpyxl 3.0.9 做了一些修改。 我在這里為其他想知道如何在 2022 年做到這一點的人發布代碼。

不同之處在於:

  • 添加導入
  • Cell.TYPE_FORMULA更改為TYPE_FORMULA
  • 在需要的地方使用str()int()添加類型轉換
  • 更新定義的名稱

我是 Python 新手,因此請隨意提出任何更改建議,但這就是我讓它工作的方式。

import copy
import re
from openpyxl.utils import get_column_letter
from openpyxl.cell.cell import TYPE_FORMULA

#https://stackoverflow.com/questions/17299364/insert-row-into-excel-spreadsheet-using-openpyxl-in-python#71195832
def insert_rows(self, row_idx, cnt, above=True, copy_style=True, fill_formulae=True):
    """Inserts new (empty) rows into worksheet at specified row index.

    :param self: Worksheet
    :param row_idx: Row index specifying where to insert new rows.
    :param cnt: Number of rows to insert.
    :param above: Set True to insert rows above specified row index.
    :param copy_style: Set True if new rows should copy style of immediately above row.
    :param fill_formulae: Set True if new rows should take on formula from immediately above row, filled with references new to rows.

    Usage:

    * insert_rows(2, 10, above=True, copy_style=False)

    """
    CELL_RE  = re.compile("(?P<col>\$?[A-Z]+)(?P<row>\$?\d+)")

    row_idx = row_idx - 1 if above else row_idx

    def replace(m):
        row = m.group('row')
        prefix = "$" if row.find("$") != -1 else ""
        row = int(row.replace("$",""))
        row += cnt if row > row_idx else 0
        return m.group('col') + prefix + str(row)

    # First, we shift all cells down cnt rows...
    old_cells = set()
    old_fas   = set()
    new_cells = dict()
    new_fas   = dict()
    for c in self._cells.values():

        old_coor = c.coordinate

        # Shift all references to anything below row_idx
        if c.data_type == TYPE_FORMULA:
            c.value = CELL_RE.sub(
                replace,
                c.value
            )
            # Here, we need to properly update the formula references to reflect new row indices
            if old_coor in self.formula_attributes and 'ref' in self.formula_attributes[old_coor]:
                self.formula_attributes[old_coor]['ref'] = CELL_RE.sub(
                    replace,
                    self.formula_attributes[old_coor]['ref']
                )

        # Do the magic to set up our actual shift    
        if c.row > row_idx:
            old_coor = c.coordinate
            old_cells.add((c.row,c.column))
            c.row += cnt
            new_cells[(c.row,c.column)] = c
            if old_coor in self.formula_attributes:
                old_fas.add(old_coor)
                fa = self.formula_attributes[old_coor].copy()
                new_fas[c.coordinate] = fa

    for coor in old_cells:
        del self._cells[coor]
    self._cells.update(new_cells)

    for fa in old_fas:
        del self.formula_attributes[fa]
    self.formula_attributes.update(new_fas)

    # Next, we need to shift all the Row Dimensions below our new rows down by cnt...
    for row in range(len(self.row_dimensions)-1+cnt,row_idx+cnt,-1):
        new_rd = copy.copy(self.row_dimensions[row-cnt])
        new_rd.index = row
        self.row_dimensions[row] = new_rd
        del self.row_dimensions[row-cnt]

    # Now, create our new rows, with all the pretty cells
    row_idx += 1
    for row in range(row_idx,row_idx+cnt):
        # Create a Row Dimension for our new row
        new_rd = copy.copy(self.row_dimensions[row-1])
        new_rd.index = row
        self.row_dimensions[row] = new_rd
        for col in range(1,self.max_column):
            col = get_column_letter(col)
            cell = self[str(col)+str(row)]
            cell.value = None
            source = self[str(col)+str(row-1)]
            if copy_style:
                cell.number_format = source.number_format
                cell.font      = copy.copy(source.font)
                cell.alignment = copy.copy(source.alignment)
                cell.border    = copy.copy(source.border)
                cell.fill      = copy.copy(source.fill)
            if fill_formulae and source.data_type == TYPE_FORMULA:
                s_coor = source.coordinate
                if s_coor in self.formula_attributes and 'ref' not in self.formula_attributes[s_coor]:
                    fa = self.formula_attributes[s_coor].copy()
                    self.formula_attributes[cell.coordinate] = fa
                # print("Copying formula from cell %s%d to %s%d"%(col,row-1,col,row))
                cell.value = re.sub(
                    "(\$?[A-Z]{1,3}\$?)%d"%(row - 1),
                    lambda m: m.group(1) + str(row),
                    source.value
                )   
                cell.data_type = TYPE_FORMULA

    # Check for Merged Cell Ranges that need to be expanded to contain new cells
    for cr_idx, cr in enumerate(self.merged_cells.ranges):
        self.merged_cells.ranges[cr_idx] = CELL_RE.sub(
            replace,
            str(cr)
        )

    # Update all defined names
    wb :Workbook = self.parent
    for definedName in wb.defined_names.definedName:
        ref :str = definedName.attr_text
        parts = ref.split("!")
        if parts[0].strip("'") == self.title:
            definedName.attr_text = CELL_RE.sub(replace, ref)

不幸的是,沒有更好的方法來讀取文件,並使用像 xlwt 這樣的庫來寫出一個新的 excel 文件(在頂部插入新行)。 Excel 不像您可以讀取和追加的數據庫那樣工作。 不幸的是,您只需要讀入信息並在內存中進行操作,然后寫出本質上是一個新文件的內容。

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM