
Delete rows from Excel

Following are the approaches I tried:

A) I tried to delete rows from an Excel sheet using Microsoft.Office.Interop.Excel. I'm doing this in a Script Task within an SSIS package.

I added the library to the GAC, since it was raising the error: Could not load Library.

Now it raises this error: Retrieving the COM class factory for component with CLSID {00024500-0000-0000-C000-000000000046} failed due to the following error: 80040154.

Googling this tells me I need MS Office installed for it to work, which I don't want, because the server I deploy this solution on is definitely not going to have MS Office installed. I'm no expert, but I would like to know why such operations are not possible simply by adding a reference to a DLL. Why is it mandatory to install MS Office?

B) I also tried the OLE DB Jet provider, but it doesn't allow deleting rows. The only operations it supports are Insert, Update and Select.

Things I have come across on the web:

A) An answer to a Stack Overflow question suggests using NPOI, but I can't fully rely on that, because a library that is free today could become paid in the future.

B) I have also come across the EPPlus library. I have used it and understand that it's under a GNU public license, but I'm apprehensive about using it because it may become a paid tool in the future.

C) I have also come across people using the Open XML SDK by Microsoft. Before I get my hands dirty with it, I would love it if someone told me up front whether I should be using it. It's not that I'm too lazy to try it out myself, but what would help me before I start is knowing whether this SDK needs any external programs installed on the machine, because it requires me to install an MSI to be able to use it.

Is there a workaround to do this using Microsoft COM components? I'm not asking a subjective question here. I want to know the technical obstacles, if any, of using the three tools I researched above.

Thanks in advance

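The point with Interop is that you do indeed need Office installed: the Interop assembly only contains managed wrappers around Excel's COM interfaces, and the objects themselves are created by Excel, which must be installed and registered on the machine (80040154 is the COM "class not registered" error). So, bluntly put, you cannot use Interop here. If you only need to support xlsx files, you can do it by editing the XML directly.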
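One way to see this dependency for yourself: Interop starts Excel through its COM registration, so you can ask Windows whether anything is registered under Excel's ProgID. A minimal sketch, assuming a Windows host and the standard `Excel.Application` ProgID; the helper name `IsExcelComServerRegistered` is hypothetical.

```csharp
using System;

// Hypothetical helper: true if a COM server is registered under the
// "Excel.Application" ProgID, i.e. Interop has an installed Excel to talk to.
static bool IsExcelComServerRegistered()
{
    // COM class registration only exists on Windows.
    if (Environment.OSVersion.Platform != PlatformID.Win32NT)
        return false;

    // GetTypeFromProgID returns null (it does not throw) when the ProgID is not registered.
    return Type.GetTypeFromProgID("Excel.Application") != null;
}

Console.WriteLine(IsExcelComServerRegistered()
    ? "Excel.Application is registered; Interop can start Excel."
    : "Excel.Application is not registered; creating it via Interop fails with 80040154.");
```

On a server without Office this reports "not registered", which is exactly the situation behind the error in the question; no amount of referencing DLLs changes it.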

See this and this link for more details on unpacking, editing and repacking xlsx files. The only things you need then are something to unzip the file and your own XML-handling code.
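A minimal sketch of that unzip/edit/repack approach, using only the framework's `System.IO.Compression` and LINQ to XML, so nothing beyond .NET itself has to be installed. It assumes the worksheet lives at `xl/worksheets/sheet1.xml` (a real implementation should resolve the part name through `xl/workbook.xml` and its relationships), and `DeleteRows` is a hypothetical helper name. It only removes the `<row>` elements; rows below keep their numbers, so deleted rows show up as empty.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: removes the <row> elements of one worksheet part for which
// shouldDelete returns true. xlsx must be a readable, writable, seekable stream.
static void DeleteRows(Stream xlsx, string sheetEntry, Func<XElement, bool> shouldDelete)
{
    // SpreadsheetML main namespace used by worksheet XML
    XNamespace ns = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    using var zip = new ZipArchive(xlsx, ZipArchiveMode.Update, leaveOpen: true);
    ZipArchiveEntry entry = zip.GetEntry(sheetEntry)
        ?? throw new FileNotFoundException("Worksheet part not found in package", sheetEntry);

    XDocument sheet;
    using (Stream s = entry.Open())
        sheet = XDocument.Load(s);

    // ToList() first so the tree is not modified while it is being enumerated
    foreach (XElement row in sheet.Descendants(ns + "row").Where(shouldDelete).ToList())
        row.Remove();

    // Replace the old entry with the edited XML; the archive is rewritten on Dispose
    entry.Delete();
    using (Stream s = zip.CreateEntry(sheetEntry).Open())
        sheet.Save(s);
}

// Usage (path is a placeholder): delete row 3 of the first sheet.
// using var file = File.Open(@"c:\temp\book.xlsx", FileMode.Open, FileAccess.ReadWrite);
// DeleteRows(file, "xl/worksheets/sheet1.xml", row => (string)row.Attribute("r") == "3");
```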

If you also need to support xls files, you have a bit of a problem: .xls is the older binary format rather than a zipped set of XML files, so the unzip-and-edit approach doesn't apply to it. I tried this in the past without any additional installations but did not succeed, so I decided to support only xlsx. I needed either some .msi files or Office installed on the server.

You say you are using a Script Task in SSIS. Then why not import the Excel file you want to delete rows from (preferably into a database, or keep it cached in a DataTable) and then generate a new xls file containing only the data you want to keep?
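A minimal sketch of the "keep only what you need" step, assuming the sheet has already been loaded into a `DataTable` (for example by an Excel source or an OLE DB query). `KeepRows`, the column names and the filter rule are hypothetical; writing the filtered table out as a new workbook is not shown.

```csharp
using System;
using System.Data;

// Hypothetical helper: copies the rows for which keep(row) is true into a new
// DataTable with the same columns, leaving the source table untouched.
static DataTable KeepRows(DataTable source, Func<DataRow, bool> keep)
{
    DataTable result = source.Clone(); // same schema, no rows
    foreach (DataRow row in source.Rows)
    {
        if (keep(row))
            result.ImportRow(row);
    }
    return result;
}

// Example data standing in for an imported sheet
var sheet = new DataTable("Sheet1");
sheet.Columns.Add("Name", typeof(string));
sheet.Columns.Add("Qty", typeof(int));
sheet.Rows.Add("Row 1", 10);
sheet.Rows.Add("Row 2", 0);
sheet.Rows.Add("Row 3", 5);

// Keep only rows with a non-zero quantity
DataTable kept = KeepRows(sheet, row => (int)row["Qty"] > 0);
Console.WriteLine(kept.Rows.Count); // prints 2
```

Building a new table instead of deleting from the imported one keeps the import and the filtering separate, which fits the idea of generating a fresh file at the end.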

Or don't use a Script Task at all: inside a Data Flow, use a configured Excel Source combined with a Script Component (which is basically the same thing as a Script Task, except that it can only be used inside a Data Flow) and do all your work there. If the connection to the Excel file is dynamic, you can always use variables (parameters if you're on Data Tools) to configure the connection.

Good luck!

If you want to use Microsoft.Office.Interop.Excel then, yes, you do need Excel on the server. Therefore, as long as you only need to deal with xlsx-based (Excel 2007+) workbooks, I would suggest that Open XML is the way to go. There's a bit of a learning curve, and you come to realise how much work Excel does for you in the background, but it's not too bad once you get used to it.

A very quick sample knocked up in LINQPad:

// Requires a reference to the Open XML SDK (DocumentFormat.OpenXml) and the namespace
// imports DocumentFormat.OpenXml.Packaging and DocumentFormat.OpenXml.Spreadsheet
void Main()
{
    string fileName = @"c:\temp\delete-row-openxml.xlsx";

    using (SpreadsheetDocument doc = SpreadsheetDocument.Open(fileName, true))
    {
        // Get the necessary bits of the doc
        WorkbookPart workbookPart = doc.WorkbookPart;
        // A workbook without any text cells may have no shared string table at all
        SharedStringTablePart sstpart = workbookPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault();
        SharedStringTable sst = sstpart?.SharedStringTable;

        // Get the first worksheet part (this may not be the first tab shown in Excel)
        WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
        Worksheet sheet = worksheetPart.Worksheet;

        // Materialise the matches first: removing elements while enumerating
        // the live Descendants<Row>() sequence breaks the iteration
        List<Row> rowsToDelete = sheet.Descendants<Row>()
                                      .Where(r => ShouldDeleteRow(r, sst))
                                      .ToList();

        foreach (Row row in rowsToDelete)
        {
            row.Remove();
        }

        sheet.Save();
    }
}

private bool ShouldDeleteRow(Row row, SharedStringTable sst)
{
    // Whatever logic to apply to decide whether to remove a row or not
    string txt = GetCellText(row.Elements<Cell>().FirstOrDefault(), sst);
    return (txt == "Row 3");
}

// Basic way to get the text of a cell - shared strings must be looked up in the SharedStringTable
private string GetCellText(Cell cell, SharedStringTable sst)
{
    if (cell == null)
        return "";

    if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
    {
        if (sst == null || cell.CellValue == null)
            return "";

        int ssid = int.Parse(cell.CellValue.Text);
        return sst.ChildElements[ssid].InnerText;
    }
    else if (cell.CellValue != null)
    {
        return cell.CellValue.Text;
    }
    return "";
}

Note that this removes the row but does not shift the other rows up: the rows below keep their original row indexes, so the deleted row simply shows up as empty. To close the gap you'd need to add some logic to adjust the row indexes of the remaining rows.
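The index adjustment can be sketched on the raw worksheet XML with LINQ to XML (not the Open XML SDK objects used above, so it runs with nothing but .NET). It assumes every `<row>` carries its `r` attribute, as Excel writes it; `DeleteRowAndShiftUp` is a hypothetical name, and formulas, merged ranges, `<dimension>` and defined names that point at the shifted rows are left alone.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical helper: removes row `rowNumber` from a SpreadsheetML <sheetData>
// element and moves every row below it up by one, rewriting row@r and cell@r.
static void DeleteRowAndShiftUp(XElement sheetData, uint rowNumber)
{
    XNamespace ns = sheetData.Name.Namespace;

    // ToList() so rows can be removed and edited while iterating
    foreach (XElement row in sheetData.Elements(ns + "row").ToList())
    {
        uint? r = (uint?)row.Attribute("r");
        if (r == null || r < rowNumber)
            continue;

        if (r == rowNumber)
        {
            row.Remove();
            continue;
        }

        uint newRow = r.Value - 1;
        row.SetAttributeValue("r", newRow);
        foreach (XElement cell in row.Elements(ns + "c"))
        {
            string cellRef = (string)cell.Attribute("r");
            if (cellRef == null)
                continue; // cell@r is optional in the format
            // Keep the column letters ("B" of "B7") and attach the new row number
            string column = Regex.Match(cellRef, "^[A-Z]+").Value;
            cell.SetAttributeValue("r", column + newRow);
        }
    }
}

// Example: rows 1-3 with cells in columns A and B; delete row 2
XNamespace main = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";
XElement MakeRow(int n) => new XElement(main + "row", new XAttribute("r", n),
    new XElement(main + "c", new XAttribute("r", "A" + n)),
    new XElement(main + "c", new XAttribute("r", "B" + n)));

var sample = new XElement(main + "sheetData", MakeRow(1), MakeRow(2), MakeRow(3));
DeleteRowAndShiftUp(sample, 2);
Console.WriteLine(sample);
```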

To answer a little more of the OP's question: apart from the standard .NET Framework, the Open XML SDK msi is all that is needed; Office does not have to be installed. The sample needs a reference to WindowsBase.dll for the packaging API, plus using directives for DocumentFormat.OpenXml.Packaging and DocumentFormat.OpenXml.Spreadsheet. The Open XML SDK can also be referenced in Visual Studio via NuGet, so you don't even need to install the msi if you don't want to. But it makes sense to do so, IMHO.

One other item that you will find VERY useful is the Open XML SDK tools msi (the Productivity Tool). It lets you open a Word or Excel document and see the XML layout inside, which is most helpful.
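If the tool isn't at hand, you can get a similar first look from code, because an .xlsx file is just a ZIP package of XML parts. A minimal sketch using `System.IO.Compression`, offered as a stand-in rather than a replacement for the tool; `ListParts` and `ReadPartXml` are hypothetical helper names.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Hypothetical helper: names of all parts in the package,
// e.g. "xl/workbook.xml" or "xl/worksheets/sheet1.xml".
static string[] ListParts(Stream xlsx)
{
    using var zip = new ZipArchive(xlsx, ZipArchiveMode.Read, leaveOpen: true);
    return zip.Entries.Select(e => e.FullName).ToArray();
}

// Hypothetical helper: the raw XML text of one part.
static string ReadPartXml(Stream xlsx, string partName)
{
    using var zip = new ZipArchive(xlsx, ZipArchiveMode.Read, leaveOpen: true);
    ZipArchiveEntry entry = zip.GetEntry(partName)
        ?? throw new FileNotFoundException("Part not found in package", partName);
    using var reader = new StreamReader(entry.Open());
    return reader.ReadToEnd();
}

// Usage (path is a placeholder):
// using var file = File.OpenRead(@"c:\temp\book.xlsx");
// foreach (string part in ListParts(file)) Console.WriteLine(part);
// Console.WriteLine(ReadPartXml(file, "xl/worksheets/sheet1.xml"));
```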

This is how I managed to remove rows in Excel and shift the data up:

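using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// pathToFile (the .xlsx to edit) and ShouldDeleteRow (your own condition on
// which rows to delete) are placeholders
using (SpreadsheetDocument document = SpreadsheetDocument.Open(pathToFile, true))
{
   WorkbookPart wbPart = document.WorkbookPart;

   var worksheet = wbPart.WorksheetParts.First().Worksheet;
   SheetData sheetData = worksheet.GetFirstChild<SheetData>();

   // Skip headers. ToList() materialises the matches before anything is removed,
   // because removing elements while enumerating Elements<Row>() breaks the loop.
   List<Row> rowsToDelete = sheetData.Elements<Row>()
      .Skip(1)
      .Where(row => ShouldDeleteRow(row))
      .ToList();

   foreach (Row row in rowsToDelete)
   {
      row.Remove();
   }

   // Fix all row indexes: renumber every row after the header so they are
   // contiguous again (this also closes gaps that existed before the delete).
   // Formulas, merged cells and defined names that refer to rows are not updated.
   List<Row> rows = sheetData.Elements<Row>().ToList();
   for (int i = 1; i < rows.Count; i++)
   {
      uint newRowIndex = rows[i - 1].RowIndex.Value + 1;
      Row currentRow = rows[i];

      currentRow.RowIndex.Value = newRowIndex;

      foreach (Cell cell in currentRow.Elements<Cell>())
      {
         if (cell.CellReference == null)
            continue;

         // Keep the column letters (e.g. "B" of "B7") and append the new row number
         string column = Regex.Replace(cell.CellReference.Value, @"\d", "");
         cell.CellReference.Value = $"{column}{newRowIndex}";
      }
   }

   worksheet.Save();
}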

Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

© 2020-2024 STACKOOM.COM