
SSIS (C# or VB): delete rows 1-12 in all Excel files in a directory

Before importing data from multiple Excel files, I need to get rid of the first 12 rows in each worksheet. I am going to use the code from this solution for a bulk-processing script task.

Questions:

  • What code should I insert into the script to delete the rows? (I suppose right after //Load the DataTable with Sheet Data so we can get the column header); or
  • How can I modify this code so that it reads the Excel files starting from row 13; or, alternatively
  • What SSIS task should I insert before the script for bulk row deletion?

This is a method for looping through sheets:

Create a data flow task to read the sheet names into an ADO object.

Data flow

The first item is a script component acting as a source. I have a variable that holds the connection string to the Excel spreadsheet:

connstr

I created an output called SheetName:

Output settings

Here's the code to read the tab names:

C#

You are basically opening the spreadsheet with OLE DB, putting the table names into a data table, then looping through the data table and writing each row to the output.

Make sure to close the connection!!! Leaving it open may cause errors later.
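As a rough sketch of the steps just described (open the workbook with OLE DB, load the table names into a data table, loop over the rows, close the connection), something like the following could work. The class and method names (`ExcelSheetReader`, `GetSheetNames`) are made up for illustration, and the code assumes the ACE OLE DB provider is installed on the machine.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.OleDb;

// Hypothetical helper, not the code from the original answer.
public static class ExcelSheetReader
{
    // Returns the raw table names the OLE DB provider reports for a workbook.
    public static List<string> GetSheetNames(string connectionString)
    {
        var names = new List<string>();

        // "using" disposes the connection, which also closes it,
        // even if an exception is thrown while reading the schema.
        using (var connection = new OleDbConnection(connectionString))
        {
            connection.Open();

            // One row per table reported by the provider.
            DataTable schema = connection.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);

            foreach (DataRow row in schema.Rows)
            {
                names.Add(row["TABLE_NAME"].ToString());
            }
        }

        return names;
    }
}
```

Inside a script component used as a source, each returned name would then be written to the output, typically by adding a row to the generated output buffer and setting its SheetName column; the exact buffer name depends on how the output is named in the component.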

The next step is a conditional split because, for some reason, the result contains duplicates of the tab names, and they all end in '_'.

Conditional split

The next step is a derived column that cleans the extra "'" characters out of the sheet name:

DerivedCol
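For readers who prefer to do the same cleanup in code, here is a minimal sketch of the two steps above: drop the duplicate names that end in '_', then strip the "'" characters. `SheetNameCleaner` and `Clean` are hypothetical names, and the rules are taken only from the description above, so they may need adjusting for a particular workbook.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper mirroring the conditional split and derived column.
public static class SheetNameCleaner
{
    public static List<string> Clean(IEnumerable<string> rawNames)
    {
        var cleaned = new List<string>();

        foreach (string raw in rawNames)
        {
            // Conditional split: skip the duplicate entries that end in "_".
            if (raw.EndsWith("_", StringComparison.Ordinal))
            {
                continue;
            }

            // Derived column: remove the single quotes around the name.
            cleaned.Add(raw.Replace("'", ""));
        }

        return cleaned;
    }
}
```

For example, given the names 'My Sheet$', Sheet1$ and Sheet1$_, this keeps My Sheet$ and Sheet1$.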

Create a variable of type Object; I named mine ADO_Sheets.

Insert a Recordset Destination object:

1. Set the variable to the variable you just created.
2. Map the columns for the clean sheet name.

Now go back to the Control Flow and set up a Foreach loop:

Configure the Foreach loop:

  • Enumerator: Foreach ADO Enumerator
  • Source: ADO_Sheets
  • Variable mapping: set to a variable called SheetName

I have a function task inside the loop, but that is mostly for ease of understanding; it could have been done in the variables instead:

SQL

This variable is now your SELECT statement for extracting the data from that sheet.
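One possible way to skip the first 12 rows at this point is to put a cell range into the SELECT so that reading starts at row 13. The sketch below is an assumption rather than the query from the original answer: `SheetQueryBuilder`, `BuildSelect`, the last column "Z" and the last row 1048576 are all illustrative, and range support and limits depend on the provider, so verify the query against a real workbook.

```csharp
using System;

// Hypothetical helper that builds a ranged SELECT for one worksheet.
public static class SheetQueryBuilder
{
    public static string BuildSelect(string sheetName, int firstRow, string lastColumn, int lastRow)
    {
        if (firstRow < 1 || lastRow < firstRow)
        {
            throw new ArgumentOutOfRangeException("firstRow");
        }

        // Worksheet names are addressed with a trailing "$"; add it only if missing.
        string table = sheetName.EndsWith("$", StringComparison.Ordinal)
            ? sheetName
            : sheetName + "$";

        // Produces e.g. SELECT * FROM [Sheet1$A13:Z1048576]
        return string.Format("SELECT * FROM [{0}A{1}:{2}{3}]", table, firstRow, lastColumn, lastRow);
    }
}
```

Calling `SheetQueryBuilder.BuildSelect("Sheet1$", 13, "Z", 1048576)` returns `SELECT * FROM [Sheet1$A13:Z1048576]`. Note that with HDR=YES in the connection string, the first row of the range (row 13 here) would be treated as the header row.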

Last is the data flow task you want to run. 最后是您要运行的数据流任务。

Lots of work, but I use this so often I thought I would share!!!

Adding info about connection strings to Excel (xlsx):

Excel 2010 Xlsx files: connect to Excel 2007 (and later) files with the Xlsx file extension. That is the Office Open XML format with macros disabled.

Provider=Microsoft.ACE.OLEDB.12.0;Data Source=c:\myFolder\myExcel2007file.xlsx;Extended Properties="Excel 12.0 Xml;HDR=YES";

"HDR=Yes;" indicates that the first row contains column names, not data. "HDR=No;" indicates the opposite.

Source: https://www.connectionstrings.com/ace-oledb-12-0/
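If the HDR setting needs to vary, for example when the first row being read is data rather than column names, the connection string can be assembled in code. This is only a sketch of the string shown above with the path and HDR value filled in; `ExcelConnectionStrings` and `ForXlsx` are hypothetical names.

```csharp
using System;

// Hypothetical helper that fills in the ACE connection string for an .xlsx file.
public static class ExcelConnectionStrings
{
    public static string ForXlsx(string path, bool firstRowIsHeader)
    {
        string hdr = firstRowIsHeader ? "YES" : "NO";

        // Extended Properties must be quoted because the value itself contains a semicolon.
        return string.Format(
            "Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"Excel 12.0 Xml;HDR={1}\";",
            path,
            hdr);
    }
}
```

The result can be assigned to the variable that holds the connection string before the workbook is opened.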
