
Import from Excel to SQL using SSIS

I am automating some things at work, so I decided to wrap them in an SSIS package. I have been working on this for months, and one of the problems I faced at the beginning has resurfaced.

I receive a report through email, which is downloaded, renamed and placed into L:\MACROS\SSIS\Input (this is done through a C# application I created).

I then import the data from that report into SQL.

This is where the problem occurs: when I read the data from the .xls file, one specific column shows one of two behaviours. If the top row of data is numeric, the column is automatically typed as numeric and only numeric values are imported; anything non-numeric is turned into null.

This column is the invoice number, which is usually numeric, but there is one world region where the invoice numbers are non-numeric (e.g. "MAGI:1326564"). I get this error message when I open my data flow object:

TITLE: Microsoft Visual Studio

The metadata of the following output columns does not match the metadata of the external columns with which the output columns are associated:

Output "Excel Source Output": "F11"

Do you want to replace the metadata of the output columns with the metadata of the external columns?

BUTTONS:

&Yes &No

I can get either the numeric or the non-numeric values, but not both.

Since I wanted a permanent fix, I thought about using C# to copy the non-numeric values into a separate column and delete them from the original column.

That way I would have a reusable method of fixing the above issue.

        try
        {
            //Start Excel and get Application object.
            oXL = new Microsoft.Office.Interop.Excel.Application();
            oXL.Visible = false;

            oWB = (Microsoft.Office.Interop.Excel._Workbook)(oXL.Workbooks.Open(@"L:\MACROS\SSIS\Input\A2_POST_ADVICE_FOR_DUTY_LINES.xls"));
            oSheet = (Microsoft.Office.Interop.Excel._Worksheet)oWB.ActiveSheet;


        /*    int nInLastRow = oSheet.Cells.Find("*", System.Reflection.Missing.Value,
            System.Reflection.Missing.Value, System.Reflection.Missing.Value, Microsoft.Office.Interop.Excel.XlSearchOrder.xlByRows, Microsoft.Office.Interop.Excel.XlSearchDirection.xlPrevious, false, System.Reflection.Missing.Value, System.Reflection.Missing.Value).Row;
            */

              var j = 7;


            var cellValue = (string)(oSheet.Cells[7, 11] as Microsoft.Office.Interop.Excel.Range).Value; 


            //        while (j < 20)/*nInLastRow)*/
            //        {
            i = 0;
                foreach (char value in cellValue)
                {
                    bool digit = char.IsDigit(value);
                    if (digit == true)
                    {
                        i = i + 1;
                    }
                    else { i = i + 0; }

                }
                if (i > 1)
                {
                    oSheet.Cells[j, 22] = cellValue;
                    //oSheet.Cells[j, 11].Clear();
                }

            // Close the workbook, tell it to save and give the path.

            //   j = j + 1;
            //        }

            oXL.DisplayAlerts = false;

            oWB.SaveAs(@"L:\MACROS\SSIS\Input\A2_POST_ADVICE_FOR_DUTY_LINES.xls", Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing, Microsoft.Office.Interop.Excel.XlSaveAsAccessMode.xlNoChange, Type.Missing, Type.Missing, Type.Missing,Type.Missing, Type.Missing);


            oWB.Close();

            // Now quit the application.

            oXL.Quit();

            // Call the garbage collector to collect and wait for finalizers to finish.

            GC.Collect();

            GC.WaitForPendingFinalizers();

            // Release the COM objects that have been instantiated.

            Marshal.FinalReleaseComObject(oWB);

            Marshal.FinalReleaseComObject(oSheet);
            //  Marshal.FinalReleaseComObject(oRng);

            Marshal.FinalReleaseComObject(oXL);
        }

        catch (Exception theException)
        {
            String errorMessage;
            errorMessage = "Error: ";
            errorMessage = String.Concat(errorMessage, theException.Message);
            errorMessage = String.Concat(errorMessage, " Line: ");
            errorMessage = String.Concat(errorMessage, theException.Source);

            MessageBox.Show(errorMessage, "Error");
        }

I keep getting an error message when running the C# code:

"Cannot convert type double to string."

The code was working (for 2 tries) before I implemented the loop. After I implemented the loop it stopped working, so I commented the loop out, but I still get the same error.

I have also changed:

            var cellValue = (string)(oSheet.Cells[7, 11] as Microsoft.Office.Interop.Excel.Range).Value; 

to

var cellValue = (oSheet.Cells[7, 11] as Microsoft.Office.Interop.Excel.Range).Value.ToString();

With this change it worked for 2 tests and then stopped working again.

If I change it to:

string cellValue =  "MA1352564";

it does what I want, so I have narrowed the issue down to converting the value of the cell to a string so that the code can then check whether the characters in the string are digits or not.

I am looking for either a different solution to my import problem or any ideas on how to fix the C# section of code :)

EDIT: I forgot to mention that if I enable a data viewer in the flow, the data coming out of Excel is already stripped of the non-numeric values.

EDIT2:

After using the suggested options I get this error:

Error: 0xC0202009 at DataInputUni, Excel Source [12]: SSIS Error Code DTS_E_OLEDBERROR. An OLE DB error has occurred. Error code: 0x80040E21. An OLE DB record is available. Source: "Microsoft JET Database Engine" Hresult: 0x80040E21 Description: "Multiple-step OLE DB operation generated errors. Check each OLE DB status value, if available. No work was done.".

Error: 0xC0208265 at DataInputUni, Excel Source [12]: Failed to retrieve long data for column "F11".

Error: 0xC020901C at DataInputUni, Excel Source [12]: There was an error with Excel Source.Outputs[Excel Source Output].Columns[F11] on Excel Source.Outputs[Excel Source Output]. The column status returned was: "DBSTATUS_UNAVAILABLE".

Error: 0xC0209029 at DataInputUni, Excel Source [12]: SSIS Error Code DTS_E_INDUCEDTRANSFORMFAILUREONERROR. The "Excel Source.Outputs[Excel Source Output].Columns[F11]" failed because error code 0xC0209071 occurred, and the error row disposition on "Excel Source.Outputs[Excel Source Output].Columns[F11]" specifies failure on error. An error occurred on the specified object of the specified component. There may be error messages posted before this with more information about the failure.

Error: 0xC0047038 at DataInputUni, SSIS.Pipeline: SSIS Error Code DTS_E_PRIMEOUTPUTFAILED. The PrimeOutput method on Excel Source returned error code 0xC0209029. The component returned a failure code when the pipeline engine called PrimeOutput(). The meaning of the failure code is defined by the component, but the error is fatal and the pipeline stopped executing. There may be error messages posted before this with more information about the failure.

It sounds like the Excel driver isn't reading enough data when guessing the datatype. In addition to setting ;Extended Properties="IMEX=1" in the connection string as per the comments, set the TypeGuessRows registry value to 0. Its location depends on the installed version of Office; it is probably at one of the following keys:

  • HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel\TypeGuessRows
  • HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Jet\4.0\Engines\Excel\TypeGuessRows
  • HKEY_LOCAL_MACHINE\Software\Microsoft\Office\<OFFICE NUMERICAL VERSION>\Access Connectivity Engine\Engines\Excel\TypeGuessRows
  • HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Office\<OFFICE NUMERICAL VERSION>\Access Connectivity Engine\Engines\Excel\TypeGuessRows

Setting TypeGuessRows to 0 causes the entire column to be scanned when guessing the datatype. Setting IMEX=1 causes data to be returned as text (this can be altered in the registry) when mixed values are encountered. Omitting IMEX=1 causes data that does not match the guessed datatype to be returned as null. IMEX is thus less important than TypeGuessRows: on its own it only makes a difference if the mixed values already show up within the rows that are scanned (the first 8 by default).
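The interaction between the two settings can be illustrated with a small C# model. This is only a sketch of the behaviour described above, not the real Jet/ACE algorithm: the class and method names are hypothetical, and the "majority wins, ties go to numeric" rule is an assumption made for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Simplified, hypothetical model of the driver's type guessing.
// It is NOT the actual Jet/ACE implementation.
public static class TypeGuessModel
{
    private static bool TryNumber(string s, out double number) =>
        double.TryParse(s, NumberStyles.Float, CultureInfo.InvariantCulture, out number);

    // typeGuessRows: number of rows sampled (0 = whole column, as described above).
    // imex: true models IMEX=1 in the connection string.
    public static List<object> Import(IList<string> cells, int typeGuessRows, bool imex)
    {
        IList<string> sample = typeGuessRows == 0
            ? cells
            : cells.Take(typeGuessRows).ToList();

        int numericCount = sample.Count(s => TryNumber(s, out _));
        int textCount = sample.Count - numericCount;
        bool mixed = numericCount > 0 && textCount > 0;

        // Mixed sample with IMEX=1: the column is returned as text.
        // Otherwise the majority type of the sample wins (assumption: ties go to numeric).
        bool asText = (mixed && imex) || textCount > numericCount;

        return cells.Select(c =>
        {
            if (asText) return (object)c;
            // Numeric column: values that do not fit the guessed type become null.
            return TryNumber(c, out double d) ? (object)d : null;
        }).ToList();
    }
}
```

With the cells "1001", "1002" and "MAGI:1326564", calling Import(cells, 2, true) still loses the last value, because the two sampled rows are both numeric. Import(cells, 0, true) keeps all three as text, which is why both settings are needed together.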

http://microsoft-ssis.blogspot.com/2011/06/mixed-data-types-in-excel-column.html
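On the C# side of the question, the "Cannot convert type double to string" error is consistent with Range.Value returning a boxed double for numeric cells (and null for empty ones), which a direct (string) cast cannot handle. A minimal sketch of a safer conversion follows; the helper names are hypothetical, and the Excel interop call is left out so the logic can be run on its own.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical helpers; pass in whatever Range.Value returned.
public static class CellText
{
    // Converts a cell value (double, string, null, ...) to text without casting.
    public static string ToText(object cellValue) =>
        cellValue == null
            ? string.Empty
            : Convert.ToString(cellValue, CultureInfo.InvariantCulture) ?? string.Empty;

    // True when the cell has content and at least one character is not a digit.
    public static bool IsNonNumeric(object cellValue)
    {
        string text = ToText(cellValue);
        return text.Length > 0 && text.Any(ch => !char.IsDigit(ch));
    }
}
```

Inside the loop from the question this would be used as something like `object raw = (oSheet.Cells[j, 11] as Microsoft.Office.Interop.Excel.Range).Value;` followed by `if (CellText.IsNonNumeric(raw)) { ... }`. Note that the original check, `i > 1`, counts digits, so it would also match purely numeric invoice numbers.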

Thanks to Caius Jard for his answer. I found a solution to my problem. I first tried changing the output file format of the report to CSV, but this made it worse lol. With CSV it simply would not scan the cells at all and typed everything as string, which caused issues with the import. I then tried using .xlsx (Excel 2007), which meant a new connection manager, and got this as the connection string:

Provider=Microsoft.ACE.OLEDB.12.0;Data Source=L:\MACROS\SSIS\Input\A2_POST_TEST20190103214110525.xlsx;Extended Properties="EXCEL 12.0 XML;HDR=NO";

Instead of adding again what Caius suggested, I tried changing it to this:

Provider=Microsoft.ACE.OLEDB.12.0;Data Source=L:\MACROS\SSIS\Input\A2_POST_TEST20190103214110525.xlsx;Extended Properties="EXCEL 12.0 XML;HDR=NO;IMEX=1";

This fixed my problem!
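Since the report file name changes with every download, the connection string may need to be rebuilt each time. A minimal sketch of doing that in C#, assuming the same provider and extended properties as above; the class name, method name and the file name in the usage note are hypothetical.

```csharp
using System;

// Hypothetical helper that assembles the ACE connection string shown above.
public static class ExcelConnection
{
    public static string Build(string path, bool hasHeader, bool imex)
    {
        // Extended Properties is a quoted, semicolon-separated list.
        string props = "EXCEL 12.0 XML;HDR=" + (hasHeader ? "YES" : "NO");
        if (imex)
        {
            props += ";IMEX=1"; // return mixed-type columns as text
        }

        return "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + path +
               ";Extended Properties=\"" + props + "\";";
    }
}
```

For example, `ExcelConnection.Build(@"L:\MACROS\SSIS\Input\report.xlsx", false, true)` produces a string with the same shape as the working one above, which could then be assigned to the Excel connection manager's ConnectionString property.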
