

SSIS OPENROWSET query flat file

I currently have a variable called InvoiceFileName that is used to create .csv files in a Foreach loop, and the resulting .csv files are written to a folder.
I then need to query each .csv file to select its header and first row of data. I believe I need to use OPENROWSET to query the .csv files. I have 2 questions:

  1. What is the syntax for querying the file named by the variable InvoiceFileName?
  2. Is it possible to select the header and the first row of data with OPENROWSET without inserting them into a table?

Below is a simple OPENROWSET query that only returns the header of the file:

SELECT 
top 1 *
FROM OPENROWSET(BULK N'\\myservername\f$\reports\Invoices\CokeFiles\54ASBSd.csv', SINGLE_CLOB) AS Report 

What kind of privileges do you have on the database? If you have, or can get, slightly elevated privileges, you can use BULK INSERT and xp_cmdshell to accomplish this, but as @scsimon said, you will have to use dynamic SQL. Here's a quick example:

-----------------------------------------------------------------------------------------------------
-- Set up your variables
-----------------------------------------------------------------------------------------------------
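DECLARE 
    @folderPath AS VARCHAR(100) = '\\some\folder\path\here\',
    @cmd AS VARCHAR(150), -- Will populate this with a command to get a list of files in a directory
    @InvoiceFileName AS VARCHAR(255), -- Will be used in cursor loop (same size as #FILE_LIST.FILE_NAME)
    @sql AS NVARCHAR(MAX), -- Dynamic BULK INSERT statement built inside the loop
    @targetTable AS VARCHAR(50) = 'SomeTable',
    @fieldTerminator AS CHAR(1) = ',',
    @rowTerminator AS CHAR(2) = '\n' -- Passed through literally; BULK INSERT reads '\n' as the newline terminator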
-----------------------------------------------------------------------------------------------------
-- Create a temp table to store the file names
-----------------------------------------------------------------------------------------------------
IF OBJECT_ID('tempdb..#FILE_LIST') IS NOT NULL
    DROP TABLE #FILE_LIST
--
CREATE TABLE #FILE_LIST(FILE_NAME VARCHAR(255))

-----------------------------------------------------------------------------------------------------
-- Get a list of the files and store them in the temp table:
-- NOTE: this DOES require elevated permissions
-----------------------------------------------------------------------------------------------------
SET @cmd = 'dir "' + @folderPath + '" /b'
--
INSERT INTO #FILE_LIST(FILE_NAME)
EXEC Master..xp_cmdShell @cmd

--------------------------------------------------------------------------------
-- Here we remove any null values
--------------------------------------------------------------------------------
DELETE #FILE_LIST WHERE FILE_NAME IS NULL

-----------------------------------------------------------------------------------------------------
-- Set up our cursor and loop through the files 
-----------------------------------------------------------------------------------------------------
DECLARE c1 CURSOR FOR SELECT FILE_NAME FROM #FILE_LIST
OPEN c1
FETCH NEXT FROM c1 INTO @InvoiceFileName
WHILE @@FETCH_STATUS <> -1
    BEGIN -- Begin WHILE loop
        BEGIN TRY
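            -- BULK INSERT won't take a variable for the file name, so dynamically generate the
            -- SQL statement and execute it instead. dir /b returns bare file names, so the
            -- folder path is prepended to build the full path:
            SET @sql = 'BULK INSERT ' + @targetTable + ' FROM ''' + @folderPath + @InvoiceFileName + ''' '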
                + '     WITH ( 
                        FIELDTERMINATOR = ''' + @fieldTerminator + ''', 
                        ROWTERMINATOR = ''' + @rowTerminator + ''', 
                        FIRSTROW = 1,
                        LASTROW = 2
                    ) '
            EXEC (@sql)
        END TRY
        BEGIN CATCH
            -- Handle errors here
        END CATCH
        -- Continue your loop
        FETCH NEXT FROM c1 INTO @InvoiceFileName
    END -- End WHILE loop

CLOSE c1
DEALLOCATE c1

-- Do what you need to do here with the data in your target table

A few disclaimers:

  1. I have not tested this code. I only copied it from a slightly more complex proc I've used in the past that works for exactly this kind of scenario.
  2. You will need elevated privileges for BULK INSERT and xp_cmdshell.
  3. I know people frown on using xp_cmdshell (and for good reason), but this is a quick-and-dirty solution that makes a lot of assumptions about what your environment is like.
  4. This assumes you're not already grabbing the data as you get each file name into your variable. If you are, you can skip the first part of this code.
  5. This code also assumes you are doing your own error handling in places other than the one TRY/CATCH block you see. I've omitted a lot of that for simplicity.
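The elevated privileges mentioned above can be set up roughly as follows. This is a sketch, assuming you (or your DBA) have sysadmin rights; [DOMAIN\SomeEtlLogin] is a hypothetical login name. xp_cmdshell is disabled by default, and BULK INSERT / OPENROWSET(BULK ...) require the server-level ADMINISTER BULK OPERATIONS permission (membership in the bulkadmin server role is an alternative), plus INSERT permission on the target table for BULK INSERT.

```sql
-- Run as a sysadmin. xp_cmdshell is off by default.
USE master;
GO
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure N'xp_cmdshell', 1;
RECONFIGURE;
GO

-- Server-level permission needed by BULK INSERT and OPENROWSET(BULK ...).
-- [DOMAIN\SomeEtlLogin] is a hypothetical login; substitute your own.
GRANT ADMINISTER BULK OPERATIONS TO [DOMAIN\SomeEtlLogin];
GO

-- BULK INSERT additionally needs INSERT on the target table (e.g. SomeTable).
-- A non-sysadmin caller of xp_cmdshell also needs EXECUTE on xp_cmdshell
-- (granted in master) and an xp_cmdshell proxy account configured with
-- sys.sp_xp_cmdshell_proxy_account.
```

Also keep in mind that the file paths are opened by the SQL Server instance, not by your workstation, so the share (for example \\myservername\f$\...) has to be readable by the SQL Server service account or by the caller's Windows login, depending on how you authenticate.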

To do this through SSIS, ideally you'd use a format file for the bulk operation, but that requires consistently formatted files, and you'd have to remove the SINGLE_CLOB option as well. A really hacky, non-ideal way to do it would be something like this:

Let's say your file contains this data:

Col1,Col2,Col3,Col4
Here's,The,First,Line
Here's,The,Second,Line
Here's,The,Third,Line
Here's,The,Fourth,Line

Then you could parse the data with something like this:

SELECT SUBSTRING(OnlyColumn, 0, CHARINDEX(CHAR(10), OnlyColumn, CHARINDEX(CHAR(10), OnlyColumn, 0)+1) )
FROM OPENROWSET(BULK '\\location\of\myFile.csv', SINGLE_CLOB) AS Report (OnlyColumn)

And your result would be this:

Col1,Col2,Col3,Col4  Here's,The,First,Line 

This obviously depends on your line endings being consistent, but if you want the result in a single column and a single row (which is how the bulk operation behaves with the SINGLE_CLOB option), this should get you what you need.
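If you would rather get the header and the first data row back as two separate columns (still without inserting anything into a table), the same CHARINDEX idea can be extended. This is an untested-against-your-files sketch: it first strips carriage returns so CRLF and LF files behave the same, then appends two sentinel line feeds so neither CHARINDEX lookup can return 0 on a very short file. It runs on the sample data above held in a variable; the commented-out OPENROWSET query (same placeholder path as above) shows how to load the real file instead.

```sql
-- Sample data from above; CRLF line endings as a Windows-produced .csv would have.
DECLARE @doc VARCHAR(MAX) =
    'Col1,Col2,Col3,Col4'     + CHAR(13) + CHAR(10) +
    'Here''s,The,First,Line'  + CHAR(13) + CHAR(10) +
    'Here''s,The,Second,Line' + CHAR(13) + CHAR(10);

-- To read the real file instead of the sample string:
-- SELECT @doc = Report.OnlyColumn
-- FROM OPENROWSET(BULK N'\\location\of\myFile.csv', SINGLE_CLOB) AS Report (OnlyColumn);

-- Treat CRLF and LF the same, then append two sentinel LFs so that both
-- CHARINDEX lookups below always find a match.
SET @doc = REPLACE(@doc, CHAR(13), '') + CHAR(10) + CHAR(10);

DECLARE @firstLf  BIGINT = CHARINDEX(CHAR(10), @doc);               -- end of header line
DECLARE @secondLf BIGINT = CHARINDEX(CHAR(10), @doc, @firstLf + 1); -- end of first data row

SELECT
    HeaderLine   = LEFT(@doc, @firstLf - 1),
    FirstDataRow = SUBSTRING(@doc, @firstLf + 1, @secondLf - @firstLf - 1);
-- HeaderLine: Col1,Col2,Col3,Col4    FirstDataRow: Here's,The,First,Line
```

Like the single-string version, this assumes no line breaks inside quoted CSV fields.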

Take a look at the solution in this SO post for how to pass the value of an SSIS variable as a parameter to your query.
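As a sketch of how that could look for the OPENROWSET query itself: assume an Execute SQL Task on an OLE DB connection manager, with the ? marker mapped on the Parameter Mapping page (Parameter Name 0) to User::InvoiceFileName, and assume the Foreach File enumerator is set to retrieve the fully qualified file name. OPENROWSET(BULK ...) only accepts a string literal for the path, so the statement is built dynamically; single quotes in the path are doubled so they can't break the literal.

```sql
-- SQLStatement for the Execute SQL Task (OLE DB connection assumed).
-- ? is the OLE DB parameter marker, mapped to User::InvoiceFileName,
-- which is assumed to hold the full path of the current .csv file.
DECLARE @InvoiceFileName NVARCHAR(260) = ?;

-- OPENROWSET(BULK ...) will not accept a variable, so build the statement as a string.
-- REPLACE doubles any single quotes in the path so it stays a valid literal.
DECLARE @sql NVARCHAR(MAX) = N'
SELECT Report.OnlyColumn
FROM OPENROWSET(BULK N''' + REPLACE(@InvoiceFileName, N'''', N'''''') + N''', SINGLE_CLOB) AS Report (OnlyColumn);';

EXEC sys.sp_executesql @sql;
```

The path is resolved by the SQL Server instance rather than by the machine running the package, so it must be reachable from the database server.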

Use a Foreach Loop container to query all the files in a folder. You can use wildcards in the file name, or use variables in your DTS package to set the properties of the components.

Inside the loop container, place a Data Flow Task with your source file connection, your transformations, and your destination.

You can change the file names and paths of all these objects by setting their properties from variables in your DTS package.

With an Expression Task inside the loop, you can change the path of the CSV file connection.

[Screenshot: Foreach Loop Editor]
