How to use a file name prefix in Data Factory when importing data into Azure Data Lake from SAP BW Open Hub?

Posted 2021-07-14 21:06:40 · Tags: azure-data-factory, azure-data-factory-2, azure-data-lake, azure-data-factory-pipeline, sap-bw

I have an SAP BW Open Hub source and an Azure Data Lake Storage Gen2 sink in Data Factory, and I am using a copy activity to move the data.

I am trying to transfer the data to the lake and split it into multiple files, with 200,000 rows per file. I would also like to prefix all of the file names, e.g. 'cust_', so that the files would be named along the lines of cust_1, cust_2, cust_3, etc.

This only seems to be an issue when SAP BW Open Hub is the source (it works fine when SQL Server is the source). Please see the warning message below. After checking with our internal SAP BW team, they assured me that the data is in a tabular format and that no explicit partitioning is enabled, so there shouldn't be an issue.

When the copy activity runs, the files are transferred to the lake, but the file name prefix setting is ignored and the file names are generated automatically instead, as shown below (the names appear to be made up of the SAP BW Open Hub table name and the request ID):

Here is the source config:

All other properties on the other tabs are left at their default values.
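For comparison, the split-and-prefix setup described above corresponds to the `maxRowsPerFile` and `fileNamePrefix` properties in the sink's `formatSettings` of a copy activity. The sketch below builds such a copy activity definition as a Python dict, assuming a delimited-text (CSV) sink on ADLS Gen2; the activity name `CopyCustToLake` and both dataset names are hypothetical placeholders. As I understand the sink documentation, `fileNamePrefix` only applies when `maxRowsPerFile` is set and is not applied when the source is file-based or partition-option-enabled, which may explain why the prefix is honored with SQL Server but ignored with SAP BW Open Hub.

```python
import json

# Hypothetical dataset names: replace with the datasets defined in your factory.
SOURCE_DATASET = "SapBwOpenHubCust"
SINK_DATASET = "AdlsGen2CustCsv"


def build_copy_activity(max_rows_per_file: int, prefix: str) -> dict:
    """Build a copy activity that splits the output and prefixes the file names."""
    return {
        "name": "CopyCustToLake",  # hypothetical activity name
        "type": "Copy",
        "inputs": [{"referenceName": SOURCE_DATASET, "type": "DatasetReference"}],
        "outputs": [{"referenceName": SINK_DATASET, "type": "DatasetReference"}],
        "typeProperties": {
            "source": {"type": "SapOpenHubSource"},
            "sink": {
                "type": "DelimitedTextSink",
                "storeSettings": {"type": "AzureBlobFSWriteSettings"},
                "formatSettings": {
                    "type": "DelimitedTextWriteSettings",
                    "fileExtension": ".csv",
                    # Start a new output file every max_rows_per_file rows.
                    "maxRowsPerFile": max_rows_per_file,
                    # Only used together with maxRowsPerFile; the service
                    # appends its own numeric suffix to this prefix.
                    "fileNamePrefix": prefix,
                },
            },
        },
    }


def expected_file_count(total_rows: int, max_rows_per_file: int) -> int:
    """Files produced when the split is honored (ceiling division)."""
    return -(-total_rows // max_rows_per_file)


if __name__ == "__main__":
    print(json.dumps(build_copy_activity(200_000, "cust"), indent=2))
    print(expected_file_count(1_000_001, 200_000))  # 6 files
```

The printed JSON can be compared against the activity's JSON code view in the Data Factory UI to check which sink settings are actually set.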

QUESTION: Without using a data flow, is there any way to split the files when pulling from SAP BW Open Hub and also control the file names in the lake?

1 answer

I tried to reproduce the issue, and it works with a workaround. Instead of splitting the data while copying from SAP BW to Azure Data Lake Storage, simply copy the entire dataset as-is (without partitioning) into an Azure SQL Database. Follow the guide on copying data from SAP Business Warehouse by using Azure Data Factory (make sure to use Azure SQL Database as the sink).

Once the data is in your Azure SQL Database, you can use a second copy activity to copy it to Azure Data Lake Storage, applying the row-based file split and the file name prefix on that copy's sink.

In the source configuration, keep "Partition option" set to None.
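The two-stage workaround can be sketched as a single pipeline definition, built here as a Python dict. This is a minimal sketch under stated assumptions: all activity, dataset, and table names (including `dbo.CustStaging`) are hypothetical, and the optional `preCopyScript` that empties the staging table before each load is my addition, not part of the answer. The second copy runs only after the first succeeds, reads from the SQL staging table with `partitionOption` set to `None`, and applies the split and prefix on the lake sink.

```python
import json

# Hypothetical dataset names: replace with the datasets defined in your factory.
SAP_DATASET = "SapBwOpenHubCust"
STAGING_DATASET = "AzureSqlCustStaging"
LAKE_DATASET = "AdlsGen2CustCsv"


def copy_activity(name, source, sink, input_ds, output_ds, after=None):
    """Return a Copy activity; if `after` is given, run it only when that activity succeeds."""
    activity = {
        "name": name,
        "type": "Copy",
        "inputs": [{"referenceName": input_ds, "type": "DatasetReference"}],
        "outputs": [{"referenceName": output_ds, "type": "DatasetReference"}],
        "typeProperties": {"source": source, "sink": sink},
    }
    if after is not None:
        activity["dependsOn"] = [
            {"activity": after, "dependencyConditions": ["Succeeded"]}
        ]
    return activity


def build_pipeline(max_rows_per_file=200_000, prefix="cust"):
    # Stage 1: copy the whole Open Hub table, unsplit, into Azure SQL Database.
    stage = copy_activity(
        "CopySapToSql",
        {"type": "SapOpenHubSource"},
        # Hypothetical staging table, truncated before each load to avoid duplicates.
        {"type": "AzureSqlSink", "preCopyScript": "TRUNCATE TABLE dbo.CustStaging"},
        SAP_DATASET,
        STAGING_DATASET,
    )
    # Stage 2: export from SQL to the lake with no source partitioning,
    # splitting the output and prefixing the file names on the sink.
    export = copy_activity(
        "CopySqlToLake",
        {"type": "AzureSqlSource", "partitionOption": "None"},
        {
            "type": "DelimitedTextSink",
            "storeSettings": {"type": "AzureBlobFSWriteSettings"},
            "formatSettings": {
                "type": "DelimitedTextWriteSettings",
                "fileExtension": ".csv",
                "maxRowsPerFile": max_rows_per_file,
                "fileNamePrefix": prefix,
            },
        },
        STAGING_DATASET,
        LAKE_DATASET,
        after="CopySapToSql",
    )
    return {"name": "SapBwToLakeViaSql", "properties": {"activities": [stage, export]}}


if __name__ == "__main__":
    print(json.dumps(build_pipeline(), indent=2))
```

Keeping both copies in one pipeline with a `Succeeded` dependency means the lake export never reads a half-loaded staging table; the trade-off is the extra storage and run time of the intermediate SQL copy.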

Source config: