How do I use a Single Mapping to Handle Multiple Files with Different but Similar Formats in Informatica PowerCenter?

I have some files that I would like to consolidate into a single database table. The files have similar but different formats. The files look something like this:

FileOne:

  • ColA : string
  • ColB : string
  • ColC : string

FileTwo:

  • ColAA : string
  • ColBB : string
  • ColCC : string

FileThree:

  • Col01 : string
  • Col02 : string
  • Col03 : string

The destination table looks like this:

TableDestination:

  • ColFirst : string
  • ColSecond : string
  • ColThird : string

I want to develop a mapping that ETLs these three files into this one database table, but because the column names are different, it looks like I'll have to develop three different mappings, or three different sources, or three different somethings. The problem is that my example is contrived: I actually have many different files that all have different formats and column names, but the data is all very similar.

I would like to develop a single mapping or workflow that can handle all of this by only adding a table that holds the column mappings. Such a table would look like this based on the sample files and sample table above:

TableMappings:

[image: TableMappings]

In this way, to edit a column mapping I only have to make an edit to this TableMappings table. I wouldn't have to make any changes at all to the mapping or workflow. Nor would I have to redeploy an application.

What would a mapping or workflow that takes advantage of something like this look like? I assume there'd be a flat file source that takes files from a folder. There would be something in the middle that uses this TableMappings table to map column names. Finally, there would be a relational data object that represents my destination database table "TableDestination". I don't know how to put this together, though.

With flat files as the source, column names are not important. It doesn't even matter whether the column count matches. If the actual file has more columns than the Source Definition, only the first n columns are read (with n being the number of ports in the Source Definition). In the opposite situation, the extra ports will contain null values.
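The positional behaviour described here can be illustrated outside PowerCenter. The following is a minimal Python sketch, not PowerCenter code: `read_by_position` and the sample rows are hypothetical, and the function only mimics the rule stated above, namely that fields are matched to ports by position, extra fields are dropped, and missing fields become null.

```python
import csv
import io


def read_by_position(text, port_count, delimiter=","):
    """Mimic positional reading: keep the first port_count fields of each
    row and pad short rows with None (standing in for NULL)."""
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter=delimiter):
        fields = fields[:port_count]                   # extra columns are ignored
        fields += [None] * (port_count - len(fields))  # missing columns become null
        rows.append(fields)
    return rows


wide = "a,b,c,d\n"   # hypothetical file with more columns than the source definition
narrow = "a,b\n"     # hypothetical file with fewer columns than the source definition

print(read_by_position(wide, 3))    # [['a', 'b', 'c']]
print(read_by_position(narrow, 3))  # [['a', 'b', None]]
```

Header names never enter the picture in this sketch, which is why files whose columns differ only by name can share one source definition.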

Having said that, loading multiple flat files is easy.

The problem arises if the column order is different and you want this additional static table to define the column mapping. This is doable; for example, a Java Transformation can be used to do the column mapping. But the whole solution is too complex for me to describe here. I can try to answer some precise, specific questions, but I'm not able to prepare and paste the complete solution here.

This can also be done using an expression. You'd need a "generic" Source Definition (e.g. Column1, Column2, ..., ColumnN) and, in each port, an expression that checks which port should be returned. E.g.

DECODE (SUBSTR(TargetColumnOrder,X,1), '1', Column1, '2', Column2, ... 'N', ColumnN)

with X being the port index.

The above assumes a slightly different structure for the mappings table:

FileName | TargetColumnOrder
----------------------------
FileOne  | 231
FileTwo  | 527
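To make the DECODE idea concrete, here is a small Python sketch of the same lookup. It is an illustration under assumptions rather than PowerCenter code: `TARGET_COLUMN_ORDER` copies the sample table above, `pick_port` and `remap` are hypothetical helpers standing in for the per-port expression, and the sample row values are made up.

```python
# Sample mapping table from the text: file name -> TargetColumnOrder.
TARGET_COLUMN_ORDER = {
    "FileOne": "231",
    "FileTwo": "527",
}


def pick_port(order, x, columns):
    """Python stand-in for
    DECODE(SUBSTR(order, x, 1), '1', Column1, ..., 'N', ColumnN).
    x is the 1-based index of the output port; columns holds the generic
    source ports Column1..ColumnN in order."""
    digit = order[x - 1]            # SUBSTR(order, x, 1)
    return columns[int(digit) - 1]  # digit 'k' selects Columnk


def remap(file_name, columns):
    """Build the target row for one source row of the given file."""
    order = TARGET_COLUMN_ORDER[file_name]
    return [pick_port(order, x, columns) for x in range(1, len(order) + 1)]


row = ["c1", "c2", "c3", "c4", "c5", "c6", "c7"]  # hypothetical generic ports
print(remap("FileOne", row))  # ['c2', 'c3', 'c1']
print(remap("FileTwo", row))  # ['c5', 'c2', 'c7']
```

Because each position in TargetColumnOrder holds a single character, this encoding can address at most nine source columns; wider files would need a different encoding, such as delimited or fixed-width column numbers.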

Note 1: If the number of columns can vary, you need to check that LENGTH(TargetColumnOrder) is not less than the port index; otherwise SUBSTR will not work.
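The length check from Note 1 might look like the following in Python. This is a hypothetical illustration only (`pick_port_safe` is not a PowerCenter function, and the sample values are made up): when the order string is shorter than the port index, the port is left null instead of attempting the lookup.

```python
def pick_port_safe(order, x, columns):
    """Return the source column selected for output port x (1-based),
    or None when the order string has no entry for that port."""
    if len(order) < x:              # LENGTH(TargetColumnOrder) < port index
        return None                 # leave the port null
    digit = order[x - 1]            # SUBSTR(TargetColumnOrder, x, 1)
    return columns[int(digit) - 1]  # digit 'k' selects the k-th generic column


row = ["c1", "c2", "c3"]  # hypothetical generic ports
print([pick_port_safe("21", x, row) for x in (1, 2, 3)])  # ['c2', 'c1', None]
```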

Note 2: The above solution has not been tested or even implemented. Please treat this as a general description rather than an exact code base.
