Azure Data Factory removing spaces from column names of csv file
I'm a bit new to Azure Data Factory, so apologies if I'm missing anything obvious. I've done several searches and I can't find anything that quite fits.
The situation is that we have an existing pipeline that takes the path to a csv file and passes it in as a delimited data set. As a sink it uses a parquet data set. This is a generic process: we can pass any delimited file into it and it will output it as parquet.
This has been working well, but we have now started receiving files with spaces and special characters in the header, which causes the output to parquet to fail. Unfortunately we don't have control over the format of the files we receive, so I can't handle this at source.
What I would like to do is, on ingestion of the file, replace any spaces and other special characters in the header with an underscore. If I were doing this on premises I could quickly create a PowerShell script to do it. I had thought about creating a custom task in ADF to call a PowerShell script to do this in the blob storage, but that seems more complicated than it should be. Is there something else I can do to get this process working while keeping it generic?
As @Joel Cochran mentioned, you can use the expression below in a Select transformation (in a mapping data flow) to replace spaces and special characters in the header. Note that this pattern replaces every character that is not an ASCII letter, so digits are replaced as well.
regexReplace($$,'[^a-zA-Z]','_')
In the Select transformation, remove the auto mappings and add a new rule-based mapping that uses this expression.
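To see what that pattern does to typical header names, here is a minimal Python sketch of the same regex applied outside ADF. It is only an illustration, not part of the data flow: the function name `clean_headers` and the sample headers are made up for this example, and the optional second pattern (which keeps digits) is an assumption about what you might prefer, not something from the original answer.

```python
import re

def clean_headers(names, pattern=r"[^a-zA-Z]"):
    """Replace every character matching `pattern` with an underscore.

    The default pattern mirrors the expression in the answer:
    anything that is not an ASCII letter is replaced.
    """
    return [re.sub(pattern, "_", name) for name in names]

# Hypothetical header row, for illustration only
headers = ["Order ID", "Unit Price ($)", "Qty2"]

# Same pattern as the answer: digits are replaced too
cleaned = clean_headers(headers)
print(cleaned)

# Assumed variant that keeps digits by adding 0-9 to the character class
cleaned_keep_digits = clean_headers(headers, r"[^a-zA-Z0-9]")
print(cleaned_keep_digits)
```

If your headers contain digits that need to survive, widening the character class in the same way in the Select expression is one option. Also keep in mind that two different headers can collapse to the same cleaned name, which would need handling separately.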
You can change the output filename, though not directly in the Copy activity (assuming that is the activity you are using).
The workaround is to use a parameter for the output filename, which you can then clean up with an expression.
The expression could look like this:
@replace(item().name, ' ', '_')