
Azure Data Factory removing spaces from column names of csv file

I'm a bit new to Azure Data Factory, so apologies if I'm missing anything obvious. I've done several searches and I can't find anything that quite fits.

The situation is that we have an existing pipeline that takes the path to a csv file and passes it in as a delimited text dataset. As a sink it uses a Parquet dataset. This is a generic process: we can pass any delimited file into it and it will output it as Parquet.

This has been working well, but we have now started receiving files with spaces and special characters in the header, which causes the Parquet output to fail. Unfortunately we don't have control over the format of the files we receive, so I can't handle this at source.

What I would like to do is, on ingestion of the file, replace any spaces and other special characters in the header with an underscore. If I were doing this on premises I could quickly create a PowerShell script to do it. I had thought about creating a custom task in ADF to call a PowerShell script that does this in the blob storage, but that seems more complicated than it should be. Is there something else I can do to get this process working while keeping it generic?

As @Joel Cochran mentioned, you can use the expression below in a Select transformation to replace spaces and special characters in the header.

regexReplace($$,'[^a-zA-Z]','_')


In the Select transformation, remove the auto mappings and add a new rule-based mapping that uses this expression.
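To see what the pattern does to typical headers, here is a minimal Python sketch that mimics the renaming outside Data Factory. It is only an illustration: the function name `clean_column_name` and the sample headers are made up, and Python's `re.sub` stands in for `regexReplace`, which is assumed to behave the same way for a simple character class like this one. Note that `[^a-zA-Z]` also replaces digits; if numeric characters in column names should survive, a pattern such as `[^a-zA-Z0-9]` is one option.

```python
import re


def clean_column_name(name: str, pattern: str = r"[^a-zA-Z]") -> str:
    """Replace every character matching `pattern` with an underscore.

    Hypothetical helper: it imitates regexReplace($$, pattern, '_')
    applied to a single column name.
    """
    return re.sub(pattern, "_", name)


# Made-up header row with spaces, special characters, and a digit.
headers = ["Order ID", "Unit Price ($)", "Region2"]

# Pattern from the answer: anything that is not an ASCII letter is replaced.
print([clean_column_name(h) for h in headers])
# ['Order_ID', 'Unit_Price____', 'Region_']

# Variant that keeps digits as well.
print([clean_column_name(h, r"[^a-zA-Z0-9]") for h in headers])
# ['Order_ID', 'Unit_Price____', 'Region2']
```

Because every offending character maps to the same underscore, two different source headers can end up with the same cleaned name (for example `A B` and `A-B`), so it is worth checking for duplicates if the incoming files are not under your control.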


You cannot change the output filename directly in the Copy activity, assuming that is the activity you are using.

The workaround is to use a parameter for the output filename, which you can then clean up.

  1. Use the Get Metadata activity to get all filenames from the source csv files.
  2. Loop over these files with a ForEach activity.
  3. Within the ForEach activity, set the output filename to the cleaned value.

The expression could look like this:

@replace(item().name, ' ', '_')
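The three steps above can be sketched in plain Python to show which names the loop would produce. This is an assumption-laden illustration, not pipeline code: `child_items` is a hypothetical stand-in for the list of files the Get Metadata activity returns, the file names are invented, and `cleaned_output_name` is a made-up helper that mirrors replacing literal spaces with underscores.

```python
# Hypothetical stand-in for the file list returned by Get Metadata.
child_items = [
    {"name": "sales report 2024.csv", "type": "File"},
    {"name": "inventory.csv", "type": "File"},
]


def cleaned_output_name(item: dict) -> str:
    """Return the item's name with every literal space replaced by '_'."""
    return item["name"].replace(" ", "_")


# Equivalent of the ForEach loop: one cleaned output name per source file.
output_names = [cleaned_output_name(item) for item in child_items]
print(output_names)
# ['sales_report_2024.csv', 'inventory.csv']
```

Only spaces are handled here, matching the expression above; other special characters in a file name would pass through unchanged.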

More information is available in the documentation for the replace function.
