
Pentaho Spoon: search and replace special characters in rows

Original post: 2022-05-06 20:29:54. Tags: data-cleaning, data-warehouse, pentaho-data-integration, stage, spoon

I have a CSV file with MIME type US-ASCII, and one column in the dataset looks like this:

id	V_name
210001	cha?ne des Puys
210030	M?los
213004	G?ll?
213021	S?phan
221110	Afd?ra

And so on.

I would like to change those characters to:

id	V_name
210001	chaine des Puys
210030	Milos
213004	Gollu
213021	Suphan
221110	Afdera

The thing is that there are 95 rows of this kind. How can I search and replace those rows? I am using PDI Spoon. Thanks in advance.

1 Answer

As @Iłya Bursov has stated, the source file you are reading doesn't provide the correct characters. It contains a literal ? in the source, so if you want to correct it, you'll have to do it manually.
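To see why no automatic fix can restore the characters, here is a small Python sketch. It assumes the provider produced the file with a lossy conversion to US-ASCII, which is only a guess at what happened upstream. The accented spellings and the helper name to_ascii_lossy are illustrative, not taken from the real data.

```python
def to_ascii_lossy(name: str) -> str:
    """Encode to US-ASCII, replacing every unencodable character with '?'."""
    return name.encode("ascii", errors="replace").decode("ascii")

# Illustrative guesses at the original spellings, written with escapes
# so the example does not depend on the encoding of this page.
guessed_originals = [
    "cha\u00eene des Puys",  # i with circumflex
    "G\u00f6ll\u00fc",       # o and u with diaeresis
]

for name in guessed_originals:
    print(to_ascii_lossy(name))

# Two different characters collapse to the same '?', so the original
# character cannot be derived from the file alone.
print(to_ascii_lossy("\u00f6") == to_ascii_lossy("\u00fc"))
```

Once every accented character has become the same ?, nothing in the file records which character it replaced, so any correction has to come from knowledge outside the file.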

I don't think it is worth it, unless you know you will always receive the same set of V_name values over time and across different files. In that case you could create a file that maps each V_name containing ? characters in your source to a V_name_corrected with the characters displayed correctly. This seems to be an exercise, so I would leave the names as they are. In real life, I would contact the provider of the file and tell them the character set is wrong, so that they fix how the file is generated and it contains the correct characters.
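The correction-file idea can be sketched outside PDI in a few lines of Python. This is only an illustration of the lookup logic under stated assumptions: the two-column layout of the correction file, the helper names load_corrections and correct_rows, and the extra row with id 999999 are all hypothetical. Inside a transformation, the same join would be done with a lookup step (for example Stream lookup) rather than with code.

```python
import csv
import io

# Hypothetical correction file: V_name as received -> V_name_corrected.
# The pairs come from the question's example; the file layout is assumed.
CORRECTIONS_CSV = """V_name,V_name_corrected
cha?ne des Puys,chaine des Puys
M?los,Milos
G?ll?,Gollu
S?phan,Suphan
Afd?ra,Afdera
"""

def load_corrections(text):
    """Read the correction file into a dict keyed by the broken name."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["V_name"]: row["V_name_corrected"] for row in reader}

def correct_rows(rows, corrections):
    """Add V_name_corrected to each row; names without a mapping pass through unchanged."""
    return [
        {**row, "V_name_corrected": corrections.get(row["V_name"], row["V_name"])}
        for row in rows
    ]

corrections = load_corrections(CORRECTIONS_CSV)
source_rows = [
    {"id": "210001", "V_name": "cha?ne des Puys"},
    {"id": "213004", "V_name": "G?ll?"},
    {"id": "999999", "V_name": "Etna"},  # hypothetical row with nothing to fix
]
for row in correct_rows(source_rows, corrections):
    print(row["id"], row["V_name_corrected"])
```

The mapping has to be filled in by hand once, for all 95 names, and it only pays off if later files keep using the same names.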