Replacing all non-alphanumeric + punctuation characters with empty strings
I'm working on a regular expression in Talend inside a tReplace component.
I'm moving data from Oracle to Redshift and I'm having issues with DDL length because some characters are not supported (I guess).
I have product names like:
175/65 R14 Efficiency +
XXX N° 5 H7DC
And they have to stay like this. But sometimes I have an NBSP inside labels, or sometimes even worse.
I saw this list of punctuation online: [,"#$%&'()*+.-:/;?<=>?@[\]^_{|}~°]
and I need to add it to my existing regex "[^A-Za-z0-9]".
TL;DR: Can someone help me write a regex to replace everything in a column except [A-Za-z0-9] and the punctuation list above?
It must be usable in the following code (I'm using Talend, where it is interpreted as Java):
StringUtils.replaceAll(row1.label, "[^A-Za-z0-9]", "");
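To make the problem concrete, here is a minimal sketch of what that pattern does to the two sample labels. It uses plain `String.replaceAll` as a stand-in for Talend's `StringUtils.replaceAll` (an assumption that the two treat the regex the same way), and the class name `CurrentPatternDemo` is hypothetical.

```java
// Hypothetical demo class; String.replaceAll stands in for Talend's StringUtils.replaceAll.
class CurrentPatternDemo {
    static String strip(String label) {
        // Removes every character that is not an ASCII letter or digit,
        // so spaces, '/', '+' and the degree sign are all dropped.
        return label.replaceAll("[^A-Za-z0-9]", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("175/65 R14 Efficiency +")); // 17565R14Efficiency
        System.out.println(strip("XXX N\u00B0 5 H7DC"));      // XXXN5H7DC
    }
}
```

The spaces and punctuation that should stay are stripped along with the NBSP, which is why the character class needs to be widened.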
I ended up finding the solution thanks to the help of the answers above.
I used:
[^\p{Alnum}\p{Punct}\s]
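As a rough check of this pattern against the sample labels, here is a small sketch in plain Java. The class `LabelCleaner` and both method names are hypothetical, and `String.replaceAll` again stands in for Talend's `StringUtils.replaceAll`. One thing to watch: without the `UNICODE_CHARACTER_CLASS` flag, Java's `\p{Punct}` covers ASCII punctuation only, so the `°` from the list in the question is not part of it. The second method shows one possible way to keep it, by adding `\u00B0` to the class explicitly.

```java
// Hypothetical helper; String.replaceAll stands in for Talend's StringUtils.replaceAll.
class LabelCleaner {
    // The pattern from the answer, written as a Java string literal (backslashes doubled).
    static String clean(String label) {
        return label.replaceAll("[^\\p{Alnum}\\p{Punct}\\s]", "");
    }

    // Variant (an assumption, not part of the original answer): also keep the degree sign U+00B0.
    static String cleanKeepDegree(String label) {
        return label.replaceAll("[^\\p{Alnum}\\p{Punct}\\s\u00B0]", "");
    }

    public static void main(String[] args) {
        System.out.println(clean("175/65 R14 Efficiency +"));       // unchanged
        System.out.println(clean("XXX\u00A0N"));                    // NBSP removed: XXXN
        System.out.println(clean("XXX N\u00B0 5 H7DC"));            // degree sign removed: XXX N 5 H7DC
        System.out.println(cleanKeepDegree("XXX N\u00B0 5 H7DC"));  // unchanged
    }
}
```

By default `\s` matches only ASCII whitespace, so the NBSP (U+00A0) is deleted rather than turned into a regular space. Likewise `\p{Alnum}` is ASCII-only, so accented letters would be removed as well.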