Removing leading special characters in Hive

I am trying to remove leading special characters (could be any of -"$&^@_) from "Persi és Levon Cnatówóeez using Hive.

I tried select REGEXP_REPLACE('“Persi és Levon Cnatówóeez', '[^a-zA-Z0-9]+', '') but this removes all special characters, not only the leading ones.

I am expecting an output similar to

Persi és Levon Cnatówóeez

Try this:

select REGEXP_REPLACE('"Persi és Levon Cnatówóeez', '[^a-zA-Z0-9\u00E0-\u00FC ]+', '');

I tried it on Hive, and it removes any character that is not a letter (a-zA-Z), a digit (0-9), an accented character (\u00E0-\u00FC) or a space. Note that it does so anywhere in the string, not only at the start; in this input the only such character happens to be the leading quote.

0: jdbc:hive2://localhost:10000> select REGEXP_REPLACE('"Persi és Levon Cnatówóeez', '[^a-zA-Z0-9\u00E0-\u00FC ]+', '');
+----------------------------+--+
|            _c0             |
+----------------------------+--+
| Persi és Levon Cnatówóeez  |
+----------------------------+--+
1 row selected (0.104 seconds)
0: jdbc:hive2://localhost:10000>
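
Since regexp_replace uses Java regular expression syntax, the difference between the pattern above and a version anchored with ^ can be sketched in plain Java. This is only an illustration under that assumption, not something run on Hive; the class and method names are made up for the example.

```java
// Sketch only: plain Java standing in for Hive's regexp_replace.
class AnchoredStripDemo {
    // Unanchored: deletes every disallowed character, wherever it occurs.
    static String stripEverywhere(String s) {
        return s.replaceAll("[^a-zA-Z0-9\u00E0-\u00FC ]+", "");
    }

    // Anchored with ^: deletes only the run of disallowed characters
    // at the very start of the string.
    static String stripLeading(String s) {
        return s.replaceAll("^[^a-zA-Z0-9\u00E0-\u00FC ]+", "");
    }

    public static void main(String[] args) {
        // The input starts with a left curly quote and contains '&' in the middle.
        String input = "\u201CPersi & Levon";
        System.out.println(stripEverywhere(input)); // Persi  Levon  (the '&' is gone too)
        System.out.println(stripLeading(input));    // Persi & Levon
    }
}
```

In Hive the anchored form would presumably be the same pattern with ^ added in front of the character class.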

From the Hive documentation:

regexp_replace(string INITIAL_STRING, string PATTERN, string REPLACEMENT)

Returns the string resulting from replacing all substrings in INITIAL_STRING that match the java regular expression syntax defined in PATTERN with instances of REPLACEMENT. For example, regexp_replace("foobar", "oo|ar", "") returns 'fb.' Note that some care is necessary in using predefined character classes: using '\s' as the second argument will match the letter s; '\\s' is necessary to match whitespace, etc.
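
The documented behaviour can be reproduced with Java's own String.replaceAll, which is a reasonable stand-in if we assume Hive hands the pattern to the Java regex engine after unescaping the string literal. The regexpReplace helper below is hypothetical and exists only for this sketch.

```java
// Sketch only: a hypothetical stand-in for Hive's regexp_replace.
class RegexpReplaceDocDemo {
    static String regexpReplace(String initial, String pattern, String replacement) {
        return initial.replaceAll(pattern, replacement);
    }

    public static void main(String[] args) {
        // The example from the documentation.
        System.out.println(regexpReplace("foobar", "oo|ar", ""));       // fb

        // Pattern text "s": what the engine sees once the backslash is lost.
        System.out.println(regexpReplace("has spaces", "s", "_"));      // ha_ _pace_

        // Pattern text "\s" (written "\\s" in Java source): matches whitespace.
        System.out.println(regexpReplace("has spaces", "\\s", "_"));    // has_spaces
    }
}
```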

Reference: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

You should do something like this:

select REGEXP_REPLACE('“Persi és Levon Cnatówóeez', '^[\!-\/\[-\`]+', '')

I don't have Hive available right now to try this code, but the idea should be correct. In the second argument you put what you want to replace, not what you want to keep in your string. In this specific case, the pattern should remove (replace with the empty string '') every consecutive character at the beginning of the line that falls in the ASCII range from ! to /, or in the range from [ to `. Characters outside those two ranges, such as @ or the curly quote “, are not matched and would have to be added to the character class.
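
To check which characters the two ASCII ranges cover, here is a small Java sketch of the same anchored idea (Java rather than Hive, so the string-literal escaping differs and is not verified for Hive). It also shows one possible alternative, an anchored negated class built on the Unicode categories \p{L} and \p{N}, which strips anything that is not a letter or digit, including leading whitespace. The class and method names are invented for illustration.

```java
// Sketch only: plain Java regex, not tested against Hive.
class LeadingRangeDemo {
    // Leading run of characters in the ASCII ranges '!'..'/' and '['..'`'.
    static String stripAsciiRanges(String s) {
        return s.replaceAll("^[!-/\\[-`]+", "");
    }

    // Alternative: leading run of anything that is not a Unicode letter or digit.
    static String stripNonAlphanumeric(String s) {
        return s.replaceAll("^[^\\p{L}\\p{N}]+", "");
    }

    public static void main(String[] args) {
        System.out.println(stripAsciiRanges("-$_Persi"));             // Persi
        System.out.println(stripAsciiRanges("@Persi"));               // @Persi ('@' is outside both ranges)
        System.out.println(stripNonAlphanumeric("\u201C@Persi"));     // Persi
    }
}
```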
