Removing extra zeros from a string

Question

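I want to write a regular expression that removes extra zeros from a string.

REGEXP_REPLACE(REGEXP_REPLACE("Input_String","^0+", ''),'0+$','') fails: if input_string = 120, then output_string = 12 instead of 120.

Here are the expected inputs and outputs: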

120--> 120
12--> 12
120.00--> 120
000329.0--> 329
14.4200--> 14.42
000430--> 430 
0.24000--> 0.24
0.100--> 0.1
1.0--> 1

Answer 1

The simplest way is to use BigDecimal:

String stripped = new BigDecimal(input).stripTrailingZeros().toString();

Edit: this doesn't actually work for 000430: its string representation is 4.3E+2.

You can fix this by making sure the scale is at least zero:

BigDecimal b = new BigDecimal(input).stripTrailingZeros();
if (b.scale() < 0) {
  b = b.setScale(0, RoundingMode.UNNECESSARY);
}
String stripped = b.toString();
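
The same pitfall shows up outside Java. As a hedged illustration, here is a Python analogue using the standard decimal module (a swapped-in equivalent, not the BigDecimal API): normalize() plays the role of stripTrailingZeros() and also yields 4.3E+2 for 000430, so the sketch formats with "f" to force plain notation. The function name strip_zeros_decimal is made up for this example.

```python
from decimal import Decimal


def strip_zeros_decimal(text):
    # normalize() drops trailing zeros but may switch to exponent form
    # (for example 4.3E+2), so format with "f" to force plain notation.
    return format(Decimal(text).normalize(), "f")


if __name__ == "__main__":
    for value in ["120", "120.00", "000329.0", "14.4200", "000430", "0.100"]:
        print(value, "-->", strip_zeros_decimal(value))
```

Parsing through a decimal type also rejects input that is not a number, which a pure string rewrite would silently pass through.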

Answer 2

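Regular expressions aren't always the best tool. In real code I would use Andy's solution. That said, if you really want to do it with a regex, here is one possible way to break it down: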

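Start of the string: ^
As many 0s as possible: 0*
Start of the capture group: (
As many [0-9] as possible: [0-9]*
A literal dot (must be escaped): \\.
As few [0-9] as possible: [0-9]*?
End of the capture group: )
As many 0s as possible: 0*
End of the string: $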

Here is the code. Note: it doesn't handle integers, but they could be handled in a similar way.

Pattern pattern = Pattern.compile("^0*([0-9]*\\.[0-9]*?)0*$");
Matcher matcher = pattern.matcher("010.02010");

if(matcher.find()) {
    System.out.println("group 1 : " + matcher.group(1));
}

Output:

group 1 : 10.0201

As you can see, parsing into a BigDecimal is more readable. Also, using a regex is not necessarily more efficient.
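
For the integer case that the pattern above leaves out, one possible extension is to make the fractional part optional and capture the integer and fraction digits separately. The sketch below is written in Python, whose re syntax matches Java's for these constructs. The name strip_zeros and the choice to return non-matching input unchanged are assumptions made for this example, not part of the original answer.

```python
import re

# Leading zeros, then the integer part (lazy, at least one digit), then an
# optional fraction whose trailing zeros fall outside capture group 2.
_PATTERN = re.compile(r"^0*(\d+?)(?:\.(\d*?)0*)?$")


def strip_zeros(text):
    match = _PATTERN.match(text)
    if match is None:
        # Assumption: leave anything that is not a plain number untouched.
        return text
    integer_part, fraction = match.group(1), match.group(2)
    # Drop the dot when no fraction digits survive (for example "1.0").
    return integer_part + "." + fraction if fraction else integer_part


if __name__ == "__main__":
    for value in ["120", "000430", "000329.0", "0.24000", "010.02010"]:
        print(value, "-->", strip_zeros(value))
```

Because the integer group requires at least one digit, an input such as 0.100 keeps its leading 0 instead of collapsing to .1.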

Answer 3

If you need to do the same in Hive, cast to decimal (adjust to the maximum precision/scale you need):

select cast(str as decimal(30,5)) as fixed_number
from
(--test dataset
select stack(9, 
'120',
'12',
'120.00',
'000329.0',
'14.4200',
'000430',
'0.24000',
'0.100',
'1.0'
) as str
)s;

Result:

OK
120
12
120
329
14.42
430
0.24
0.1
1
Time taken: 0.519 seconds, Fetched: 9 row(s)

Answer 4

Just save this file, delete_ending_zeroes_udf.py, on the Hadoop system with the following contents.

delete_ending_zeroes_udf.py

import re
import sys


def delete_ending_zeroes(x):
    # Strip leading zeros, but keep one digit before any decimal point.
    y = re.sub(r"^0+(?=\d)", "", str(x))
    if '.' in y:
        # Strip trailing zeros of the fraction, then a dangling decimal point.
        y = y.rstrip('0').rstrip('.')
    return y


# Hive streams each input row to stdin as one line.
for line in sys.stdin:
    Input_String = line.strip()
    output_string = delete_ending_zeroes(Input_String)
    # Emit the original and the cleaned value as two tab-separated columns.
    print("\t".join([Input_String, output_string]))

Then run the following in Hive:

add file hdfs:///delete_ending_zeroes_udf.py;
SELECT TRANSFORM (Input_String) USING 'python delete_ending_zeroes_udf.py' AS (Input_String string, output_string string) FROM <your_hive_table>;

Reference: https://acadgild.com/blog/hive-udf-python

Answer 1: 3, 2019-05-17 07:58:33
Answer 2: 1, 2019-05-17 08:24:47
Answer 3: 1, 2019-05-17 08:46:57
Answer 4: 0, 2019-05-17 08:05:47
