
remove extra zeros from string

I would like to write a regular expression to remove extra zeros from a string.

REGEXP_REPLACE(REGEXP_REPLACE("Input_String","^0+", ''),'0+$','') fails for integers that end in zero: if input_string = 120, then output_string = 12 instead of 120.

Below is the expected input vs output:

120--> 120
12--> 12
120.00--> 120
000329.0--> 329
14.4200--> 14.42
000430--> 430 
0.24000--> 0.24
0.100--> 0.1
1.0--> 1

The easiest way is to use BigDecimal:

String stripped = new BigDecimal(input).stripTrailingZeros().toString();

Edit: this doesn't actually work for 000430: the string representation of that is 4.3E+2.

You can fix this by ensuring that the scale is at least zero:

BigDecimal b = new BigDecimal(input).stripTrailingZeros();
if (b.scale() < 0) {
  b = b.setScale(0, RoundingMode.UNNECESSARY);
}
String stripped = b.toString();
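
For anyone who wants to try this against the inputs from the question, here is a minimal self-contained sketch. The class name StripZerosBigDecimal and the method name stripZeros are made up for illustration; the logic is the scale check shown above.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical wrapper class, only for illustration.
class StripZerosBigDecimal {

    // Hypothetical helper: parse, strip trailing zeros, and keep the scale >= 0
    // so that toString() does not fall back to scientific notation.
    static String stripZeros(String input) {
        BigDecimal b = new BigDecimal(input).stripTrailingZeros();
        if (b.scale() < 0) {
            // Raising the scale to 0 never needs rounding, so UNNECESSARY is safe here.
            b = b.setScale(0, RoundingMode.UNNECESSARY);
        }
        return b.toString();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "120", "12", "120.00", "000329.0", "14.4200",
            "000430", "0.24000", "0.100", "1.0"
        };
        for (String s : inputs) {
            System.out.println(s + " --> " + stripZeros(s));
        }
    }
}
```

Calling toPlainString() instead of toString() should also avoid the exponent form, in which case the scale check would not be needed.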

Regular expressions are not always the best tool for this. In real code, I would use Andy's solution (BigDecimal). If you really want to do it with a regex, here is one possible way to decompose it:

  • beginning of the string : ^
  • as many 0s as possible : 0*
  • start capture here : (
  • as many digits as possible : [0-9]*
  • a literal dot (must be escaped) : \. (written \\. inside a Java string)
  • as few digits as possible : [0-9]*?
  • end capture here : )
  • as many 0s as possible : 0*
  • end of the string : $

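Here is the code. Note: it does not handle integers (the pattern requires a decimal point), but they can be handled in a similar way. It also consumes a lone zero before the decimal point (0.24000 gives ".24") and leaves a dangling dot when the fraction is all zeros (120.00 gives "120.").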

Pattern pattern = Pattern.compile("^0*([0-9]*\\.[0-9]*?)0*$");
Matcher matcher = pattern.matcher("010.02010");

if(matcher.find()) {
    System.out.println("group 1 : " + matcher.group(1));
}

Output :

group 1 : 10.0201

As you can see, parsing to a BigDecimal is more readable. Also, using a regex is not necessarily more efficient.
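
If a pure regex approach is still wanted, one possible sketch that also covers integers is to chain three replacements instead of one capture. This is not the pattern from the answer above; the class name StripZerosRegex and the method name stripZeros are invented for the example.

```java
// Hypothetical wrapper class, only for illustration.
class StripZerosRegex {

    // Hypothetical helper built from three String.replaceAll calls.
    static String stripZeros(String input) {
        return input
            // Leading zeros, but only when a digit follows, so "0.24" keeps its 0.
            .replaceAll("^0+(?=\\d)", "")
            // Trailing zeros after a decimal point; group 1 keeps the dot and the
            // significant fraction digits. Integers have no dot and are untouched.
            .replaceAll("(\\.\\d*?)0+$", "$1")
            // A dot left dangling at the end, e.g. "120." becomes "120".
            .replaceAll("\\.$", "");
    }

    public static void main(String[] args) {
        String[] inputs = {
            "120", "12", "120.00", "000329.0", "14.4200",
            "000430", "0.24000", "0.100", "1.0"
        };
        for (String s : inputs) {
            System.out.println(s + " --> " + stripZeros(s));
        }
    }
}
```

The sketch assumes the input is already a plain decimal number without a sign or exponent; anything else passes through largely unchanged rather than raising an error, which is one more reason to prefer BigDecimal when the input is not trusted.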

If you need to do the same in Hive, use cast as decimal (adjust to maximum required precision/scale):

select cast(str as decimal(30,5)) as fixed_number
from
(--test dataset
select stack(9, 
'120',
'12',
'120.00',
'000329.0',
'14.4200',
'000430',
'0.24000',
'0.100',
'1.0'
) as str
)s;

Result:

OK
120
12
120
329
14.42
430
0.24
0.1
1
Time taken: 0.519 seconds, Fetched: 9 row(s)

Save the following file as delete_ending_zeroes_udf.py in HDFS:

delete_ending_zeroes_udf.py

import re
import sys


def delete_ending_zeroes(x):
    """Strip leading zeros and, after a decimal point, trailing zeros."""
    # Drop leading zeros only when a digit follows, so "0.24" keeps its 0.
    y = re.sub(r"^0+(?=\d)", "", x)
    if "." in y:
        # Drop trailing zeros in the fraction, then a dangling dot if any.
        y = y.rstrip("0").rstrip(".")
    return y


# Hive streams one row per line on stdin; write the input and the result
# back as two tab-separated columns.
for line in sys.stdin:
    input_string = line.strip()
    output_string = delete_ending_zeroes(input_string)
    print("\t".join([input_string, output_string]))

Then run the following in Hive. The script emits two tab-separated columns, so both are declared in the AS clause:

add file hdfs:///delete_ending_zeroes_udf.py;

SELECT TRANSFORM (Input_String)
USING 'python delete_ending_zeroes_udf.py'
AS (input_string string, output_string string)
FROM <your_hive_table>;

reference: https://acadgild.com/blog/hive-udf-python
