
Formatting phone numbers stored in scientific notation in Python

I have my phone_number column listed below.

phone_number
--------------
001 1234567890
380 1234567890
 27 1234567890
001 +11234567890
2.56898E+11
1 1234567890
123-456-7890
 +1 (123) 456-7890
(123) 456-7890
NaN

The following step worked fine

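character = r'[^0-9]+'
# regex=True is required on recent pandas versions, where str.replace treats the pattern literally by default
df.phone_number.str.replace(character, '', regex=True)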

The result I got is

phone_number
--------------
11234567890
3.80123E+12
2.71234E+11
11234567890
2.56898E+11
11234567890
1234567890
11234567890
1234567890
NaN

Is there an elegant way to deal with the scientific notation format? I want those values to look like 11234567890, or longer when a country code is present. From there I think I can figure out how to produce both the international and the US phone number formats. Thanks in advance.

You can use conversion to Int64 / string dtypes:

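import pandas as pd

# Values that parse as numbers (e.g. 2.56898E+11) become integer strings; the rest become <NA>
s1 = (pd.to_numeric(df['phone_number'], errors='coerce')
        .astype('Int64').astype('string')
      )

# For every row, strip all non-digit characters
s2 = df['phone_number'].str.replace(r'\D+', '', regex=True)

# Prefer the numeric conversion and fall back to the stripped string
df['phone_number_clean'] = s1.fillna(s2)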

print(df)

Output:

        phone_number phone_number_clean
0     001 1234567890      0011234567890
1     380 1234567890      3801234567890
2      27 1234567890       271234567890
3   001 +11234567890     00111234567890
4        2.56898E+11       256898000000
5       1 1234567890        11234567890
6       123-456-7890         1234567890
7  +1 (123) 456-7890        11234567890
8     (123) 456-7890         1234567890

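Note that, depending on the float precision and on how the number was converted to scientific notation, you might already have lost important digits. Here 2.56898E+11 becomes 256898000000: only the six significant digits kept in the notation survive, and the remaining positions are filled with zeros.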
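
Since the rounded-away digits cannot be recovered, it may help to flag the affected rows so they can be checked against the original data source. The following is only a sketch under assumptions: the sample DataFrame, the `sci_pattern` regex, and the `digits_suspect` column name are made up for illustration and are not part of the answer above.

```python
import pandas as pd

# Hypothetical sample mirroring a few rows from the question.
df = pd.DataFrame({
    "phone_number": [
        "001 1234567890",
        "2.56898E+11",
        "123-456-7890",
        None,
    ]
})

# Rows whose raw text looks like scientific notation, e.g. "2.56898E+11".
sci_pattern = r"^\s*\d+(?:\.\d+)?[eE][+-]?\d+\s*$"
is_sci = df["phone_number"].str.match(sci_pattern, na=False)

# Same cleaning as in the answer: numeric conversion first,
# digit stripping as the fallback for everything else.
s1 = (pd.to_numeric(df["phone_number"], errors="coerce")
        .astype("Int64").astype("string"))
s2 = df["phone_number"].str.replace(r"\D+", "", regex=True)
df["phone_number_clean"] = s1.fillna(s2)

# Mark values whose trailing digits were probably rounded away upstream.
df["digits_suspect"] = is_sci

print(df)
```

Rows with `digits_suspect` set to True have the right number of digits but probably not the right digits, so they are better treated as missing than as valid phone numbers.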


 