
How to replace a non-numeric, non-decimal character in a string in pandas

I have a column of values in degrees, each ending with a degree sign:

42.9377º
42.9368º
42.9359º
42.9259º
42.9341º

I want to replace the degree symbol with the digit 0.

I tried regex and str.replace, but I can't work out the exact Unicode character.

The source xls has it as º

The error message shows it as an obelus (÷).

Printing the DataFrame shows it as ?

The exact position of the degree sign varies with how the decimals are rounded, so I can't replace it by string position.

Use str.replace:

df['a'] = df['a'].str.replace('º', '0')
print(df)
          a
0  42.93770
1  42.93680
2  42.93590
3  42.92590
4  42.93410

# check the hex code point of the character
print("{:02x}".format(ord('º')))
ba

# same replacement, with the character written as an escape sequence
df['a'] = df['a'].str.replace('\xba', '0')
print(df)
          a
0  42.93770
1  42.93680
2  42.93590
3  42.92590
4  42.93410
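
The hex value ba printed above is U+00BA, which Unicode names MASCULINE ORDINAL INDICATOR; the true degree sign is U+00B0, and the two look nearly identical. The following is a minimal sketch, not part of the original answer, that lists the non-numeric characters actually present and replaces either symbol. The `samples` list is hypothetical data, and plain `re` is used so the snippet runs without pandas.

```python
import re
import unicodedata

# Hypothetical values: one ends with º (U+00BA), one with ° (U+00B0)
samples = ["42.9377\u00ba", "42.9368\u00b0"]

# Collect every character that is not a digit or a decimal point
odd_chars = sorted(set("".join(samples)) - set("0123456789."))
for ch in odd_chars:
    # Print the code point and the official Unicode name of each one
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# A character class covering both symbols replaces either one with '0'
pattern = "[\u00ba\u00b0]"
cleaned = [re.sub(pattern, "0", s) for s in samples]
print(cleaned)
```

The same `pattern` should also work on a column via df['a'].str.replace(pattern, '0', regex=True).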

A solution that extracts the float and appends '0':

df['a'] = df['a'].str.extract(r'(\d+\.\d+)', expand=False) + '0'
print(df)
          a
0  42.93770
1  42.93680
2  42.93590
3  42.92590
4  42.93410
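
If the exact symbol is unknown, another option is to replace any character that is neither a digit nor a decimal point. This is a sketch under assumptions rather than the answerer's method: the sample frame is made up, and the values are assumed to be unsigned (a minus sign would also be replaced by this pattern).

```python
import pandas as pd

# Hypothetical data: the last row has no trailing symbol
df = pd.DataFrame({'a': ['42.9377º', '42.9368°', '42.9359']})

# [^\d.] matches any character that is not a digit or '.'
df['a'] = df['a'].str.replace(r'[^\d.]', '0', regex=True)
print(df)
```

Rows that contain only digits and a decimal point are left unchanged.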

Or, if the last character of every value is º, you can use indexing with str:

df['a'] = df['a'].str[:-1] + '0'
print(df)
          a
0  42.93770
1  42.93680
2  42.93590
3  42.92590
4  42.93410

If you know that it's always the last character, you could remove that character and append a "0".

s = "42.9259º"

s = s[:-1]+"0"

print(s) # 42.92590
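
Slicing off the last character would also remove a real digit from any value that has no trailing symbol. A guarded variant might look like the sketch below; `replace_trailing_symbol` is a hypothetical helper name, not something from the answers above.

```python
def replace_trailing_symbol(value: str, fill: str = "0") -> str:
    """Replace the last character with `fill` only if it is not a digit or '.'."""
    if value and value[-1] not in "0123456789.":
        return value[:-1] + fill
    # Empty strings and values that already end in a digit or '.' are returned as-is
    return value


print(replace_trailing_symbol("42.9259º"))
print(replace_trailing_symbol("42.9259"))
```

For a pandas column, the helper could be applied element-wise with df['a'].map(replace_trailing_symbol).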
