
Remove all occurrences of dot, except for the first occurrence, in the elements of a list

I have a list that I need to convert to floats. Since the data was not entered by me, some elements have an accidental extra period, for example 39.04.1450. I need to automatically remove all of the periods except the first one, so that I don't get a ValueError when I call float() on each element.

Sample list:

latitude= [' -86.57', ' 39.04.1450', ' 37.819' ,' 45.82', ' 54.42', ' 0.' ,' 53.330444',
  ' +45.75' ,' 52.36', ' 43.2167', ' -36.75', ' 6.8N' ,' 40.833' ,' -97.981',
  ' 41.720', ' 41.720', ' 37.41' ,' 37.41' ,' 37.41', ' 37.41']

As you can see, latitude[1] has an extra decimal point. Of course, I will also need to strip the N from 6.8N, but that is a separate problem.
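To see which entries will break float() before fixing anything, you can try each one and record the failures. A minimal sketch using the sample list above; find_bad_entries is a hypothetical helper name, not something from the question:

```python
latitude = [' -86.57', ' 39.04.1450', ' 37.819', ' 45.82', ' 54.42', ' 0.', ' 53.330444',
            ' +45.75', ' 52.36', ' 43.2167', ' -36.75', ' 6.8N', ' 40.833', ' -97.981',
            ' 41.720', ' 41.720', ' 37.41', ' 37.41', ' 37.41', ' 37.41']

def find_bad_entries(values):
    """Return (index, value) pairs that float() cannot parse."""
    bad = []
    for i, s in enumerate(values):
        try:
            float(s)  # float() tolerates surrounding spaces and a leading '+'
        except ValueError:
            bad.append((i, s))
    return bad

print(find_bad_entries(latitude))
# [(1, ' 39.04.1450'), (11, ' 6.8N')]
```

Only the doubled-dot value and the N-suffixed value fail; float() already accepts the leading spaces, the explicit + sign and the trailing dot in ' 0.'.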

I would do it like this:

def fix_float(s):
    return s.replace('.', '[DOT]', 1).replace('.', '').replace('[DOT]', '.')

The function replaces the first occurrence of '.' with '[DOT]'. Then it removes all remaining occurrences of '.'. Finally, it turns '[DOT]' back into '.'.

To apply it to all the elements of your list, write:

fixed_latitudes = [fix_float(s) for s in latitude]
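The '[DOT]' placeholder works as long as that text never appears in the data. A sketch of the same idea without a placeholder uses str.partition, which splits at the first '.' only (fix_float_partition is a hypothetical name):

```python
def fix_float_partition(s):
    # head = text before the first '.', sep = '.' (or '' if there is none), tail = the rest
    head, sep, tail = s.partition('.')
    # keep the first dot, drop every later one
    return head + sep + tail.replace('.', '')

print(fix_float_partition('3.4..5'))              # 3.45
print(fix_float_partition('345'))                 # 345
print(float(fix_float_partition(' 39.04.1450')))  # 39.04145
```

Like fix_float, this leaves a suffix such as the N in 6.8N untouched, so float() would still reject that value.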
def my_float(s):
    s = s.split(".")
    return float(".".join([s[0], "".join(s[1:])]))

This splits on . and rejoins the pieces, putting back only the first period. It does not, however, do anything about 6.8N.

You can use regular expressions:

import re  

pattern = re.compile(r'(\d+\.\d+)\.')
new_lst = [re.sub(pattern, r'\1', i).replace('N', '') for i in latitude]

\d means any digit, + means one or more, and \. matches a literal dot. The parentheses capture that part of the match, which sub() then refers to as \1 (the first capturing group).
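This pattern removes only one surplus dot per match, so '3.4.5.6' becomes '3.45.6'. If more than two dots can occur, one option is to pass a replacement function to re.sub that keeps the first matched dot and deletes the rest. A sketch; drop_extra_dots is a hypothetical name:

```python
import re

def drop_extra_dots(s):
    """Keep the first '.' in s and delete every later one."""
    count = 0

    def repl(match):
        nonlocal count
        count += 1
        # the first dot is returned unchanged, later dots become ''
        return match.group(0) if count == 1 else ''

    return re.sub(r'\.', repl, s)

print(drop_extra_dots('3.4.5.6'))  # 3.456
print(drop_extra_dots('345'))      # 345
```

The counter lives inside each call, so every string starts fresh.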

A small hack, assuming your corrupted data only ever has a trailing N or more than one '.'. Otherwise you'll have to add more except clauses.

latitude = [' -86.57', ' 39.04.1450', ' 37.819', ' 45.82', ' 54.42', ' 0.', ' 53.330444', ' +45.75', ' 52.36', ' 43.2167', ' -36.75', ' 6.8N', ' 40.833', ' -97.981', ' 41.720', ' 41.720', ' 37.41', ' 37.41', ' 37.41', ' 37.41']
flist = []
for i in latitude:
    try:
        flist.append(float(i))
    except ValueError:
        if (i[-1] == 'N'):
            flist.append(float(i[:-1]))
        else:
            flist.append(float("{}.{}".format(i.split(".")[0],''.join(i.split(".")[1:]))))

print (flist)

Output

[-86.57, 39.04145, 37.819, 45.82, 54.42, 0.0, 53.330444, 45.75, 52.36, 43.2167, -36.75, 6.8, 40.833, -97.981, 41.72, 41.72, 37.41, 37.41, 37.41, 37.41]
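Instead of stacking more except clauses, another option is to try a fixed sequence of cleanup steps, each building on the previous one, and set aside anything that still fails. A sketch under the assumption that only a trailing N/n and extra dots need handling; the helper names are hypothetical:

```python
def strip_north(s):
    # assumption: the only suffix is N/n, as in the sample data
    return s.strip().rstrip('Nn')

def keep_first_dot(s):
    # keep everything up to the first '.', drop all later dots
    head, sep, tail = s.partition('.')
    return head + sep + tail.replace('.', '')

def to_floats(values, cleanups=(strip_north, keep_first_dot)):
    """Return (results, failed): results holds a float or None per input,
    failed lists (index, original) for inputs no cleanup could rescue."""
    results, failed = [], []
    for i, original in enumerate(values):
        # candidates: the original, then each cleanup applied on top of the last one
        candidates = [original]
        for clean in cleanups:
            candidates.append(clean(candidates[-1]))
        for s in candidates:
            try:
                results.append(float(s))
                break
            except ValueError:
                continue
        else:
            # no candidate could be parsed
            results.append(None)
            failed.append((i, original))
    return results, failed

print(to_floats([' 39.04.1450', ' 6.8N', ' 12.3.4N', 'n/a']))
# ([39.04145, 6.8, 12.34, None], [(3, 'n/a')])
```

Returning the failures keeps the loop from crashing on an unexpected format, so you can inspect those values and decide on another cleanup step.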

You can use a regular expression to extract the numbers from the list and convert them to floats right away.

import re
lat = lambda l: float(re.search(r'[+-]*\d*\.\d*', l).group(0))
print(list(map(lat, latitude)))

Edit:
Sorry, I hadn't noticed that the digits following the second decimal point are also valid. The new solution still assumes the first dot is correct and removes all the others.

One of the values contains N, so I suppose there might also be S, meaning southern, i.e. a negative latitude. I have built this assumption into the code.

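import re

def valid_lat(s):
    # leading part: optional sign, digits, the first dot, digits
    a = re.match(r'\s*[+-]*\d*\.\d*', s).group(0)
    # everything after the matched prefix
    b = s[len(a):]
    d = b.replace('.', '')
    c = re.sub('[nNsS]$', '', d)
    sign = 1.
    if re.search('[sS]$', d):
        sign = -1.
    return float(a + c) * sign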

Then just map it:
list(map(valid_lat, latitude))
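One caveat if you adapt this approach: str.lstrip(a) removes any leading characters that appear anywhere in a, not the prefix a as a whole, so it can also eat digits that belong after the second dot. A small demonstration, with slicing by len() (or str.removeprefix on Python 3.9+) as the prefix-safe alternative:

```python
s = '3.4.3'
a = '3.4'  # the prefix a regex such as r'\s*[+-]*\d*\.\d*' would match

print(repr(s.lstrip(a)))        # '' : every character of s is in the set {'3', '.', '4'}
print(repr(s[len(a):]))         # '.3' : only the matched prefix is removed
print(repr(s.removeprefix(a)))  # '.3' (str.removeprefix needs Python 3.9+)
```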

What about this one?

def lol_float(_str):
    # count the digits after the first decimal point: '3.45' -> 2
    dpi = sum(ch.isdigit() for ch in _str[_str.index('.') + 1:]) if '.' in _str else 0
    # keep only the digits: '3.45' -> 345.0
    float_as_int = float(''.join(ch for ch in _str if ch.isdigit()))
    # restore a leading minus sign, which the digit filter drops
    sign = -1 if _str.strip().startswith('-') else 1
    # dpi = 2, float_as_int = 345.0 -> 3.45
    return sign * float_as_int / (10 ** dpi)

Output:

>>> lol_float('3.34')
3.34
>>> lol_float('3.45')
3.45
>>> lol_float('345')
345.0
>>> lol_float('34.5')
34.5
>>> lol_float('3.4.5')
3.45
>>> lol_float('3.4..5')
3.45
>>> lol_float('3.4..5.4')
3.454

Just being original... :)
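The same "digits plus position of the decimal point" idea maps directly onto decimal.Decimal, whose constructor accepts a (sign, digits, exponent) tuple. A sketch that also keeps a leading minus sign; lol_decimal is a hypothetical name, and it assumes the string contains at least one digit:

```python
from decimal import Decimal

def lol_decimal(_str):
    s = _str.strip()
    sign = 1 if s.startswith('-') else 0  # Decimal uses 1 for negative
    digits = tuple(int(ch) for ch in s if ch.isdigit())
    # number of digits after the first '.', used as a negative exponent
    frac = sum(ch.isdigit() for ch in s[s.index('.') + 1:]) if '.' in s else 0
    return Decimal((sign, digits, -frac))

print(lol_decimal('3.4..5.4'))        # 3.454
print(float(lol_decimal(' -86.57')))  # -86.57
print(float(lol_decimal(' 6.8N')))    # 6.8
```

Because the result is a Decimal, the value stays exact until you convert it with float().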

You can remove trailing letters using str.rstrip:

from string import ascii_letters

out = []
for x in latitude:
    x = x.rstrip(ascii_letters)
    spl = x.split(".")
    if len(spl) > 2:
        out.append(float("{}.{}".format(spl[0],"".join(spl[1:]))))
    else:
        out.append(float(x))
print(out)

[-86.57, 39.04145, 37.819, 45.82, 54.42, 0.0, 53.330444, 45.75, 52.36, 43.2167, -36.75, 6.8, 40.833, -97.981, 41.72, 41.72, 37.41, 37.41, 37.41, 37.41]

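You can also do it in a single list comprehension, though less efficiently. Reversing the string lets replace() remove every dot except the last one in the reversed string, which is the first one in the original:

print([float(x.rstrip(ascii_letters)[::-1].replace(".", "", x.count(".") - 1)[::-1]) if x.count(".") > 1 else float(x.rstrip(ascii_letters)) for x in latitude])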

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 