In Python, removing thousands comma from numbers in a list where the numbers are separated by commas

I have a list of data similar to that below:

a = ['"105', '424"', '"102', '629"', '"104', '307"']

I want this data to be in a form similar to that of below:

a = ['105424', '102629', '104307']

I am unsure how to proceed. I thought about removing all the commas, reinserting commas only where they belong, and then removing the quotation marks, but I am finding this quite challenging.

I'm assuming this data was originally in a CSV file, where fields that contain commas are quoted ("105,424","102,629","104,307"), and that you are then splitting on commas:

>>> '"105,424","102,629","104,307"'.split(',')
['"105', '424"', '"102', '629"', '"104', '307"']

Instead, let the csv module do the work, since it handles the double quotes:

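import csv

# Python 3: open in text mode with newline='' as the csv module expects
with open('u:\\foobar.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        # each field arrives unquoted, e.g. '105,424'; drop the thousands commas
        print([x.replace(',', '') for x in row])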

This prints: ['105424', '102629', '104307']
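To try the same idea without a file on disk, here is a minimal sketch that feeds the quoted line to csv.reader through io.StringIO. The sample string and the helper name strip_thousands are assumptions made for illustration; they are not part of the original answer.

```python
import csv
import io

# Sample text standing in for the file contents (assumed for illustration).
sample = '"105,424","102,629","104,307"\n'

def strip_thousands(text):
    """Parse CSV text and remove the thousands commas from every field."""
    reader = csv.reader(io.StringIO(text))
    return [[field.replace(',', '') for field in row] for row in reader]

rows = strip_thousands(sample)
print(rows)  # one inner list per CSV line
```

Because csv.reader treats the quoted commas as part of the field, each value arrives whole and only the replace step is needed.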

If the source data is CSV, you should use @steven's answer.

Regardless, here's how you could process what you pasted.

As @troutwine stated, this will only work if the number parts are always in pairs.

a = ['"105', '424"', '"102', '629"', '"104', '307"']

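def pairwise(iterable):
    "s -> (s0,s1), (s2,s3), (s4, s5), ..."
    it = iter(iterable)
    # zipping one iterator with itself yields consecutive, non-overlapping pairs
    # (Python 3's zip is lazy, so itertools.izip is no longer needed)
    return zip(it, it)

result = []

for x, y in pairwise(a):
    # join the two halves, then strip the surrounding double quotes
    result.append(''.join([x, y]).strip('"'))

print(result)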

Gives:

['105424', '102629', '104307']

Pairwise snippet from here: Iterating over every two elements in a list

If you'll never have an unmatched pair, loop over the list two items at a time: join the item at the current index with the next one, strip the quotes with a string substitution, and advance the index by two.
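A minimal sketch of that index-based loop, assuming the list always holds complete pairs. The function name join_pairs and the length check are illustrative additions, not something from the answer above.

```python
def join_pairs(parts):
    """Join items 0+1, 2+3, ... and remove the double quotes."""
    if len(parts) % 2:
        # an odd length means a half is missing (assumed to be an error here)
        raise ValueError("expected an even number of items")
    result = []
    # step through the list two indices at a time
    for i in range(0, len(parts), 2):
        result.append((parts[i] + parts[i + 1]).replace('"', ''))
    return result

a = ['"105', '424"', '"102', '629"', '"104', '307"']
result = join_pairs(a)
print(result)
```

Like the pairwise version, this breaks down for values that were split into one or three pieces, such as "123" or "1,234,567".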

Does your data look something like:

"123", "123,456", "123,456,789"

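If so, then try this:

import re

# named text rather than input, to avoid shadowing the built-in
text = '"123", "123,456", "123,456,789"'

# raw string, so the backslashes reach the regex engine unchanged
reg = re.compile(r'"(\d{1,3}(,\d{3})*)"')

# findall returns one (whole number, last comma group) tuple per match
stringValues = [wholematch.replace(',', '')
                for wholematch, _endmatch in reg.findall(text)]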

This variant of the regex should also match thousands-separated numbers with decimal places:

re.compile(r'"(\d{1,3}(,\d{3})*(\.\d*)?)"')
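One thing to watch: the decimal pattern has three capturing groups, so findall returns 3-tuples and a comprehension that unpacks two names per match would fail. A minimal sketch of one way around that, using non-capturing groups so that findall returns only the whole number (the sample text is an assumption for illustration):

```python
import re

# (?:...) groups do not capture, so findall yields just the outer group
pattern = re.compile(r'"(\d{1,3}(?:,\d{3})*(?:\.\d*)?)"')

# sample data, assumed for illustration
text = '"123", "123,456.78", "1,234,567.5"'

values = [match.replace(',', '') for match in pattern.findall(text)]
print(values)
```

The results are still strings; convert them with int() or float() if numeric values are needed.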

Reduce to the rescue:

from functools import reduce  # reduce is no longer a built-in in Python 3

l = ['"105', '424"', '"102', '629"', '"104', '307"', '"123', '456', '789"', '"123"']

# Concatenate everything and split by ", get non-empties
l2 = [num for num in reduce(lambda x, y: x+y, l).split('"') if num != '']

# Output:
# ['105424', '102629', '104307', '123456789', '123']
print(l2)

A few caveats, though: this code handles numbers beyond the thousands (e.g., 1,457,664), but it assumes that every whole number was wrapped in double quotes.

As others have said though, you should revisit your data retrieval as there are most likely ways to get the values correctly without dealing with the double-quotes. This was a fun little challenge nonetheless.
