
How can I remove all extra characters from a list of strings to convert them to ints?

Hi, I'm pretty new to programming and Python, and this is my first post, so I apologize for any poor form.

I am scraping a website's download counts, and I get the following error when I try to convert the list of numeric strings to integers so I can sum them:

    ValueError: invalid literal for int() with base 10: '1,015'
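To reproduce the error on its own: int() parses base-10 digit strings, and it does not accept a comma as a thousands separator. Here is a minimal check, with `message` and `value` as illustrative names that do not come from the question:

```python
# int() rejects the comma used as a thousands separator.
try:
    int("1,015")
except ValueError as err:
    message = str(err)
    print(message)  # invalid literal for int() with base 10: '1,015'

# Once the comma is removed, the conversion succeeds.
value = int("1,015".replace(",", ""))
print(value)  # 1015
```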

I have tried .replace(), but it does not seem to do anything.

I also tried building an if statement that takes the commas out of any string containing them, based on the question "Does Python have a string 'contains' substring method?"

Here's my code:

    downloadCount = pageHTML.xpath('//li[@class="download"]/text()')
    downloadCount_clean = []

    for download in downloadCount:
        downloadCount_clean.append(str.strip(download))

    for item in downloadCount_clean:
        if "," in item:
            item.replace(",", "")
    print(downloadCount_clean)

    downloadCount_clean = map(int, downloadCount_clean)
    total = sum(downloadCount_clean)

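Strings are immutable in Python. When you call item.replace(",", ""), the method returns a new string without the commas, but that result is not stored anywhere (in particular, not in item). Writing item = item.replace(",", "") would not help either, because item is only the loop variable: rebinding it does not change the element stored in downloadCount_clean.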
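Both points are easy to check in an interpreter. In the snippet below, `s`, `cleaned` and `items` are illustrative names and are not taken from the question:

```python
s = "1,015"
cleaned = s.replace(",", "")  # replace() returns a new string...
print(s, cleaned)             # ...and leaves s untouched: 1,015 1015

# The same holds inside a for loop: rebinding the loop variable
# does not write anything back into the list.
items = ["1,015", "42"]
for item in items:
    item = item.replace(",", "")
print(items)  # ['1,015', '42']
```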

EDIT:

I suggest this:

    for i in range(len(downloadCount_clean)):
        if "," in downloadCount_clean[i]:
            downloadCount_clean[i] = downloadCount_clean[i].replace(",", "")

SECOND EDIT:

For a bit more simplicity and elegance:

    for index, value in enumerate(downloadCount_clean):
        downloadCount_clean[index] = int(value.replace(",", ""))
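The strip, replace, convert and sum steps from the question can also be folded into one generator expression. This is a sketch that assumes the scraped values look like the hypothetical `downloadCount` list below (digits with comma separators and stray whitespace). It is not the page's real data:

```python
# Hypothetical sample standing in for the xpath text() results in the question.
downloadCount = ["\n  1,015 ", "87", " 12,340\n"]

# Strip whitespace, drop the thousands separators, convert to int and add up in one pass.
total = sum(int(s.strip().replace(",", "")) for s in downloadCount)
print(total)  # 13442
```

int() already ignores surrounding whitespace, so the strip() call is kept only to mirror the question's code.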

For simplicity's sake:

    >>> aList = ["abc", "42", "1,423", "def"]
    >>> bList = []
    >>> for i in aList:
    ...     bList.append(i.replace(',', ''))
    ...
    >>> bList
    ['abc', '42', '1423', 'def']
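The same loop-and-append pattern can be written as a list comprehension, which builds the new list in a single expression:

```python
aList = ["abc", "42", "1,423", "def"]

# Build the cleaned list in one expression instead of calling append() in a loop.
bList = [s.replace(",", "") for s in aList]
print(bList)  # ['abc', '42', '1423', 'def']
```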

Or, working with just a single list:

    >>> aList = ["abc", "42", "1,423", "def"]
    >>> for i, x in enumerate(aList):
    ...     aList[i] = x.replace(',', '')
    ...
    >>> aList
    ['abc', '42', '1423', 'def']

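Not sure if this one breaks any Python rules :) As far as iteration goes, assigning to existing indices inside an enumerate() loop is safe, since the list's length never changes.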
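Neither version makes "abc" or "def" convertible, so calling int() on every entry would still raise ValueError. Here is a sketch that keeps only all-digit entries before converting. It assumes non-numeric strings should simply be skipped, and `cleaned` and `numbers` are illustrative names:

```python
aList = ["abc", "42", "1,423", "def"]

# Remove the separators first, then keep only non-empty all-digit entries,
# so int() is never handed something like "abc".
cleaned = [s.replace(",", "") for s in aList]
numbers = [int(s) for s in cleaned if s.isdigit()]
print(numbers)       # [42, 1423]
print(sum(numbers))  # 1465
```

str.isdigit() rejects negative numbers and accepts some Unicode digit characters (such as "²") that int() cannot parse. This filter is therefore only suitable for plain download counts like the ones in the question.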


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM