
Removing quotation marks and converting to floats

I have a text file that is formatted like this:

DEPT    FTR RPT_PERIOD
Project Management  "68,760.23" 12-Month
Project Management  "142,483.33"    12-Month
AEC Administration  "37,175.06" 12-Month

My goal is to extract the salaries in the quotation marks (under the FTR column), add them all up, and find the average for each department. When I append the salaries to a list, however, they are all strings, and I can't remove the quotation marks to convert them into floats. This is what I have so far; I'm working on the code step by step:

salary_file = open("salaries.txt", "r")
headers = salary_file.readline()

salaries = [] 

for line in salary_file.readlines():
  line.rstrip()
  (dept, ftr, rpt_period) = line.split('\t')
  salaries.append(ftr)

print salaries
#Sample output: ['"68,760.23"', '"142,483.33"', '"37,175.06"']

What can I do to remove the " " quotation marks so that I can convert them to floats using map?

You cannot convert them to floats directly because:

  1. Each value is wrapped in literal double quotation marks.
  2. float() does not understand the , thousands separator in the number.

So, remove the extra quotation marks and the commas:

>>> salaries = ['"68,760.23"', '"142,483.33"', '"37,175.06"']
>>> [float("".join(x.replace('"', '').split(","))) for x in salaries]
[68760.23, 142483.33, 37175.06]
>>> 

But maybe you should handle this when you're appending to the list:

salaries = []
with open("salaries.txt", "r") as salary_file:
    for line in salary_file:
        dept, ftr, rpt_period = line.rstrip().split("\t")
        try:
            # Strip the quotes and the thousands separators before converting.
            salaries.append(float(ftr.replace('"', '').replace(",", "")))
        except ValueError:
            # Can't convert to float, perhaps it's a comment or the header.
            pass

Be careful: this only works if the file is actually tab-delimited.
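If the file really is tab-delimited, the standard csv module can handle the quotes for you, since csv.reader removes the surrounding " characters by default. The following is only a sketch, written for Python 3: the io.StringIO object is a stand-in for open("salaries.txt") so the example is self-contained, and its rows are the sample rows from the question with tabs assumed between the columns.

```python
import csv
import io

# Stand-in for open("salaries.txt"); tabs between columns are an assumption.
sample = io.StringIO(
    'DEPT\tFTR\tRPT_PERIOD\n'
    'Project Management\t"68,760.23"\t12-Month\n'
    'Project Management\t"142,483.33"\t12-Month\n'
    'AEC Administration\t"37,175.06"\t12-Month\n'
)

reader = csv.reader(sample, delimiter="\t")
next(reader)  # skip the header row

salaries = []
for dept, ftr, rpt_period in reader:
    # csv.reader has already removed the surrounding quotes,
    # so only the thousands separators are left to strip.
    salaries.append(float(ftr.replace(",", "")))
```

With a real file you would pass the open file object to csv.reader instead of the StringIO stand-in.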

You can take everything but the first and last character of each string and then convert the result to a float:

new_salaries = []
for i in salaries:
    i = i.replace(",", "")
    new_salaries.append(float(i[1:-1]))
print new_salaries
del salaries

You can either do i[1:-1] or you can do i.replace('"', '')

If your string is some_string = "abcdefg" , then some_string[1:-1] will return "bcdef"

The i[1:-1] part gets the string from the second character (since indexing starts at 0) to the second-to-last character. You then convert it to a float and add it to your new list. You can then delete your old list.
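One thing to keep in mind with slicing is that i[1:-1] drops the first and last character whether or not they are quotes. The sketch below compares it with str.strip('"'), which only removes quote characters that are actually there. The helper name to_float and the unquoted sample value are made up for illustration.

```python
def to_float(text):
    # strip('"') removes leading/trailing quotes only if they are present.
    return float(text.strip('"').replace(",", ""))

quoted = '"68,760.23"'
unquoted = '68,760.23'   # hypothetical value that arrived without quotes

safe_quoted = to_float(quoted)
safe_unquoted = to_float(unquoted)

# Slicing blindly chops real digits off the unquoted value.
sliced_unquoted = float(unquoted.replace(",", "")[1:-1])
```

Here safe_quoted and safe_unquoted are both 68760.23, while sliced_unquoted silently becomes 8760.2, so slicing is only safe if every value is known to be quoted.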

>>> salaries = ['"68,760.23"', '"142,483.33"', '"37,175.06"']
>>> s = [ele.replace('"', "") for ele in salaries]
>>> s
['68,760.23', '142,483.33', '37,175.06']
>>> [float(ele.replace(",", "")) for ele in s]
[68760.23, 142483.33, 37175.06]
>>> 

Your issue is that the string contains a comma, so you need to remove it before converting. A one-liner I can think of is:

float(x.split('"')[1].replace(',',''))

Use this expression where you append to the list (with ftr in place of x).

Just change the append in your code like this and you'll get a list of floats:

salary_file = open("salaries.txt", "r")
headers = salary_file.readline()

salaries = [] 

for line in salary_file.readlines():
    line = line.rstrip()
    (dept, ftr, rpt_period) = line.split('\t')
    # Remove the quotes and commas, then convert the cleaned string to a float.
    salaries.append(float(ftr.replace('"', '').replace(',', '')))

print salaries
#Sample output: [68760.23, 142483.33, 37175.06]

Or if you just want to remove the " and , so that you can use map() , see @msvalkon's answer.
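Since the question mentions map() and the end goal of an average per department, here is a rough sketch of how the pieces might fit together. The names parse_salary, rows, by_dept and averages are made up for illustration, and the rows are hard-coded from the sample in the question rather than read from the file.

```python
from collections import defaultdict

def parse_salary(text):
    # Drop the quotes and thousands separators, then convert.
    return float(text.replace('"', '').replace(',', ''))

# (dept, ftr) pairs taken from the sample rows in the question.
rows = [
    ("Project Management", '"68,760.23"'),
    ("Project Management", '"142,483.33"'),
    ("AEC Administration", '"37,175.06"'),
]

# map() applies the conversion to every raw string.
salaries = list(map(parse_salary, [ftr for _, ftr in rows]))

# Group the converted values by department, then average each group.
by_dept = defaultdict(list)
for dept, ftr in rows:
    by_dept[dept].append(parse_salary(ftr))

averages = {dept: sum(values) / len(values) for dept, values in by_dept.items()}
```

The list() around map() is only needed on Python 3, where map() returns an iterator; on Python 2 it already returns a list.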

