Removing quotation marks and converting to floats

Question

I have a text file that is formatted like this:

DEPT    FTR RPT_PERIOD
Project Management  "68,760.23" 12-Month
Project Management  "142,483.33"    12-Month
AEC Administration  "37,175.06" 12-Month

My goal is to extract the salaries in quotation marks (under the FTR column), add them up, and find the average for each department. When I append the salaries to a list, however, they are all strings, and I can't remove the quotation marks to convert them into floats. This is what I have so far (I'm working on the code step by step):

salary_file = open("salaries.txt", "r")
headers = salary_file.readline()

salaries = [] 

for line in salary_file.readlines():
  line = line.rstrip()
  (dept, ftr, rpt_period) = line.split('\t')
  salaries.append(ftr)

print salaries
#Sample output: ['"68,760.23"', '"142,483.33"', '"37,175.06"']

How can I remove the quotation marks so that I can convert the values to floats using map()?

Answer 1

You cannot convert them to floats directly because:

There are double quotation marks around each number.
float() does not understand the , (thousands separator) in the number.

So, remove both the extra quotation marks and the commas:

>>> salaries = ['"68,760.23"', '"142,483.33"', '"37,175.06"']
>>> [float("".join(x.replace('"', '').split(","))) for x in salaries]
[68760.23, 142483.33, 37175.06]
>>>
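If you are on Python 3, str.translate can delete both characters in a single pass. This is a minimal sketch, not part of the original answer; to_float and DELETE_QUOTES_AND_COMMAS are illustrative names:

```python
# Python 3 only: the three-argument form of str.maketrans builds a table
# that maps every character in the third argument to None (i.e. deletes it).
DELETE_QUOTES_AND_COMMAS = str.maketrans("", "", '",')


def to_float(raw):
    """Convert a string like '"68,760.23"' to the float 68760.23."""
    return float(raw.translate(DELETE_QUOTES_AND_COMMAS))


salaries = ['"68,760.23"', '"142,483.33"', '"37,175.06"']
floats = [to_float(x) for x in salaries]
print(floats)  # [68760.23, 142483.33, 37175.06]
```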

But it may be better to handle this when you append to the list:

salaries = []
with open("salaries.txt", "r") as salary_file:
    for line in salary_file:
        dept, ftr, rpt_period = line.rstrip().split("\t")
        try:
            # Strip the surrounding quotes, then drop the thousands separators.
            salaries.append(float("".join(ftr.strip('"').split(","))))
        except ValueError:
            # Can't convert to float; perhaps it's a comment or the header.
            pass

Be careful: this only works if the file is actually tab-delimited.
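Assuming the file really is tab-delimited, the standard library's csv module can remove the quotes for you: with delimiter="\t" and the default quotechar of '"', the FTR field comes back as 68,760.23, so only the commas remain. The Python 3 sketch below also carries out the question's stated goal of averaging per department. average_by_department and the inline SAMPLE data are illustrative; with the real file you would pass open("salaries.txt", newline="") instead of io.StringIO.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample mirroring the question; the real file is assumed to use tabs.
SAMPLE = (
    'DEPT\tFTR\tRPT_PERIOD\n'
    'Project Management\t"68,760.23"\t12-Month\n'
    'Project Management\t"142,483.33"\t12-Month\n'
    'AEC Administration\t"37,175.06"\t12-Month\n'
)


def average_by_department(f):
    """Return {department: average salary} from a tab-delimited file object."""
    reader = csv.reader(f, delimiter="\t")  # default quotechar='"' strips the quotes
    next(reader)  # skip the header row
    totals = defaultdict(float)
    counts = defaultdict(int)
    for dept, ftr, rpt_period in reader:
        totals[dept] += float(ftr.replace(",", ""))  # only the commas are left to remove
        counts[dept] += 1
    return {dept: totals[dept] / counts[dept] for dept in totals}


averages = average_by_department(io.StringIO(SAMPLE))
print(averages)
```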

Answer 2

You can take everything except the first and last characters of each string (the quotation marks), remove the commas, and then convert the result to a float:

new_salaries = []
for i in salaries:
    i = i.replace(",", "")
    new_salaries.append(float(i[1:-1]))
print new_salaries
del salaries

You can use either i[1:-1] or i.replace('"', '').

If your string is some_string = "abcdefg", then some_string[1:-1] returns "bcdef".

The i[1:-1] slice takes the string from the second character (indexing starts at 0) up to, but not including, the last character. You then convert it to a float and append it to your new list. Afterwards you can delete the old list.
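One caveat with i[1:-1]: it removes the first and last characters whether or not they are quotes, so an unquoted value is silently truncated rather than rejected. str.strip('"') only removes quotes that are actually there. A small comparison; the two helper names are hypothetical:

```python
def parse_with_slice(s):
    # Drops the first and last characters unconditionally.
    return float(s.replace(",", "")[1:-1])


def parse_with_strip(s):
    # Removes double quotes only if they are present at either end.
    return float(s.strip('"').replace(",", ""))


print(parse_with_slice('"68,760.23"'))  # 68760.23
print(parse_with_strip('"68,760.23"'))  # 68760.23
print(parse_with_strip('68,760.23'))    # 68760.23
print(parse_with_slice('68,760.23'))    # 8760.2 (silently wrong: a digit was dropped)
```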

Answer 3

>>> salaries = ['"68,760.23"', '"142,483.33"', '"37,175.06"']
>>> s = [ele.replace('"', "") for ele in salaries]
>>> s
['68,760.23', '142,483.33', '37,175.06']
>>> [float(ele.replace(",", "")) for ele in s]
[68760.23, 142483.33, 37175.06]
>>>
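Since the question asks specifically about map, the same two replacements can go into a small helper that is chained with float. A sketch; clean is a made-up name:

```python
salaries = ['"68,760.23"', '"142,483.33"', '"37,175.06"']


def clean(s):
    """Remove the surrounding quotes and the thousands separators."""
    return s.replace('"', "").replace(",", "")


# map returns a list in Python 2 but a lazy iterator in Python 3, hence list().
floats = list(map(float, map(clean, salaries)))
print(floats)  # [68760.23, 142483.33, 37175.06]
```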

Answer 4

Your issue is that the string contains a comma (as well as the quotation marks), so you need to remove them before converting. A one-liner I can think of is:

float(x.split('"')[1].replace(',',''))

Use this line at the appropriate place in your loop.
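To see why index 1 is used: splitting on the quote character leaves an empty string before the opening quote and after the closing one. A sketch wrapping the one-liner in a hypothetical parse_quoted helper; note that it assumes the value is quoted, since an unquoted string splits into a single element and [1] raises IndexError.

```python
def parse_quoted(x):
    # '"68,760.23"'.split('"') gives ['', '68,760.23', ''], so index 1 is the number.
    # Assumes x is quoted; otherwise split returns one element and [1] raises IndexError.
    return float(x.split('"')[1].replace(",", ""))


print('"68,760.23"'.split('"'))    # ['', '68,760.23', '']
print(parse_quoted('"68,760.23"'))  # 68760.23
```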

Answer 5

Just change the append in your code like this and you'll get a list of floats:

salary_file = open("salaries.txt", "r")
headers = salary_file.readline()

salaries = [] 

for line in salary_file.readlines():
    line = line.rstrip()
    (dept, ftr, rpt_period) = line.split('\t')
    salaries.append(float(ftr.replace('"', '').replace(',', '')))

print salaries
#Sample output: [68760.23, 142483.33, 37175.06]

Or, if you just want to remove the " and , characters so that you can use map(), see @msvalkon's answer.

5 answers

solution1 1 ACCEPTED 2014-03-21 09:00:53

solution2 0 2014-03-21 08:56:15

solution3 0 2014-03-21 09:08:42

solution4 0 2014-03-21 09:09:14

solution5 0 2014-03-21 09:11:03
