I have a text file that is formatted like this:
DEPT FTR RPT_PERIOD
Project Management "68,760.23" 12-Month
Project Management "142,483.33" 12-Month
AEC Administration "37,175.06" 12-Month
My goal is to extract the salaries in the quotation marks (under the FTR column), add them up, and find the average for each department. When I append the salaries to a list, however, they are all strings, and I can't remove the quotation marks to convert them into floats. This is what I have so far; I'm working on the code step by step:
salary_file = open("salaries.txt", "r")
headers = salary_file.readline()
salaries = []
for line in salary_file.readlines():
    line.rstrip()
    (dept, ftr, rpt_period) = line.split('\t')
    salaries.append(ftr)
print salaries
#Sample output: ['"68,760.23"', '"142,483.33"', '"37,175.06"']
What can I do to remove the " " quotation marks so that I can convert them to floats using map?
You cannot convert them to floats directly because float() does not understand the , in the number. So remove both the quotation marks and the commas:
>>> salaries = ['"68,760.23"', '"142,483.33"', '"37,175.06"']
>>> [float("".join(x.replace('"', '').split(","))) for x in salaries]
[68760.23, 142483.33, 37175.06]
>>>
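Since the question asks specifically about map, the same cleanup can be wrapped in a small function and passed to map. This is only a minimal sketch: to_float is a hypothetical helper name, not something from the original code, and the print() call assumes Python 3 syntax (it also works on Python 2 with a single argument).

```python
def to_float(text):
    # Drop the surrounding quotation marks and the thousands separators,
    # then let float() parse what is left.
    return float(text.replace('"', '').replace(',', ''))

salaries = ['"68,760.23"', '"142,483.33"', '"37,175.06"']
# list() is needed on Python 3, where map() returns a lazy iterator.
cleaned = list(map(to_float, salaries))
print(cleaned)
```

Keeping the conversion in one function means the loop that reads the file can call it too, instead of repeating the replace chain.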
But maybe you should handle this when you're appending to the list:
salaries = []
with open("salaries.txt", "r") as salary_file:
    for line in salary_file:
        dept, ftr, rpt_period = line.rstrip().split("\t")
        try:
            # Strip the quotes and the thousands separators before converting.
            salaries.append(float(ftr.replace('"', '').replace(",", "")))
        except ValueError:
            # Can't convert to float, perhaps it's a comment or the header.
            pass
Be careful, you must be sure that the file is actually tab-delimited.
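If the file really is tab-delimited, the standard csv module is another option, since it removes the quotation marks around a field on its own. The sketch below is an alternative technique, not the approach used in the answer above. It assumes Python 3 and uses io.StringIO as a stand-in for open("salaries.txt") so it runs without the file; the sample rows are the ones from the question, with tabs between the columns assumed.

```python
import csv
import io

# Stand-in for open("salaries.txt"); rows copied from the question,
# with real tab characters between the columns (an assumption about the file).
sample = io.StringIO(
    'DEPT\tFTR\tRPT_PERIOD\n'
    'Project Management\t"68,760.23"\t12-Month\n'
    'Project Management\t"142,483.33"\t12-Month\n'
    'AEC Administration\t"37,175.06"\t12-Month\n'
)

salaries = []
reader = csv.reader(sample, delimiter="\t")
next(reader)  # skip the header row
for row in reader:
    if len(row) != 3:
        # Not a three-column, tab-delimited row; skip it rather than crash.
        continue
    dept, ftr, rpt_period = row
    # csv has already removed the quotation marks; only the commas remain.
    salaries.append(float(ftr.replace(",", "")))

print(salaries)
```

The length check is what guards against a file that turns out not to be tab-delimited: such rows are skipped instead of raising on the unpacking line.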
You can take everything but the first and last character of each string and then convert the result to a float:
new_salaries = []
for i in salaries:
    i = i.replace(",", "")
    new_salaries.append(float(i[1:-1]))
print new_salaries
del salaries
You can either do i[1:-1] or you can do i.replace('"', ''). If your string is some_string = "abcdefg", then some_string[1:-1] will return "bcdef".
The i[1:-1] part gets the string from the second character (since indexing starts at 0) to the second-to-last character. You then cast it as a float and add it to your new list. You can then delete your old list.
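The difference between the two options matters when the input is not always quoted. The following sketch compares them on one quoted and one unquoted value; str.strip('"') is added here as a third option that is not mentioned above, and the print() calls assume Python 3.

```python
raw = '"68,760.23"'

sliced = raw[1:-1]                # drops the first and last character, whatever they are
replaced = raw.replace('"', '')   # drops every quotation mark, wherever it is
stripped = raw.strip('"')         # drops quotation marks only at the two ends

print(sliced, replaced, stripped)

# Slicing is positional, so it also removes characters from unquoted input.
unquoted = '68,760.23'
print(unquoted[1:-1])
print(unquoted.strip('"'))
```

For the sample data all three give the same string; slicing is only safe when every value is known to start and end with a quotation mark.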
>>> salaries = ['"68,760.23"', '"142,483.33"', '"37,175.06"']
>>> s = [ele.replace('"', "") for ele in salaries]
>>> s
['68,760.23', '142,483.33', '37,175.06']
>>> [float(ele.replace(",", "")) for ele in s]
[68760.23, 142483.33, 37175.06]
>>>
Your issue is that the string contains a comma, so you need to remove it before converting. A one-liner I can think of is:
float(x.split('"')[1].replace(',',''))
Use this expression at the point where you append each value to the list.
Just change the append in your code like this and you'll get a list of floats:
salary_file = open("salaries.txt", "r")
headers = salary_file.readline()
salaries = []
for line in salary_file.readlines():
    # rstrip() returns a new string, so keep the result.
    line = line.rstrip()
    (dept, ftr, rpt_period) = line.split('\t')
    # Strip the quotes and commas, then convert to float.
    salaries.append(float(ftr.replace('"', '').replace(',', '')))
print salaries
#Sample output: [68760.23, 142483.33, 37175.06]
Or if you just want to remove the " and , so that you can use map(), see @msvalkon's answer.
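None of the snippets above reaches the stated goal of an average per department. One possible way to finish, once the salaries are floats, is to group them by department first. This is a sketch under assumptions: the rows list stands in for whatever the parsing loop produces (the values are the three sample rows from the question), and by_dept and averages are illustrative names.

```python
from collections import defaultdict

# Rows as they might come out of the parsing loop: (department, salary as float).
# The values are the three sample rows from the question.
rows = [
    ("Project Management", 68760.23),
    ("Project Management", 142483.33),
    ("AEC Administration", 37175.06),
]

# Collect every salary under its department name.
by_dept = defaultdict(list)
for dept, salary in rows:
    by_dept[dept].append(salary)

# Average each department's list of salaries.
averages = {dept: sum(values) / len(values) for dept, values in by_dept.items()}

for dept, avg in sorted(averages.items()):
    print("%s: %.2f" % (dept, avg))
```

In the real script, the loop that reads the file would append to by_dept[dept] directly instead of building a flat salaries list.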