Python: Concatenating and formatting strings/floats while writing to a csv

Question

I have a variable compareout that stores nested lists of data:

compareout = [...
["Guyana", 951.723423, 1037.123424, 28.476757, 2.991234],
["Bolivia", 936.123420, 1065.8234236, 43.25123, 4.62],
["Philippines", 925.52342342, 1119.62342341, 64.70234234, 6.991234123],
["Congo (Rep.)", 907.22342343, 1657.52342349, 250.1242342, 27.571234123],
...]

I'm trying to:

Sort by the second column ascending and write the first 10 items of the sorted list to a .csv
Sort by the second column descending and write the first 10 items of that sorted list to a second .csv

However, I need to format the output so that all the floats have only 2 decimal places, concatenate USD to the front of the second and third column values, and add a '%' sign to the end of the final column value. While I can iterate over 'compareout' and replace the last two columns like so...

for line in compareout:
    avgyrincr = (float(line[2])-float(line[1]))/3
    percent = (avgyrincr/float(line[1])) * 100
    line.append("%.2f" % avgyrincr)
    line.append("%.2f%%" % percent)

I can't do something simple like:

for line in ascending:
    line[1] = "USD %.2f" % line[1]
    line[2] = "USD %.2f" % line[2]

because this does not allow sorting. Currently I run the code immediately above after I sort and write the data to the first file, but then of course I cannot sort in descending order. I'm also confused about how to specify writing only 10 items...

I have googled for about an hour and can't seem to find enough information as to whether the csv.writerow() function allows formatting while writing, and I've run out of approaches. If someone could give me some ideas I would be most appreciative...

Answer 1

You can write a format function that takes an item from your list and returns the formatted line. Something like:

def format_row(row):
    result = row[:]   # make a copy of the row
    # str.format should be preferred over the % operator.
    # Also, you don't have to escape the % sign.
    result[1] = "USD {:.2f}".format(result[1])
    result[2] = "USD {:.2f}".format(result[2])
    # do whatever else you have to do for a single row
    return result
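For the full set of formats the question asks for (two decimals everywhere, USD in front of the second and third columns, a % after the last one), the same idea could be extended roughly as follows. This is only a sketch: the name format_full_row is hypothetical, and it assumes every row has the five-column layout shown in the question, with values in columns 2 to 5 that float() can convert.

```python
def format_full_row(row):
    """Return a formatted copy of a 5-column row; the input row is not modified."""
    # Assumed layout (from the question): name, two amounts, increase, percent.
    name, first, last, increase, percent = row
    return [
        name,
        "USD {:.2f}".format(float(first)),
        "USD {:.2f}".format(float(last)),
        "{:.2f}".format(float(increase)),
        "{:.2f}%".format(float(percent)),   # no escaping needed with str.format
    ]


example = ["Guyana", 951.723423, 1037.123424, 28.476757, 2.991234]
print(format_full_row(example))
```

Because the function builds a new list, the original numeric rows stay sortable after formatting.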

After that you can do:

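# the_values is the list of rows (compareout in the question);
# writer is a csv.writer opened on the first output file.
sorted_values = sorted(the_values, key=lambda x: x[1])   # sort by second column
formatted_lines = (format_row(row) for row in sorted_values[:10])
for line in formatted_lines:
    writer.writerow(line)

# [-10:] takes the last 10 elements, [::-1] reverses their order.
# For the second file, use a csv.writer opened on that file instead.
other_lines = (format_row(row) for row in sorted_values[-10:][::-1])
for line in other_lines:
    writer.writerow(line)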
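Putting the pieces together, an end-to-end version that writes the two files might look like the sketch below. The helper names (usd_row, write_rows, write_lowest_and_highest) and the file paths passed in are made up for illustration; the code assumes Python 3, numeric values in columns 2 and 3, and n of at least 1.

```python
import csv


def usd_row(row):
    # Copy the row and format columns 2 and 3, in the spirit of format_row.
    out = list(row)
    out[1] = "USD {:.2f}".format(out[1])
    out[2] = "USD {:.2f}".format(out[2])
    return out


def write_rows(path, rows):
    # newline="" is what the csv module documentation recommends in Python 3.
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(usd_row(r) for r in rows)


def write_lowest_and_highest(rows, low_path, high_path, n=10):
    # Sort once on the raw numbers; formatting happens only while writing.
    ordered = sorted(rows, key=lambda r: r[1])
    write_rows(low_path, ordered[:n])           # n smallest, ascending
    write_rows(high_path, ordered[-n:][::-1])   # n largest, descending
```

Slicing with [:n] is what limits the output to n rows, so no counter is needed while writing.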

Note that calling sorted twice on the compareout list takes roughly twice the time, whereas sorted_values[-10:][::-1] only copies and reverses 10 elements, so its cost does not depend on the length of the list and it is much more efficient. If you still want to use two sorts, I'd recommend doing something like:

sorted_values = sorted(the_values, key=lambda x: x[1])   #sort by second column
# ...
#use sorted_values, instead of the_values
sorted_values.sort(key=lambda x: x[1], reverse=True)
# ...

i.e. call .sort on the already sorted values. Python's list sort (Timsort) exploits runs of data that are already in order, so the above code takes O(n log n) for the first sort, while the second sort, on already sorted input, is considerably cheaper (close to O(n) in the best case):

>>> import random
>>> L = [random.randint(0, 1000) for _ in range(10000)]
>>> import timeit
>>> timeit.timeit('sorted(L)', 'from __main__ import L', number=100)
0.2509651184082031
>>> timeit.timeit('sorted(L)', 'from __main__ import L', number=100)
0.2547318935394287
>>> L.sort()
>>> timeit.timeit('sorted(L, reverse=True)', 'from __main__ import L', number=100)
0.11794304847717285
>>> timeit.timeit('sorted(L, reverse=True)', 'from __main__ import L', number=100)
0.11488604545593262

(In this simple example you could use reversed(L), but in other situations that's not possible.)
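One situation where the two approaches differ (possibly not the only one the answer has in mind) is when several rows share the same sort key. Python's sort is stable even with reverse=True, so tied rows keep their original relative order, whereas reversing an ascending result flips them. The rows below are invented purely to show the effect.

```python
rows = [["A", 5.0], ["B", 3.0], ["C", 5.0], ["D", 1.0]]  # made-up rows; A and C tie


def by_value(row):
    return row[1]


ascending = sorted(rows, key=by_value)
flipped = ascending[::-1]                               # reverse the ascending result
descending = sorted(rows, key=by_value, reverse=True)   # stable descending sort

print([r[0] for r in flipped])      # ['C', 'A', 'B', 'D']
print([r[0] for r in descending])   # ['A', 'C', 'B', 'D']
```

If the order of tied rows does not matter for the output files, either approach gives an acceptable result.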

Answer 2

Consider creating two distinct list objects, sorting them accordingly, and then calling writerow() for each row to write them to their respective csv files.

from operator import itemgetter
sorted(output, key=itemgetter(0, 1))  # e.g. sort list object by columns 0 then 1 using operator.itemgetter()

Try adding reverse=True to the above sorted() call for a descending sort.

Since you want to writerow() only 10 items for each output file, review csvreader.line_num or use a loop with csvreader.next().
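As a rough illustration of this approach, the sketch below sorts on the second column with itemgetter, uses reverse=True for the descending list, and limits the output with a slice rather than a line counter (a swapped-in simplification, not what the answer literally suggests). The sample rows are shortened from the question, the limit is 2 instead of 10 to keep the example small, and io.StringIO stands in for a real output file.

```python
import csv
import io
from operator import itemgetter

output = [
    ["Bolivia", 936.12, 1065.82],
    ["Guyana", 951.72, 1037.12],
    ["Congo (Rep.)", 907.22, 1657.52],
]

by_second = itemgetter(1)                                   # key: second column
lowest = sorted(output, key=by_second)[:2]                  # ascending, first 2
highest = sorted(output, key=by_second, reverse=True)[:2]   # descending, first 2

buffer = io.StringIO()                 # placeholder for an open csv file
csv.writer(buffer).writerows(lowest)   # writes exactly the sliced rows
print(buffer.getvalue())
```

With real files, each list would go to its own csv.writer, one per output file.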

solution1
2 ACCEPTED 2013-06-01 06:26:33

solution2
0 2013-06-01 06:58:47
