I'm a beginner with Python. I'm trying to compute the difference between adjacent columns in a CSV file using Python 2.7.
Sample input:
Temperature 20 21 23 27 ...
Smoke Obscuration 0.1 0.3 0.6 0.7 ...
Carbon Dioxide 0.05 0.07 0.08 0.09 ...
......
......
I want to calculate the difference between two adjacent values and get the output like this:
Temperature 0 1 2 4 ...
Smoke Obscuration 0 0.2 0.3 0.1 ...
Carbon Dioxide 0 0.02 0.01 0.01 ...
......
......
This is as far as I got:
import csv
with open("test1.csv", "rb") as f_in, open("test2.csv", "w") as f_out:
    r = csv.reader(f_in)
    w = csv.writer(f_out)
    for row in r:
        for i, v in enumerate(row):
            if i > 1:
                v = (float(row[i]) - float(row[i-1]))
        w.writerow(row)
It gave an error:
ValueError: could not convert string to float:
Could anyone help? Any guidance would be appreciated.
You may have some spacing issues in your source file, so your specific error may be hard to reproduce. Since I do not have your original file, I copied your data from the question into a text file, then re-saved it as a csv in Excel. Reading that file did not raise the error you encountered; the only problem was the wrong output. This suggests that the data can be read and written fine provided the logic is correct.
Option 1: Use the csv module
I corrected some logic, mainly by building each output row as an iterable (i.e. a list), which the writerow method requires:
import csv
# with open("test1.csv", "r") as f_in, open("test2.csv", "w", newline="") as f_out:  # python 3
with open("test1.csv", "r") as f_in, open("test2.csv", "wb") as f_out:  # python 2
    r = csv.reader(f_in)
    w = csv.writer(f_out)
    values = []
    for row in r:
        for i, v in enumerate(row):
            if i == 0:
                values.append(v)
            if i == 1:
                values.append(0)
            if i > 1:
                values.append(float(row[i]) - float(row[i-1]))
        w.writerow(values)
        values = []
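The blank after the colon in the reported ValueError is what Python 2 prints when float() receives an empty string, which would happen if some rows contain blank cells (for example, trailing commas left by a spreadsheet export). That is only a guess, since the original file is not available. Under that assumption, here is a minimal sketch of the same per-row logic as a function that ignores blank cells; diff_row is a hypothetical helper name, not part of the csv module.

```python
def diff_row(row):
    """Return [label, 0, d1, d2, ...] for one row already split by csv.reader."""
    if not row:
        return []
    label = row[0]
    # Assumption: blank cells are padding and can simply be skipped.
    values = [float(cell) for cell in row[1:] if cell.strip()]
    if not values:
        return [label]
    # Pair each value with its right-hand neighbour and subtract.
    diffs = [b - a for a, b in zip(values, values[1:])]
    return [label, 0] + diffs


example = diff_row(["Temperature", "20", "21", "23", "27", "", ""])
```

Inside the loop above, w.writerow(diff_row(row)) would then replace the enumerate block. Note that skipping a blank cell in the middle of a row would produce a difference across the gap, so this is only appropriate if blanks appear at the end of rows.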
Option 2: Use the pandas library
You can pip install pandas (or conda install pandas if you use Anaconda) and do this more simply:
import pandas as pd
df = pd.read_csv("test1.csv", header=None, index_col=0)
df2 = df.diff(axis=1)
df2.to_csv("test2.csv", header=False, na_rep=0)
Output csv for both options (in Excel)
When opened in a text editor, these outputs are comma-delimited by default. Both approaches accept a different delimiter if desired: csv.writer takes a delimiter argument and to_csv takes a sep argument.
Try these options. If you get errors, confirm that your source files are clean so that they are read correctly. For now, use print statements to verify that the output is what you want.
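As a rough illustration of changing the delimiter with the csv module, the sketch below writes two rows to an in-memory buffer instead of a file. It assumes Python 3 (io.StringIO with csv); format_rows and the sample rows are made up for the example. Under Python 2 you would pass the same delimiter argument to a writer on a file opened with "wb".

```python
import csv
import io


def format_rows(rows, delimiter=","):
    """Serialize rows with csv.writer using the given delimiter."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


rows = [["Temperature", 0, 1.0, 2.0], ["Smoke Obscuration", 0, 0.5, 0.25]]
tab_text = format_rows(rows, delimiter="\t")
space_text = format_rows(rows, delimiter=" ")
```

With a space as the delimiter, the writer quotes "Smoke Obscuration" because the label itself contains the delimiter. A tab or comma avoids that quoting for this data.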
Your input file is not an easily parsed csv file. It uses spaces to delimit columns but also uses spaces within column zero. I don't think the csv module will help you, but you can parse each line yourself with a couple of regexes. My example works by assuming column 0 names do not include digits. If that's not true in general, it's going to break.
import re
_col_0_re = re.compile(r'[^\d]+')
_col_x_re = re.compile(r'[\d\.]+')
def get_row(line):
    row = []
    line = line.strip()
    match = _col_0_re.match(line)
    if match:
        # pull out column 0 string
        row.append(line[:match.end()].strip())
        # find the remaining floats on the line
        row.extend(float(col) for col in _col_x_re.findall(line[match.end():]))
    return row
with open("test1.csv", "r") as f_in, open("test2.csv", "w") as f_out:
    for line in f_in:
        row = get_row(line)
        print(row)
        if row:
            diffs = (row[i] - row[i-1] for i in range(2, len(row)))
            diff_str = ''.join('{:10.2f}'.format(diff) for diff in diffs)
            f_out.write('{0:20} 0 {1}\n'.format(row[0], diff_str))
The output from your sample data is
Temperature          0       1.00      2.00      4.00
Smoke Obscuration    0       0.20      0.30      0.10
Carbon Dioxide       0       0.02      0.01      0.01
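If column 0 names might contain digits, one possible alternative to the regexes is to split on whitespace and peel numeric tokens off the right-hand end of the line. This is only a sketch under a different assumption, namely that a label never ends with a token that parses as a float (so a label like "Sensor 2" would still be misread, as would one ending in "nan" or "inf"). The function names split_label_and_values and adjacent_diffs are made up for the example.

```python
def split_label_and_values(line):
    """Split a whitespace-delimited line into (label, [floats])."""
    tokens = line.split()
    values = []
    # Walk from the right, collecting tokens until one is not a number.
    while tokens:
        try:
            values.append(float(tokens[-1]))
        except ValueError:
            break
        tokens.pop()
    values.reverse()
    # Whatever is left on the left-hand side is the label.
    return " ".join(tokens), values


def adjacent_diffs(values):
    """Differences between each value and the one before it."""
    return [b - a for a, b in zip(values, values[1:])]


label, values = split_label_and_values("Smoke Obscuration 0.1 0.3 0.6 0.7")
diffs = adjacent_diffs(values)
```

The resulting label and diffs can be written with the same format string as above. A label such as "CO2" is kept intact here because "CO2" as a whole token does not parse as a float.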