简体   繁体   中英

How to get the difference between two adjacent columns in csv file using python?

I'm a beginner with python. I'm trying to get the difference between two adjacent columns in a csv file using python 2.7.

Sample input:

    Temperature        20     21     23     27 ...

    Smoke Obscuration  0.1    0.3    0.6    0.7 ...

    Carbon Dioxide     0.05   0.07   0.08   0.09 ...

    ......

    ......

I want to calculate the difference between two adjacent values and get the output like this:

    Temperature        0   1      2      4 ...

    Smoke Obscuration  0   0.2    0.3    0.1 ...

    Carbon Dioxide     0   0.02   0.01   0.01 ...

    ......

    ......

this is as far as I got:

import csv
with open("test1.csv", "rb") as f_in, open("test2.csv", "w") as f_out:
    r = csv.reader(f_in)
    w = csv.writer(f_out)
    for row in r:
        for i, v in enumerate(row):
        if i > 1:
                v = (float(row[i]) - float(row[i-1]))
        w.writerow(row)

It gave an error:

ValueError: could not convert string to float:

Could anyone help? Any guidance would be appreciated.

You may have some spacing issues with your source file, so it may be hard to reproduce your specific error. Since I do not have your original file, I copied your data from here into a text file, then re-saved it as a csv in Excel. I didn't have the error you encountered other than the wrong output. This suggests that that data can be read and written fine provided the logic is correct.

Option 1: Use the csv module

I corrected some logic mainly by making each row an iterable (ie list ), which the writerow method requires:

import csv

# with open("test1.csv", "r") as f_in, open("test2.csv", "w", newline="") as f_out: # python 3
with open("test1.csv", "r") as f_in, open("test2.csv", "wb") as f_out:            # python 2
    r = csv.reader(f_in)
    w = csv.writer(f_out)
    values = []
    for row in r:
        for i, v in enumerate(row):
            if i == 0:
                values.append(v)
            if i == 1:
                values.append(0)
            if i > 1:
                values.append(float(row[i]) - float(row[i-1]))
        w.writerow(values)
        values = []

Option 2: Use the pandas library

You can pip install pandas or ( conda install pandas if you use Anaconda ) and do this more simply:

import pandas as pd

df = pd.read_csv("test1.csv", header=None, index_col=0)
df2 = df.diff(axis=1)
df2.to_csv("test2.csv", header=False, na_rep=0)

Output csv for both options (in Excel)

在此处输入图片说明

When opened in a text editor, these outputs are comma-delimited by default. There are separate options for choosing different spacings if desired (see References).

Try these options. If you have errors, confirm that your source files are clean so that they are read correctly. For now, use print statements to verify the output you desire.

References:

  1. CSV file written with Python has blank lines between each row
  2. How do I write data to csv file in columns and rows from a list in python?
  3. How to use delimiter for csv in python
  4. delimiter - Writing a pandas to a csv file

Your input file is not an easily parsed csv file. It uses spaces to dilimit columns but also uses spaces within column zero. I don't think the csv module will help you, but you can parse the line yourself with a couple of regexes. My example works by assuming column 0 names do not include digits. If that's not true in general, its going to break.

import re

_col_0_re = re.compile(r'[^\d]+')
_col_x_re = re.compile(r'[\d\.]+')

def get_row(line):
    row = []
    line = line.strip()
    match = _col_0_re.match(line)
    if match:
        # pull out column 0 string
        row.append(line[:match.end()].strip())
        # find the remaining floats on the line
        row.extend(float(col) for col in _col_x_re.findall(line[match.end():]))
    return row

with open("test1.csv", "r") as f_in, open("test2.csv", "w") as f_out:
    for line in f_in:
        row = get_row(line)
        print(row)
        if row:
            diffs = (row[i] - row[i-1] for i in range(2, len(row)))
            diff_str = ''.join('{:10.2f}'.format(diff) for diff in diffs)
            f_out.write('{0:20}  0 {1}\n'.format(row[0], diff_str))

The output from your sample data is

Temperature           0       1.00      2.00      4.00
Smoke Obscuration     0       0.20      0.30      0.10
Carbon Dioxide        0       0.02      0.01      0.01

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM