How to generate a report containing the difference between the last and first rows of multiple CSV files in Python?

Question

I have multiple CSV files in the format below:

CSV File 1:

CSV File 2:

The report to be generated should contain the difference between the last and first rows of each CSV file, as below:

The code below calculates the difference between the last and first rows. How do we write the results into a separate file in the report format specified above?

def csv_subtract():
    # this is the extension you want to detect
    extension = '.csv'
    for root, dirs_list, files_list in os.walk(csv_file_path):
        for f in files_list:
            if os.path.splitext(f)[-1] == extension:
                file_name_path = os.path.join(root, f)
                df = pd.read_csv(file_name_path)
                diff_row = (df.iloc[len(df.index) - 1] - df.iloc[0]).to_frame()

Answer 1

Using your code

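import os

import pandas as pd


def csv_subtract():
    # this is the extension you want to detect
    extension = '.csv'
    # collect the results across every directory visited, so the report is
    # built once instead of being overwritten for each directory
    df_dict = {}
    # csv_file_path is assumed to be defined elsewhere, as in the question
    for root, dirs_list, files_list in os.walk(csv_file_path):
        for f in files_list:
            if os.path.splitext(f)[-1] == extension:
                file_name_path = os.path.join(root, f)
                df = pd.read_csv(file_name_path)
                # simplified indexing
                # diff_row = (df.iloc[len(df.index) - 1] - df.iloc[0]).to_frame()  # old
                diff_row = (df.iloc[-1] - df.iloc[0]).to_frame()  # new
                df_dict[f] = diff_row

    # stack the per-file results, labelling the outer index level by file name
    out = pd.concat(df_dict, names=["File Name"])
    out.to_csv("path/to/report.csv")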
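
For reference, the loop above stacks one single-column frame per file, so the report comes out in long form (one line per file and column). If a wide layout with one row per file is preferred, a minimal sketch could look like the following. The helper name `build_wide_report` and the sample file names are hypothetical, the data is made up, and the inputs are assumed to contain only numeric columns.

```python
import pandas as pd


def build_wide_report(frames):
    """Return one row per file holding (last row - first row).

    `frames` maps a (hypothetical) file name to its already-loaded DataFrame.
    """
    # each value is a Series indexed by that file's column names
    diffs = {name: df.iloc[-1] - df.iloc[0] for name, df in frames.items()}
    # a dict of Series becomes one column per file; transpose so files are rows
    report = pd.DataFrame(diffs).T
    return report.rename_axis("File Name")


# made-up data standing in for two CSV files
frames = {
    "file1.csv": pd.DataFrame({"a": [1.0, 2.0, 4.0], "b": [10.0, 20.0, 15.0]}),
    "file2.csv": pd.DataFrame({"a": [5.0, 5.0, 6.5], "b": [3.0, 2.0, 1.0]}),
}
report = build_wide_report(frames)
print(report)
```

Calling `report.to_csv(...)` on the result would then write one line per file, with the file name in the first column.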

Another Approach

Concatenate all the data upon read, group by the file names, and calculate the differences within each group.

import numpy as np
import pandas as pd


if __name__ == "__main__":
    # some fake data for setup
    np.random.seed(1)
    df1 = pd.DataFrame(
        data=np.random.random(size=(5, 5)),
        columns=list("abcde")
    )
    np.random.seed(2)
    df2 = pd.DataFrame(
        data=np.random.random(size=(5, 5)),
        columns=list("abcde")
    )

    # I concatenate all dfs into one and use `keys` to identify which rows
    # belong to which df
    # in your function you could set keys to the file names
    df = pd.concat([df1, df2], keys=["df1", "df2"], names=["file_name"])

    # groupby the keys and calculate the difference between 0th and -1st rows
    out = df.groupby("file_name").apply(lambda df: df.iloc[-1] - df.iloc[0])

    print(out)

                  a         b        c         d         e
file_name                                                 
df1        0.383723  0.247937  0.31331  0.389990  0.729633
df2        0.069251  0.039360 -0.12154 -0.338791 -0.293208

The last step is to save this to a CSV file using pandas.DataFrame.to_csv:

out.to_csv("path/to/file.csv")
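
To connect this approach back to files on disk, here is a rough end-to-end sketch that uses the file names as `keys`, as suggested in the comments above. The function name `write_diff_report`, the directory layout, and the sample data are all assumptions for illustration; it also assumes every CSV has the same numeric columns and that the report is written outside the scanned directory.

```python
import tempfile
from pathlib import Path

import pandas as pd


def write_diff_report(csv_dir, report_path):
    """Write (last row - first row) for each CSV under csv_dir to report_path."""
    csv_dir = Path(csv_dir)
    # recursive search; sorted so the processing order is deterministic
    paths = sorted(csv_dir.rglob("*.csv"))
    if not paths:
        raise ValueError(f"no CSV files found under {csv_dir}")
    frames = [pd.read_csv(p) for p in paths]
    # keys tag every row with the file it came from (path relative to csv_dir)
    keys = [p.relative_to(csv_dir).as_posix() for p in paths]
    df = pd.concat(frames, keys=keys, names=["file_name"])
    # within each file's rows, subtract the first row from the last row
    out = df.groupby(level="file_name").apply(lambda g: g.iloc[-1] - g.iloc[0])
    out.to_csv(report_path)
    return out


# demo with made-up data in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "data"
    (data_dir / "sub").mkdir(parents=True)
    pd.DataFrame({"a": [1.0, 2.0, 4.0], "b": [10.0, 20.0, 15.0]}).to_csv(
        data_dir / "file1.csv", index=False
    )
    pd.DataFrame({"a": [5.0, 5.0, 6.5], "b": [3.0, 2.0, 1.0]}).to_csv(
        data_dir / "sub" / "file2.csv", index=False
    )
    report_path = Path(tmp) / "report.csv"
    out = write_diff_report(data_dir, report_path)
    report_text = report_path.read_text()
    print(out)
```

Using the relative path rather than the bare file name as the key keeps two files with the same name in different subdirectories from colliding in the report.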

Solution 1
0 2022-12-22 16:22:35
