How do I write a script that reads many CSV file names and their data and writes them to another CSV file?

Question

I have many CSV files and need to write all of their file names, together with the data inside them, to another CSV file.

Example:

File 1: less bonding_err_bond0-if_eth2-d.rrd.csv

1617613500,0.0000000000e+00

File 2: less bonding_err_bond0-if_eth3-d.rrd.csv

1617613500,0.0000000000e+00

Final output:

Final file: less bonding.csv

bonding_err_bond0-if_eth2-d.rrd,bonding_err_bond0-if_eth3-d.rrd.csv
0.0000000000e+00,0.0000000000e+00

Note: the script can be written in either Python or Bash.

Answer 1

So basically you want a table with a header row of file names and a row of data? Here is a snippet that may help:

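#!/bin/bash
# Accumulators for the header row (file names) and the data row (second fields).
HEADER=''
DATA=''
# Read NUL-delimited paths from find so unusual file names survive intact.
while IFS= read -r -d '' CSV
do
  HEADER="${HEADER}$(basename "$CSV"),"
  DATA="${DATA}$(cut -d "," -f 2 "$CSV"),"
done < <(find ./ -name "*.csv" -type f -print0)
# Print both rows with the trailing comma stripped.
echo "${HEADER%,}"
echo "${DATA%,}"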

First we initialize two empty variables: HEADER will hold all the file names, and DATA will hold the second field of each file, separated by commas.

Then comes a while loop. It may look complicated, but the reason for writing it this way is explained here: https://github.com/koalaman/shellcheck/wiki/SC2044

The TL;DR is that we want to handle any unusual characters in file names that would break a for loop.
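To see why the delimiter matters, here is a small Python illustration. It is not part of the original answer and the file names are made up. Splitting a newline-separated list on whitespace, which is roughly what an unquoted for loop over find output does, breaks a name that contains a space. Splitting on the NUL byte, which is what -print0 and read -d '' rely on, keeps every name intact.

```python
# Hypothetical file names; the first one contains a space.
names = ["less bonding.csv", "eth2.csv"]

# Newline-separated stream, then split on any whitespace,
# roughly what `for f in $(find ...)` does after word splitting.
newline_stream = "\n".join(names)
broken = newline_stream.split()

# NUL-delimited stream, as produced by `find -print0`.
nul_stream = "\0".join(names) + "\0"
intact = [name for name in nul_stream.split("\0") if name]

print(broken)  # ['less', 'bonding.csv', 'eth2.csv']
print(intact)  # ['less bonding.csv', 'eth2.csv']
```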

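Inside the loop we append the file name held in the CSV variable to HEADER. basename gives us only the file name part, without the directory. If you don't want the .csv extension, you can use basename -s.csv "$CSV" as the command instead.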
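For comparison, the same two operations can be sketched in Python with pathlib. The directory in the path below is a made-up example; only the file name comes from the question.

```python
from pathlib import PurePosixPath

# Hypothetical path; only the file name itself comes from the question.
path = PurePosixPath("./data/bonding_err_bond0-if_eth2-d.rrd.csv")

# Counterpart of `basename "$CSV"`: drop the directory part.
file_name = path.name

# Counterpart of `basename -s.csv "$CSV"`: also drop the trailing ".csv".
# `stem` removes only the last suffix, so ".rrd" is kept.
without_ext = path.stem

print(file_name)    # bonding_err_bond0-if_eth2-d.rrd.csv
print(without_ext)  # bonding_err_bond0-if_eth2-d.rrd
```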

DATA is built the same way, except that we split the file contents on , and print only the second field.

After both strings are built, we echo them with the trailing comma removed. This technique is called bash parameter substitution; see https://www.cyberciti.biz/tips/bash-shell-parameter-substitution-2.html for more.
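If you are following along in Python instead, the closest counterpart to "${HEADER%,}" is probably str.removesuffix, which needs Python 3.9 or later. The strings below are made-up sample values, and this is only a sketch of the idea, not something from the original answer.

```python
header = "eth2.csv,eth3.csv,"

# Like "${HEADER%,}": remove a single trailing comma if there is one.
trimmed = header.removesuffix(",")
print(trimmed)  # eth2.csv,eth3.csv

# rstrip(",") is not equivalent: it removes every trailing comma.
one_removed = "a,b,,".removesuffix(",")
all_removed = "a,b,,".rstrip(",")
print(one_removed)  # a,b,
print(all_removed)  # a,b

# Building a list and joining it avoids the trailing comma altogether.
joined = ",".join(["eth2.csv", "eth3.csv"])
print(joined)  # eth2.csv,eth3.csv
```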

The script processes all CSV files in the current directory and its subdirectories.
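Since the question also allows Python, here is a rough standard-library sketch of the same behaviour. It is not a translation the answerer provided: merge_csv is a hypothetical name, it assumes each file holds a single row as in the question's example (only the first row is read), it skips empty or one-field files, and it sorts the paths so the column order is stable.

```python
import csv
from pathlib import Path


def merge_csv(root: Path) -> tuple[list[str], list[str]]:
    """Return (header, data): file names and the second field of each file's first row."""
    header, data = [], []
    # rglob walks `root` and its subdirectories; sorting gives a stable column order.
    for path in sorted(root.rglob("*.csv")):
        if not path.is_file():
            continue
        with path.open(newline="") as handle:
            first_row = next(csv.reader(handle), None)
        # Assumption: files that are empty or have fewer than two fields are skipped.
        if first_row is None or len(first_row) < 2:
            continue
        header.append(path.name)
        data.append(first_row[1])
    return header, data
```

Calling merge_csv(Path(".")) and printing ",".join(header) followed by ",".join(data) gives the same two-line layout as the bash script.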

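To write the result to a file, redirect the script's output: save the script as merge_csv.sh and run it as shown below. The shell creates the target file before the script starts, so a target inside the scanned directory would itself match *.csv and add an empty column; writing it to the parent directory avoids that.

bash merge_csv.sh > ../bonding.csv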

To test:

Generate 5 files with similar content:

for i in $(seq 1 5); do echo "0.0000000000e+00,$i.0000000000e+00" > "$i.csv"; done

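Running the script in that folder (here with basename -s.csv, so the extension is dropped) produces output like the following; find does not guarantee any particular file order: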

1,2,3,4,5
1.0000000000e+00,2.0000000000e+00,3.0000000000e+00,4.0000000000e+00,5.0000000000e+00

Answer 2

The pandas Python library is well suited to working with CSV files.

import os
import pandas as pd
import re

out_file_name = './less bonding.csv'

# Create a Pandas DataFrame
output = pd.DataFrame()

# Remove any output files we might've made previously
if os.path.isfile(out_file_name):
    os.remove(out_file_name)

# Get all the files in the current dir
file_names = os.listdir()

# Loop through our file_names
for file_name in file_names:

    # Regex check it's a .csv file
    csv = re.match(r'^.+\.csv$', file_name)
    if csv is not None:

        # Read our csv into a DataFrame
        # To preserve our data rather than it be converted to floats, use dtype=str
        data = pd.read_csv(file_name, header=None, dtype=str)

        # Put column 1 of csv into column [file_name] of our output DataFrame
        output[file_name] = data[1]

# Output it as a csv, without the index column - we don't need it
output.to_csv(out_file_name, index=False)

This is the output:

less bonding_err_bond0-if_eth2-d.rrd.csv,less bonding_err_bond0-if_eth3-d.rrd.csv
0.0000000000e+00,0.0000000000e+00
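One thing to watch with the pandas approach: assigning a Series to a DataFrame column aligns on the existing index, so rows beyond the length of the first file read should be dropped when files differ in length. If that matters, the column merge can also be sketched without pandas. columns_to_rows is a hypothetical helper and the sample values are made up; shorter columns are padded with empty strings rather than truncated.

```python
from itertools import zip_longest


def columns_to_rows(columns: dict[str, list[str]]) -> list[list[str]]:
    """Turn {column name: values} into a header row followed by data rows."""
    header = list(columns)
    # zip_longest pads shorter columns with "" so no values are dropped.
    body = [list(row) for row in zip_longest(*columns.values(), fillvalue="")]
    return [header] + body


# Hypothetical input: the second column is one value shorter than the first.
columns = {
    "eth2.csv": ["0.0000000000e+00", "1.0000000000e+00"],
    "eth3.csv": ["0.0000000000e+00"],
}
rows = columns_to_rows(columns)
for row in rows:
    print(",".join(row))
```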

Answer 3

By the way, the pandas Python library is very useful.

Here is an example:

import os
import pandas as pd

def finalFile(fname):
    
    output = pd.DataFrame()

    file_names = os.listdir()

    for file_name in file_names:
        if file_name.startswith(fname):
            data = pd.read_csv(file_name, header=None, dtype=str)
            output[file_name.rsplit('.', 4)[2]] = data[1]

    # Write the result without the index column
    output.to_csv(fname.rsplit('.', 2)[2] + ".csv", index=False)


finalFile('xxx.test.test-bonding')

Final result:

test-bonding_err_bond0-if_eth3-d,test-bonding_err_bond0-if_eth2-d
0.0000000000e+00,0.0000000000e+00
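The rsplit calls are what turn the file names into column names. A worked example follows; the input file name is an assumption, inferred from the prefix passed to finalFile and the header shown in the result.

```python
# Assumed input file name, reconstructed from the prefix and the output header.
file_name = "xxx.test.test-bonding_err_bond0-if_eth3-d.rrd.csv"

# rsplit splits from the right, at most 4 times here, which gives 5 parts.
parts = file_name.rsplit(".", 4)
print(parts)
# ['xxx', 'test', 'test-bonding_err_bond0-if_eth3-d', 'rrd', 'csv']

# Index 2 is the column name used in the output.
column_name = parts[2]
print(column_name)  # test-bonding_err_bond0-if_eth3-d

# The same idea derives the output file name from the prefix argument.
prefix = "xxx.test.test-bonding"
out_name = prefix.rsplit(".", 2)[2] + ".csv"
print(out_name)  # test-bonding.csv
```

The fixed indices only work when every file name has this exact number of dots, so a differently shaped name would need different indices.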

Answer 1
1 2021-04-05 12:41:54

Answer 2
1 Accepted 2021-04-05 13:51:12

Answer 3
0
