繁体   English   中英

从目录中的多个csv创建单个CSV 1st CSV将两列复制到csv之后,仅第二列

[英]Create single CSV from multiple csvs in a directory 1st CSV copy both columns subsequent csv only the 2nd column

我正在寻找从目录中的许多csv创建单个csv。 我知道这个问题已经被讨论过很多次了,但是我有一点点曲解。 我想要做的事情:

  1. 查找最大的文件。
  2. 使用最大的文件-将其用作基础。 最大文件中的第一列将是我合并其余文件所需的主键。
  3. 将目录中的每个文件与第一个CSV中的主键进行比较,然后将每个csv的第二列添加到最大的CSV中。

话虽如此,我正在与以下工作:

我发现此链接将一列从一个csv转移到另一列。

https://askubuntu.com/questions/553219/add-column-from-one-csv-to-another-csv-file

我可以利用类似的东西将列从一个添加到另一个。

paste -d, file2 <(cut -d, -f3- file1)

以下PHP将为我获取目录的文件列表,现在尝试利用PHP组合/合并csvs。

$dir= $Folder.'/Stats/Latency/'; // directory name 
$ar=scandir($dir); 
$box=$_POST['box'];  // Receive the file list from form

// Looping through the list of selected files ///
while (list ($key,$val) = @each ($box)) {
$path=$dir  ."/".$val;
$dest = $Folder."/Report/Latency/".$val;
if(copy($path, $dest)); //echo "Copy Complete file ";
echo "$val,";
}
echo "<hr>";

这是我需要以下CSV合并的地方:我正在辩论利用shell exec命令,但这似乎非常耗费人力。

$reportFiles = $Folder."/Report/Latency/";
foreach(glob($reportFiles."*.csv") as $file)
{
   shell_exec("touch "$reportFiles."latencyReport.csv");

}

由于它与csv文件中的数据有关:

CSV1:

date,vpool06
2016-03-28 12:00:00,0.000
2016-03-28 12:01:00,0.000
2016-03-28 12:02:00,0.000
2016-03-28 12:03:00,0.000
2016-03-28 12:04:00,0.000
2016-03-28 12:05:00,0.000
2016-03-28 12:06:00,0.000
2016-03-28 12:07:00,0.000
2016-03-28 12:08:00,0.000
2016-03-28 12:09:00,0.000
2016-03-28 12:10:00,0.000
2016-03-28 12:11:00,0.000
2016-03-28 12:12:00,0.000
2016-03-28 12:13:00,0.000
2016-03-28 12:14:00,0.000
2016-03-28 12:15:00,0.000
2016-03-28 12:16:00,0.000
2016-03-28 12:17:00,0.000
2016-03-28 12:18:00,0.000
2016-03-28 12:19:00,0.000

CSV2:

date,vpool02
2016-03-28 12:00:00,0.000
2016-03-28 12:01:00,0.000
2016-03-28 12:02:00,0.000
2016-03-28 12:04:00,0.000
2016-03-28 12:05:00,0.000
2016-03-28 12:06:00,0.000
2016-03-28 12:07:00,0.000
2016-03-28 12:08:00,0.000
2016-03-28 12:09:00,0.000
2016-03-28 12:10:00,0.000
2016-03-28 12:11:00,0.000
2016-03-28 12:12:00,0.000
2016-03-28 12:13:00,0.000
2016-03-28 12:14:00,0.000

CSV3:

date,vpool03
2016-03-28 12:00:00,0.000
2016-03-28 12:01:00,0.000
2016-03-28 12:02:00,0.000
2016-03-28 12:04:00,0.000
2016-03-28 12:05:00,0.000

合并的CSV:

date,vpool06,vpool02,vpool03
2016-03-28 12:00:00,0.000,0.000,0.000
2016-03-28 12:01:00,0.000,0.000,0.000
2016-03-28 12:02:00,0.000,0.000,0.000
2016-03-28 12:03:00,0.000,,0.000
2016-03-28 12:04:00,0.000,0.000,0.000
2016-03-28 12:05:00,0.000,0.000,0.000
2016-03-28 12:06:00,0.000,0.000,
2016-03-28 12:07:00,0.000,0.000,
2016-03-28 12:08:00,0.000,0.000,
2016-03-28 12:09:00,0.000,0.000,
2016-03-28 12:10:00,0.000,0.000,
2016-03-28 12:11:00,0.000,0.000,
2016-03-28 12:12:00,0.000,0.000,
2016-03-28 12:13:00,0.000,0.000,
2016-03-28 12:14:00,0.000,0.000,
2016-03-28 12:15:00,0.000,,
2016-03-28 12:16:00,0.000,,
2016-03-28 12:17:00,0.000,,
2016-03-28 12:18:00,0.000,,
2016-03-28 12:19:00,0.000,,

理想情况下,我不在乎此时是否存在“空”值,因为它不会显示在图表中。 这意味着服务器当时处于关闭状态。

需要它在没有数据的空间中具有null。
更新:示例。

date,vpool06,7NA_01,7NA_02,bd01,bd02,vpool01,vpool02,vpool03,vpool04,vpool07
2016-03-28 12:00:00,1.000,null,10.00,02.00,20.00,0.00,0.00,0.00,0.00,0.000
2016-03-28 12:01:00,0.000,11.00,110.00,null,11.00,0.00,0.00,0.00,0.00,0.000
2016-03-28 12:02:00,0.000,null,0.00,2.00,100,0.00,0.00,0.00,0.00,0.000
2016-03-28 12:03:00,0.000,0.00,0.00,02.00,10.00,0.00,0.000,0.00,0.00,0.000

awk解救!

$ awk -F, -v OFS=, 'FNR==1{c++} {a[$1,c]=$2;keys[$1]}
                       END{for(k in keys) 
                            {printf "%s", k; 
                             for(i=1;i<=c;i++) 
                                 printf "%s", OFS (((k,i) in a)?a[k,i]:""); 
                             print ""}}' file{1,2,3} | 
 sort -t, -k1,1 | 
 tee >(sed '$d' > merged) >(tail -1 >> merged) 

$ cat merged

date,vpool06,vpool02,vpool03                                                                                          
2016-03-28 12:00:00,0.000,0.000,0.000                                                                                 
2016-03-28 12:01:00,0.000,0.000,0.000
2016-03-28 12:02:00,0.000,0.000,0.000
2016-03-28 12:03:00,0.000,,
2016-03-28 12:04:00,0.000,0.000,0.000
2016-03-28 12:05:00,0.000,0.000,0.000
2016-03-28 12:06:00,0.000,0.000,
2016-03-28 12:07:00,0.000,0.000,
2016-03-28 12:08:00,0.000,0.000,
2016-03-28 12:09:00,0.000,0.000,
2016-03-28 12:10:00,0.000,0.000,
2016-03-28 12:11:00,0.000,0.000,
2016-03-28 12:12:00,0.000,0.000,
2016-03-28 12:13:00,0.000,0.000,
2016-03-28 12:14:00,0.000,0.000,
2016-03-28 12:15:00,0.000,,
2016-03-28 12:16:00,0.000,,
2016-03-28 12:17:00,0.000,,
2016-03-28 12:18:00,0.000,,
2016-03-28 12:19:00,0.000,,

我不知道您将如何在PHP中执行此操作,但是对于真正的2D数组并使用GNU awk进行排序并“归类”为:

$ cat tst.awk
BEGIN { FS=OFS="," }
FNR==1 { hdr[ARGIND][1]=$1; hdr[ARGIND][2]=$2; next }
{ arr[ARGIND][$1] = $2 }
END {
    for (idx in arr) {
        numRows = length(arr[idx])
        if (numRows > maxRows) {
            maxRows = numRows
            maxIdx  = idx
        }
    }

    printf "%s%s%s", hdr[maxIdx][1], OFS, hdr[maxIdx][2]
    for (idx=1; idx<=ARGIND; idx++) {
        if (idx != maxIdx) {
            printf "%s%s", OFS, hdr[idx][2]
        }
    }
    print ""

    PROCINFO["sorted_in"] = "@ind_str_asc"
    for (tstamp in arr[maxIdx]) {
        printf "%s%s%s", tstamp, OFS, arr[maxIdx][tstamp]
        for (idx=1; idx<=ARGIND; idx++) {
            if (idx != maxIdx) {
                printf "%s%s", OFS, (tstamp in arr[idx] ? arr[idx][tstamp] : "null")
            }
        }
        print ""
    }
}

$ awk -f tst.awk csv3 csv2 csv1
date,vpool06,vpool03,vpool02
2016-03-28 12:00:00,0.000,0.000,0.000
2016-03-28 12:01:00,0.000,0.000,0.000
2016-03-28 12:02:00,0.000,0.000,0.000
2016-03-28 12:03:00,0.000,null,null
2016-03-28 12:04:00,0.000,0.000,0.000
2016-03-28 12:05:00,0.000,0.000,0.000
2016-03-28 12:06:00,0.000,null,0.000
2016-03-28 12:07:00,0.000,null,0.000
2016-03-28 12:08:00,0.000,null,0.000
2016-03-28 12:09:00,0.000,null,0.000
2016-03-28 12:10:00,0.000,null,0.000
2016-03-28 12:11:00,0.000,null,0.000
2016-03-28 12:12:00,0.000,null,0.000
2016-03-28 12:13:00,0.000,null,0.000
2016-03-28 12:14:00,0.000,null,0.000
2016-03-28 12:15:00,0.000,null,null
2016-03-28 12:16:00,0.000,null,null
2016-03-28 12:17:00,0.000,null,null
2016-03-28 12:18:00,0.000,null,null
2016-03-28 12:19:00,0.000,null,null

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM