
Create a single CSV from multiple CSVs in a directory: copy both columns from the first CSV, and only the second column from each subsequent CSV

I am looking to create a single CSV from many CSVs in a directory. I know this has been discussed many times, but my case has a slight twist. What I want to do:

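  1. Find the largest file.
  2. Use the largest file as the base. Its first column will be the primary key I need for merging the remaining files.
  3. Compare every file in the directory against the primary key from that first CSV, then add the second column of each CSV to the largest CSV.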
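
The three steps above can be sketched in POSIX shell plus awk. This is only a rough illustration under stated assumptions: "largest" is taken to mean "most lines", every file has exactly two comma-separated columns with a header on line 1, and `merge_latency_csvs` is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Sketch: merge every *.csv in a directory onto the keys of the largest file.
# merge_latency_csvs is a hypothetical name; "largest" is assumed to mean
# "most lines". The body runs in a subshell so its variables stay local.
merge_latency_csvs() (
    dir=$1
    base=""
    max=-1

    # Step 1: pick the file with the most lines as the base.
    for f in "$dir"/*.csv; do
        [ -f "$f" ] || continue
        n=$(( $(wc -l < "$f") ))
        if [ "$n" -gt "$max" ]; then
            max=$n
            base=$f
        fi
    done
    [ -n "$base" ] || exit 1

    # Step 2: put the base first in the argument list, then every other file.
    set -- "$base"
    for f in "$dir"/*.csv; do
        if [ -f "$f" ] && [ "$f" != "$base" ]; then
            set -- "$@" "$f"
        fi
    done

    # Step 3: for each key of the base file (in its original order), print the
    # key followed by column 2 of every file; a missing value becomes "".
    awk -F, -v OFS=, '
        FNR == 1    { nfiles++ }                  # a new input file starts
        nfiles == 1 { order[++nkeys] = $1 }       # remember base key order
                    { val[$1, nfiles] = $2 }
        END {
            for (r = 1; r <= nkeys; r++) {
                k = order[r]
                line = k
                for (i = 1; i <= nfiles; i++)
                    line = line OFS (((k, i) in val) ? val[k, i] : "")
                print line
            }
        }' "$@"
)
```

The function writes the merged CSV to standard output. Rows whose key does not appear in the base file are dropped, and the output is best redirected to a file outside the scanned directory so that it is not picked up as input on the next run.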

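With that said, here is what I am working with:

I found this link, which covers moving a column from one CSV to another.

https://askubuntu.com/questions/553219/add-column-from-one-csv-to-another-csv-file

I could use something similar to add a column from one file to another.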

paste -d, file2 <(cut -d, -f3- file1)
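
One caveat: `paste` glues lines together by position, so as soon as one file is missing a row (as CSV2 below is missing 12:03:00), every later value lands next to the wrong timestamp. A key-based alternative is `join`. The following is a minimal sketch for two files, assuming two comma-separated columns with a header on line 1; `join_two_csvs` and its optional third argument (the fill string) are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Sketch: key-based alternative to paste for two CSV files.
# join_two_csvs is a hypothetical name. Usage: join_two_csvs LEFT RIGHT [FILL]
join_two_csvs() (
    left=$1
    right=$2
    fill=${3-}                       # string for missing values, default ""
    tmp=$(mktemp -d) || exit 1

    # join needs both inputs sorted on the key, without the header line.
    tail -n +2 "$left"  | LC_ALL=C sort -t, -k1,1 > "$tmp/left"
    tail -n +2 "$right" | LC_ALL=C sort -t, -k1,1 > "$tmp/right"

    # Header: the left header plus the right file's column name.
    printf '%s,%s\n' "$(head -n 1 "$left")" "$(head -n 1 "$right" | cut -d, -f2)"

    # -a 1 keeps every row of the left file, -e fills missing fields,
    # -o 0,1.2,2.2 prints the key and then column 2 of each file.
    LC_ALL=C join -t, -a 1 -e "$fill" -o 0,1.2,2.2 "$tmp/left" "$tmp/right"
    status=$?

    rm -rf "$tmp"
    exit "$status"
)
```

Calling it as `join_two_csvs file1 file2 null` would write `null` wherever the second file has no row for a timestamp. Merging more than two files this way means chaining calls, which is why a single awk pass tends to be simpler for a whole directory.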

The following PHP gets me the list of files in the directory; I am now trying to use PHP to combine/merge the CSVs.

$dir = $Folder . '/Stats/Latency/';   // source directory
$ar  = scandir($dir);
$box = $_POST['box'] ?? [];            // file list received from the form

// Copy each selected file into the report directory.
// each() was removed in PHP 8, so iterate with foreach instead.
foreach ($box as $val) {
    $val  = basename($val);            // drop any path components from user input
    $path = $dir . $val;
    $dest = $Folder . '/Report/Latency/' . $val;
    if (copy($path, $dest)) {
        echo "$val,";                  // list only the files that were actually copied
    }
}
echo "<hr>";

This is where I need the CSV merge. I am debating whether to use a shell exec command, but that seems very labor-intensive.

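$reportFiles = $Folder."/Report/Latency/";
foreach(glob($reportFiles."*.csv") as $file)
{
   // Placeholder for the merge step: create the report file the merged data should go into.
   shell_exec("touch ".escapeshellarg($reportFiles."latencyReport.csv"));

}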

As for the data in the CSV files:

CSV1:

date,vpool06
2016-03-28 12:00:00,0.000
2016-03-28 12:01:00,0.000
2016-03-28 12:02:00,0.000
2016-03-28 12:03:00,0.000
2016-03-28 12:04:00,0.000
2016-03-28 12:05:00,0.000
2016-03-28 12:06:00,0.000
2016-03-28 12:07:00,0.000
2016-03-28 12:08:00,0.000
2016-03-28 12:09:00,0.000
2016-03-28 12:10:00,0.000
2016-03-28 12:11:00,0.000
2016-03-28 12:12:00,0.000
2016-03-28 12:13:00,0.000
2016-03-28 12:14:00,0.000
2016-03-28 12:15:00,0.000
2016-03-28 12:16:00,0.000
2016-03-28 12:17:00,0.000
2016-03-28 12:18:00,0.000
2016-03-28 12:19:00,0.000

CSV2:

date,vpool02
2016-03-28 12:00:00,0.000
2016-03-28 12:01:00,0.000
2016-03-28 12:02:00,0.000
2016-03-28 12:04:00,0.000
2016-03-28 12:05:00,0.000
2016-03-28 12:06:00,0.000
2016-03-28 12:07:00,0.000
2016-03-28 12:08:00,0.000
2016-03-28 12:09:00,0.000
2016-03-28 12:10:00,0.000
2016-03-28 12:11:00,0.000
2016-03-28 12:12:00,0.000
2016-03-28 12:13:00,0.000
2016-03-28 12:14:00,0.000

CSV3:

date,vpool03
2016-03-28 12:00:00,0.000
2016-03-28 12:01:00,0.000
2016-03-28 12:02:00,0.000
2016-03-28 12:04:00,0.000
2016-03-28 12:05:00,0.000

Merged CSV:

date,vpool06,vpool02,vpool03
2016-03-28 12:00:00,0.000,0.000,0.000
2016-03-28 12:01:00,0.000,0.000,0.000
2016-03-28 12:02:00,0.000,0.000,0.000
2016-03-28 12:03:00,0.000,,
2016-03-28 12:04:00,0.000,0.000,0.000
2016-03-28 12:05:00,0.000,0.000,0.000
2016-03-28 12:06:00,0.000,0.000,
2016-03-28 12:07:00,0.000,0.000,
2016-03-28 12:08:00,0.000,0.000,
2016-03-28 12:09:00,0.000,0.000,
2016-03-28 12:10:00,0.000,0.000,
2016-03-28 12:11:00,0.000,0.000,
2016-03-28 12:12:00,0.000,0.000,
2016-03-28 12:13:00,0.000,0.000,
2016-03-28 12:14:00,0.000,0.000,
2016-03-28 12:15:00,0.000,,
2016-03-28 12:16:00,0.000,,
2016-03-28 12:17:00,0.000,,
2016-03-28 12:18:00,0.000,,
2016-03-28 12:19:00,0.000,,

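Ideally, I don't care whether there are "empty" values at this point, because they will not show up on the chart. An empty value means the server was down at that time.

Update: I need it to have null in the places where there is no data. Example: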

date,vpool06,7NA_01,7NA_02,bd01,bd02,vpool01,vpool02,vpool03,vpool04,vpool07
2016-03-28 12:00:00,1.000,null,10.00,02.00,20.00,0.00,0.00,0.00,0.00,0.000
2016-03-28 12:01:00,0.000,11.00,110.00,null,11.00,0.00,0.00,0.00,0.00,0.000
2016-03-28 12:02:00,0.000,null,0.00,2.00,100,0.00,0.00,0.00,0.00,0.000
2016-03-28 12:03:00,0.000,0.00,0.00,02.00,10.00,0.00,0.000,0.00,0.00,0.000
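
If a merged file already exists with empty fields, the update above (a literal null wherever data is missing) can be applied as a separate pass. This is a small sketch, assuming the header row defines the number of columns and that the text `null` is what the charting side expects; `fill_nulls` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: replace empty value fields with the literal text "null".
# fill_nulls is a hypothetical name; it reads the named files, or stdin.
fill_nulls() {
    awk -F, -v OFS=, '
        NR == 1 { ncols = NF; print; next }   # header sets the column count
        {
            # Column 1 is the key; fill empty or missing value columns.
            for (i = 2; i <= ncols; i++)
                if ($i == "") $i = "null"
            print
        }
    ' "$@"
}
```

For example, `fill_nulls merged > merged_with_nulls` would turn a row such as `2016-03-28 12:15:00,0.000,,` into `2016-03-28 12:15:00,0.000,null,null`.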

awk to the rescue!

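$ awk -F, -v OFS=, 'FNR==1{c++} {a[$1,c]=$2;keys[$1]}
                       END{for(k in keys)
                            {printf "%s", k;
                             for(i=1;i<=c;i++)
                                 printf "%s", OFS (((k,i) in a)?a[k,i]:"");
                             print ""}}' file{1,2,3} |
 sort -t, -k1,1 |
 # the header ("date,...") sorts after the timestamps, so move the last line to the top
 awk '{line[NR]=$0} END{print line[NR]; for(i=1;i<NR;i++) print line[i]}' > merged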

$ cat merged

date,vpool06,vpool02,vpool03
2016-03-28 12:00:00,0.000,0.000,0.000
2016-03-28 12:01:00,0.000,0.000,0.000
2016-03-28 12:02:00,0.000,0.000,0.000
2016-03-28 12:03:00,0.000,,
2016-03-28 12:04:00,0.000,0.000,0.000
2016-03-28 12:05:00,0.000,0.000,0.000
2016-03-28 12:06:00,0.000,0.000,
2016-03-28 12:07:00,0.000,0.000,
2016-03-28 12:08:00,0.000,0.000,
2016-03-28 12:09:00,0.000,0.000,
2016-03-28 12:10:00,0.000,0.000,
2016-03-28 12:11:00,0.000,0.000,
2016-03-28 12:12:00,0.000,0.000,
2016-03-28 12:13:00,0.000,0.000,
2016-03-28 12:14:00,0.000,0.000,
2016-03-28 12:15:00,0.000,,
2016-03-28 12:16:00,0.000,,
2016-03-28 12:17:00,0.000,,
2016-03-28 12:18:00,0.000,,
2016-03-28 12:19:00,0.000,,

I don't know how you would do this in PHP, but with GNU awk, using true 2D arrays and sorted_in, it would be:

$ cat tst.awk
BEGIN { FS=OFS="," }
FNR==1 { hdr[ARGIND][1]=$1; hdr[ARGIND][2]=$2; next }
{ arr[ARGIND][$1] = $2 }
END {
    for (idx in arr) {
        numRows = length(arr[idx])
        if (numRows > maxRows) {
            maxRows = numRows
            maxIdx  = idx
        }
    }

    printf "%s%s%s", hdr[maxIdx][1], OFS, hdr[maxIdx][2]
    for (idx=1; idx<=ARGIND; idx++) {
        if (idx != maxIdx) {
            printf "%s%s", OFS, hdr[idx][2]
        }
    }
    print ""

    PROCINFO["sorted_in"] = "@ind_str_asc"
    for (tstamp in arr[maxIdx]) {
        printf "%s%s%s", tstamp, OFS, arr[maxIdx][tstamp]
        for (idx=1; idx<=ARGIND; idx++) {
            if (idx != maxIdx) {
                printf "%s%s", OFS, (tstamp in arr[idx] ? arr[idx][tstamp] : "null")
            }
        }
        print ""
    }
}

$ awk -f tst.awk csv3 csv2 csv1
date,vpool06,vpool03,vpool02
2016-03-28 12:00:00,0.000,0.000,0.000
2016-03-28 12:01:00,0.000,0.000,0.000
2016-03-28 12:02:00,0.000,0.000,0.000
2016-03-28 12:03:00,0.000,null,null
2016-03-28 12:04:00,0.000,0.000,0.000
2016-03-28 12:05:00,0.000,0.000,0.000
2016-03-28 12:06:00,0.000,null,0.000
2016-03-28 12:07:00,0.000,null,0.000
2016-03-28 12:08:00,0.000,null,0.000
2016-03-28 12:09:00,0.000,null,0.000
2016-03-28 12:10:00,0.000,null,0.000
2016-03-28 12:11:00,0.000,null,0.000
2016-03-28 12:12:00,0.000,null,0.000
2016-03-28 12:13:00,0.000,null,0.000
2016-03-28 12:14:00,0.000,null,0.000
2016-03-28 12:15:00,0.000,null,null
2016-03-28 12:16:00,0.000,null,null
2016-03-28 12:17:00,0.000,null,null
2016-03-28 12:18:00,0.000,null,null
2016-03-28 12:19:00,0.000,null,null
