Concatenate CSV files in PowerShell, without the first line (except for the first file)
I have multiple *.csv files. I want to concatenate them into a single CSV file in a PowerShell script. All CSV files have the same header (the first line), so when I concatenate them I want to keep the first line only from the first file.
How can I do that?
Note: The solution in this answer intentionally uses plain-text processing to process the files, for two reasons:
Use of Import-Csv and Export-Csv incurs significant processing overhead (though that may not matter in a given situation); plain-text processing is significantly faster.
In Windows PowerShell and PowerShell [Core] 6.x, the output will invariably have double-quoted column values, even if they weren't initially (though that should normally not matter).
In PowerShell [Core] 7.0+, Export-Csv and ConvertTo-Csv now have a -UseQuotes parameter that allows you to control quoting in the output.
That said, Import-Csv and Export-Csv are certainly the better choice whenever you need to read and interpret the data (as opposed to just copying it elsewhere); see Sid's helpful answer.
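To illustrate the quoting control available in PowerShell 7.0+, here is a minimal sketch using ConvertTo-Csv with -UseQuotes AsNeeded. The sample objects and their property names are made up for illustration; the parameter does not exist in Windows PowerShell or PowerShell [Core] 6.x.

```powershell
# Requires PowerShell 7.0 or later (-UseQuotes is not available in earlier versions).
# Hypothetical sample data, for illustration only.
$rows = @(
    [pscustomobject]@{ Name = 'Alice'; City = 'New York' }
    [pscustomobject]@{ Name = 'Bob';   City = 'Paris, France' }
)

# AsNeeded quotes only the fields that require it,
# such as the value that contains the delimiter (a comma).
$csvLines = $rows | ConvertTo-Csv -UseQuotes AsNeeded
$csvLines
```

The same parameter can be passed to Export-Csv when writing directly to a file.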
# The single output file.
# Note: Best to save this in a different folder than the input
# folder, in case you need to run multiple times.
$outFile = 'outdir/out.csv'
# Get all input CSV files as an array of file-info objects,
# from the current dir. in this example
$inFiles = @(Get-ChildItem -Filter *.csv)
# Extract the header line (column names) from the first input file
# and write it to the output file.
Get-Content $inFiles[0] -First 1 | Set-Content -Encoding Utf8 $outFile
# Process all input files and append their *data* rows to the
# output file (that is, skip the header row).
# NOTE: If you only wanted to extract a given count $count of data rows
# from each file, add -First ($count+1) to the Get-Content call.
foreach ($file in $inFiles) {
  # Skip each file's header line and append the remaining rows to the output file.
  Get-Content $file.FullName | Select-Object -Skip 1 |
    Add-Content -Encoding Utf8 $outFile
}
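The -First ($count+1) variant mentioned in the comments above could look roughly like the following sketch. It assumes two made-up sample files (a.csv and b.csv) in a temporary folder; the file names, their contents, and the value of $count are illustrative only.

```powershell
# Hypothetical setup: two small CSV files with the same header in a temporary folder.
$inDir = Join-Path ([IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
$null = New-Item -ItemType Directory -Path $inDir
Set-Content -LiteralPath (Join-Path $inDir 'a.csv') -Value 'Id,Name', '1,one', '2,two', '3,three'
Set-Content -LiteralPath (Join-Path $inDir 'b.csv') -Value 'Id,Name', '4,four', '5,five', '6,six'

# Write the output outside the input folder so it is never picked up as an input.
$outFile = Join-Path ([IO.Path]::GetTempPath()) ('out-{0}.csv' -f [guid]::NewGuid())
$count = 2   # number of data rows to take from each file (illustrative value)

$inFiles = @(Get-ChildItem -LiteralPath $inDir -Filter *.csv | Sort-Object Name)

# Header from the first file only.
Get-Content -LiteralPath $inFiles[0].FullName -First 1 | Set-Content -Encoding Utf8 $outFile

foreach ($file in $inFiles) {
    # -First ($count + 1) reads the header plus $count data rows;
    # Select-Object -Skip 1 then drops the header.
    Get-Content -LiteralPath $file.FullName -First ($count + 1) | Select-Object -Skip 1 |
        Add-Content -Encoding Utf8 $outFile
}

$result = @(Get-Content -LiteralPath $outFile)
$result
```

With the sample files above, the output file holds the header followed by the first two data rows of each input file.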
Note the use of -Encoding Utf8 as an example; adjust as needed. By default, Set-Content will use "ANSI" encoding in Windows PowerShell, and BOM-less UTF-8 in PowerShell Core.
Caveat: By doing line-by-line plain-text processing, you're relying on each text line representing a single CSV data row; this is typically true, but doesn't have to be (for example, a double-quoted field may contain an embedded line break).
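A small sketch of how the two views can differ, using a made-up file whose single data row contains a quoted field that spans two text lines (the file name and contents are hypothetical):

```powershell
# Hypothetical sample: one data row whose quoted "Note" field spans two text lines.
$tmp = Join-Path ([IO.Path]::GetTempPath()) ('multiline-{0}.csv' -f [guid]::NewGuid())
Set-Content -LiteralPath $tmp -Value 'Id,Note', '1,"first line', 'second line"'

$lineCount = @(Get-Content -LiteralPath $tmp).Count   # text lines, header included
$rowCount  = @(Import-Csv -LiteralPath $tmp).Count    # parsed CSV data rows

"Text lines: $lineCount; CSV data rows: $rowCount"
Remove-Item -LiteralPath $tmp
```

Here the file has three text lines but only one data row, so anything that counts lines (such as -First) would not be counting rows for a file like this.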
Conversely, if performance is paramount, the plain-text approach above could be made significantly faster with direct use of .NET methods such as [IO.File]::ReadLines() or, if the files are small enough, even [IO.File]::ReadAllLines().
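One possible shape for the .NET-based variant is sketched below; it is not taken from the original answer. It assumes PowerShell 5 or later (for the ::new() syntax), made-up sample files in a temporary folder, and BOM-less UTF-8 output. Full paths are passed to the .NET methods because .NET's working directory usually differs from PowerShell's current location.

```powershell
# Hypothetical setup: two small CSV files with the same header in a temporary folder.
$inDir = Join-Path ([IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
$null = New-Item -ItemType Directory -Path $inDir
Set-Content -LiteralPath (Join-Path $inDir 'a.csv') -Value 'Id,Name', '1,one', '2,two'
Set-Content -LiteralPath (Join-Path $inDir 'b.csv') -Value 'Id,Name', '3,three', '4,four'
$outFile = Join-Path ([IO.Path]::GetTempPath()) ('out-{0}.csv' -f [guid]::NewGuid())

$inFiles = @(Get-ChildItem -LiteralPath $inDir -Filter *.csv | Sort-Object Name)

# Overwrite (do not append to) the output file, writing UTF-8 without a BOM.
$writer = [IO.StreamWriter]::new($outFile, $false, [Text.UTF8Encoding]::new($false))
try {
    $isFirstFile = $true
    foreach ($file in $inFiles) {
        $isFirstLine = $true
        # ReadLines() enumerates lazily, so a file is never held in memory as a whole.
        foreach ($line in [IO.File]::ReadLines($file.FullName)) {
            # Keep the header line only from the first file.
            if ($isFirstFile -or -not $isFirstLine) { $writer.WriteLine($line) }
            $isFirstLine = $false
        }
        $isFirstFile = $false
    }
}
finally {
    $writer.Dispose()   # flushes and closes the output file
}

$merged = [IO.File]::ReadAllLines($outFile)
$merged
```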
You could do it like this:
(Get-ChildItem -Path $path -Filter *.csv).FullName | Import-Csv | Export-Csv $path\concatenated.csv -NoTypeInformation
Where $path is the folder that contains the CSV files. The final CSV file will be in the same folder, so move or delete it before running the command again; otherwise it would be read as an input too.