
Using PowerShell to open large CSV files and export specific columns

I'm using the command below to get the first 1,000 rows of data:

Get-Content -First 1000 'C:\Users\Inspiron\Desktop\base.csv' | Out-File 'C:\Users\Inspiron\Desktop\sample.csv'

However, how can I adapt this command to obtain a range of rows? For example, to extract rows 700 through 900.

Another thing: how can I export only a few variables? For example, my database has 120 columns, but I want to save only the variables year (1st column), date of birth (4th column) and state of origin (100th column).

I would suggest using Import-Csv instead of Get-Content.

One way you could achieve this would be to use something like:

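$csv = Import-Csv $csvPath
# Indexes are zero-based: 700..900 selects the 701st through 901st data rows.
# Use 699..899 if you want data rows 700 through 900.
$rangeselect = $csv[700..900]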

From a quick test (selecting a much smaller range myself), you should expect a result similar to:

test test1 test2
---- ----- -----
8    9     10   
9    10    11   
10   11    12   
11   12    13   
12   13    14   
13   14    15   
14   15    16   
15   16    17   
16   17    18

With regard to selecting specific columns, you can also achieve this with Import-Csv. Using the above as an example, you could add:

$specificcol = $rangeselect.test1

You can see that .test1 specifically targets one column; apply the same pattern to whichever column you are trying to grab.

There are a few ways to go about this. Building on Mathew's helpful answer:

$InCsv  = 'C:\Users\Inspiron\Desktop\base.csv'
$OutCsv = 'C:\Users\Inspiron\Desktop\sample.csv'
$Props  = "year","date of birth","state of origin"

$csv = Import-Csv $InCsv   # keep the path in $InCsv; the parsed rows go in $csv
$rangeselect = $csv[700..900] # You could use variables here too...

$rangeselect | 
Select-Object $Props |
Export-Csv -Path $OutCsv -NoTypeInformation

This takes the additional step of selecting the properties you want and re-exporting them to a new CSV file.
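
The question identifies columns by position (1st, 4th, 100th), while Select-Object works with property names. As a rough sketch, the names can be looked up by position from the first imported row. The sample data below is hypothetical and has only five columns, so the third wanted column sits at index 4 rather than 99:

```powershell
# Hypothetical in-memory sample standing in for the real file
$rows = @'
year,name,city,date of birth,state of origin
1990,A,X,1970-01-01,SP
1991,B,Y,1971-02-02,RJ
'@ | ConvertFrom-Csv

# Column names in file order, read from the first row
$names = $rows[0].PSObject.Properties.Name

# Zero-based positions of the wanted columns (1st, 4th and 5th in this sample)
$Props = $names[0, 3, 4]

$rows | Select-Object -Property $Props
```

For the 120-column file described in the question, the positions would presumably be 0, 3 and 99.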

Note: It's unlikely, but if you are working with very large files this approach may have memory issues. It reads the entire file into memory upfront, storing it in the $csv variable. This could also happen if the system is memory bound, but that's infrequent.

Technically you don't need to assign the $rangeselect variable; you can use the range operator ".." directly on the Import-Csv command, like:

(Import-Csv $InCsv)[700..900] | 
Select-Object $Props |
Export-Csv -Path $OutCsv -NoTypeInformation

Here, the parentheses (...) make Import-Csv finish reading all the CSV data before the result is indexed, so it should work about the same.


If you want to build on the initial sample, which has the advantage of only reading the first 1000 lines and so most likely bypasses any memory constraints:

$InCsv  = 'C:\Users\Inspiron\Desktop\base.csv'
$OutCsv = 'C:\Users\Inspiron\Desktop\sample.csv'
$Props  = "year","date of birth","state of origin"
$Skip   = 700   # skips data rows 1 through 700
$First  = 200   # then keeps data rows 701 through 900

Get-Content -First 1000 $InCsv | 
ConvertFrom-Csv |
Select-Object -Skip $Skip -First $First -Property $Props |
Export-Csv -Path $OutCsv -NoTypeInformation

This is effectively a one-liner with a few convenience variables. It takes advantage of the parameters in Select-Object. Note that it, too, only returns the properties you ask for, and so will output a new CSV file with only that data.


You can also combine these approaches, again because Select-Object allows for -Skip, -First and also -Last parameters for some rudimentary initial filtering. That might look something like:

$InCsv  = 'C:\Users\Inspiron\Desktop\base.csv'
$OutCsv = 'C:\Users\Inspiron\Desktop\sample.csv'
$Props  = "year","date of birth","state of origin"
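$Skip   = 699   # skips data rows 1 through 699
$First  = 200   # then keeps data rows 700 through 899; use 201 to include row 900

Import-Csv $InCsv |
Select-Object -Skip $Skip -First $First -Property $Props |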
Export-Csv -Path $OutCsv -NoTypeInformation

In this example you may have to play with the boundaries. But it's still effectively a one-liner and has the potential to get what you're looking for.
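
To make the boundary arithmetic concrete, here is a minimal sketch on a hypothetical ten-row sample. The names $from and $to are illustrative, not part of the original answer; they hold 1-based, inclusive data-row numbers:

```powershell
# Hypothetical ten-row sample; 'row' numbers the data rows starting at 1
$rows = 1..10 | ForEach-Object { [pscustomobject]@{ row = $_; year = 2000 + $_ } }

$from = 3   # first data row wanted (1-based)
$to   = 5   # last data row wanted (1-based, inclusive)

# Skip everything before $from, then keep ($to - $from + 1) rows
$picked = $rows | Select-Object -Skip ($from - 1) -First ($to - $from + 1)

$picked.row -join ','   # 3,4,5
```

By the same arithmetic, data rows 700 through 900 inclusive would be -Skip 699 -First 201.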

Note: Select-Object can tell the command on the left side of the pipe to stop sending data. However, I'm not sure every cmdlet reacts properly, hence performance may vary compared to the Get-Content approach. That may only be important if you're dealing with larger files; otherwise I'd go with whatever approach is considered more readable and/or maintainable...
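
One way to observe the early stop is to count how many items the upstream stage actually handles. This is only a sketch with a generated number sequence instead of a CSV file, and it assumes PowerShell 3.0 or later; the $script:seen counter is a name made up for the illustration:

```powershell
# Counter incremented each time the upstream stage handles an item
$script:seen = 0

$result = 1..1000000 |
    ForEach-Object { $script:seen++; $_ } |
    Select-Object -First 3

"Returned $($result.Count) items; upstream processed $script:seen"
```

If the pipeline is stopped early, the processed count should stay close to the number requested rather than reaching 1000000. Whether a file-reading cmdlet such as Import-Csv benefits to the same degree is something to measure on your own data.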
