Slight error when using awk to remove spaces from a CSV column

Question

I used the following awk command in my bash script to delete spaces in the 26th column of my CSV:

awk 'BEGIN{FS=OFS="|"} {gsub(/ /,"",$26)}1' original.csv > final.csv

Out of 400 rows, there are about 5 seemingly random rows where this doesn't work, even if I rerun the script on final.csv. Can anyone suggest a way to take care of this? Thank you in advance.

EDIT: Here is a sample of the 26th column in original.csv (left) and final.csv (right):

2212026837                         2212026837
2256  41688  6                     2256416886
2076113566                         2076113566
2009  84517  7                     2009845177
2067950476                         2067950476
2057  90531  5                     2057  90531  5  
2085271676                         2085271676
2095183426                         2095183426
2347366235                         2347366235
2200160434                         2200160434
2229359595                         2229359595
2045373466                         2045373466
2053849895                         2053849895
2300  81552  3                     2300  81552  3

Answer 1

You can use the string function split and iterate over the resulting array to rebuild the 26th field. Because it splits on [[:space:]]+, this removes tabs and other whitespace as well, not just spaces:

awk 'BEGIN{FS=OFS="|"} {
    n = split($26, a, /[[:space:]]+/)
    $26=a[1]
    for(i=2; i<=n; i++)
        $26=$26""a[i]
}1' original.csv > final.csv
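
For comparison, the same split-and-rejoin idea can be sketched in Python. This is only an illustration under assumptions: the helper name strip_whitespace is hypothetical and not part of the answer above, and Python's \s matches a broader set of characters (including the no-break space) than awk's [[:space:]] does in the C locale.

```python
import re

def strip_whitespace(field):
    """Split on runs of whitespace and glue the pieces back together."""
    parts = re.split(r"\s+", field)   # comparable to split($26, a, /[[:space:]]+/)
    return "".join(parts)             # comparable to the loop concatenating a[1..n]

if __name__ == "__main__":
    print(strip_whitespace("2057  90531  5"))
```

Leading or trailing whitespace only produces empty pieces, so it disappears in the join as well.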

Answer 2

I see two possibilities.

The simplest is that you have some whitespace other than a plain space, such as a tab. You can fix that by using a more general regex in your gsub: instead of / /, use /[[:space:]]/.

If that solves your problem, great! You got lucky, move on. :)
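
To find out which character is actually left in the stubborn rows, it can help to print the field in escaped form. The following is a minimal sketch, not part of the original answer: the function name describe_whitespace is hypothetical, and it assumes the field has already been read into a Python string.

```python
import unicodedata

def describe_whitespace(field):
    """Return (escaped char, Unicode name) for each whitespace character in field."""
    found = []
    for ch in field:
        if ch.isspace():
            # Control characters such as tab have no Unicode name, hence the default.
            found.append((ascii(ch), unicodedata.name(ch, "UNNAMED")))
    return found

if __name__ == "__main__":
    # A plain space, a tab and a no-break space look alike on screen.
    for item in describe_whitespace("2057 90531\t5\u00a0"):
        print(item)
```

If this reports something other than SPACE or a tab, for example NO-BREAK SPACE, then /[[:space:]]/ may not match it either, and the pattern would need to name that character explicitly.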

The other possible problem is trickier. The CSV (or, in this case, pipe-SV) format is not as simple as it appears, since a field can be quoted and can then contain the delimiter itself. This, for instance, is a perfectly valid 4-field line in a pipe-delimited file:
```
field 1|"field 2 contains some |pipe| characters"|field 3|field 4
```
If the first 4 fields on a line in your file looked like that, awk would count 6 fields there, so your gsub on $26 would actually operate on the real 24th field and leave the real 26th field alone. If you have data like that, the only real solution is to use a scripting language with an actual CSV parsing library. Perl has Text::CSV, but it's not installed by default; Python's csv module is, so you could use a program like so:
```
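import csv, fileinput as fi, re, sys

out = csv.writer(sys.stdout, delimiter='|', lineterminator='\n')  # re-quotes fields that contain |
for row in csv.reader(fi.input(), delimiter='|'):
    row[25] = re.sub(r'\s+', '', row[25])  # fields start at 0 instead of 1
    out.writerow(row)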
```
Save the above in a file like colfixer.py and run it with python colfixer.py original.csv >final.csv.
(If you tried hard enough, you could get that shoved into a -c option string and run it from the command line without creating a script file, but Python's not really built for that and it gets ugly fast.)
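
The field-counting problem with quoted delimiters can be checked directly. This is a small sketch for illustration only: it uses the sample line from this answer, and line.split("|") stands in for what awk does with FS="|".

```python
import csv

line = 'field 1|"field 2 contains some |pipe| characters"|field 3|field 4'

naive = line.split("|")                            # splits on every pipe, quoted or not
parsed = next(csv.reader([line], delimiter="|"))   # honours the double quotes

print(len(naive))    # number of fields a plain split sees
print(len(parsed))   # number of fields the CSV parser sees
print(parsed[1])     # the quoted field, pipes intact
```

The plain split yields two more fields than the CSV parser, which is why $26 would land two columns early on such a line.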

Answer 1
1 2015-07-21 10:58:04

Answer 2
1 Accepted 2015-07-21 11:08:12
