Filter a .csv file by a range of values in a specific column, without using awk or sed

Question

I have a CSV file where the data is stored like this, with a space as the delimiting character:

181.221.132.87 2020-03-01T06:22:47.775Z "GET / HTTP/1.1" 200 1 "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko"

I have to print all the lines where the 5th column (in this case the column with the value "1") has a value greater than 5. The catch is that I am limited in the Unix commands I can use, and I have been told that I specifically cannot use awk or sed. Anything that cannot be accomplished with the list of commands provided to us must be implemented with custom C programs; however, the emphasis is on using custom programs as little as possible.

The Unix commands I can use are as follows: cat curl cut echo exec egrep find grep head ls paste printf sort tail tr uniq wc

Sorry if a similar question has been asked before, but I cannot find a starting point that doesn't include awk or sed.

EDIT:

{ egrep ' [5-9] ' file.csv; egrep ' [0-9]{2} ' file.csv; }

The above command seems to give the correct output; however, I feel there is a better solution.

Answer 1

Here's what I came up with!

egrep --color '^(("[^"]*"|[^"]\S*)\s+){4}([1-9][0-9]|[6-9])' file.csv

Explanation

^ is the start of the line.
(("[^"]*"|[^"]\S*)\s+) is one cell followed by its trailing whitespace. The cell has two possible forms:
- "[^"]*" is a string cell, surrounded by quotes, which cannot contain any quotes in its body
- [^"]\S* is a normal cell, which does not start with a quote and runs up to the next whitespace (\s is a whitespace character, \S is a non-whitespace character)
{4} repeats that 4 times, for the first 4 cells.
([1-9][0-9]|[6-9]) is your number, again with two possible forms:
- [1-9][0-9] matches the first two digits of any number of 10 or more
- [6-9] matches a leading digit from 6 to 9, which covers the single-digit values greater than 5

As for the --color flag, it highlights the matched text in the output. That makes it easier, when you're building a regex on the go, to see exactly what is matched.

On some systems --color is enabled by default, so you might not see a difference.
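To try the pattern on a few lines, a minimal sketch follows. The sample lines are made up for illustration (only the first comes from the question), and the variable names are hypothetical. It also spells \s and \S as POSIX bracket expressions, on the assumption that your grep may not support those backslash shorthands; this makes the unquoted-cell branch slightly stricter, since it can no longer start with whitespace. grep -E is used here as the equivalent of egrep.

```shell
# Hypothetical sample data: the 5th column is 1, 6 and 12 respectively.
sample='181.221.132.87 2020-03-01T06:22:47.775Z "GET / HTTP/1.1" 200 1 "Mozilla/5.0 (Windows NT 6.1)"
10.0.0.2 2020-03-01T06:23:00.000Z "GET /a HTTP/1.1" 200 6 "curl/7.68.0"
10.0.0.3 2020-03-01T06:24:00.000Z "GET /b HTTP/1.1" 404 12 "curl/7.68.0"'

# Same structure as the regex above: 4 cells (quoted or unquoted), then a
# number starting with 6-9 or with two digits.
pattern='^(("[^"]*"|[^"[:space:]][^[:space:]]*)[[:space:]]+){4}([1-9][0-9]|[6-9])'

# Keep only the matching lines; the line whose 5th column is 1 is dropped.
result=$(printf '%s\n' "$sample" | grep -E "$pattern")
printf '%s\n' "$result"
```

With this sample, only the 10.0.0.2 and 10.0.0.3 lines should be printed.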

Answer 2

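Without grep

while IFS= read -r line
do
  # Keep the text outside the quotes, squeeze repeated spaces, take the 4th field.
  v=$(printf '%s\n' "$line" | cut -d'"' --output-delimiter=' ' -f1,3 | tr -s ' ' | cut -f4 -d' ')
  if [ "$v" -gt 5 ]
  then
    printf '%s\n' "$line"
  fi
done < log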

Read the file line by line with while read line.

Split each line on " with cut: the odd-numbered fields are the text outside the quotes, and the even-numbered fields are the values inside the quotes.

cut -d'"' --output-delimiter=' ' -f1,3

gives (with a run of spaces left where the quoted field was removed)

181.221.132.87 2020-03-01T06:22:47.775Z   200 1

Squeeze the repeated spaces with tr

cut -d'"' --output-delimiter=' ' -f1,3 | tr -s ' '

gives

181.221.132.87 2020-03-01T06:22:47.775Z 200 1

Get the field at position 4 with cut

cut -d'"' --output-delimiter=' ' -f1,3 | tr -s ' '|cut -f4 -d' '

gives

1

Then compare the value against 5 with the shell's own test: [ "$v" -gt 5 ]
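The steps above can be wrapped in a small function; this is only a sketch under a few assumptions. The name filter_gt and the threshold parameter are hypothetical, the sample lines are made up (only the first comes from the question), and the GNU-only --output-delimiter option is swapped for a plain tr so that it may also work with other cut implementations. It also skips lines whose extracted value is not a plain number, so the -gt test does not print an error on them.

```shell
# filter_gt THRESHOLD: read log lines on stdin, print those whose 5th
# column is greater than THRESHOLD. (Hypothetical helper, not from the answer.)
filter_gt() {
  threshold=$1
  while IFS= read -r line; do
    # Drop the first quoted field, turn the remaining quote into a space,
    # squeeze repeated spaces, then take the 4th space-separated field.
    v=$(printf '%s\n' "$line" | cut -d'"' -f1,3 | tr '"' ' ' | tr -s ' ' | cut -d' ' -f4)
    # Skip lines where the value is empty or not made of digits only.
    case "$v" in
      ''|*[!0-9]*) continue ;;
    esac
    if [ "$v" -gt "$threshold" ]; then
      printf '%s\n' "$line"
    fi
  done
}

# Hypothetical sample data: the 5th column is 1, 6 and 12 respectively.
sample='181.221.132.87 2020-03-01T06:22:47.775Z "GET / HTTP/1.1" 200 1 "Mozilla/5.0 (Windows NT 6.1)"
10.0.0.2 2020-03-01T06:23:00.000Z "GET /a HTTP/1.1" 200 6 "curl/7.68.0"
10.0.0.3 2020-03-01T06:24:00.000Z "GET /b HTTP/1.1" 404 12 "curl/7.68.0"'

result=$(printf '%s\n' "$sample" | filter_gt 5)
printf '%s\n' "$result"
```

On a real file the call would look like filter_gt 5 < log.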
