简体   繁体   English

如何将 sed 应用于 grep 导致 bash 脚本?

[英]How to apply sed to a grep result in a bash script?

I want to modify a csv file with some sed commands but only on the rows that match certain regex.我想用一些 sed 命令修改 csv 文件,但仅限于与某些正则表达式匹配的行。

I have a grep command that works fine in a script:我有一个在脚本中运行良好的 grep 命令:

#!/usr/bin/bash   
   
egrep  '^[A-Z][a-z]*,2018' happiness.csv

And the desired sed commands that work correctly:以及正常工作的所需 sed 命令:

#!/usr/bin/bash   
       
sed  -re '

 s/(^|,)(,|$)/\1NULL\2/g; s/(^|,)(,|$)/\1NULL\2/g
 s/[a-z]/\U&/g
 s/([0-9]+\.[0-9]{2})[0-9]+/\1/g
  
' happiness.csv

When I combine them together in one script the grep command gets omitted and the script runs only the sed commands:当我在一个脚本中将它们组合在一起时,grep 命令被省略,脚本仅运行 sed 命令:

#!/usr/bin/bash   
   
egrep  '^[A-Z][a-z]*,2018' happiness.csv

sed  -re '

 s/(^|,)(,|$)/\1NULL\2/g; s/(^|,)(,|$)/\1NULL\2/g
 s/[a-z]/\U&/g
 s/([0-9]+\.[0-9]{2})[0-9]+/\1/g
  
' happiness.csv

Can anyone help?任何人都可以帮忙吗?

Sample data:样本数据:

Country name,Year,Life Ladder,Log GDP per capita,Social support,Healthy life expectancy at birth,Freedom to make life choices,Generosity,Perceptions of corruption,Positive affect,Negative affect,Confidence in national government,Democratic Quality,Delivery Quality,Standard deviation of ladder by country-year,Standard deviation/Mean of ladder by country-year,GINI index (World Bank estimate),"GINI index (World Bank estimate), average 2000-16","gini of household income reported in Gallup, by wp5-year","Most people can be trusted, Gallup","Most people can be trusted, WVS round 1981-1984","Most people can be trusted, WVS round 1989-1993","Most people can be trusted, WVS round 1994-1998","Most people can be trusted, WVS round 1999-2004","Most people can be trusted, WVS round 2005-2009","Most people can be trusted, WVS round 2010-2014"
Afghanistan,2008,3.723589897,7.168690205,0.450662315,50.79999924,0.718114316,0.177888572,0.88168633,0.517637193,0.25819549,0.61207211,-1.929689646,-1.655084372,1.774661899,0.476599723,,,,,,,,,,
Afghanistan,2009,4.401778221,7.333789825,0.55230844,51.20000076,0.678896368,0.200178429,0.850035429,0.583925605,0.23709242,0.611545205,-2.044092655,-1.635024786,1.722687602,0.391361743,,,0.441905767,0.286315262,,,,,,
Afghanistan,2018,4.75838089,7.386628628,0.539075196,51.59999847,0.60012722,0.13435255,0.706766069,0.61826545,0.275323808,0.299357414,-1.991810083,-1.617176056,1.878621817,0.394802749,,,0.327318162,0.275832713,,,,,,
Afghanistan,2011,3.83171916,7.415018559,0.521103561,51.91999817,0.495901406,0.172136664,0.731108546,0.611387312,0.267174691,0.307385713,-1.919018269,-1.616221189,1.78535974,0.465942234,,,0.336764246,,,,,,,
Afghanistan,2012,3.782937527,7.517126083,0.520636737,52.24000168,0.530935049,0.244272724,0.775619805,0.710384727,0.267919123,0.435440153,-1.842995763,-1.40407753,1.798283219,0.47536689,,,0.344539613,,,,,,,
Afghanistan,2013,3.572100401,7.522237778,0.48355186,52.56000137,0.577955365,0.070402659,0.8232041,0.620584846,0.273328096,0.482847273,-1.879708767,-1.403035522,1.223689914,0.342568725,,,0.304368466,,,,,,,
Afghanistan,2014,3.130895615,7.516955376,0.525568426,52.88000107,0.508514047,0.113184482,0.871241987,0.531691492,0.374860734,0.409047514,-1.773256779,-1.312502503,1.395396113,0.445685923,,,0.413973927,,,,,,,
Afghanistan,2015,3.982854605,7.500538826,0.528597236,53.20000076,0.388927579,0.089090675,0.880638301,0.553553164,0.339276046,0.260557145,-1.84436357,-1.29159379,2.16061759,0.542479634,,,0.59691757,,,,,,,
Albania,2018,4.220168591,7.497038364,0.559071779,53,0.522566199,0.051364917,0.793245554,0.564952672,0.348332286,0.324989557,-1.855426311,-1.392712831,1.796219468,0.42562741,,,0.418629497,,,,,,,

Desired output:所需的 output:

COUNTRY NAME,YEAR,LIFE LADDER,LOG GDP PER CAPITA,SOCIAL SUPPORT,HEALTHY LIFE EXPECTANCY AT BIRTH,FREEDOM TO MAKE LIFE CHOICES,GENEROSITY,PERCEPTIONS OF CORRUPTION,POSITIVE AFFECT,NEGATIVE AFFECT,CONFIDENCE IN NATIONAL GOVERNMENT,DEMOCRATIC QUALITY,DELIVERY QUALITY,STANDARD DEVIATION OF LADDER BY COUNTRY-YEAR,STANDARD DEVIATION/MEAN OF LADDER BY COUNTRY-YEAR,GINI INDEX (WORLD BANK ESTIMATE),"GINI INDEX (WORLD BANK ESTIMATE), AVERAGE 2000-16","GINI OF HOUSEHOLD INCOME REPORTED IN GALLUP, BY WP5-YEAR","MOST PEOPLE CAN BE TRUSTED, GALLUP","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 1981-1984","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 1989-1993","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 1994-1998","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 1999-2004","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 2005-2009","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 2010-2014" 

AFGHANISTAN,2018,2.69,7.49,-0.50,52.59,0.37,-0.08,0.92,0.42,0.40,0.36,NULL,NULL,1.40,0.52,NULL,NULL,0.29,NULL,NULL,NULL,NULL,NULL,NULL,
ALBANIA,2018,4.63,9.07,-0.82,65.80,0.52,-0.01,0.87,0.55,0.24,0.30,-0.04,-0.42,1.76,0.38,NULL,0.30,NULL,NULL,NULL,NULL,0.24,0.23,NULL,       ARGENTINA,2018,5.48,9.16,-0.83,66.19,0.52,-0.16,0.86,0.64,0.27,NULL,0.04,-0.26,1.91,0.34,NULL,0.30,0.61,0.11,NULL,NULL,0.24,0.23,NULL,
    

You can search using same regex that you have in egrep and make sure to group all substitution commands:您可以使用egrep中的相同正则表达式进行搜索,并确保对所有替换命令进行分组:

sed -nE '/^[A-Z][a-z]*,2018/ {
s/(^|,)(,|$)/\1NULL\2/g; s/(^|,)(,|$)/\1NULL\2/g
s/[a-z]+/\U&/g
s/([0-9]+\.[0-9]{2})[0-9]+/\1/gp
}' happiness.csv

AFGHANISTAN,2018,4.75,7.38,0.53,51.59,0.60,0.13,0.70,0.61,0.27,0.29,-1.99,-1.61,1.87,0.39,NULL,NULL,0.32,0.27,NULL,NULL,NULL,NULL,NULL,NULL
ALBANIA,2018,4.22,7.49,0.55,53,0.52,0.05,0.79,0.56,0.34,0.32,-1.85,-1.39,1.79,0.42,NULL,NULL,0.41,NULL,NULL,NULL,NULL,NULL,NULL,NULL

I'm not a bash pro but this should work:我不是 bash 专业人士,但这应该可以:

#!/usr/bin/bash   

grep_res=$(egrep  '^[Aa]+.*,2018' happiness.csv)

echo "$grep_res" | sed  -re '
s/(^|,)(,|$)/\1NULL\2/g; s/(^|,)(,|$)/\1NULL\2/g
s/[a-z]/\U&/g
s/([0-9]+\.[0-9]{2})[0-9]+/\1/g
' 

What it does is it saves output of the grep at grep_res variable and than feeds sed command with it.它的作用是将 grep 的 output 保存在grep_res变量中,然后用它来输入 sed 命令。

Here is the same solution with standard Linux awk (gawk) script.这是与标准 Linux awk (gawk) 脚本相同的解决方案。

Includes dealing with first line.包括处理第一行。

script.awk脚本.awk

{ $0 = toupper($0);}   #Upper case each incoming line
/^[A-Z]*,2018/ || NR == 1 {    # deal with first line or matching with /^[A-Z]*,2018/
  $0 = gensub(/([,])([,]|$)/, "\\1NULL\\2", "g", $0); # replace ,, with ,NULL,
  $0 = gensub(/([,])([,]|$)/, "\\1NULL\\2", "g", $0); # replace remaining ,, with ,NULL,
  $0 = gensub(/([0-9]+.[0-9])([0-9])([0-9])*/, "\\1\\2", "g", $0); # trim decimal point numbers
  print $0; # print output line
}

running跑步

 awk -f script.awk happiness.csv

output output

$ awk -f script.awk input.csv
COUNTRY NAME,YEAR,LIFE LADDER,LOG GDP PER CAPITA,SOCIAL SUPPORT,HEALTHY LIFE EXPECTANCY AT BIRTH,FREEDOM TO MAKE LIFE CHOICES,GENEROSITY,PERCEPTIONS OF CORRUPTION,POSITIVE AFFECT,NEGATIVE AFFECT,CONFIDENCE IN NATIONAL GOVERNMENT,DEMOCRATIC QUALITY,DELIVERY QUALITY,STANDARD DEVIATION OF LADDER BY COUNTRY-YEAR,STANDARD DEVIATION/MEAN OF LADDER BY COUNTRY-YEAR,GINI INDEX (WORLD BANK ESTIMATE),"GINI INDEX (WORLD BANK ESTIMATE), AVERAGE 2000-16","GINI OF HOUSEHOLD INCOME REPORTED IN GALLUP, BY WP5-YEAR","MOST PEOPLE CAN BE TRUSTED, GALLUP","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 1981-19","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 1989-19","MOST PEOPLE CAN BE TRUSTED, 
WVS ROUND 1994-19","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 1999-20","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 2005-20","MOST PEOPLE CAN BE TRUSTED, WVS ROUND 2010-20"
AFGHANISTAN,2018,4.75,7.38,0.53,51.59,0.60,0.13,0.70,0.61,0.27,0.29,-1.99,-1.61,1.87,0.39,NULL,NULL,0.32,0.27,NULL,NULL,NULL,NULL,NULL,NULL
ALBANIA,2018,4.22,7.49,0.55,53,0.52,0.05,0.79,0.56,0.34,0.32,-1.85,-1.39,1.79,0.42,NULL,NULL,0.41,NULL,NULL,NULL,NULL,NULL,NULL,NULL

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM