
Bash iterating over file with irregular line arguments

I have a number of irregular .txt files that were converted from .csv files. They contain the following semicolon-delimited data:

A;B;C;D;E;F;G;H;
A;B;C;D;E;F;G;H;I;J;K;
A;B;C;D;E;F;G;H;I;J;K;L;M;N;
A;B;C;D;E;F;G;H;I;J;K;L;M;N;O;P;Q;

What I would like to do is take specific values from each line. The code I used looks as follows and works well when all lines contain the same number of delimiters:

OIFS=$IFS
IFS=";"
while read var1 var2 var3 var4 var5 var6 var7 var8 var9 var10
do
echo $var2, $var6, $var7, $var8
done < test.txt
IFS=$OIFS

But I'm stuck on implementing code that counts the number of ";" and applies a specific action. Column "B" of each line and whatever exists after column "E" should be taken into account. The minimum number of ";" in a line is 8 and the maximum is 20 (in increments of 3). The desired output is:

For lines containing 8 ";"

echo $B { $F { $G:$H } }

For lines including 11 ";"

echo $B { $F { $G:$H } $I { $J:$K } }

For lines with 14 ";"

echo $B { $F { $G:$H } $I { $J:$K } $L { $M:$N } }

And so on. Is this doable in Bash?
Thank you.

I'm not sure I fully understand what you want to do, but this might help as a first step.

Each line's column "B" and whatever exist after column "E" should be taken into account.

For this you can use the cut command:

cut -d ';' -f 2,6-

Where -d ';' sets the delimiter and -f 2,6- selects fields 2 and 6 onwards.

This will select columns $B and columns $F onwards.

You can also change the delimiter used in the output with --output-delimiter (a GNU cut option).
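
As a rough illustration of the two options together, here is a small sketch. It assumes GNU cut, since --output-delimiter is a GNU extension, and select_fields is a made-up helper name, not something from the question:

```shell
#!/bin/bash
# select_fields is a hypothetical helper name used only for this sketch.
# It keeps field 2 (column B) and fields 6 onwards (column F onwards)
# and joins them with a space instead of ';'.
# Assumption: GNU cut is installed; other cut implementations may not
# support --output-delimiter.
select_fields() {
    printf '%s\n' "$1" | cut -d ';' -f 2,6- --output-delimiter=' '
}

select_fields 'A;B;C;D;E;F;G;H;I;J;K'
```

Note that a trailing ';' in the input leaves an empty last field, so the output then ends with the output delimiter.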

Read each line into an array using the -a option of read; this makes dealing with variable-length lines much easier.

while IFS=';' read -r -a vars; do
    # vars[1] is column B; groups of three fields start at index 5 (column F)
    printf "%s {" "${vars[1]}"
    for ((i=5; i<${#vars[@]}; i+=3)); do
        printf " %s { %s:%s }" "${vars[@]:i:3}"
    done
    printf " }\n"
done < test.txt
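
To see why the array form copes with lines of any length, it can help to look at what read -a stores. The following is only a sketch: count_groups is a hypothetical helper name, and it assumes the layout from the question (five leading columns A to E, then groups of three).

```shell
#!/bin/bash
# count_groups is a hypothetical helper for this sketch: it splits one line
# on ';' and reports how many fields and how many 3-field groups it found.
count_groups() {
    local -a vars
    # IFS is changed for this read only; a trailing ';' adds no empty element.
    IFS=';' read -r -a vars <<< "$1"
    local fields=${#vars[@]}
    # Indices 0-4 hold columns A-E; each later group uses three fields.
    local groups=$(( (fields - 5) / 3 ))
    echo "fields=${fields} groups=${groups}"
}

count_groups 'A;B;C;D;E;F;G;H;'
count_groups 'A;B;C;D;E;F;G;H;I;J;K;L;M;N;'
```

The array length plays the role of the delimiter count the question asks about, so no separate counting step is needed.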

Alternatively, you can use Python to do what you want (if I understood it correctly):

import fileinput

# http://stackoverflow.com/questions/34576772/bash-iterating-over-file-with-irregular-line-arguments/34576899#34576899

def columns_are_valid(columns):
    # Valid lines have 8, 11, 14, ... columns: five leading ones plus groups of three
    return len(columns) >= 8 and len(columns) % 3 == 2

# Returns every three columns as a tuple
# Example: 1,2,3,4,5,6,7,8,9  ->  (1,2,3) , (4,5,6) , (7,8,9)
def every_three(rest_columns):
    for i in range(0, len(rest_columns) - 2, 3):
        yield tuple(rest_columns[i:i + 3])


for line in fileinput.input():
    line = line.rstrip(';\n')  # remove trailing newline and ';'
    columns = line.split(';')  # split by ';'
    assert columns_are_valid(columns)

    column_b = columns[1]

    # Selects columns F onwards
    columns_f_onwards = columns[5:]

    # Format parts like '$F { $G:$H }'
    parts = ['%s { %s:%s }' % (a, b, c) for a, b, c in every_three(columns_f_onwards)]
    space_delimited_parts = ' '.join(parts)

    print('%s { %s }' % (column_b, space_delimited_parts))

Example run:

 % python myscript.py

With input (read from standard input, or from files named on the command line):

A;B;C;D;E;F;G;H;
A;B;C;D;E;F;G;H;I;J;K;
A;B;C;D;E;F;G;H;I;J;K;L;M;N;
A;B;C;D;E;F;G;H;I;J;K;L;M;N;O;P;Q;

Outputs:

B { F { G:H } }
B { F { G:H } I { J:K } }
B { F { G:H } I { J:K } L { M:N } }
B { F { G:H } I { J:K } L { M:N } O { P:Q } }

A Bash-only solution:

#!/bin/bash

OLD_IFS=$IFS
IFS=";"
while read line; do
    set -- $line
    echo -n "$2 { "
    shift 5
    while [[ -n $1 ]];do
        echo -n "$1 { $2:$3 } "
        shift 3
    done
    echo "}"
done < data
IFS=$OLD_IFS

Input file:

$ cat data 
A;B;C;D;E;F;G;H;
A;B;C;D;E;F;G;H;I;J;K;
A;B;C;D;E;F;G;H;I;J;K;L;M;N;
A;B;C;D;E;F;G;H;I;J;K;L;M;N;O;P;Q;

Result:

$ ./script.sh 
B { F { G:H } }
B { F { G:H } I { J:K } }
B { F { G:H } I { J:K } L { M:N } }
B { F { G:H } I { J:K } L { M:N } O { P:Q } }

Solution 2

The same approach, but with arrays:

#!/bin/bash

OLD_IFS=$IFS
IFS=";"
os=5
while read line;do
    c=0
    a=($line)
    echo -n "${a[1]} { "
    while [[ -n ${a[$((os+c*3))]} ]];do
        echo -n "${a[$((os+c*3))]} { "
        echo -n "${a[$((os+c*3+1))]}:${a[$((os+c*3+2))]} } "
        ((c++))
    done
    echo "}"
done < data
IFS=$OLD_IFS

I think you are doing well so far! You just need a few small hints:

  • You can set a shell variable for a single command.
    I changed the handling of IFS a bit to use this.
  • You can check the remaining vars and see if they are empty.
  • I will use ${x} for the vars.
    It is not needed for this code, but it is a good habit.
  • Use read -r, not plain read.
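
The first and last hints can be tried out in isolation. The sketch below is only an illustration, and split_second is a made-up helper name: IFS is set for the read command alone, so there is nothing to save and restore, and -r keeps backslashes in the data as they are.

```shell
#!/bin/bash
# split_second is a hypothetical helper for this sketch.
# It prints the second ';'-separated field of its argument.
split_second() {
    local first second rest
    # The IFS assignment applies to this read command only.
    IFS=';' read -r first second rest <<< "$1"
    printf '%s\n' "${second}"
}

before=$IFS
split_second 'A;B;C;D;'
# The shell's own IFS was never modified by the call above.
[ "$IFS" = "$before" ] && echo "IFS unchanged"
```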

The next code shows what you can do when you know you have a small number of fields. You have at most 20 fields now, so you can add more vars and code to the first solution:

while IFS=";" read -r var1 var2 var3 var4 var5 var6 var7 var8 var9 var10 var11 var12 var13 var14; do
    echo "${var2}, ${var6}, ${var7}, ${var8}"
    if [ -z "${var9}" ]; then
        # Nothing after field 8 (column H)
        echo "Line with 8 delimiters"
    elif [ -z "${var12}" ]; then
        # Fields 9-11 (columns I-K) are present, nothing after them
        echo "Line with 11 delimiters"
    else
        echo "Line with more than 11 delimiters"
    fi
done < input

I did not complete the code above, since it is not well structured.
It is better to implement this with a function that takes care of a repeating group.

function repeatgroup {
   # Print each group of three ';'-separated fields as "F { G:H } "
   remaining="$*"
   while [ -n "${remaining}" ]; do
       rem1=$(echo "$remaining" | cut -d";" -f1)
       rem2=$(echo "$remaining" | cut -d";" -f2)
       rem3=$(echo "$remaining" | cut -d";" -f3)
       remaining=$(echo "$remaining" | cut -d";" -f4-)
       printf "%s { %s:%s } " "${rem1}" "${rem2}" "${rem3}"
   done
}

while IFS=";" read -r var1 var2 var3 var4 var5 remaining; do
   if [ -z "${var5}${remaining}" ]; then
      echo "field shortage"
   elif [ -z "${remaining}" ]; then
      echo "Line has no fields after column E"
      echo "${var2} { }"
   else
      printf "%s { " "${var2}"
      repeatgroup "${remaining}"
      printf "}\n"
   fi
done < input

Remark:
Both rem1=$(echo "$remaining" | cut -d";" -f1) and remaining=$(echo "$remaining" | cut -d";" -f4-) can be written using Bash built-ins such as parameter expansion, but I thought the code would become hard to understand. When you need to parse large files, try that first, since it avoids starting a cut process for every field.
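
A minimal sketch of such a built-in variant follows. It is not the answer's own code: repeatgroup_builtin is a made-up name, the groups are assumed to come in threes as in the sample data, and the output is formatted like the desired output in the question.

```shell
#!/bin/bash
# repeatgroup_builtin is a hypothetical name for this sketch.
# It formats "F;G;H;I;J;K;" as "F { G:H } I { J:K } " using only
# parameter expansion, so no cut process is started per field.
repeatgroup_builtin() {
    local remaining=$1 rem1 rem2 rem3
    # Make sure the string ends with ';' so every expansion below consumes
    # something and the loop always terminates.
    [[ -z ${remaining} || ${remaining} == *';' ]] || remaining+=';'
    while [ -n "${remaining}" ]; do
        # ${x%%;*} keeps the text before the first ';',
        # ${x#*;} drops everything up to and including the first ';'.
        rem1=${remaining%%;*}; remaining=${remaining#*;}
        rem2=${remaining%%;*}; remaining=${remaining#*;}
        rem3=${remaining%%;*}; remaining=${remaining#*;}
        printf '%s { %s:%s } ' "${rem1}" "${rem2}" "${rem3}"
    done
}

# Usage: split off columns A-E, then format whatever is left.
line='A;B;C;D;E;F;G;H;I;J;K;'
IFS=';' read -r _ b _ _ _ rest <<< "${line}"
printf '%s { ' "${b}"; repeatgroup_builtin "${rest}"; printf '}\n'
```

The trade-off is the one named in the remark: the expansions are terser than the cut pipeline, but they run inside the shell itself.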
