I have a number of irregular .txt files converted from .csv files. The files contain the following data, delimited by semicolons:
A;B;C;D;E;F;G;H;
A;B;C;D;E;F;G;H;I;J;K;
A;B;C;D;E;F;G;H;I;J;K;L;M;N;
A;B;C;D;E;F;G;H;I;J;K;L;M;N;O;P;Q;
What I would like to do is take specific values from each line. The code I used looks as follows, and it works well when all lines contain the same number of delimiters:
OIFS=$IFS
IFS=";"
while read var1 var2 var3 var4 var5 var6 var7 var8 var9 var10
do
    echo $var2, $var6, $var7, $var8
done < test.txt
IFS=$OIFS
But I'm stuck implementing code that counts the number of ";" in a line and applies a specific action. In each line, column "B" and whatever exists after column "E" should be taken into account. The minimum number of ";" in a line is 8 and the maximum is 20 (in increments of 3). The desired output is:
For lines containing 8 ";"
echo $B { $F { $G:$H } }
For lines including 11 ";"
echo $B { $F { $G:$H } $I { $J:$K } }
For lines with 14 ";"
echo $B { $F { $G:$H } $I { $J:$K } $L { $M:$N } }
And so on. Is this doable in bash?
Thank you.
I'm not sure I fully understand what you want to do, but this might help as a first step.
Each line's column "B" and whatever exist after column "E" should be taken into account.
For this you can use the cut command:
cut -d ';' -f 2,6-
where -d ';' sets the delimiter and -f 2,6- selects field 2 and fields 6 onwards. This will select column $B and columns $F onwards.
You can also change the delimiter that is output by using --output-delimiter.
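As a quick illustration of the two options combined, here is a minimal sketch run on the first sample line. It assumes GNU cut, since --output-delimiter is a GNU extension and not part of POSIX cut; the function name select_fields is made up for this example.

```shell
#!/bin/bash
# Keep field 2 and fields 6 onwards, joined by a space instead of ';'.
# Assumption: GNU cut (--output-delimiter is not available in every cut).
select_fields() {
    cut -d ';' -f 2,6- --output-delimiter=' '
}

printf '%s\n' 'A;B;C;D;E;F;G;H;' | select_fields
```

Because each sample line ends with ';', cut sees an empty final field, so the result keeps a trailing delimiter. That is worth remembering if the output is parsed again.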
Read each line into an array using the -a option to read; this makes dealing with variable-length lines much easier.
while IFS=';' read -r -a vars; do
    printf "%s {" "${vars[1]}"
    # From index 5 (column F) on, fields come in groups of three
    for ((i=5; i<${#vars[@]}; i+=3)); do
        printf " %s { %s:%s }" "${vars[@]:i:3}"
    done
    printf " }\n"
done < test.txt
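If the number of ";" itself is needed, as the question asks, one possible approach is to delete every character that is not a semicolon and take the length of what remains. The sketch below is only an illustration: the function names count_semicolons and is_valid_line are invented here, and the validity rule (8 to 20 semicolons in steps of 3) is taken from the question's description.

```shell
#!/bin/bash
# Count the ';' characters in a line by removing everything else.
count_semicolons() {
    local line=$1
    local only=${line//[^;]/}   # what is left consists of semicolons only
    printf '%s\n' "${#only}"
}

# Hypothetical check based on the question: 8, 11, 14, 17 or 20 semicolons.
is_valid_line() {
    local n
    n=$(count_semicolons "$1")
    (( n >= 8 && n <= 20 && (n - 8) % 3 == 0 ))
}

# Demo on inline sample data; the last line is deliberately too short.
while IFS= read -r line; do
    if is_valid_line "$line"; then
        echo "ok: $(count_semicolons "$line") semicolons"
    else
        echo "skipped: $line" >&2
    fi
done <<'EOF'
A;B;C;D;E;F;G;H;
A;B;C;D;E;F;G;H;I;J;K;
A;B;C;D;
EOF
```

A check like this could run before the array loop to skip malformed lines instead of printing partial groups.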
Alternatively, you can use Python to do what you want (if I understood it correctly):
import fileinput
# http://stackoverflow.com/questions/34576772/bash-iterating-over-file-with-irregular-line-arguments/34576899#34576899
def columns_are_valid(columns):
    return len(columns) >= 8 and len(columns) % 3 == 2
# Returns every three columns as a tuple
# Example: 1,2,3,4,5,6,7,8,9 -> (1,2,3) , (4,5,6) , (7,8,9)
def every_three(rest_columns):
    it = iter(rest_columns)
    # zip stops cleanly when the input runs out; calling next() in a
    # "while True" generator loop raises RuntimeError on Python 3.7+
    return zip(it, it, it)
for line in fileinput.input():
    line = line.rstrip(';\n') # remove trailing newline and ';'
    columns = line.split(';') # split by ';'
    assert columns_are_valid(columns)
    column_b = columns[1]
    # Selects columns F onwards
    columns_f_onwards = columns[5:]
    # Format parts like '$F {$G:$H}'
    parts = [ '%s {%s:%s}' % (a,b,c) for a,b,c in every_three(columns_f_onwards) ]
    space_delimited_parts = ' '.join(parts)
    print('{ %s { %s }' % (column_b, space_delimited_parts))
Example run:
% python myscript.py
With input:
A;B;C;D;E;F;G;H;
A;B;C;D;E;F;G;H;I;J;K;
A;B;C;D;E;F;G;H;I;J;K;L;M;N;
A;B;C;D;E;F;G;H;I;J;K;L;M;N;O;P;Q;
Outputs:
{ B { F {G:H} }
{ B { F {G:H} I {J:K} }
{ B { F {G:H} I {J:K} L {M:N} }
{ B { F {G:H} I {J:K} L {M:N} O {P:Q} }
A Bash-only solution:
#!/bin/bash
OLD_IFS=$IFS
IFS=";"
while read -r line; do
    set -- $line
    echo -n "$2 { "
    shift 5
    # Loop only while a full group of three remains, so "shift 3" cannot fail and loop forever
    while (( $# >= 3 )); do
        echo -n "$1 { $2:$3 } "
        shift 3
    done
    echo "}"
done < data
IFS=$OLD_IFS
Input file:
$ cat data
A;B;C;D;E;F;G;H;
A;B;C;D;E;F;G;H;I;J;K;
A;B;C;D;E;F;G;H;I;J;K;L;M;N;
A;B;C;D;E;F;G;H;I;J;K;L;M;N;O;P;Q;
Result:
$ ./script.sh
B { F { G:H } }
B { F { G:H } I { J:K } }
B { F { G:H } I { J:K } L { M:N } }
B { F { G:H } I { J:K } L { M:N } O { P:Q } }
Solution 2
Same but with arrays
#!/bin/bash
OLD_IFS=$IFS
IFS=";"
os=5    # index of the first repeating field (column F)
while read -r line; do
    c=0
    a=($line)
    echo -n "${a[1]} { "
    while [[ -n ${a[$((os+c*3))]} ]]; do
        echo -n "${a[$((os+c*3))]} { "
        echo -n "${a[$((os+c*3+1))]}:${a[$((os+c*3+2))]} } "
        ((c++))
    done
    echo "}"
done < data
IFS=$OLD_IFS
I think you are doing well so far! You just need some small hints: use ${x} for the vars, and use read -r, not plain read. The next code shows what you can do when you know you have a small number of fields. You have at most 20 fields now, so you can add more vars and code to the first solution:
while IFS=";" read -r var1 var2 var3 var4 var5 var6 var7 var8 var9 var10 var11 var12 var13 var14; do
    echo "${var2}, ${var6}, ${var7}, ${var8}"
    if [ -z "${var9}" ]; then
        echo "Line with at most 8 fields"
    elif [ -z "${var10}${var11}${var12}" ]; then
        echo "Line with 9 fields"
    else
        echo "Line with more than 9 fields"
    fi
done
I did not complete the code above, since it is not well structured.
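On the read -r hint: without -r, read treats a backslash as an escape character and drops it, which silently changes the data. A small sketch with a made-up input line (the sample value and variable names are only for illustration):

```shell
#!/bin/bash
# Made-up sample: the first field contains a literal backslash.
sample='C:\temp;B;C'

# Plain read: the backslash escapes the next character and is removed.
IFS=';' read first rest <<< "$sample"
printf 'read:    %s\n' "${first}"

# read -r: backslashes are kept as they are.
IFS=';' read -r first_raw rest_raw <<< "$sample"
printf 'read -r: %s\n' "${first_raw}"
```

With plain read the first field loses its backslash, while read -r keeps the field exactly as it appears in the file.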
It is better to implement this with a function that takes care of the repeating group.
function repeatgroup {
    output=""
    remaining="$*"
    printf "{ "
    while [ -n "${remaining}" ]; do
        rem1=$(echo "$remaining" | cut -d";" -f1)
        rem2=$(echo "$remaining" | cut -d";" -f2)
        rem3=$(echo "$remaining" | cut -d";" -f3)
        remaining=$(echo "$remaining" | cut -d";" -f4-)
        printf "%s {%s:%s} " "${rem1}" "${rem2}" "${rem3}"
    done
}
while IFS=";" read -r var1 var2 var3 var4 var5 remaining; do
    if [ -z "${var5}${remaining}" ]; then
        echo "field shortage"
    elif [ -z "${remaining}" ]; then
        echo "Line without 8 delimiters"
        echo "{ ${var2} }"
    else
        printf "{ %s " "${var2}"
        repeatgroup "${remaining}"
        printf "}\n"
    fi
done < input
Remark:
Both rem1=$(echo "$remaining" | cut -d";" -f1) and remaining=$(echo "$remaining" | cut -d";" -f4-) can be written using Bash's built-in parameter expansion, but I thought the code would get hard to understand. When you need to parse large files, try that first, since it avoids starting external processes for every group.
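As a rough sketch of that built-in variant, the cut calls can be replaced with parameter expansion: ${var%%;*} keeps the text before the first ";" and ${var#*;} removes everything up to and including it. The name repeatgroup_builtin is made up here, and the sketch assumes the argument consists of groups of three fields, as in the sample data; it prints the same text as repeatgroup.

```shell
#!/bin/bash
# Variant of repeatgroup without external commands (hypothetical name).
# Expects input such as "F;G;H;I;J;K;".
repeatgroup_builtin() {
    local remaining=$1 rem1 rem2 rem3
    # Make sure the string ends with ';' so every field can be split off
    # the same way and the loop always terminates.
    case $remaining in
        *';') ;;
        *) remaining="$remaining;" ;;
    esac
    printf "{ "
    while [ -n "${remaining}" ]; do
        rem1=${remaining%%;*}; remaining=${remaining#*;}
        rem2=${remaining%%;*}; remaining=${remaining#*;}
        rem3=${remaining%%;*}; remaining=${remaining#*;}
        printf "%s {%s:%s} " "${rem1}" "${rem2}" "${rem3}"
    done
}

repeatgroup_builtin "F;G;H;I;J;K;"
printf "}\n"
```

Since no subshell or cut process is started per field, this should be noticeably faster on large files, at the cost of slightly denser syntax.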