How do I do a for loop with 2 arrays in shell script?

Question

I have to first declare two arrays which I also need help with.

Originally, it's two single variables.

day=$(hadoop fs -ls -R /user/hive/* | 
        awk '/filename.txt.gz/' |
        tail -1 | 
        date -d $(echo `awk '{print $6}'`) '+%b %-d' | 
        tr -d ' ')

time_stamp=$(hadoop fs -ls -R /user/hive/* | 
             awk '/filename.txt.gz/' |
             tail -1 | 
             awk '{ print $7 }')

Now instead of tail -1, I need tail -5. So first, how do I make these two arrays?

Second question: how do I write a for loop over the paired values of $day and $time_stamp? I can't use array_combine because I need to perform actions on each array separately. Thanks

Answer 1

You are collecting the data into strings, not arrays. But additionally, your code should probably be refactored significantly -- as a general rule of thumb, if something happens in Awk, most of the rest should also happen in Awk.

You assign to an array with variable=(values of array). To fill the array from the output of a command, use variable=($(command to produce values)), which splits the output on whitespace into separate elements.
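To make the difference between a string and an array concrete, here is a minimal sketch. The function list_times is a hypothetical stand-in for the real Hadoop pipeline, and its three values are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the real pipeline: prints one value per line.
list_times () {
    printf '%s\n' 08:02 09:15 10:30
}

# Plain command substitution yields ONE string (with embedded newlines).
as_string=$(list_times)

# Adding the outer parentheses splits the output on whitespace into an array.
as_array=($(list_times))

echo "${#as_array[@]}"   # number of elements: 3
echo "${as_array[0]}"    # first element: 08:02
echo "${as_array[2]}"    # third element: 10:30
```

Because the unquoted $(...) is split on any whitespace, this form only suits values that contain no spaces themselves.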

Here's a first attempt at refactoring your code.

# Avoid repeated code -- break this out into a function
extract_field () {
    hadoop fs -ls -R /user/hive/* | 
    # Get rid of the tail and the repeated Awk
    # Notice backslashes in regex
    # Pass in the field to extract as a parameter
    # Keep only the last five matches (the equivalent of tail -5)
    awk -v field="$1" '/filename\.txt\.gz/ { d[++i]=$field }
        END { for(j=(i>5 ? i-4 : 1); j<=i; ++j) print d[j] }'
}

day=($(extract_field 6 |
    # Refactor accordingly
    # And if you don't want a space in the format string, don't put a space in the format string in the first place
    xargs -I {} date -d {} '+%b%-d'))

time_stamp=($(extract_field 7))

I'm highly skeptical of the arrangement to call the Hadoop command twice, though. Perhaps just extract fields 6 and 7 in a single go and then post-process the results to get them into two separate arrays. Something like this instead then?

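# Read one "date time" pair per array element (mapfile needs Bash 4+);
# a plain combined=($(...)) would split each pair at the space
mapfile -t combined < <(hadoop fs -ls -R /user/hive/* |
    awk '/filename\.txt\.gz/ { d[++i]=$6 " " $7 }
        END { for(j=(i>5 ? i-4 : 1); j<=i; ++j) print d[j] }')
for ((i=0; i<${#combined[@]}; ++i)); do
    day[$i]="$(date -d "${combined[i]% *}" +'%b%-d')"
    time_stamp[$i]="${combined[i]#* }"
done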
unset combined

The statement that you need to handle the dates and times independently of each other sounds suspicious; if you can find a way to avoid doing that, perhaps don't split combined into two separate arrays after all. The code above shows how to extract the date and the time from a value in combined (the mechanism is called parameter expansion). It also demonstrates how to loop over the indices of an array.
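As a minimal sketch of looping over the paired values, assuming both arrays have the same length: the sample values below are made up and stand in for real Hadoop output, and the names pair, date_part and time_part are only for illustration.

```shell
#!/usr/bin/env bash
# Made-up sample values standing in for the Hadoop output.
day=(Aug13 Aug14 Aug15)
time_stamp=(08:02 09:15 10:30)

# Iterate over the indices of one array and reuse each index for the other.
for i in "${!day[@]}"; do
    printf '%s %s\n' "${day[i]}" "${time_stamp[i]}"
done

# Parameter expansion on a combined "date time" value:
pair='2018-08-17 08:02'
date_part=${pair% *}    # strip the shortest suffix matching " *"
time_part=${pair#* }    # strip the shortest prefix matching "* "
echo "$date_part"       # 2018-08-17
echo "$time_part"       # 08:02
```

Inside the loop body you can act on "${day[i]}" and "${time_stamp[i]}" separately, which keeps the two arrays independent while still visiting them in step.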
