
How do I do a for loop with 2 arrays in shell script?

I first have to declare two arrays, which I also need help with.

Originally, it's two single variables.

day=$(hadoop fs -ls -R /user/hive/* | 
        awk '/filename.txt.gz/' |
        tail -1 | 
        date -d $(echo `awk '{print $6}'`) '+%b %-d' | 
        tr -d ' ')

time_stamp=$(hadoop fs -ls -R /user/hive/* | 
             awk '/filename.txt.gz/' |
             tail -1 | 
             awk '{ print $7 }')

Now instead of tail -1, I need tail -5. So first, how do I make these two arrays?

Second question: how do I make a for loop over the paired values of $day and $time_stamp? I can't use array_combine because I need to perform actions on each array separately. Thanks.

You are collecting the data into strings, not arrays. But additionally, your code should probably be refactored significantly. As a general rule of thumb, if something happens in Awk, most of the rest should also happen in Awk.
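As a rough illustration of that rule of thumb, the sketch below folds an `awk | tail -1 | awk` chain into a single Awk process. The `sample_listing` function and its three lines are made up for the example and merely imitate the column layout of `hadoop fs -ls -R`; they are not real output.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for `hadoop fs -ls -R /user/hive/*` (made-up data).
sample_listing() {
    printf '%s\n' \
        '-rw-r--r--   3 hive hive  1024 2020-01-01 10:15 /user/hive/a/filename.txt.gz' \
        '-rw-r--r--   3 hive hive  2048 2020-01-02 11:30 /user/hive/b/other.txt.gz' \
        '-rw-r--r--   3 hive hive  4096 2020-01-03 12:45 /user/hive/c/filename.txt.gz'
}

# One Awk process instead of `awk '/.../' | tail -1 | awk '{ print $7 }'`:
# remember field 7 of every matching line, then print the last one seen.
last_time=$(sample_listing |
    awk '/filename\.txt\.gz/ { last = $7 } END { print last }')

echo "$last_time"
```

The same pattern extends to "the last N matches" by storing each match in an Awk array and printing the tail of it in the `END` block.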

You assign to an array with variable=(values of array), and to get the values from a subprocess, it's variable=($(command to produce values)).

Here's a first attempt at refactoring your code.
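A minimal Bash sketch of both forms of assignment follows. The array names and values are invented for illustration. Note that the unquoted `$(...)` is split on whitespace (and subject to globbing), so this form only suits values that contain neither spaces nor wildcard characters.

```shell
#!/usr/bin/env bash
# A literal array assignment (illustrative values).
fruits=(apple banana cherry)

# An array filled from a command's output. The unquoted $(...) is split
# on whitespace, so each line printed here becomes one element.
times=($(printf '%s\n' 10:15 11:30 12:45))

echo "${#times[@]}"   # number of elements
echo "${times[0]}"    # first element (indices start at 0)
echo "${fruits[@]}"   # all elements of the literal array
```

A plain `$times` expands to the first element only; use `"${times[@]}"` to get every element.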

# Avoid repeated code -- break this out into a function
extract_field () {
    hadoop fs -ls -R /user/hive/* |
    # Get rid of the tail and the repeated Awk
    # Notice backslashes in regex
    # Pass in the field to extract as a parameter
    awk -v field="$1" '/filename\.txt\.gz/ { d[++i]=$field }
        # Print the last five matches (or all of them if there are fewer)
        END { for(j=(i>5 ? i-4 : 1); j<=i; ++j) print d[j] }'
}

# Refactor accordingly
# And if you don't want a space in the format string, don't put a space in the format string in the first place
day=($(extract_field 6 |
    xargs -I {} date -d {} '+%b%-d'))

time_stamp=($(extract_field 7))

I'm highly skeptical of the arrangement to call the Hadoop command twice, though. Perhaps just extract fields 6 and 7 in a single go and then post-process the results to get them into two separate arrays. Something like this instead, then?

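# mapfile (Bash 4+) keeps each "date time" line as one array element;
# combined=($(...)) would split every line at the space
mapfile -t combined < <(hadoop fs -ls -R /user/hive/* |
    awk '/filename\.txt\.gz/ { d[++i]=$6 " " $7 }
        END { for(j=(i>5 ? i-4 : 1); j<=i; ++j) print d[j] }')
for ((i=0; i<${#combined[@]}; ++i)); do
    day[$i]="$(date -d "${combined[i]% *}" +'%b%-d')"
    time_stamp[$i]="${combined[i]#* }"
done
unset combined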

The statement that you need to handle the dates and times independently of each other sounds suspicious; if you can find a way to avoid doing that, perhaps don't split combined into two separate arrays after all. The code above shows how to extract the date and the time from a value in combined (the mechanism is called parameter substitution; the Bash manual calls it parameter expansion). It also demonstrates how to loop over the indices of an array.
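To make the two mechanisms concrete, here is a small self-contained sketch. The two "date time" pairs are hypothetical placeholders for the contents of `combined`, and the `date` reformatting step is left out so the example does not depend on GNU `date`.

```shell
#!/usr/bin/env bash
# Hypothetical "date time" pairs standing in for the contents of `combined`.
combined=('2020-01-01 10:15' '2020-01-03 12:45')

day=()
time_stamp=()
for pair in "${combined[@]}"; do
    day+=("${pair% *}")          # remove the shortest suffix matching " *"
    time_stamp+=("${pair#* }")   # remove the shortest prefix matching "* "
done

# Walk both arrays in step by sharing a single index.
for i in "${!day[@]}"; do
    printf '%s at %s\n' "${day[i]}" "${time_stamp[i]}"
done
```

Looping over `"${!day[@]}"` yields the indices rather than the values, which is what lets one loop address the matching element in each array.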
