How do I do a for loop with 2 arrays in shell script?
I have to first declare two arrays, which I also need help with.
Originally, it's two single variables.
day=$(hadoop fs -ls -R /user/hive/* |
awk '/filename.txt.gz/' |
tail -1 |
date -d $(echo `awk '{print $6}'`) '+%b %-d' |
tr -d ' ')
time_stamp=$(hadoop fs -ls -R /user/hive/* |
awk '/filename.txt.gz/' |
tail -1 |
awk '{ print $7 }')
Now instead of tail -1, I need tail -5. So first, how do I make these two arrays?
Second question, how do I make a for loop with each value from the paired values of $day and $time_stamp? I can't use array_combine because I need to perform actions on each array separately. Thanks
You are collecting the data into strings, not arrays. But additionally, your code should probably be refactored significantly -- as a general rule of thumb, if something happens in Awk, most of the rest should also happen in Awk.
You assign to an array with variable=(values of array) and to get the values from a subprocess, it's variable=($(command to produce values)).
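As a small illustration of both forms, here is a sketch with made-up values. The names words and fields are hypothetical, and printf merely stands in for whatever command produces the real data. Note that the unquoted $(...) is split on whitespace, so every whitespace-separated word becomes its own array element.

```shell
#!/usr/bin/env bash
# Literal assignment: three elements
words=(alpha beta gamma)

# Assignment from a command; printf is a stand-in for the real command.
# The unquoted $(...) is split on whitespace (and is subject to globbing).
fields=($(printf '%s\n' 2015-03-04 2015-03-05 2015-03-06))

echo "${#words[@]}"    # number of elements in words
echo "${fields[0]}"    # first element of fields
echo "${fields[@]}"    # all elements of fields
```

A plain variable=$(command) without the outer parentheses would instead store the whole output as one string, which is what the code in the question does.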
Here's a first attempt at refactoring your code.
# Avoid repeated code -- break this out into a function
extract_field () {
    hadoop fs -ls -R /user/hive/* |
    # Get rid of the tail and the repeated Awk
    # Notice backslashes in regex
    # Pass in the field to extract as a parameter
    awk -v field="$1" '/filename\.txt\.gz/ { d[++i]=$field }
        # Print the last five matches (or all of them, if there are fewer)
        END { for(j=(i>5 ? i-4 : 1); j<=i; ++j) print d[j] }'
}
day=($(extract_field 6 |
    # Refactor accordingly
    # And if you don't want a space in the format string, don't put a space in the format string in the first place
    xargs -I {} date -d {} '+%b%-d'))
time_stamp=($(extract_field 7))
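The Awk part can be tried without a Hadoop cluster. The sketch below feeds a few invented listing lines through the same kind of Awk script and keeps the chosen field from the last n matching lines. The helper name last_matches, the sample paths, sizes, and dates are all made up for illustration; the only thing taken from the code above is the assumption that the date is field 6 and the time is field 7.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print field $1 from the last $2 matching lines of stdin.
last_matches () {
    awk -v field="$1" -v n="$2" '
        /filename\.txt\.gz/ { d[++i] = $field }
        END {
            start = (i > n) ? i - n + 1 : 1   # do not run past the start
            for (j = start; j <= i; ++j) print d[j]
        }'
}

# Invented sample in an eight-column layout (date in field 6, time in field 7)
sample='-rw-r--r-- 3 hive hive 10 2015-03-01 10:00 /user/hive/a/filename.txt.gz
-rw-r--r-- 3 hive hive 10 2015-03-02 11:00 /user/hive/b/other.txt.gz
-rw-r--r-- 3 hive hive 10 2015-03-03 12:00 /user/hive/c/filename.txt.gz
-rw-r--r-- 3 hive hive 10 2015-03-04 13:00 /user/hive/d/filename.txt.gz'

dates=($(printf '%s\n' "$sample" | last_matches 6 2))
times=($(printf '%s\n' "$sample" | last_matches 7 2))
echo "${dates[@]}"
echo "${times[@]}"
```

Because the selection happens in the END block, no separate tail is needed, and the number of lines to keep becomes a parameter rather than a hard-coded value.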
I'm highly skeptical of the arrangement to call the Hadoop command twice, though. Perhaps just extract fields 6 and 7 in a single go and then post-process the results to get them into two separate arrays. Something like this instead then?
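# Join date and time with a comma: the unquoted $(...) is split on
# whitespace, so a space here would yield two array elements per line
combined=($(hadoop fs -ls -R /user/hive/* |
    awk '/filename\.txt\.gz/ { d[++i]=$6 "," $7 }
        # Print the last five matches (or all of them, if there are fewer)
        END { for(j=(i>5 ? i-4 : 1); j<=i; ++j) print d[j] }'))
for ((i=0; i<${#combined[@]}; ++i)); do
    # %,* strips the comma and everything after it, leaving the date
    day[$i]="$(date -d "${combined[i]%,*}" +'%b%-d')"
    # #*, strips everything up to and including the comma, leaving the time
    time_stamp[$i]="${combined[i]#*,}"
done
unset combined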
The statement that you need to handle the dates and times independently from each other sounds suspicious; if you can find a way to avoid doing that, perhaps don't split combined into two separate arrays after all. The code above shows how to extract the date and the time from a value in combined (the mechanism is called parameter substitution, or parameter expansion in the Bash manual). It also demonstrates how to loop over the indices in an array.
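For the second question, once both arrays exist, a single index can drive both of them. A minimal sketch, with made-up values standing in for the Hadoop output and a hypothetical function name print_pairs:

```shell
#!/usr/bin/env bash
# Made-up values standing in for the real arrays
day=(Mar1 Mar3 Mar4)
time_stamp=(10:00 12:00 13:00)

# "${!day[@]}" expands to the indices of day (0 1 2 here); the same
# index then selects the matching element of time_stamp.
print_pairs () {
    local i
    for i in "${!day[@]}"; do
        printf '%s %s\n' "${day[i]}" "${time_stamp[i]}"
    done
}

print_pairs
```

This assumes the two arrays have the same length and matching order, which holds when both are built from the same set of matching lines. Each array can still be processed on its own inside or outside the loop.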