
Bash: Loop through file and read substring as argument, execute multiple instances

How it is now

I currently have a script running under Windows that frequently retrieves recursive file trees from a list of servers.

I use an AutoIt (job manager) script to execute 30 parallel instances of lftp (still on Windows), doing this:

lftp -e "find .; exit" <serveraddr>

The file used as input for the job manager is a plain text file, and each line is formatted like this:

<serveraddr>|...

where "..." is unimportant data. I need to run multiple instances of lftp to achieve maximum performance, because single-instance performance is determined by the response time of the server.

Each lftp.exe instance pipes its output to a file named

<serveraddr>.txt

How it needs to be

Now I need to port this whole thing over to a Linux dedicated server (Ubuntu, with lftp installed). From my previous, very(!) limited experience with Linux, I guess this will be quite simple.

What do I need to write, and with what? For example, do I still need a job manager script, or can this be done in a single script? How do I read from the file (I guess this will be the easy part), and how do I keep a maximum of 30 instances running (maybe even with a timeout, because extremely unresponsive servers can clog the queue)?

Thanks!

Parallel processing

I'd use GNU/parallel. It isn't installed by default, but it can be installed on most Linux distributions from the default package repositories. It works like this:

parallel echo ::: arg1 arg2

will execute echo arg1 and echo arg2 in parallel.

So the easiest approach is to create a script that synchronizes your server in bash/perl/python - whatever suits your fancy - and execute it like this:

parallel ./script ::: server1 server2

The script could look like this:

#!/bin/sh
#$0 holds program name, $1 holds first argument.
#$1 will get passed from GNU/parallel. we save it to a variable.
server="$1"
lftp -e "find .; exit" "$server" >"$server-files.txt"

lftp seems to be available for Linux as well, so you don't need to change the FTP client.

To run at most 30 instances at a time, pass -j30, like this: parallel -j30 echo ::: 1 2 3
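GNU/parallel can also enforce a per-job timeout, which would cover the concern about extremely unresponsive servers clogging the queue. A minimal sketch, assuming a reasonably recent GNU parallel with the --timeout option; the 300-second value is only an example, and ./script is the wrapper script shown above:

```shell
# Run at most 30 jobs at once; kill any job that runs longer
# than 300 seconds (value chosen only as an example).
parallel -j30 --timeout 300 ./script ::: server1 server2
```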

Reading the file list

Now how do you transform the specification file containing <server>|... entries into GNU/parallel arguments? Easy - first, filter the file to contain just host names:

sed 's/|.*$//' server-list.txt
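For example, with two made-up entries in the <serveraddr>|... format, the filter leaves only the host names:

```shell
# Sample input (hypothetical host names), piped through the same sed filter
printf 'ftp.example.com|some data\nmirror.example.org|more data\n' \
  | sed 's/|.*$//'
# prints:
# ftp.example.com
# mirror.example.org
```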

sed is used to replace things using regular expressions, and more. This will strip everything ( .* ) after the first | up to the line end ( $ ). (While | normally means the alternation operator in regular expressions, in sed it needs to be escaped to work like that; otherwise it means just a plain | .)

So now you have a list of servers. How do you pass them to your script? With xargs! xargs will put each line as if it were an additional argument to your executable. For example

echo -e "1\n2"|xargs echo fixed_argument

will run 会跑

echo fixed_argument 1 2

So in your case, you should do

sed 's/|.*$//' server-list.txt | xargs parallel -j30 ./script :::
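As an aside, GNU/parallel can also read its arguments directly from standard input when no ::: is given, so the xargs step can be skipped entirely. A sketch under the same assumptions (server-list.txt and ./script as above):

```shell
# parallel consumes one argument per input line from stdin
sed 's/|.*$//' server-list.txt | parallel -j30 ./script
```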

Caveats

Be sure not to save the results to the same file in each parallel task, otherwise the file will get corrupted - coreutils are simple and don't implement any locking mechanisms unless you implement them yourself. That's why I redirected the output to $server-files.txt rather than files.txt.
