
Creating a wget Bash Script

I'm creating a wget script to download and mirror a site. The URLs are taken from a text file. I have nearly finished the script, but now I need to make it robust: it is to be used for 3 hours every day, so it should continue from where it last left off.
I have provided my script below; anyone who finds it useful may use it, but please keep my name in the script.

Problems with the script:

  1. The script is not referencing its links correctly by making them refer to the file in the parent directory; please advise on that.
  2. The script is not resuming after being aborted in the middle, even with the --continue parameter.

#       Created by Salik Sadruddin Merani
#       email: ssm14293@gmail.com
#       site: http://www.dragotech-innovations.tk
clear
echo '  Created by: Salik Sadruddin Merani'
echo '  email: ssm14293@gmail.com'
echo '  site: http://www.dragotech-innovations.tk'
echo
echo '  Info:'
echo '  This script will use the URLs provided in the file "urls.txt"'
echo '  Logs will be saved in per-URL log files'
#
url=$(< ./urls.txt)
useragent='Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:25.0) Gecko/20100101 Firefox/25.0'
echo '  The Mozilla Firefox user agent will be used'

cred='log=abc@123.org&pwd=abc123&wp-submit=Log In&redirect_to=http://abc@123.org/wp-admin/&testcookie=1'
echo '  Loaded credentials'
echo '  Logging in'
# Quote "${cred}": it contains spaces and would otherwise be word-split.
wget --save-cookies cookies.txt --post-data "${cred}" --keep-session-cookies http://members.ebenpagan.com/wp-login.php --delete-after

OIFS=$IFS
IFS=','
arr2=$url
for x in $arr2
do
    echo '      Loading cookies'
    # Fixes: --spider was removed (spider mode only checks links and never
    # saves files); robots is disabled with robots=off, not robots=no; the
    # URL and user agent are quoted; slashes in the URL are replaced with
    # underscores so the log file name is a valid path.
    wget --load-cookies cookies.txt --keep-session-cookies --mirror --convert-links --page-requisites "${x}" -U "${useragent}" --adjust-extension --continue -e robots=off --span-hosts --no-parent -o "log-file-${x//\//_}.txt"
done
IFS=$OIFS

Regards

The --continue flag in wget attempts to resume the download of a single file in the current directory. Please refer to the wget man page for more info; it is quite detailed.

What you need is to resume the mirroring/downloading from where the script previously left off.

So it's more a modification of the script than some setting in wget. I can suggest one way to do that, but mind you, you can use a different approach as well.

Modify the urls.txt file to have one URL per line, then refer to this pseudocode:

  1. Get the URL from the file.
  2. If the URL ends with the token #DONE, skip it.
  3. Else, run the wget command on it.
  4. Append the token #DONE to the end of that URL in the file.
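With this scheme, a partially processed urls.txt might look like this (hypothetical URLs for illustration):

```text
http://example.com/section-one/ #DONE
http://example.com/section-two/ #DONE
http://example.com/section-three/
```

On the next run, the first two lines are skipped and downloading resumes at the third.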

This way, the next time you run the script you will know which URL to continue from. All URLs that have "#DONE" at the end will be skipped, and the rest will be downloaded.
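The pseudocode above can be sketched in Bash roughly as follows. This is a minimal sketch, not the asker's exact script: the function name `mirror_urls` and the reduced flag set are illustrative, it assumes GNU sed (for -i), and the sed pattern assumes the URLs contain no `|` or other regex metacharacters.

```shell
#!/usr/bin/env bash
# Sketch of the #DONE-token scheme: urls.txt holds one URL per line,
# and finished URLs are marked in place so an aborted run can resume.

mirror_urls() {
    local url_file=$1 url
    while IFS= read -r url; do
        [ -z "$url" ] && continue          # skip blank lines
        case $url in
            *'#DONE') continue ;;          # finished in an earlier run
        esac
        # Mirror the URL; mark it #DONE only if wget succeeds, so a
        # failed or interrupted download is retried on the next run.
        if wget --mirror --convert-links --page-requisites \
                --adjust-extension --no-parent "$url"; then
            sed -i "s|^${url}\$|${url} #DONE|" "$url_file"
        fi
    done < "$url_file"
}

# Usage: mirror_urls urls.txt
```

Note that `sed -i` rewrites urls.txt via a temporary file and rename, while the `while` loop keeps reading from the file descriptor it opened at the start, so each line is still processed exactly once per run.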

Note: the technical posts on this site follow the CC BY-SA 4.0 license; if you need to republish, please credit the original source.
