

Get file size after downloading URL content with wget

I'm trying to write a bash script that downloads the contents of a URL (non-recursively) and then analyzes the downloaded file.

If the downloaded file is a text file (e.g. index.html), I want to know its size and count the number of characters in it.

If the file is an image, I just want to know its size.

Right now I'm using wget to download the contents of the input URL, but the problem is that, inside my script, I don't know the name of the file that was downloaded.

So, my two main questions are:

  1. How can I get the filename in my script after wget has downloaded it, so that I can run some analysis on the file?
  2. How can I determine the type of the downloaded file?

I would suggest setting the file name wget will write to, using the -O switch. Your script can then generate a file name, tell wget to download the URL to that file, and run whatever analysis tools you want on the file name you picked.
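A minimal sketch of the -O approach, assuming GNU wget is installed. The function name `download_to` and the two-argument command line are illustrative choices, not part of the original answer:

```shell
#!/usr/bin/env bash
# Sketch: choose the output filename ourselves with wget's -O option,
# so the script always knows which file to analyze afterwards.

# download_to URL OUTFILE (hypothetical helper): fetch URL into OUTFILE.
# Returns wget's exit status, so callers can detect failed downloads.
download_to() {
  local url="$1" outfile="$2"
  wget -q -O "$outfile" "$url"
}

# Only download when called with two arguments: URL and output filename.
if [[ $# -ge 2 ]]; then
  if download_to "$1" "$2"; then
    echo "Saved $1 to $2"
  else
    echo "Download of $1 failed" >&2
    exit 1
  fi
fi
```

For example, `./fetch.sh https://example.com/ page.html` saves the page as page.html. Because wget -O may leave an empty output file behind when a download fails, checking wget's exit status is more reliable than checking whether the file exists.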

The idea here is that you do not have to figure out what name the web site, the URL, or wget will pick: you are controlling the parameters. That is a useful programming technique in general. The less input the user, some external program, or a website can supply, the simpler and more robust your code will be.

As for picking a file name, you could use a timestamp. The date utility can generate one for you if you give it a +FORMAT parameter. Alternatively, since you mention this is part of an analysis tool, maybe you don't want to keep the file at all. In that case, try a tool like mktemp to generate a guaranteed-unique file name, and then remove the file before exiting.
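A sketch of both naming options, assuming bash with date and mktemp available. STAMP_NAME and TMPFILE are hypothetical variable names chosen for illustration. The trap removes the temporary file however the script exits, which covers the "remove it before exiting" step:

```shell
#!/usr/bin/env bash
# Option 1: a timestamp name built with date's +FORMAT parameter.
# Colons (as produced by %T) are avoided because some filesystems reject them.
STAMP_NAME="download-$(date +%Y%m%d-%H%M%S)"

# Option 2: a guaranteed-unique temporary file, deleted when the script exits.
TMPFILE=$(mktemp) || exit 1
trap 'rm -f "$TMPFILE"' EXIT

echo "Timestamp name: $STAMP_NAME"
echo "Temporary file: $TMPFILE"

# With a URL argument, download into the temporary file for analysis.
if [[ $# -ge 1 ]]; then
  wget -q -O "$TMPFILE" "$1" || exit 1
  # ... run the analysis on "$TMPFILE" here ...
  ls -l "$TMPFILE"
fi
```

Without arguments the script only prints the two candidate names. With a URL (for example `./name-demo.sh https://example.com/`) it downloads into the temporary file, which is gone once the script finishes.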

For more information, see the manual pages wget(1), date(1), and mktemp(1).

I'm not giving complete working code, in case anyone ever gets this as a school assignment and stumbles across this question. I wouldn't want to make it too easy for that hypothetical person. ;-) Of course, if someone asked more specific questions, I'd likely clarify my answer for them.

I did finally manage to solve it.

#!/usr/bin/env bash
URL="$1"
FILENAME=$(date +%y-%m-%d-%T)        # Use the current date and time as the filename
wget -O "$FILENAME" "$URL" || exit 1 # Download the URL into that file; stop if wget fails
FILE_INFO=$(file -b "$FILENAME")     # Store the output of 'file' (-b omits the filename)

if [[ "$FILE_INFO" == *"text"* ]]
then 
 echo "It's a text file"
elif [[ "$FILE_INFO" == *"image"* ]]
then 
 echo "It's an image"
fi
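The script above reports only the file type, while the question also asked for the size of every file and a character count for text files. A sketch of one way to add that, assuming a `file` implementation that supports `--mime-type` (the common Linux and macOS versions do) and a UTF-8 locale so that `wc -m` counts multibyte characters rather than bytes. The function name `analyze_file` is hypothetical:

```shell
#!/usr/bin/env bash
# analyze_file PATH (hypothetical helper): print type, size and, for text,
# the character count of a downloaded file.
analyze_file() {
  local path="$1" mime size chars
  mime=$(file -b --mime-type "$path")  # e.g. text/html, image/png
  size=$(wc -c < "$path")              # size in bytes
  size=${size//[[:space:]]/}           # BSD wc pads its output with spaces
  case "$mime" in
    text/*)
      chars=$(wc -m < "$path")         # characters, per the current locale
      chars=${chars//[[:space:]]/}
      echo "text file: $size bytes, $chars characters"
      ;;
    image/*)
      echo "image file: $size bytes"
      ;;
    *)
      echo "other ($mime): $size bytes"
      ;;
  esac
}

# With a URL argument: download to a temporary file, analyze it, clean up.
if [[ $# -ge 1 ]]; then
  outfile=$(mktemp) || exit 1
  trap 'rm -f "$outfile"' EXIT
  wget -q -O "$outfile" "$1" || exit 1
  analyze_file "$outfile"
fi
```

Matching on the MIME type (text/*, image/*) is a little stricter than searching the human-readable description from `file`, whose wording can vary between versions. For plain ASCII text the byte and character counts are equal; they differ once the page contains multibyte UTF-8 characters.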

Special thanks to Ben Scott for the help!

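Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site or the original source. For any questions, contact: yoyou2525@163.com.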

 
© 2020-2024 STACKOOM.COM