
Piping curl output into grep

Just a little disclaimer: I am not very familiar with programming, so please excuse me if I use any terms incorrectly or in a confusing way.

I want to extract specific information from a webpage, and tried doing this by piping the output of curl into grep. This is in Cygwin, if that matters.

When I just type

$ curl www.ncbi.nlm.nih.gov/gene/823951

The terminal prints the whole webpage in what I believe is HTML. From here I thought I could pipe this output into grep with whatever search term I want:

  $ curl www.ncbi.nlm.nih.gov/gene/823951 | grep "Gene Symbol"

But instead of printing any of the webpage, the terminal gives me:

 % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  142k    0  142k    0     0  41857      0 --:--:--  0:00:03 --:--:-- 42083

Can anyone explain why it does this, and how I can search for specific lines of text in a webpage? I eventually want to compile information like gene names, types, and descriptions into a database, so I was hoping to export the grep results to a text file afterwards.

Any help is greatly appreciated. Thanks in advance!

curl detects that its output is not going to a terminal and shows you the progress meter. The meter is written to standard error, so it appears on screen instead of travelling down the pipe. You can suppress it with -s.
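To see why the meter shows up on screen while grep still receives the page, here is a minimal sketch that needs no network access. fake_fetch is a hypothetical stand-in for curl, not a real command; it only imitates the split between standard error (the meter) and standard output (the page body).

```shell
# Hypothetical stand-in for curl: a "progress" line goes to stderr,
# the page body goes to stdout.
fake_fetch() {
    echo "100  142k  (progress meter)" >&2
    echo '<dt class="noline"> Gene symbol </dt>'
}

# Only stdout travels down the pipe. Discard stderr here and count
# the lines that grep actually receives (the empty pattern matches every line).
received=$(fake_fetch 2>/dev/null | grep -c "")
echo "lines received by grep: $received"
```

With the real command, -s hides the meter; adding -S as well (curl -sS) should keep error messages visible if the request fails.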

The HTML data is indeed being sent to grep. However, that page does not contain the text "Gene Symbol". grep is case-sensitive (unless invoked with -i), and the text on the page is "Gene symbol".

$ curl -s www.ncbi.nlm.nih.gov/gene/823951 | grep "Gene symbol"
    <dt class="noline"> Gene symbol </dt>
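The case-sensitivity point can be checked offline. The sketch below reuses the matching line from the output above as sample input (the variable names are only for illustration) and compares a plain search with one using -i.

```shell
# Sample line copied from the output above, so no network access is needed.
html='    <dt class="noline"> Gene symbol </dt>'

# Case-sensitive search: "Gene Symbol" (capital S) does not match.
# grep exits non-zero when nothing matches, hence the "|| true".
strict=$(printf '%s\n' "$html" | grep -c "Gene Symbol" || true)

# Case-insensitive search with -i matches regardless of capitalization.
relaxed=$(printf '%s\n' "$html" | grep -ci "Gene Symbol")

echo "case-sensitive matches: $strict"
echo "case-insensitive matches: $relaxed"
```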

You probably also want the next line of HTML, which grep will print if you add the -A option (-A1 prints one line after each match):

$ curl -s www.ncbi.nlm.nih.gov/gene/823951 | grep -A1 "Gene symbol"
    <dt class="noline"> Gene symbol </dt>
    <dd class="noline">AT3G47960</dd>
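Since the goal is to collect values in a text file, one possible next step is to strip the HTML tags and append the result to a file. This is only a sketch under assumptions: the sample input is the two lines shown above, the page layout is assumed to stay the same, and genes.txt is a hypothetical file name. Parsing HTML with grep and sed is fragile, so treat it as a starting point.

```shell
# Two lines copied from the grep -A1 output above, used as sample input.
sample='    <dt class="noline"> Gene symbol </dt>
    <dd class="noline">AT3G47960</dd>'

# Keep only the line after the match, then strip tags and leading spaces.
symbol=$(printf '%s\n' "$sample" \
    | grep -A1 "Gene symbol" \
    | tail -n 1 \
    | sed -e 's/<[^>]*>//g' -e 's/^[[:space:]]*//')

# Append the value to a text file (hypothetical name and location).
outfile="${TMPDIR:-/tmp}/genes.txt"
printf '%s\n' "$symbol" >> "$outfile"
echo "$symbol"
```

For the live page, the printf line would be replaced by the curl -s command shown above.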

See man curl and man grep for more information about these and other options.
