
Filter out single instance of string from a single line containing multiple similar matches with grep or sed?

I've been making a shell script to download a certain experimental branch of Blender from their website. When curling the site, all the versions appear in one really (and I mean really) long string of HTML. I can grep (ripgrep specifically) for only the Linux versions, but when I want to grep or even sed again, all the filenames start with "https://" and end with ".tar.xz".

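And since they are all on the same line, a pattern that matches from the start of the first URL also matches through to the end of the last one.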

os linux" ><a href="https://builder.blender.org/download/experimental/blender-3.0.0-alpha+asset-browser-poselib.fba8de2e8688-linux.x86_64-release.tar.xz" title="Download linux 64bit tar.xz file" class="js-ga" ga_label="linux 64bit tar.xz file" ga_type="button" ga_cat="download"><span class="name">Blender 3.0.0 - <span class="build-var next">asset-browser-poselib</span><small>May 22, 05:26:55 - asset-browser-poselib - fba8de2e8688 - tar.xz - 149.56MB</small></span><span class="build">x64</span><span class="size">release</span></a></li><li class="os linux" style="display:none;"><a href="https://builder.blender.org/download/experimental/blender-3.0.0-alpha+asset-browser-poselib.fba8de2e8688-linux.x86_64-release.tar.xz.sha256" title="Download linux 64bit sha256 file" class="js-ga" ga_label="linux 64bit sha256 file" ga_type="button" ga_cat="download"><span class="name">Blender 3.0.0 - <span class="build-var next">asset-browser-poselib</span><small>May 22, 05:26:55 - asset-browser-poselib - fba8de2e8688 - sha256 - 65.00B</small></span><span class="build">x64</span><span class="size">release</span></a></li><li class="os linux" ><a href="https://builder.blender.org/download/experimental/blender-3.0.0-alpha+cycles-x.a117a9c63c3a-linux.x86_64-release.tar.xz" title="Download linux 64bit tar.xz file" class="js-ga" ga_label="linux 64bit tar.xz file" ga_type="button" ga_cat="download"><span class="name">Blender 3.0.0 - <span class="build-var next">cycles-x</span><small>May 22, 05:03:02 - cycles-x - a117a9c63c3a - tar.xz - 143.11MB</small></span><span class="build">x64</span><span class="size">release</span></a></li><li class="os linux" style="display:none;"><a href="https://builder.blender.org/download/experimental/blender-3.0.0-alpha+cycles-x.a117a9c63c3a-linux.x86_64-release.tar.xz.sha256" title="Download linux 64bit sha256 file" class="js-ga" ga_label="linux 64bit sha256 file" ga_type="button" ga_cat="download"><span class="name">Blender 3.0.0 - <span class="build-var 
next">cycles-x</span><small>May 22, 05:03:02 - cycles-x - a117a9c63c3a - sha256 - 65.00B</small></span><span class="build">x64</span><span class="size">release</span></a></li><li class="os linux" style="display:none;"><a href="https://builder.blender.org/download/experimental/blender-3.0.0-alpha+override-recursive-resync.0d2c5bf06726-linux.x86_64-debug.tar.xz.sha256" title="Download linux 64bit sha256 file" class="js-ga" ga_label="linux 64bit sha256 file" ga_type="button" ga_cat="download"><span class="name">Blender 3.0.0 - <span class="build-var next">override-recursive-resync</span><small>May 20, 12:38:57 - override-recursive-resync - 0d2c5bf06726 - sha256 - 65.00B</small></span><span class="build">x64</span><span class="size">debug</span></a></li><li class="os linux" ><a href="https://builder.blender.org/download/experimental/blender-3.0.0-alpha+override-recursive-resync.0d2c5bf06726-linux.x86_64-debug.tar.xz" title="Download linux 64bit tar.xz file" class="js-ga" ga_label="linux 64bit tar.xz file" ga_type="button" ga_cat="download"><span class="name">Blender 3.0.0 - <span class="build-var next">override-recursive-resync</span><small>May 20, 12:38:56 - override-recursive-resync - 0d2c5bf06726 - tar.xz - 157.56MB</small></span><span class="build">x64</span><span class="size">debug</span></a></li><li class="os linux" ><a href="https://builder.blender.org/download/experimental/blender-3.0.0-alpha+override-recursive-resync.0d2c5bf06726-linux.x86_64-release.tar.xz" title="Download linux 64bit tar.xz file" class="js-ga" ga_label="linux 64bit tar.xz file" ga_type="button" ga_cat="download"><span class="name">Blender 3.0.0 - <span class="build-var next">override-recursive-resync</span><small>May 20, 11:50:22 - override-recursive-resync - 0d2c5bf06726 - tar.xz - 149.73MB</small></span><span class="build">x64</span><span class="size">release</span></a></li><li class="os linux" style="display:none;"><a 
href="https://builder.blender.org/download/experimental/blender-3.0.0-alpha+override-recursive-resync.0d2c5bf06726-linux.x86_64-release.tar.xz.sha256" title="Download linux 64bit sha256 file" class="js-ga" ga_label="linux 64bit sha256 file" ga_type="button" ga_cat="download"><span class="name">Blender 3.0.0 - <span class="build-var next">override-recursive-resync</span><small>May 20, 11:50:22 - override-recursive-resync - 0d2c5bf06726 - sha256 - 65.00B</small></span><span class="build">x64</span><span class="size">release</span></a></li><li class="os linux" ><a href="https://builder.blender.org/download/experimental/blender-3.0.0-alpha+profiler-editor.ab200c6eddc6-linux.x86_64-release.tar.xz" title="Download linux 64bit tar.xz file" class="js-ga" ga_label="linux 64bit tar.xz file" ga_type="button" ga_cat="download"><span class="name">Blender 3.0.0 - <span class="build-var next">profiler-editor</span><small>May 20, 04:54:26 - profiler-editor - ab200c6eddc6 - tar.xz - 149.54MB</small></span><span class="build">x64</span><span class="size">release</span></a></li><li class="os linux" style="display:none;"><a href="https://builder.blender.org/download/experimental/blender-3.0.0-alpha+profiler-editor.ab200c6eddc6-linux.x86_64-release.tar.xz

I tried this with ripgrep (or grep): rg -o 'https.*tar\.xz', but that matches everything from the first filename all the way to the last one. Maybe using AND logic in grep would help?
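To reproduce the problem in isolation, here is a small sketch run against a simplified, hand-trimmed excerpt of the page (the two `<a>` tags and their shortened `title` values are an assumption, not the real markup). Because `.*` is greedy, `grep -o` returns a single match that starts at the first `https` and ends at the last `tar.xz` on the line.

```shell
# Simplified excerpt of the question's HTML (titles shortened by hand):
# two <a> tags that sit on ONE line.
base='https://builder.blender.org/download/experimental/blender-3.0.0-alpha'
html="<a href=\"$base+asset-browser-poselib.fba8de2e8688-linux.x86_64-release.tar.xz\" title=\"Download\"></a><a href=\"$base+cycles-x.a117a9c63c3a-linux.x86_64-release.tar.xz\" title=\"Download\"></a>"

# '.*' is greedy: the one match runs from the first "https" to the LAST "tar.xz",
# swallowing the closing quote, the tags and the second href in between.
greedy=$(printf '%s\n' "$html" | grep -o 'https.*tar\.xz')
printf '%s\n' "$greedy"
```

On the real page the greedy match can run even further, since the `title` attributes and `<small>` text also contain "tar.xz".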

The URL in the string that I want is:

https://builder.blender.org/download/experimental/blender-3.0.0-alpha+cycles-x.a117a9c63c3a-linux.x86_64-release.tar.xz

How can I filter out a specific URL string if they all start and end the same way?

Using GNU grep with non-greedy matching, we could try the following.

grep -oP 'https?:\/\/.*?tar\.xz' Input_file

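Explanation: use the -o option to print only the matched part, and the -P option to enable PCRE regexes in grep. Then use a non-greedy match to go from http or https up to tar.xz. This prints every matched value in the file.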
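A minimal sketch of this command on a hypothetical trimmed sample, assuming GNU grep built with PCRE support (`-P`). Each build on the page has both a `.tar.xz` and a `.tar.xz.sha256` link; since `.*?` stops at the first "tar.xz", the checksum links are cut at the same point and every URL comes out twice. The trailing `grep 'cycles-x' | sort -u` is an added step, not part of the answer, that keeps only the wanted branch and drops the duplicate.

```shell
# Hand-trimmed sample: each build has a .tar.xz link and a .tar.xz.sha256 link.
base='https://builder.blender.org/download/experimental/blender-3.0.0-alpha'
poselib="$base+asset-browser-poselib.fba8de2e8688-linux.x86_64-release.tar.xz"
cycles="$base+cycles-x.a117a9c63c3a-linux.x86_64-release.tar.xz"
html="<a href=\"$poselib\"></a><a href=\"$poselib.sha256\"></a><a href=\"$cycles\"></a><a href=\"$cycles.sha256\"></a>"

# Non-greedy '.*?' (PCRE via GNU grep -P): each match ends at the FIRST "tar.xz",
# so the four links yield four matches, two per build.
all=$(printf '%s\n' "$html" | grep -oP 'https?:\/\/.*?tar\.xz')

# Keep only the branch we want and collapse the duplicate.
wanted=$(printf '%s\n' "$all" | grep 'cycles-x' | sort -u)
printf '%s\n' "$wanted"
```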

Note: the grep command above prints its results on the terminal. If you are happy with them and want to save the output into Input_file itself, append > temp && mv temp Input_file to the above command.
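A sketch of the full command, run in a throwaway directory created with `mktemp -d` on a small hypothetical `Input_file` (GNU grep with `-P` assumed). The detour through `temp` is needed because redirecting straight to `Input_file` would make the shell truncate it before grep ever reads it.

```shell
# Throwaway working directory; Input_file stands in for the saved page.
dir=$(mktemp -d)
cd "$dir" || exit 1
base='https://builder.blender.org/download/experimental/blender-3.0.0-alpha'
printf '<a href="%s"></a><a href="%s"></a>\n' \
    "$base+asset-browser-poselib.fba8de2e8688-linux.x86_64-release.tar.xz" \
    "$base+cycles-x.a117a9c63c3a-linux.x86_64-release.tar.xz" > Input_file

# Write the matches to temp first; only replace Input_file if grep succeeded.
grep -oP 'https?:\/\/.*?tar\.xz' Input_file > temp && mv temp Input_file
cat Input_file
```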

Here's how to do it with the CLI HTML parser pup:

curl -s https://builder.blender.org/download/experimental/ \
    | pup 'li.linux > a[href*="cycles-x"] attr{href}' \
    | grep '\.tar\.xz$'

prints

https://builder.blender.org/download/experimental/blender-3.0.0-alpha+cycles-x.a117a9c63c3a-linux.x86_64-release.tar.xz

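The selector li.linux > a[href*="cycles-x"] selects <a> elements whose href attribute contains cycles-x, for all links that are children of list items with class linux. The display function attr{href} prints the value of the href attribute.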

This returns two lines: the URL we want and the URL of the checksum. CSS supports multiple attribute selectors, as in a[href*="cycles-x"][href$=".tar.xz"], but pup does not, hence the grep filter.

You can use

grep -o 'https[^[:space:]"'"'"']*tar\.xz'


Details

  • https - the https string
  • [^[:space:]"']* - zero or more characters other than whitespace, " and '
  • tar\.xz - the tar.xz string.
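A quick check of this pattern on a hypothetical trimmed sample. It needs only plain `grep -o` (no `-P`), and the `'"'"'` sequence is the usual way to put a literal single quote inside a single-quoted shell string. Because the negated bracket expression cannot step past a quote, no match can run from one href into the next, although the `.sha256` links still contribute a duplicate prefix. The second pipeline is an added variation, not part of the answer: it requires a closing `"` right after `tar.xz` (assuming every href is double-quoted, as in the question's HTML), so the checksum links do not match at all.

```shell
# Hand-trimmed sample: each build has a .tar.xz link and a .tar.xz.sha256 link.
base='https://builder.blender.org/download/experimental/blender-3.0.0-alpha'
poselib="$base+asset-browser-poselib.fba8de2e8688-linux.x86_64-release.tar.xz"
cycles="$base+cycles-x.a117a9c63c3a-linux.x86_64-release.tar.xz"
html="<a href=\"$poselib\"></a><a href=\"$poselib.sha256\"></a><a href=\"$cycles\"></a><a href=\"$cycles.sha256\"></a>"

# The answer's pattern: one match per link, each confined to a single href value.
urls=$(printf '%s\n' "$html" | grep -o 'https[^[:space:]"'"'"']*tar\.xz')

# Variation: demand the closing quote after "tar.xz" (drops the .sha256 links),
# strip that quote again, then pick the branch we want.
wanted=$(printf '%s\n' "$html" \
    | grep -o 'https[^[:space:]"'"'"']*tar\.xz"' \
    | sed 's/"$//' \
    | grep 'cycles-x')
printf '%s\n' "$wanted"
```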

You can add a newline after each instance of ".tar.xz":

sed -i 's/\.tar\.xz/.tar.xz\n/g' your_file 

Then remove everything before "https" with:

sed -i 's/.*href="//' your_file

This changes the file to:

https://builder.blender.org/download/experimental/blender-3.0.0-alpha+asset-browser-poselib.fba8de2e8688-linux.x86_64-release.tar.xz
https://builder.blender.org/download/experimental/blender-3.0.0-alpha+asset-browser-poselib.fba8de2e8688-linux.x86_64-release.tar.xz
https://builder.blender.org/download/experimental/blender-3.0.0-alpha+cycles-x.a117a9c63c3a-linux.x86_64-release.tar.xz
https://builder.blender.org/download/experimental/blender-3.0.0-alpha+cycles-x.a117a9c63c3a-linux.x86_64-release.tar.xz
https://builder.blender.org/download/experimental/blender-3.0.0-alpha+override-recursive-resync.0d2c5bf06726-linux.x86_64-debug.tar.xz
https://builder.blender.org/download/experimental/blender-3.0.0-alpha+override-recursive-resync.0d2c5bf06726-linux.x86_64-debug.tar.xz
https://builder.blender.org/download/experimental/blender-3.0.0-alpha+override-recursive-resync.0d2c5bf06726-linux.x86_64-release.tar.xz
https://builder.blender.org/download/experimental/blender-3.0.0-alpha+override-recursive-resync.0d2c5bf06726-linux.x86_64-release.tar.xz
https://builder.blender.org/download/experimental/blender-3.0.0-alpha+profiler-editor.ab200c6eddc6-linux.x86_64-release.tar.xz
https://builder.blender.org/download/experimental/blender-3.0.0-alpha+profiler-editor.ab200c6eddc6-linux.x86_64-release.tar.xz
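The same two sed steps can be chained in a pipe instead of editing a file in place; this sketch runs them on a hypothetical trimmed sample and assumes GNU sed, since `\n` in the replacement is a GNU extension. It also shows where the doubled URLs come from: each `.sha256` link is split right after its `.tar.xz`, giving a second copy of the URL, and the very last `.sha256"...` fragment has no `href` and survives as a stray line on this sample. The final `grep '^https' | sort -u` is an added clean-up step, not part of the answer.

```shell
# Hand-trimmed sample: each build has a .tar.xz link and a .tar.xz.sha256 link.
base='https://builder.blender.org/download/experimental/blender-3.0.0-alpha'
poselib="$base+asset-browser-poselib.fba8de2e8688-linux.x86_64-release.tar.xz"
cycles="$base+cycles-x.a117a9c63c3a-linux.x86_64-release.tar.xz"
html="<a href=\"$poselib\"></a><a href=\"$poselib.sha256\"></a><a href=\"$cycles\"></a><a href=\"$cycles.sha256\"></a>"

# Step 1: break the line after every ".tar.xz" (GNU sed '\n').
# Step 2: delete everything up to and including the last 'href="' on each line.
split=$(printf '%s\n' "$html" \
    | sed 's/\.tar\.xz/.tar.xz\n/g' \
    | sed 's/.*href="//')

# Keep only lines that are URLs and collapse the duplicates.
clean=$(printf '%s\n' "$split" | grep '^https' | sort -u)
printf '%s\n' "$clean"
```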

Edit: @Wiktor Stribiżew has a better answer.


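Disclaimer: technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.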

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM