grep: extracting href and rel from HTML with a regular expression

Question

The HTML I am working with looks like this:

<a class="title may-blank" data-event-action="title" href="/r/gaming/comments/6t8dj0/we_can_play_singleplayer_games_off_the_internet/" tabindex="1" data-href-url="/r/gaming/comments/6t8dj0/we_can_play_singleplayer_games_off_the_internet/" data-inbound-url="/r/gaming/comments/6t8dj0/we_can_play_singleplayer_games_off_the_internet/?utm_content=title&amp;utm_medium=hot&amp;utm_source=reddit&amp;utm_name=frontpage" rel="">We can play singleplayer games OFF THE INTERNET? Are they seriously that out of touch to advertise this?</a>

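There are many lines like this.

I only want href="http://xxxxxxxx" and the yyyyyyyyyy text that follows rel="">; everything else is unnecessary.

I would like them output like this, one line for each block above:

<a href="http://xxxxxxxx" rel="">yyyyyyyyyy</a>

Any idea how I can solve this?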

Answer 1

So, here is a 10-second solution. It is probably a bit fragile, but it should work, assuming the string is in a file named html.txt:

sed -e 's/class=.* href=/href=/' -e 's/ tabindex=.* rel=/ rel=/' html.txt
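A slightly less fragile variant of the same idea, as a minimal sketch: rather than deleting the unwanted attributes, it captures the href value, the rel value, and the link text, then rebuilds the tag. The function name `extract_links` is made up for illustration. The sketch assumes one `<a ...>...</a>` per line, double-quoted attributes, href appearing before rel, and no nested tags inside the link text.

```shell
#!/bin/sh
# extract_links (hypothetical name): reads HTML lines from stdin or from the
# files given as arguments and prints one rebuilt <a> tag per matching line.
# Assumptions: one anchor per line, double-quoted attributes, href before rel.
extract_links() {
  # -n with the trailing p flag prints only lines where the substitution matched.
  # [^"]* and [^<]* stop at the next quote or tag, so "data-href-url" cannot be
  # mistaken for href (the pattern requires a space directly before href=).
  sed -n -E 's#^.*<a [^>]* href="([^"]*)"[^>]* rel="([^"]*)">([^<]*)</a>.*$#<a href="\1" rel="\2">\3</a>#p' "$@"
}

# Demo on a shortened line shaped like the sample in the question.
printf '%s\n' '<a class="title may-blank" href="/r/gaming/x/" tabindex="1" data-href-url="/r/gaming/x/" rel="">Some title?</a>' | extract_links
```

For the file from this answer, the call would be `extract_links html.txt`. For anything beyond a quick one-off, a real HTML parser is usually the safer choice, since regular expressions break as soon as the attribute order or quoting changes.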


Answer 2

Your HTML sample led me to the following pattern for capturing the values you need:

<a class=\"(.*) href=\"/(.*)\" tabindex=(.*) rel=\"\">(.*)</a>

Replace each match with the following pattern:

<a href="http://$2" rel="">$4</a>

You can try it out on regexe; for me it worked as expected.
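The pattern and replacement from this answer can be applied on the command line with sed, roughly as sketched here. This is one possible translation, not necessarily how the answerer ran it: the function name `rewrite_links` is made up, the backslashes before the quotes are dropped because the whole expression sits inside single quotes, and `$2`/`$4` become `\2`/`\4` in sed's replacement syntax.

```shell
#!/bin/sh
# rewrite_links (hypothetical name): applies the answer's pattern with
# extended regular expressions (-E); input comes from stdin or file arguments.
# Group 2 is the href path without its leading slash, group 4 the link text.
rewrite_links() {
  # '#' is the s-command delimiter so the slashes in "</a>" and "http://"
  # need no escaping; -n plus the p flag prints only lines that matched.
  sed -n -E 's#<a class="(.*) href="/(.*)" tabindex=(.*) rel="">(.*)</a>#<a href="http://\2" rel="">\4</a>#p' "$@"
}

# Demo on a shortened line shaped like the sample in the question.
printf '%s\n' '<a class="title may-blank" href="/r/gaming/x/" tabindex="1" data-href-url="/r/gaming/x/" rel="">Some title?</a>' | rewrite_links
```

Because the href in the sample is a site-relative path, the `http://` prefix from the replacement is followed directly by that path. If an absolute URL is needed, the site's host name would presumably have to be added after `http://`. The greedy `(.*)` groups also rely on each line containing a single anchor with the attributes in exactly this order.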


Solution 1
0 2017-08-12 19:57:19

Solution 2
0 2017-08-12 19:57:26
