How to combine two SED regex commands?

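I have read every answer I could find on SOF, but unfortunately none of them led me to a solution. I have thousands of files containing address information, and each of my SED commands works on its own: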

Matching the address:

sed -n -e 's/^.*address23ca storeh2..\(.*\) Address<\/h2.*optmob..\(.*\)<br>\(.*\)<br>\(.*\)<br>\(.*\)<\/p><p class="addressbox23.*Telephone: \(.-...-...-....\)\(.*\).*v1\/place?..\(.*\)&key.*$/\1,\2,\3,\4,\5,\6/p' afile.html

$ 142 Wayne Street,Abbey,Saskatchewan,S0N 0A0,1-232-321-4321

Matching the GPS coordinates:

sed -n -e 's/^.*v1\/place?..\(.*\)&key.*$/\1/p' abbey.html

$ 50.736301,-108.757103

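I tried the following, but instead of stopping after the phone number, it keeps matching until it reaches v1\/place? and only stops there. I don't know how to stop matching at the phone number and then start matching again for the GPS coordinates.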

How can I combine these two matches?

sed -n -e 's/^.*address23ca storeh2..\(.*\) Address<\/h2.*optmob..\(.*\)<br>\(.*\)<br>\(.*\)<br>\(.*\)<\/p><p class="addressbox23.*Telephone: \(.-...-...-....\)\(.*\).^*v1\/place?..\(.*\)&key.*$/\1,\2,\3,\4,\5,\6,\7/p' afile.html

$ 142 Wayne Street,Abbey,Saskatchewan,S0N 0A0,1-232-321-4321 LOADS OF unnecessary HTML src="https://www.google.com/maps/embed
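In the combined command, capture groups are numbered from left to right, so the `\(.*\)` directly after the phone number is itself group 7 and the coordinates end up in group 8. `\7` therefore prints the HTML in between, and printing `\8` in its place should give the coordinates instead. The same single-pass idea can be sketched in Python, assuming every file has the layout of the trimmed file shown next. The pattern, the `extract` helper and the sample string are illustrative only, not from the original post: non-greedy `.*?` makes each field stop at its first delimiter, and the filler between fields is matched without being captured.

```python
import re
from typing import Optional

# Sample cut down from the trimmed page in the question (address block + map iframe).
html = (
    '<h2 class="address23ca storeh2">Canada Post Abbey Address</h2></div>'
    '<p class="addressbox23ca optmob">142 Wayne Street<br>Abbey<br>Saskatchewan<br>S0N 0A0</p>'
    '<p class="addressbox23ca optmob">Telephone: 1-866-607-6301</p></div>'
    '<iframe width="100%" height="434" frameborder="0" '
    'src="https://www.google.com/maps/embed/v1/place?q=50.736301,-108.757103'
    '&key=AIzaSyDmJApckRpAR1uhfdfz_QedneaF5lAlrQU"></iframe>'
)

# One pattern for both the address and the GPS. Only the wanted fields are in
# capture groups; the text between them is matched by non-capturing .*?
PATTERN = re.compile(
    r'optmob">(.*?)<br>(.*?)<br>(.*?)<br>(.*?)</p>'  # street, town, province, postal code
    r'.*?Telephone: (\d-\d{3}-\d{3}-\d{4})'           # phone number
    r'.*?v1/place\?q=([^&]*)&key',                    # "lat,lng" from the map URL
    re.DOTALL,
)


def extract(page: str) -> Optional[str]:
    """Return the fields as one comma-separated line, or None if the page doesn't match."""
    m = PATTERN.search(page)
    return ",".join(m.groups()) if m else None


print(extract(html))
```

For thousands of files, the same function could be called on the contents of each file, for example in a loop over `glob.glob("*.html")`.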

A trimmed version of the file:

<!DOCTYPE html>
<html lang="en"> <!--<![endif]-->
<head></head><body> <div class="large-7 columns small-12 addWrap23ca"> <div class="storeH2Wrap23ca"> <h2 class="address23ca storeh2">Canada Post Abbey Address</h2></div><p class="addressbox23ca optmob">142 Wayne Street<br>Abbey<br>Saskatchewan<br>S0N 0A0</p><p class="addressbox23ca optmob">Telephone: 1-866-607-6301</p></div></div><div class="row"> <div class="large-12 medium-12 columns small-12"> <div class="row"> <div class="large-12 columns small-12"> <div class="storeH2Wrap23ca"> <h2 class="hours23ca storeh2">Canada Post Abbey Opening Hours</h2></div><div class="hoursCont23ca"> 13:00-16:30</p><p>Closed</p><p>Closed</p></div></div><div class="notesWrap23ca"><div class="notesTitle23ca"><p class="noteHeading23ca">Post Office Notes</p></div><div class="notesContent23ca"><p class="note23ca">This Post Office Branch closes for lunch on certain days - please see opening hours.</p></div></div></div></div></div></div><div class="row"> <div class="mapadCont23ca"> <div class="large-12 medium-12 columns small-12 map23ca"> <div class="storeH2Wrap23ca storeH2WrapMap23ca"> <h2 class="maptitle23ca storeh2">Canada Post Abbey Map Location</h2></div><div class="mapBreadCrumbs23ca"><ul><li><a href="../canada-post/canada-post.html">Canada Post Locator</a></li><li>&gt;</li><li><a href="saskatchewan.html">Canada Post Saskatchewan</a></li><li>&gt;</li><li>Canada Post in Abbey</li></ul></div> <div class="mapCont23ca"> <iframe width="100%" height="434" frameborder="0" src="https://www.google.com/maps/embed/v1/place?q=50.736301,-108.757103&key=AIzaSyDmJApckRpAR1uhfdfz_QedneaF5lAlrQU"></iframe></div><div class="searchagainouter23ca"> <div class="adddivclear" style="clear:both;"></div></body></html>

You can combine two sed commands by separating them with a semicolon:

$ echo "etts" | sed 's/et/te/; s/ts/st/'
test
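The semicolon runs the second substitution on the output of the first, on the same line. For comparison, here is the same chaining in Python, the language of the next answer; `apply_in_order` is a hypothetical helper written only for this illustration, and `count=1` mirrors sed's default of replacing just the first match.

```python
import re


def apply_in_order(text, substitutions):
    # Like sed 's/a/b/; s/c/d/': each (pattern, replacement) pair is applied
    # to the result of the previous one, replacing only the first match.
    for pattern, replacement in substitutions:
        text = re.sub(pattern, replacement, text, count=1)
    return text


print(apply_in_order("etts", [("et", "te"), ("ts", "st")]))  # → test
```

Because each command sees the result of the previous one, this only helps when the commands edit different parts of the line. In the question, the first `s` command rewrites the whole line (its pattern runs from `^` to `$`), so a following `s/^.*v1\/place?...` command would no longer find anything to match.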

You could use a better tool for this job, such as Python's HTMLParser. Here is an example that prints every tag it finds; you can add whatever filters you need:

#! /usr/bin/env python3
from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        print("Found a start tag:", tag)
        print("\tattrs:", attrs)
    def handle_endtag(self, tag):
        print("Found an end tag:", tag)
    def handle_data(self, data):
        print("Found data:", data)

MyHTMLParser().feed('''
<!DOCTYPE html>
<html lang="en"> <!--<![endif]-->
<head></head><body> <div class="large-7 columns small-12 addWrap23ca"> <div class="storeH2Wrap23ca"> <h2 class="address23ca storeh2">Canada Post Abbey Address</h2></div><p class="addressbox23ca optmob">142 Wayne Street<br>Abbey<br>Saskatchewan<br>S0N 0A0</p><p class="addressbox23ca optmob">Telephone: 1-866-607-6301</p></div></div><div class="row"> <div class="large-12 medium-12 columns small-12"> <div class="row"> <div class="large-12 columns small-12"> <div class="storeH2Wrap23ca"> <h2 class="hours23ca storeh2">Canada Post Abbey Opening Hours</h2></div><div class="hoursCont23ca"> 13:00-16:30</p><p>Closed</p><p>Closed</p></div></div><div class="notesWrap23ca"><div class="notesTitle23ca"><p class="noteHeading23ca">Post Office Notes</p></div><div class="notesContent23ca"><p class="note23ca">This Post Office Branch closes for lunch on certain days - please see opening hours.</p></div></div></div></div></div></div><div class="row"> <div class="mapadCont23ca"> <div class="large-12 medium-12 columns small-12 map23ca"> <div class="storeH2Wrap23ca storeH2WrapMap23ca"> <h2 class="maptitle23ca storeh2">Canada Post Abbey Map Location</h2></div><div class="mapBreadCrumbs23ca"><ul><li><a href="../canada-post/canada-post.html">Canada Post Locator</a></li><li>&gt;</li><li><a href="saskatchewan.html">Canada Post Saskatchewan</a></li><li>&gt;</li><li>Canada Post in Abbey</li></ul></div> <div class="mapCont23ca"> <iframe width="100%" height="434" frameborder="0" src="https://www.google.com/maps/embed/v1/place?q=50.736301,-108.757103&key=AIzaSyDmJApckRpAR1uhfdfz_QedneaF5lAlrQU"></iframe></div><div class="searchagainouter23ca"> <div class="adddivclear" style="clear:both;"></div></body></html>
''')
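The example above prints every tag it sees. As a sketch of the kind of filter it mentions (not part of the original answer), the hypothetical `AddressParser` below keeps only the text of the `addressbox23ca` paragraphs and reads the coordinates from the `q` parameter of the map iframe's `src`. It assumes every file uses the same class names as the trimmed file in the question.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlparse


class AddressParser(HTMLParser):
    """Hypothetical filter: collects address lines, phone number and map coordinates."""

    def __init__(self):
        super().__init__()
        self.in_address = False  # True while inside <p class="addressbox23ca ...">
        self.fields = []         # text found in those paragraphs, in document order
        self.gps = None          # "lat,lng" taken from the map iframe

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "p" and "addressbox23ca" in classes:
            self.in_address = True
        elif tag == "iframe" and attrs.get("src"):
            # The coordinates are the q= query parameter of the embed URL.
            query = parse_qs(urlparse(attrs["src"]).query)
            self.gps = query.get("q", [None])[0]

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_address = False

    def handle_data(self, data):
        text = data.strip()
        if self.in_address and text:
            if text.startswith("Telephone: "):
                text = text[len("Telephone: "):]
            self.fields.append(text)


page = (
    '<p class="addressbox23ca optmob">142 Wayne Street<br>Abbey<br>Saskatchewan<br>S0N 0A0</p>'
    '<p class="addressbox23ca optmob">Telephone: 1-866-607-6301</p>'
    '<iframe src="https://www.google.com/maps/embed/v1/place?q=50.736301,-108.757103'
    '&key=AIzaSyDmJApckRpAR1uhfdfz_QedneaF5lAlrQU"></iframe>'
)

parser = AddressParser()
parser.feed(page)
parser.close()
print(",".join(parser.fields + [parser.gps or ""]))
```

To process many files, create a fresh parser for each file and feed it that file's contents, so fields from one page don't carry over into the next.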


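Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.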

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM