
Multiple-line output from a single-line match in Python

I'm still fairly new to Python, but I'm trying to write code that parses NOAA weather reports and displays them in the order they are broadcast.

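I've managed to put together a list of current conditions using Python regular expressions: the HTML file is cut into lines, which are then re-output in the correct order. Each of those entries, however, is a single line of data. The code looks like this: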

#other function downloads  
#http://www.arh.noaa.gov/wmofcst_pf.php?wmo=ASAK48PAFC&type=public
#and renames it currents.html
from bs4 import BeautifulSoup as bs
import re
soup = bs(open('currents.html'))
weatherRaw = soup.pre.string
towns = ['PAOM', 'PAUN', 'PAGM', 'PASA']
townOut = []
weatherLines = weatherRaw.splitlines()
for i in range(len(towns)):
    p = re.compile(towns[i] + '.*')
    for line in weatherLines:
        matched = p.match(line)
        if matched:
            townOut.append(matched.group())

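Now I'm working on the forecast section, and I've run into a problem: each forecast necessarily spans multiple lines, and I've already cut the file into lines.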

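So what I'm looking for is an expression that lets me use a similar loop, this time starting to append at the matched line and stopping at a line that contains only &&. Something like this: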

#sample data from http://www.arh.noaa.gov/wmofcst.php?wmo=FPAK52PAFG&type=public
#BeautifulSouped into list fcst (forecast.pre.get_text().splitlines())
zones = ['AKZ214', 'AKZ215', 'AKZ213'] #note the out-of-numerical-order zones
weatherFull = []
for i in range(len(zones)):
    start = re.compile(zones[i] + '.*')
    end = re.compile('&&')
    for line in fcst:
        matched = start.match(line)
        if matched:
            weatherFull.append(matched.group())
            #and the other lines of various contents and length
            #until reaching the end match object

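What should I do to improve this code? I know it's verbose, but while I'm starting out I like being able to keep track of what I'm doing. Thanks in advance!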

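Apologies if this isn't what you were asking for (in which case, happy to adjust). It's great that you're using BeautifulSoup, but you can actually take it further. Looking at the HTML, it appears that each block starts with an <a name=zone> structure and ends at the next <a name=zone>. If that is the case, you can do the following to extract the corresponding HTML for each zone: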

from bs4 import BeautifulSoup

# I put the HTML in a file, but this will work with a URL as well
with open('weather.html', 'r') as f:
  fcst = f.read()

# Turn the html into a navigable soup object
# (naming the parser explicitly avoids bs4's "no parser specified" warning)
soup = BeautifulSoup(fcst, 'html.parser')

# Define your zones
zones = ['AKZ214', 'AKZ215', 'AKZ213']

weatherFull = []

# This is a more Pythonic loop structure - instead of looping over
# a range of len(zones), simply iterate over each element itself
for zone in zones:
  # Here we use BS's built-in 'find' function to find the 'a' element
  # with a name = the zone in question (as this is the pattern).
  zone_node = soup.find('a', {'name': zone})

  # This loop will continue to cycle through the elements after the 'a'
  # tag until it hits another 'a' (this is highly structure dependent :) )
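  # (the 'is not None' check also stops cleanly if the zone is missing or
  # there are no more siblings)
  while zone_node is not None:
    weatherFull.append(zone_node)
    # Set the tag node = to the next node
    zone_node = zone_node.next_sibling
    # If the next node's tag name = 'a', break out and go to the next zone
    if getattr(zone_node, 'name', None) == 'a':
      break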

# Process weatherFull however you like
print(weatherFull)
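
If you would rather stay with the line-based loop from the question, here is a minimal sketch of that idea: start collecting at the line that matches the zone and stop at a line containing only &&. The function name collect_forecasts, its end_marker parameter, and the sample lines are made up for illustration; the real fcst list may be laid out differently, so treat this as a starting point rather than a finished parser.

```python
import re

def collect_forecasts(lines, zones, end_marker='&&'):
    """Return {zone: [lines]} from each zone's first matching line
    up to, but not including, the first end-marker line after it."""
    forecasts = {}
    for zone in zones:
        # re.escape guards against zone codes containing regex metacharacters
        start = re.compile(re.escape(zone))
        block = []
        collecting = False
        for line in lines:
            # Switch collecting on at the first line that starts with the zone code
            if not collecting and start.match(line):
                collecting = True
            if collecting:
                # A line holding only the end marker closes the block
                if line.strip() == end_marker:
                    break
                block.append(line)
        forecasts[zone] = block
    return forecasts

# Made-up sample standing in for fcst (not real NOAA output)
sample = [
    'AKZ213-010100-',
    'AREA A',
    '.TONIGHT...SAMPLE TEXT A.',
    '&&',
    'AKZ214-010100-',
    'AREA B',
    '.TONIGHT...SAMPLE TEXT B.',
    '&&',
]

# Zones are processed in the order given, not in file order
result = collect_forecasts(sample, ['AKZ214', 'AKZ213'])
for zone, block in result.items():
    print(zone, block)
```

Keeping one list per zone in a dictionary, instead of appending everything to a single flat list, makes it easier to print the forecasts in broadcast order afterwards.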

Hope this helps (or is at least in the ballpark of what you wanted!).

