Python: search for a string, parse the text after it, then add it to a list

Question

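So, I have a large amount of XML (for example, this one: https://www.goodreads.com/author/list/20598?format=xml&key=pVrw9BAFGMTuvfj4Y8VHQ ), and I want to search for every occurrence of the string <title>, parse the text that follows it to get the actual title, temporarily assign that title to a variable, and then append the variable to a list.

In other words, go through this XML and end up with a list of titles.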

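So my questions (while searching I found many similar questions, but none quite like this):

1 - How do I go through the whole text, stopping at each occurrence of <title> to do what I described above?

2 - How exactly should I parse out the title? That is, how do I capture the string that appears between <title> and </title>?

Thanks in advance.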

Answer 1

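Assuming that by <title> you mean the title tag, any halfway-decent XML parser can do this easily: it will notify you when it finds a title tag, and you can then extract the text inside that tag (the title you want).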
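A minimal sketch of that approach, using the standard library's `xml.etree.ElementTree.iterparse`, which reports each element as soon as its closing tag has been read. The function name `collect_titles` and the sample document are hypothetical; the real Goodreads response may nest its titles differently, and if a document declares a default namespace the tag will look like `{namespace}title` rather than `title`.

```python
import io
import xml.etree.ElementTree as ET


def collect_titles(xml_bytes):
    """Return the text of every <title> element, in document order.

    iterparse emits an "end" event each time an element has been fully
    read, so the loop is "notified" whenever a </title> closes.
    """
    titles = []
    for event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if elem.tag == "title":
            # elem.text is None for an empty <title/>; normalise it to ""
            titles.append((elem.text or "").strip())
    return titles


# Hypothetical sample, shaped loosely like an author's book list.
sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<response>
  <books>
    <book><title>First Book</title></book>
    <book><title>Second Book</title></book>
  </books>
</response>"""

print(collect_titles(sample))  # ['First Book', 'Second Book']
```

To run it on the feed from the question, you could pass in the bytes returned by `urllib.request.urlopen(url).read()` (not shown here). For a small document where you only need the list, `ET.fromstring(data).iter("title")` yields the same elements without the event loop; the event-driven form matters more when the file is too large to hold comfortably as one tree.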

Answer 2

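There are plenty of XML parsers around. But if you want to do this yourself, the function below will work, except when the title tags (which I call "flags" here; I'm not sure of the technical term) appear inside commented-out text or in text sections where they are not legal markup.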

def extract_text_between_flags(inputText, flagBegin, flagEnd):
    # Instantiate an empty list to store the results
    excerpts = list()

    # Find the first occurrence of the begin flag
    indexBegin = inputText.find(flagBegin)
    # Until the begin flag is no longer found
    while indexBegin > -1:
        # The wanted text starts right after the begin flag
        indexStart = indexBegin + len(flagBegin)
        # From there, search forward to the first occurrence of the end flag
        indexEnd = inputText.find(flagEnd, indexStart)
        # If the end flag is not found, stop searching
        # (check find()'s -1 before doing any arithmetic on the index)
        if indexEnd == -1:
            break
        # Extract the relevant passage from the text and add it to the list
        excerpts.append(inputText[indexStart:indexEnd])

        # Set the new search starting point as the next occurrence of the
        # begin flag after the end flag just found
        indexBegin = inputText.find(flagBegin, indexEnd + len(flagEnd))

    return excerpts

titles = extract_text_between_flags(myXMLString, '<title>', '</title>')
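For comparison, here is the same scan written with Python's `re` module instead of a hand-rolled `find()` loop. This is a swapped-in technique, not the answerer's code, and `titles_by_regex` is a hypothetical name. A non-greedy pattern stops at the first `</title>` after each `<title>`, which is what the loop does. Because it works on raw text, entities such as `&amp;` come back undecoded, so the sketch runs `html.unescape` on each match (it handles the predefined XML entities and numeric character references).

```python
import html
import re

# Non-greedy (.*?) stops at the first </title> after each <title>,
# mirroring the find()-based loop; re.DOTALL lets a title span lines.
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.DOTALL)


def titles_by_regex(xml_text):
    # Raw text matching leaves entities like &amp; escaped, so decode them
    return [html.unescape(t) for t in TITLE_RE.findall(xml_text)]


sample = "<books><title>Pride &amp; Prejudice</title><title>Emma</title></books>"
print(titles_by_regex(sample))  # ['Pride & Prejudice', 'Emma']
```

It shares the limitations described above: titles inside comments or CDATA sections are still picked up, and a tag carrying attributes, such as `<title lang="en">`, is not matched at all.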


2 answers

Answer 1: score 0, posted 2015-06-24 18:52:20

Answer 2: score 0, posted 2015-06-24 19:54:31
