Searching for characters between two delimiters in a string

Question

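I am trying to parse a string to find all of the characters between the two delimiters <code> and </code>.

I have tried using a regular expression, but I can't work out what is going on.

My attempt: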

import re
re.findall('<code>(.*?)</code>', processed_df['question'][2])

where processed_df['question'][2] is the following string (the string is continuous; I have typed it across several lines for readability):

 '<code>for x in finallist:\n    matchinfo = 
 requests.get("https://api.opendota.com/api/matches/{}".format(x)).json() 
 ["match_id"]\n    print(matchinfo)\n</code>'

I tested it with the following test_string:

 test_string = '<code> this is a test </code>'

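and it seems to work.

I suspect it has something to do with special characters in the text between <code> and </code>, but I don't know how to fix it. Any help is appreciated!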

Answer 1

Using an HTML parser is probably better than using a regular expression:

import lxml.html

html_snippet = """
 ...
 <p>Some stuff</p>
 ...
 <code>for x in finallist:\n    matchinfo = 
 requests.get("https://api.opendota.com/api/matches/{}".format(x)).json() 
 ["match_id"]\n    print(matchinfo)\n</code>
 ...
 And some Stuff
 ...
 another code block <br />
 <code>
    print('Hello world')
 </code>
 """

dom = lxml.html.fromstring(html_snippet)
codes = dom.xpath('//code')


for code in codes:
    print(code.text)

 >>>> for x in finallist:
 >>>>     matchinfo = 
 >>>> requests.get("https://api.opendota.com/api/matches/{}".format(x)).json() 
 >>>> ["match_id"]
 >>>>    print(matchinfo)

 >>>> print('Hello world')
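
If installing lxml is not an option, roughly the same idea can be sketched with the standard library's html.parser module. This is a swapped-in alternative, not the approach used above; the CodeCollector class and the snippet string are hypothetical names made up for illustration.

```python
from html.parser import HTMLParser


class CodeCollector(HTMLParser):
    """Collects the text found inside each <code>...</code> element."""

    def __init__(self):
        super().__init__()
        self.blocks = []      # finished code blocks, in document order
        self._buffer = None   # pieces of the block being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "code":
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "code" and self._buffer is not None:
            self.blocks.append("".join(self._buffer))
            self._buffer = None

    def handle_data(self, data):
        # Only keep text while we are inside a <code> element.
        if self._buffer is not None:
            self._buffer.append(data)


# Hypothetical input with two code blocks.
snippet = (
    "<p>Some stuff</p><code>a = 1\nprint(a)\n</code>"
    " text <code>print('Hello world')</code>"
)

collector = CodeCollector()
collector.feed(snippet)
collector.close()
print(collector.blocks)
```

The collected strings keep their newlines, so multi-line code blocks come back intact.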

Answer 2

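I think the problem is the newline character \n. By default the . metacharacter does not match a newline, so make sure to match with the DOTALL flag, for example: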

import re
regex = r"<code>(.*?)</code>"  # non-greedy, so each <code> block is matched separately

test_str = ("<code>for x in finallist:\\n    matchinfo = \n"
    " requests.get(\"https://api.opendota.com/api/matches/{}\".format(x)).json() \n"
    " [\"match_id\"]\\n    print(matchinfo)\\n</code>\n")

re.findall(regex, test_str, re.DOTALL)

['for x in finallist:\\n    matchinfo = \n requests.get("https://api.opendota.com/api/matches/{}".format(x)).json() \n ["match_id"]\\n    print(matchinfo)\\n']
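
To see the effect of the flag in isolation, here is a minimal sketch that runs the same pattern with and without re.DOTALL. The sample string is a hypothetical stand-in for the real data, with one actual newline between the delimiters.

```python
import re

# Hypothetical sample: one real newline sits between the delimiters.
sample = "<code>line one\nline two</code>"

pattern = r"<code>(.*?)</code>"

# Without DOTALL, "." cannot cross the newline, so nothing matches.
without_flag = re.findall(pattern, sample)

# With DOTALL, "." also matches "\n", so the whole block is captured.
with_flag = re.findall(pattern, sample, re.DOTALL)

print(without_flag)
print(with_flag)
```

This would explain why the single-line test_string in the question worked while the multi-line string did not.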

Answer 3

The question does not explicitly say that regular expressions are required. That said, I would say it is better not to use them:

For example:

test_str = '''
<code>asldkfj
asdlkfjas
asdlkf
for i in range(asdlkf):
    print("Hey")
    if i == 8:
        print(i)
</code>
'''

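start = len('<code>')   # number of characters to drop from the front
end = len('</code>')    # number of characters to drop from the back

# Everything between <code> and </code>. This assumes the stripped string
# starts with <code> and ends with </code>.
new_str = test_str.strip()[start:-end]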
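
The slicing above relies on the string starting and ending with the delimiters. When the delimiters can appear anywhere, or more than once, a regex-free variant can be sketched with str.find. The function name between and the sample string are hypothetical, chosen only for illustration.

```python
def between(text, start="<code>", end="</code>"):
    """Return every substring of text enclosed by start and end."""
    found = []
    pos = 0
    while True:
        open_at = text.find(start, pos)
        if open_at == -1:
            break  # no further opening delimiter
        content_from = open_at + len(start)
        close_at = text.find(end, content_from)
        if close_at == -1:
            break  # unmatched opening delimiter: stop searching
        found.append(text[content_from:close_at])
        pos = close_at + len(end)
    return found


# Hypothetical input with two delimited blocks, one spanning two lines.
sample = "intro <code>x = 1\ny = 2</code> middle <code>print(x)</code> outro"
print(between(sample))
```

Because str.find treats newlines like any other character, multi-line blocks need no special handling here.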
