
Extract content between tags in Python

I'm trying to get the content between two tags. I'm using BeautifulSoup in my code, as I read in this question. My code is the following:

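from bs4 import BeautifulSoup
import os

# html (the page source) and path (an output directory) are defined elsewhere in the script
soup3 = BeautifulSoup(html, 'html.parser')
scripts = soup3.find_all('script')
os.mkdir(path)
for nonce in scripts:
    # only look at <script> tags that carry a nonce attribute
    if nonce.has_attr('nonce'):
        #print(str(nonce.text.strip()).find('data:image/jpeg;base64'))
        #print(str(nonce.text.strip()).find('image'))

        # str(nonce) is the whole tag, markup included
        if str(nonce).strip().find("data:image/jpeg;base64") > 0 or str(nonce).strip().find("data:image/png;base64") > 0:
            #print(str(nonce).strip())
            print(nonce.text)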

…………

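If my if statement checks for the data:image string the way my code does, it works. But if I write str(nonce.text.strip()).find('data:image/jpeg;base64') I get nothing, and I don't know why. So how can I get the content between the script tags? An example of the output I currently have is: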

<script nonce="rq3guNaaFH7Hd30OJWKD3Q==">(function(){var s='data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wCEAAkGBwgHBgkIBwgKCgkLDRYPDQwMDRsUFRAWIB0iIiAdHx8kKDQsJCYxJx8fLT0tMTU3Ojo6Iys/RD84QzQ5OjcBCgoKDQwNGg8PGjclHyU3Nzc3Nzc3Nzc3Nzc3Nzc3Nzc3Nzc3Nzc3Nzc3Nzc3Nzc3Nzc3Nzc3Nzc3Nzc3Nzc3N//AABEIAFoAoAMBIgACEQEDEQH/xAAbAAADAQADAQAAAAAAAAAAAAADBAUGAQIHAP/EAEYQAAIBAwMBBQQFBgsJAQAAAAECAwAEEQUSITEGEyJBURRhcZEjMoGhsRYzQlLR0wcVJDQ1Q2JyksHwRVNVk5SVouHiJf/EABkBAAIDAQAAAAAAAAAAAAAAAAIDAAEEBf/EACMRAAICAgICAwADAAAAAAAAAAABAhEDEiExBCITMkFRUqH/2gAMAwEAAhEDEQA/APFr2EwzkY8J5WqQ0TdGrJNkssZA4/S65FdrtY5x3RHiHRvQ0hcT3kRId3UFQuVOAQOlBbaNVY4SeysffQ1Bf6fABG0sOoxz9vWiHQIxLt9oO3B6gZz5fgaglmPBJwPfXAZgcgkGpUv5IsuH+n+l9dChLsgmlypAztHn9tSbW0NzfCBGyN3LD09aJaT38sipBLKzA5HiJArSaVpy2aHJDSt9dh+A91DKWq5YGWWOSWkaCywBIGKnGBwMUG0QiIledr5wByfCf2U/cp/Jn+A/GlbdSI3Kk8biQBn9Bvu5qYzOc3SETb9/JHJPQZcA4pr6T2hQz4ZUn5A5OGzgigXwbvnYPwyKQSOF+l8qbAke4XxkNmbkDk+Z499aCqA25vJ7MTwvGW/UKAZ+BpaPVZN22Tap96iq1jZM2g97+rkgjj1qdfWyw6nLAyAhTxQTdKwYxtnI1I+Txn4gV82qzYwrQ/4RXDRpxtjApe7xbyiFwEkKBwCPIjI+YrO57cGj4qCNqEx53x/IUJ9Qk/WT5ClXZw5HHBx0rlwWiZtoA+FBqgtWdnvpfVP8NdHuZTycfKliBRtjMcAUTjFIiBvcSHyX5UNHZ5AGOB8KZFvjqM11CZnA4FS0FXIHZm4cEZIOfsqnYxJLA6SorLnkMMjpQsDYx8yOaY03823xqpzbiaXjUWS7nTLTvnAVkAbgBqHBYWe9gyM209C3Wm7wn2iUY43UrGheSTjPPNNjbj2YcnD4H7OSFNyoI4lX9EcZpz2tAYwvO4gceWTU2O2z/ViuQqROjjGVYH7KHVNguy7c/wA1b4D8aVtxhJSP924wB/ZP3Vb0yzj1vTVjtTtvGjY7GPhLA8D7enypGztXglvop4yJkt5QVx/YOPspmPuin0Dv7eUzTnd1hRgWxwDN0HrzVhLYm7wrhT7XOobGSTgZ4P20TW123N+u4NutIypYAAZuSQBjr8aPbu81621kDDU7ghuDk5GQB86e0CgqJ7PoVzEg4BZQPtNT9SsRPeyP0bOATT17KY9MnKkKxnwA7BfM568etBv5m9ofaV3L06Hmhcb4Fxk0yXNp4RMySqBUDVjFLdyyyyBc4+4AdPsqprl3LaQK6nMrttBbnHrWSndpWZ3YszHJJ86W8MU+B7zOuSjLONzZTr76cgiaaxZSpTGOT59TUWzYmZYySQeAPQ1orLvBbSJIDwwwT7/L8aRmWi4LhkcnRNktSgJz0poIAowKJcjKfKu4TjlW+VJc7Q1IWZc0IRkzjHXBxT2w+SP91dY4SZ+VIwpPPlxQ7jUuSdFPHIrBSx4/Vp7TB9G3xqXprIIpBnxnqPdVjSx9E3xp2ZKKaGwk5pSZPuz/ACmXP61IjPeyckc+VULxc3Mv96l7EI88ytJHGfIucA06D9TFl+3J2jeRV2rM4A9wocid2pOWJPrQ13E58Qz1FHvmBEIU5BQE1
a7FOqK3ZHVGsJ43XcSHI644I/HNX3umuL7ULuRCHktZ13AfWOAePnWa0FIm7sdWJJYe7/Wa2FzN3sc1wV2vJaTZA/SOAePjRp+5deoPXmZp9STcrE2kWxiuAAbk4A9fjXexlaW8uShQFdWuGDdepGQAaFrpc3F8uVZvZYthC4AHtJIH39aY03dJJLLGqBpNUuJA55H1x08/WmyBXQl2gmb8n5soVZ7nBB5xjcf8qLfYFzcsAN2/BIoOsWayadcyiVk+nY+oHDfsot2u2W7LZ5kzUXYky3aeQk28Z68tUA1T16XfqMmDwgVfu/8AdTD0qPsv8PoG2XEbejj8a1kX5mVemWX/ADrHv0OK2CMjoShByQeD8ay+V9RmL7A5E3J8qbEHA8RFDZfoz8BVJI+Olc2c6RsiJC2J6s3zr7uCpYgkkKevwqksXHShTpsjkbAyFJ+6kfJbobFGE00E3GB5K34VptJi+gf3NUTR4WN2XKjb3bEN6VpNHT+Ty/3h+FdLy5B+PBxjyRruCV7qXYvG7rS72MoDP3QwOSQRTmpaklrcugQswbnypa51UuAICArL4gV5FHDfVUjLOtqOJBFHCrojAnrk5FBu7aYwC5wpQkIQDyp60yVcafE+EK9ME8nzzS7RytBsBJXduI9/SmQYrIrfBZ0W3a0uLW6s5GZotrySKv1TgjAHnya9TtdWs9VtfY9Zgik3ps8P1huOMBhzzx0ryKyEkKbUZgceIdM+lP2M05uFLZOOQBxjy9fKnQ/lipq+jbdtNAnsoG1C3USWZghQbvrRHvgcH3eLGfdUzQyz6ZZzRxqHkmklVm6cuTx9n4VpX7TGXspfTSxG8eFNhXb4WD4XDn3E5yPKs3pETJoumrHGC4tg4Mn1c4OTxTGRN1TE79430GZ5ZjGrSnLbd553eXrR5ZD7XP3aZwc+MjBz7ql6oS2hwp+tcAH5NTc8o7+Zsn9Hp8apC32YXVn36ldEKFHeEADyxxSZNFun7y6ncdGkYj4ZoBqyM+61pdLfdbofVEz8jWaq9o0u+3AOPCAvA95rP5K9A8X2LH9Xn3VVQAD87IvuVQR+FSkP0Zp2HVLHvFhFyneM20Dnr6Z6Vx8kW+kbYjy4x/OW+2P/AOaWvWxbzETI/gbjGPKneMYIpLUiFtJzgfUI+6s8XyOiTYNMgt7jTYEjmnuNQeZVJulhRFRVJz9GxPU/Kr69ndS06IhtIUBufFqy9cA4/M9cH/xNStPlaTXOyz4GVmvD9cr0iQ9RyK9OlgkhkYxrIYIImDR99jGNpDdeT4W+Z9a7sMcZwWysX5GWUcrUXweO9rtMTRJPatV0u6AmneMdzqaMAyhSesHTxfcaz51jSDj/APN1D/r4v3FepfwpG2XsjcyXUIlZnjitX693JuO4jPIGBj148+teHHrTkqVGR8uzTR6pplxCcabfkQKP9oJk5IA6Q++nTqMMSePs3q6qoyS10P3NZiwdo7a8dGKsqIQwOCDvXmvRL3tJql52N7NWntuqzXq3E0uoDExLws42ljjxLj41NUXZEk1q3toy8/ZzU40zyz3YUfPua6DtNYpCJTod93Jbbv8AbgFJ9M91W91ztbZ6ld9rLaP29ba8s3jtLmd5pIJX2g7ViK4Q4zz5YJpS51TszNoc3YtZLwWMNoqQ3jWZ7r2sMT3vHjG5yUII8sVZREsu3vsMF3F+Tl5JBcRP3qSXXhCSefEQwOODQrLtoskMa2vZu9nSBO7Upd7tvhx5RfbWik1/RdUttSsF1JYXuOz1lYh3hkISZWbcpAUngkDp1NC7Hy2XZ3s3q+lNqqvNLcpKk0Qu4Fwow43ogYEeY+dXZDNSa+s8Atvya1P6GQMQt1yDjgH6H0NBn7YWaySLLo14knAYG+UEY+MVbBdfWx7P6yIm1K9uLu9ing9jnulMkXdqCRMy7iB7/dWO/hNvE17Xo9S00TTW7WEDMe6bMfhOQzEeI8Hxef2VRVIjtqeisxJ0zUeTn+kY/
wBzXX+MdE/4ZqP/AHCP9zSC6TqLojpYXTK671IgYgr6jjp76T6VdkpGilTT7jRnvrK3uoHjukgKzXCyhgyO2eEXB8A9etE0edY4ZlbgKdxYdcYoGnoz9lboIrMf4xh4UZ/qpaFasYJSsyMqSKVO4EUE1tGiqqVobm1Cdydkrd35EefpkUHTWtxMGuYN20g4D7dw8xny+Pl9lKxMcn4dKKEYHw5B6ig1UVQ6Lbdm706/EsrW0kodwMxyMuzvB7x5H3fH0rvqgItJyw42GsGtxNDKrZKsvIqw2t21zZyJIqxTFccfVP7K5+XxGpbRNCmrKumLNI+h3sdm95DaXFz38cZTIDIgHDEA+fB4OMVtz2kUqu3svqRIHO9omGeBkjfzwPdya8p0wk9TmrcABugCBjZ/nWyWb4+KJLDv732Q7vRu2FxCYboX0sDvu7uW5BUn1wWxmkfyT1zzsG/5iftr0RUU2/Kg8elY/tXBDHqCqkSKpjUkBQBUw+R8kqoRPHqrErfsxrSW9zGdPfMqhVxInUMD6+6tXHrfbiFlki0q1DB1difFvYMhzzIdue7UYXAHOMHmsnaqouogFGM9MfCqOrRxpdPsRV+mPQY86c506KjC1ZbTVu25j7kaLaO3dd0DjJC4YcDvMdGbnHnzniiSax29755pNGtd7EGUtEPHh2Zd3j8mckHgkgZzUxP6Bun/AE4wpRvNTu8vSrlyS1i7EkkwNknz8NUp2X8ZBtI+1djf3V9Z6LDbi6dHliWQmMskgkB5kJ5K+vQnGKtfxx/CBLCyns/aPGyFARFwFPkMP0/HzzWDvCWmAY5weM+VetdkuOy1pjyjYD7qJypWAlzRkNIuO3OkW9rbWumoyWyKid6Ax2iXvf1/XjIwcdOeaaW47ctam2bs1bS27QC27vuyo7kKyiPKyA4AZsHOeetbSeCGVU72JHynO5QakWcaLcEKijxHoKB5KC0IkV329txAtt2dgh7iOKEbYyxKR7doO5z5KBkYPvzzWQfsP2mXltIuAPeV/bXp2mks19uOdreHPlwa4fiKUjjBOMUiflOK6GRw3+kXsHa3/ZvTdQTU4XtHuJ4u7VmGXAV84wfePmKrX16t9byJdhJozxiRc8f699CYDvGbA3FOT86AY09lDbF3euOelc/LJ5Z79GzEtI0YzWtLXT7rvbZSYGGVGc7T6ZqXLeySfWrYyEuHVyWU5BB5HSsLN9eun40nOPt+GPyPR3H9C99u8L8j3130pEa6YyTJGqqcB2xu8sCk/MV91B+NadeKRn2b5Z//2Q\x3d\x3d';var ii=['dimg_44'];_setImagesSrc(ii,s);})();</script>

I want it without the tags. Thanks.
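A likely cause of the empty result: since BeautifulSoup 4.9.0 the contents of <script> tags are generally not counted as "text", so nonce.text can come back empty depending on the installed version, while nonce.string returns the script body itself. Note also that str.find() returns -1 when the substring is missing (and 0 when it sits at the very start), so `in` is a safer test than find(...) > 0. As a dependency-free alternative, here is a minimal sketch that swaps BeautifulSoup for the standard-library html.parser; ScriptCollector and image_scripts are hypothetical names, and the sample HTML is a shortened stand-in for the real page.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the raw text between <script ...> and </script>.

    Only scripts with a nonce attribute are kept, mirroring the
    has_attr('nonce') check in the question.
    """

    def __init__(self):
        super().__init__()
        self.scripts = []        # body of each matching <script> tag
        self._capturing = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "script" and any(name == "nonce" for name, _ in attrs):
            self._capturing = True
            self._buffer = []

    def handle_data(self, data):
        # script bodies arrive here as raw text, without the tags
        if self._capturing:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._capturing:
            self.scripts.append("".join(self._buffer).strip())
            self._capturing = False


def image_scripts(html):
    """Return the bodies of nonce scripts that embed a base64 JPEG/PNG."""
    parser = ScriptCollector()
    parser.feed(html)
    parser.close()
    markers = ("data:image/jpeg;base64", "data:image/png;base64")
    return [body for body in parser.scripts if any(m in body for m in markers)]


html = (
    '<p>hi</p>'
    '<script nonce="abc">'
    "(function(){var s='data:image/jpeg;base64,/9j/4AAQ';})();"
    '</script>'
    '<script nonce="abc">var other = 1;</script>'
    "<script>var s='data:image/png;base64,AAAA';</script>"
)
for body in image_scripts(html):
    print(body)  # prints: (function(){var s='data:image/jpeg;base64,/9j/4AAQ';})();
```

html.parser switches to raw-text mode inside <script> and <style>, so the JavaScript reaches handle_data untouched and without the surrounding tags.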

You can use split() and strip() to remove the tags.

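s = '''<script nonce="rq3guNaaFH7Hd30OJWKD3Q==">whatever here</script>'''

# Keep everything after the first '>' (the end of the opening tag), cut at the
# last '</script>'. Note that str.strip('</script>') is not reliable here: it
# removes any of the characters < / s c r i p t > from both ends, not the tag.
r = s.split('>', 1)[1].rsplit('</script>', 1)[0].strip()
print(r)  # whatever here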
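If the goal is the image itself rather than the script text (the question creates an output directory with os.mkdir(path), so saving the images is an assumed intent), the data URI can be pulled out of the script body with a regular expression. In the example output the JavaScript literal ends in \x3d\x3d, a JavaScript hex escape for "==" (the base64 padding), so it has to be unescaped before base64.b64decode sees it. This is a minimal sketch: extract_image and DATA_URI_RE are hypothetical names, and it assumes the payload is a single-quoted literal whose only escapes are \xNN sequences.

```python
import base64
import re

# A single-quoted JS literal holding an image data URI, e.g.
#   var s='data:image/jpeg;base64,/9j/4AAQ...\x3d\x3d';
DATA_URI_RE = re.compile(r"'data:image/(jpeg|png);base64,([^']*)'")


def extract_image(script_body):
    """Return (extension, image bytes), or None if no data URI is found.

    Assumes the only escapes inside the literal are \\xNN hex escapes.
    """
    match = DATA_URI_RE.search(script_body)
    if match is None:
        return None
    ext, payload = match.groups()
    # Turn JS hex escapes such as \x3d back into the characters they encode
    payload = re.sub(r"\\x([0-9a-fA-F]{2})",
                     lambda m: chr(int(m.group(1), 16)), payload)
    return ext, base64.b64decode(payload)


body = ("(function(){var s='data:image/png;base64,aGVsbG8\\x3d';"
        "var ii=['dimg_44'];_setImagesSrc(ii,s);})();")
print(extract_image(body))  # prints: ('png', b'hello')
```

The returned bytes could then be written into the output directory with open(filename, 'wb').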


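Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.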

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM