How do I get a specific string from a string between two specified pieces of information?
I apologize for the confusing title. I looked around and I know how to get a string between two specified characters, but I am unsure how to get a string between a phrase and a character, such as
src="the information i want"
In this case I want my starting point to be src=" and my endpoint to be the first " after the starting point. How would I go about specifying these parameters in the get method?
Below is the output of what I am asking for help with. Rather than manually copying and pasting the second URL, I want to assign that string to a variable to automate the process.
>>> %Run myProject.py
enter URL
https://www.instagram.com/p/CAYGHWFFp-x/
<video class="tWeCl" playsinline="" poster="https://scontent-iad3-1.cdninstagram.com/v/t51.2885-15/e35/100101005_584997515466659_2719890114744519125_n.jpg?_nc_ht=scontent-iad3-1.cdninstagram.com&_nc_cat=111&_nc_ohc=DI3B3wg_vaQAX_MvEcQ&oh=06b611ef41299d4f0278467fb1d74e94&oe=5EC66079"
preload="none" src="https://scontent-iad3-1.cdninstagram.com/v/t50.2886-16/98205256_176119867089312_5443572653160790508_n.mp4?_nc_ht=scontent-iad3-1.cdninstagram.com&_nc_cat=100&_nc_ohc=JtZXc2HiQ9kAX_097NE&oe=5EC68ACC&oh=ac92032cb89fa1dfbcb5f2fa9016c9ba" type="video/mp4"></video>
enter the URL
Thank you so much!
You can use Beautiful Soup to parse this content. Then you can look for video elements and read their src attribute.
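from bs4 import BeautifulSoup

# text is assumed to hold the HTML shown in the question (the <video ...> element)
soup = BeautifulSoup(text, 'html.parser')
for video in soup.find_all('video'):
    # .get returns the attribute value, or None if the attribute is missing
    print(video.get('src'))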
Output
https://scontent-iad3-1.cdninstagram.com/v/t50.2886-16/98205256_176119867089312_5443572653160790508_n.mp4?_nc_ht=scontent-iad3-1.cdninstagram.com&_nc_cat=100&_nc_ohc=JtZXc2HiQ9kAX_097NE&oe=5EC68ACC&oh=ac92032cb89fa1dfbcb5f2fa9016c9ba
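If you would rather not add a dependency, the question as literally asked (take everything between the phrase src=" and the next ") can also be done with plain string methods. The following is only a sketch: the helper name between and the example.com URLs are made up for illustration, and it assumes the first src=" in the text is the one you want. An HTML parser is still the more robust choice, since a plain search would also match attributes such as data-src=" and does not decode HTML entities.

```python
import re
from typing import Optional


def between(text: str, start: str, end: str) -> Optional[str]:
    """Return the text after the first `start` and before the next `end`.

    Returns None if either marker is not found. (Hypothetical helper,
    not part of any library.)
    """
    begin = text.find(start)
    if begin == -1:
        return None
    begin += len(start)            # skip past the start marker itself
    stop = text.find(end, begin)   # search for the end marker only after it
    if stop == -1:
        return None
    return text[begin:stop]


# Shortened stand-in for the <video> tag in the question (made-up URLs).
html = ('<video class="tWeCl" poster="https://example.com/p.jpg" '
        'preload="none" src="https://example.com/v.mp4" type="video/mp4"></video>')

video_url = between(html, 'src="', '"')
print(video_url)

# The same idea as a regular expression: capture everything up to the next quote.
match = re.search(r'src="([^"]*)"', html)
video_url_re = match.group(1) if match else None
```

Either way, the result ends up in a variable (video_url here) rather than being copied by hand from the printed output.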