How do I get a specific string from a string between two specified pieces of information

I apologize for the confusing title. I looked around and I know how to get a string between two specified characters, but I am unsure how to get a string between a phrase and a character, such as src="the information i want". In this case I want my starting point to be src=" and the endpoint to be the first " after the starting point. How would I go about specifying these parameters in the get method?

Below is the output of what I am asking for help with. Rather than having to manually copy and paste the second URL, I want to assign that string to a variable to automate the process.

>>> %Run myProject.py
enter URL
https://www.instagram.com/p/CAYGHWFFp-x/
<video class="tWeCl" playsinline="" poster="https://scontent-iad3-1.cdninstagram.com/v/t51.2885-15/e35/100101005_584997515466659_2719890114744519125_n.jpg?_nc_ht=scontent-iad3-1.cdninstagram.com&_nc_cat=111&_nc_ohc=DI3B3wg_vaQAX_MvEcQ&oh=06b611ef41299d4f0278467fb1d74e94&oe=5EC66079" 
preload="none" src="https://scontent-iad3-1.cdninstagram.com/v/t50.2886-16/98205256_176119867089312_5443572653160790508_n.mp4?_nc_ht=scontent-iad3-1.cdninstagram.com&_nc_cat=100&_nc_ohc=JtZXc2HiQ9kAX_097NE&oe=5EC68ACC&oh=ac92032cb89fa1dfbcb5f2fa9016c9ba" type="video/mp4"></video>
enter the URL

Thank you so much!

You can use Beautiful Soup to parse this content. Then you can look for video elements and read their src attribute.

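from bs4 import BeautifulSoup

# `text` is the HTML string shown in the question's output
soup = BeautifulSoup(text, 'html.parser')
# Print the src attribute of every <video> element
for video in soup.find_all('video'):
    print(video.get('src'))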

Output

https://scontent-iad3-1.cdninstagram.com/v/t50.2886-16/98205256_176119867089312_5443572653160790508_n.mp4?_nc_ht=scontent-iad3-1.cdninstagram.com&_nc_cat=100&_nc_ohc=JtZXc2HiQ9kAX_097NE&oe=5EC68ACC&oh=ac92032cb89fa1dfbcb5f2fa9016c9ba
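
If you would rather do what the question literally asks (take everything between the phrase src=" and the next "), plain string methods are enough. The following is only a minimal sketch: the helper name between and the shortened example.com tag are made up for illustration, and it assumes the first src=" in the text is the one you want (it would also match something like data-src="), which is why an HTML parser is usually the safer choice.

```python
def between(text, start, end):
    """Return the part of text after the first `start` and before the next `end`.

    Returns None if either marker is missing.
    """
    begin = text.find(start)
    if begin == -1:
        return None
    begin += len(start)            # skip past the start marker itself
    stop = text.find(end, begin)   # first end marker after the start marker
    if stop == -1:
        return None
    return text[begin:stop]


# Shortened, hypothetical stand-in for the <video> tag in the question.
sample = (
    '<video class="tWeCl" poster="https://example.com/a.jpg" '
    'preload="none" src="https://example.com/v.mp4?x=1&y=2" '
    'type="video/mp4"></video>'
)

video_url = between(sample, 'src="', '"')
print(video_url)
```

Here video_url holds the address as an ordinary string, so it can be passed on to whatever does the download instead of being copied by hand.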
