
How do I extract the substring between a starting phrase and an ending character?

I apologize for the confusing title. I looked around and I know how to get the string between two specified characters, but I am unsure how to get the string between a phrase and a character, such as src="the information I want". In this case I want my starting point to be src=" and my endpoint to be the first " after the starting point. How would I go about specifying these parameters in the get method?

Below is the output I am asking for help with. Rather than manually copying and pasting the second URL (the src value), I want to assign that string to a variable to automate the process.

>>> %Run myProject.py
enter URL
https://www.instagram.com/p/CAYGHWFFp-x/
<video class="tWeCl" playsinline="" poster="https://scontent-iad3-1.cdninstagram.com/v/t51.2885-15/e35/100101005_584997515466659_2719890114744519125_n.jpg?_nc_ht=scontent-iad3-1.cdninstagram.com&_nc_cat=111&_nc_ohc=DI3B3wg_vaQAX_MvEcQ&oh=06b611ef41299d4f0278467fb1d74e94&oe=5EC66079" 
preload="none" src="https://scontent-iad3-1.cdninstagram.com/v/t50.2886-16/98205256_176119867089312_5443572653160790508_n.mp4?_nc_ht=scontent-iad3-1.cdninstagram.com&_nc_cat=100&_nc_ohc=JtZXc2HiQ9kAX_097NE&oe=5EC68ACC&oh=ac92032cb89fa1dfbcb5f2fa9016c9ba" type="video/mp4"></video>
enter the URL

Thank you so much!

You can use Beautiful Soup to parse this content. Then you can look for video elements, and read their src attribute.

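from bs4 import BeautifulSoup

# `text` is assumed to hold the HTML string shown in the question
soup = BeautifulSoup(text, 'html.parser')
# Find every <video> element and print the value of its src attribute
for video in soup.find_all('video'):
    print(video.get('src'))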

Output

https://scontent-iad3-1.cdninstagram.com/v/t50.2886-16/98205256_176119867089312_5443572653160790508_n.mp4?_nc_ht=scontent-iad3-1.cdninstagram.com&_nc_cat=100&_nc_ohc=JtZXc2HiQ9kAX_097NE&oe=5EC68ACC&oh=ac92032cb89fa1dfbcb5f2fa9016c9ba
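If you would rather do exactly what the question describes, with plain string methods and no parser, here is a minimal sketch. The helper name extract_between and the shortened example.com tag are made up for illustration; they are not part of any library or of the original output. It assumes the start marker src=" appears where you expect it, so an HTML parser is still the more robust choice for real pages.

```python
def extract_between(text, start, end):
    """Return the text between the first `start` and the next `end`, or None.

    Hypothetical helper for illustration; not part of any library.
    """
    begin = text.find(start)
    if begin == -1:
        return None  # start marker not found
    begin += len(start)  # skip past the start marker itself
    finish = text.find(end, begin)  # first `end` after the start marker
    if finish == -1:
        return None  # no closing marker after the start
    return text[begin:finish]


# Shortened, made-up tag with the same shape as the output in the question.
html = (
    '<video class="tWeCl" playsinline="" poster="https://example.com/p.jpg" '
    'preload="none" src="https://example.com/v.mp4?a=1&b=2" type="video/mp4"></video>'
)

# Assign the result to a variable instead of copying it by hand.
video_url = extract_between(html, 'src="', '"')
print(video_url)
```

Note that a plain search for src=" would also match attributes such as data-src=" if they came first, which is one reason the parser-based approach above is safer.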


 