Extract words from text files with Python
I have an HTML file with content I can't easily extract with BeautifulSoup, because I think it is loaded with JavaScript.
..."inlineParams":"json","title":"","lNameP":"MYNAME","key":"degree_result_person"},"firstName":"MYFIRSTNAME"...
The file contains multiple names that I would like to extract. Each name comes right after "lNameP".
Is there a way to loop over the file and get all of those names (in this case I would like to get MYNAME)?
Thanks a lot,
Using regex?
import re
# string is assumed to hold the file contents as text
pattern = re.compile(r'"(lNameP)":"(.*?)"')
result = pattern.findall(string)
result[0][0] would be the key and result[0][1] would be the value for the first match. Because the pattern has two groups, findall returns one (key, value) tuple per match.
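Since the question asks for a loop over all names, here is a minimal sketch of that, assuming the whole file has already been read into one string. The sample text and the second name (OTHERNAME) are made up for illustration. With a single capture group, findall returns a flat list of the values instead of tuples:

```python
import re

# Hypothetical sample: two records shaped like the fragment in the question.
html = (
    '{"title":"","lNameP":"MYNAME","key":"degree_result_person"},'
    '"firstName":"MYFIRSTNAME",'
    '{"title":"","lNameP":"OTHERNAME","key":"degree_result_person"},'
    '"firstName":"OTHERFIRSTNAME"'
)

# One capture group, so findall returns a flat list of the captured values.
names = re.findall(r'"lNameP":"(.*?)"', html)

for name in names:
    print(name)
```
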
This regex code will match exactly what you need:
import re
string = '"inlineParams":"json","title":"","lNameP":"MYNAME","key":"degree_result_person"},"firstName":"MYFIRSTNAME"'
pattern = re.compile(r'"lNameP":"(.*?)"')
# search() returns only the first match in the string
match = pattern.search(string).group(1)
print(match)
Output:
MYNAME
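Note that search() stops at the first occurrence, and calling .group(1) on its result raises AttributeError when nothing matches. A hedged sketch of a variant that collects every name with finditer and simply returns an empty list when there is none; the helper name extract_last_names, the whitespace-tolerant pattern, and the second sample value are assumptions for illustration, not part of the original answer:

```python
import re

# Allows optional whitespace around the colon; [^"]* stops at the closing quote.
LNAME_RE = re.compile(r'"lNameP"\s*:\s*"([^"]*)"')

def extract_last_names(text):
    """Return every lNameP value found in text, in order of appearance."""
    return [m.group(1) for m in LNAME_RE.finditer(text)]

# Hypothetical sample with two occurrences, one written with spaces.
sample = '"lNameP":"MYNAME","key":"x"},"firstName":"A","lNameP" : "SECOND"'
print(extract_last_names(sample))
print(extract_last_names("no names here"))
```

To run it on the real page, the text could be read first with something like open("page.html", encoding="utf-8").read(), where the file name is a placeholder.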