Python - How to print out one line out of a text
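I have been experimenting with bs4 and managed to print out text from a page. So far I can print the script block that begins with var ajaxsearch.
I wrote code that collects every script tag of type text/javascript and prints the one whose text starts with var ajaxsearch: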
try:
    product_li_tags = bs4.find_all('script', {'type': 'text/javascript'})
except Exception:
    product_li_tags = []
special_code = ''
for s in product_li_tags:
    if s.text.strip().startswith('var ajaxsearch'):
        special_code = s.text
        break
print(special_code)
The output I get is:
var ajaxsearch = false;
var combinationsFromController ={
"224114": {
"attributes_values": {
"4": "5.5"
},
"attributes": [
22
],
"unit_impact": 0,
"minimal_quantity": "1",
"date_formatted": "",
"available_date": "",
"id_image": -1,
"list": "'22'"
},
"224140": {
"attributes_values": {
"4": "6"
},
"attributes": [
23
],
"unit_impact": 0,
"minimal_quantity": "1",
"date_formatted": "",
"available_date": "",
"id_image": -1,
"list": "'23'"
},
"224160": {
"attributes_values": {
"4": "6.5"
},
"attributes": [
24
],
"unit_impact": 0,
"minimal_quantity": "1",
"date_formatted": "",
"available_date": "",
"id_image": -1,
"list": "'24'"
},
"224139": {
"attributes_values": {
"4": "7"
},
"attributes": [
25
],
"unit_impact": 0,
"minimal_quantity": "1",
"date_formatted": "",
"available_date": "",
"id_image": -1,
"list": "'25'"
},
"224138": {
"attributes_values": {
"4": "7.5"
},
"attributes": [
26
],
"unit_impact": 0,
"minimal_quantity": "1",
"date_formatted": "",
"available_date": "",
"id_image": -1,
"list": "'26'"
},
"224113": {
"attributes_values": {
"4": "8"
},
"attributes": [
27
],
"unit_impact": 0,
"minimal_quantity": "1",
"date_formatted": "",
"available_date": "",
"id_image": -1,
"list": "'27'"
},
"224129": {
"attributes_values": {
"4": "8.5"
},
"attributes": [
28
],
"unit_impact": 0,
"minimal_quantity": "1",
"date_formatted": "",
"available_date": "",
"id_image": -1,
"list": "'28'"
},
"224161": {
"attributes_values": {
"4": "9"
},
"attributes": [
29
],
"unit_impact": 0,
"minimal_quantity": "1",
"date_formatted": "",
"available_date": "",
"id_image": -1,
"list": "'29'"
}
};
var contentOnly = false;
var Blank = 1;
var Format = 2;
In other words, printing s.text gives me the output above.
Small edit: if I try
    if s.text.strip().startswith('var combinationsFromController'):
it does not find the value, and if I change it to
    if 'var combinationsFromController' in s.text.strip():
it prints the same output as above.
My problem is that I only want to print the var combinationsFromController assignment and skip the rest, so that I can later pass its value to json.loads. Before that, though, my question is: how can I print only the value of var combinationsFromController?
EDIT: Possibly solved!
for s in product_li_tags:
    if 'var combinationsFromController' in s.text.strip():
        for line in s.text.splitlines():
            if line.startswith('var combinationsFromController'):
                # Assumes the whole assignment sits on one line: var name = {...};
                get_full_text = line.strip()
                get_config = get_full_text.split(" = ")
                # Drop the trailing ';' before parsing.
                cut_text = get_config[1][:-1]
                get_json_values = json.loads(cut_text)
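If I understand your question correctly, you have a 121-line string representing 5 JavaScript variables, and you want the substring that contains only the second variable.
You can use Python string operations, as follows: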
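lines = special_code.split('\n')
# Index of the line that opens the target variable.
start = lines.index('var combinationsFromController ={')
# Index of the next variable, searched from the line after start.
end = lines.index('var contentOnly = false;', start + 1)
print('\n'.join(lines[start:end]))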
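This uses the index method of the list of lines to locate the JavaScript variable you want. If the order of the variables is arbitrary, i.e. you do not know the name of the variable that follows the target one, you can still use similar string operations to get the desired substring.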
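lines = special_code.split('\n')
start = lines.index('var combinationsFromController ={')
# Default: the target variable runs to the last line.
end = len(lines) - 1
for i, line in enumerate(lines[start + 1:]):
    # Stop at the next declaration; checking the line start avoids matching 'var' inside the data.
    if line.lstrip().startswith('var '):
        end = start + i
        break
print('\n'.join(lines[start:end + 1]))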
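The line-scanning idea above can be wrapped in a small function that also parses the result. This is only a sketch: extract_js_object is a hypothetical helper name, the sample text is a shortened stand-in for the output in the question, and it assumes each declaration starts on its own line and that the object literal is valid JSON.

```python
import json

# Shortened stand-in for the script text shown in the question (assumption).
special_code = """var ajaxsearch = false;
var combinationsFromController ={
"224114": {"attributes_values": {"4": "5.5"}, "attributes": [22]},
"224140": {"attributes_values": {"4": "6"}, "attributes": [23]}
};
var contentOnly = false;
var Blank = 1;"""


def extract_js_object(script_text, var_name):
    """Return the parsed object literal assigned to var_name (hypothetical helper)."""
    lines = script_text.split('\n')
    # Find the line on which the declaration starts (raises StopIteration if absent).
    start = next(i for i, line in enumerate(lines)
                 if line.lstrip().startswith('var ' + var_name))
    # The block ends just before the next "var" declaration, if there is one.
    end = len(lines)
    for i in range(start + 1, len(lines)):
        if lines[i].lstrip().startswith('var '):
            end = i
            break
    block = '\n'.join(lines[start:end])
    # Keep what follows the first "=", then drop the trailing ";".
    literal = block.split('=', 1)[1].strip().rstrip(';')
    return json.loads(literal)


combinations = extract_js_object(special_code, 'combinationsFromController')
print(list(combinations))  # ['224114', '224140']
```

Splitting on the first "=" rather than on " = " keeps the sketch working for the "var name ={" spacing that appears in the question's output.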
Use re with the expression
(\{.*?\});
to capture the data between var combinationsFromController = and ;var contentOnly = false;
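import json
import re
# .... (special_code is built as in the question)
print(special_code)
# re.S lets '.' match newlines; the non-greedy match stops at the first '};'
jsonStr = re.search(r'(\{.*?\});', special_code, re.S).group(1)
combinationsFromController = json.loads(jsonStr)
for key in combinationsFromController:
    print(key)
# 224114
# 224140
# 224160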
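As a possible alternative to the regular expression, the standard library's json.JSONDecoder.raw_decode can parse one JSON value starting at a given position and report where that value ends, so the closing brace does not have to be found by pattern. The sketch below uses a shortened stand-in for special_code and assumes the object literal is valid JSON and that the first "{" after the declaration opens it.

```python
import json

# Shortened stand-in for the script text shown in the question (assumption).
special_code = """var ajaxsearch = false;
var combinationsFromController ={
"224114": {"attributes_values": {"4": "5.5"}, "list": "'22'"},
"224140": {"attributes_values": {"4": "6"}, "list": "'23'"}
};
var contentOnly = false;"""

var_name = 'combinationsFromController'
# Locate the declaration, then the first "{" after it.
decl = special_code.index('var ' + var_name)
brace = special_code.index('{', decl)
# raw_decode parses a single JSON value starting at `brace` and returns
# the value together with the index just past its last character.
obj, end = json.JSONDecoder().raw_decode(special_code, brace)

for key, value in obj.items():
    print(key, value['attributes_values']['4'])
```

Because the decoder tracks nesting itself, this does not depend on "};" appearing only at the end of the object, which the non-greedy regex relies on.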