
Regular expression to fetch the specific pattern

I need to fetch specific categories of rules from a filter list.

I am trying to extract these types of rules from the filter list. The first rule pattern is

/example.com$script,domain=example.com

The second, an exception rule, is

@@/example.com$script,domain=example.com

The third, a rule with a domain anchor, is

||example.com

The fourth, a rule with both an anchor and a domain option, is

||jizz.best^$popup,domain=vivo.sx

The fifth is

@@||pagead2.googlesyndication.com/pagead/js/adsbygoogle.js$script,domain=quebeccoupongratuit.com

The sixth, an element-hiding rule with a domain restriction, is

example.com###examplebanner

The seventh, without a domain restriction, is

###examplebanner

The eighth, an element-hiding exception, is

example.com#@##examplebanner

These are the different categories of rules that I have to fetch separately.
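Since the title asks for a regular expression, here is one possible sketch that maps a line to the rule number used above (1 to 8), or None. The patterns are assumptions inferred from the eight examples only, not a full Adblock Plus filter parser, and the function name classify_rule is hypothetical. The first matching pattern wins, so the more specific patterns are listed first.

```python
import re

# Patterns inferred from the eight example rules (an assumption, not the
# full Adblock Plus syntax). The first match wins, so "#@#" is tested before
# "##" and "@@||" before plain "@@".
RULE_PATTERNS = [
    (8, re.compile(r"^[^#]*#@#")),              # example.com#@##examplebanner
    (7, re.compile(r"^##")),                    # ###examplebanner
    (6, re.compile(r"^[^#]+##")),               # example.com###examplebanner
    (5, re.compile(r"^@@\|\|.*[$,]domain=")),   # @@||...$script,domain=...
    (2, re.compile(r"^@@.*[$,]domain=")),       # @@/example.com$script,domain=...
    (4, re.compile(r"^\|\|.*[$,]domain=")),     # ||jizz.best^$popup,domain=vivo.sx
    (3, re.compile(r"^\|\|")),                  # ||example.com
    (1, re.compile(r"[$,]domain=")),            # /example.com$script,domain=...
]


def classify_rule(line):
    """Return the rule number (1-8) for one filter line, or None."""
    line = line.strip()
    # Skip blank lines and Adblock Plus comments, which start with "!"
    if not line or line.startswith("!"):
        return None
    for number, pattern in RULE_PATTERNS:
        if pattern.search(line):
            return number
    return None


examples = [
    "/example.com$script,domain=example.com",
    "@@/example.com$script,domain=example.com",
    "||example.com",
    "||jizz.best^$popup,domain=vivo.sx",
    "@@||pagead2.googlesyndication.com/pagead/js/adsbygoogle.js$script,domain=quebeccoupongratuit.com",
    "example.com###examplebanner",
    "###examplebanner",
    "example.com#@##examplebanner",
]

for rule in examples:
    print(classify_rule(rule), rule)
```

To fetch only rules 5 to 8 from the filter file, keep the lines where classify_rule(line) is in {5, 6, 7, 8}.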

# Read the whole filter list into one string
with open('1-19-16anti-adblock-killer-filters.txt', 'r') as a:
    text = a.read()

line_starts_with_2pipes_no_domain = 0
line_starts_with_2pipes_with_domain = 0
line_starts_with_2ats_with_domain = 0
line_with_domain = 0

for line in text.split("\n"):
    if line.startswith("||"):
        # "||" anchored rule, with or without a domain= option
        if ",domain" in line:
            line_starts_with_2pipes_with_domain += 1
        else:
            line_starts_with_2pipes_no_domain += 1
    elif line.startswith("@@") and ",domain" in line:
        # "@@" exception rule with a domain= option
        line_starts_with_2ats_with_domain += 1
    elif ",domain" in line:
        line_with_domain += 1
    elif line.strip():
        print(f"No idea what to do with: {line}")

print("2pipes_no_group", line_starts_with_2pipes_no_domain)
print("2pipes_with_group", line_starts_with_2pipes_with_domain)
print("2@_with_group", line_starts_with_2ats_with_domain)
print("line_with_domain", line_with_domain)

I am now trying to fetch the 5th, 6th, 7th and 8th rules. Thanks for your answers.

Your regex does not match because the character class before domain does not allow the comma:

"\/[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+domain="
#                  ^^^^^^^^^^^^ no , allowed
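A quick way to check this, as a sketch: the question's pattern (written here as a raw string without the unnecessary \/ escape) next to one possible fix that allows any non-space characters before ",domain=". The fixed pattern and the sample rules are assumptions for illustration, not the asker's final regex.

```python
import re

# The question's pattern: the last character class has no "$" or ",",
# so the match can never reach "domain=" in rules like the ones below.
broken = re.compile(r"/[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+domain=")

# One possible fix (an assumption about the intent): allow any
# non-space characters between the host and ",domain=".
fixed = re.compile(r"/[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+\S*,domain=")

rules = [
    "/example.com$script,domain=example.com",
    "@@/example.com$script,domain=example.com",
    "||example.com",
]

for rule in rules:
    print(rule, bool(broken.search(rule)), bool(fixed.search(rule)))
```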

You can also simplify this a lot:

with open("easylist.txt") as f: 
    print('There are total Rule With Domain tag are =', f.read().count(",domain="))

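That should give you the number of occurrences of ',domain='. If the file is large, you can also count line by line, which avoids reading the whole file into memory: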

domain_rule_count = 0
with open("easylist.txt") as f:
    for line in f:
        domain_rule_count += 1 if ",domain=" in line else 0
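The two counts above are not always equal: f.read().count(",domain=") counts every occurrence, while the loop counts lines that contain it at least once. This small sketch uses a hypothetical in-memory sample, with io.StringIO standing in for the opened file, to show the difference and how to skip "!" comment lines:

```python
import io

# Hypothetical sample standing in for easylist.txt; the second line is a
# comment that mentions ",domain=" twice.
sample = (
    "/example.com$script,domain=example.com\n"
    "! comment mentioning ,domain= twice: ,domain=\n"
    "||example.com\n"
)

# Every occurrence in the whole text (what f.read().count(...) returns)
occurrences = sample.count(",domain=")

# Lines containing ",domain=" at least once (what the per-line loop counts);
# iterating a file object reads one line at a time.
with io.StringIO(sample) as f:
    matching_lines = sum(1 for line in f if ",domain=" in line)

# Same, but ignoring Adblock Plus comment lines that start with "!"
with io.StringIO(sample) as f:
    rule_lines = sum(
        1 for line in f if not line.startswith("!") and ",domain=" in line
    )

print(occurrences, matching_lines, rule_lines)  # 3 2 1
```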

Edit after the question in the comments: you only need to test for what you want:

text = """ some text
/example.com $script,domain=example.com
@@/example.com $script,domain=example.com 
||example.com
||jizz.best^$popup,domain=vivo.sx
"""

line_starts_with_2pipes_no_domain = 0
line_starts_with_2pipes_with_domain = 0
line_starts_with_2ats_with_domain = 0
line_with_domain = 0

for line in text.split("\n"): 
    if line.startswith("||"):
        if ",domain" in line: 
            line_starts_with_2pipes_with_domain += 1
        else:
            line_starts_with_2pipes_no_domain += 1
    elif line.startswith("@@") and ",domain" in line:
        line_starts_with_2ats_with_domain += 1
    elif ",domain" in line: 
        line_with_domain += 1
    elif line.strip(): 
        print(f"No idea what to do with '{line}'")

print("2pipes_no_group", line_starts_with_2pipes_no_domain ) 
print("2pipes_with_group", line_starts_with_2pipes_with_domain ) 
print("2@_with_group", line_starts_with_2ats_with_domain ) 
print("line_with_domain", line_with_domain)

Output:

No idea what to do with ' some text'
2pipes_no_group 1
2pipes_with_group 1
2@_with_group 1
line_with_domain 1
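To also fetch rules 5 to 8, the same if/elif chain can be extended. This is a sketch: the group names are hypothetical, and each group collects the rule lines themselves rather than a count. Order matters: "#@#" is tested before "##", and "@@||" before plain "@@" (otherwise rule 5 would land in the "@@" branch). The plain "##" test assumes that network rules in the list never contain "##".

```python
from collections import defaultdict

text = """/example.com$script,domain=example.com
@@/example.com$script,domain=example.com
||example.com
||jizz.best^$popup,domain=vivo.sx
@@||pagead2.googlesyndication.com/pagead/js/adsbygoogle.js$script,domain=quebeccoupongratuit.com
example.com###examplebanner
###examplebanner
example.com#@##examplebanner
"""

# Hypothetical group names; each list holds the matching rule lines.
groups = defaultdict(list)

for line in text.split("\n"):
    line = line.strip()
    if not line:
        continue
    if "#@#" in line:                                   # 8: element-hiding exception
        groups["elemhide_exception"].append(line)
    elif line.startswith("##"):                         # 7: element hiding, no domain
        groups["elemhide_no_domain"].append(line)
    elif "##" in line:                                  # 6: element hiding with a domain
        groups["elemhide_with_domain"].append(line)
    elif line.startswith("@@||") and ",domain" in line:  # 5: anchored exception
        groups["2ats_2pipes_with_domain"].append(line)
    elif line.startswith("@@") and ",domain" in line:    # 2: exception
        groups["2ats_with_domain"].append(line)
    elif line.startswith("||"):                          # 3 and 4: "||" anchor
        key = "2pipes_with_domain" if ",domain" in line else "2pipes_no_domain"
        groups[key].append(line)
    elif ",domain" in line:                              # 1: plain rule with domain=
        groups["line_with_domain"].append(line)
    else:
        print(f"No idea what to do with '{line}'")

for name, rules in groups.items():
    print(name, len(rules), rules)
```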
