
Get text surrounding regex match with Python

I have a todo.txt list like this, separated by newlines:

(D) 2015-02-18 XDA Ultimate guide to +Tasker @Phone @Computer
2015-02-18 Redesign the business card for +RepairWork @Computer
(A) 2015-02-17 +Study how to +Ask questions @Computer @Phone
(B) 2015-03-25 Update +LaundryTimer W/ new popup design +Tasker

And I have the regexes to capture the +Projects and @Contexts:

import re

## Projects
project_matches = re.findall(r'[+]\D\w+', todo_list)
print(list(set(project_matches)))

## Contexts
context_matches = re.findall(r'[@][A-Z]\w+', todo_list)
print(list(set(context_matches)))

But I would also like to quickly and efficiently capture each task and group the tasks by +Project or @Context.

For example, here is the desired output:

Phone:

(A) 2015-02-17 +Study how to +Ask questions @Computer @Phone
(D) 2015-02-18 XDA Ultimate guide to +Tasker @Phone @Computer

Computer:

(D) 2015-02-18 XDA Ultimate guide to +Tasker @Phone @Computer
2015-02-18 Redesign the business card for +RepairWork @Computer

Tasker:

(D) 2015-02-18 XDA Ultimate guide to +Tasker @Phone @Computer
(B) 2015-03-25 Update +LaundryTimer W/ new popup design +Tasker

And so on.

I also have a regex that captures the task when it finds a project or context, but I don't know if it helps: (.*)(?=[+]\\D\\w+)(.*)

You could build some dictionaries. defaultdict makes it easier to start each item with a list.

import collections
import re

projects = collections.defaultdict(list)
contexts = collections.defaultdict(list)
with open('todo_list.txt') as todo_list:
    for line in todo_list:
        # Lines read from a file keep their trailing newline; drop it.
        line = line.rstrip('\n')
        for item in re.findall(r'[+]\D\w+', line):
            projects[item].append(line)
        for item in re.findall(r'[@][A-Z]\w+', line):
            contexts[item].append(line)

If you've already read the whole file into a single string, use splitlines() to iterate over each line:

import collections
import re

projects = collections.defaultdict(list)
contexts = collections.defaultdict(list)
for line in todo_list.splitlines():
    for item in re.findall(r'[+]\D\w+', line):
        projects[item].append(line)
    for item in re.findall(r'[@][A-Z]\w+', line):
        contexts[item].append(line)
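
To get from these dictionaries to the output format shown in the question, the groups can be printed in one pass. The following is a minimal sketch, assuming the sample tasks from the question are held in a string. The names TODO_LIST and group_tasks, and the choice to merge projects and contexts into one dictionary keyed by the name without its + or @, are assumptions made for illustration.

```python
import collections
import re

# Sample data copied from the question.
TODO_LIST = """\
(D) 2015-02-18 XDA Ultimate guide to +Tasker @Phone @Computer
2015-02-18 Redesign the business card for +RepairWork @Computer
(A) 2015-02-17 +Study how to +Ask questions @Computer @Phone
(B) 2015-03-25 Update +LaundryTimer W/ new popup design +Tasker
"""


def group_tasks(text):
    """Map each project/context name (without + or @) to its task lines."""
    groups = collections.defaultdict(list)
    for line in text.splitlines():
        tags = re.findall(r'[+]\D\w+', line) + re.findall(r'[@][A-Z]\w+', line)
        # A set of bare names ensures a line is added once per group,
        # even if the same name appears several times in one task.
        for name in {tag[1:] for tag in tags}:
            groups[name].append(line)
    return groups


groups = group_tasks(TODO_LIST)
for name in sorted(groups):
    print(name + ':')
    print()
    for task in groups[name]:
        print(task)
    print()
```

Tasks appear in each group in file order. If a particular order is needed (for example by priority), the lists can be sorted before printing.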

You can grab a whole line where a given word occurs using ^.*word.*$

Meaning: from the start of the line (^), match any character (.) any number of times (*), then match the word. Then match any character any number of times again (.*) until the end of the line ($).
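
As a small illustration of how the pattern behaves on a single line, here is a sketch using one task from the question. The variable names and the search words popup and Phone are chosen only for this example.

```python
import re

line = "(B) 2015-03-25 Update +LaundryTimer W/ new popup design +Tasker"

# The word occurs in the line, so the match spans the whole line.
match = re.search(r"^.*popup.*$", line)
whole_line = match.group(0) if match else None

# The word does not occur, so there is no match at all.
no_match = re.search(r"^.*Phone.*$", line)

print(whole_line)
print(no_match)
```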

To accomplish your task you could do something like:

tasks = re.findall(r"(^.*?%s.*?$)" % context, todo_list, re.MULTILINE)

where context is the word you're looking for (Phone, Computer, Tasker and so on).

Edit: re.MULTILINE makes ^ and $ match at the start and end of every line, not just at the start and end of the whole string. It corresponds to the m modifier; returning every match (what the g modifier does) is handled by re.findall. You can see my example in action here: https://regex101.com/r/gS2yN9/1
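
A minimal sketch of this approach run against the sample list from the question. The names TODO_LIST and tasks_for are hypothetical, and the call to re.escape is an addition so that a tag containing regex metacharacters (such as the + in +Tasker) is matched literally.

```python
import re

# Sample data copied from the question.
TODO_LIST = """\
(D) 2015-02-18 XDA Ultimate guide to +Tasker @Phone @Computer
2015-02-18 Redesign the business card for +RepairWork @Computer
(A) 2015-02-17 +Study how to +Ask questions @Computer @Phone
(B) 2015-03-25 Update +LaundryTimer W/ new popup design +Tasker
"""


def tasks_for(tag, text):
    """Return every full line of text that contains tag."""
    # re.escape keeps characters like '+' from being read as regex syntax.
    pattern = r"^.*%s.*$" % re.escape(tag)
    # With re.MULTILINE, ^ and $ anchor at each line, and '.' never
    # crosses a newline, so each match is exactly one task line.
    return re.findall(pattern, text, re.MULTILINE)


for tag in ("@Phone", "+Tasker"):
    print(tag[1:] + ":")
    for task in tasks_for(tag, TODO_LIST):
        print(task)
    print()
```

Note that this is a substring match: searching for +Task would also return the +Tasker lines. It also rescans the whole list once per tag, which is fine for a short list.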
