Get the text around a regex match with Python

Question

I have a todo.txt list like this, separated by newlines:

(D) 2015-02-18 XDA Ultimate guide to +Tasker @Phone @Computer
2015-02-18 Redesign the business card for +RepairWork @Computer
(A) 2015-02-17 +Study how to +Ask questions @Computer @Phone
(B) 2015-03-25 Update +LaundryTimer W/ new popup design +Tasker

And I have regexes to capture the +Projects and @Contexts:

import re

# todo_list holds the whole file as a single string
## Projects
project_matches = re.findall(r'[+]\D\w+', todo_list)
print(list(set(project_matches)))

## Contexts
context_matches = re.findall(r'[@][A-Z]\w+', todo_list)
print(list(set(context_matches)))
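A self-contained run of the two patterns above against the sample list, assuming the list is held in a string named todo_list. sorted() is used here only so the output order is stable, since the order of a set is arbitrary:

```python
import re

# Sample list from the question, one task per line
todo_list = """(D) 2015-02-18 XDA Ultimate guide to +Tasker @Phone @Computer
2015-02-18 Redesign the business card for +RepairWork @Computer
(A) 2015-02-17 +Study how to +Ask questions @Computer @Phone
(B) 2015-03-25 Update +LaundryTimer W/ new popup design +Tasker"""

# Distinct +Projects and @Contexts, sorted for a stable display order
projects = sorted(set(re.findall(r'[+]\D\w+', todo_list)))
contexts = sorted(set(re.findall(r'[@][A-Z]\w+', todo_list)))

print(projects)  # ['+Ask', '+LaundryTimer', '+RepairWork', '+Study', '+Tasker']
print(contexts)  # ['@Computer', '@Phone']
```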

But I would also like to quickly and efficiently capture each task and group the tasks by +Project or @Context.

For example, here is the desired output:

Phone:

(A) 2015-02-17 +Study how to +Ask questions @Computer @Phone
(D) 2015-02-18 XDA Ultimate guide to +Tasker @Phone @Computer

Computer:

(D) 2015-02-18 XDA Ultimate guide to +Tasker @Phone @Computer
2015-02-18 Redesign the business card for +RepairWork @Computer

Tasker:

(D) 2015-02-18 XDA Ultimate guide to +Tasker @Phone @Computer
(B) 2015-03-25 Update +LaundryTimer W/ new popup design +Tasker

Etc...

I also have a regex that captures the task when it finds a Project or Context, but I don't know if it helps: (.*)(?=[+]\\D\\w+)(.*)
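To see what that pattern actually captures, here is a quick check against the sample list. It assumes the doubled backslashes above are Python string escaping, so the pattern is written with single backslashes inside a raw string. Because the first (.*) is greedy and . does not cross newlines, each line is simply split once, just before its last +Project; nothing is grouped by project or context:

```python
import re

todo_list = """(D) 2015-02-18 XDA Ultimate guide to +Tasker @Phone @Computer
2015-02-18 Redesign the business card for +RepairWork @Computer
(A) 2015-02-17 +Study how to +Ask questions @Computer @Phone
(B) 2015-03-25 Update +LaundryTimer W/ new popup design +Tasker"""

# Group 1: text before the last +Project on a line; group 2: the rest of that line
pairs = re.findall(r'(.*)(?=[+]\D\w+)(.*)', todo_list)
for before, after in pairs:
    print(repr(before), '|', repr(after))
```

On the third task, for instance, group 1 is '(A) 2015-02-17 +Study how to ' and group 2 is '+Ask questions @Computer @Phone', so +Study is never isolated.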

Answer 1

You could build some dictionaries, one for projects and one for contexts. A collections.defaultdict(list) starts each new key with an empty list, so you can append to it without first checking whether the key exists.

import collections
import re

projects = collections.defaultdict(list)  # '+Project' -> list of task lines
contexts = collections.defaultdict(list)  # '@Context' -> list of task lines
with open('todo_list.txt') as todo_list:
    for line in todo_list:
        line = line.rstrip('\n')  # store the task without its trailing newline
        for item in re.findall(r'[+]\D\w+', line):
            projects[item].append(line)
        for item in re.findall(r'[@][A-Z]\w+', line):
            contexts[item].append(line)
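To try this without a todo_list.txt on disk, io.StringIO can stand in for the open file, since iterating over it yields one line at a time just like a file object. This is a minimal sketch assuming the sample tasks from the question; the rstrip('\n') call is there so the stored tasks don't carry their newline:

```python
import collections
import io
import re

# In-memory stand-in for open('todo_list.txt')
fake_file = io.StringIO(
    "(D) 2015-02-18 XDA Ultimate guide to +Tasker @Phone @Computer\n"
    "2015-02-18 Redesign the business card for +RepairWork @Computer\n"
    "(A) 2015-02-17 +Study how to +Ask questions @Computer @Phone\n"
    "(B) 2015-03-25 Update +LaundryTimer W/ new popup design +Tasker\n"
)

projects = collections.defaultdict(list)
contexts = collections.defaultdict(list)
for line in fake_file:
    line = line.rstrip('\n')  # keep only the task text
    for item in re.findall(r'[+]\D\w+', line):
        projects[item].append(line)
    for item in re.findall(r'[@][A-Z]\w+', line):
        contexts[item].append(line)

# Tasks tagged @Phone, in file order
for task in contexts['@Phone']:
    print(task)
```

The lists keep file order, so the (D) task is listed before the (A) task under @Phone; sort a list if you want it in priority order.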

If you've already read the whole file into a single string, use splitlines() to iterate over each line:

import collections
import re

projects = collections.defaultdict(list)
contexts = collections.defaultdict(list)
# splitlines() also drops the line endings, so each stored task is clean text
for line in todo_list.splitlines():
    for item in re.findall(r'[+]\D\w+', line):
        projects[item].append(line)
    for item in re.findall(r'[@][A-Z]\w+', line):
        contexts[item].append(line)
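To print the groups in the layout the question asks for (the tag name without its + or @, a colon, then the matching tasks), one option is to walk both dictionaries. This sketch assumes todo_list holds the sample list as a string; format_groups is a hypothetical helper, not part of any library. Groups come out in dictionary insertion order, which Python guarantees from 3.7 on:

```python
import collections
import re

todo_list = """(D) 2015-02-18 XDA Ultimate guide to +Tasker @Phone @Computer
2015-02-18 Redesign the business card for +RepairWork @Computer
(A) 2015-02-17 +Study how to +Ask questions @Computer @Phone
(B) 2015-03-25 Update +LaundryTimer W/ new popup design +Tasker"""

projects = collections.defaultdict(list)
contexts = collections.defaultdict(list)
for line in todo_list.splitlines():
    for item in re.findall(r'[+]\D\w+', line):
        projects[item].append(line)
    for item in re.findall(r'[@][A-Z]\w+', line):
        contexts[item].append(line)


def format_groups(*groupings):
    """Hypothetical helper: render each tag as 'Name:' plus its tasks."""
    blocks = []
    for grouping in groupings:
        for tag, tasks in grouping.items():
            # tag[1:] drops the leading '+' or '@'
            blocks.append(tag[1:] + ':\n\n' + '\n'.join(tasks))
    return '\n\n'.join(blocks)


output = format_groups(contexts, projects)
print(output)
```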

Answer 2

You can grab the whole line in which a given word occurs using ^.*word.*$

Meaning: from the start of the line (^), match any character (.) any number of times (*), then match the word, then match any character any number of times again (.*) up to the end of the line ($).
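A quick check of that reading on a single task line: with the word Tasker, the pattern matches the entire line. The answer's code uses the lazy .*? rather than .*; because both ends are pinned to the line boundaries by ^ and $, either form returns the same whole line here:

```python
import re

line = "(B) 2015-03-25 Update +LaundryTimer W/ new popup design +Tasker"

# Greedy and lazy variants; ^ and $ force the match to span the whole line either way
greedy = re.search(r'^.*Tasker.*$', line)
lazy = re.search(r'^.*?Tasker.*?$', line)

print(greedy.group(0))
print(greedy.group(0) == lazy.group(0))  # True
```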

To accomplish your task, you could do something like this:

import re
tasks = re.findall(r"(^.*?%s.*?$)" % context, todo_list, re.MULTILINE)

where context is the word you're looking for (Phone, Computer, Tasker, and so on).
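Running that findall once per word produces the grouping the question asks for. A minimal sketch, assuming a hard-coded, hypothetical tags list (in practice it could come from the question's own findall calls). Here each word is searched together with its + or @ sigil and passed through re.escape, because an unescaped + would be read as a regex quantifier:

```python
import re

todo_list = """(D) 2015-02-18 XDA Ultimate guide to +Tasker @Phone @Computer
2015-02-18 Redesign the business card for +RepairWork @Computer
(A) 2015-02-17 +Study how to +Ask questions @Computer @Phone
(B) 2015-03-25 Update +LaundryTimer W/ new popup design +Tasker"""

# Hypothetical list of tags to group by
tags = ['@Phone', '@Computer', '+Tasker']

groups = {}
for tag in tags:
    # re.escape turns '+' into a literal character
    pattern = r"(^.*?%s.*?$)" % re.escape(tag)
    groups[tag[1:]] = re.findall(pattern, todo_list, re.MULTILINE)

for name, tasks in groups.items():
    print(name + ':')
    for task in tasks:
        print(task)
    print()
```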

Edit: re.MULTILINE makes ^ and $ match at the start and end of every line instead of only at the start and end of the whole string. It corresponds to the m modifier in other regex flavors, not g; returning every match (what g does) is already findall's job. You can see my example in action here: https://regex101.com/r/gS2yN9/1
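The effect of re.MULTILINE is easy to see by running the same pattern with and without it on a two-line string. Without the flag, ^ and $ only match at the very start and end of the whole string, and since . does not match a newline, no single line can satisfy both anchors:

```python
import re

text = ("(D) 2015-02-18 XDA Ultimate guide to +Tasker @Phone @Computer\n"
        "2015-02-18 Redesign the business card for +RepairWork @Computer")
pattern = r"(^.*?Phone.*?$)"

without_flag = re.findall(pattern, text)
with_flag = re.findall(pattern, text, re.MULTILINE)

print(without_flag)  # []
print(with_flag)     # ['(D) 2015-02-18 XDA Ultimate guide to +Tasker @Phone @Computer']
```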

2 answers

Answer 1: score 2, accepted, 2015-12-18 02:26:56

Answer 2: score 0, 2015-12-18 02:25:53
