Get text surrounding regex match with python
I have a todo.txt list like this, separated by newlines:
(D) 2015-02-18 XDA Ultimate guide to +Tasker @Phone @Computer
2015-02-18 Redesign the business card for +RepairWork @Computer
(A) 2015-02-17 +Study how to +Ask questions @Computer @Phone
(B) 2015-03-25 Update +LaundryTimer W/ new popup design +Tasker
And I have the regex to capture the +Projects and @Contexts:
import re
## Projects
project_matches = re.findall(r'[+]\D\w+', todo_list)
print(list(set(project_matches)))
## Contexts
context_matches = re.findall(r'[@][A-Z]\w+', todo_list)
print(list(set(context_matches)))
But I would also like to quickly and efficiently capture each task and group the tasks by +Project or @Context.
For example, here is the desired output:
Phone:
(A) 2015-02-17 +Study how to +Ask questions @Computer @Phone
(D) 2015-02-18 XDA Ultimate guide to +Tasker @Phone @Computer
Computer:
(D) 2015-02-18 XDA Ultimate guide to +Tasker @Phone @Computer
2015-02-18 Redesign the business card for +RepairWork @Computer
Tasker:
(D) 2015-02-18 XDA Ultimate guide to +Tasker @Phone @Computer
(B) 2015-03-25 Update +LaundryTimer W/ new popup design +Tasker
Etc...
I also have a regex that captures the task when it finds a Project or Context, but I don't know if it helps:
(.*)(?=[+]\\D\\w+)(.*)
You could build some dictionaries. defaultdict makes it easier to start each item with a list.
import collections
import re

# Each missing key starts out as an empty list.
projects = collections.defaultdict(list)
contexts = collections.defaultdict(list)

with open('todo_list.txt') as todo_list:
    for line in todo_list:
        line = line.rstrip('\n')  # drop the trailing newline kept by file iteration
        for item in re.findall(r'[+]\D\w+', line):
            projects[item].append(line)
        for item in re.findall(r'[@][A-Z]\w+', line):
            contexts[item].append(line)
If you've already read the whole file into a single string, use splitlines() to iterate over each line:
import collections
import re

projects = collections.defaultdict(list)
contexts = collections.defaultdict(list)

# todo_list is the whole file as one string; splitlines() drops the newlines.
for line in todo_list.splitlines():
    for item in re.findall(r'[+]\D\w+', line):
        projects[item].append(line)
    for item in re.findall(r'[@][A-Z]\w+', line):
        contexts[item].append(line)
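To print the groups in the layout shown in the question, something like the following might work. It is only a sketch: `group_tasks` is a hypothetical helper name, the sample text is copied from the question, the capture groups that strip the leading + or @ are an addition to the patterns above, and sorting the lines so that prioritized tasks come first is an assumption about the desired order.

```python
import collections
import re

TODO_TEXT = """\
(D) 2015-02-18 XDA Ultimate guide to +Tasker @Phone @Computer
2015-02-18 Redesign the business card for +RepairWork @Computer
(A) 2015-02-17 +Study how to +Ask questions @Computer @Phone
(B) 2015-03-25 Update +LaundryTimer W/ new popup design +Tasker
"""

def group_tasks(text):
    """Map each project/context name (without + or @) to its sorted task lines."""
    groups = collections.defaultdict(list)
    for line in text.splitlines():
        # Same patterns as above, plus a capture group that drops the prefix.
        tags = re.findall(r'[+](\D\w+)', line) + re.findall(r'[@]([A-Z]\w+)', line)
        for tag in set(tags):  # set() avoids listing a line twice for a repeated tag
            groups[tag].append(line)
    # Assumption: plain string sorting, which puts "(A) ..." before "(D) ...".
    return {tag: sorted(lines) for tag, lines in groups.items()}

groups = group_tasks(TODO_TEXT)
for tag, lines in groups.items():
    print(tag + ':')
    for line in lines:
        print(line)
```

Projects and contexts share one dictionary here, so a project and a context with the same name would be merged; keep two dictionaries as above if that matters.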
You can grab a whole line where a given word occurs using ^.*word.*$
Meaning: from the start of the string (^), or of each line when re.MULTILINE is set, match any character (.) any number of times (*), then match the word. Then match any character any number of times again (.*) until the end of the line ($).
To accomplish your task you could do something like
tasks = re.findall(r"(^.*?%s.*?$)" % context, todo_list, re.MULTILINE)
where context is the word you're looking for (Phone, Computer, Tasker and so on).
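As a rough sketch of how that line could be used for several words in turn: `tasks_for` is a hypothetical helper name, the sample text is the list from the question, and the `re.escape` call is an addition that guards against search words containing regex metacharacters.

```python
import re

todo_list = """\
(D) 2015-02-18 XDA Ultimate guide to +Tasker @Phone @Computer
2015-02-18 Redesign the business card for +RepairWork @Computer
(A) 2015-02-17 +Study how to +Ask questions @Computer @Phone
(B) 2015-03-25 Update +LaundryTimer W/ new popup design +Tasker
"""

def tasks_for(word, text):
    """Return every full line of text that contains word."""
    # re.escape keeps characters such as + or . in word from acting as regex syntax.
    pattern = r"^.*?%s.*?$" % re.escape(word)
    return re.findall(pattern, text, re.MULTILINE)

for word in ("Phone", "Computer", "Tasker"):
    print(word + ":")
    for task in tasks_for(word, todo_list):
        print(task)
```

Note that this is a plain substring match, so searching for Phone would also pick up a line containing a longer word such as Telephone.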
Edit: the re.MULTILINE flag makes ^ and $ match at the start and end of every line rather than only at the start and end of the whole string, like the m modifier. re.findall then returns every match, much as the g modifier would. You can see my example in action here: https://regex101.com/r/gS2yN9/1
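A small sketch of what the flag changes, using made-up sample text (the three tasks below are invented for illustration and are not from the question):

```python
import re

# Invented sample data: three lines, only the second mentions Phone.
text = "buy milk @Shop\ncall mom @Phone\nfix bug @Computer"
pattern = r"^.*Phone.*$"

# Without the flag, ^ only matches at the very start of the string,
# and . cannot cross a newline, so the second line is never reached.
without_flag = re.findall(pattern, text)

# With the flag, ^ and $ also match at every line boundary.
with_flag = re.findall(pattern, text, re.MULTILINE)

print(without_flag)
print(with_flag)
```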