Python: Split a string by a word which contains a substring
I have a string text = "Fix me a meeting in 2 days". I have a list of words, meetingStrings, and "meet" is in meetingStrings. So I have to split the text at the word "meeting".
Desired output:
in 2 days
meetingStrings = [
    "appointment",
    "meet",
    "interview"
]
text = "Fix me a meeting in 2 days"
for x in meetingStrings:
    if x in text.lower():
        txt = text.split(x, 1)[1]
        print(txt)
This gives the output:
ing in 2 days
Using re.split():
import re
meetingStrings = [
    "appointment",
    "meet",
    "interview"
]
text = "Fix me a meeting in 2 days"
# Split once at the first whole word that contains any of the keywords
print(re.split('|'.join(r'(?:\b\w*'+re.escape(w)+r'\w*\b)' for w in meetingStrings), text, maxsplit=1)[-1])
Prints:
in 2 days
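The pattern above is case-sensitive and returns the whole text unchanged when no keyword matches. As a minimal sketch of how the same idea could be wrapped in a reusable function (the name `text_after_keyword` is hypothetical, and case-insensitive matching plus whitespace trimming are assumptions, not part of the answer):

```python
import re

def text_after_keyword(text, keywords):
    """Return the text after the first word that contains any keyword.

    Hypothetical helper, not from the original answer.
    Returns None when no word contains a keyword.
    """
    # One whole word (\b...\b) that contains any of the escaped keywords.
    pattern = r"\b\w*(?:" + "|".join(re.escape(k) for k in keywords) + r")\w*\b"
    parts = re.split(pattern, text, maxsplit=1, flags=re.IGNORECASE)
    if len(parts) < 2:
        return None  # nothing matched, so there is nothing to split on
    return parts[1].strip()

meetingStrings = ["appointment", "meet", "interview"]
print(text_after_keyword("Fix me a meeting in 2 days", meetingStrings))
```

Returning None for the no-match case is a design choice for this sketch; raising an exception or returning the original text would work equally well depending on the caller.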
With a small change to your code:
meetingStrings = [
    "appointment",
    "meet",
    "interview"
]
text = "Fix me a meeting in 2 days"
for x in meetingStrings:
    if x in text.lower():
        txt = text.split(x, 1)[1]
        print(txt.split(" ", 1)[1]) #<--- Here
Just take your final output and split it at the first occurrence of a space.
This expression might also work with an i flag:
(?:meet|interview|appointment)\S*\s+((?:in|after)\s[0-9]+\s+(?:days?|months?|weeks?|years?))
We can include any other words we might want in the non-capturing groups using logical ORs, such as:
(?:in|after|on|from)
(?:days?|months?|weeks?|years?|hours?)
(?:meet|interview|appointment|session|schedule)
import re
regex = r"(?:meet|interview|appointment)\S*\s+((?:in|after)\s[0-9]+\s+(?:days?|months?|weeks?|years?))"
test_str = "Fix me a meeting in 2 days meetings in 2 months meet in 1 week nomeeting in 2 days meet after 2 days"
print(re.findall(regex, test_str, re.IGNORECASE))
['in 2 days', 'in 2 months', 'in 1 week', 'in 2 days', 'after 2 days']
The expression is explained on the top right panel of this demo if you wish to explore/simplify/modify it.
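If the word lists are expected to grow, the alternations could be assembled from Python lists instead of being edited by hand. This is only a sketch: `build_regex` and the three list names are hypothetical, and the word lists are simply the ones suggested above.

```python
import re

# Word lists taken from the alternations suggested above; extend as needed.
keywords = ["meet", "interview", "appointment", "session", "schedule"]
prepositions = ["in", "after", "on", "from"]
units = ["day", "month", "week", "year", "hour"]

def build_regex(keywords, prepositions, units):
    """Hypothetical helper that assembles the expression from word lists."""
    kw = "|".join(re.escape(k) for k in keywords)
    prep = "|".join(re.escape(p) for p in prepositions)
    # Each unit gets an optional plural "s", as in days?|months?|...
    unit = "|".join(re.escape(u) + "s?" for u in units)
    return re.compile(
        rf"(?:{kw})\S*\s+((?:{prep})\s[0-9]+\s+(?:{unit}))",
        re.IGNORECASE,
    )

regex = build_regex(keywords, prepositions, units)
print(regex.findall("Fix me a meeting in 2 days"))
print(regex.findall("Schedule a session from 3 hours"))
```

Escaping each word with re.escape keeps the pattern valid even if a keyword later contains a regex metacharacter.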
This is for using a search. All you need to do is put the keyword text in the middle of a word pattern, then match the word. The result is in capture group 1.
No whitespace trim
\\b\\w*(?:appointment|meet|interview)\\w*\\b(.*)
https://regex101.com/r/lK4zRz/1
Readable version
 \b
 \w*
 (?:
      appointment
   |  meet
   |  interview
 )
 \w*
 \b
 ( .* )                        # (1)
With whitespace trim
(?m)\\b\\w*(?:appointment|meet|interview)\\w*\\b[^\\S\\r\\n]*(.*?)[^\\S\\r\\n]*$
https://regex101.com/r/v2qAOQ/1
Additionally, if you add a .* to the beginning of either regex, it will always get the last keyword.
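To try this from Python, the patterns can be written as raw strings so the backslashes stay single. The following sketch compares the whitespace-trimming regex with and without the leading `.*`; the variable names and the sample sentence are made up for illustration.

```python
import re

# Whitespace-trimming pattern from above, written as a raw string.
first_kw = re.compile(
    r"\b\w*(?:appointment|meet|interview)\w*\b[^\S\r\n]*(.*?)[^\S\r\n]*$",
    re.MULTILINE,
)
# Same pattern with a greedy .* prepended, which pushes the match
# to the last keyword on the line.
last_kw = re.compile(
    r".*\b\w*(?:appointment|meet|interview)\w*\b[^\S\r\n]*(.*?)[^\S\r\n]*$",
    re.MULTILINE,
)

# Sample sentence (an assumption) with two keywords and a trailing space.
text = "Fix me a meeting in 2 days after the interview on Friday "

print(repr(first_kw.search(text).group(1)))  # text after "meeting"
print(repr(last_kw.search(text).group(1)))   # text after "interview"
```

Both calls assume a keyword is present; `search` returns None otherwise, so real code would check the match object before calling `group`.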
Try this:
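import re
meetingStrings = ["appointment", "meet", "interview"]
text = "Fix me a meeting in 2 days"
# Split on a keyword plus any trailing word characters, keep the last piece
print(re.split("({})\\w*".format("|".join(meetingStrings)), text)[-1].strip())
Outputs: in 2 days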
Without regex, using str.partition:
for x in meetingStrings:
    pre, _, post = text.lower().partition(x)
    if post:
        pre = pre.rpartition(' ')[0] if not pre.endswith(' ') else pre.rstrip()
        post = post.partition(' ')[-1] if not post.startswith(' ') else post.lstrip()
        print([pre, post])
Example:
In [35]: meetingStrings = [
    ...:     "appointment",
    ...:     "meet",
    ...:     "interview"
    ...: ]
    ...: text = "Fix me a meeting in 2 days"
    ...: for x in meetingStrings:
    ...:     pre, _, post = text.lower().partition(x)
    ...:     if post:
    ...:         pre = pre.rpartition(' ')[0] if not pre.endswith(' ') else pre.rstrip()
    ...:         post = post.partition(' ')[-1] if not post.startswith(' ') else post.lstrip()
    ...:         print([pre, post])
    ...:
['fix me a', 'in 2 days']
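Note that the result above is lowercased because the text is lowercased before partitioning. A possible variant, sketched here under the assumption that lowercasing does not change the string length (true for ASCII text), locates the keyword in a lowercased copy and slices the original string so the case is preserved. The function name `split_around_keyword` is hypothetical.

```python
def split_around_keyword(text, keywords):
    """Hypothetical helper: split text around the first word containing a keyword.

    Returns (before, after) sliced from the original string so case is kept,
    or None if no keyword occurs. Assumes lower() keeps the string length.
    """
    lowered = text.lower()
    for kw in keywords:
        idx = lowered.find(kw.lower())
        if idx == -1:
            continue
        # Expand to the boundaries of the surrounding space-delimited word.
        start = lowered.rfind(" ", 0, idx) + 1  # 0 when no space precedes
        end = lowered.find(" ", idx + len(kw))
        if end == -1:
            end = len(text)  # the keyword word is the last word
        return text[:start].rstrip(), text[end:].lstrip()
    return None

print(split_around_keyword("Fix me a Meeting in 2 days",
                           ["appointment", "meet", "interview"]))
```

Like the original loop, this checks keywords in list order, so the first keyword in the list that occurs anywhere in the text wins, not necessarily the one that appears earliest in the sentence.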
Try something like this:
import re
meetingStrings = [
    "appointment",
    "meet",
    "interview"
]
text = "Fix me a meeting in 2 days"
def split_string(text, strings):
    search = re.compile('|'.join(strings))
    start = None
    words = text.split()
    for e, x in enumerate(words):
        if search.search(x):
            # A word containing a keyword ends the current run of words.
            # Compare against None explicitly: None < int fails in Python 3.
            if start is not None:
                yield ' '.join(words[start:e])
            start = None
        else:
            if start is None:
                start = e
    else:
        # Loop finished: emit the trailing run, if any
        if start is not None:
            yield ' '.join(words[start:])
print(' '.join(split_string(text, meetingStrings)))
This might be longer than the other answers, but it seems to do exactly what you wanted: split on words that contain one of the given strings as a substring.
I have an alternative and much simpler approach: first split your sentence into words, then chop off the sentence at the position where a word from meetingStrings appears:
l = text.split()
for i in meetingStrings:
    for idx, j in enumerate(l):
        if i in j:
            l = l[idx+1:]
print(' '.join(l))
Gives:
in 2 days
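The nested loop above keeps scanning after a match and rebinds the list while it is being enumerated. A single pass over the words that stops at the first hit may be easier to follow. This is only a sketch; `words_after_keyword` is a hypothetical name, and the case-insensitive comparison is an added assumption.

```python
def words_after_keyword(text, keywords):
    """Hypothetical helper: return the words following the first word
    that contains any keyword, or None if no word matches."""
    words = text.split()
    for idx, word in enumerate(words):
        # Compare in lowercase so "Meeting" also matches "meet".
        if any(kw in word.lower() for kw in keywords):
            return " ".join(words[idx + 1:])
    return None

print(words_after_keyword("Fix me a meeting in 2 days",
                          ["appointment", "meet", "interview"]))
```

Unlike the nested loop, this version picks the first matching word in sentence order, regardless of where its keyword sits in the list.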
You can just use find() and string slicing:
text = "Fix me a meeting in 2 days"
meetingStrings = [
    "appointment",
    "meet",
    "interview"
]
# Keywords that occur in the text (assumes at least one does)
sep = [i for i in meetingStrings if i in text]
idx = text.find(sep[0])
# Offset of the first space after the keyword's position
idx_ = text[idx:].find(' ')
print(text[idx+idx_:])
Output:
in 2 days