I was wondering if it would be possible to split a string such as
string = 'hello world [Im nick][introduction]'
into an array such as
['hello', 'world', '[Im nick][introduction]']
It doesn't have to be efficient; I just need a way to split a sentence into words, except that anything inside brackets should stay together as one unsplit element.
I need this because I have a markdown file with sentences such as
- What is the weather in [San antonio, texas][location]
I need "San antonio, texas" to stay together as a single element of the array. Would this be possible? The array would look like:
array = ['What', 'is', 'the', 'weather', 'in', '[San antonio, texas][location]']
Maybe this could work for you:
>>> s = 'What is the weather in [San antonio, texas][location]'
>>> i1 = s.index('[')
>>> i2 = s.index('[', i1 + 1)
>>> part_1 = s[:i1].split() # everything before the first bracket
>>> part_2 = [s[i1:i2], ] # first bracket pair
>>> part_3 = [s[i2:], ] # second bracket pair
>>> parts = part_1 + part_2 + part_3
>>> s
'What is the weather in [San antonio, texas][location]'
>>> parts
['What', 'is', 'the', 'weather', 'in', '[San antonio, texas]', '[location]']
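It finds the two left brackets and uses their positions as reference points, then splits only the text before the first bracket on whitespace.
This assumes the string contains exactly two bracket pairs and that they come at the end of the string.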
Here is a more robust solution:
def do_split(s):
    parts = []
    while '[' in s:
        start = s.index('[')
        end = s.index(']', s.index(']')+1) + 1 # looks for second closing bracket
        parts.extend(s[:start].split()) # everything before the opening bracket
        parts.append(s[start:end]) # 2 pairs of brackets
        s = s[end:] # remove processed part of the string
    parts.extend(s.split()) # add remainder
    return parts
This yields:
>>> do_split('What is the weather in [San antonio, texas][location] on [friday][date]?')
['What', 'is', 'the', 'weather', 'in', '[San antonio, texas][location]', 'on', '[friday][date]', '?']
Maybe this short snippet can help you. But note that this only works if everything you said holds true for all the entries in the file.
s = 'What is the weather in [San antonio, texas][location]'
s = s.split(' [')
s[1] = '[' + s[1] # add back the split character
mod = s[0] # store in a variable
mod = mod.split(' ') # split the first part on space
mod.append(s[1]) # attach back the right part
print(mod)
Outputs:
['What', 'is', 'the', 'weather', 'in', '[San antonio, texas][location]']
and for s = 'hello world [Im nick][introduction]'
['hello', 'world', '[Im nick][introduction]']
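For a one-liner, use functional programming tools such as reduce from the functools module. Splitting on ' [' strips the opening bracket from each bracketed chunk, so it is added back here. This assumes every bracketed group is preceded by a space and is the last thing in its chunk:
from functools import reduce
reduce(lambda acc, part: acc + (['[' + part] if part.endswith(']') else part.split()), s.split(' ['), [])
or, slightly shorter, using the built-ins map and sum:
sum(map(lambda part: ['[' + part] if part.endswith(']') else part.split(), s.split(' [')), [])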
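You can use a regex split with a lookahead. The pattern below splits on a space unless the next bracket character after it is a closing one, which means the space sits inside brackets. Note that it is simpler to filter out empty entries with filter or a list comprehension than to avoid them in the regex:
import re
s = 'sss sss bbb [zss sss][zsss ss] sss sss bbb [ss sss][sss ss]'
[x for x in re.split(r' (?![^\[\]]*\])', s) if x]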
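The remark about filtering empty entries can be sketched as follows. The pattern here is one possible way to split on spaces outside brackets, not necessarily the one the answer intended, and the function name `split_keep_brackets` is hypothetical.

```python
import re

# Split on a space unless it is followed by non-bracket characters and then ']'
# (that is, unless the space sits inside a bracket pair).
SPACE_OUTSIDE_BRACKETS = re.compile(r' (?![^\[\]]*\])')

def split_keep_brackets(text):
    """Split on spaces outside brackets and drop the empty strings
    that consecutive spaces leave behind."""
    return list(filter(None, SPACE_OUTSIDE_BRACKETS.split(text)))

tokens = split_keep_brackets('a  b [c d][e]')
print(tokens)
```

Passing None as the first argument to filter keeps only truthy items, so the empty strings produced by the double space are discarded.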
The code below works with your example. It relies on the bracketed phrase being exactly two words long. Hope it helps :) I'm sure it can be better, but I have to go now. Please enjoy.
string = 'hello world [Im nick][introduction]'
words = string.split(' ')  # named `words` to avoid shadowing the built-in `list`
finall = []
for idx, elem in enumerate(words):
    currentelem = elem
    if currentelem[0] == '[' and currentelem[-1] != ']':
        # opening half of a bracketed phrase: rejoin it with the next word
        currentelem += ' ' + words[(idx + 1) % len(words)]
        finall.append(currentelem)
    elif currentelem[0] != '[' and currentelem[-1] != ']':
        finall.append(currentelem)
print(finall)
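If the bracketed phrase can be longer than two words, as in the "San antonio, texas" case from the question, the same word-by-word idea can be extended by tracking whether the brackets are balanced. This is only a sketch under that assumption; the function name `split_outside_brackets` is made up for illustration, and runs of whitespace inside a group collapse to a single space.

```python
def split_outside_brackets(text):
    """Split on whitespace, keeping bracketed groups (spaces included) intact."""
    result = []
    pending = []  # words of a bracket group that has not been closed yet
    depth = 0     # count of '[' seen minus count of ']' seen in the pending words
    for word in text.split():
        pending.append(word)
        depth += word.count('[') - word.count(']')
        if depth <= 0:  # every opened bracket is closed: emit one token
            result.append(' '.join(pending))
            pending = []
            depth = 0
    if pending:  # unbalanced input: keep the leftover text rather than drop it
        result.append(' '.join(pending))
    return result

print(split_outside_brackets('What is the weather in [San antonio, texas][location]'))
```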
Let me offer an alternative to the ones above:
import re
string = 'hello world [Im nick][introduction]'
re.findall(r'(\[.+\]|\w+)', string)
Produces:
['hello', 'world', '[Im nick][introduction]']
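One caveat: `.+` is greedy, so if a sentence contains several bracketed groups separated by ordinary words, everything from the first `[` to the last `]` comes back as one match. A possible variation, offered as a sketch rather than a drop-in replacement, matches runs of adjacent bracket pairs and treats any other run of non-space, non-bracket characters as a word. The name `tokenize` is hypothetical.

```python
import re

# Either one or more adjacent [...] pairs, or a run of characters that are
# neither whitespace nor brackets (so punctuation such as '?' is kept too).
TOKEN = re.compile(r'(?:\[[^\]]*\])+|[^\s\[\]]+')

def tokenize(text):
    """Return words and bracketed groups in order of appearance."""
    return TOKEN.findall(text)

print(tokenize('What is the weather in [San antonio, texas][location] on [friday][date]?'))
```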