I have a list of strings that contains some title-cased words (names, places, etc.). I want to extract them from the list and, when several of them are adjacent, also concatenate that run into a single string. Everything found should be added to a names list.
from itertools import tee, islice, chain

l = ['hello', 'John', 'how', 'are', 'you', 'The', 'White', 'House', 'cat']

def iter_next(some_iterable):
    # Pair each item with its successor; the last item is paired with None.
    items, nexts = tee(some_iterable, 2)
    nexts = chain(islice(nexts, 1, None), [None])
    return zip(items, nexts)

names = []
for word, nxt in iter_next(l):
    if word is not None and word.istitle():
        names.append(word)
        if nxt is not None and nxt.istitle():
            names.append(word + ' ' + nxt)
print(names)
Results:
['John', 'The', 'The White', 'White', 'White House', 'House']
Desired Results:
['John', 'The', 'White', 'House', 'The White House']
Edit 1: I want to concatenate words if they are title-cased (according to str.istitle) and adjacent in the list, keeping the list's original order.
'you', 'The', 'White', 'House', 'cat' -> 'The White House'
You can use itertools.groupby to group your items using str.istitle. Extend a new list with the items in the group and append the joined group items if the group length is greater than 1:
from itertools import groupby

l = ['hello', 'John', 'how', 'are', 'you', 'The', 'White', 'House', 'cat']
names = []
for k, g in groupby(l, lambda x: x.istitle()):
    if k:
        g = list(g)
        names.extend(g)
        if len(g) > 1:
            names.append(' '.join(g))
print(names)
# ['John', 'The', 'White', 'House', 'The White House']
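If you need this in more than one place, the same groupby approach can be wrapped in a small function. This is only a sketch: the name extract_title_runs is made up for illustration, and it assumes every item in the list is a str, so that str.istitle can be passed directly as the key.

```python
from itertools import groupby


def extract_title_runs(words):
    """Return each title-cased word, plus the joined form of every multi-word run."""
    names = []
    # groupby only merges *consecutive* items that share the same key,
    # so each group with a True key is one unbroken run of title-cased words.
    for is_title, group in groupby(words, key=str.istitle):
        if not is_title:
            continue
        run = list(group)
        names.extend(run)
        if len(run) > 1:
            names.append(' '.join(run))
    return names


words = ['hello', 'John', 'how', 'are', 'you', 'The', 'White', 'House', 'cat']
print(extract_title_runs(words))
# ['John', 'The', 'White', 'House', 'The White House']
```

Note that str.istitle is a purely typographic check: 'USA'.istitle() and 'McDonald'.istitle() are both False, so all-caps or mixed-case words would break a run. If that matters for your data, you would need a different key function.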