
Extract and concatenate titles in a list of strings

I have a list of strings that contains some title-cased words (names, places, etc.). I want to extract them from the list and, when two or more of them are adjacent, also concatenate them into a single entry. All the names found should be added to a names list.

from itertools import tee, islice, chain

l = ['hello', 'John', 'how', 'are', 'you', 'The', 'White', 'House', 'cat']

def iter_next(some_iterable):
    # Pair each item with its successor; the last item is paired with None.
    items, nexts = tee(some_iterable, 2)
    nexts = chain(islice(nexts, 1, None), [None])
    return zip(items, nexts)

names = []
for word, nxt in iter_next(l):
    if word is not None and word.istitle():
        names.append(word)
        # Only looks one word ahead, so a run of three words yields two pairs.
        if nxt is not None and nxt.istitle():
            names.append(word + ' ' + nxt)
print(names)

These are the results.

Results:
['John', 'The', 'The White', 'White', 'White House', 'House']
Desired Results:
['John', 'The', 'White', 'House', 'The White House']

Edit 1: I want to concatenate words if they are title-cased (according to str.istitle) and adjacent in the list, keeping the list's original order.

'you', 'The', 'White', 'House', 'cat' -> 'The White House'

You can use itertools.groupby to group consecutive items by str.istitle. Extend a new list with the items of each title-cased group, and append the joined items if the group contains more than one item:

from itertools import groupby

l = ['hello', 'John', 'how', 'are', 'you', 'The', 'White', 'House', 'cat']
names = []
for k, g in groupby(l, lambda x: x.istitle()):
    if k:
        g = list(g)
        names.extend(g)
        if len(g) > 1:
            names.append(' '.join(g))

print(names)
# ['John', 'The', 'White', 'House', 'The White House']
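
To make the grouping step easier to see, here is a minimal sketch that wraps the same approach in a function and prints the consecutive runs that groupby produces. The function name extract_titles and the variable names are hypothetical, chosen only for illustration, and the sketch assumes Python 3 with every list element being a str.

```python
from itertools import groupby


def extract_titles(words):
    """Collect title-cased words, plus each joined run of two or more."""
    names = []
    # groupby yields one (key, group) pair per run of consecutive items
    # that share the same str.istitle() result.
    for is_title, run in groupby(words, key=str.istitle):
        if not is_title:
            continue
        run = list(run)  # the group is a one-shot iterator, so materialize it
        names.extend(run)
        if len(run) > 1:
            names.append(' '.join(run))
    return names


words = ['hello', 'John', 'how', 'are', 'you', 'The', 'White', 'House', 'cat']

# Inspect the runs: the three adjacent title-cased words form a single group.
runs = [(is_title, list(run)) for is_title, run in groupby(words, key=str.istitle)]
print(runs)
# [(False, ['hello']), (True, ['John']), (False, ['how', 'are', 'you']),
#  (True, ['The', 'White', 'House']), (False, ['cat'])]

print(extract_titles(words))
# ['John', 'The', 'White', 'House', 'The White House']
```

Because groupby only merges items that are next to each other, 'John' stays in its own group of length one and is never joined with 'The White House'.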
