Combine strings in a list to form words starting with a capital letter

NLP newbie here. I have a list of strings, and I would like to combine them so that each resulting string starts with a capital letter. What is the most efficient way to do so?

Here is the list: [' Ye', 'oks', 'am', '-', 'd', 'ong', ' Gang', 'nam', '-', 'gu Seoul', ' Korea']

Desired output: ['Yeoksam-dong', 'Gangnam-gu Seoul', 'Korea']. ['Yeoksam-dong', 'Gangnam-gu', 'Seoul', 'Korea'] is also fine.

This is the solution I'm trying to improve:

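places = [' Ye', 'oks', 'am', '-', 'd', 'ong', ' Gang', 'nam', '-', 'gu Seoul', ' Korea']
num_places = 0
places_words = []  # completed words
temp = []  # fragments of the word currently being built
for ii in range(len(places)):
    # normalize whitespace inside the current fragment
    loc = " ".join(places[ii].split())
    temp.append(loc)
    # close the word at the last fragment or when the next fragment is capitalized
    is_last = ii + 1 == len(places)
    if is_last or " ".join(places[ii + 1].split())[0].isupper():
        places_words.append("".join(temp))
        temp = []
        num_places += 1
print(places_words, num_places)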
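For comparison, here is a single-pass sketch of the same idea that avoids looking ahead at `places[ii + 1]`: it starts a new word whenever the *current* fragment begins with a capital letter. The helper name `group_fragments` is hypothetical, and the sketch assumes that a capitalized fragment always begins a new word. Each fragment is visited once, so the work is linear in the total length of the input.

```python
def group_fragments(fragments):
    """Hypothetical helper: join fragments, starting a new word at each capitalized one."""
    words = []
    current = []  # stripped fragments of the word being built
    for frag in fragments:
        stripped = frag.strip()
        if not stripped:
            continue  # skip empty or whitespace-only fragments
        # a capital letter closes the previous word (if any) and opens a new one
        if stripped[0].isupper() and current:
            words.append("".join(current))
            current = []
        current.append(stripped)
    if current:
        words.append("".join(current))  # flush the last word
    return words


places = [' Ye', 'oks', 'am', '-', 'd', 'ong', ' Gang', 'nam', '-', 'gu Seoul', ' Korea']
print(group_fragments(places))  # ['Yeoksam-dong', 'Gangnam-gu Seoul', 'Korea']
```

Fragments that appear before the first capitalized one are kept as the first word rather than discarded, and internal spaces such as the one in 'gu Seoul' are preserved because only the ends of each fragment are stripped.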

One approach:

from itertools import tee


def pairwise(iterable):
    # pairwise('ABCDEFG') --> AB BC CD DE EF FG
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)


data = [' Ye', 'oks', 'am', '-', 'd', 'ong', ' Gang', 'nam', '-', 'gu Seoul', ' Korea']

# indices of the fragments that start with a capital letter, plus len(data) as an end sentinel
indices = [i for i, e in enumerate(data) if e.strip()[0].isupper()] + [len(data)]

# iterate pairwise and join the strings 
res = ["".join(data[start:end]).strip() for start, end in pairwise(indices)]
print(res)

Output

['Yeoksam-dong', 'Gangnam-gu Seoul', 'Korea']
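To see why this works, it helps to print the intermediate boundaries. This sketch uses `itertools.pairwise`, which is built in since Python 3.10, and falls back to the same `tee`-based recipe on older versions. For the sample data the capitalized fragments sit at positions 0, 6 and 10, and the sentinel `len(data)` closes the last slice.

```python
data = [' Ye', 'oks', 'am', '-', 'd', 'ong', ' Gang', 'nam', '-', 'gu Seoul', ' Korea']

try:
    from itertools import pairwise  # built in since Python 3.10
except ImportError:  # older Pythons: the tee-based recipe
    from itertools import tee

    def pairwise(iterable):
        a, b = tee(iterable)
        next(b, None)
        return zip(a, b)

# Start positions of capitalized fragments, plus len(data) as an end sentinel.
indices = [i for i, e in enumerate(data) if e.strip()[0].isupper()] + [len(data)]
print(indices)  # [0, 6, 10, 11]

# Each (start, end) pair selects the half-open slice data[start:end].
for start, end in pairwise(indices):
    print(start, end, repr("".join(data[start:end]).strip()))
# 0 6 'Yeoksam-dong'
# 6 10 'Gangnam-gu Seoul'
# 10 11 'Korea'
```

Two caveats follow from this construction: fragments before the first capitalized one are silently dropped (no slice starts before `indices[0]`), and an empty or whitespace-only fragment makes `e.strip()[0]` raise `IndexError`.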

Alternative using more_itertools:

from more_itertools import split_before

data = [' Ye', 'oks', 'am', '-', 'd', 'ong', ' Gang', 'nam', '-', 'gu Seoul', ' Korea']

chunks = split_before(map(str.strip, data), lambda e: e[0].isupper())

res = ["".join(chunk).strip() for chunk in chunks]
print(res)
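The question also accepts ['Yeoksam-dong', 'Gangnam-gu', 'Seoul', 'Korea']. If, as in the sample data, every word boundary is marked by whitespace inside the fragments (a leading space as in ' Gang', or the space in 'gu Seoul'), no capitalization test is needed at all. This is a sketch under that assumption only.

```python
data = [' Ye', 'oks', 'am', '-', 'd', 'ong', ' Gang', 'nam', '-', 'gu Seoul', ' Korea']

# Assumption: whitespace inside the fragments marks every word boundary.
# Concatenating the raw fragments restores ' Yeoksam-dong Gangnam-gu Seoul Korea',
# and str.split() with no argument breaks it on runs of whitespace.
words = "".join(data).split()
print(words)  # ['Yeoksam-dong', 'Gangnam-gu', 'Seoul', 'Korea']
```

Because it ignores capitalization, a capitalized fragment with no leading space would be glued to the previous word, and a lowercase fragment with a leading space would become its own item.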

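Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original post's address. For any questions, contact yoyou2525@163.com.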

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM