
Retrieve a specific substring from each element in a list

I have been stuck on this for a few hours: I have a Series called size_col with 887 elements, and I want to retrieve the size (S, M, L, or XL) from each one. I have tried two different approaches, a list comprehension and a simple if/else loop, but neither works.

sizes = ['S', 'M', 'L', 'XL']

tshirt_sizes = []
[tshirt_sizes.append(i) for i in size_col if i in sizes]

Second attempt:

sizes = []
for i in size_col:
    if len(i) < 15:
        sizes.append(i.split(" / ", 1)[-1])
    else:
        sizes.append(i.split(" - ", 1)[-1])

I created two conditions because in some cases the size follows ' - ' and in others it follows ' / '. I honestly don't know how to deal with that.

Example of the list:

T-Shirt Donna "Si dai. Ciao." - M
T-Shirt Donna "Honey" - L
T-Shirt Donna "Si dai. Ciao." - M
T-Shirt Donna "I do very bad things" - M
T-Shirt Donna "Si dai. Ciao." - M
T-Shirt Donna "Stai nel tuo (mind your business)" - White / S
T-Shirt Donna "Stay Stronz" - White / L
T-Shirt Donna "Stay Stronz" - White / M
T-Shirt Donna "Si dai. Ciao." - S
T-Shirt Donna "Je suis esaurit" - Black / S
T-Shirt Donna "Si dai. Ciao." - S
T-Shirt Donna "Teamo - Tequila" - S / T-Shirt

You'll need regular expressions here. Precompile a regex pattern and then use pattern.search inside a list comprehension.

import re

sizes = ['S', 'M', 'L', 'XL']
p = re.compile(r'\b({})\b'.format('|'.join(sizes)))

tshirt_sizes = [p.search(i).group(0) for i in size_col]

print(tshirt_sizes)
['M', 'L', 'M', 'M', 'M', 'S', 'L', 'M', 'S', 'S', 'S', 'S']

For added safety, you may want a loop instead, since a list comprehension cannot catch the exception raised when a row has no match:

tshirt_sizes = []
for i in size_col:
    try:
        tshirt_sizes.append(p.search(i).group(0))
    except AttributeError:
        tshirt_sizes.append(None)
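If you would rather keep the comprehension, one option is to test the match object instead of catching the exception. The following is only a sketch: the helper name first_size and the sample rows are made up for illustration, and re.escape is added as a precaution in case a size label ever contains regex metacharacters.

```python
import re

sizes = ['S', 'M', 'L', 'XL']
# re.escape guards against size labels containing regex metacharacters
p = re.compile(r'\b({})\b'.format('|'.join(map(re.escape, sizes))))

def first_size(text):
    """Return the first standalone size token in text, or None if there is none."""
    m = p.search(text)
    return m.group(0) if m else None

# Hypothetical sample rows standing in for size_col
sample = [
    'T-Shirt Donna "Honey" - L',
    'T-Shirt Donna "Stay Stronz" - White / M',
    'T-Shirt Donna "Honey" - XL',
    'Gift card',
]

tshirt_sizes = [first_size(s) for s in sample]
print(tshirt_sizes)
```

Rows without a recognisable size come back as None, so the result stays aligned with the input.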

Really, the only reason to use a regex here is to handle the last row in your data appropriately. In general, you should prefer string operations (namely, str.split) whenever they suffice; they are usually faster and more readable than regular-expression-based pattern matching and extraction.
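As a rough illustration of the str.split route, the sketch below assumes the size always sits after the last ' - ' separator, possibly alongside other fields separated by ' / '. The function name size_from_split is hypothetical, and the assumption may not hold for all 887 rows.

```python
SIZES = ("S", "M", "L", "XL")

def size_from_split(label, sizes=SIZES):
    """Return the size found after the last ' - ', or None if none is found."""
    # Everything after the last ' - ' (the whole label if ' - ' is absent)
    tail = label.rsplit(" - ", 1)[-1]
    # The tail may hold several ' / '-separated fields, e.g. 'White / S'
    for part in tail.split(" / "):
        if part in sizes:
            return part
    return None

# Rows taken from the example list in the question
rows = [
    'T-Shirt Donna "Si dai. Ciao." - M',
    'T-Shirt Donna "Je suis esaurit" - Black / S',
    'T-Shirt Donna "Teamo - Tequila" - S / T-Shirt',
]
print([size_from_split(r) for r in rows])
```

Using rsplit rather than split matters for the last row, where the quoted title itself contains ' - '.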

You can do something like this:

available_sizes = ["S", "M", "L", "XL"]
sizes = []

for i in size_col:
    for w in i.split():
        if w in available_sizes:
            sizes.append(w)

This wouldn't work if the text contains a word from available_sizes more than once, for example T-Shirt Donna "La S è la più bella consonante" - M, since it would add both S and M to the list.
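One possible way around the duplicate problem is to keep only the last matching word per row. This is a sketch that assumes the size always comes after the quoted title; the helper name last_size_word is made up for illustration.

```python
available_sizes = {"S", "M", "L", "XL"}

def last_size_word(text):
    """Return the last whitespace-separated word that is a size, or None."""
    # Scan the words from the end so a stray 'S' inside the title is ignored
    return next((w for w in reversed(text.split()) if w in available_sizes), None)

rows = [
    'T-Shirt Donna "La S è la più bella consonante" - M',
    'T-Shirt Donna "Teamo - Tequila" - S / T-Shirt',
]
sizes = [last_size_word(r) for r in rows]
print(sizes)
```

This yields exactly one entry per row, so the output stays the same length as the input.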


Original answer, before the OP specified that the size is not always the last word:

Almost. Just split the string into words and take the last one.

sizes = []
for i in size_col:
    sizes.append(i.split()[-1])

There are two aspects to this question: 1) the best method of looping over the elements and 2) the correct way to split the string.

In the general case, a list comprehension is probably the right approach for this type of problem, but as you have correctly identified, splitting the string correctly is the tricky part.

For this type of problem, regular expressions are very powerful, and (at the risk of complicating things compared to the previous answers) you could use something like:

import re

# Match one or more uppercase letters that follow "- " or "/ " and run to the end of the string ($)
pattern = re.compile(r'[-/] ([A-Z]+)$')

# group(1) is the text captured by the parentheses (the size letters)
sizes = [pattern.search(item).group(1) for item in size_col]
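A quick check of an end-anchored pattern against a few rows from the question shows where it stops working. This is only a sketch: the character class [A-Z]+ is an assumption about what a size looks like, and the None fallback is added so the loop does not raise on rows without a match.

```python
import re

# Uppercase letters after "- " or "/ ", anchored to the end of the string
pattern = re.compile(r'[-/] ([A-Z]+)$')

rows = [
    'T-Shirt Donna "Honey" - L',
    'T-Shirt Donna "Stay Stronz" - White / L',
    'T-Shirt Donna "Teamo - Tequila" - S / T-Shirt',  # size is not at the end
]

results = []
for item in rows:
    m = pattern.search(item)
    results.append(m.group(1) if m else None)

print(results)
```

The last row yields None because the size is followed by ' / T-Shirt', which is the case an unanchored word-boundary pattern handles.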

Edit: I just saw the edit to the question stating that the size is not always at the end, and that COLDSPEED's answer duplicates this one...
