
Finding the lowest number over 10 in a nested/flat list of strings and integers

I'm trying to extract, for each item, the lowest number over 10 from a list of nested and unnested strings and integers. I've tried a few different things, but they either return nothing or raise various errors ("expected string or buffer" and "'>' not possible between int and list" were two of them). The results need to stay in order because they will be entered into a pandas data frame afterwards.

starting_list = [['4dfg', '12kfmgfg','dfgdf133'],[8, '16dgdfg'], 11, '', 'fdsf']

desired_result = [12, 16, 11, NaN, NaN]

Below are two of the functions I've tried. Because the results are going into a data frame, a pandas-based answer would also be fine.

def min_int(data):
    for item in range(len(data)):
        for i in range(len(data[item])):
            if type(data[item][i]) == int:
                if data[item][i] >10:
                    data.remove(data[item][i])
            else:
                data[item][i] =int(re.sub(r'\D', "", data[item]))
                if data[item][i] >10:
                    data.remove(data[item][i])
        data[item] = min(data)

def remove_text(data):
    for i in range(len(data)):
        try:
            for ii in range(len(data[i])):
                try:
                    data[i][ii] =int(re.sub(r'\D', "", data[item]))
                except:
                    continue
        except:
            continue 

Thanks!

Use:

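import pandas as pd

# regex=True and groupby(level=0).min() are needed on pandas 2.0+,
# where str.replace is literal by default and min(level=0) was removed
s = pd.Series(starting_list)
a = (pd.to_numeric(s.explode()                            # explode lists
                    .astype(str)                          # convert all values to strings
                    .str.replace(r'\D', '', regex=True),  # strip non-digit characters
                   errors='coerce')                       # convert to numbers, NaN where not possible
       .loc[lambda x: x > 10]                             # filter values
       .groupby(level=0).min()                            # get minimum per original index
       .reindex(s.index)                                  # add back removed index values
       .tolist())                                         # convert to list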

#convert non NaNs to integers
a = [int(x) if x == x else x for x in a]
print (a)
[12, 16, 11, nan, nan]
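
To make the individual steps of the chain easier to follow, here is a rough plain-Python sketch of the same idea (explode, strip non-digits, filter, minimum per position, reindex). It is only an illustration, not how pandas implements these operations, and the intermediate names (`exploded`, `numbers`, `lowest`) are made up for this example. It assumes Python 3.8+ for the `:=` operator.

```python
import re

starting_list = [['4dfg', '12kfmgfg', 'dfgdf133'], [8, '16dgdfg'], 11, '', 'fdsf']

# "explode": one (position, value) pair per element; scalars give a single pair
exploded = [(pos, value)
            for pos, item in enumerate(starting_list)
            for value in (item if isinstance(item, list) else [item])]

# strip non-digits and convert; values with no digits are skipped here
# (the pandas version turns them into NaN and filters them out later)
numbers = [(pos, int(digits))
           for pos, value in exploded
           if (digits := re.sub(r'\D', '', str(value)))]

# keep only values over 10 and track the minimum per original position
lowest = {}
for pos, number in numbers:
    if number > 10:
        lowest[pos] = min(number, lowest.get(pos, number))

# "reindex": positions with no surviving value become NaN
result = [lowest.get(pos, float('nan')) for pos in range(len(starting_list))]
print(result)  # [12, 16, 11, nan, nan]
```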

Your function can be simplified:

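import re
import numpy as np

def try_to_int(x):
    # integers pass through unchanged, including those inside nested lists
    if isinstance(x, int):
        return x
    try:
        return int(re.sub(r'\D', "", x))
    except (TypeError, ValueError):
        # no digits left, or not a string at all
        return np.nan

def min_int(x):
    # treat a scalar as a one-element list so the same filter applies
    values = x if isinstance(x, list) else [x]
    gen = (try_to_int(y) for y in values)
    # NaN > 10 is False, so NaNs are dropped; default covers "nothing over 10"
    return min((y for y in gen if y > 10), default=np.nan)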
        

print ([min_int(x) for x in starting_list])
[12, 16, 11, nan, nan]
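
One thing to watch with the `min(...)` call: on an empty iterable it raises `ValueError`, which happens for any sub-list where nothing exceeds 10. A minimal sketch of the `default` argument of the built-in `min`, using a made-up sub-list `no_match`:

```python
nan = float('nan')
no_match = [4, 8]  # hypothetical sub-list: nothing over 10

try:
    min(n for n in no_match if n > 10)
except ValueError as err:
    # the exact message text depends on the Python version
    print('ValueError:', err)

# default is returned instead of raising when the iterable is empty
lowest = min((n for n in no_match if n > 10), default=nan)
print(lowest)  # nan
```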
    

Although you already have an accepted answer from pandas wiz @jezrael, here is a lower-level approach, if you like.

Essentially, it uses a regex to extract the numeric values, filters them to meet your requirement, and then appends the result to an output list.

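import re

exp = re.compile(r'(\d+)')
nan = float('nan')

new = []
for i in list_:
    # wrap scalars so nested and flat items share one code path
    items = i if isinstance(i, list) else [i]
    new2 = []
    for j in items:
        f = exp.findall(str(j))
        # first run of digits, or NaN when there are none
        new2.append(int(f[0]) if f else nan)
    # default avoids a ValueError when no value exceeds 10
    new.append(min((n for n in new2 if n > 10), default=nan))

print(new)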

Output:

>>> [12, 16, 11, nan, nan]

Setup:

list_ = [['4dfg', '12kfmgfg', 'dfgdf133'],
         [8, '16dgdfg'],
         11,
         '',
         'fdsf']
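
One caveat worth knowing: taking the first `findall` match is not quite the same as stripping non-digits with `re.sub(r'\D', '', ...)`, as the other answers do. The two agree on the sample data, but they differ when a string contains more than one run of digits. A small sketch with a made-up string `sample`:

```python
import re

sample = 'a1b22c'  # hypothetical input with two separate digit runs

first_run = re.findall(r'\d+', sample)[0]   # first run of digits only
all_digits = re.sub(r'\D', '', sample)      # every digit, concatenated

print(int(first_run))   # 1
print(int(all_digits))  # 122
```

Which behaviour is right depends on what the strings actually encode, so it is worth checking against real data.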

You may create a generator:

from collections.abc import Iterable
import re


def select_numbers(items, limit=10):
    for item in items:
        if isinstance(item, Iterable) and not isinstance(item, str):
            yield from select_numbers(item, limit)
        else:
            item = re.sub(r"\D", "", str(item))
            if item == "":
                yield float("NaN")
            elif int(item) > limit:
                yield int(item)
                 

Testing it:

>>> starting_list = [['4dfg', '12kfmgfg','dfgdf133'],[8, '16dgdfg'], 11, '', 'fdsf']
>>>
>>> desired_result = select_numbers(starting_list)       # generator, not a list
>>> list(desired_result)
[12, 133, 16, 11, nan, nan]
>>> desired_result = select_numbers(starting_list)      # re-creating exhausted generator
>>>
>>> for elem in desired_result:                         # direct use of generator in loop
...     print(elem)
...
12
133
16
11
nan
nan
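
Note that the generator flattens the input, so it yields every number over the limit (here both 12 and 133 from the first sub-list) rather than one value per item. If one value per top-level item is needed, as in the question, a possible sketch is to run the generator on each item separately and take the minimum. The helper name `lowest_per_item` is made up for this example, and `select_numbers` is repeated only so the block runs on its own.

```python
from collections.abc import Iterable
import math
import re


def select_numbers(items, limit=10):
    # Same generator as above, repeated so this block is self-contained.
    for item in items:
        if isinstance(item, Iterable) and not isinstance(item, str):
            yield from select_numbers(item, limit)
        else:
            item = re.sub(r"\D", "", str(item))
            if item == "":
                yield float("NaN")
            elif int(item) > limit:
                yield int(item)


def lowest_per_item(items, limit=10):
    # Hypothetical helper: one result per top-level item, NaN when nothing qualifies.
    result = []
    for item in items:
        # drop the NaN placeholders before taking the minimum
        found = [n for n in select_numbers([item], limit) if not math.isnan(n)]
        result.append(min(found, default=float("nan")))
    return result


starting_list = [['4dfg', '12kfmgfg', 'dfgdf133'], [8, '16dgdfg'], 11, '', 'fdsf']
print(lowest_per_item(starting_list))  # [12, 16, 11, nan, nan]
```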
