Finding the lowest number over 10 in a nested/flat list of strings and integers
I'm trying to extract, for each item, the lowest number over 10 from a list of nested and unnested strings and integers. I've tried a few different things, but they either return nothing or raise errors ("expected string or buffer" and "'>' not possible between int and list" were two of them). The results need to stay in order because they will be entered into a pandas data frame afterwards.
starting_list = [['4dfg', '12kfmgfg','dfgdf133'],[8, '16dgdfg'], 11, '', 'fdsf']
desired_result = [12, 16, 11, NaN, NaN]
Below are two of the functions I've tried. Because the results are going into a data frame, a pandas-based answer would also be fine.
def min_int(data):
    for item in range(len(data)):
        for i in range(len(data[item])):
            if type(data[item][i]) == int:
                if data[item][i] >10:
                    data.remove(data[item][i])
            else:
                data[item][i] =int(re.sub(r'\D', "", data[item]))
                if data[item][i] >10:
                    data.remove(data[item][i])
        data[item] = min(data)
def remove_text(data):
    for i in range(len(data)):
        try:
            for ii in range(len(data[i])):
                try:
                    data[i][ii] =int(re.sub(r'\D', "", data[item]))
                except:
                    continue
        except:
            continue
Thanks!
Use:
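import pandas as pd

s = pd.Series(starting_list)  # the list from the question
a = (pd.to_numeric(s.explode()       # explode lists into one row per element
                    .astype(str)     # convert all values to strings
                    .str.replace(r'\D', '', regex=True),  # strip non-digits (regex=True is required in pandas 2.x)
                   errors='coerce')  # convert to numbers, NaN where not possible
       .loc[lambda x: x > 10]        # keep values over 10
       .groupby(level=0).min()       # minimum per original index (replaces the removed .min(level=0))
       .reindex(s.index)             # restore indices that were filtered out
       .tolist())                    # convert to list
# convert non-NaN values to integers (NaN is the only value not equal to itself)
a = [int(x) if x == x else x for x in a]
print(a)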
[12, 16, 11, nan, nan]
Your function can be simplified:
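import re
import numpy as np

def try_to_int(x):
    # str() lets plain integers through re.sub as well as strings
    try:
        return int(re.sub(r'\D', "", str(x)))
    except ValueError:
        # no digits at all
        return np.nan

def min_int(x):
    if isinstance(x, list):
        gen = (try_to_int(y) for y in x)
        # NaN fails the y > 10 test, so it is filtered out here;
        # default covers lists with no value over 10
        return min((y for y in gen if y > 10), default=np.nan)
    y = try_to_int(x)
    return y if y > 10 else np.nan

print([min_int(x) for x in starting_list])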
[12, 16, 11, nan, nan]
Although you already have an accepted answer from pandas-wiz @jezrael, here is a lower-level approach, if you like.
Essentially, it uses a regex to extract the numeric values, filters them to meet your requirement, then appends them to an output list.
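import re

exp = re.compile(r'(\d+)')
nan = float('nan')

new = []
for i in list_:  # list_ is defined under "Setup" below
    if isinstance(i, list):
        new2 = []
        for j in i:
            f = exp.findall(str(j))
            new2.append(int(f[0]) if f else nan)
        # default avoids a ValueError when no value in the sub-list is over 10
        new.append(min((k for k in new2 if k > 10), default=nan))
    else:
        f = exp.findall(str(i))
        n = int(f[0]) if f else nan
        # apply the same "over 10" rule to unnested items
        new.append(n if n > 10 else nan)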
Output:
>>> [12, 16, 11, nan, nan]
Setup:
list_ = [['4dfg', '12kfmgfg', 'dfgdf133'],
[8, '16dgdfg'],
11,
'',
'fdsf']
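One difference from the `\D`-stripping approach in the other answer is worth spelling out: `findall` followed by `f[0]` keeps only the first run of digits, while `re.sub(r'\D', '', ...)` glues every digit in the string together. The two agree on the sample data but can diverge on strings with several separate digit runs. A minimal sketch of the comparison, where `first_digit_run` and `all_digits_joined` are hypothetical helper names introduced only for illustration:

```python
import re

def first_digit_run(text):
    """Return the first run of digits in text as an int, or None (hypothetical helper)."""
    match = re.search(r'\d+', str(text))
    return int(match.group()) if match else None

def all_digits_joined(text):
    """Strip every non-digit and parse what is left, or None (hypothetical helper)."""
    digits = re.sub(r'\D', '', str(text))
    return int(digits) if digits else None

# '12kfmgfg' gives the same answer both ways; 'a1b22' does not
for sample in ['12kfmgfg', 'a1b22', 'fdsf']:
    print(sample, first_digit_run(sample), all_digits_joined(sample))
```

Which behaviour is right depends on what the strings mean in your data, so it may be worth checking a few real values before picking one.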
You may create a generator:
from collections.abc import Iterable
import re

def select_numbers(items, limit=10):
    for item in items:
        if isinstance(item, Iterable) and not isinstance(item, str):
            yield from select_numbers(item, limit)
        else:
            item = re.sub(r"\D", "", str(item))
            if item == "":
                yield float("NaN")
            elif int(item) > limit:
                yield int(item)
Testing it:
>>> starting_list = [['4dfg', '12kfmgfg','dfgdf133'],[8, '16dgdfg'], 11, '', 'fdsf']
>>>
>>> desired_result = select_numbers(starting_list) # generator, not a list
>>> list(desired_result)
[12, 133, 16, 11, nan, nan]
>>> desired_result = select_numbers(starting_list) # re-creating exhausted generator
>>>
>>> for elem in desired_result: # direct use of generator in loop
...     print(elem)
...
12
133
16
11
nan
nan
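Note that the generator flattens the nested lists, so it yields every number over the limit (both 12 and 133 from the first sub-list) rather than one value per top-level item. If one result per item is needed, as in the question, a thin wrapper can call the generator on each top-level item separately. This is only a sketch under that assumption; `lowest_per_item` is a hypothetical name, not part of the answer above, and `select_numbers` is repeated so the block runs on its own:

```python
import math
import re
from collections.abc import Iterable

def select_numbers(items, limit=10):
    # Same generator as above, repeated so this block is self-contained
    for item in items:
        if isinstance(item, Iterable) and not isinstance(item, str):
            yield from select_numbers(item, limit)
        else:
            digits = re.sub(r"\D", "", str(item))
            if digits == "":
                yield float("nan")
            elif int(digits) > limit:
                yield int(digits)

def lowest_per_item(items, limit=10):
    # Hypothetical wrapper: one result per top-level item, NaN when nothing qualifies
    results = []
    for item in items:
        # Drop the NaN placeholders before taking the minimum
        candidates = [n for n in select_numbers([item], limit) if not math.isnan(n)]
        results.append(min(candidates, default=float("nan")))
    return results

starting_list = [['4dfg', '12kfmgfg', 'dfgdf133'], [8, '16dgdfg'], 11, '', 'fdsf']
print(lowest_per_item(starting_list))  # [12, 16, 11, nan, nan]
```

Because the results keep the order of the top-level items, the returned list can be assigned to a data frame column directly.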