Finding the lowest number over 10 in a nested/flat list of strings and integers
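I'm trying to extract the lowest number over 10 from each item of a list containing nested and non-nested strings and integers. I've tried a few different things, but they either return nothing or raise various errors (two of them are "expected string or buffer" and "'>' not supported between instances of 'int' and 'list'"). The results need to stay in order, because they will be fed into a pandas DataFrame afterwards.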
starting_list = [['4dfg', '12kfmgfg','dfgdf133'],[8, '16dgdfg'], 11, '', 'fdsf']
desired_result = [12, 16, 11, NaN, NaN]
Below are two different functions I've tried. Since the results are going into a DataFrame, pandas-based answers are fine too.
def min_int(data):
    for item in range(len(data)):
        for i in range(len(data[item])):
            if type(data[item][i]) == int:
                if data[item][i] >10:
                    data.remove(data[item][i])
            else:
                data[item][i] =int(re.sub(r'\D', "", data[item]))
                if data[item][i] >10:
                    data.remove(data[item][i])
        data[item] = min(data)
def remove_text(data):
    for i in range(len(data)):
        try:
            for ii in range(len(data[i])):
                try:
                    data[i][ii] =int(re.sub(r'\D', "", data[item]))
                except:
                    continue
        except:
            continue
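Two things in these attempts explain the symptoms: `re.sub` is called on `data[item]` (the whole sublist) instead of the element `data[item][i]`, and in `remove_text` the name `item` is not defined, so the bare `except` silently swallows the resulting NameError. Neither function has a `return` statement either, so both return `None`. The sketch below reproduces the two error types in isolation; `type_error_message` is a hypothetical helper, the exact wording differs between Python versions (Python 2 says "expected string or buffer"), and the operand order in the comparison message depends on which side the list is on.

```python
import re

starting_list = [['4dfg', '12kfmgfg', 'dfgdf133'], [8, '16dgdfg'], 11, '', 'fdsf']

def type_error_message(func):
    """Hypothetical helper: call func() and return the text of the TypeError it raises, or None."""
    try:
        func()
    except TypeError as exc:
        return str(exc)
    return None

# re.sub() needs a string, but data[item] in the attempts is a whole sublist
regex_err = type_error_message(lambda: re.sub(r'\D', '', starting_list[0]))

# ordering comparisons between a list and an int are not defined in Python 3
compare_err = type_error_message(lambda: starting_list[0] > 10)

print(regex_err)
print(compare_err)
```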
Thanks!
Use:
import pandas as pd

s = pd.Series(starting_list)
a = (pd.to_numeric(s.explode()                          # explode lists into one row per element
                    .astype(str)                         # convert all values to strings
                    .str.replace(r'\D', '', regex=True),  # strip non-digit characters
                   errors='coerce')                      # convert to numbers where possible, else NaN
       .loc[lambda x: x > 10]                            # filter values
       .groupby(level=0).min()                           # get minimum per original index
       .reindex(s.index)                                 # add back indices whose values were all removed
       .tolist())                                        # convert to list
# convert non-NaNs to integers (x == x is False only for NaN)
a = [int(x) if x == x else x for x in a]
print(a)
[12, 16, 11, nan, nan]
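To see why grouping on index level 0 (`groupby(level=0).min()`, spelled `.min(level=0)` in older pandas) yields one value per original item, it helps to look at the intermediate Series. This sketch reuses `starting_list` from the question and assumes pandas is installed:

```python
import pandas as pd

starting_list = [['4dfg', '12kfmgfg', 'dfgdf133'], [8, '16dgdfg'], 11, '', 'fdsf']
s = pd.Series(starting_list)

# explode() gives each list element its own row but repeats the original index label
exploded = s.explode()

# strip non-digits and coerce to numbers; '' (no digits left) becomes NaN
nums = pd.to_numeric(exploded.astype(str).str.replace(r'\D', '', regex=True),
                     errors='coerce')

print(nums)
```

The exploded rows keep the labels 0, 0, 0, 1, 1, 2, 3, 4, so grouping on level 0 collects each original item's numbers back together. The rows for `''` and `'fdsf'` become NaN, and since `NaN > 10` is False they are dropped by the filter, which is why `reindex` is needed to restore them.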
Your functions can be simplified:
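import re
import numpy as np

def try_to_int(x):
    # keep only the digits; int('') raises ValueError when there are none
    try:
        return int(re.sub(r'\D', "", str(x)))
    except ValueError:
        return np.nan

def min_int(x):
    # NaN > 10 is False, so NaN values never pass the filter
    if isinstance(x, int):
        return x if x > 10 else np.nan
    elif isinstance(x, list):
        gen = (try_to_int(y) for y in x)
        return min((y for y in gen if y > 10), default=np.nan)
    else:
        y = try_to_int(x)
        return y if y > 10 else np.nan

print ([min_int(x) for x in starting_list])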
[12, 16, 11, nan, nan]
Although you've already got an accepted answer from pandas wiz @jezrael, here is a lower-level approach if you prefer.
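Essentially, it uses a regular expression to extract the numeric values, filters them to meet your requirement, and appends the results to an output list.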
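import re

exp = re.compile(r'(\d+)')   # one or more consecutive digits

new = []
for i in list_:
    if isinstance(i, list):
        new2 = []
        for j in i:
            f = exp.findall(str(j))
            new2.append(int(f[0]) if f else float('nan'))
        # NaN > 10 is False, so NaNs are skipped; default avoids a ValueError when nothing is over 10
        new.append(min((n for n in new2 if n > 10), default=float('nan')))
    else:
        f = exp.findall(str(i))
        n = int(f[0]) if f else float('nan')
        new.append(n if n > 10 else float('nan'))
print(new)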
Output:
[12, 16, 11, nan, nan]
Setup:
list_ = [['4dfg', '12kfmgfg', 'dfgdf133'],
[8, '16dgdfg'],
11,
'',
'fdsf']
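The two regex strategies in this thread differ when a string contains more than one run of digits: `re.sub(r'\D', '', s)` glues all the digits together, while `exp.findall(s)[0]` keeps only the first run. None of the strings in `starting_list` hits this case; `'a1b22'` below is a made-up input used only to show the difference.

```python
import re

exp = re.compile(r'(\d+)')
s = 'a1b22'   # hypothetical input with two separate digit runs

glued = int(re.sub(r'\D', '', s))           # removes every non-digit: '122' -> 122
first = int(exp.findall(s)[0])              # findall returns ['1', '22']; [0] keeps '1'
every = [int(d) for d in exp.findall(s)]    # every run as a separate number

print(glued, first, every)   # 122 1 [1, 22]
```

Which behaviour is right depends on what such strings mean in your data; if every run should count, `every` could be fed into the `> 10` filter instead.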
You can create a generator:
from collections.abc import Iterable
import re

def select_numbers(items, limit=10):
    for item in items:
        if isinstance(item, Iterable) and not isinstance(item, str):
            yield from select_numbers(item, limit)   # recurse into nested lists
        else:
            item = re.sub(r"\D", "", str(item))      # keep digits only
            if item == "":
                yield float("NaN")                   # no digits at all
            elif int(item) > limit:
                yield int(item)                      # numbers <= limit are skipped
Testing it:
>>> starting_list = [['4dfg', '12kfmgfg','dfgdf133'],[8, '16dgdfg'], 11, '', 'fdsf']
>>>
>>> desired_result = select_numbers(starting_list) # generator, not a list
>>> list(desired_result)
[12, 133, 16, 11, nan, nan]
>>> desired_result = select_numbers(starting_list) # re-creating exhausted generator
>>>
>>> for elem in desired_result:  # direct use of generator in loop
...     print(elem)
...
12
133
16
11
nan
nan
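This generator flattens all items, so it yields every number over 10 (including 133) rather than one minimum per item. A sketch of recovering the per-item minimum from it, assuming `select_numbers` is defined exactly as above (it is repeated here so the block runs on its own); `lowest_per_item` is a hypothetical helper name:

```python
from collections.abc import Iterable
import re

def select_numbers(items, limit=10):
    # same generator as above
    for item in items:
        if isinstance(item, Iterable) and not isinstance(item, str):
            yield from select_numbers(item, limit)
        else:
            item = re.sub(r"\D", "", str(item))
            if item == "":
                yield float("NaN")
            elif int(item) > limit:
                yield int(item)

def lowest_per_item(items, limit=10):
    """Hypothetical helper: one minimum per top-level item, NaN when none is over limit."""
    result = []
    for item in items:
        # wrap item in a list so select_numbers only sees this one item;
        # n == n drops the NaN placeholders yielded for digit-free values
        found = [n for n in select_numbers([item], limit) if n == n]
        result.append(min(found, default=float("nan")))
    return result

starting_list = [['4dfg', '12kfmgfg', 'dfgdf133'], [8, '16dgdfg'], 11, '', 'fdsf']
print(lowest_per_item(starting_list))   # [12, 16, 11, nan, nan]
```

Because the recursion is kept, this also works for lists nested more than one level deep.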