How to extract numbers from a list of strings?
How can I extract only the numbers from
a = ['1 2 3', '4 5 6', 'invalid']
I tried:
mynewlist = [s for s in a if s.isdigit()]
print(mynewlist)
and
for strn in a:
    values = list(map(float, strn.split()))
    print(values)
Both attempts failed because of the spaces between the numbers.
Note: the output I am trying to get is:
[1, 2, 3, 4, 5, 6]
I think you need to treat each item in the list as a string and split it on whitespace.
a = ['1 2 3', '4 5 6', 'invalid']
numbers = []
for item in a:
    for subitem in item.split():
        if subitem.isdigit():
            numbers.append(subitem)
print(numbers)
['1', '2', '3', '4', '5', '6']
Or in one neat comprehension:
[item for subitem in a for item in subitem.split() if item.isdigit()]
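The loop and the comprehension above collect the digits as strings, while the question asks for integers. A minimal sketch that converts as it goes, assuming only non-negative whole numbers matter (the function name extract_ints is made up for illustration):

```python
def extract_ints(strings):
    """Return every whitespace-separated, all-digit token as an int."""
    return [int(token)
            for line in strings          # each string in the list
            for token in line.split()    # each whitespace-separated token
            if token.isdigit()]          # skips 'invalid', '-1', '2.5'

a = ['1 2 3', '4 5 6', 'invalid']
print(extract_ints(a))  # [1, 2, 3, 4, 5, 6]
```

Note that isdigit() is also true for a few non-ASCII characters such as '²', which int() rejects; for plain ASCII input like the list above this makes no difference.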
This should work for your particular case, since your list contains strings. You therefore need to flatten it:
new_list = [int(item) for sublist in a for item in sublist if item.isdigit()]
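One caveat: iterating over a string yields single characters, so this variant only gives the expected result when every number is one digit long. A small comparison on a made-up input b shows the difference:

```python
b = ['10 20', 'invalid']

# Character by character: multi-digit numbers fall apart.
per_char = [int(ch) for s in b for ch in s if ch.isdigit()]

# Token by token: split on whitespace first, then test each token.
per_token = [int(tok) for s in b for tok in s.split() if tok.isdigit()]

print(per_char)   # [1, 0, 2, 0]
print(per_token)  # [10, 20]
```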
Assuming the list contains only strings:
[int(word) for sublist in map(str.split, a) for word in sublist if word.isdigit()]
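If the strings may also hold negative or decimal numbers, isdigit() will skip them. One possible approach, sketched here under the assumption that anything float() accepts counts as a number (the helper name extract_floats is hypothetical):

```python
def extract_floats(strings):
    """Collect every whitespace-separated token that float() can parse."""
    numbers = []
    for line in strings:
        for token in line.split():
            try:
                numbers.append(float(token))
            except ValueError:
                # Not a number (e.g. 'invalid'): ignore it.
                pass
    return numbers

print(extract_floats(['1 2.5 -3', 'invalid']))  # [1.0, 2.5, -3.0]
```

Keep in mind that float() also accepts tokens such as 'nan', 'inf' and '1e5', which may or may not be what you want.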
With the help of a set, you can do:
>>> a = ['1 2 3', '4 5 6', 'invalid']
>>> valid = set(" 0123456789")
>>> [int(y) for x in a if set(x) <= valid for y in x.split()]
[1, 2, 3, 4, 5, 6]
This includes the numbers from a string only if the string consists solely of characters from the valid set.
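In other words, the set test accepts or rejects a whole string at once. A short sketch on a made-up list, mixed, compares it with a per-token check:

```python
valid = set(" 0123456789")
mixed = ['1 2 3', '7 x 8']

# Whole-string check: '7 x 8' contains 'x', so the entire string is dropped.
whole_string = [int(y) for x in mixed if set(x) <= valid for y in x.split()]

# Per-token check: only the offending token 'x' is dropped.
per_token = [int(y) for x in mixed for y in x.split() if y.isdigit()]

print(whole_string)  # [1, 2, 3]
print(per_token)     # [1, 2, 3, 7, 8]
```

Which behaviour is preferable depends on whether a partly invalid string should still contribute its numbers.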
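mynewlist = [s for s in a if s.isdigit()]
print(mynewlist)
does not work because you are iterating over the contents of the list, which consists of three strings: '1 2 3', '4 5 6' and 'invalid'.
This means you have to iterate again over each string.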
You could try something like:
mynewlist = []
for s in a:
    mynewlist += [digit for digit in s if digit.isdigit()]
A one-liner solution:
new_list = [int(m) for n in a for m in n if m in '0123456789']
There are many options for extracting numbers from a list of strings.
Assume a general list of strings such as:
input_list = ['abc.123def45, ghi67 890 12, jk345', '123, 456 78, 90', 'abc def, ghi'] * 10000
If conversion to integers is not required:
import re

def test_as_str(input_list):
    output_list = []
    for string in input_list:
        output_list += re.findall(r'\d+', string)
    return output_list
%timeit -n 10 -r 7 test_as_str(input_list)
> 37.6 ms ± 168 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
def test_as_str(input_list):
    output_list = []
    [output_list.extend(re.findall(r'\d+', string)) for string in input_list]
    return output_list
%timeit -n 10 -r 7 test_as_str(input_list)
> 39.5 ms ± 118 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
import itertools

def test_as_str(input_list):
    return list(itertools.chain(*[re.findall(r'\d+', string) for string in input_list]))
%timeit -n 10 -r 7 test_as_str(input_list)
> 40.4 ms ± 202 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
def test_as_str(input_list):
    return list(filter(None, [item for string in input_list for item in re.split(r'[^\d]+', string)]))
%timeit -n 10 -r 7 test_as_str(input_list)
> 42.8 ms ± 372 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
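To see why the re.split variant needs filter(None, ...), here is a small sketch on the first sample string. re.findall returns only the digit runs, whereas re.split also returns an empty string wherever the input starts (or ends) with non-digits:

```python
import re

sample = 'abc.123def45, ghi67 890 12, jk345'

found = re.findall(r'\d+', sample)      # runs of digits
pieces = re.split(r'[^\d]+', sample)    # split on runs of non-digits

print(found)   # ['123', '45', '67', '890', '12', '345']
print(pieces)  # ['', '123', '45', '67', '890', '12', '345']

# Dropping the empty strings makes both results identical.
print(list(filter(None, pieces)) == found)  # True
```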
Conversion to integers can also be included:
def test_as_int(input_list):
    output_list = []
    for string in input_list:
        output_list += re.findall(r'\d+', string)
    return list(map(int, output_list))
%timeit -n 10 -r 7 test_as_int(input_list)
> 44.7 ms ± 232 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
def test_as_int(input_list):
    output_list = []
    for string in input_list:
        output_list += re.findall(r'\d+', string)
    return [int(item) for item in output_list]
%timeit -n 10 -r 7 test_as_int(input_list)
> 47.8 ms ± 198 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
def test_as_int(input_list):
    return [int(item) for string in input_list for item in re.findall(r'\d+', string)]
%timeit -n 10 -r 7 test_as_int(input_list)
> 48.3 ms ± 101 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
def test_as_int(input_list):
    return [int(item) for string in input_list for item in re.split(r'[^\d]+', string) if item]
%timeit -n 10 -r 7 test_as_int(input_list)
> 51.4 ms ± 150 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
def test_as_int(input_list):
    return [int(item) for string in input_list for item in re.split(r'[^\d]+', string) if item.isdigit()]
%timeit -n 10 -r 7 test_as_int(input_list)
> 54.9 ms ± 210 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
def test_as_int(input_list):
    return [int(item) for string in input_list for item in re.split(r'[^\d]+', string) if len(item)]
%timeit -n 10 -r 7 test_as_int(input_list)
> 55.5 ms ± 175 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
The performance tests were run on Windows in a Python 3.8.8 virtual environment; the differences between the variants are small.
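The %timeit lines above are IPython magics. Outside IPython, a rough equivalent can be sketched with the standard timeit module. The function names below are made up for illustration, the list is shorter than the one above to keep the run quick, and the sketch reports the best run rather than mean ± std. dev., so the numbers will not match the ones quoted:

```python
import re
import timeit

# Shorter than the list used above, to keep the run quick.
input_list = ['abc.123def45, ghi67 890 12, jk345', '123, 456 78, 90', 'abc def, ghi'] * 1000

def findall_loop(strings):
    output = []
    for s in strings:
        output += re.findall(r'\d+', s)
    return output

def findall_comprehension(strings):
    return [item for s in strings for item in re.findall(r'\d+', s)]

# 7 repeats of 10 calls each, mirroring `%timeit -n 10 -r 7`.
for func in (findall_loop, findall_comprehension):
    runs = timeit.repeat(lambda: func(input_list), number=10, repeat=7)
    best_ms = min(runs) / 10 * 1000
    print(f'{func.__name__}: {best_ms:.2f} ms per call (best of 7)')
```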