Filter list elements that contain N digits in the string

Question

I have a list of HS codes for trade data that looks like this:

trade_data = ['84 Nuclear Reactor',
  '8401 Nuclear Reactor:Fuel Elem',
  '840120 Isotopic Separation Machinery',
  '8401200000 Isotopic Separation Machinery, Apparatus And Parts']

I want to filter this list so that it contains only the items with a 10-digit code in their names. In this case, that is '8401200000 Isotopic Separation Machinery, Apparatus And Parts'.

I tried

filtered_list = [x for x in trade_data if "\d{10}" in x]

but the code returns an empty list. Is there any way to do this?

Answer 1

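It seems like you're trying to apply a regex pattern. The in operator only checks for the literal text \d{10}, which none of the strings contain, so the result is empty. You can use re.search instead: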

import re
[x for x in trade_data if re.search(r"\d{10}", x)] 
# ['8401200000 Isotopic Separation Machinery, Apparatus And Parts']

Or, better still, precompile your pattern so it can be reused:

p = re.compile(r"\d{10}")
[x for x in trade_data if p.search(x)] 
# ['8401200000 Isotopic Separation Machinery, Apparatus And Parts']
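
Since the title asks about N digits in general, here is a minimal sketch that wraps the same re.search idea in a function. The name filter_by_digit_run and its parameter n are hypothetical, introduced only for illustration. Note that it keeps items with a run of at least n consecutive digits, exactly as \d{10} does above.

```python
import re

def filter_by_digit_run(items, n):
    """Return the items that contain a run of at least n consecutive digits.

    Hypothetical helper, not part of the original answer.
    """
    pattern = re.compile(r"\d{%d}" % n)  # e.g. n=10 gives \d{10}
    return [item for item in items if pattern.search(item)]

trade_data = ['84 Nuclear Reactor',
  '8401 Nuclear Reactor:Fuel Elem',
  '840120 Isotopic Separation Machinery',
  '8401200000 Isotopic Separation Machinery, Apparatus And Parts']

ten_digit = filter_by_digit_run(trade_data, 10)
six_digit = filter_by_digit_run(trade_data, 6)
print(ten_digit)
# ['8401200000 Isotopic Separation Machinery, Apparatus And Parts']
print(len(six_digit))
# 2 (the 6-digit and the 10-digit entries both contain a run of 6 digits)
```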

Note
If you need to match digits only at the start of the string, add the start-of-string anchor ^ to your pattern:
 r'^\d{10}'
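
To make the effect of the anchor concrete, here is a small sketch on a hypothetical list called samples (these strings are not from the question). It also shows one possible way to reject longer digit runs with a negative lookahead, since \d{10} on its own also matches inside an 11-digit number.

```python
import re

# Hypothetical sample data (not from the question), chosen to show edge cases.
samples = [
    "8401200000 Isotopic Separation Machinery",  # 10 digits at the start
    "Code 8401200000 appears mid-string",         # 10 digits, not at the start
    "84012000001 eleven digits at the start",     # 11 digits at the start
]

unanchored = re.compile(r"\d{10}")    # 10 consecutive digits anywhere
anchored = re.compile(r"^\d{10}")     # 10 digits at the start (11 also pass)
exact = re.compile(r"^\d{10}(?!\d)")  # 10 digits at the start, no 11th digit

unanchored_hits = [s for s in samples if unanchored.search(s)]
anchored_hits = [s for s in samples if anchored.search(s)]
exact_hits = [s for s in samples if exact.search(s)]

print(len(unanchored_hits), len(anchored_hits), len(exact_hits))  # 3 2 1
```

Which of the three patterns is right depends on whether longer codes can appear in the data. If every entry starts with its code, the anchored forms are the safer choice.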

Since this was originally tagged pandas, here is a pandas solution:

import pandas as pd

s = pd.Series(trade_data)
s[s.str.contains(r'^\d{10}')]

3    8401200000 Isotopic Separation Machinery, Appa...
dtype: object

Answer 2

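You can do it without regular expressions by counting the digit characters in each string. Note that this counts every digit anywhere in the string, not only a consecutive run: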

trade_data = ['84 Nuclear Reactor',
  '8401 Nuclear Reactor:Fuel Elem',
  '840120 Isotopic Separation Machinery',
  '8401200000 Isotopic Separation Machinery, Apparatus And Parts']
# Keep the items whose total number of digit characters is exactly 10
filtered_list = [item for item in trade_data if sum(ch.isdigit() for ch in item) == 10]
print(filtered_list)  # prints ['8401200000 Isotopic Separation Machinery, Apparatus And Parts']
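
The digit-counting approach and a "leading code" check can disagree when digits are spread across the string. The sketch below illustrates this on hypothetical rows that are not from the question. The helper names count_digits and has_leading_code are made up for this example, and the second one assumes the code is always the first whitespace-separated token.

```python
def count_digits(text):
    """Count every digit character in text, wherever it appears."""
    return sum(ch.isdigit() for ch in text)

def has_leading_code(text, n):
    """True if the first whitespace-separated token is exactly n digits.

    Assumption: the code, when present, is the first token of the string.
    """
    parts = text.split(maxsplit=1)
    return bool(parts) and len(parts[0]) == n and parts[0].isdigit()

# Hypothetical rows (not from the question)
rows = [
    "8401200000 Isotopic Separation Machinery",  # one 10-digit code
    "840120 Machinery, model 2019",              # 6 + 4 = 10 digits in total
]

by_total = [r for r in rows if count_digits(r) == 10]
by_leading = [r for r in rows if has_leading_code(r, 10)]
print(len(by_total), len(by_leading))  # 2 1
```

For the list in the question both checks give the same result. They differ only when descriptions contain extra digits.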

2 answers

solution1
4 ACCEPTED 2019-01-02 17:44:00

solution2
0 2019-01-02 18:40:13
