Remove non-numeric characters including numbers that form a URL

Question

I have a string that consists of a set of numbers and a URL. I need all the numeric characters except those that are part of the URL. The code below removes all non-numeric characters, but it does not remove the digits that belong to the URL.

import re

test = '4758 11b98https://www.website11/111'
re.sub(r"[^0-9]", "", test)  # returns '4758119811111': the URL digits are kept

Expected result: 47581198

Answer 1

original answer

Change strategy: it is much easier to keep only the leading digits and ignore the rest.

import re
test = '47581198https://www.website11/111'
re.findall(r'^\d+', test)[0]

Or, using match, if the leading digits are not guaranteed to be present:

m = re.match(r'\d+', test)
if m:
    m = m.group()

Output: '47581198'

Edit after question change

If you're sure that the string 'http://' or 'https://' cannot appear in your initial number, you need two passes: one to remove the URL, and another to clean the number.

test = '4758 11b98https://www.website11/1111'
re.sub(r'\D', '', re.sub(r'https?://.*', '', test))

Output: '47581198'

Answer 2

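Try the expression below. It keeps only the digit runs that are followed, somewhere later in the string, by "http":

import re

test = '4758 11b98https://www.website11/111'
y = re.compile(r'([0-9]+)(?=.*http)')
tokens = y.findall(test)
print(''.join(tokens))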
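
Since the lookahead only succeeds when "http" appears later in the string, it may be worth checking what happens when there is no URL at all. The sketch below wraps the same pattern in a helper; the name digits_before_url is hypothetical and chosen only for illustration.

```python
import re

# Same pattern as above: a run of digits that has "http" somewhere after it.
pattern = re.compile(r'([0-9]+)(?=.*http)')

def digits_before_url(text):
    """Join every digit run that is followed, anywhere later, by 'http'.

    Hypothetical helper, not part of the original answer.
    """
    return ''.join(pattern.findall(text))

print(digits_before_url('4758 11b98https://www.website11/111'))  # 47581198
print(digits_before_url('4758 11b98'))  # prints an empty line: no "http" to look ahead to
```

So this approach assumes the input always contains a URL; if it might not, one of the other answers is probably a safer fit.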

Answer 3

You could match the whole URL first (anything starting with http:// or https://), so that the digits inside it are consumed without being captured, and use an alternation | to capture all other digits in group 1.

Then join all the group 1 matches with an empty string. URL matches contribute an empty group 1, so they disappear in the join.

https?://\S+|(\d+)


For example

import re

pattern = r"https?://\S+|(\d+)"
s = "4758 11b98https://www.website11/111"

print(''.join(re.findall(pattern, s)))

Output

47581198
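
If the string can contain text after the URL, or more than one URL, the same pattern should still work, because \S+ stops at the first whitespace. A minimal sketch, assuming URLs never contain spaces; the names URL_OR_DIGITS and digits_outside_urls are made up for this example.

```python
import re

# First alternative swallows a whole URL; second captures any other digit run.
URL_OR_DIGITS = re.compile(r"https?://\S+|(\d+)")

def digits_outside_urls(text):
    """Return all digits in text except those inside http(s) URLs.

    Hypothetical helper: findall yields group 1 for each match, which is
    '' whenever the URL alternative matched, so joining drops the URLs.
    """
    return ''.join(URL_OR_DIGITS.findall(text))

print(digits_outside_urls("4758 11b98https://www.website11/111"))  # 47581198
print(digits_outside_urls("12 http://a1.example/2 34"))            # 1234
```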
