So I have a line,
unicomp6.unicomp.net - - [01/Jul/1995:00:00:14 -0400] "GET /images/NASA-logosmall.gif HTTP/1.0" 200 786
I want to store everything after HTTP/1.0" (so those two numbers) in a list. How would I do this using regex? I have read the docs on regular expressions, but they confuse me a bit.
You can use regex101 to construct and test regular expressions that suit your needs.
For your particular example, the following RE would work:
HTTP\/1.0.(.*$)
Explanation:
Capture in a group everything after HTTP/1.0" (the unescaped . after 1.0 matches the closing quote).
Gives output:
` 200 786`
import re
text = 'unicomp6.unicomp.net - - [01/Jul/1995:00:00:14 -0400] "GET /images/NASA-logosmall.gif HTTP/1.0" 200 786'
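# The parentheses create the capture group that match.groups() reads below;
# without them groups() is empty and indexing [0] raises IndexError.
regex = r'HTTP/1\.0"(.*)$'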
match = re.search(regex, text)
list_with_numbers = match.groups()[0].split()
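If you want the regex to pick out the two numbers directly instead of splitting afterwards, something like the following might work. This is only a sketch: the name `TAIL_RE` and the helper `status_and_size` are made up for illustration, and the pattern assumes both trailing fields are plain digit runs.

```python
import re

# Hypothetical pattern: after HTTP/1.0" expect two digit runs at the end of the line.
TAIL_RE = re.compile(r'HTTP/1\.0"\s+(\d+)\s+(\d+)\s*$')

def status_and_size(line):
    """Return the two trailing numbers as ints, or None if the line does not match."""
    match = TAIL_RE.search(line)
    if match is None:
        return None
    return [int(group) for group in match.groups()]

text = 'unicomp6.unicomp.net - - [01/Jul/1995:00:00:14 -0400] "GET /images/NASA-logosmall.gif HTTP/1.0" 200 786'
print(status_and_size(text))
```

Checking for `None` before calling `.groups()` avoids an AttributeError on lines that don't have the expected shape.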
You don't need regex for this; you can use built-in str methods. E.g.,
s = 'unicomp6.unicomp.net - - [01/Jul/1995:00:00:14 -0400] "GET /images/NASA-logosmall.gif HTTP/1.0" 200 786'
data = s.partition('HTTP/1.0" ')
nums = data[2].split()
print(nums)
output
['200', '786']
You could also use .split() instead of .partition(), but I think .partition() is more natural here. Note that the numbers stored in nums are strings, so you'll need to add a conversion step if you need to do arithmetic with them.
Here's an example using .split() instead of .partition() that converts the number strings to integers.
data = s.split('HTTP/1.0"')
nums = [int(u) for u in data[1].split()]
print(nums)
output
[200, 786]
Do you have to use a regular expression? If not, you could do this:
>>> lines = ['unicomp6.unicomp.net - - [01/Jul/1995:00:00:14 -0400] "GET /images/NASA-logosmall.gif HTTP/1.0" 200 786']
>>>
>>> numbers = [line.split()[-2:] for line in lines]
>>> numbers
[['200', '786']]
>>>
This assumes that "the last two whitespace-delimited strings" is equivalent to what you want.
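If that assumption might not hold for every line, you could check it before converting. A rough sketch, where `last_two_numbers` is just an illustrative name and "both fields are all digits" is my assumption about what counts as valid:

```python
def last_two_numbers(line):
    """Return the last two whitespace-delimited fields as ints, or None if they are not both digits."""
    # rsplit with maxsplit=2 only splits at the last two runs of whitespace.
    fields = line.rsplit(None, 2)[-2:]
    if len(fields) == 2 and all(field.isdigit() for field in fields):
        return [int(field) for field in fields]
    return None

lines = ['unicomp6.unicomp.net - - [01/Jul/1995:00:00:14 -0400] "GET /images/NASA-logosmall.gif HTTP/1.0" 200 786']
numbers = [last_two_numbers(line) for line in lines]
print(numbers)
```

Lines whose last two fields aren't both numeric come back as `None`, so you can filter them out rather than hitting a ValueError in `int()`.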