
Storing everything after a certain word in a line in a list (regex)

So I have a line,

unicomp6.unicomp.net - - [01/Jul/1995:00:00:14 -0400] "GET /images/NASA-logosmall.gif HTTP/1.0" 200 786

I want to store everything after HTTP/1.0" (so those two numbers) in a list. How would I do this using regex? I have read the docs on regular expressions, but they confuse me a bit.

You can use regex101 to construct regular expressions that suit your needs.

For your particular example, the following RE would work:

HTTP\/1\.0"(.*$)

Explanation:

Capture in a group everything after HTTP/1.0" up to the end of the line.

Gives output:

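` 200 786`

import re

text = 'unicomp6.unicomp.net - - [01/Jul/1995:00:00:14 -0400] "GET /images/NASA-logosmall.gif HTTP/1.0" 200 786'
# The parentheses form a capture group holding everything after HTTP/1.0"
regex = r'HTTP/1\.0"(.*)$'
match = re.search(regex, text)
# group(1) is ' 200 786'; split() turns it into ['200', '786']
list_with_numbers = match.group(1).split()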
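
Note that re.search returns None when a line does not contain the marker, so calling a method on the result directly can fail on malformed lines. Here is a minimal sketch of a stricter variant that captures the two numbers as separate groups and converts them to integers. The function name numbers_after_http is hypothetical, and the pattern assumes the line ends with exactly two integers after HTTP/1.0":

```python
import re

# Assumption: the line ends with HTTP/1.0" followed by exactly two integers.
PATTERN = re.compile(r'HTTP/1\.0" (\d+) (\d+)$')

def numbers_after_http(line):
    """Return the two numbers after HTTP/1.0" as ints, or [] if the line does not match."""
    match = PATTERN.search(line)
    if match is None:
        return []
    return [int(g) for g in match.groups()]

text = 'unicomp6.unicomp.net - - [01/Jul/1995:00:00:14 -0400] "GET /images/NASA-logosmall.gif HTTP/1.0" 200 786'
print(numbers_after_http(text))              # [200, 786]
print(numbers_after_http('no marker here'))  # []
```

Returning an empty list for non-matching lines is just one possible choice; raising an exception may suit you better if every line is expected to match.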

You don't need regex for this; you can use built-in str methods. For example:

s = 'unicomp6.unicomp.net - - [01/Jul/1995:00:00:14 -0400] "GET /images/NASA-logosmall.gif HTTP/1.0" 200 786'
data = s.partition('HTTP/1.0" ')
nums = data[2].split()
print(nums)

output

['200', '786']

You could also use .split() instead of .partition(), but I think .partition() is more natural here. Note that the numbers stored in nums are strings, so you'll need to add a conversion step if you need to do arithmetic with them.

Here's an example using .split() instead of .partition() that converts the number strings to integers.

data = s.split('HTTP/1.0"')
nums = [int(u) for u in data[1].split()]
print(nums)

output

[200, 786]
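
One difference between the two methods shows up when the marker is missing: data[1] raises an IndexError after .split(), whereas .partition() always returns three strings, with an empty separator if nothing was found. A minimal sketch that uses this to skip malformed lines (the helper name nums_after is hypothetical, not part of the answer above):

```python
def nums_after(line, marker='HTTP/1.0"'):
    """Return the integers that follow marker, or [] when marker is not in line."""
    _, sep, after = line.partition(marker)
    if not sep:  # partition() yields an empty separator when marker is absent
        return []
    return [int(u) for u in after.split()]

s = 'unicomp6.unicomp.net - - [01/Jul/1995:00:00:14 -0400] "GET /images/NASA-logosmall.gif HTTP/1.0" 200 786'
print(nums_after(s))                 # [200, 786]
print(nums_after('malformed line'))  # []
```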

Do you have to use a regular expression? If not, you could do this:

>>> lines = ['unicomp6.unicomp.net - - [01/Jul/1995:00:00:14 -0400] "GET /images/NASA-logosmall.gif HTTP/1.0" 200 786']
>>> 
>>> numbers = [line.split()[-2:] for line in lines]
>>> numbers
[['200', '786']]
>>> 

This assumes that "the last two whitespace-delimited strings" is equivalent to what you want.
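
If you go this route and also want integers, here is a rough sketch under the same assumption. It uses rsplit with maxsplit=2 so only the tail of each line is split. Access logs in this format can record a missing byte count as "-", so the sketch maps that to None. The helper name last_two_fields and the second sample line are made up for illustration:

```python
def last_two_fields(line):
    """Return the last two whitespace-delimited fields as ints; '-' becomes None."""
    fields = line.rsplit(maxsplit=2)[-2:]
    return [None if f == '-' else int(f) for f in fields]

lines = [
    'unicomp6.unicomp.net - - [01/Jul/1995:00:00:14 -0400] "GET /images/NASA-logosmall.gif HTTP/1.0" 200 786',
    # Hypothetical sample line with no byte count
    'example.host - - [01/Jul/1995:00:00:15 -0400] "GET /index.html HTTP/1.0" 304 -',
]
numbers = [last_two_fields(line) for line in lines]
print(numbers)  # [[200, 786], [304, None]]
```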

