
Python String Extraction from text file

I have written a Python script that calls a server and fetches the response. The request body must include a few values, which are supposed to be read from a text file. A sample of the text file is given below.

My text file sample:


Host: localhost:8080
Connection: keep-alive
.....
.....
{"token":"abcdefhutryskslkslksslslks=="}POST /fill/entry/login HTTP/1.1

Host: localhost:8080
Connection: keep-alive
.....
.....
{"value":"abcdefghijklmnopqrstuvwxyz",
 "pass":"123456789zxcvbnmljhgfds",
 "token":"abcdefghijklmnopqrstuvwxyz=="}POST /fill/health HTTP/1.1

As you can see, the file contains different responses. I need to capture the string that starts with {"value" and ends with "} (the second part of the sample above).

Searching Stack Overflow, I found questions where a string is extracted, but they all have a definite start point and a definite end point. In my case, the start point can be identified uniquely using the search string {"value", but the end point cannot, because the text file contains several other braces as well.

Any suggestions or pointers on fetching this specific part of the string from the text file (as described above) would be really helpful.

An example using the re module, from the interpreter:

>>> with open('file') as f:
...    raw = f.read()
>>> 
>>> import re
>>> pat = re.compile(r'{"value":[^{]+}')
>>> pat.findall(raw)
['{"value":"abcdefghijklmnopqrstuvwxyz",\n "pass":"123456789zxcvbnmljhgfds",\n "token":"abcdefghijklmnopqrstuvwxyz=="}']
>>> pat.search(raw).group()
'{"value":"abcdefghijklmnopqrstuvwxyz",\n "pass":"123456789zxcvbnmljhgfds",\n "token":"abcdefghijklmnopqrstuvwxyz=="}'
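The pattern above relies on the wanted body containing no nested { and on no stray } appearing after it before the next {. If either assumption fails, one alternative (not part of the answer above) is to let the standard json module decide where each object ends, using json.JSONDecoder.raw_decode. This is a minimal sketch; the function name extract_json_objects and the inline sample string are made up for illustration:

```python
import json


def extract_json_objects(raw, required_key="value"):
    """Return every JSON object found in raw that has required_key at top level."""
    decoder = json.JSONDecoder()
    found = []
    pos = raw.find("{")
    while pos != -1:
        try:
            # raw_decode parses one JSON value starting at pos and reports
            # the index just past it, so nested braces are handled for us.
            obj, end = decoder.raw_decode(raw, pos)
        except json.JSONDecodeError:
            # This brace does not start valid JSON; move on to the next one.
            pos = raw.find("{", pos + 1)
            continue
        if isinstance(obj, dict) and required_key in obj:
            found.append(obj)
        pos = raw.find("{", end)
    return found


# Hypothetical stand-in for the text file shown in the question.
sample = (
    'Host: localhost:8080\n'
    'Connection: keep-alive\n'
    '{"token":"abcdefhutryskslkslksslslks=="}POST /fill/entry/login HTTP/1.1\n'
    '\n'
    'Host: localhost:8080\n'
    'Connection: keep-alive\n'
    '{"value":"abcdefghijklmnopqrstuvwxyz",\n'
    ' "pass":"123456789zxcvbnmljhgfds",\n'
    ' "token":"abcdefghijklmnopqrstuvwxyz=="}POST /fill/health HTTP/1.1\n'
)

matches = extract_json_objects(sample)
print(matches[0]["token"])
```

Because the result is already a dict, the individual values can be passed straight into the request body without further string handling.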

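If the file is not very large, you can read the whole text into a single string with file.read() (file.readlines() would return a list of lines instead), then use the regular expression library to extract the part you need.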
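As a script rather than an interpreter session, that suggestion might look roughly like the sketch below. The helper name read_payload, the constant PAYLOAD_RE and the temporary file are assumptions made so the example runs on its own; in practice you would pass the path of your real text file.

```python
import json
import os
import re
import tempfile

# Start at {"value": and extend to the last closing brace that comes
# before any further opening brace (the character class excludes "{").
PAYLOAD_RE = re.compile(r'\{"value":[^{]+\}')


def read_payload(path):
    """Return the first {"value": ...} body in the file as a dict, or None."""
    with open(path) as f:
        raw = f.read()  # one string; readlines() would give a list of lines
    match = PAYLOAD_RE.search(raw)
    return json.loads(match.group()) if match else None


# Hypothetical stand-in for the text file shown in the question.
sample = (
    'Host: localhost:8080\n'
    '{"token":"abcdefhutryskslkslksslslks=="}POST /fill/entry/login HTTP/1.1\n'
    '\n'
    'Host: localhost:8080\n'
    '{"value":"abcdefghijklmnopqrstuvwxyz",\n'
    ' "pass":"123456789zxcvbnmljhgfds",\n'
    ' "token":"abcdefghijklmnopqrstuvwxyz=="}POST /fill/health HTTP/1.1\n'
)

# Write the sample to a temporary file so the example needs no file on disk.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
payload = read_payload(tmp.name)
os.remove(tmp.name)

print(payload["pass"])
```

Reading the whole file at once keeps the multi-line body intact, which a line-by-line loop would split across iterations.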
