
Python regex .match failing to match in strings returned from a C++ process via subprocess

A subprocess started with Popen sends a string containing the relevant data back over stdout.

 import subprocess
 # Run the evo command and capture everything it writes to stdout and stderr.
 run = subprocess.Popen('evo -x "' + folder+filename+'"', stdout = subprocess.PIPE, stderr = subprocess.PIPE, env={'LANG':'C++'})
 data, error = run.communicate()

The command that prints the relevant information looks like this:

printf "height= %.15f \ntilt = %.15f (%.15f)\ncen_volume= %.15f\nr_volume= %.15f\n", height, abs(sin(tilt*pi/180)*ring_OR), abs(tilt), c_vol, r_vol;

and print data yields the following on the console (along with some extra output that is also in stdout):

Evolver 2.40, May 8, 2072; Windows OpenGL, 32-bit

Converting to all named quantities...Done.
Enter command: QUIET
height=    0.000211813357854 
tilt=0.0 (0.0)
cen_volume= 0.000000000600000
r_volume= 0.000000003000000
bad

I match it with:

dat = re.match(r"[\s]*height\s*=\s*([-]?[\.\d]*)[\s]*tilt\s*=\s*([-]?[\.\d]*)[\s]*\(([\s]*[-]?[\.\d]*)\)[\s]*cen_volume\s*=\s*([-]?[\.\d]*)[\s]*r_volume\s*=\s*([-]?[\.\d]*)[\s]*", data)

However, this returns None. The exact same .match call is used in programs that match the same information after it is read from a txt file instead of stdout, and it works perfectly there. Is there something unusual about the way the string is treated, or is some extra non-visible character added, when it is retrieved via communicate? Note that .findall works fine, for example re.findall("[\\s]*(bad)[\\s]*", data), but trying to match even a single line with .match still fails.

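Two points:

  1. Use re.search instead of re.match. re.match only succeeds if the pattern matches starting at the first character of the string, and here stdout begins with the Evolver banner rather than with "height".
  2. The re.MULTILINE and re.DOTALL flags are optional for this pattern. They only change how ^, $ and . behave, and the pattern uses none of them (\s already matches newlines), so passing them is harmless but not required.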
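
To illustrate the first point, here is a minimal sketch on a shortened copy of the output from the question. The sample string and the one-line pattern are trimmed down for illustration and are not the full original.

```python
import re

# Shortened sample shaped like the output in the question:
# a banner comes first, then the data lines.
sample = (
    "Evolver 2.40\n"
    "Enter command: QUIET\n"
    "height=    0.000211813357854 \n"
    "tilt=0.0 (0.0)\n"
)

pattern = r"height\s*=\s*([-]?[.\d]*)"

# re.match only tries position 0, where the banner sits, so it fails.
match_result = re.match(pattern, sample)

# re.search scans forward until the pattern fits somewhere in the string.
search_result = re.search(pattern, sample)

print(match_result)            # None
print(search_result.group(1))  # 0.000211813357854
```

If the txt files mentioned in the question start directly with the height line, that would explain why the same .match call works there.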

Try:

>>> re.search(r"[\s]*height\s*=\s*([-]?[\.\d]*)[\s]*tilt\s*=\s*([-]?[\.\d]*)[\s]*\(([\s]*[-]?[\.\d]*)\)[\s]*cen_volume\s*=\s*([-]?[\.\d]*)[\s]*r_volume\s*=\s*([-]?[\.\d]*)[\s]*", data, re.MULTILINE | re.DOTALL).groups()
('0.000211813357854', '0.0', '0.0', '0.000000000600000', '0.000000003000000')
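
If the values are needed as numbers rather than strings, the same search could be wrapped in a small helper with named groups. This is only a sketch: parse_evolver_output, field, NUMBER and the group name tilt_paren are made-up names, and the number pattern is stricter than the one in the question so that float() cannot fail on a stray dot.

```python
import re

# Assumption: plain decimals such as 0.000211813357854, optionally negative.
NUMBER = r"-?\d+(?:\.\d*)?"

def field(label):
    """Build the regex for one 'label = number' entry (hypothetical helper)."""
    return r"{0}\s*=\s*(?P<{0}>{1})\s*".format(label, NUMBER)

PATTERN = re.compile(
    field("height")
    + field("tilt")
    + r"\(\s*(?P<tilt_paren>" + NUMBER + r")\s*\)\s*"  # value in parentheses
    + field("cen_volume")
    + field("r_volume")
)

def parse_evolver_output(text):
    """Return the five values as floats, or None if the block is not found."""
    m = PATTERN.search(text)  # search, not match: the banner comes first
    if m is None:
        return None
    return {name: float(value) for name, value in m.groupdict().items()}

sample = (
    "Evolver 2.40, May 8, 2072; Windows OpenGL, 32-bit\n"
    "\n"
    "Converting to all named quantities...Done.\n"
    "Enter command: QUIET\n"
    "height=    0.000211813357854 \n"
    "tilt=0.0 (0.0)\n"
    "cen_volume= 0.000000000600000\n"
    "r_volume= 0.000000003000000\n"
    "bad\n"
)

result = parse_evolver_output(sample)
print(result)
```

Returning None when nothing is found keeps the failure case explicit instead of raising AttributeError on .groups().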

It might be simpler to use findall in this case:

from pprint import pprint
pprint(dict(re.findall(r"(height|tilt|cen_volume|r_volume)\s*=\s*(.*\S)", data)))

Output

{'cen_volume': '0.000000000600000',
 'height': '0.000211813357854',
 'r_volume': '0.000000003000000',
 'tilt': '0.0 (0.0)'}
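
Note that tilt comes back as '0.0 (0.0)' with this pattern. A possible variant, assuming only the first number after each "=" is wanted, captures just that number and converts it to float. The number pattern and the names pairs and values are illustrative choices, not part of the original answer.

```python
import re

data = (
    "Enter command: QUIET\n"
    "height=    0.000211813357854 \n"
    "tilt=0.0 (0.0)\n"
    "cen_volume= 0.000000000600000\n"
    "r_volume= 0.000000003000000\n"
    "bad\n"
)

# Capture the key and only the first decimal number after "=".
pairs = re.findall(
    r"(height|tilt|cen_volume|r_volume)\s*=\s*(-?\d+(?:\.\d*)?)", data
)

# The parenthesized second tilt value is deliberately dropped here.
values = {name: float(number) for name, number in pairs}
print(values)
```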
