
Use Python regex to parse string of floats output by Java Arrays.deepToString

I'm working with someone's Java code where a key data structure is an m x n x p array, float[][][]. I need to get it into Python; currently my approach is to save the array to a text file using Arrays.deepToString and then parse that text file from Python.

I am stuck on how to write a regular expression that will parse the text file. What I can do is find all the floats, including their exponents when they are in scientific notation. I use the following pattern to do so:

float_pat = r'\d\.\d*(?:E-\d+)?'

This works fine to capture floats in scientific notation as they are output by deepToString. Note the values are all positive because they are probabilities. That is, I don't have any issues with how I'm capturing the numbers themselves.

What I cannot do, but would like to do, is have the regex search for any number of floats enclosed in left and right brackets. I tried this:

list_of_floats_pat = r'\[(?:\d\.\d*(?:E-\d+)?), )+\]'

where I'm trying to find one or more occurrences of the float format, each followed by a comma and a space, enclosed in square brackets. But that returns []. I'm not sure what I'm misunderstanding.

Here's an example 1x2x9 array:

[[[0.6453525160688715, 0.15620941152962334, 0.1874313118193626, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 0.01050721017750691, 9.991008092716556E-5], [0.5904776610141782, 0.18175460267577365, 9.991008092716556E-5, 0.22716827582448523, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5]]]

What I would want is for the regex to return two matches:

0.6453525160688715, 0.15620941152962334, 0.1874313118193626, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 0.01050721017750691, 9.991008092716556E-5

and

0.5904776610141782, 0.18175460267577365, 9.991008092716556E-5, 0.22716827582448523, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5

that I can then just parse as strings with strip and split.

I've figured out a workaround where I just find all the bracket indexes. But I'd like to know what I'm not understanding about regexes.

The data that you have is both valid Python and valid JSON:

>>> import ast, json
>>> s = '[[[0.6453525160688715, 0.15620941152962334, 0.1874313118193626, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 0.01050721017750691, 9.991008092716556E-5], [0.5904776610141782, 0.18175460267577365, 9.991008092716556E-5, 0.22716827582448523, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5, 9.991008092716556E-5]]]'
>>> ast.literal_eval(s)
[[[0.6453525160688715, 0.15620941152962334, 0.1874313118193626, 9.991008092716556e-05, 9.991008092716556e-05, 9.991008092716556e-05, 9.991008092716556e-05, 0.01050721017750691, 9.991008092716556e-05], [0.5904776610141782, 0.18175460267577365, 9.991008092716556e-05, 0.22716827582448523, 9.991008092716556e-05, 9.991008092716556e-05, 9.991008092716556e-05, 9.991008092716556e-05, 9.991008092716556e-05]]]
>>> json.loads(s)
[[[0.6453525160688715, 0.15620941152962334, 0.1874313118193626, 9.991008092716556e-05, 9.991008092716556e-05, 9.991008092716556e-05, 9.991008092716556e-05, 0.01050721017750691, 9.991008092716556e-05], [0.5904776610141782, 0.18175460267577365, 9.991008092716556e-05, 0.22716827582448523, 9.991008092716556e-05, 9.991008092716556e-05, 9.991008092716556e-05, 9.991008092716556e-05, 9.991008092716556e-05]]]

You'll be better off parsing with those libraries than trying to do so with regex.
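
As a minimal sketch of that approach, the snippet below parses a short deepToString-style string with json.loads and reports the nesting dimensions. The helper names parse_deep_to_string and shape are hypothetical, and the sample string is a shortened stand-in for the real file contents; it assumes the array is rectangular and contains only ordinary finite floats.

```python
import json


def parse_deep_to_string(text):
    """Parse Arrays.deepToString output into nested Python lists.

    Hypothetical helper: assumes the text holds only brackets, commas,
    and finite floats, which makes it valid JSON.
    """
    return json.loads(text)


def shape(nested):
    """Return the dimensions of a rectangular nested list (hypothetical helper)."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        # Descend into the first element; stop on an empty list.
        nested = nested[0] if nested else None
    return tuple(dims)


# Shortened stand-in for the contents of the saved text file.
sample = "[[[0.5, 9.99E-5], [0.25, 0.75]]]"
arr = parse_deep_to_string(sample)
print(shape(arr))   # dimensions of the nested list
print(arr[0][1])    # second row of the first block
```

The result is plain nested lists, so indexing such as arr[i][j][k] mirrors the Java float[][][] indexing.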

\[(?:\d\.\d*(?:E-\d+)?)(?:, (?:\d\.\d*(?:E-\d+)?))*\]

You can try this. See the demo.

https://regex101.com/r/9GergE/1

The problem with your regex

\[(?:\d\.\d*(?:E-\d+)?), )+\]

was that it expects a comma and a space after every float, but the last float just before \] is not followed by one, so the pattern never matches.
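
To make the difference concrete, here is a small sketch comparing the two patterns with re.findall. Note that the question's pattern as quoted has one closing parenthesis too many, so the sketch uses a balanced reading of it; that reading, the capture group added around the list contents, the variable names, and the shortened sample text are all assumptions made for illustration.

```python
import re

float_pat = r'\d\.\d*(?:E-\d+)?'

# Balanced reading of the question's pattern (assumption): every float
# must be followed by ", ", including the last one before "]".
trailing_comma_pat = r'\[(?:' + float_pat + r', )+\]'

# Corrected pattern: one float, then zero or more ", float" repetitions.
# The extra capture group makes findall return the text between the brackets.
row_pat = r'\[(' + float_pat + r'(?:, ' + float_pat + r')*)\]'

# Shortened stand-in for the deepToString output.
text = '[[[0.645, 9.99E-5], [0.59, 0.18]]]'

no_matches = re.findall(trailing_comma_pat, text)
rows = re.findall(row_pat, text)

# Each match is a comma-separated string that can be split and converted.
values = [[float(x) for x in row.split(', ')] for row in rows]

print(no_matches)
print(rows)
print(values)
```

With the capture group in place, each match is the bracket-free, comma-separated string the question asks for, ready for split.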
