
Unexpected behavior with Python string parsing

I am trying to parse a string and fill an array with specific information contained in the string, but I am getting some unexpected behavior.

I have written a script that successfully does this for some use cases, but it doesn't work for all possible cases.

Consider the string: 'BEST POSITION:P(0) = 1.124 P(1) = 2.345 P(2) = 3.145 P(3) = 4.354'

The following code should create the list: [1.124, 2.345, 3.145, 4.354]

import numpy as np
import re

inputs_best = np.zeros(4)
string_in = 'BEST POSITION:P(0) = 1.124 P(1) = 2.345 P(2) = 3.145 P(3) = 4.354'

best_sols_clean = ''
for item in string_in:
    best_sols_clean += item

best_sols_clean = re.sub('[ \t]', '', best_sols_clean)

count = 0
while best_sols_clean.find('P(') is not -1:
    line_index = best_sols_clean.find('P(')
    try:
        inputs_best[count] = float(best_sols_clean[line_index+5:line_index+10])
        best_sols_clean = best_sols_clean[line_index+10:-1]
        count += 1
    except ValueError:
        inputs_best[count] = float(best_sols_clean[line_index+5:line_index+6])
        best_sols_clean = best_sols_clean[line_index+6:-1]
        count += 1

print(inputs_best)

The output of this script is:

[1.124 2.345 3.145 4. ]

This works for this string, except that the last entry is cut short: it comes out as 4. instead of 4.354.

The except clause is meant to handle the case where one or more of the values are integers, such as:

string_in = 'BEST POSITION:P(0) = 1 P(1) = 2.345 P(2) = 3.145 P(3) = 4'

which results in an error.

I believe the problem lies with the line best_sols_clean = best_sols_clean[line_index+10:-1] that for some reason throws away trailing digits of the string, even though I am slicing to the last element of the string.

For the string string_in = 'BEST POSITION:P(0) = 1 P(1) = 2.345 P(2) = 3.145 P(3) = 4' the program quits with the error

Traceback (most recent call last):
  File "test.py", line 17, in <module>
    inputs_best[count] = float(best_sols_clean[line_index+5:line_index+10])
ValueError: could not convert string to float: 

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test.py", line 21, in <module>
    inputs_best[count] = float(best_sols_clean[line_index+5:line_index+6])
ValueError: could not convert string to float:

I would also be open to a more elegant solution than what I am attempting.

You are hard-coding the character offsets, which makes the code fragile and hard to debug. You probably have an issue with your indices, but it may not be worth digging into. Why not just split the string on spaces and collect every token that parses as a number into a list? For example:

string_in = 'BEST POSITION:P(0) = 1.124 P(1) = 2.345 P(2) = 3.145 P(3) = 4.354'
numbers = []
for x in string_in.split(' '):
    try:
        # Append float-able strings to the list
        numbers.append(float(x))
    except ValueError:
        # Skip only on ValueError; do not use a bare except, so any other error still surfaces
        pass
# Produces: [1.124, 2.345, 3.145, 4.354]

If you input string_in = 'BEST POSITION:P(0) = 1 P(1) = 2.345 P(2) = 3.145 P(3) = 4', this returns [1.0, 2.345, 3.145, 4.0]. Is that good for your purposes?
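One caveat with splitting on a single space: the question's code strips both spaces and tabs, which suggests the real input may contain tabs or repeated spaces. A minimal sketch under that assumption uses `str.split()` with no argument, which splits on any run of whitespace. The function name `extract_floats` is made up for illustration.

```python
def extract_floats(text):
    """Return every whitespace-separated token of `text` that parses as a float."""
    values = []
    for token in text.split():  # no argument: split on any run of spaces or tabs
        try:
            values.append(float(token))
        except ValueError:
            # Not a number (e.g. 'BEST', 'P(1)', '='); skip it
            continue
    return values


print(extract_floats('BEST POSITION:P(0) = 1.124\tP(1) = 2.345  P(2) = 3.145 P(3) = 4'))
# [1.124, 2.345, 3.145, 4.0]
```

Note that `float()` also accepts tokens such as 'nan' and 'inf', so this only suits input where such words cannot appear on their own.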

It looks like your problem is with this line

 best_sols_clean = best_sols_clean[line_index+10:-1]

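A slice that ends at -1 stops just before the last character, so each time you run through the loop you take one character off the end of the string. Try changing it to this (the line_index+6 slice in the except branch needs the same change):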

 best_sols_clean = best_sols_clean[line_index+10:]
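To make the effect of the slice end concrete, here is a sketch that keeps the question's fixed-width approach but drops the `-1` from both slices and compares with `!=` rather than `is not`. The function name `parse_positions` is hypothetical, and the code still assumes single-digit indices and values that are either five characters or one character wide.

```python
import re

# A slice ending at -1 excludes the last character:
s = 'P(3)=4.354'
print(s[5:-1])  # 4.35
print(s[5:])    # 4.354


def parse_positions(text):
    """The question's loop with open-ended slices and a != comparison."""
    cleaned = re.sub(r'[ \t]', '', text)
    values = []
    while cleaned.find('P(') != -1:
        i = cleaned.find('P(')
        try:
            # Assume a five-character value such as '1.124'
            values.append(float(cleaned[i + 5:i + 10]))
            cleaned = cleaned[i + 10:]  # open-ended: keep the final character
        except ValueError:
            # Fall back to a one-character integer such as '1'
            values.append(float(cleaned[i + 5:i + 6]))
            cleaned = cleaned[i + 6:]
    return values


print(parse_positions('BEST POSITION:P(0) = 1 P(1) = 2.345 P(2) = 3.145 P(3) = 4'))
# [1.0, 2.345, 3.145, 4.0]
```

This returns a plain list instead of filling a preallocated NumPy array, so the number of values does not need to be known up front.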

This will output every run of digits and dots in the string that is not immediately preceded by an opening parenthesis, which skips the single-digit indices inside P(...):

import re
re.findall(r'[^(]([\d.]+)', string_in)

Example:

import re

string_in = 'BEST POSITION:P(0) = 1.124 P(1) = 2.345 P(2) = 3.145 P(3) = 4.354'
print(re.findall(r'[^(]([\d.]+)', string_in))
# ['1.124', '2.345', '3.145', '4.354']

string_in = 'BEST POSITION:P(0) = 1 P(1) = 2.345 P(2) = 3.145 P(3) = 4'
print(re.findall(r'[^(]([\d.]+)', string_in))
# ['1', '2.345', '3.145', '4']
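The pattern above returns strings, drops any minus sign, and would pick up part of a two-digit index (for 'P(10) = 5.0' it yields '0' as well as '5.0'). If those cases matter, one possible variant anchors on the `P(n) =` prefix and converts to float. This is only a sketch: the names `VALUE_RE` and `parse_best_position` are made up here, and it assumes plain decimal values with no scientific notation.

```python
import re

# Assumed layout: "P(<index>) = <number>", with optional whitespace around "=".
VALUE_RE = re.compile(r'P\(\d+\)\s*=\s*([-+]?\d+(?:\.\d+)?)')


def parse_best_position(text):
    """Return the value following each 'P(n) =' in `text` as a float."""
    return [float(value) for value in VALUE_RE.findall(text)]


print(parse_best_position('BEST POSITION:P(0) = 1 P(1) = 2.345 P(2) = 3.145 P(3) = 4'))
# [1.0, 2.345, 3.145, 4.0]
print(parse_best_position('BEST POSITION:P(10) = -0.5 P(11) = 3'))
# [-0.5, 3.0]
```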
