
Find string between two substrings, in a stream of data

I have this continuous serial data stream:

----------------------------------------
 
SENSOR COORDINATE         = 0
 
MEASURED RESISTANCE       = 3.70 kOhm
 
----------------------------------------
 
----------------------------------------
 
SENSOR COORDINATE         = 1
 
MEASURED RESISTANCE       = 3.70 kOhm
 
----------------------------------------
 
----------------------------------------
 
SENSOR COORDINATE         = 2
 
MEASURED RESISTANCE       = 3.69 kOhm
 
----------------------------------------

For each iteration, I want to grab two values: the sensor coordinate and the resistance.

I found solutions using .split() and regular expressions (see "Find string between two substrings"), but in my case there is not a single string to filter; it is a continuous stream.

For example, .split() will find my string, but it splits the stream in two at that point. That does not work repeatedly on a continuous stream.

NOTE: After the sensor coordinate value, I have a carriage return character.

EDIT 1/3: This is the snippet of code that grabs the serial data:

def readSerial():
    global after_id
    while ser.in_waiting:
        try:
            ser_bytes = ser.readline() #read data from the serial line
            ser_bytes = ser_bytes.decode("utf-8")
            text.insert("end", ser_bytes)
        except UnicodeDecodeError:
            print("UnicodeDecodeError")
    else:
        print("No data received")
    after_id=root.after(50,readSerial)

In case anyone wants to know, this is the C code on the Arduino side that sends the data:

Serial.println("----------------------------------------");
Serial.print("SENSOR COORDINATE         = ");
Serial.println(sensor_coord);
Serial.print("MEASURED RESISTANCE       = ");
double resistanse = ((period * GAIN_VALUE * 1000) / (4 * CAPACITOR_VALUE)) - R_BIAS_VALUE;
Serial.print(resistanse);
Serial.println(" kOhm");

EDIT 2/3: This is a previous approach:

def readSerial():
        global after_id
        while ser.in_waiting:
            try:
                ser_bytes = ser.readline() #read data from the serial line
                ser_bytes = ser_bytes.decode("utf-8")
                text.insert("end", ser_bytes)
                result = re.search.(, ser_bytes)
                print(result)
            except UnicodeDecodeError:
                print("UnicodeDecodeError")
        else:
            print("No data received")
        after_id=root.after(50,readSerial)

In another attempt, I changed the line result = re.search.(, ser_bytes) to result = ser_bytes.split("TE = ").

This is a picture of the data I receive (this is a tkinter text frame). [image]

EDIT 3/3: This is my code implementing dracarys' algorithm:

def readSerial():
    global after_id
    while ser.in_waiting:
        try:
            ser_bytes = ser.readline() 
            print(ser_bytes)
            ser_bytes = ser_bytes.decode("utf-8")
            print(ser_bytes)
            text.insert("end", ser_bytes)
           
            if "SENSOR COORDINATE" in ser_bytes:
               found_coordinate = True
               coordinate = int(ser_bytes.split("=")[1].strip())
               print("Coordinate",coordinate)
            if "MEASURED RESISTANCE" in ser_bytes and found_coordinate:
               found_coordinate = False
               resistance = float(ser_bytes.split("=")[1].split("kOhm")[0].strip())
               print("Resistance",resistance)
        
        except UnicodeDecodeError:
            print("UnicodeDecodeError")
    else:
        print("No data received")
    after_id=root.after(50,readSerial)

This is the error I get after the code runs successfully for about ten seconds (I have included the normal output as well for reference):

No data received
b'SENSOR COORDINATE         = 2\r\n'
SENSOR COORDINATE         = 2

Coordinate 2
b'MEASURED RESISTANCE       = 3.67 kOhm\r\n'
MEASURED RESISTANCE       = 3.67 kOhm

Resistance 3.67
b'----------------------------------------\r\n'
----------------------------------------

b'----------------------------------------\r\n'
----------------------------------------

b'SENSOR COORDINATE         = 3\r\n'
SENSOR COORDINATE         = 3

Coordinate 3
No data received
b'MEASURED RESISTANCE       = 3.78 kOhm\r\n'
MEASURED RESISTANCE       = 3.78 kOhm

Exception in Tkinter callback
Traceback (most recent call last):
  File "C:\Users\User1\AppData\Local\Programs\Python\Python38-32\lib\tkinter\__init__.py", line 1883, in __call__
    return self.func(*args)
  File "C:\Users\User1\AppData\Local\Programs\Python\Python38-32\lib\tkinter\__init__.py", line 804, in callit
    func(*args)
  File "tkinterWithPortsExperiment.py", line 73, in readSerial
    if "MEASURED RESISTANCE" in ser_bytes and found_coordinate:
UnboundLocalError: local variable 'found_coordinate' referenced before assignment

As I said in my comments, I feel the Arduino output should be simplified. As @oliver_t said, a one line JSON for each sensor event would be perfect.

If you can't do that, here is the code to parse this.

As I have no way of receiving your serial output line by line, I simulated it by storing the output in a text file and reading that file line by line. I hope this helps, as your question is about how to parse the input.

found_coordinate = False  # True once a coordinate line has been seen

# stream.txt holds a copy of the serial output, one message per line
with open('stream.txt', 'r') as f:
    for line in f:
        if "SENSOR COORDINATE" in line:
            found_coordinate = True
            coordinate = int(line.split("=")[1].strip())
            print("Coordinate", coordinate)

        if "MEASURED RESISTANCE" in line and found_coordinate:
            found_coordinate = False
            resistance = float(line.split("=")[1].split("kOhm")[0].strip())
            print("Resistance", resistance)

I hope this helps. If I have misunderstood your requirement, let me know so I can fix my code.

Note: you might not actually need .strip(), since converting to an int or float ignores surrounding whitespace; I have kept it as a sanity check.
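
A quick check of that note, as a minimal sketch using two sample lines from the question: int() and float() ignore surrounding whitespace, including the trailing "\r\n" that readline() keeps, so the conversions below work without .strip(). The variable names are only for illustration.

```python
# Lines as they arrive from the serial port, including the trailing "\r\n"
coord_line = "SENSOR COORDINATE         = 2\r\n"
res_line = "MEASURED RESISTANCE       = 3.67 kOhm\r\n"

# int() tolerates leading/trailing whitespace, so .strip() is optional here
coordinate = int(coord_line.split("=")[1])

# The unit still has to be cut off by hand; float() then ignores the spaces
resistance = float(res_line.split("=")[1].split("kOhm")[0])

print(coordinate, resistance)
```

Keeping .strip() does no harm, and it makes the intent clearer to the next reader.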

So here is your readSerial function:

def readSerial():
    global after_id
    while ser.in_waiting:
        try:
            ser_bytes = ser.readline() 
            print(ser_bytes)
            ser_bytes = ser_bytes.decode("utf-8")
            print(ser_bytes)
            text.insert("end", ser_bytes)
           
            if "SENSOR COORDINATE" in ser_bytes:
               found_coordinate = True
               coordinate = int(ser_bytes.split("=")[1].strip())
               print("Coordinate",coordinate)
            if "MEASURED RESISTANCE" in ser_bytes and found_coordinate:
               found_coordinate = False
               resistance = float(ser_bytes.split("=")[1].split("kOhm")[0].strip())
               print("Resistance",resistance)
        
        except UnicodeDecodeError:
            print("UnicodeDecodeError")
    else:
        print("No data received")
    after_id=root.after(50,readSerial)

As you can see, found_coordinate is assigned inside each of the two if blocks. But let's see where you read the found_coordinate variable:

            if "MEASURED RESISTANCE" in ser_bytes and found_coordinate:

That is the only place where you read the found_coordinate variable, and it is also where the error occurred. Now consider this: if the

            if "SENSOR COORDINATE" in ser_bytes:

never evaluated to True, then the found_coordinate = True line was never reached, meaning found_coordinate was never defined. Yes, there is another line where you assign it, but it can only be executed under this condition:

            if "MEASURED RESISTANCE" in ser_bytes and found_coordinate:

which again reads found_coordinate before it has been defined, causing the error. You might be wondering: how did it run successfully for 10 seconds with no error?

found_coordinate is a local variable, so it exists only for the duration of one readSerial call. During those 10 seconds, each "SENSOR COORDINATE" line and its "MEASURED RESISTANCE" line were read within the same call, so found_coordinate had already been assigned when and found_coordinate was evaluated. (On lines that do not contain "MEASURED RESISTANCE", Python short-circuits and never evaluates found_coordinate at all.) The error occurred when a call ended right after a coordinate line; that is the "No data received" between "Coordinate 3" and the resistance line in your output. The next call started with a "MEASURED RESISTANCE" line, and in that fresh call found_coordinate had not been assigned yet.
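
The failure can be reproduced without a serial port. The following is only a sketch: handle_lines is a hypothetical stand-in for one readSerial() call, and it assumes the coordinate line and the resistance line end up in two different calls, as the output in the question suggests.

```python
def handle_lines(lines):
    """Hypothetical stand-in for one readSerial() call."""
    for line in lines:
        if "SENSOR COORDINATE" in line:
            found_coordinate = True  # local: exists only during this call
        if "MEASURED RESISTANCE" in line and found_coordinate:
            found_coordinate = False
    return "ok"


# Both lines arrive within the same call: the flag is assigned before it is read.
first = handle_lines(["SENSOR COORDINATE = 2", "MEASURED RESISTANCE = 3.67 kOhm"])

# The pair is split over two calls: the second call reads a flag it never assigned.
handle_lines(["SENSOR COORDINATE = 3"])
try:
    handle_lines(["MEASURED RESISTANCE = 3.78 kOhm"])
    second = "ok"
except UnboundLocalError as exc:
    second = f"UnboundLocalError: {exc}"

print(first)
print(second)
```

The exact wording of the error message differs between Python versions, but the exception type is the same.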

If you can change your Arduino code, you may be able to use the json.loads() function in Python to turn each line into something more manageable.

I'm not up on Arduinos (despite having one sitting within arm's reach, box unopened, for the best part of two years...), so the following might be closer to pseudo-code than actual code:

// This should (fingers crossed) build and send a separate message for each variable.
// It could relatively easily be combined into one message.

double resistanse = ((period * GAIN_VALUE * 1000) / (4 * CAPACITOR_VALUE)) - R_BIAS_VALUE;

//////////////////////////////////////////////////////////
// If you want to send a separate message for each variable
String sSensor = "{\"sensorcoord\":";
String sResistance = "{\"resistance\":";
String sEnd = "}";

String sensor_output = sSensor + sensor_coord + sEnd;
Serial.println(sensor_output);
// output will look like {"sensorcoord":1}

String resistance_output = sResistance + resistanse + sEnd;
Serial.println(resistance_output);
// output will look like {"resistance":3.70}

//////////////////////////////////////////////////////////
// If you want to send one message holding both variables
// (use this instead of the block above, not in addition to it)

String sSensor = "{\"sensorcoord\":";
String sResistance = ",\"resistance\":";
String sEnd = "}";

String combined_output = sSensor + sensor_coord + sResistance + resistanse + sEnd;
Serial.println(combined_output);
// output will look like {"sensorcoord":1,"resistance":3.70}

Once you get the string into Python, you can then use json.loads() to take a (properly formatted) text string and turn it into an object that you can access more easily:

import json

# Example of one line as it might arrive from the Arduino (combined message)
textstringFromArduino = '{"sensorcoord":1,"resistance":3.70}'

data = json.loads(textstringFromArduino)

# If you sent each value separately, you now need to work out which value you are receiving.
for key, value in data.items():
    if key == "sensorcoord":
        print(value)
    elif key == "resistance":
        print(value)


# If you sent both values in one message, it's a lot easier...

print(data['sensorcoord'])
print(data['resistance'])
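
On a real serial link, a line can arrive truncated or garbled, and json.loads() then raises json.JSONDecodeError. A minimal sketch of guarding against that follows; parse_json_line is a hypothetical helper name and the sample strings are made up for illustration.

```python
import json


def parse_json_line(raw_line):
    """Hypothetical helper: return the decoded dict, or None if the line is unusable."""
    try:
        data = json.loads(raw_line)
    except json.JSONDecodeError:
        return None  # truncated or corrupted line; skip it
    # Only accept JSON objects, since the code above indexes by key
    return data if isinstance(data, dict) else None


good = parse_json_line('{"sensorcoord":1,"resistance":3.70}\r\n')
bad = parse_json_line('{"sensorcoord":1,"resis')
print(good, bad)
```

Trailing whitespace such as "\r\n" is accepted by json.loads(), so the line does not need to be stripped first.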


You were almost there with your attempt. The UnboundLocalError happens because the variable found_coordinate isn't defined in your function if the line is a resistance line. You should define that as a global variable too, because you need to keep track of it over multiple function calls. I'm intrigued that the first set of coordinate/resistance worked. So do

global after_id, found_coordinate

at the beginning of your function.
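
Note that the global statement alone does not create the variable; it also needs an initial value at module level, otherwise a resistance line arriving first raises a NameError. A sketch of the idea without the serial port or tkinter, where handle_line is a hypothetical stand-in for the parsing part of readSerial():

```python
found_coordinate = False  # module level, so the value survives between calls


def handle_line(line):
    """Hypothetical stand-in for the parsing part of readSerial()."""
    global found_coordinate
    if "SENSOR COORDINATE" in line:
        found_coordinate = True
        return ("coordinate", int(line.split("=")[1]))
    if "MEASURED RESISTANCE" in line and found_coordinate:
        found_coordinate = False
        return ("resistance", float(line.split("=")[1].split("kOhm")[0]))
    return None  # separator line, or a resistance without a coordinate


# Each call below mimics a separate readSerial() invocation.
orphan = handle_line("MEASURED RESISTANCE       = 3.78 kOhm\r\n")
coord = handle_line("SENSOR COORDINATE         = 3\r\n")
res = handle_line("MEASURED RESISTANCE       = 3.78 kOhm\r\n")
print(orphan, coord, res)
```

The first resistance line is ignored because no coordinate has been seen yet; the second one is accepted because the flag set in the previous call is still there.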


I wrote this answer before you posted your attempt. The approach is very similar to yours. Use from it what you find useful!

You don't need split() at all. Since all you get at a time is a line, parse each line and interpret the result. If you found the sensor coordinate, keep your eye out for a line that gives the measured resistance. Once you have both, record them and keep an eye out for a new sensor coordinate.

  1. Receive line
  2. Does line contain "SENSOR COORDINATE" ?
    • Yes: parse the number and save it for later use.
    • No: Do nothing? Or print an error message?
  3. Does line contain "MEASURED RESISTANCE" ?
    • Yes: parse the number and save it.
      • Now we should have both items of our pair. Save them for later.
    • No: Do nothing? Or print an error message?

First, let's define a function to extract only the numbers from each line:

import re

def get_numbers_from_line(line):
    try:
        numbers_rex = r"(\d+(?:\.\d+)?)"
        matched_text = re.findall(numbers_rex, line)[0]
        parsed_number = float(matched_text)
        return parsed_number
    except IndexError:
        print(f"No numbers found on line {line}")
    except ValueError:
        print(f"Couldn't parse number {matched_text}")
    
    # Only come here if error occurred. Not required, but for clarity we return None
    return None

Next, let's define a function to parse each line and decide what to do with it. It will use a couple of global variables to keep track of its state:

all_pairs = []
parsed_pair = []
def parse_line(line):
    global all_pairs, parsed_pair
    if "SENSOR COORDINATE" in line:
        sensor_coord = get_numbers_from_line(line)
        
        if parsed_pair:
            # Something already exists in parsed_pair. Tell user we are discarding it
            print("Data already existed in parsed_pair when a new SENSOR COORDINATE was received")
            print("Existing data was discarded")
            print(f"parsed_pair: {parsed_pair}")
        
        if sensor_coord is not None:
            # Make a new list containing only this newly parsed number
            parsed_pair = [sensor_coord]
    
    elif "MEASURED RESISTANCE" in line:
        resistance = get_numbers_from_line(line)
        if not parsed_pair:
            # parsed_pair is empty, so no sensor coordinate was recorded. 
            # Ignore this line and wait for the start of the next pair of data
            print("Received measured resistance without corresponding sensor coordinate")
            print("Received data was discarded")
            print(f"Received data: {resistance}")
        elif resistance is not None:
            parsed_pair.append(resistance) # Add resistance to the pair
            all_pairs.append(parsed_pair)  # Add the pair to all_pairs 
            print(f"Added pair {parsed_pair}")  # Print before the list is reset
            parsed_pair = []  # Make a new empty list for the next pair

Then, in readSerial(), call this function right after text.insert("end", ser_bytes):

def readSerial():
    global after_id
    while ser.in_waiting:
        try:
            ser_bytes = ser.readline() #read data from the serial line
            ser_bytes = ser_bytes.decode("utf-8")
            text.insert("end", ser_bytes)
            parse_line(ser_bytes)
        except UnicodeDecodeError:
            print("UnicodeDecodeError")
    else:
        print("No data received")
    after_id=root.after(50,readSerial)
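
If you would rather avoid global variables, the same state can live in a small object. This is only a sketch of that variation: PairCollector and its feed method are hypothetical names, and it assumes the coordinate is always printed as an integer, as in the sample output.

```python
import re

NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


class PairCollector:
    """Hypothetical helper: pairs each SENSOR COORDINATE with the next MEASURED RESISTANCE."""

    def __init__(self):
        self.pairs = []          # completed (coordinate, resistance) tuples
        self._coordinate = None  # coordinate still waiting for its resistance

    def feed(self, line):
        match = NUMBER_RE.search(line)
        if match is None:
            return  # separator or blank line
        if "SENSOR COORDINATE" in line:
            self._coordinate = int(match.group())
        elif "MEASURED RESISTANCE" in line and self._coordinate is not None:
            self.pairs.append((self._coordinate, float(match.group())))
            self._coordinate = None


collector = PairCollector()
stream = (
    "----------------------------------------\r\n"
    "SENSOR COORDINATE         = 0\r\n"
    "MEASURED RESISTANCE       = 3.70 kOhm\r\n"
    "----------------------------------------\r\n"
    "SENSOR COORDINATE         = 1\r\n"
    "MEASURED RESISTANCE       = 3.70 kOhm\r\n"
)
for line in stream.splitlines():
    collector.feed(line)
print(collector.pairs)
```

In readSerial() you would create one collector at module level and call collector.feed(ser_bytes) where parse_line(ser_bytes) is called above.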
