Searching a string and returning only things I specify

Question

Hopefully this post goes better..

I am stuck on a feature of my program that should return the whole word in which a given keyword appears.

For example, if I tell it to look for "I=" in the string "blah blah blah blah I=1mV blah blah etc?", it should return the whole word where the keyword is found, so in this case it would return I=1mV.

I have tried a bunch of different approaches, such as,

import re
text = "One of the values, I=1mV is used"
print(re.split('I=', text))

However, this returns the pieces of the string around I=, with the I= itself removed, so it returns

['One of the values, ', '1mV is used']

If I try regex solutions, I run into the problem that the number could have more than one digit, and the code below only works if the number is a single digit. If the value were I=10mV, it would only return I=1, but if I put [/0-9] in twice, the code no longer works with a single-digit value.

text = "One of the values, I=1mV is used"
print(re.findall("I=[/0-9]", text))

['I=1']

When I tried using re.search, I got a match object back rather than the text:

text = "One of the values, I=1mV is used"
print(re.search("I=", text))

<_sre.SRE_Match object at 0x02408BF0>

What is a good way to retrieve the word (In this case, I want to retrieve I=1mV) and cut out the rest of the string?

Answer 1

A better way would be to split the text into words first:

>>> text = "One of the values, I=1mV is used"
>>> words = text.split()
>>> words
['One', 'of', 'the', 'values,', 'I=1mV', 'is', 'used']

And then filter the words to find the one you need:

>>> [w for w in words if 'I=' in w]
['I=1mV']

This returns a list of all words with I= in them. We can then just take the first element found:

>>> [w for w in words if 'I=' in w][0]
'I=1mV'

Done. To clean this up a bit, we can look for just the first match rather than checking every word. We can use a generator expression for that:

>>> next(w for w in words if 'I=' in w)
'I=1mV'

Of course, you could adapt the if condition to fit your needs better. For example, you could use str.startswith() to check whether the word starts with a certain string, or re.match() to check whether the word matches a pattern.
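As a rough sketch of those two variations (the variable names here are just for illustration, and the None default passed to next() is my addition so that a missing keyword does not raise StopIteration):

```python
import re

text = "One of the values, I=1mV is used"
words = text.split()

# str.startswith(): keep only words that begin with the keyword.
first_prefixed = next((w for w in words if w.startswith("I=")), None)

# re.match(): anchored at the start of each word; here the keyword
# must be followed by at least one digit.
first_numeric = next((w for w in words if re.match(r"I=\d+", w)), None)

print(first_prefixed)  # I=1mV
print(first_numeric)   # I=1mV
```

Keep in mind that text.split() leaves punctuation attached to words, so a word such as "I=1mV," would come back with its trailing comma.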

Answer 2

Using string methods

For the record, your attempt to split the string in two halves, using I= as the separator, was nearly correct. Instead of using str.split(), which discards the separator, you could have used str.partition(), which keeps it.

>>> my_text = "Loadflow current was I=30.63kA"
>>> my_text.partition("I=")
('Loadflow current was ', 'I=', '30.63kA')
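To get from that tuple to the whole word, one possible approach is to glue the separator back onto the first whitespace-delimited token that follows it. This is only a sketch; the sample sentence and the names before, sep, after and word are made up for illustration:

```python
my_text = "Loadflow current was I=30.63kA today"

# partition() splits at the first "I=" and keeps the separator.
before, sep, after = my_text.partition("I=")

if sep:  # sep is an empty string when "I=" was not found
    # Take the text after "I=" up to the next whitespace.
    value = after.split(maxsplit=1)[0] if after.strip() else ""
    word = sep + value
else:
    word = None

print(word)  # I=30.63kA
```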

Using regular expressions

A more flexible and robust solution is to use a regular expression:

>>> import re
>>> pattern = r"""
... I=             # specific string "I="
... \s*            # Possible whitespace
... -?             # possible minus sign
... \s*            # possible whitespace
... \d+            # at least one digit
... (\.\d+)?       # possible decimal part
... """
>>> m = re.search(pattern, my_text, re.VERBOSE)
>>> m
<_sre.SRE_Match object at 0x044CCFA0>
>>> m.group()
'I=30.63'

This accounts for a lot more possibilities (negative numbers, integer or decimal numbers).

Note the use of:

Quantifiers to say how many of each thing you want:
- a* - zero or more a's
- a+ - at least one a
- a? - "optional": zero or one a

Verbose regular expressions (the re.VERBOSE flag) with comments: the pattern above is much easier to understand than the non-verbose equivalent, I=\s*-?\s*\d+(\.\d+)?

Raw strings for regexp patterns, r"..." instead of plain strings "...", so that literal backslashes don't have to be escaped. The pattern above happens to work without one, because \s, \d and \. are not escape sequences that Python string literals recognize (though recent Python versions warn about such unrecognized escapes), but one day you'll need to match C:\Program Files\... and on that day you will need raw strings.
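If you want to convince yourself that a verbose pattern and its compact form behave the same, a small comparison like the following can help. The sample strings are invented for illustration, and the compact pattern is written with the same quantifiers as the verbose one:

```python
import re

verbose_pattern = r"""
    I=          # literal "I="
    \s*         # optional whitespace
    -?          # optional minus sign
    \s*         # optional whitespace
    \d+         # at least one digit
    (\.\d+)?    # optional decimal part
"""
compact_pattern = r"I=\s*-?\s*\d+(\.\d+)?"

samples = ["I=30.63kA", "I= -5 A", "I=7", "no current here"]

results = []
for s in samples:
    v = re.search(verbose_pattern, s, re.VERBOSE)
    c = re.search(compact_pattern, s)
    # Store the matched text (or None) from each pattern side by side.
    results.append((v.group() if v else None, c.group() if c else None))

for s, (v_text, c_text) in zip(samples, results):
    print(repr(s), "->", v_text, "|", c_text)
```

Both columns should agree for every sample; the last sample has no I= in it, so both searches return None.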

Exercises

Exercise 1: How do you extend this so that it can match the unit as well? And how do you extend this so that it can match the unit as either mA, A, or kA? Hint: "Alternation operator".
Exercise 2: How do you extend this so that it can match numbers in scientific (E) notation, e.g. "1.00e3" or "-3.141e-4"?
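Spoiler: one possible way to approach both exercises is sketched below. It is not the only solution, and the sample strings are made up for illustration. The unit is matched with the alternation operator |, and the exponent is an optional group:

```python
import re

pattern = r"""
    I=                    # literal "I="
    \s*-?\s*              # optional whitespace and minus sign
    \d+(?:\.\d+)?         # integer or decimal mantissa
    (?:[eE][-+]?\d+)?     # optional exponent, e.g. e3 or e-4
    \s*                   # optional whitespace before the unit
    (?:mA|kA|A)           # unit: mA, kA or A
"""

samples = ["I=30.63kA", "I=1.00e3 A", "I=-3.141e-4mA", "I=1mV"]

found = []
for s in samples:
    m = re.search(pattern, s, re.VERBOSE)
    found.append(m.group() if m else None)

print(found)
# ['I=30.63kA', 'I=1.00e3 A', 'I=-3.141e-4mA', None]
```

The last sample is rejected because mV is not one of the listed units.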

Answer 3

import re
text = "One of the values, I=1mV is used"
l = re.split('I=', text)
# l[1] is the text after the first "I="; take its first whitespace-separated token
print(l[1].split()[0])  # 1mV

If you have more than one I=, do the above for every element of l after index 0, since l[0] is the text before the first I=.

This is a good way because it still works if someone writes "One of the values, I= 1mV is used", where I guess you still want to get that I is 1mV.
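A sketch of the several-occurrences case with re.split follows. In the list that re.split returns, element 0 is the text before the first I= and every later element starts right after an I=. The sample sentence and the stripping of trailing punctuation are my own additions for illustration:

```python
import re

text = "First I=1mV, then I= 2.5mV and finally I=10mV."

parts = re.split("I=", text)

# parts[0] is the text before the first "I="; every later element
# starts right after an "I=".
values = []
for chunk in parts[1:]:
    tokens = chunk.split()
    if tokens:
        # Drop punctuation that happens to be glued to the value.
        values.append(tokens[0].rstrip(".,;"))

print(values)  # ['1mV', '2.5mV', '10mV']
```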

BTW, I is current, and its unit is amperes, not volts :)

Answer 4

With your re.findall attempt you would want to add a + which means one or more.
Here are some examples:

import re

test = "This is a test with I=1mV, I=1.414mv, I=10mv and I=1.618mv."

result = re.findall(r'I=[\d\.]+m[vV]', test)

print(result)

test = "One of the values, I=1mV is used"

result = re.search(r'I=([\d\.]+m[vV])', test)

print(result.group(1))

The first print gives: ['I=1mV', 'I=1.414mv', 'I=10mv', 'I=1.618mv']

In the re.search example I've grouped everything after I=, so the second print gives 1mV, in case you are interested in extracting just that.
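If you want both the whole word and the grouped part for every occurrence, re.finditer gives you a match object per hit. This is a small sketch reusing the same pattern and test string; the name matches is just for illustration:

```python
import re

test = "This is a test with I=1mV, I=1.414mv, I=10mv and I=1.618mv."

# group(0) is the whole match, group(1) is the parenthesised part.
matches = [(m.group(0), m.group(1))
           for m in re.finditer(r'I=([\d.]+m[vV])', test)]

for whole, value in matches:
    print(whole, value)
```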

4 answers

Answer 1: 2, ACCEPTED, 2012-04-04 02:37:23
Answer 2: 2, 2012-04-04 04:22:37
Answer 3: 1, 2012-04-04 02:39:34
Answer 4: 1, 2012-04-04 03:18:01
