Search a string and return only what I specify

Question

Hopefully this post goes better.

I am stuck on a feature of my program that should return the whole word in which a given keyword occurs.

For example, if I tell it to look for "I=" in the string "blah blah blah blah I=1mV blah blah etc?", it should return the whole word containing the match, which in this case is I=1mV.

I have tried a bunch of different approaches, such as:

import re

text = "One of the values, I=1mV is used"
print(re.split('I=', text))

However, this returns the pieces of the string around I=, with the I= itself removed:

['One of the values, ', '1mV is used']

If I try regex solutions, I run into the problem that the number may have more than one digit, so the code below only works when the number is a single digit. If the value were I=10mV, it would only return I=1, but if I put [/0-9] in twice, the code no longer works for single-digit values.

text = "One of the values, I=1mV is used"
print(re.findall("I=[/0-9]", text))

['I=1']

When I tried using re.search, I only got a match object back:

text = "One of the values, I=1mV is used"
print(re.search("I=", text))

<_sre.SRE_Match object at 0x02408BF0>

What is a good way to retrieve the word (in this case, I want to retrieve I=1mV) and cut out the rest of the string?

Answer 1

A better way would be to split the text into words first:

>>> text = "One of the values, I=1mV is used"
>>> words = text.split()
>>> words
['One', 'of', 'the', 'values,', 'I=1mV', 'is', 'used']

And then filter the words to find the one you need:

>>> [w for w in words if 'I=' in w]
['I=1mV']

This returns a list of all words with I= in them. We can then just take the first element found:

>>> [w for w in words if 'I=' in w][0]
'I=1mV'

Done. To clean this up a bit, we can stop at the first match rather than checking every word. A generator expression does that:

>>> next(w for w in words if 'I=' in w)
'I=1mV'

Of course, you could adapt the if condition to fit your needs better. For example, you could use str.startswith() to check whether a word starts with a certain string, or re.match() to check whether a word matches a pattern.
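As a rough sketch of that adaptation, the helper below uses str.startswith() as the condition and passes a default to next(), so a missing keyword yields None instead of raising StopIteration. The name find_word and its parameters are made up for illustration and are not part of the answer above.

```python
def find_word(text, prefix, default=None):
    """Return the first whitespace-separated word starting with prefix."""
    words = text.split()
    # next() returns its second argument when the generator yields nothing
    return next((w for w in words if w.startswith(prefix)), default)


text = "One of the values, I=1mV is used"
print(find_word(text, "I="))   # I=1mV
print(find_word(text, "U="))   # None
```

Because the text is split on whitespace only, punctuation that touches the word (as in "I=1mV,") stays attached to the result.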

Answer 2

Using string methods

For the record, your attempt to split the string in two, using I= as the separator, was nearly correct. Instead of using str.split(), which discards the separator, you could have used str.partition(), which keeps it.

>>> my_text = "Loadflow current was I=30.63kA"
>>> my_text.partition("I=")
('Loadflow current was ', 'I=', '30.63kA')
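To get from that tuple to the whole word, one possibility is to glue the separator back onto the first whitespace-delimited chunk of the tail. This is only a sketch; the longer sample sentence and the variable names are chosen here for illustration.

```python
my_text = "Loadflow current was I=30.63kA at the busbar"

# partition() returns (head, separator, tail); separator and tail are
# empty strings when "I=" does not occur in the text
head, sep, tail = my_text.partition("I=")

if sep:
    tail_words = tail.split()
    # the value is the first word of the tail, if there is one
    word = sep + (tail_words[0] if tail_words else "")
else:
    word = None

print(word)  # I=30.63kA
```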

Using regular expressions

A more flexible and robust solution is to use a regular expression:

>>> import re
>>> pattern = r"""
... I=             # specific string "I="
... \s*            # Possible whitespace
... -?             # possible minus sign
... \s*            # possible whitespace
... \d+            # at least one digit
... (\.\d+)?       # possible decimal part
... """
>>> m = re.search(pattern, my_text, re.VERBOSE)
>>> m
<_sre.SRE_Match object at 0x044CCFA0>
>>> m.group()
'I=30.63'

This accounts for a lot more possibilities (negative numbers, integers, or decimal numbers).

Note the use of:

Quantifiers to say how many of each thing you want.
- a* - zero or more a's
- a+ - at least one a
- a? - "optional" - one or zero a's
Verbose regular expression (re.VERBOSE flag) with comments - the pattern above is much easier to understand than the non-verbose equivalent, I=\s*-?\s*\d+(\.\d+)? .
Raw strings for regexp patterns, r"..." instead of plain strings "..." - means that literal backslashes don't have to be escaped. This pattern happens to work without one, because \s, \d and \. are not escape sequences that Python recognizes in string literals, but relying on that is fragile (recent Python versions warn about it). One day you'll need to match C:\Program Files\... and on that day you will need raw strings.
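As a quick check of the points above, the following sketch compiles the pattern in both verbose and compact form and runs them on a few sample strings. The sample strings are assumptions made up for illustration.

```python
import re

verbose = re.compile(r"""
    I=          # specific string "I="
    \s*         # possible whitespace
    -?          # possible minus sign
    \s*         # possible whitespace
    \d+         # at least one digit
    (\.\d+)?    # possible decimal part
    """, re.VERBOSE)

# the same pattern written on one line, without comments
compact = re.compile(r"I=\s*-?\s*\d+(\.\d+)?")

samples = ["I=30.63kA", "I= -5 A", "I=7", "no current here"]
for s in samples:
    mv, mc = verbose.search(s), compact.search(s)
    # both forms should agree on every sample
    print(repr(s), "->",
          mv.group() if mv else None, "|",
          mc.group() if mc else None)
```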

Exercises

Exercise 1: How do you extend this so that it matches the unit as well? And how do you extend it so that the unit can be any of mA, A, or kA? Hint: "alternation operator".
Exercise 2: How do you extend this so that it matches numbers in scientific (E) notation, e.g. "1.00e3" or "-3.141e-4"?
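For readers who want to check their work, here is one possible answer to both exercises, offered as a sketch rather than the only solution. The group names (value, unit) and the sample strings are choices made for this example.

```python
import re

pattern = re.compile(r"""
    I=
    \s*
    (?P<value>
        -?\s*\d+            # optional minus sign, integer part
        (?:\.\d+)?          # optional decimal part
        (?:[eE][-+]?\d+)?   # optional exponent such as e3 or e-4
    )
    \s*
    (?P<unit>mA|kA|A)       # alternation: one of the three units
    """, re.VERBOSE)

for s in ["I=30.63kA", "I=1.00e3 A", "I=-3.141e-4mA"]:
    m = pattern.search(s)
    print(m.group("value"), m.group("unit"))
```

The alternation lists mA and kA before A only for readability; since none of the three units is a prefix of another, the order does not change the result here.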

Answer 3

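import re
text = "One of the values, I=1mV is used"
# split on the keyword: l[0] is the text before it, l[1] the text after it
l = re.split('I=', text)
# the value is the first word after the keyword; split() with no
# argument also skips a space written directly after "I="
print(l[1].split()[0])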

If you have more than one I=, do the above for every element of l after the first, since l[0] is the text before the first I=.

This is a useful approach because someone might write "One of the values, I= 1mV is used", and I guess you would still want to get that I is 1mV.
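When the text contains several I= entries, the same idea can be applied to every piece after the first. This is a sketch in Python 3 syntax; the sample sentence and variable names are made up for illustration.

```python
import re

text = "First I=1mV, then I= 20mV and finally I=300mV"

pieces = re.split('I=', text)

# pieces[0] is the text before the first "I="; every later piece
# starts with a value, possibly preceded by whitespace
values = [piece.split()[0] for piece in pieces[1:] if piece.split()]

print(values)  # ['1mV,', '20mV', '300mV']
```

Note that the comma after 1mV stays attached, because the pieces are only split on whitespace.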

By the way, I is current, and its unit is amperes, not volts :)

Answer 4

With your re.findall attempt, you would want to add a +, which means one or more.
Here are some examples:

import re

test = "This is a test with I=1mV, I=1.414mv, I=10mv and I=1.618mv."

result = re.findall(r'I=[\d\.]+m[vV]', test)

print(result)

test = "One of the values, I=1mV is used"

result = re.search(r'I=([\d\.]+m[vV])', test)

print(result.group(1))

The first print is: ['I=1mV', 'I=1.414mv', 'I=10mv', 'I=1.618mv']

In the re.search example, I've grouped everything other than I=,
so the second print is: 1mV
in case you are interested in extracting just that part.
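If the unit is not always mV, one option is to match a number followed by any run of letters instead of hard-coding the unit. The sketch below assumes inputs like the made-up test string; the question itself does not say which units can occur.

```python
import re

test = "Readings: I=1mV, I=2.5kA; I=10A and I=7."

# I= followed by a number, then any letters as the unit
pattern = re.compile(r'I=(\d+(?:\.\d+)?)([A-Za-z]*)')

for m in pattern.finditer(test):
    # group(0) is the whole word, group(1) the number, group(2) the unit
    print(m.group(0), m.group(1), m.group(2))
```

Because the unit is limited to letters, trailing punctuation such as the comma after 1mV or the full stop after 7 is left out of the match.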
