Searching a string and returning only things I specify
Hopefully this post goes better.
I am stuck on a feature of my program that should return the whole word in which a specified keyword appears.
For example, if I tell it to look for "I=" in the string "blah blah blah blah I=1mV blah blah etc?", it should return the whole word where the keyword is found, so in this case it would return I=1mV.
I have tried a bunch of different approaches, such as:
import re
text = "One of the values, I=1mV is used"
print(re.split('I=', text))
However, this returns the string split in two around I=, with the I= itself removed, so it returns
['One of the values, ', '1mV is used']
If I try regex solutions, I run into the problem that the number could have more than one digit, and the code below only works if the number has exactly one digit. If the value were I=10mV, it would only return I=1, but if I put [/0-9] in twice, the code no longer works for single-digit values.
text = "One of the values, I=1mV is used"
print(re.findall("I=[/0-9]", text))
['I=1']
When I tried using re.search, I only got a match object back:
text = "One of the values, I=1mV is used"
print(re.search("I=", text))
<_sre.SRE_Match object at 0x02408BF0>
What is a good way to retrieve the word (in this case, I want to retrieve I=1mV) and cut out the rest of the string?
A better way would be to split the text into words first:
>>> text = "One of the values, I=1mV is used"
>>> words = text.split()
>>> words
['One', 'of', 'the', 'values,', 'I=1mV', 'is', 'used']
And then filter the words to find the one you need:
>>> [w for w in words if 'I=' in w]
['I=1mV']
This returns a list of all words with I= in them. We can then just take the first element found:
>>> [w for w in words if 'I=' in w][0]
'I=1mV'
Done. To clean this up a bit, we can look for just the first match rather than checking every word. We can use a generator expression for that:
>>> next(w for w in words if 'I=' in w)
'I=1mV'
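One caveat with next(): if no word matches, it raises StopIteration. A minimal sketch, assuming you would rather get None back in that case, passes a default as the second argument. The helper name first_word_containing is made up for illustration.

```python
def first_word_containing(text, keyword):
    """Return the first whitespace-separated word containing keyword, or None."""
    words = text.split()
    # The second argument to next() is returned when the generator is
    # exhausted, instead of raising StopIteration.
    return next((w for w in words if keyword in w), None)


print(first_word_containing("One of the values, I=1mV is used", "I="))  # I=1mV
print(first_word_containing("No current mentioned here", "I="))         # None
```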
Of course, you could adapt the if condition to fit your needs better. For example, you could use str.startswith() to check whether a word starts with a certain string, or re.match() to check whether the word matches a pattern.
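Those variations might look like the sketch below. The sample sentence (with an extra XI=2mV word) and the pattern I=\d+ are assumptions chosen to show how the three conditions differ, not something taken from the question.

```python
import re

text = "One of the values, I=1mV is used, unlike XI=2mV"
words = text.split()

# 'in' accepts the keyword anywhere inside the word
anywhere = [w for w in words if "I=" in w]

# str.startswith() only accepts words that begin with the keyword
prefixed = [w for w in words if w.startswith("I=")]

# re.match() anchors at the start of the word and checks a pattern
# (here: "I=" followed by at least one digit)
by_pattern = [w for w in words if re.match(r"I=\d+", w)]

print(anywhere)    # ['I=1mV', 'XI=2mV']
print(prefixed)    # ['I=1mV']
print(by_pattern)  # ['I=1mV']
```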
For the record, your attempt to split the string in two halves, using I= as the separator, was nearly correct. Instead of using str.split(), which discards the separator, you could have used str.partition(), which keeps it.
>>> my_text = "Loadflow current was I=30.63kA"
>>> my_text.partition("I=")
('Loadflow current was ', 'I=', '30.63kA')
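Building on partition(), a minimal sketch of recovering the whole word could glue the separator back onto the first token that follows it. The helper name word_after is hypothetical, and the sketch assumes the keyword sits at the start of the word, as in the examples here.

```python
def word_after(text, keyword):
    """Return keyword plus the first whitespace-delimited token after it."""
    before, sep, after = text.partition(keyword)
    if not sep:
        # partition() returns an empty separator when keyword is not found
        return None
    # split() with no separator skips leading whitespace, so "I= 1mV"
    # is treated the same as "I=1mV"
    tokens = after.split(maxsplit=1)
    return sep + tokens[0] if tokens else sep


print(word_after("Loadflow current was I=30.63kA", "I="))    # I=30.63kA
print(word_after("One of the values, I=1mV is used", "I="))  # I=1mV
```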
A more flexible and robust solution is to use a regular expression:
>>> import re
>>> pattern = r"""
... I= # specific string "I="
... \s* # Possible whitespace
... -? # possible minus sign
... \s* # possible whitespace
... \d+ # at least one digit
... (\.\d+)? # possible decimal part
... """
>>> m = re.search(pattern, my_text, re.VERBOSE)
>>> m
<_sre.SRE_Match object at 0x044CCFA0>
>>> m.group()
'I=30.63'
This accounts for a lot more possibilities (negative numbers, integer or decimal numbers).
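To see those possibilities in action, here is a small sketch that runs the same pattern over a few sample strings. The sample strings other than the first are invented for illustration. Because the pattern contains a capturing group, the sketch uses search() and m.group() rather than re.findall(), which would return only the group contents.

```python
import re

pattern = re.compile(r"""
    I=          # specific string "I="
    \s*         # possible whitespace
    -?          # possible minus sign
    \s*         # possible whitespace
    \d+         # at least one digit
    (\.\d+)?    # possible decimal part
    """, re.VERBOSE)

samples = [
    "Loadflow current was I=30.63kA",
    "Reverse current I= -5A measured",
    "One of the values, I=10mV is used",
    "No current here",
]

for s in samples:
    m = pattern.search(s)
    # search() returns None when nothing matches
    print(m.group() if m else None)
```

This prints I=30.63, I= -5, I=10 and None, in that order.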
Note the use of the following in the pattern above:
a* - zero or more a's
a+ - at least one a
a? - "optional" - one or zero a's
Verbose regular expressions (re.VERBOSE flag) with comments - much easier to understand the pattern above than the non-verbose equivalent, I=\s*-?\s*\d+(\.\d+)?.
Raw strings for regular expression patterns, r"..." instead of plain strings "..." - means that literal backslashes don't have to be escaped. Our pattern happens to work without it, because \s, \d and \. are not escapes that Python strings recognize, but newer Python versions warn about such sequences in plain strings, and one day you'll need to match C:\Program Files\... and on that day you will need raw strings.
Exercise 1: How do you extend this so that it can match the unit as well? And how do you extend this so that it can match the unit as either mA, A, or kA? Hint: "Alternation operator".
Exercise 2: How do you extend this so that it can match numbers in engineering notation, i.e. "1.00e3", or "-3.141e-4"?
import re
text = "One of the values, I=1mV is used"
l = re.split('I=', text)
# l[1] is the text after the first "I="; split() with no argument
# also skips any whitespace right after the "="
print(l[1].split()[0])
If you have more than one I=, do the above for every element of l after the first, since element 0 is the text before the first I=.
That is a good way, since one can also write "One of the values, I= 1mV is used", and I guess you would still want to get that I is 1mV.
BTW, I is current and its units are amperes, not volts :)
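For a string with several I= occurrences, a minimal sketch of repeating the extraction might look like this. The sample sentence is made up; every chunk after the first one follows an I=, so the loop skips only chunk 0.

```python
import re

text = "First I=1mV then I= 20mV and finally I=3.5mV."
chunks = re.split("I=", text)

# chunks[0] is the text before the first "I=", so skip it;
# every later chunk starts right after an "I="
values = [chunk.split()[0] for chunk in chunks[1:] if chunk.split()]
print(values)  # ['1mV', '20mV', '3.5mV.']
```

Note that trailing punctuation stays attached to the last value; something like value.rstrip('.,') would remove it if that matters.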
With your re.findall attempt, you would want to add a +, which means one or more.
Here are some examples:
import re
test = "This is a test with I=1mV, I=1.414mv, I=10mv and I=1.618mv."
result = re.findall(r'I=[\d\.]+m[vV]', test)
print(result)
test = "One of the values, I=1mV is used"
result = re.search(r'I=([\d\.]+m[vV])', test)
print(result.group(1))
The first print is: ['I=1mV', 'I=1.414mv', 'I=10mv', 'I=1.618mv']
I've grouped everything other than I= in the re.search example, so the second print is: 1mV, in case you are interested in extracting that.
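If both the number and the unit are of interest, one option is to capture them separately. The following is a sketch under the assumption that the unit is always mV or mv, as in the test string above; the group names value and unit are arbitrary.

```python
import re

test = "This is a test with I=1mV, I=1.414mv, I=10mv and I=1.618mv."

# (?P<name>...) creates a named group; (?:...) groups without capturing
pattern = re.compile(r"I=(?P<value>\d+(?:\.\d+)?)(?P<unit>m[vV])")

readings = [
    (float(m.group("value")), m.group("unit"))
    for m in pattern.finditer(test)
]
print(readings)
# [(1.0, 'mV'), (1.414, 'mv'), (10.0, 'mv'), (1.618, 'mv')]
```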