Regex to find a lowercase letter followed by an uppercase letter between HTML tags

I want to use a regular expression in TextWrangler to find a lowercase letter followed by an uppercase letter between these HTML font-color tags. For example:

<font color =#0B610B> Word word wordWord </font>
<font color =#C0C0C0> Word word wordWord </font>

What I actually want is to split them with a colon, like this:

<font color =#0B610B> Word word word: Word </font>
<font color =#C0C0C0> Word word word: Word </font>

I have used:

<font color =#0B610B\b[^>]*>(.*?)</font>

But it finds everything between the font-color tags.

I have also tried:

<font color =#0B610B\b[^>]*>([a-z])([A-Z])</font>

But it does not work.

Could anyone help me? Thank you very much.

Given the following examples, only lines 1, 2, and 3 should "match" your criteria. Line 4 should NOT match, since it has no "lowercase-Uppercase" combination. Line 5 should not match either, because its font color (#FFFFFF) is not one you specified (in the OP or in the follow-up comments).

<font color =#0B610B> Word word wordWord </font>
<font color =#C0C0C0> Word word wordWord </font>
<font color =#C0C0C0> wordWord wordWordwordWord </font>
<font color =#0B610B> word word word Word Word Word Wordword </font>
<font color =#FFFFFF> Word word wordWord </font>

The search term could be written like this:

(?<=font color =#(?:0B610B|C0C0C0)>)((?:(?!</font>|[\r\n]).)*[a-z])([A-Z])

The replacement term could be written like this:

\1: \2

The search term has several nested parentheses. The first, (?<=...), is a lookbehind: it finds the <font color =#...> tag on the left, and the match then starts just to the right of it. Inside it, (?:0B610B|C0C0C0) matches either of your specified font colors (you can add more by adding more "|" pipes) and does not store them in one of the \# registers (like \1 or \2).

There are then 3 opening parentheses. The first opens a capturing group, whose match is stored in \1. The third (skipping the 2nd for now), which looks like (?!...), is a negative lookahead: it checks that the characters just to the right of the current position are NOT the closing </font> tag and not any kind of newline character. While that condition holds, the . consumes one character and the check is repeated at the next position. This can continue only as far as the closing </font> tag or the end of the line.

The 2nd group, (?:...), wraps the lookahead and the . together so that the * repeats them as a pair. It is non-capturing because we don't want that part of the result passed into a register of its own: we want "everything between the <font>...</font> tags", but excluding the tags themselves.

Finally, the replacement term pastes back the text from the right of the <font> tag up to a lowercase letter that sits directly before an uppercase one (\1). Because the * is greedy, this is the last such pair before the closing tag, not the first. It then inserts a colon and a space, followed by the uppercase letter (\2). You may have to run this replacement multiple times for cases where a single line contains wordWordWordWord.
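
The repeated runs can also be automated outside the editor. The following is a minimal JavaScript sketch, not part of the original answer: it assumes a runtime whose regex engine supports lookbehind (for example a recent Node.js), and the helper name insertColons is made up for illustration. It applies the same search and replacement terms until the text stops changing.

```javascript
// Same search term as above, with </font> written as <\/font> for a JS regex literal.
// The g flag covers every <font> tag in the text; each pass still inserts at most
// one colon per tag, because the lookbehind pins the start of the match to the tag.
const search = /(?<=font color =#(?:0B610B|C0C0C0)>)((?:(?!<\/font>|[\r\n]).)*[a-z])([A-Z])/g;

// Hypothetical helper: repeat the replacement until a pass changes nothing.
function insertColons(text) {
  let previous;
  do {
    previous = text;
    text = text.replace(search, "$1: $2");
  } while (text !== previous);
  return text;
}

const sample = [
  "<font color =#0B610B> Word word wordWord </font>",
  "<font color =#C0C0C0> wordWord wordWordwordWord </font>",
  "<font color =#FFFFFF> Word word wordWord </font>",
].join("\n");

console.log(insertColons(sample));
```

The loop always ends, since every pass removes one lowercase-uppercase pair and the inserted ": " never creates a new one. The #FFFFFF line is left untouched because its color is not listed in the lookbehind.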

How about a positive lookahead, something like this:

[a-z](?=[A-Z])

I don't have TextWrangler, but you can use this to find the match and then add your colon and space. I tested this regex in Perl and it looks OK.

[jaypal:~/Temp] cat temp
<font color =#0B610B> Word word wordWord </font>
<font color =#C0C0C0> Word word wordWord </font>

[jaypal:~/Temp] perl -pe 's/([a-z])(?=[A-Z])/$1: /' temp
<font color =#0B610B> Word word word: Word </font>
<font color =#C0C0C0> Word word word: Word </font>

Update: I forgot that I have BBEdit, which is the big brother of TextWrangler. The pattern works there.

Update2: It also works in TextWrangler.
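
On its own, the lookahead inserts a colon wherever a lowercase letter precedes an uppercase one, including inside tags whose color was not requested, and the Perl command above has no g flag, so it changes only the first occurrence on each line. One way to restrict it is sketched below in JavaScript. This is an illustration rather than part of the original answer, and the names wantedColors, fontElement and addColons are hypothetical. It matches each wanted <font> element first, then applies the lookahead replacement only to the text inside it.

```javascript
// Assumption: the two colors named in the question are the only ones to change.
const wantedColors = ["0B610B", "C0C0C0"];

// Matches one wanted <font ...>text</font> element; the text may not contain "<".
const fontElement = new RegExp(
  "(<font color =#(?:" + wantedColors.join("|") + ")>)([^<]*)(</font>)",
  "g"
);

// Hypothetical helper: add ": " after each lowercase letter that is directly
// followed by an uppercase letter, inside the wanted tags only.
function addColons(html) {
  return html.replace(fontElement, (whole, open, body, close) =>
    open + body.replace(/[a-z](?=[A-Z])/g, "$&: ") + close
  );
}

console.log(addColons("<font color =#C0C0C0> wordWord wordWordwordWord </font>"));
console.log(addColons("<font color =#FFFFFF> Word word wordWord </font>"));
```

Because the lookahead does not consume the uppercase letter, a single pass with the g flag handles every pair and no loop is needed. A limitation of this sketch is that a <font> element containing another tag, such as <BR>, is skipped entirely.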

Try this:

<font.*?>.*?[a-z][A-Z].*?</font>

How about this:

<font[^>]*>[^<>]*([a-z][A-Z])[^<>]*</font>

I don't think you can do it in a single regex, but you can if you are able to loop over it:

<script type="text/javascript">
function checkscript() {
    var content = document.regexForm.input.value;
    // Groups: (any HTML tag; could be narrowed to font)(text up to the next "<")(lowercase)(uppercase)(text up to the next "<")
    while(content.match(/(<[^>]*?>)([^<]*)([a-z])([A-Z])([^<]*)/))
    {
        content = content.replace(/(<[^>]*?>)([^<]*)([a-z])([A-Z])([^<]*)/g,"$1$2$3: $4$5");
    }
    document.regexForm.output.value = content;
}
</script>
<body>

<form name="regexForm">
    <textarea rows="10" cols="50" name="input"> 
            <font color =#0B610B> Word myWord<BR> wordWord </font>
            <font color =#C0C0C0> Word word wordWord </font>
    </textarea>
<BR>    
<input type=button value="run test regex" onClick="checkscript();return true;">
<BR><textarea rows="10" cols="50" name="output"></textarea>
</form>

This:

<font color =#0B610B> Word myWord<BR> wordWord </font>
<font color =#C0C0C0> Word word wordWord </font>

becomes:

<font color =#0B610B> Word my: Word<BR> word: Word </font>
<font color =#C0C0C0> Word word word: Word </font>
