
Java RegEx to fix HTML from Flex

I need help writing either a regex or a function to fix some Adobe Flex HTML tags. We have most of them fixed, but are running into trouble with the justification tags. The problem is that Flex saves HTML from its RichTextEditor as follows:

<TEXTFORMAT LEADING="2">
<P ALIGN="LEFT">
<FONT FACE="Arial" STYLE="font-size: 10pt" size="10" COLOR="#000000" LETTERSPACING="0" KERNING="0">
Test Line 1
</FONT>
</P>
<TEXTFORMAT LEADING="2">
<P ALIGN="CENTER">
<FONT FACE="Arial" STYLE="font-size: 10pt" size="10" COLOR="#000000" LETTERSPACING="0" KERNING="0">
Test Line 2
</FONT>
</P>

We are trying to come up with a way to replace each P ALIGN="__" tag with its equivalent HTML4 tag, such as left, right, center, or justify. But we also need to replace its closing tag.

The above example would become something like

<TEXTFORMAT LEADING="2">
<left>
<FONT FACE="Arial" STYLE="font-size: 10pt" size="10" COLOR="#000000" LETTERSPACING="0" KERNING="0">
Test Line 1
</FONT>
</left>
<TEXTFORMAT LEADING="2">
<center>
<FONT FACE="Arial" STYLE="font-size: 10pt" size="10" COLOR="#000000" LETTERSPACING="0" KERNING="0">
Test Line 2
</FONT>
</center>

Has anybody done anything like this?

I am thinking of using two regexes with some manual manipulation, as below:

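      // \\w+ matches the whole alignment value (LEFT, CENTER, ...), not just one character
      String startingTagPattern = "<P ALIGN=\"\\w+\">";
      // Flex emits the closing tag in upper case
      String endTagPattern = "</P>";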

Now, if startingTagPattern matches, get the matching tag line as:

      String tagLine = match.group(0);
      // Take the text between the first and last double quote, e.g. LEFT
      String alignString = tagLine.substring(tagLine.indexOf('"') + 1, tagLine.lastIndexOf('"'));

Now you can put the tags in the output as:

      String startTag = "<"+alignString+">";
      String endTag = "</"+alignString+">";
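The steps above can be combined into one pass. The following is only a sketch under a few assumptions: the class name `AlignTagFixer` and the method `fix` are made up for illustration, the alignment value is lowercased to match the desired output in the question, and `P` elements are assumed not to be nested inside each other (the lazy `.*?` stops at the first closing `</P>`).

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Flex or any library.
class AlignTagFixer {
    // Group 1: alignment value; group 2: everything up to the first closing </P>.
    private static final Pattern P_ALIGN = Pattern.compile(
            "<P\\s+ALIGN=\"(\\w+)\"\\s*>(.*?)</P>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static String fix(String html) {
        Matcher m = P_ALIGN.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String align = m.group(1).toLowerCase(Locale.ROOT);
            String replacement = "<" + align + ">" + m.group(2) + "</" + align + ">";
            // quoteReplacement keeps any '$' or '\' in the body from being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "<P ALIGN=\"LEFT\"><FONT FACE=\"Arial\">Test Line 1</FONT></P>\n"
                + "<P ALIGN=\"CENTER\"><FONT FACE=\"Arial\">Test Line 2</FONT></P>";
        System.out.println(fix(input));
    }
}
```

Because the body is captured as a whole, nested tags such as `FONT` are carried over unchanged, and the opening and closing tags are always replaced as a pair.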

You can very well use regex to replace the P ALIGN tags with their respective values.

For an input like

<P ALIGN="LEFT">xyz</P><SomeTag></SomeTag><P ALIGN="CENTER">xyz</P>;

Here's the code:

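    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class FixAlign {
        public static void main(String[] args) {
            String str = "<P ALIGN=\"LEFT\">xyz</P><SomeTag></SomeTag><P ALIGN=\"CENTER\">xyz</P>";
            // Groups: 1 = P ALIGN=", 2 = alignment value, 3 = closing quote, 4 = text content, 5 = /P
            Pattern pattern = Pattern.compile("(?i)<(P ALIGN=\")([^\"]+)([^>]+)>([^<]+)<([^>]+)");
            Matcher matcher = pattern.matcher(str);
            while (matcher.find()) {
                // Pattern.quote makes replaceFirst treat the matched text literally, not as a regex
                str = str.replaceFirst(Pattern.quote(matcher.group(1)), ""); // removes P ALIGN="
                str = str.replaceFirst(Pattern.quote(matcher.group(3)), ""); // removes the double quote after LEFT or CENTER
                str = str.replaceFirst(Pattern.quote(matcher.group(5)), "/" + matcher.group(2)); // replaces /P in the closing tag with /LEFT or /CENTER
            }
            System.out.println(str);
        }
    }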

Output

<LEFT>xyz</LEFT><SomeTag></SomeTag><CENTER>xyz</CENTER>
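The loop above expects each P element to contain only text, whereas the Flex markup in the question wraps the text in a FONT tag. As a possible alternative, here is a sketch that does the same replacement in a single `replaceAll` call using the back-references `$1` and `$2`. The class name `AlignReplaceAll` and the method `fix` are hypothetical, and it is assumed that P elements are never nested.

```java
// Hypothetical helper for illustration only.
class AlignReplaceAll {
    static String fix(String html) {
        // (?i) ignores case and (?s) lets '.' span line breaks.
        // $1 is the ALIGN value, $2 is everything between the opening and closing P tags.
        return html.replaceAll(
                "(?is)<P\\s+ALIGN=\"([^\"]+)\"\\s*>(.*?)</P>",
                "<$1>$2</$1>");
    }

    public static void main(String[] args) {
        String str = "<P ALIGN=\"LEFT\">xyz</P><SomeTag></SomeTag><P ALIGN=\"CENTER\">xyz</P>";
        System.out.println(fix(str));
    }
}
```

This keeps the alignment value in whatever case Flex wrote it (`LEFT`, `CENTER`); lowercasing it would need a `Matcher` loop rather than a plain replacement string.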
