Replace an Expression Within Text Boundaries

I have a rather annoying issue that I solved using a simple recursive method in Java. However, I'm looking for a better way to do this.

The initial problem involved whitespace inside a Quoted-Printable/Base64 encoded MIME header, which, as I read the RFC 2047 specification, isn't allowed. This means that decoding fails for a MIME header when whitespace is present, e.g.:

=?iso-8859-1?Q?H=E4 ll and nothing?=

or more pertinently:

=?iso-8859-1?Q?H=E4 ll?= preserve this text =?iso-8859-1?Q?mo nk ey?=

The goal is to only remove the whitespace between the =? ?= boundaries (or re-encode using =20). Other text outside this should be preserved.

I'm looking for alternative approaches to solving this; my target language is Java. Any ideas on the simplest, cleanest approach?

You could build a simple state machine to track whether you are between =? and ?=, then read the input char by char and write it out char by char, converting whitespace when needed...
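A minimal sketch of such a state machine, under a few assumptions: the class and method names are made up for illustration, whitespace inside the encoded text is dropped rather than re-encoded, and the charset and encoding fields are tracked as separate states so that encoded text beginning with "=" (as in ?Q?=E4) is not mistaken for the closing ?=. An unterminated =? in plain text would confuse it, so treat it as a starting point only.

```java
class EncodedWordStateMachine {

    // Where the scanner currently is relative to "=?charset?encoding?text?=".
    private enum State { OUTSIDE, CHARSET, ENCODING, TEXT }

    // Copies the input, dropping whitespace only inside the encoded-text field.
    static String stripWhitespace(String input) {
        StringBuilder out = new StringBuilder(input.length());
        State state = State.OUTSIDE;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            boolean hasNext = i + 1 < input.length();
            switch (state) {
                case OUTSIDE:
                    if (c == '=' && hasNext && input.charAt(i + 1) == '?') {
                        out.append("=?");
                        i++; // consume the '?' as well
                        state = State.CHARSET;
                    } else {
                        out.append(c);
                    }
                    break;
                case CHARSET:
                    out.append(c);
                    if (c == '?') state = State.ENCODING;
                    break;
                case ENCODING:
                    out.append(c);
                    if (c == '?') state = State.TEXT;
                    break;
                case TEXT:
                    if (c == '?' && hasNext && input.charAt(i + 1) == '=') {
                        out.append("?=");
                        i++; // consume the '=' as well
                        state = State.OUTSIDE;
                    } else if (!Character.isWhitespace(c)) {
                        out.append(c);
                    }
                    break;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String header = "=?iso-8859-1?Q?H=E4 ll?= preserve this text =?iso-8859-1?Q?mo nk ey?=";
        System.out.println(stripWhitespace(header));
    }
}
```

For the example from the question, this keeps "preserve this text" intact and turns the two encoded words into H=E4ll and monkey.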

Use regular expressions: http://java.sun.com/docs/books/tutorial/essential/regex/

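(Backslashes are doubled here, as they would be inside a Java string literal.)

\\s = whitespace
\\S = non-whitespace
\\? = escaped question mark
. = any single character; .* matches any run of characters, similar to * in weaker pattern matching.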

Might be easiest to do a multi-part find and replace using something like this: Pull out this part: =\\?.*?\\?= (the reluctant .*? stops each match at the first ?= that follows).

Globally replace \\s in that part with empty string.

Put the part back.

You might be able to get it down to a single search and replace if you play with the regex long enough...
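One way the single search and replace might look is with a lookahead. This is only a sketch: the class name, method name, and pattern are illustrative, and it relies on the assumption that a "?=" sequence never occurs outside an encoded word. The pattern matches a run of whitespace only if the next question mark after it is immediately followed by "=", which is the case inside the encoded text but not in the text between encoded words.

```java
import java.util.regex.Pattern;

class SingleRegexStrip {

    // A whitespace run, followed by any characters that are not '?', then "?=".
    // Assumption: "?=" only ever appears as the end of an encoded word.
    private static final Pattern WS_BEFORE_CLOSE =
        Pattern.compile("\\s+(?=[^?]*\\?=)");

    static String strip(String header) {
        return WS_BEFORE_CLOSE.matcher(header).replaceAll("");
    }

    public static void main(String[] args) {
        String header = "=?iso-8859-1?Q?H=E4 ll?= preserve this text =?iso-8859-1?Q?mo nk ey?=";
        System.out.println(strip(header));
    }
}
```

Plain text that happens to contain "?=" would lose the whitespace in front of it, so the loop-based approaches are safer when the input is not trusted.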

Well, I don't know about better, but here's an alternate approach:

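    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RemoveSpacesDemo // class name is arbitrary
    {
        public static void main( String[] args )
        {
            String ex1 = "=?iso-8859-1?Q?H=E4 ll?= " + 
                "preserve this text =?iso-8859-1?Q?mo nk ey?=";
            String res1 = removeSpaces( ex1 );

            System.out.println( ex1 );
            System.out.println();
            System.out.println( res1 );
        }

        public static String removeSpaces( String str )
        {
            StringBuffer result = new StringBuffer();
            // Reluctant match, so each "=?...?=" span is handled separately
            // and the text between spans is left alone.
            String strPattern = "=\\?.+?\\?=";
            Pattern p = Pattern.compile( strPattern );
            Matcher m = p.matcher( str );

            while ( m.find() )
            {
                // Strip whitespace inside the span; quote the result so that
                // '$' and '\' are not treated as replacement syntax.
                String cleaned = m.group().replaceAll( "\\s", "" );
                m.appendReplacement( result, 
                    Matcher.quoteReplacement( cleaned ) );
            }

            // Copy whatever follows the last match (the whole input if
            // nothing matched).
            m.appendTail( result );
            return result.toString();
        }
    }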

You could split the string on ?, then put it back together, alternating between replacing spaces and not.

Edit: Oops. Missed the equal signs. Will correct.

Edit 2: Corrected implementation (derived from Javadoc example for Matcher.appendReplacement() ):

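import java.util.regex.Matcher;
import java.util.regex.Pattern;

String input = "=?iso-8859-1?Q?H=E4 ll?= what about in this case? :) =?iso-8859-1?Q?mo nk ey?=";

// Reluctant quantifier: each match ends at the first "?=" after "=?".
Pattern p = Pattern.compile("=\\?(.*?)\\?=");
Matcher m = p.matcher(input);
StringBuffer sb = new StringBuffer();
while (m.find()) {
    // Quote the replacement so '$' and '\' in the header are taken literally.
    m.appendReplacement(sb, Matcher.quoteReplacement(m.group().replaceAll(" ", "")));
}
m.appendTail(sb); // copy the text after the last match
System.out.println(sb.toString());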
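A possible refinement, sketched under stated assumptions rather than offered as a definitive fix. The reluctant pattern above ends a match at the first "?=", which cuts the encoded word short when the encoded text itself starts with "=" (for example ?Q?=E4). RFC 2047 does not allow "?" inside the charset, encoding, or encoded-text fields, so the sketch below matches the three fields explicitly. It also shows the "re-encode using =20" option from the question: for Q encoding, spaces become =20, while for any other encoding (Base64 in practice) whitespace is dropped. The class and method names are hypothetical, and whether a stray space was meant as content is a guess about the producer of the header.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EncodedWordRepair {

    // "=?" charset "?" encoding "?" encoded-text "?=", with no '?' inside a field.
    private static final Pattern ENCODED_WORD =
        Pattern.compile("=\\?([^?]+)\\?([^?]+)\\?([^?]*)\\?=");

    static String repair(String header) {
        Matcher m = ENCODED_WORD.matcher(header);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String charset = m.group(1);
            String encoding = m.group(2);
            String text = m.group(3);
            String fixed;
            if (encoding.trim().equalsIgnoreCase("Q")) {
                // Assumption: a literal space was intended as content, so keep
                // it in encoded form; drop any other whitespace (tabs, CR, LF).
                fixed = text.replace(" ", "=20").replaceAll("\\s", "");
            } else {
                // Base64 payloads carry no meaningful whitespace.
                fixed = text.replaceAll("\\s", "");
            }
            String word = "=?" + charset + "?" + encoding + "?" + fixed + "?=";
            // quoteReplacement keeps '$' and '\' from being read as group syntax.
            m.appendReplacement(sb, Matcher.quoteReplacement(word));
        }
        m.appendTail(sb); // text after the last encoded word
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(repair(
            "=?iso-8859-1?Q?H=E4 ll?= what about in this case? :) =?iso-8859-1?Q?mo nk ey?="));
        System.out.println(repair("=?UTF-8?B?SGVs bG8=?="));
    }
}
```

Swapping the =20 branch for plain removal gives the same behaviour as the snippet above, but with the stricter boundaries.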
