I want to take a string like: ab%cde%fg hij %klm n%op
And convert it to any of (all are acceptable):
'ab'%c'de'%f'g hij '%k'lm n'%o'p'
'ab'%c'de'%f'g' 'hij' %k'lm' 'n'%o'p'
'a''b'%c'd''e'%f'g' 'h''i''j' %k'l''m' 'n'%o'p'
(If an alphabetical character is not preceded by a %, it needs to be within single quotes. Opening and closing extra single quotes is acceptable.)
I'm trying to take a string in C strftime format and convert it to work with Java's SimpleDateFormat. For the most part, this is pretty straightforward:
String format = "%y-%m-%d %H:%M:%S";
Map<String, String> replacements = new HashMap<String, String>() {{
put("%a", "EEE");
put("%A", "EEEE");
put("%b", "MMM");
put("%B", "MMMM");
put("%c", "EEE MMM dd HH:mm:ss yyyy");
// ... for each strftime token, create a mapping ...
}};
for ( String key : replacements.keySet() )
{
// apply the mappings one at a time
format = format.replaceAll(key, replacements.get(key));
}
// Then format
SimpleDateFormat df = new SimpleDateFormat(format, Locale.getDefault());
System.out.println(df.format(Calendar.getInstance().getTime()));
However, when I introduce character literals, it runs into issues. According to the strftime documentation, all character literals not preceded by a percent sign are passed along to the output string without modification. So:
Format: "%y is a great year!"
Output: "2019 is a great year!"
However, with SimpleDateFormat, every letter is treated as a pattern token unless it is surrounded by single quotes:
Format: "yyyy 'is a great year!'"
Output: "2019 is a great year!"
Format: "yyyy is a great year!"
Output: ERROR - invalid token "i"
Because strftime tokens are always a single character, it shouldn't be too difficult to fix our format string. In the worst case, the rule is "if a letter is not preceded by a % sign, wrap it in single quotes", which would lead to:
Format: "%y is a great year!"
Processed: "%y 'i''s' 'a' 'g''r''e''a''t' 'y''e''a''r'!"
This is ugly, but would behave as expected and is an acceptable answer. Ideally we would wrap all runs of alphabetical characters not preceded by a %, like so:
Format: "%y is a great year!"
Processed: "%y 'is' 'a' 'great' 'year'!"
Or, better yet, all runs including non-alpha and non-% characters:
Format: "%y is a great year!"
Processed: "%y' is a great year!'"
I started with a mindless regular expression that I was pretty sure wouldn't work, and it didn't:
format.replaceAll("[^%]([a-zA-Z]+)", "'$1'");
// Format: "Literal %t Literal"
// Output: "'iteral' %t'Literal'"
// Expected: "'Literal' %t 'Literal'"
I don't have a firm grasp on lookarounds, so I gave them a whirl, but messed something up there as well:
format.replaceAll("(?!%)([a-zA-Z]+)", "'$1'");
// Format: "Literal %t Literal"
// Output: "'Literal' %'t' 'Literal'"
// Expected: "'Literal' %t 'Literal'"
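For what it's worth, (?!%) is a negative lookahead: it only checks that the next character is not a %, which is always true when the next character is a letter, so it never excludes anything. Checking the previous character needs a negative lookbehind, (?<!%). Below is a minimal sketch of that idea. The class and method names are made up for illustration, and it assumes the input contains no "%%" escapes and no single quotes, neither of which it handles.

```java
import java.util.regex.Pattern;

class LookbehindQuote {
    // A run of letters whose first letter is not immediately preceded by '%'.
    // Assumption: no "%%" escapes and no single quotes in the input.
    private static final Pattern LITERAL_RUN = Pattern.compile("(?<!%)[a-zA-Z]+");

    static String quoteLiterals(String strftimeFormat) {
        // "$0" refers to the whole match, so each run gets wrapped in quotes.
        return LITERAL_RUN.matcher(strftimeFormat).replaceAll("'$0'");
    }

    public static void main(String[] args) {
        // prints: 'Literal' %t 'Literal'
        System.out.println(quoteLiterals("Literal %t Literal"));
        // prints: 'ab'%c'de'%f'g' 'hij' %k'lm' 'n'%o'p'
        System.out.println(quoteLiterals("ab%cde%fg hij %klm n%op"));
    }
}
```

The letter directly after a % is skipped because the lookbehind fails there, and the match simply starts one character later, which gives the second of the acceptable outputs listed at the top.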
I also considered writing a very simple lexer. Something like:
StringBuffer s = new StringBuffer();
boolean inQuote = false;
for (int i = 0; i < format.length; i++)
{
if (format[i] == '%')
{
i++;
s.append(replacements.get(format[i]));
}
else if (inQuote)
{
s.append(format[i]);
}
else
{
s.append("'");
inQuote = true;
s.append(format[i]);
}
}
However, I learned that format[i] isn't valid Java syntax, and I didn't spend much time looking into how to properly get a character from a string before I decided to just post here.
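For reference, a valid-Java sketch of that loop could look roughly like the following. It uses String.charAt(i) and length() in place of format[i] and format.length, and it also closes any open quote before a token and at the end of the string, which the draft above never does. The class name, the "%"-prefixed map keys, the pass-through of unknown tokens, and the doubling of literal single quotes are assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;

class StrftimeLexer {
    // Keys are assumed to be full strftime tokens such as "%Y".
    static String convert(String format, Map<String, String> replacements) {
        StringBuilder out = new StringBuilder();
        boolean inQuote = false;
        for (int i = 0; i < format.length(); i++) {
            char c = format.charAt(i);
            if (c == '%' && i + 1 < format.length()) {
                if (inQuote) {              // close the open literal run first
                    out.append('\'');
                    inQuote = false;
                }
                String token = "%" + format.charAt(++i);
                String mapped = replacements.get(token);
                // Assumption: unknown tokens are copied through unchanged.
                out.append(mapped != null ? mapped : token);
            } else {
                if (!inQuote) {             // open a quoted run for literal text
                    out.append('\'');
                    inQuote = true;
                }
                // SimpleDateFormat writes a literal single quote as ''.
                out.append(c == '\'' ? "''" : String.valueOf(c));
            }
        }
        if (inQuote) {
            out.append('\'');               // close a run that reaches the end
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> replacements = new HashMap<>();
        replacements.put("%Y", "yyyy");
        replacements.put("%m", "MM");
        replacements.put("%d", "dd");
        // prints: yyyy' is a great year!'
        System.out.println(convert("%Y is a great year!", replacements));
        // prints: yyyy'-'MM'-'dd
        System.out.println(convert("%Y-%m-%d", replacements));
    }
}
```

Because the token lookup and the quoting happen in the same pass, this produces the "quote everything between tokens" form without a separate regex step.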
I would prefer a regular expression solution so that I can write it in a single line instead of a loop like this.
This has been updated to work with a single regex. Additional formats can be added to test for correctness.
String[] formats = { "ab%cde%fg hij %klm n%op", "ab%c", "%d" };
for (String f : formats) {
String parsed = f.replaceAll("(^[a-z]+|(?<=%[a-z])([a-z ]+))", "'$1'");
System.out.println(parsed);
}
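The two possibilities are:
a run of [a-z]+ at the start of the string;
a run of [a-z ]+ that follows %[a-z].
Each match is placed between single quotes. The % and the character right after it are not part of the match, so they stay outside the quotes.
Why not use several replaceAll calls, since you have already considered it?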
First, add single quotes to all consecutive character strings;
Then, move each single quote that directly follows a % one character to the right;
Last, remove empty quotes.
Below is my testing code in Python. I believe it works in other languages such as Java as well.
>>> import re
>>> text = "Literal %t Literal"
>>> str1 = re.sub(r"([a-zA-Z]+)", r"'\g<1>'", text)
>>> str2 = re.sub(r"%'([a-zA-Z])", r"%\g<1>'", str1)
>>> str3 = re.sub(r"''", "", str2)
>>> str1
"'Literal' %'t' 'Literal'"
>>> str2
"'Literal' %t'' 'Literal'"
>>> str3
"'Literal' %t 'Literal'"
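The three steps described in this answer (quote every run of letters, move the quote after each % one character to the right, remove empty quotes) could be ported to Java roughly as follows. This is only a sketch: the class and method names are invented here, and it assumes the input contains no single quotes and no "%%" escapes.

```java
class ThreeStepQuote {
    static String quoteLiterals(String format) {
        // Step 1: wrap every run of letters in single quotes.
        String step1 = format.replaceAll("([a-zA-Z]+)", "'$1'");
        // Step 2: move the quote that follows '%' one letter to the right,
        // so %'cde' becomes %c'de' and %'t' becomes %t''.
        String step2 = step1.replaceAll("%'([a-zA-Z])", "%$1'");
        // Step 3: drop the empty quote pairs that step 2 leaves behind.
        return step2.replace("''", "");
    }

    public static void main(String[] args) {
        // prints: 'Literal' %t 'Literal'
        System.out.println(quoteLiterals("Literal %t Literal"));
        // prints: 'ab'%c'de'%f'g' 'hij' %k'lm' 'n'%o'p'
        System.out.println(quoteLiterals("ab%cde%fg hij %klm n%op"));
    }
}
```

Step 3 uses String.replace rather than replaceAll since no regex is needed for a fixed two-character string.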