Is there any regular expression available to identify whether a string is base64 encoded or not in Java?

I went through several discussions to find out how to do this, but did not find an exact solution. I have used the following regular expression to check whether a string is Base64 encoded or not:

^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$  
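For reference, here is a minimal sketch of how such a regex is typically applied in Java. The Pattern is compiled once and then reused. The class and method names (Base64RegexCheck, looksLikeBase64) are hypothetical. The pattern is the one above, and the code uses only java.util.regex, so it also runs on Java 7.

```java
import java.util.regex.Pattern;

class Base64RegexCheck {
    // The regex from the question, compiled once and reused.
    private static final Pattern BASE64 = Pattern.compile(
            "^([A-Za-z0-9+/]{4})*([A-Za-z0-9+/]{4}|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)$");

    // True if s has the shape of padded, standard-alphabet base64.
    // It says nothing about whether s was ever actually encoded.
    static boolean looksLikeBase64(String s) {
        return BASE64.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeBase64("SGVsbG8=")); // true ("Hello")
        System.out.println(looksLikeBase64("SGVsbG8"));  // false (no padding)
        System.out.println(looksLikeBase64("test"));     // true, although it is an ordinary word
    }
}
```

The last case shows one way the check is "not accurate": an ordinary word such as "test" has the right shape and therefore matches.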

But this is not accurate every time. I know I can use a try/catch approach, but that is an expensive operation in Java. Is there an exact way to do this? I am using Java 7.

I would advise caution here. There are two problems:

The first problem is that regexes like the one you have shown can suffer from performance problems when the string does not match. In particular, there can be a lot of unnecessary backtracking before the match fails.

(It is possible to avoid the backtracking by using "possessive" quantifiers rather than "greedy" ones; "reluctant" quantifiers only change the order in which alternatives are tried. Either way, you need to understand what you are doing.)
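Below is a sketch of a possessive variant of the question's regex, assuming padded, standard-alphabet input. The `*+` quantifier consumes every complete 4-character group and never gives one back. As a result, a non-matching string fails without the engine retrying other group counts.

Because full groups are no longer handed back to the final alternative, the padded final unit has to be optional. This means the variant, unlike the original, also accepts the empty string (the encoding of zero bytes). The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class PossessiveBase64Regex {
    // "*+" is possessive: it takes every complete 4-character group and never
    // gives one back, so a non-matching string is rejected without retrying
    // other group counts. The padded final unit is optional because full
    // groups are no longer handed back to it; as a result "" is accepted.
    private static final Pattern BASE64 = Pattern.compile(
            "(?:[A-Za-z0-9+/]{4})*+(?:[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{2}==)?");

    // matches() requires the whole string to match, so no ^...$ anchors are needed.
    static boolean looksLikeBase64(String s) {
        return BASE64.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeBase64("QUJDRA==")); // true
        System.out.println(looksLikeBase64("QUJDR"));    // false (stray fifth character)
    }
}
```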

Even so, unless the string is short, it is likely to be more efficient to attempt a base64 decode with the Base64.Decoder::decode method and catch a possible exception than to validate with a regex. As a bonus, you then have the decoded data.
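Here is a minimal sketch of the decode-and-catch approach. Note that java.util.Base64 was added in Java 8, so this does not compile on the Java 7 mentioned in the question. tryDecode is a hypothetical helper name, and returning null for invalid input is just one possible design choice.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class TryDecode {
    // Attempts a decode with the basic (RFC 4648) decoder.
    // Returns the decoded bytes, or null if s is not acceptable base64.
    static byte[] tryDecode(String s) {
        try {
            return Base64.getDecoder().decode(s);
        } catch (IllegalArgumentException e) {
            // Thrown for characters outside the alphabet or a malformed final unit.
            return null;
        }
    }

    public static void main(String[] args) {
        byte[] ok = tryDecode("SGVsbG8=");
        System.out.println(ok == null ? "invalid" : new String(ok, StandardCharsets.UTF_8)); // Hello
        System.out.println(tryDecode("not base64!") == null); // true
    }
}
```

One difference from the regex: the basic decoder treats '=' padding as optional. It therefore also accepts unpadded input such as "SGVsbG8", which the regex in the question rejects.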

(As a speedup, you could check the first 4 and last 4 characters before attempting a full base64 decode.)
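Here is one way such a pre-check might look, assuming padded, standard-alphabet input (the same rules as the regex in the question). It looks only at the length and at the first and last 4-character units, so its cost does not grow with the string length. mightBeBase64 is a hypothetical name. A true result means only that a full decode is worth trying.

```java
class QuickBase64Precheck {
    // True for the 64 characters of the standard base64 alphabet (no '=').
    private static boolean isAlphabet(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9') || c == '+' || c == '/';
    }

    // Cheap rejection test. Inspects the length plus at most 8 characters,
    // never the middle of the string, so "true" only means "worth decoding".
    static boolean mightBeBase64(String s) {
        int n = s.length();
        // Padded base64 comes in whole 4-character units; like the regex in
        // the question, this also rejects the empty string.
        if (n == 0 || n % 4 != 0) {
            return false;
        }
        // The first unit may not contain padding unless it is also the last unit.
        if (n > 4) {
            for (int i = 0; i < 4; i++) {
                if (!isAlphabet(s.charAt(i))) {
                    return false;
                }
            }
        }
        // The last unit is "xxxx", "xxx=" or "xx==".
        char a = s.charAt(n - 4);
        char b = s.charAt(n - 3);
        char c = s.charAt(n - 2);
        char d = s.charAt(n - 1);
        if (!isAlphabet(a) || !isAlphabet(b)) {
            return false;
        }
        if (c == '=') {
            return d == '=';
        }
        return isAlphabet(c) && (isAlphabet(d) || d == '=');
    }

    public static void main(String[] args) {
        System.out.println(mightBeBase64("SGVsbG8="));     // true
        System.out.println(mightBeBase64("SGVsbG8"));      // false (length 7)
        System.out.println(mightBeBase64("SGVs!!!!bG8=")); // true: the middle is not checked
    }
}
```

The full decode that follows the pre-check is what catches a case like "SGVs!!!!bG8=".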


The second problem is that (in theory) a string may be syntactically valid Base64 but have been produced by some other "process". When you decode such a string, you may get garbage. Therefore, it may be worth decoding the string and checking what is inside as part of your validation.
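Below is a sketch of "decode, then check what is inside". It uses "the bytes are valid UTF-8 text" as the example content check. What counts as plausible content is application-specific, so substitute your own test. java.util.Base64 requires Java 8, and decodeAsUtf8 is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class DecodeAndInspect {
    // Returns the decoded text if s is valid base64 AND the bytes are valid UTF-8;
    // returns null otherwise. UTF-8 validity is just one example of a content check.
    static String decodeAsUtf8(String s) {
        byte[] bytes;
        try {
            bytes = Base64.getDecoder().decode(s);
        } catch (IllegalArgumentException e) {
            return null; // not syntactically valid base64
        }
        // A strict decoder: report bad byte sequences instead of replacing them.
        CharsetDecoder utf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return utf8.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            return null; // valid base64, but the bytes are not UTF-8 text
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeAsUtf8("SGVsbG8=")); // Hello
        System.out.println(decodeAsUtf8("test"));     // null
    }
}
```

"test" illustrates the problem. It is syntactically valid base64, but it decodes to the three bytes 0xB5 0xEB 0x2D, which are not valid UTF-8.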


"I know I can use some try catch method. But that is expensive operation for java."

It is all relative. Furthermore, newer JVMs can throw and handle exceptions more efficiently due to some optimizations introduced in (I think) Java 8.

A base64 rendering of any given string is just another string drawn from an alphabet of 64 tokens. Can a string be regex-checked to consist only of tokens from that alphabet? Yes. Does that imply that such a string is indeed the result of an intentional base64 encoding? No. Also note that consisting only of tokens from that alphabet does not even imply being a legitimate base64 encoding of some other string. String length and padding matter, and so does the way a particular decoder deals with them. The string "a", for instance, is not a valid base64 encoding of anything: a single character carries only 6 bits, fewer than one byte, even though the alphabet it consists of might suggest otherwise.
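Here is an illustration using the basic decoder in Java 8's java.util.Base64 (other decoders may behave differently). The alphabet-only check below accepts "a", but the decoder rejects it. Conversely, the decoder accepts "abc", even though that string lacks padding and so fails the regex in the question. The helper names usesAlphabetOnly and decodes are hypothetical.

```java
import java.util.Base64;
import java.util.regex.Pattern;

class AlphabetIsNotEnough {
    // Checks only that every character belongs to the 64-token alphabet.
    private static final Pattern ALPHABET_ONLY = Pattern.compile("[A-Za-z0-9+/]+");

    static boolean usesAlphabetOnly(String s) {
        return ALPHABET_ONLY.matcher(s).matches();
    }

    // True if the basic decoder accepts s.
    static boolean decodes(String s) {
        try {
            Base64.getDecoder().decode(s);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"a", "abc", "abcd"}) {
            System.out.println(s + ": alphabet=" + usesAlphabetOnly(s) + ", decodes=" + decodes(s));
        }
    }
}
```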

"Trying to detect from actual content" is in general a very poor (because utterly error-prone) strategy. Avoid it whenever possible.
