Java: Searching for a wrongly encoded String without modifying it

Question

I have to find a user-defined String in a document (using Java) that is stored in a database as a BLOB. When I search for a String with special characters (umlauts: ä, ö, ü, etc.), the search fails: it returns no positions at all. I am not allowed to convert the document's content to UTF-8, which would fix this problem but create a new, even bigger one.

Some additional information: the document's content is returned as a String decoded as "ISO-8859-1" (Latin-1). Here is an example of what such a String can look like:

Die Erkenntnis, daÃ der KÃ¼nstler Schutz braucht, ...

This is what it should look like:

Die Erkenntnis, daß der Künstler Schutz braucht, ...

If I search for Künstler, nothing is found, because the search looks for ü but the content only contains Ã¼.

Is it possible to convert Künstler into KÃ¼nstler so that I can search for the wrongly encoded version instead?

Note: We are using the Hibernate framework for database access. The original getter for the document's content returns a byte[]. The String is then produced by calling

new String(getContent(), "ISO-8859-1")

The problem is that I cannot change this to UTF-8, because that would break the rest of our application, which is based on a third-party application that delivers data this way.

Answer 1

Okay, it looks like I've found a way to mess up the encoding on purpose.

new String("Künstler".getBytes("UTF-8"), "ISO-8859-1")

By getting the bytes of the String Künstler in UTF-8 and then creating a new String while telling Java that those bytes are Latin-1, it converts to KÃ¼nstler. It's a hell of a hack, but it seems to work well.
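As a minimal sketch of how this hack could be wrapped into a search helper: the class name MojibakeSearch and the methods toMojibake and findAll are made up for illustration and are not part of the original application. The sketch assumes the stored bytes are UTF-8 and that the content String was produced with ISO-8859-1, as described in the question. Non-ASCII characters are written as Unicode escapes so the example does not depend on the source file encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not taken from the original application.
class MojibakeSearch {

    // Mis-decodes the search term the same way the document content was
    // mis-decoded: UTF-8 bytes interpreted as ISO-8859-1.
    static String toMojibake(String term) {
        return new String(term.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Returns every start index of the mis-decoded term inside the
    // (already mis-decoded) content. Indices refer to the content String
    // as the application sees it, not to the correctly decoded text.
    static List<Integer> findAll(String content, String term) {
        List<Integer> positions = new ArrayList<>();
        String needle = toMojibake(term);
        if (needle.isEmpty()) {
            return positions;
        }
        for (int i = content.indexOf(needle); i >= 0; i = content.indexOf(needle, i + 1)) {
            positions.add(i);
        }
        return positions;
    }

    public static void main(String[] args) {
        // Simulate the situation from the question: UTF-8 bytes in the BLOB,
        // decoded as ISO-8859-1 by the getter.
        byte[] stored = "Die Erkenntnis, da\u00df der K\u00fcnstler Schutz braucht"
                .getBytes(StandardCharsets.UTF_8);
        String content = new String(stored, StandardCharsets.ISO_8859_1);

        System.out.println(findAll(content, "K\u00fcnstler"));
    }
}
```

Using the StandardCharsets constants instead of charset names avoids the checked UnsupportedEncodingException that the String-based overloads declare.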

Answer 2

You have already answered this yourself.

An altogether different approach: if you can search the BLOB directly, you could search using

"SELECT .. FROM ... WHERE"
+ " ... LIKE '%" + key.replaceAll("\\P{ASCII}+", "%") + "%'"

This replaces each run of non-ASCII characters with the % wildcard: UTF-8 multibyte sequences are non-ASCII by design.
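The wildcard substitution can be sketched on its own as follows. The class name LikePatternSketch and the method toLikePattern are hypothetical, and the sketch only builds the LIKE pattern; it assumes the pattern would then be bound as a parameter of a prepared statement rather than concatenated into the SQL text. It uses the documented POSIX class name \p{ASCII} from java.util.regex.Pattern.

```java
import java.util.regex.Pattern;

// Hypothetical helper that only builds the LIKE pattern; it does not run a query.
class LikePatternSketch {

    // One or more consecutive characters outside the ASCII range.
    private static final Pattern NON_ASCII_RUN = Pattern.compile("\\P{ASCII}+");

    // Replaces each run of non-ASCII characters with the SQL wildcard %
    // and wraps the result in % so it matches anywhere in the column.
    // Assumption: key contains no literal % or _ that would need escaping.
    static String toLikePattern(String key) {
        return "%" + NON_ASCII_RUN.matcher(key).replaceAll("%") + "%";
    }

    public static void main(String[] args) {
        System.out.println(toLikePattern("K\u00fcnstler"));
    }
}
```

Because % matches any sequence of characters, this approach can also return rows that do not contain the exact term, so the results may need a second, stricter check.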
