Java Clipboard: Paste HTML from Firefox on Linux
I'm running into a strange problem when pasting HTML from Firefox into a Java 6 application on Linux (only!). Here is a minimal example:
import java.awt.Toolkit;
import java.awt.datatransfer.Clipboard;
import java.awt.datatransfer.DataFlavor;
import java.awt.datatransfer.Transferable;
import java.io.Reader;
import java.nio.ByteBuffer;

class ClipboardPrinter {
    public static void main( String args[] ) throws Exception
    {
        Clipboard systemClipboard = Toolkit.getDefaultToolkit()
                .getSystemClipboard();
        Transferable transferData = systemClipboard.getContents(null);
        if (transferData == null) {
            System.out.println("no content");
            return;
        }
        // final DataFlavor htmlFlavorString = new DataFlavor("text/html;class=java.lang.String");
        // String html = (String)transferData.getTransferData(htmlFlavorString);
        // System.out.println("html = '" + html + "'");
        final DataFlavor htmlFlavor = new DataFlavor("text/html;class=java.nio.ByteBuffer;charset=US-ASCII");
        if (!transferData.isDataFlavorSupported(htmlFlavor)) {
            System.out.println("no text/html reader content");
            return;
        }
        ByteBuffer bb = (ByteBuffer)transferData.getTransferData(htmlFlavor);
        byte[] bytes = bb.array();
        for (byte b: bytes)
        {
            System.out.format("%02x", b);
        }
        System.out.println();
        final int cutoff = 2;
        byte[] bytes2 = new byte[bytes.length - cutoff];
        for (int i = cutoff; i < bytes.length; i++)
            bytes2[i-cutoff] = bytes[i];
        final String htmlContent = new String(bytes2, "UTF-16LE");
        System.out.println("htmlContent = '" + htmlContent + "'");
    }
}
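First I tried new DataFlavor("text/html;class=java.lang.String") (the commented-out code in the snippet above), but that yields an unusable string containing two characters with the value 65533 (and cutting off those two characters does not help).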
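Next I used the ByteBuffer data flavor with charset=US-ASCII (I use ASCII on purpose!): charset=UTF-16LE (or UTF-16 or UTF-16BE) does not work at all. With the charset=US-ASCII approach above (together with new String(bytes2, "UTF-16LE")), 7-bit characters work, but umlauts, for example, do not; a '?' is printed instead.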
I cut off two bytes because there seem to be two BOMs at the beginning (I'm not sure about this; it may be something else).
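As a hedged illustration of the cutoff idea, the sketch below skips leading byte-order marks by inspecting the bytes instead of cutting a fixed number of them. It assumes the leading bytes really are UTF-16LE BOMs (FF FE), which the question does not confirm; the class name BomStripper, the method decodeUtf16Le and the sample bytes are made up for this example.

```java
import java.nio.charset.Charset;

public class BomStripper {
    private static final Charset UTF_16LE = Charset.forName("UTF-16LE");

    /** Decodes UTF-16LE bytes, skipping any leading FF FE byte-order marks. */
    static String decodeUtf16Le(byte[] bytes) {
        int offset = 0;
        // Skip every leading FF FE pair rather than a hard-coded byte count.
        while (offset + 1 < bytes.length
                && (bytes[offset] & 0xFF) == 0xFF
                && (bytes[offset + 1] & 0xFF) == 0xFE) {
            offset += 2;
        }
        return new String(bytes, offset, bytes.length - offset, UTF_16LE);
    }

    public static void main(String[] args) {
        // Hypothetical sample: one BOM, then "a" and "ö" encoded as UTF-16LE.
        byte[] sample = {(byte) 0xFF, (byte) 0xFE, 0x61, 0x00, (byte) 0xF6, 0x00};
        System.out.println(decodeUtf16Le(sample));
    }
}
```

This only helps if the bytes that reach the program are still intact UTF-16LE; it cannot restore characters that were already replaced during an earlier conversion.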
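I get a similar result with the data flavor charset=UTF-8 and cutoff = 6 (two three-byte "replacement characters" 0xEFBFBD at the beginning, and umlauts encoded as two erroneous characters). In both cases I used new String(bytes2, "UTF-16LE").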
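One possible reading of the six leading bytes, offered as an assumption rather than a confirmed diagnosis: if the raw clipboard data starts with a UTF-16LE BOM (FF FE) and those two bytes get decoded as UTF-8 somewhere along the way, each becomes U+FFFD (65533), and re-encoding gives two times EF BF BD. The sketch below reproduces just that round trip; the class and method names are invented for the example.

```java
import java.nio.charset.Charset;

public class ReplacementBytesDemo {
    private static final Charset UTF_8 = Charset.forName("UTF-8");

    /** Decodes raw bytes as UTF-8, then encodes the resulting string as UTF-8 again. */
    static byte[] roundTripAsUtf8(byte[] raw) {
        // Malformed input is replaced with U+FFFD by this String constructor.
        String decoded = new String(raw, UTF_8);
        return decoded.getBytes(UTF_8);
    }

    /** Formats bytes as space-separated upper-case hex pairs. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed input: the two bytes of a UTF-16LE byte-order mark.
        byte[] bom = {(byte) 0xFF, (byte) 0xFE};
        System.out.println(toHex(roundTripAsUtf8(bom))); // prints "EF BF BD EF BF BD"
    }
}
```

That would be consistent with both the two 65533 characters seen with the String flavor and with cutoff = 6 for the UTF-8 flavor, but it does not prove where the wrong decoding happens.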
Do you have any suggestions?
Thanks! Any hint is appreciated!
BTW: here are the data flavors supported on my (Linux) system (from transferable.getTransferDataFlavors()):
[java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=java.io.Reader]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=java.lang.String]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=java.nio.CharBuffer]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=[C]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=java.io.InputStream;charset=UTF-16]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=java.nio.ByteBuffer;charset=UTF-16]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=[B;charset=UTF-16]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=java.io.InputStream;charset=UTF-8]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=java.nio.ByteBuffer;charset=UTF-8]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=[B;charset=UTF-8]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=java.io.InputStream;charset=UTF-16BE]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=java.nio.ByteBuffer;charset=UTF-16BE]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=[B;charset=UTF-16BE]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=java.io.InputStream;charset=UTF-16LE]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=java.nio.ByteBuffer;charset=UTF-16LE]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=[B;charset=UTF-16LE]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=java.io.InputStream;charset=ISO-8859-1]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=java.nio.ByteBuffer;charset=ISO-8859-1]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=[B;charset=ISO-8859-1]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=java.io.InputStream;charset=US-ASCII]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=java.nio.ByteBuffer;charset=US-ASCII]
java.awt.datatransfer.DataFlavor[mimetype=text/html;representationclass=[B;charset=US-ASCII]
java.awt.datatransfer.DataFlavor[mimetype=application/x-java-serialized-object;representationclass=java.lang.String]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=java.io.Reader]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=java.lang.String]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=java.nio.CharBuffer]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=[C]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=java.io.InputStream;charset=unicode]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=java.nio.ByteBuffer;charset=UTF-16]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=[B;charset=UTF-16]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=java.io.InputStream;charset=UTF-8]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=java.nio.ByteBuffer;charset=UTF-8]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=[B;charset=UTF-8]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=java.io.InputStream;charset=UTF-16BE]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=java.nio.ByteBuffer;charset=UTF-16BE]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=[B;charset=UTF-16BE]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=java.io.InputStream;charset=UTF-16LE]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=java.nio.ByteBuffer;charset=UTF-16LE]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=[B;charset=UTF-16LE]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=java.io.InputStream;charset=ISO-8859-1]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=java.nio.ByteBuffer;charset=ISO-8859-1]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=[B;charset=ISO-8859-1]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=java.io.InputStream;charset=US-ASCII]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=java.nio.ByteBuffer;charset=US-ASCII]
java.awt.datatransfer.DataFlavor[mimetype=text/plain;representationclass=[B;charset=US-ASCII]
java.awt.datatransfer.DataFlavor[mimetype=text/x-moz-url-priv;representationclass=java.io.InputStream]
java.awt.datatransfer.DataFlavor[mimetype=text/_moz_htmlinfo;representationclass=java.io.InputStream]
java.awt.datatransfer.DataFlavor[mimetype=text/_moz_htmlcontext;representationclass=java.io.InputStream]
java.awt.datatransfer.DataFlavor[mimetype=text/x-moz-url-priv;representationclass=java.nio.ByteBuffer]
java.awt.datatransfer.DataFlavor[mimetype=text/_moz_htmlinfo;representationclass=java.nio.ByteBuffer]
java.awt.datatransfer.DataFlavor[mimetype=text/_moz_htmlcontext;representationclass=java.nio.ByteBuffer]
java.awt.datatransfer.DataFlavor[mimetype=text/x-moz-url-priv;representationclass=[B]
java.awt.datatransfer.DataFlavor[mimetype=text/_moz_htmlinfo;representationclass=[B]
java.awt.datatransfer.DataFlavor[mimetype=text/_moz_htmlcontext;representationclass=[B]]
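For anyone who wants to produce a shorter version of this list on their own system, here is a minimal sketch that keeps only the flavors of one MIME type. The class name FlavorLister and the helper withMimeType are made up for this example; the main method needs a graphical session with a system clipboard.

```java
import java.awt.Toolkit;
import java.awt.datatransfer.DataFlavor;
import java.awt.datatransfer.Transferable;
import java.util.ArrayList;
import java.util.List;

public class FlavorLister {
    /** Returns the flavors whose MIME type (parameters ignored) equals mimeType. */
    static List<DataFlavor> withMimeType(DataFlavor[] flavors, String mimeType) {
        List<DataFlavor> result = new ArrayList<DataFlavor>();
        for (DataFlavor flavor : flavors) {
            if (flavor.isMimeTypeEqual(mimeType)) {
                result.add(flavor);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Transferable contents =
                Toolkit.getDefaultToolkit().getSystemClipboard().getContents(null);
        if (contents == null) {
            System.out.println("no content");
            return;
        }
        // Print representation class and charset parameter (null if absent).
        for (DataFlavor flavor : withMimeType(contents.getTransferDataFlavors(), "text/html")) {
            System.out.println(flavor.getRepresentationClass().getName()
                    + " charset=" + flavor.getParameter("charset"));
        }
    }
}
```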
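I believe the problem is that he reads from the clipboard as US-ASCII, then converts to Unicode and expects the German umlauts to stay intact. Since US-ASCII is a 7-bit character set, it does not include the German umlauts, so they are already lost once the clipboard has been read as US-ASCII.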
public class CharsetDemo {
    public static void main(String[] args) throws Exception {
        byte[] bytes;

        // convert the German umlaut to bytes in US-ASCII charset
        bytes = "ö".getBytes("US-ASCII");
        System.out.println("US-ASCII");
        System.out.println("bytes : " + asHexString(bytes));
        System.out.println("string: " + new String(bytes, "US-ASCII"));
        System.out.println();

        // create a unicode string from the US-ASCII bytes
        String utf8String = new String(bytes, "UTF-8");
        bytes = utf8String.getBytes("UTF-8");
        System.out.println("UTF-8");
        System.out.println("bytes : " + asHexString(bytes));
        System.out.println("string: " + utf8String);
        System.out.println();

        // convert the German umlaut to bytes in ISO-8859-1 charset
        bytes = "ö".getBytes("ISO-8859-1");
        System.out.println("ISO 8859-1");
        System.out.println("bytes : " + asHexString(bytes));
        System.out.println("string: " + new String(bytes, "ISO-8859-1"));
        System.out.println();

        // create a unicode string from the ISO-8859-1 bytes
        utf8String = new String(bytes, "UTF-8");
        bytes = utf8String.getBytes("UTF-8");
        System.out.println("UTF-8");
        System.out.println("bytes : " + asHexString(bytes));
        System.out.println("string: " + utf8String);
        System.out.println();

        // bytes of the "REPLACEMENT CHARACTER"
        System.out.println("replacement character bytes: "
                + asHexString("\uFFFD".getBytes("UTF-8")));
    }

    static String asHexString(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%X ", b));
        }
        return sb.toString();
    }
}
Output
US-ASCII
bytes : 3F
string: ? <--- a literal question mark (0x3F), substituted because "ö" cannot be encoded in US-ASCII
UTF-8
bytes : 3F
string: ?
ISO 8859-1
bytes : F6
string: ö
UTF-8
bytes : EF BF BD <-- the "REPLACEMENT CHARACTER", as the single byte "F6" is not a valid UTF-8 sequence
string: �
replacement character bytes: EF BF BD
Java 6 is no longer supported, so the question is obsolete.