Text changes when copied from Word document to web page
I am creating a blog engine, and it includes a <textarea> which takes the input of the whole article.
I then use Ajax to send it to the server and store it in the Text type provided by the GAE datastore.
The problem: if a user copies text from a Word document, I see various random characters on the screen when the text is embedded in the web page. I know this is because the Word file uses XML encoding and an HTML page uses UTF-8 encoding (in my case).
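"Random characters" of this kind are usually mojibake: bytes written in one encoding and read back in another. The sketch below reproduces the effect for Word's curly apostrophe, assuming the common case where UTF-8 bytes are decoded as Windows-1252 (the exact pipeline here is not known). The helper names `decodeAsWindows1252` and `simulateMojibake` are hypothetical.

```javascript
// Assumption: the garbling comes from UTF-8 bytes being decoded as Windows-1252.
// Windows-1252 differs from ISO-8859-1 only in the 0x80-0x9F range; bytes it
// leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) map to the same code point.
const CP1252_HIGH = {
  0x80: "\u20AC", 0x82: "\u201A", 0x83: "\u0192", 0x84: "\u201E",
  0x85: "\u2026", 0x86: "\u2020", 0x87: "\u2021", 0x88: "\u02C6",
  0x89: "\u2030", 0x8A: "\u0160", 0x8B: "\u2039", 0x8C: "\u0152",
  0x8E: "\u017D", 0x91: "\u2018", 0x92: "\u2019", 0x93: "\u201C",
  0x94: "\u201D", 0x95: "\u2022", 0x96: "\u2013", 0x97: "\u2014",
  0x98: "\u02DC", 0x99: "\u2122", 0x9A: "\u0161", 0x9B: "\u203A",
  0x9C: "\u0153", 0x9E: "\u017E", 0x9F: "\u0178",
};

// Decode a byte array one byte per character, Windows-1252 style.
function decodeAsWindows1252(bytes) {
  let out = "";
  for (const b of bytes) {
    out += CP1252_HIGH[b] ?? String.fromCharCode(b);
  }
  return out;
}

// Encode text as UTF-8, then read those bytes back with the wrong charset.
function simulateMojibake(text) {
  const utf8Bytes = new TextEncoder().encode(text);
  return decodeAsWindows1252(utf8Bytes);
}

console.log(simulateMojibake("It\u2019s")); // prints "Itâ€™s"
```

The curly apostrophe U+2019 is three bytes in UTF-8 (E2 80 99), so one character turns into three "random" ones, while plain ASCII passes through unchanged.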
The question: how do I change the encoding of the input text? Or how can I avoid the XML encoding? Or would changing the encoding of my web page help solve this problem?
Points to note: I want this to be automated. I have read on Google that you should first copy the text into a plain text editor, which normalizes the encoding, and then copy it into the web page. But this option is not feasible for me.
Also, I have used Weebly before and copied text from a Word file into it there. If someone knows how Weebly manages the encoding conflict, please share!
Answers are expected in Java :)
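That is because Word replaces characters such as ' (apostrophe) and " (double quote) with typographic "smart" characters (curly quotes, dashes, ellipses). These are valid Unicode characters, but they show up as random characters when some step between the browser and the stored text does not consistently use UTF-8, so one option is to replace them programmatically with plain ASCII equivalents.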
Below is an example in JavaScript:

```html
<textarea rows="4" cols="50" oninput="this.value = replaceWordChars(this.value)"></textarea>

<script>
// Replace Word's typographic characters with plain ASCII equivalents
// and return the cleaned string.
function replaceWordChars(text) {
  var s = text;
  // smart single quotes and apostrophe
  s = s.replace(/[\u2018\u2019\u201A]/g, "'");
  // smart double quotes
  s = s.replace(/[\u201C\u201D\u201E]/g, "\"");
  // ellipsis
  s = s.replace(/\u2026/g, "...");
  // en and em dashes
  s = s.replace(/[\u2013\u2014]/g, "-");
  // modifier circumflex
  s = s.replace(/\u02C6/g, "^");
  // single left-pointing angle quotation mark
  s = s.replace(/\u2039/g, "<");
  // single right-pointing angle quotation mark
  s = s.replace(/\u203A/g, ">");
  // small tilde
  s = s.replace(/\u02DC/g, "~");
  // non-breaking space
  s = s.replace(/\u00A0/g, " ");
  return s;
}
</script>
```

Run this function from the textarea's input event rather than onkeyup: input also fires when text is pasted with the mouse or the context menu, which keyup does not.
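The replacements in replaceWordChars can also be written as one lookup table applied in a single pass, which keeps the mapping in one place and lets it be tested outside the browser. This is a sketch; `WORD_CHAR_MAP` and `normalizeWordText` are hypothetical names, and the table mostly mirrors the replacements above, with U+02DC (small tilde) mapped to "~".

```javascript
// Hypothetical table mapping Word "smart" characters to ASCII stand-ins.
const WORD_CHAR_MAP = {
  "\u2018": "'", "\u2019": "'", "\u201A": "'",    // single quotes
  "\u201C": "\"", "\u201D": "\"", "\u201E": "\"", // double quotes
  "\u2026": "...",                                // ellipsis
  "\u2013": "-", "\u2014": "-",                   // en and em dash
  "\u02C6": "^",                                  // modifier circumflex
  "\u2039": "<", "\u203A": ">",                   // single angle quotes
  "\u02DC": "~",                                  // small tilde (mapped to "~" here)
  "\u00A0": " ",                                  // non-breaking space
};

// One character class built from the table keys, e.g. [‘’‚“”„…–—ˆ‹›˜ ].
const WORD_CHAR_PATTERN = new RegExp("[" + Object.keys(WORD_CHAR_MAP).join("") + "]", "g");

// Replace every matched character with its table entry in a single pass.
function normalizeWordText(text) {
  return text.replace(WORD_CHAR_PATTERN, (ch) => WORD_CHAR_MAP[ch]);
}
```

In the page it would be wired the same way, for example `oninput="this.value = normalizeWordText(this.value)"`. Adding a new character only means adding a table entry.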
Not sure if this will help anyone, but I spent a few days trying to figure out this issue. My use case was very similar, except I discovered my problem was related to the way the clipboard copied the text (this varied slightly between operating systems) and then pasted it. (I used ClipSpy to investigate what was happening "under the hood".)
Forgive my layman's explanation: the clipboard stores text in multiple formats, and when the paste command is given it attempts to match the charset/encoding of the recipient program, or in my case the <textarea> box of my web page. These sites and forum posts helped immensely:
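The formats the clipboard holds can also be inspected from the page itself, roughly what ClipSpy shows at the OS level. The sketch below uses the standard paste event and its `clipboardData` (a `DataTransfer`); the textarea id `article` and the helper `describeClipboard` are hypothetical. The listed formats depend on the browser and OS; a paste from Word often includes both `text/plain` and `text/html`.

```javascript
// Summarise every format (MIME type) the clipboard offers for one paste.
// `clipboardData` is the DataTransfer object carried by a paste event.
function describeClipboard(clipboardData) {
  return Array.from(clipboardData.types, (type) => ({
    type,
    length: clipboardData.getData(type).length,
  }));
}

// Browser-only wiring: log the formats when something is pasted into
// a textarea with the hypothetical id "article".
if (typeof document !== "undefined") {
  const area = document.getElementById("article");
  if (area) {
    area.addEventListener("paste", (event) => {
      console.table(describeClipboard(event.clipboardData));
    });
  }
}
```

A <textarea> only inserts the `text/plain` flavour, so comparing it with the other entries shows what was already lost or converted before the page ever saw the text.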
Ultimately, all I had to do was declare the charset early in the page, <head> <meta charset="UTF-8"> </head>, and let the browser do the "hard" work for me: it then expects UTF-8 encoded text, and the clipboard attempts to honour that.
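Declaring UTF-8 on the page covers the browser side. The Ajax request that carries the article to the server should also name its charset, because servlet containers typically decode request bodies as ISO-8859-1 when the client specifies none. A minimal sketch, assuming a `fetch`-based request; `buildSaveRequest` and `utf8ByteLength` are hypothetical helpers.

```javascript
// Build fetch() options that send the article body as UTF-8.
function buildSaveRequest(article) {
  return {
    method: "POST",
    // Naming the charset lets the server decode the body as UTF-8
    // instead of falling back to its own default.
    headers: { "Content-Type": "text/plain; charset=UTF-8" },
    body: article, // fetch encodes string bodies as UTF-8
  };
}

// Number of bytes the text occupies once UTF-8 encoded.
function utf8ByteLength(text) {
  return new TextEncoder().encode(text).length;
}
```

Usage in the page might look like `fetch("/save", buildSaveRequest(textarea.value))`, where `/save` is a hypothetical endpoint. On the Java side, calling `request.setCharacterEncoding("UTF-8")` before reading the body is a common safeguard. A curly apostrophe counts as three bytes here, which is why a mismatch anywhere along the way multiplies it into several wrong characters.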