Text changes when copied from Word document to web page

I am creating a blog engine that includes a <textarea> for entering the whole article.

I then send it with Ajax and store it using the Text type provided by the GAE datastore.
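A minimal sketch of the Ajax step that keeps the charset explicit. It assumes a hypothetical `/save-article` endpoint that reads a form field named `article`, and a textarea with the hypothetical id `article`. `URLSearchParams` always percent-encodes its values as UTF-8, regardless of the page's own charset, so a curly apostrophe is sent as `%E2%80%99`.

```javascript
// Builds fetch() options for posting the article text as a UTF-8 form body.
// The endpoint "/save-article" and the field name "article" are hypothetical.
function buildArticleRequest(articleText) {
  // URLSearchParams serializes with UTF-8 percent-encoding (spaces become "+").
  const body = new URLSearchParams({ article: articleText }).toString();
  return {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8" },
    body: body,
  };
}

// Browser usage:
// fetch("/save-article", buildArticleRequest(document.getElementById("article").value));
```

On a Java servlet backend, calling `request.setCharacterEncoding("UTF-8")` before the first `getParameter(...)` call makes the server decode that body with the same charset.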

The problem: if a user copies text from a Word document, I see various random characters on the screen once it is embedded in the web page. I know this is because the Word file uses XML encoding and the HTML page uses UTF-8 encoding (in my case).
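One common way this kind of garbage appears (an assumption here, since the question does not show the actual bytes) is UTF-8 text being decoded somewhere with a different charset, typically Windows-1252. The sketch below reproduces that: Word's curly apostrophe U+2019 is three bytes in UTF-8, and reading those bytes as Windows-1252 yields `â€™`. `decodeWindows1252` is a hand-written helper for the demonstration, not a library function.

```javascript
// Windows-1252 differs from ISO-8859-1 only in bytes 0x80-0x9F; this table maps
// those bytes to Unicode (undefined slots map to the C1 control of the same value).
const CP1252_HIGH = [
  0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
  0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
  0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
  0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178,
];

// Decodes a byte array as Windows-1252 (hand-written helper for this demo).
function decodeWindows1252(bytes) {
  let out = "";
  for (const b of bytes) {
    out += String.fromCharCode(b >= 0x80 && b <= 0x9F ? CP1252_HIGH[b - 0x80] : b);
  }
  return out;
}

const original = "It\u2019s done\u2026";             // text as Word types it: It’s done…
const utf8Bytes = new TextEncoder().encode(original); // the UTF-8 bytes of that text

// Wrong charset on the reading side: each multi-byte character becomes 3 symbols.
const misread = decodeWindows1252(utf8Bytes);          // "Itâ€™s doneâ€¦"

// Matching charset on the reading side: the text comes back unchanged.
const correct = new TextDecoder("utf-8").decode(utf8Bytes);

console.log(misread);
console.log(correct === original); // true
```

Since decoding with the matching charset returns the original string, this suggests using UTF-8 consistently end to end rather than converting the text itself.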

The question: how do I change the encoding of the input text? Or how can I avoid the XML encoding? Or would changing the encoding of my web page help solve this problem?

Points to note: I want this to be automated. I have read on Google that you should first copy the text into a simple text editor, which normalizes the encoding, and then copy it into the web page. But that option is not feasible for me.

Also, I have used Weebly before, and there too I copied text from a Word file. Does anyone know how Weebly manages the encoding conflict?

I'm expecting answers in Java :)

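That is because Word replaces characters such as the ' (apostrophe) and " with "smart" typographic characters (curly quotes, en/em dashes, the ellipsis, non-breaking spaces) that lie outside plain ASCII. When they are not handled consistently as UTF-8 along the way they show up as garbage, so one option is to replace them programmatically.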

Below is an example in JavaScript:

<textarea id="article" rows="4" cols="50" onkeyup="replaceWordChars(this)"></textarea>

<script>
// Replaces the "smart" typographic characters Word inserts with plain ASCII equivalents.
function replaceWordChars(textarea) {
    var s = textarea.value;
    // smart single quotes and apostrophe
    s = s.replace(/[\u2018\u2019\u201A]/g, "'");
    // smart double quotes
    s = s.replace(/[\u201C\u201D\u201E]/g, '"');
    // ellipsis
    s = s.replace(/\u2026/g, "...");
    // en and em dashes
    s = s.replace(/[\u2013\u2014]/g, "-");
    // modifier letter circumflex
    s = s.replace(/\u02C6/g, "^");
    // single angle quotation marks
    s = s.replace(/\u2039/g, "<");
    s = s.replace(/\u203A/g, ">");
    // small tilde
    s = s.replace(/\u02DC/g, "~");
    // non-breaking space
    s = s.replace(/\u00A0/g, " ");
    // write back only when something changed, so the caret is not reset on every keystroke
    if (s !== textarea.value) {
        textarea.value = s;
    }
}
</script>

Fire this function from the textarea's onkeyup event.
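Curly quotes, dashes and the non-breaking space are ordinary Unicode characters, so before deciding what to replace it can help to see exactly which code points a pasted paragraph contains. A small helper for that; the name `listNonAscii` is hypothetical.

```javascript
// Lists every non-ASCII character in a string with its Unicode code point,
// e.g. to see what a paste from Word actually contains.
function listNonAscii(text) {
  const found = [];
  for (const ch of text) {            // iterates by code point, not by UTF-16 unit
    const cp = ch.codePointAt(0);
    if (cp > 0x7F) {                  // everything above the ASCII range
      found.push("U+" + cp.toString(16).toUpperCase().padStart(4, "0") + " " + ch);
    }
  }
  return found;
}

// console.log(listNonAscii(document.getElementById("article").value));
```

Any code point that shows up here and is not covered by `replaceWordChars` can be added to its list of replacements.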

Not sure if this will help anyone, but I spent a few days trying to figure out this issue. My use case was very similar, except that I discovered my problem was related to the way the clipboard copied the text (this varied slightly depending on the OS) and then pasted it. (I used ClipSpy to investigate what was happening "under the hood".)

Forgive my layman's explanation: the clipboard stores text in multiple formats, and when the paste command is given it tries to match the charset/encoding of the recipient program, in my case the <textarea> on my web page. These sites and forum posts helped immensely:
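The same multi-format clipboard can also be inspected from inside the page: on a paste event, `event.clipboardData.types` lists the formats on offer (for a copy from Word this usually includes `text/plain` and `text/html`), and `getData(type)` returns each one. A `<textarea>` itself only ever receives the plain-text format. The helper name `describeClipboard` and the textarea id `article` are hypothetical.

```javascript
// Summarises the formats offered on a paste event,
// a rough in-page counterpart to inspecting the clipboard with ClipSpy.
function describeClipboard(clipboardData) {
  return Array.from(clipboardData.types).map((type) => ({
    type: type,
    length: clipboardData.getData(type).length, // length in UTF-16 code units
  }));
}

// Browser wiring; assumes a <textarea id="article"> (hypothetical id).
if (typeof document !== "undefined") {
  const area = document.getElementById("article");
  if (area) {
    area.addEventListener("paste", (event) => {
      console.table(describeClipboard(event.clipboardData));
    });
  }
}
```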

Ultimately, all I had to do was declare the charset early in the page, <head> <meta charset="UTF-8"> </head>, and let the browser do the "hard" work for me: the page then expects UTF-8-encoded text, and the clipboard attempts to honour that.
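A minimal sketch of producing such a page, with the charset declared both in the HTTP `Content-Type` value and as the first element of `<head>`. The HTML standard requires the `<meta charset>` declaration to fall completely within the first 1024 bytes of the document. `renderArticlePage` and `escapeHtml` are hypothetical names; the header value would be set by whatever server sends the page.

```javascript
// Escapes text for use inside HTML element content.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Builds an article page whose charset is declared early in <head>,
// plus the matching Content-Type header value.
function renderArticlePage(title, articleText) {
  const html =
    "<!DOCTYPE html>\n" +
    "<html>\n<head>\n" +
    '<meta charset="UTF-8">\n' + // first element of <head>, well inside the first 1024 bytes
    "<title>" + escapeHtml(title) + "</title>\n" +
    "</head>\n<body>\n" +
    "<article>" + escapeHtml(articleText) + "</article>\n" +
    "</body>\n</html>\n";
  return { contentType: "text/html; charset=UTF-8", html: html };
}
```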
