How to break up non-English text into constituent characters in JavaScript?

I am trying to draw text along a curve on an HTML5 canvas. To do this, I need to break the input text into its constituent characters, each of which can be rotated and translated individually. Breaking up the text is easy for English: given an input string s, s[i] gives the ith character. But this does not work for non-English strings. I have a jsfiddle here illustrating the problem: http://jsfiddle.net/c6HV8/. Note that the fiddle renders differently in Chrome and IE at the time of this writing.

To see the problem, suppose you have non-English text in a string s. Create a text node and pass it s. Next, create a text node for each s[i] and display those text nodes adjacent to each other. Now compare the results. They are not the same. How can I break up non-English text into constituent characters in JavaScript so that the two results are the same?
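To make the mismatch concrete, here is a minimal sketch, assuming a short Devanagari sample string (it is not the text used in the fiddle). It only shows how s[i] cuts the string; the variable names are illustrative.

```javascript
// Sample string "भाईसाब", written with escapes so the code points are explicit.
const s = "\u092D\u093E\u0908\u0938\u093E\u092C";

// Indexing walks UTF-16 code units, so each vowel sign becomes its own piece.
const units = [];
for (let i = 0; i < s.length; i++) {
  units.push(s[i]);
}

// Label each piece with its code point to see where the string was cut.
const describe = (ch) =>
  "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");

console.log(units.map(describe).join(" "));
// → U+092D U+093E U+0908 U+0938 U+093E U+092C
```

The two U+093E entries are the vowel sign AA, which ends up in a text node of its own instead of staying with the consonant before it.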

भाईसाब :) As I'm sure you already know, the problem is that fillText and createTextNode normally work on the entire string, so the renderer can evaluate the string together with all of its diacritic marks (combining characters). When you call fillText and createTextNode once per character, however, the diacritics are separated from the characters they are supposed to attach to. They are evaluated and drawn on their own, which is why you see each diacritic with a dotted circle (a placeholder that says: put a character here).

There is no easy way to do this, really. Your algorithm would basically have to look like this:

  • Look up the current character in the string.
  • Find all the characters immediately following it that are diacritics, and combine them with it into a new string.
  • Render that string using fillText.
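The steps above can be sketched roughly as follows. This is not the code from the forked fiddle: the names DEVANAGARI_MARK_RANGES, isMark, splitIntoClusters and drawClusters are hypothetical, the range table covers only the Devanagari block as an illustration, and the text is assumed to contain no characters outside the Basic Multilingual Plane.

```javascript
// Illustrative assumption: combining-mark ranges for the Devanagari block only.
// Other scripts would need their own ranges.
const DEVANAGARI_MARK_RANGES = [
  [0x0900, 0x0903], // signs such as candrabindu, anusvara, visarga
  [0x093a, 0x093c], // vowel signs OE and OOE, nukta
  [0x093e, 0x094f], // dependent vowel signs and virama
  [0x0951, 0x0957], // accent marks and additional vowel signs
  [0x0962, 0x0963], // vowel signs vocalic L and LL
];

function isMark(ch) {
  const code = ch.charCodeAt(0);
  return DEVANAGARI_MARK_RANGES.some(([lo, hi]) => code >= lo && code <= hi);
}

// Steps 1 and 2: take a character, then absorb every mark that follows it.
function splitIntoClusters(text) {
  const clusters = [];
  let i = 0;
  while (i < text.length) {
    let cluster = text[i++];
    while (i < text.length && isMark(text[i])) {
      cluster += text[i++];
    }
    clusters.push(cluster);
  }
  return clusters;
}

// Step 3: render each cluster with fillText, advancing by its measured width.
// For text on a curve, the translate/rotate calls would wrap the fillText call.
function drawClusters(ctx, text, x, y) {
  for (const cluster of splitIntoClusters(text)) {
    ctx.fillText(cluster, x, y);
    x += ctx.measureText(cluster).width;
  }
  return x; // x position after the last cluster
}
```

With a canvas context, usage would look like drawClusters(canvas.getContext("2d"), s, 10, 50). Note that this sketch does not merge consonant conjuncts joined by a virama; the consonant after the virama starts a new cluster.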

You can check out the results in a forked version of your fiddle. I modified the sample text to add some more complex characters, just to make sure the algorithm works properly. The code could definitely be cleaned up; I just did it as a proof of concept.

The hard part, if you want to internationalize this, is coming up with a list of diacritic code points for all languages. This answer provides a list that should help you get started.
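One possible way to avoid maintaining such a list by hand, assuming the JavaScript engine supports Unicode property escapes in regular expressions (the u flag with \p{...}), is to let the engine's Unicode data identify marks. This swaps in a different technique from the hand-built list described above; CLUSTER_PATTERN and splitByMarks are hypothetical names.

```javascript
// Assumption: the runtime supports Unicode property escapes (ES2018 and later).
// \P{M} matches one code point that is not a mark; \p{M}* matches the marks
// that follow it. The second alternative catches marks with no base character.
const CLUSTER_PATTERN = /\P{M}\p{M}*|\p{M}+/gu;

function splitByMarks(text) {
  // match() returns null when nothing matches, e.g. for an empty string.
  return text.match(CLUSTER_PATTERN) || [];
}

// "भाईसाब" splits into four pieces, each keeping its vowel sign.
console.log(splitByMarks("\u092D\u093E\u0908\u0938\u093E\u092C").length);
// → 4
```

As with any split based purely on base characters and marks, sequences that a script joins in other ways (for example through a virama) may still need extra handling.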
