Normalizing strings in Elixir / Phoenix

Question

I want to normalize the Unicode (UTF-8) strings that users post through a <form>. Is there a library that handles this in Elixir (or in Phoenix or Erlang)? I'm used to doing it in Python as shown below, but I don't know whether Elixir has equivalent libraries.

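import unicodedata
import zenhan
import jctconv

def normalize(strings, unistr='NFKC'):
    # Apply the requested Unicode normalization form (NFKC by default).
    norm = unicodedata.normalize(unistr, strings)
    # Full-width to half-width conversion; the local name must not shadow the zenhan module.
    half = zenhan.z2h(norm, mode=2)
    # Katakana to hiragana conversion.
    katahira = jctconv.kata2hira(half)

    return katahira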

Answer 1

Since Elixir 1.2 there is a String.normalize/2 function. I'm not sure what those Python libraries do, but this function is probably a good start for what you want to achieve.
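
For the pipeline in the question, a minimal sketch could combine compatibility normalization with a katakana-to-hiragana shift. It assumes Erlang/OTP 20 or later for :unicode.characters_to_nfkc_binary/1. The Normalizer module and its function names are hypothetical, and the fixed code point offset only approximates what jctconv.kata2hira does.

```elixir
defmodule Normalizer do
  # Hypothetical helper module; not part of Elixir, Phoenix, or Erlang.

  # Katakana block U+30A1..U+30F6 sits 0x60 above the matching hiragana.
  @kata_first 0x30A1
  @kata_last 0x30F6
  @offset 0x60

  # NFKC-normalize a UTF-8 binary, then map katakana to hiragana.
  def normalize(string) when is_binary(string) do
    string
    |> :unicode.characters_to_nfkc_binary()
    |> kata_to_hira()
  end

  # Walk the string code point by code point and shift katakana down.
  defp kata_to_hira(string) do
    for <<cp::utf8 <- string>>, into: "" do
      if cp in @kata_first..@kata_last do
        <<(cp - @offset)::utf8>>
      else
        <<cp::utf8>>
      end
    end
  end
end
```

With this sketch, Normalizer.normalize("ＡＢＣ１２３") gives "ABC123", and half-width katakana such as "ｶﾞ" is first composed to full-width by NFKC and then mapped to "が".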

Answer 2

If you type h String.normalize inside iex, you'll get the relevant documentation and some examples.

Converts all characters in binary to Unicode normalization form identified by
form.

Forms

The supported forms are:

  • :nfd - Normalization Form Canonical Decomposition. Characters are
    decomposed by canonical equivalence, and multiple combining characters are
    arranged in a specific order.
  • :nfc - Normalization Form Canonical Composition. Characters are
    decomposed and then recomposed by canonical equivalence.

Examples

┃ iex> String.normalize("yêṩ", :nfd)
┃ "yêṩ"
┃
┃ iex> String.normalize("leña", :nfc)
┃ "leña"
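
The input and output in those examples look identical on screen because the two forms differ only in their underlying code points. A small sketch that makes the difference visible (the NormalizationDemo module is a hypothetical helper, not part of Elixir):

```elixir
defmodule NormalizationDemo do
  # Hypothetical helper: returns the code points and the UTF-8 byte size.
  def describe(string) do
    {String.to_charlist(string), byte_size(string)}
  end
end

# "ñ" as a single precomposed code point (U+00F1).
composed = "\u00F1"
# NFD splits it into "n" (U+006E) plus a combining tilde (U+0303).
decomposed = String.normalize(composed, :nfd)

IO.inspect(NormalizationDemo.describe(composed))    # code points [241], 2 bytes
IO.inspect(NormalizationDemo.describe(decomposed))  # code points [110, 771], 3 bytes

# NFC recomposes the pair back into the single code point.
IO.inspect(String.normalize(decomposed, :nfc) == composed)
```

Comparing user input with == is only reliable once both sides have been normalized to the same form.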

Answer 1
3 2016-01-10 13:01:51

Answer 2
1 2016-03-25 18:10:13
