Ruby string encoding in Ruby 1.8.7

I am creating a Ruby string using the Ruby C API (from Objective-C), and it happens to contain Finnish characters.

Once in Ruby, I call a gem that does some manipulation and truncates the string, but the encoded characters get truncated improperly, very much like in this question:

How to get a Ruby substring of a Unicode string?

An example string is H pääsee syvemmälle A elämään. The umlauts get changed into things like \\30333, but when truncated this ends up as \\303, which is a problem.

I don't want to hack the gem to get around this issue, as I have tested with the same string opened directly in Ruby and it worked fine.

So I know that I'm passing something incorrectly to Ruby.

Here is how I turn the NSString into a VALUE to be used in Ruby.

- (VALUE) toRubyValue {
    size_t data_length = [self lengthOfBytesUsingEncoding:NSUTF8StringEncoding];
    size_t buffer_length = data_length + 1;
    char buf[buffer_length];
    [self getCString:buf maxLength:buffer_length encoding:NSUTF8StringEncoding];
    return rb_str_new(buf, data_length);
}

I'm on Ruby 1.8.7.

What is the best way to address this problem? I'm happy to do it in either Ruby or C (or Objective-C), but I would rather not use any Ruby gems that have native C extensions.

I don't think you're passing something incorrectly to Ruby. You are creating a UTF-8 encoded Ruby 1.8 string. Ruby 1.8 doesn't care about encodings, though, and treats strings as arrays of bytes. This means that any incorrect piece of Ruby code can produce the results you talk about. 'Hacking' the gem is really your only option.

And upgrading to 1.9 or even 2.0 is your best way out.
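
To illustrate the byte-versus-character point, here is a minimal sketch of what a character-safe truncation could look like if you do end up patching the gem. The helper name `truncate_utf8` is hypothetical and is not part of the gem or of Ruby itself. It relies only on `String#unpack` and `Array#pack` with the `U` directive, which decode and encode UTF-8 code points. The sample string uses octal escapes for ä (`\303\244`) so that the bytes are explicit.

```ruby
# Hypothetical helper (an assumption for illustration, not the gem's code).
# Truncates a UTF-8 string to at most max_chars characters without
# splitting a multi-byte sequence.
def truncate_utf8(str, max_chars)
  # unpack('U*') decodes the UTF-8 bytes into an array of code points;
  # pack('U*') re-encodes the kept code points as UTF-8.
  str.unpack('U*')[0, max_chars].pack('U*')
end

# "H pää" written with explicit UTF-8 bytes: ä is \303\244 (0xC3 0xA4).
text = "H p\303\244\303\244"

# Cutting by bytes, which is what indexing does in Ruby 1.8,
# can stop in the middle of a character.
byte_cut = text.unpack('C*')[0, 4]
p byte_cut                        # [72, 32, 112, 195], ends in a lone \303

# Cutting by code points keeps each character whole.
safe = truncate_utf8(text, 4)
p safe.unpack('C*')               # [72, 32, 112, 195, 164]
```

The byte-based cut ends with a dangling 195 (octal 303), which matches the stray `\303` described in the question, while the code-point cut keeps both bytes of the ä.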
