I don't understand the string to byte aspect of computing/javascript

Two years on, I have come back to this topic, where I see people discussing the same thing, and I still don't understand what is going on.

Following this SO post:

String length in bytes in JavaScript

I want to understand this part of JavaScript! I am also interested in calculating the size in kB of a bitcoin transaction before I push it to the blockchain. The more important of the two, though, is that I finally understand what these users are doing, because it has come up more than once and I just don't get it!

I've tried three of the functions given as answers, but they all seem to do nothing more than return string.length, whereas I would expect them to return a different value (the overhead of the string in bytes/kilobytes/megabytes).

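function byteCount(s) {
    // encodeURI writes each non-ASCII byte as %XX; the split counts one piece per %XX or plain character
    return encodeURI(s).split(/%..|./).length - 1;
}

console.log(byteCount('hello'), 'hello'.length); // 5 5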


function getLengthInBytes(str) {
    // counts every character above U+00FF a second time, i.e. treats it as two bytes
    var b = str.match(/[^\x00-\xff]/g);
    return str.length + (!b ? 0 : b.length);
}

console.log(getLengthInBytes('hello'), 'hello'.length); // 5 5


console.log(new TextEncoder().encode('hello').length, 'hello'.length); // 5 5

It's annoying that this makes no sense to me! Clearly these people would not be discussing how to get something they could easily get with string.length, so what are they trying, and succeeding, to return?

Should the string instead be binary? (Like so: How to convert text to binary code in JavaScript?)

You are testing with basic ASCII characters. (Strictly speaking they are encoded as UTF-8 here, but the ASCII range is encoded the same way in both: one byte per character.) That is why the byte count equals string.length. Try with a character outside that range.

 console.log(new TextEncoder().encode('😁').length, '😁'.length); // 4 2
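
To make the difference easier to see, here is a small sketch comparing string.length (which counts UTF-16 code units) with the UTF-8 byte count from TextEncoder. The helper name utf8Bytes and the sample strings are my own choices for illustration, not something from the question:

```javascript
// Hypothetical helper: number of bytes the string occupies when encoded as UTF-8.
function utf8Bytes(s) {
    return new TextEncoder().encode(s).length;
}

// Sample strings chosen for illustration, written with escapes so the
// exact code points are unambiguous: "hello", "é", "€", "😁".
var samples = ['hello', '\u00e9', '\u20ac', '\ud83d\ude01'];

var report = samples.map(function (s) {
    return { text: s, length: s.length, bytes: utf8Bytes(s) };
});

console.log(report);
// hello: length 5, bytes 5
// é:     length 1, bytes 2
// €:     length 1, bytes 3
// 😁:    length 2, bytes 4
```

So the two numbers only agree while every character is in the ASCII range; beyond that, the byte count grows faster than string.length.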

There are a lot of different characters in the world, and they don't all fit in one byte of data. That's why some characters use more than one byte. Some examples: "Äüöôś"
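
As a rough check of those examples, the sketch below counts the UTF-8 bytes of each character in that string and compares the total with the byteCount function from the question. It assumes the characters are the precomposed forms (written as escapes here), and the variable names are just for illustration:

```javascript
// The example string "Äüöôś", written with escapes (assumed precomposed code points).
var text = '\u00c4\u00fc\u00f6\u00f4\u015b';

// UTF-8 byte count per character; Array.from splits the string by code point.
var perChar = Array.from(text).map(function (ch) {
    return new TextEncoder().encode(ch).length;
});

// Total UTF-8 byte count of the whole string.
var totalBytes = new TextEncoder().encode(text).length;

// byteCount as posted in the question.
function byteCount(s) {
    return encodeURI(s).split(/%..|./).length - 1;
}

console.log(text.length, perChar, totalBytes, byteCount(text));
// 5 [ 2, 2, 2, 2, 2 ] 10 10
```

Five characters, ten bytes: each of these letters takes two bytes in UTF-8, and the encodeURI-based function agrees with TextEncoder.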
