
Delphi WideString and Delphi 2009+

I am writing a class that will save wide strings to a binary file. I'm using Delphi 2005 for this, but the app will later be ported to Delphi 2010. I'm feeling very unsure here; can someone confirm that:

  1. A Delphi 2005 WideString is exactly the same type as a Delphi 2010 String

  2. A Delphi 2005 WideString char, as well as a Delphi 2010 String char, is guaranteed to always be 2 bytes in size.

With all the Unicode formats out there, I don't want to be hit by one of the chars in my string suddenly being 3 bytes wide or something like that.

Edit: Found this: "I indeed said UnicodeString, not WideString. WideString still exists, and is unchanged. WideString is allocated by the Windows memory manager, and should be used for interacting with COM objects. WideString maps directly to the BSTR type in COM." at http://www.micro-isv.asia/2008/08/get-ready-for-delphi-2009-and-unicode/

Now I'm even more confused. So a Delphi 2010 WideString is not the same as a Delphi 2005 WideString? Should I use UnicodeString instead?

Edit 2: There's no UnicodeString type in Delphi 2005. FML.

For your first question: WideString is not exactly the same type as D2010's string. WideString is the same COM BSTR type that it has always been. It's managed by Windows, with no reference counting, so it makes a copy of the whole BSTR every time you assign it or pass it by value.

UnicodeString, which is the default string type in D2009 and later, is basically a UTF-16 version of the AnsiString we all know and love. It's reference-counted and managed by the Delphi compiler.

For the second, the default char type is now WideChar, which is the same char type that has always been used in WideString. It's a UTF-16 encoding, 2 bytes per char (a character outside the Basic Multilingual Plane takes two chars, a surrogate pair). If you save WideString data to a file, you can load it into a UnicodeString without trouble. The difference between the two types has to do with memory management, not the data format.
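A minimal sketch of the point above, assuming Delphi 2009 or later (UnicodeString does not exist in Delphi 2005): the same text held in a WideString and in a UnicodeString occupies identical UTF-16 code units, so only the memory management differs. The program name and sample text are illustrative.

```pascal
program SameLayout;

{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  W: WideString;     // a BSTR, allocated by Windows
  U: UnicodeString;  // reference-counted, allocated by the Delphi memory manager
begin
  W := 'Delphi ' + #$03A9 + #$4E2D;  // Greek capital omega and a CJK ideograph
  U := W;                            // copies the UTF-16 code units unchanged
  Writeln('SizeOf(WideChar) = ', SizeOf(WideChar));
  Writeln('Same length: ', Length(W) = Length(U));
  // Compare the raw UTF-16 data of both strings byte for byte
  Writeln('Same bytes: ',
    CompareMem(PWideChar(W), PWideChar(U), Length(W) * SizeOf(WideChar)));
end.
```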

As others mentioned, the string (actually UnicodeString) data type in Delphi 2009 and above is not equivalent to the WideString data type in previous versions, but the data content format is the same. Both of them store the string in UTF-16. So if you save text using WideString in an earlier version of Delphi, you should be able to read it correctly using the string data type in recent versions of Delphi (2009 and above).

You should note that the performance of UnicodeString is far superior to that of WideString. So if you are going to use the same source code in both Delphi 2005 and Delphi 2010, I suggest you use a string type alias with conditional compilation in your code, so that you can have the best of both worlds:

type
  {$IFDEF Unicode}
  MyStringType = UnicodeString;
  {$ELSE}
  MyStringType = WideString;
  {$ENDIF}

Now you can use MyStringType as your string type in your source code. If the compiler is Unicode (Delphi 2009 and above), your string type will be an alias of the UnicodeString type, which was introduced in Delphi 2009 to hold Unicode strings. If the compiler is not Unicode (e.g. Delphi 2005), your string type will be an alias for the old WideString data type. And since both are UTF-16, data saved by either version should be read correctly by the other.
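As a sketch of how the alias is used, assuming the UNICODE conditional symbol that Delphi 2009 and later define, the hypothetical helper Utf16ByteCount below compiles unchanged in Delphi 2005 and Delphi 2010 and computes how many bytes the string's UTF-16 data occupies:

```pascal
program AliasDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  {$IFDEF UNICODE}
  MyStringType = UnicodeString;  // Delphi 2009 and later
  {$ELSE}
  MyStringType = WideString;     // Delphi 2005 and other pre-Unicode versions
  {$ENDIF}

// Hypothetical helper: number of bytes S occupies as raw UTF-16 data.
// The result is the same whichever type the alias resolves to.
function Utf16ByteCount(const S: MyStringType): Integer;
begin
  Result := Length(S) * SizeOf(WideChar);
end;

var
  S: MyStringType;
begin
  S := 'abc';
  Writeln(Utf16ByteCount(S));  // prints 6
end.
```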

  1. A Delphi 2005 WideString is exactly the same type as a Delphi 2010 String

That is not true: for example, a Delphi 2010 string has a hidden internal code page field. But that probably does not matter for you.

  2. A Delphi 2005 WideString char, as well as a Delphi 2010 String char, is guaranteed to always be 2 bytes in size.

That is true. In Delphi 2010, SizeOf(Char) = 2 (Char = WideChar).


There cannot be a different code page for Unicode strings: the code page field was introduced to create a common binary format for both Ansi strings (which need the code page field) and Unicode strings (which don't).

If you save WideString data to a stream in Delphi 2005 and load the same data into a string in Delphi 2010, everything should work OK.

WideString = BSTR, and that has not changed between Delphi 2005 and 2010.

UnicodeString does not exist in Delphi 2005, so there its closest equivalent is WideString. In Delphi 2009 and above, UnicodeString = string.


@Marco - Ansi and Unicode strings in Delphi 2009+ share a common binary format: a 12-byte header holding the code page, the element size, the reference count and the length.

The code page of a UnicodeString is always CP_UTF16 = 1200.
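The header fields mentioned above can be inspected with the System functions StringCodePage and StringElementSize, which exist only in Delphi 2009 and later. A minimal sketch; the AnsiString code page it prints depends on the Windows system locale (for example 1252 on a Western European system):

```pascal
program StrRecFields;

{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  U: UnicodeString;
  A: AnsiString;
begin
  U := 'abc';
  A := AnsiString(U);  // converted at run time using the system ANSI code page
  // A UnicodeString reports code page 1200 (UTF-16) and 2-byte elements
  Writeln('UnicodeString: code page ', StringCodePage(U),
    ', element size ', StringElementSize(U));
  // An AnsiString carries its own code page in the header and 1-byte elements
  Writeln('AnsiString:    code page ', StringCodePage(A),
    ', element size ', StringElementSize(A));
end.
```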

The rule is simple:

  • If you want to work with Unicode strings inside your module only, use the UnicodeString type (*).
  • If you want to communicate with COM or for other cross-module purposes, use the WideString type.

You see, WideString is a special type, since it's not a native Delphi type. It is an alias/wrapper for BSTR, a system string type intended for use with COM or cross-module communication. Being Unicode is just a side effect.

On the other hand, AnsiString and UnicodeString are native Delphi types, which have no analog in other languages. String is just an alias for either AnsiString (before Delphi 2009) or UnicodeString (Delphi 2009 and later).

So, if you need to pass a string to some other code, use WideString; otherwise, use either AnsiString or UnicodeString. Simple.
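To illustrate the cross-module rule, here is a minimal sketch of a routine you might export from a DLL. GetGreeting is a hypothetical name; in a real library project it would be listed in an exports clause, but here it is called directly so the program runs on its own. The out parameter is a deliberate choice: Delphi passes a WideString function result as a hidden extra parameter, which callers built with other compilers do not expect.

```pascal
program CrossModuleDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical export. WideString is a BSTR allocated by Windows, so a
// caller using a different memory manager (or compiler) can free it safely.
procedure GetGreeting(const Name: WideString; out Greeting: WideString); stdcall;
begin
  Greeting := 'Hello, ' + Name;
end;

var
  G: WideString;
begin
  GetGreeting('World', G);
  Writeln(G);
end.
```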

PS

(*) For old Delphi versions, just place

{$IFNDEF Unicode}

type
  UnicodeString = WideString;

{$ENDIF}

somewhere in your code. This fix will allow you to write the same code for any Delphi version.

While a D2010 Char is always exactly 2 bytes, UTF-16 has the same character folding and combining issues as UTF-8. You don't see this with narrow strings because they're code-page based, but with Unicode strings it's possible (and in some situations common) to have meaningful but invisible characters. Examples include the byte order mark (BOM) at the start of a Unicode file or stream, left-to-right/right-to-left indicator characters, and a huge range of combining accents. This mostly affects questions like "how many pixels wide will this string be on the screen?" and "how many letters are in this string?" (as distinct from "how many chars are in this string?"), but it also means that you can't chop arbitrary chars out of a string and assume the result is printable. Operations like "remove the last letter from this word" become non-trivial and depend on the language in use.

The question about "one of the chars in my string suddenly being 3 bytes long" reflects a little confusion about how UTF works. In UTF-8 it's possible (and valid) for one code point to take two, three or four bytes, but those bytes only make sense together: no single byte of a multi-byte sequence is a character on its own. A printable character can also span several code points, say, a letter plus two combining accents. You will never get a UTF-16 or UTF-32 code unit that is 3 bytes long. However, a code point outside the Basic Multilingual Plane takes 4 bytes in UTF-16 (two chars, a surrogate pair), and a printable character made of three code points might be 6 bytes long in UTF-16 (more if any of them is a surrogate pair) or 12 bytes in UTF-32. Which brings us to normalisation (or not).
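A minimal sketch of the difference between chars, code points and bytes, assuming Delphi 2009 or later. CodePointCount is a hypothetical helper that skips the trailing half of each surrogate pair; the samples are 'e' followed by U+0301 COMBINING ACUTE ACCENT, and U+1F600, which lies outside the Basic Multilingual Plane:

```pascal
program CodeUnitsDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Counts Unicode code points in a UTF-16 string by not counting
// the low (trailing) half of each surrogate pair.
function CodePointCount(const S: UnicodeString): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Length(S) do
    if not ((S[I] >= #$DC00) and (S[I] <= #$DFFF)) then
      Inc(Result);
end;

var
  Accented, Emoji: UnicodeString;
begin
  // One visible letter, but two code points and two Chars (4 bytes)
  Accented := 'e' + #$0301;
  // U+1F600 is stored as the surrogate pair D83D DE00: one code point, two Chars
  Emoji := #$D83D#$DE00;
  Writeln('Accented: ', Length(Accented), ' chars, ', CodePointCount(Accented),
    ' code points, ', Length(Accented) * SizeOf(WideChar), ' bytes');
  Writeln('Emoji: ', Length(Emoji), ' chars, ', CodePointCount(Emoji),
    ' code points, ', Length(Emoji) * SizeOf(WideChar), ' bytes');
end.
```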

But provided you are only dealing with strings as whole things, it's all very simple: you just take the string, write it to a file, then read it back in. You don't have to worry about the fine print of string display and manipulation; that's all handled by the operating system and libraries. Strings.LoadFromFile(name) and Listbox.Items.Add(string) work exactly the same in D2010 as in D2007, and the Unicode handling is transparent to you as a programmer.
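As a sketch of the whole-string approach in Delphi 2009 or later: TStringList can write a UTF-16 file with a BOM through the SaveToFile overload that takes a TEncoding, and LoadFromFile recognises that BOM when reading it back. The file name is a placeholder.

```pascal
program StringListRoundTrip;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  FileName = 'greek.txt';  // placeholder path

var
  SL: TStringList;
begin
  SL := TStringList.Create;
  try
    SL.Add(#$03B1#$03B2#$03B3);                  // Greek alpha, beta, gamma
    SL.SaveToFile(FileName, TEncoding.Unicode);  // UTF-16LE with a BOM
    SL.Clear;
    SL.LoadFromFile(FileName);                   // the BOM identifies the encoding
    Writeln(SL[0] = #$03B1#$03B2#$03B3);
  finally
    SL.Free;
  end;
end.
```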

I am writing a class that will save wide strings to a binary file.

When you write the class in D2005 you will be using WideString. When you migrate to D2010, WideString will still be valid and work properly. WideString in D2005 is the same as WideString in D2010.

In D2010, String is UnicodeString rather than WideString, but that need not concern you, since the compiler converts between the two implicitly.

Your routine that saves the string (AString: String) needs only one conditional line at the start of the procedure:

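// TStream is declared in Classes; UTF8Decode (Delphi 2005) is in System.
procedure SaveAStringToBIN_File(Stream: TStream; const AString: String);
var
  wkstr: WideString;
  Len: Integer;
begin
{$IFDEF Unicode}
  wkstr := AString;              // String is UnicodeString: UTF-16 data copied as is
{$ELSE}
  wkstr := UTF8Decode(AString);  // assumes AString holds UTF-8; for ANSI text use wkstr := AString
{$ENDIF}
  // Write the length in WideChars (an Integer, so strings over 65,535 chars fit),
  // then the raw UTF-16 data
  Len := Length(wkstr);
  Stream.WriteBuffer(Len, SizeOf(Len));
  if Len > 0 then
    Stream.WriteBuffer(PWideChar(wkstr)^, Len * SizeOf(WideChar));
end;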
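The matching read routine is not shown in the answer. A minimal sketch, assuming the layout used above (an Integer length in WideChars followed by the raw UTF-16 data) and Delphi 2009 or later for the demo, where string is UnicodeString; WriteWideString and ReadWideString are hypothetical names:

```pascal
program BinRoundTrip;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Same layout as the save routine: Integer length in WideChars, then UTF-16 data
procedure WriteWideString(Stream: TStream; const S: WideString);
var
  Len: Integer;
begin
  Len := Length(S);
  Stream.WriteBuffer(Len, SizeOf(Len));
  if Len > 0 then
    Stream.WriteBuffer(PWideChar(S)^, Len * SizeOf(WideChar));
end;

// Reads one length-prefixed UTF-16 string back from the stream
function ReadWideString(Stream: TStream): WideString;
var
  Len: Integer;
begin
  Stream.ReadBuffer(Len, SizeOf(Len));
  SetLength(Result, Len);
  if Len > 0 then
    Stream.ReadBuffer(PWideChar(Result)^, Len * SizeOf(WideChar));
end;

var
  Stream: TMemoryStream;
  Loaded: string;
begin
  Stream := TMemoryStream.Create;
  try
    WriteWideString(Stream, 'Delphi ' + #$03A9);
    Stream.Position := 0;
    Loaded := ReadWideString(Stream);  // WideString -> string (UnicodeString in D2009+)
    Writeln(Loaded = 'Delphi ' + #$03A9);
  finally
    Stream.Free;
  end;
end.
```

In Delphi 2005, declare Loaded as WideString instead, because string is an AnsiString there and characters outside the ANSI code page would be lost.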

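Statement: the technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM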