Is it possible to read all strings in the intern pool?

It's well known that in certain cases when using strings in C#, the CLR interns strings as an optimization.

So my questions are:

  • Is it possible to read all the strings that are currently in the intern pool?
  • Is there a way to get a reference count for each interned string?
  • Would it be possible to read the intern pool from a separate process space?
  • If none of these are possible, what's the reasoning for not allowing these use cases?

I could see this being somewhat useful when monitoring memory usage in certain cases. It may also be useful when working with sensitive information (although I would think SecureString would be preferable in many scenarios).

As far as I can tell, the only public methods related to string interning are String.Intern(string) and String.IsInterned(string).
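For reference, here is a minimal sketch of how those two methods behave. The class and variable names are only for illustration, and the second string is built at run time with `new string(char[])` so that it is a separate object from the compiled literal.

```csharp
using System;

class InternDemo
{
    static void Main()
    {
        // A string literal in compiled code; the CLR normally places literals in the intern pool.
        string literal = "hello";

        // Built at run time, so this is a distinct object with the same contents.
        string built = new string(new[] { 'h', 'e', 'l', 'l', 'o' });

        // Compares object identity, not contents.
        Console.WriteLine(ReferenceEquals(literal, built));

        // String.Intern returns the pooled instance with this value, adding it if it is missing.
        string pooled = string.Intern(built);
        Console.WriteLine(ReferenceEquals(literal, pooled));

        // String.IsInterned returns the pooled reference, or null if the value is not in the pool.
        string unique = Guid.NewGuid().ToString();
        Console.WriteLine(string.IsInterned(unique) == null);
    }
}
```

Neither method enumerates the pool; both only answer questions about a single string you already hold.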

I'm asking out of curiosity, not trying to solve a real problem. I realize that basing any logic on the string intern pool would be a bad idea.

Looking up the interned strings from code has no real use case, so no such feature was added to the language.

However, looking up the strings in memory while debugging a program is a very common use case, and there are tools to do that.

You will need the WinDbg.exe tool that comes with the Windows SDK. After launching it and attaching it to your program, run the command

.loadby sos clr

which loads the extension for debugging .NET apps. Once you have done that, you can run the command

!DumpHeap -strings

to see all string objects in the heap.

As for telling whether an object in that list is interned or not, I am not entirely sure how. Hopefully, if you ask a new question about WinDbg and how to tell whether a string is interned, someone may be able to answer.

You can analyze the strings in a process and find the duplicates that might be worth interning with MemAnalyzer, which is based on ClrMD.

https://github.com/Alois-xx/MemAnalyzer

C>MemAnalyzer.exe -dstrings -f 50KStringsx64.dmp

    Strings(Count)  Waste(Bytes)    String
    500             20,958          String 0
    500             20,958          String 1
    500             20,958          String 2
    500             20,958          String 3
    500             20,958          String 4
    500             20,958          String 5

Summary
==========================================
Strings                       61,330 count
Allocated Size             2,529,742 bytes
Waste Duplicate Strings    2,515,898 bytes

This gives you a metric of how many duplicate strings you have and which of them might be worth interning. To find out which object references a specific string, you can add

-showAddress

to show the first address of each string that might be worth interning. Then you can use WinDbg and !GCRoot <address> to find out which object holds this string, which should give you an idea of which class needs the String.Intern calls.

Please note that the .NET String.Intern pool never releases its references. If you are dealing with large datasets with different content, you should use your own Dictionary-based pool so that you can release all interned strings when you unload the current dataset and load the next one.
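A minimal sketch of such a pool might look like the following. `StringPool` is a hypothetical helper written for this illustration, not a .NET type, and it assumes single-threaded use and ordinal (exact) string comparison.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of .NET: a string pool that can be emptied on demand.
sealed class StringPool
{
    private readonly Dictionary<string, string> _pool =
        new Dictionary<string, string>(StringComparer.Ordinal);

    public int Count => _pool.Count;

    // Returns the first instance seen with this value, adding it to the pool if it is new.
    public string Intern(string value)
    {
        if (value == null) return null;
        if (_pool.TryGetValue(value, out string existing)) return existing;
        _pool.Add(value, value);
        return value;
    }

    // Drops every pooled reference so the GC can reclaim strings nothing else still uses.
    public void Clear() => _pool.Clear();
}

static class Program
{
    static void Main()
    {
        var pool = new StringPool();

        // Two distinct objects with equal contents collapse to one instance.
        string a = pool.Intern(new string('x', 3));
        string b = pool.Intern(new string('x', 3));
        Console.WriteLine(ReferenceEquals(a, b)); // True

        // For example, when unloading the current dataset.
        pool.Clear();
        Console.WriteLine(pool.Count); // 0
    }
}
```

Unlike String.Intern, the lifetime of the pooled strings is tied to the pool object, so dropping or clearing the pool releases them.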
