Store a number as ASCII text in Java?
It's probably a stupid question, but here's the thing. I was reading this question:
Storing 1 million phone numbers
and the accepted answer was what I had in mind: using a trie. In the comments, Matt Ball suggested:
I think storing the phone numbers as ASCII text and compressing is a very reasonable suggestion
Problem: how do I do that in Java? And does "ASCII text" mean String?
For in-memory storage as indicated in the question:
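import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

ByteArrayOutputStream baos = new ByteArrayOutputStream();
// The writer must be closed before reading baos: closing flushes the
// encoder and writes the GZIP trailer, otherwise the data is truncated.
try (Writer out = new OutputStreamWriter(
        new GZIPOutputStream(baos), StandardCharsets.US_ASCII)) {
    for (String number : numbers) {
        out.write(number);
        out.write('\n');
    }
}
byte[] data = baos.toByteArray();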
But as Pete remarked: this may be good for memory efficiency, but you can't really do anything with the data afterwards, so it's not really very useful.
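To illustrate Pete's point, here is a minimal round-trip sketch. The class name CompressedNumbers and the sample numbers are made up for illustration; only the java.io and java.util.zip classes are real. Note that reading back even a single number means inflating the whole block again.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CompressedNumbers {

    // Writes one number per line as US-ASCII and gzips the result.
    static byte[] compress(List<String> numbers) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(
                new GZIPOutputStream(baos), StandardCharsets.US_ASCII)) {
            for (String number : numbers) {
                out.write(number);
                out.write('\n');
            }
        }
        return baos.toByteArray();
    }

    // Reverses compress(): the entire block is inflated to read any entry.
    static List<String> decompress(byte[] data) throws IOException {
        List<String> numbers = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(data)),
                StandardCharsets.US_ASCII))) {
            String line;
            while ((line = in.readLine()) != null) {
                numbers.add(line);
            }
        }
        return numbers;
    }

    public static void main(String[] args) throws IOException {
        // Sample data, purely illustrative.
        List<String> numbers = Arrays.asList("5550100", "5550101", "5550102");
        byte[] data = compress(numbers);
        System.out.println(decompress(data).equals(numbers)); // prints: true
    }
}
```

There is no way to test membership or look up one entry without decompressing, which is why this layout suits archiving or transfer rather than querying.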
Yes, ASCII means Strings in this case. You can store compressed data in Java using java.util.zip.GZIPOutputStream.
In answer to an implied but different question:
Q: You have 1 billion phone numbers and you need to send these over a low-bandwidth connection. You only need to send whether a phone number is in the collection or not. (No other information is required.)
A: This is the general approach
You can use Strings in a sorted TreeMap. One million numbers is not very much and will use about 64 MB. I don't see the need for a more complex solution.
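A minimal sketch of the TreeMap idea, assuming the map goes from number to some payload (a plain owner name here, which is an invented placeholder). The class PhoneBook and the helper withPrefix are hypothetical names; the prefix query relies on the sorted order that a trie would otherwise provide.

```java
import java.util.NavigableMap;
import java.util.SortedMap;
import java.util.TreeMap;

public class PhoneBook {

    // Returns a live view of all entries whose number starts with the prefix.
    // Works because keys are sorted: every such key lies in
    // [prefix, prefix + '\uffff').
    static SortedMap<String, String> withPrefix(
            NavigableMap<String, String> book, String prefix) {
        return book.subMap(prefix, prefix + Character.MAX_VALUE);
    }

    public static void main(String[] args) {
        NavigableMap<String, String> book = new TreeMap<>();
        // Sample entries, purely illustrative.
        book.put("5550100", "Alice");
        book.put("5550101", "Bob");
        book.put("5560000", "Carol");

        System.out.println(book.containsKey("5550101"));      // prints: true
        System.out.println(withPrefix(book, "555").keySet()); // prints: [5550100, 5550101]
    }
}
```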
The latest version of Java can store ASCII text efficiently by using a byte[] instead of a char[]; however, the overhead of your data structure is likely to be larger than the strings themselves.
If you need to store phone numbers as keys, you could store them on the assumption that large ranges will be continuous. As such, you could store them like
NavigableMap<String, PhoneDetails[]>
In this structure, the key would define the start of the range, and you could have a PhoneDetails entry for each number in it. This need not be much bigger than the references to the PhoneDetails objects themselves (which is the minimum).
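A rough sketch of how a lookup against such a range map might work. Everything here is an assumption beyond the NavigableMap<String, PhoneDetails[]> type itself: the PhoneRanges class, the fields of PhoneDetails, and the requirement that all numbers are digit-only strings of equal length (so that String order matches numeric order).

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

public class PhoneRanges {

    // Hypothetical payload type; the original only names it.
    static final class PhoneDetails {
        final String owner;
        PhoneDetails(String owner) { this.owner = owner; }
    }

    // Key = first number of a continuous range;
    // value[i] = details for the number (start + i).
    private final NavigableMap<String, PhoneDetails[]> ranges = new TreeMap<>();

    void addRange(String start, PhoneDetails[] details) {
        ranges.put(start, details);
    }

    // Returns the details for a number, or null if no range covers it.
    // Assumes digit-only numbers of equal length (see lead-in).
    PhoneDetails lookup(String number) {
        Map.Entry<String, PhoneDetails[]> e = ranges.floorEntry(number);
        if (e == null) {
            return null; // number sorts before every range start
        }
        long offset = Long.parseLong(number) - Long.parseLong(e.getKey());
        PhoneDetails[] block = e.getValue();
        return offset < block.length ? block[(int) offset] : null;
    }

    public static void main(String[] args) {
        PhoneRanges r = new PhoneRanges();
        r.addRange("5550100", new PhoneDetails[] {
            new PhoneDetails("Alice"), new PhoneDetails("Bob")
        });
        System.out.println(r.lookup("5550101").owner); // prints: Bob
        System.out.println(r.lookup("5550102"));       // prints: null
    }
}
```

Per number, the only cost beyond the PhoneDetails object is one array slot; the String key is paid once per range.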
BTW: You can invent very efficient structures if you don't need access to the data. If you never access the data, don't keep it in memory; in fact, you can just discard it, as it won't ever be needed.
A lot depends on what you want to do with the data and why you have it in memory at all.
You can use a DeflaterOutputStream wrapped around a ByteArrayOutputStream, which will be very small, but not very useful.
I suggest using DeflaterOutputStream as it's more lightweight/faster/smaller than GZIPOutputStream.
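For comparison, a small sketch that compresses the same bytes both ways. The class name DeflateVsGzip and the generated sample numbers are invented for illustration. With default settings both streams use the same DEFLATE algorithm; the difference is the wrapper, since the GZIP format carries a larger header and trailer than the zlib format that DeflaterOutputStream writes by default.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

public class DeflateVsGzip {

    // Compresses with the default Deflater (zlib wrapper).
    static byte[] deflate(byte[] raw) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (OutputStream out = new DeflaterOutputStream(baos)) {
            out.write(raw);
        }
        return baos.toByteArray();
    }

    // Compresses with the GZIP wrapper.
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (OutputStream out = new GZIPOutputStream(baos)) {
            out.write(raw);
        }
        return baos.toByteArray();
    }

    // Reverses deflate().
    static byte[] inflate(byte[] data) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (InputStream in = new InflaterInputStream(
                new ByteArrayInputStream(data))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                baos.write(buf, 0, n);
            }
        }
        return baos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // 1000 made-up sequential numbers, one per line.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append(5550000 + i).append('\n');
        }
        byte[] raw = sb.toString().getBytes(StandardCharsets.US_ASCII);

        System.out.println("raw:     " + raw.length + " bytes");
        System.out.println("deflate: " + deflate(raw).length + " bytes");
        System.out.println("gzip:    " + gzip(raw).length + " bytes");
    }
}
```

The saving is a fixed handful of bytes per stream, so it matters mainly when you compress many small blocks rather than one large one.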