
Store a number as ASCII text in Java?

It's probably a stupid question, but here's the thing. I was reading this question:

Storing 1 million phone numbers

and the accepted answer was what I was thinking: using a trie. In the comments, Matt Ball suggested:

I think storing the phone numbers as ASCII text and compressing is a very reasonable suggestion

Problem: how do I do that in Java? And does "ASCII text" mean String?

For in-memory storage, as indicated in the question:

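import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// `numbers` is the collection of phone numbers, each held as a String.
ByteArrayOutputStream baos = new ByteArrayOutputStream();
// Closing the writer finishes the GZIP stream; without it the output is truncated.
try (Writer out = new OutputStreamWriter(
        new GZIPOutputStream(baos), StandardCharsets.US_ASCII)) {
    for (String number : numbers) {
        out.write(number);
        out.write('\n');
    }
}
byte[] data = baos.toByteArray();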

But as Pete remarked: this may be good for memory efficiency, but you can't really do anything with the data afterwards, so it's not really very useful.

Yes, ASCII means Strings in this case. You can store compressed data in Java using java.util.zip.GZIPOutputStream.
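As a minimal sketch of what that could look like end to end, the class below compresses a list of numbers and reads them back. The class and method names (PhoneNumberCodec, compress, decompress) are made up for illustration, and it assumes one number per line with no embedded newlines.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of any library.
class PhoneNumberCodec {
    // Writes one number per line as US-ASCII and gzips the result.
    static byte[] compress(List<String> numbers) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(
                new GZIPOutputStream(baos), StandardCharsets.US_ASCII)) {
            for (String number : numbers) {
                out.write(number);
                out.write('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The writer is closed here, so the GZIP trailer has been written.
        return baos.toByteArray();
    }

    // Reverses compress(): gunzips and splits the text back into lines.
    static List<String> decompress(byte[] data) {
        List<String> numbers = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(data)),
                StandardCharsets.US_ASCII))) {
            String line;
            while ((line = in.readLine()) != null) {
                numbers.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return numbers;
    }

    public static void main(String[] args) {
        List<String> numbers = List.of("5550100", "5550101", "5550199");
        byte[] packed = compress(numbers);
        System.out.println(packed.length + " compressed bytes");
        System.out.println(decompress(packed));
    }
}
```

Note that getting anything back out means decompressing the whole block, which is the limitation Pete pointed out.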

In answer to an implied, but different, question:

Q: You have 1 billion phone numbers and you need to send these over a low-bandwidth connection. You only need to send whether each phone number is in the collection or not. (No other information is required.)

A: This is the general approach:

  • First sort the list if it's not sorted already.
  • Starting from the lowest number, find regions of contiguous numbers. For each region, send the first number of the region and which numbers in it are taken; the latter can be stored as a BitSet (1 bit per possible number). Start a new region, sending its first number and its BitSet, whenever the gap is more than some threshold.
  • Write the stream to a compressed data set.
  • Test this to compare with simply sending all the numbers.
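The steps above could be sketched roughly as follows. This is one possible reading of the approach, not a definitive implementation: the class RangeEncoder, the Region type, the maxGap threshold and the serialized layout (start, byte count, bits) are all assumptions made for illustration, and the numbers are assumed to fit in a long and to be sorted already.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.zip.DeflaterOutputStream;

// Hypothetical sketch of the "region start + BitSet" encoding.
class RangeEncoder {
    // One run of nearby numbers: bit i set means (start + i) is present.
    static final class Region {
        final long start;
        final BitSet taken = new BitSet();
        Region(long start) { this.start = start; }
    }

    // Groups sorted numbers into regions. A new region starts when the gap to
    // the previous number exceeds maxGap, or when the offset would not fit in
    // a BitSet index.
    static List<Region> encode(long[] sortedNumbers, long maxGap) {
        List<Region> regions = new ArrayList<>();
        Region current = null;
        long previous = 0;
        for (long n : sortedNumbers) {
            if (current == null || n - previous > maxGap
                    || n - current.start >= Integer.MAX_VALUE) {
                current = new Region(n);
                regions.add(current);
            }
            current.taken.set((int) (n - current.start));
            previous = n;
        }
        return regions;
    }

    // Membership test on the decoded side (linear scan over the regions).
    static boolean contains(List<Region> regions, long number) {
        for (Region r : regions) {
            long offset = number - r.start;
            if (offset >= 0 && offset < r.taken.length() && r.taken.get((int) offset)) {
                return true;
            }
        }
        return false;
    }

    // Serializes each region as (start, byte count, bits) and deflates the stream.
    static byte[] toCompressedBytes(List<Region> regions) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(new DeflaterOutputStream(baos))) {
            for (Region r : regions) {
                byte[] bits = r.taken.toByteArray();
                out.writeLong(r.start);
                out.writeInt(bits.length);
                out.write(bits);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    public static void main(String[] args) {
        long[] numbers = {5550100L, 5550101L, 5550103L, 5559000L, 5559001L};
        List<Region> regions = encode(numbers, 64);
        System.out.println(regions.size() + " regions");
        System.out.println(contains(regions, 5550103L));
        System.out.println(toCompressedBytes(regions).length + " bytes to send");
    }
}
```

Whether this beats plain compressed text depends on how densely the numbers cluster, which is why the last step (measuring against the simple approach) matters.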

You can use Strings in a sorted TreeMap. One million numbers is not very many and will use about 64 MB. I don't see the need for a more complex solution.
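A minimal sketch of that simple approach, assuming each number maps to some value (here an owner name, which is a made-up example field). The PhoneBook class and its methods are hypothetical.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical wrapper around a TreeMap keyed by the number as a String.
class PhoneBook {
    private final TreeMap<String, String> ownersByNumber = new TreeMap<>();

    void add(String number, String owner) {
        ownersByNumber.put(number, owner);
    }

    boolean contains(String number) {
        return ownersByNumber.containsKey(number);
    }

    // All entries whose number starts with the given prefix, in sorted order.
    // The upper bound appends the largest char so every longer key with this
    // prefix sorts below it.
    SortedMap<String, String> withPrefix(String prefix) {
        return ownersByNumber.subMap(prefix, prefix + Character.MAX_VALUE);
    }

    public static void main(String[] args) {
        PhoneBook book = new PhoneBook();
        book.add("5550100", "Ann");
        book.add("5550199", "Bob");
        book.add("5560000", "Cy");
        System.out.println(book.contains("5550199"));
        System.out.println(book.withPrefix("555").keySet());
    }
}
```

Unlike the compressed byte array, this keeps the data directly queryable, including sorted prefix lookups.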

Recent versions of Java (9 and later) can store ASCII text efficiently by using a byte[] instead of a char[] inside String; however, the overhead of your data structure is likely to be larger.
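If you want the one-byte-per-character representation under your own control rather than relying on the JVM, you can convert explicitly. A small sketch, with hypothetical helper names:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helpers for holding a number as raw ASCII bytes.
class AsciiBytes {
    // One byte per digit; characters outside ASCII would be replaced with '?'.
    static byte[] toAscii(String number) {
        return number.getBytes(StandardCharsets.US_ASCII);
    }

    static String fromAscii(byte[] bytes) {
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] raw = toAscii("5550100");
        System.out.println(raw.length + " bytes");
        System.out.println(fromAscii(raw));
    }
}
```

Each byte[] still carries its own object header, so a million tiny arrays are not automatically cheaper than a million Strings.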

If you need to store phone numbers as keys, you could store them on the assumption that large ranges will be contiguous. As such, you could store them like this:

NavigableMap<String, PhoneDetails[]>

In this structure, the key defines the start of the range, and the array holds the phone details for each number in that range. This need not be much bigger than the references to the PhoneDetails themselves (which is the minimum).
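A lookup against such a map could be sketched like this. PhoneDetails is not defined in the answer, so the version here is a placeholder, and the sketch assumes all numbers are digit-only Strings of the same length (so that String order matches numeric order) and that a null array slot means "number not assigned".

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch of the range-keyed structure.
class PhoneRanges {
    // Placeholder payload type; the real fields are unknown.
    static final class PhoneDetails {
        final String owner;
        PhoneDetails(String owner) { this.owner = owner; }
    }

    private final NavigableMap<String, PhoneDetails[]> ranges = new TreeMap<>();

    // details[i] belongs to the number (firstNumber + i).
    void addRange(String firstNumber, PhoneDetails[] details) {
        ranges.put(firstNumber, details);
    }

    // Returns the details for a number, or null if it is not stored.
    PhoneDetails lookup(String number) {
        // Greatest range start that is <= the number.
        Map.Entry<String, PhoneDetails[]> entry = ranges.floorEntry(number);
        if (entry == null) {
            return null;
        }
        long offset = Long.parseLong(number) - Long.parseLong(entry.getKey());
        PhoneDetails[] details = entry.getValue();
        return offset >= 0 && offset < details.length ? details[(int) offset] : null;
    }

    public static void main(String[] args) {
        PhoneRanges ranges = new PhoneRanges();
        ranges.addRange("5550100", new PhoneDetails[] {
            new PhoneDetails("Ann"), null, new PhoneDetails("Bob")
        });
        System.out.println(ranges.lookup("5550102").owner);
        System.out.println(ranges.lookup("5550101"));
    }
}
```

The per-number cost is then one array slot, with the key and map entry amortized over the whole range.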

BTW: You can invent very efficient structures if you don't need access to the data. If you never access the data, don't keep it in memory; in fact, you can just discard it, as it won't ever be needed.


A lot depends on what you want to do with the data and why you have it in memory at all.

You can use a DeflaterOutputStream wrapped around a ByteArrayOutputStream; the result will be very small, but not very useful.

I suggest using DeflaterOutputStream, as it's more lightweight/faster/smaller than GZIPOutputStream.
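To check the size claim on your own data, something like the following comparison could be used. The class CompressionComparison and the generated sample numbers are assumptions for illustration. Both streams use the same DEFLATE algorithm at the default level, so the difference should mostly come from the gzip container having a larger header and trailer than the zlib container.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical comparison harness.
class CompressionComparison {
    // Builds `count` consecutive made-up numbers, one per line, as ASCII bytes.
    static byte[] sampleNumbers(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append(5550000 + i).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    static byte[] deflate(byte[] input) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (OutputStream out = new DeflaterOutputStream(baos)) {
            out.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    static byte[] gzip(byte[] input) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (OutputStream out = new GZIPOutputStream(baos)) {
            out.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    public static void main(String[] args) {
        byte[] input = sampleNumbers(1000);
        System.out.println("raw:     " + input.length);
        System.out.println("deflate: " + deflate(input).length);
        System.out.println("gzip:    " + gzip(input).length);
    }
}
```

For large inputs the saving is a fixed handful of bytes, so the choice matters most when you compress many small blocks.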

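Java Strings are not ASCII by default (they are UTF-16 internally, and the default charset for byte conversion is often UTF-8), so if you want to work with ASCII text you must specify the encoding explicitly.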
