How to find the number of strings in a specific range FAST?

Question

I was given a question:

Given a list of strings and a list of queries, each consisting of START and END (both strings), I have to find the number of strings in the range [START, END).

For example, given the list of strings A, AA, AB, CD, ZS, XYZ and the list of queries:

A, AA
A, CC
AB, ZZ
AC, CD

The output should be:

The way I approach this problem: while iterating through the list of strings, I build an AVL tree by inserting the strings one by one. (At first I used an unbalanced BST, but I exceeded the time limit.) For comparisons, I use the compareTo method of Java's String.

After building the AVL tree, I run each query, counting the strings in [start, end). My method is:

1. let v = root.

2. if v==null -> return 0 

   else if v.value < start -> count(v.right)

   else if v.value >= end -> count(v.left)

   else 1 + count(v.right) + count(v.left)
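For reference, the steps above could be written in Java roughly as follows. This is only a sketch: the class and method names (`NaiveRangeCount`, `Node`, `insert`, `build`, `count`) are made up for illustration, and a plain unbalanced BST stands in for the AVL tree to keep it short. Note that the last branch recurses into both children, so every string that falls inside the range is visited once, and a query matching k strings costs roughly O(log n + k) comparisons even on a balanced tree.

```java
import java.util.List;

/** Hypothetical sketch of the counting routine described in the question. */
final class NaiveRangeCount {
    static final class Node {
        final String value;
        Node left, right;
        Node(String value) { this.value = value; }
    }

    /** Plain BST insert (AVL rebalancing omitted for brevity); equal values go right. */
    static Node insert(Node v, String s) {
        if (v == null) return new Node(s);
        if (s.compareTo(v.value) < 0) v.left = insert(v.left, s);
        else v.right = insert(v.right, s);
        return v;
    }

    /** Inserts the strings one by one, as in the question. */
    static Node build(List<String> strings) {
        Node root = null;
        for (String s : strings) root = insert(root, s);
        return root;
    }

    /** Counts the values in [start, end) by following the three cases above. */
    static int count(Node v, String start, String end) {
        if (v == null) return 0;
        // Node is below the range: only the right subtree can contain matches.
        if (v.value.compareTo(start) < 0) return count(v.right, start, end);
        // Node is at or above the end: only the left subtree can contain matches.
        if (v.value.compareTo(end) >= 0) return count(v.left, start, end);
        // Node is inside the range: count it and search both sides.
        return 1 + count(v.left, start, end) + count(v.right, start, end);
    }
}
```

With the example list, `count(root, "A", "CC")` visits A, AA and AB individually before returning 3.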

However, I still exceeded the time limit :(

So I changed the method: I wrote a hash function that maps each string to a double, and compared the hash values instead of using compareTo.

But I still exceeded the time limit!

So I stored the subtree size in each vertex and, instead of calling count every time, added more conditional statements, some of which use the subtree size rather than calling count recursively.

Any suggestions for getting it to run within the time limit? :\

Answer 1

Use an order statistic tree: http://en.wikipedia.org/wiki/Order_statistic_tree

It is basically a modified balanced BST in which each node stores the size of its subtree, which lets you answer queries about how many items come before a given item in O(log n) time.
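A minimal Java sketch of that idea, under some assumptions: all strings are known before the first query, so the tree is built once from a sorted array instead of being rebalanced on insert, and the names (`OrderStatisticCount`, `rank`, `countInRange`) are invented for illustration. The count for [start, end) is then the number of items before end minus the number of items before start, which takes two root-to-leaf walks no matter how many strings match.

```java
import java.util.Arrays;

/** Hypothetical sketch: a size-augmented balanced BST answering rank queries. */
final class OrderStatisticCount {
    static final class Node {
        final String key;
        final Node left, right;
        final int size; // number of nodes in the subtree rooted at this node
        Node(String key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
            this.size = 1 + sizeOf(left) + sizeOf(right);
        }
    }

    static int sizeOf(Node v) { return v == null ? 0 : v.size; }

    /** Builds a balanced tree over sorted[lo, hi) by picking the middle element as root. */
    static Node build(String[] sorted, int lo, int hi) {
        if (lo >= hi) return null;
        int mid = (lo + hi) >>> 1;
        return new Node(sorted[mid], build(sorted, lo, mid), build(sorted, mid + 1, hi));
    }

    /** Sorts a copy of the input (assumption: the list is static) and builds the tree. */
    static Node build(String[] strings) {
        String[] sorted = strings.clone();
        Arrays.sort(sorted);
        return build(sorted, 0, sorted.length);
    }

    /** Number of keys strictly less than x, found in a single downward walk. */
    static int rank(Node v, String x) {
        int r = 0;
        while (v != null) {
            if (x.compareTo(v.key) <= 0) {
                v = v.left; // this node and its right subtree are all >= x
            } else {
                r += sizeOf(v.left) + 1; // this node and its whole left subtree are < x
                v = v.right;
            }
        }
        return r;
    }

    /** Number of keys in [start, end); 0 if the range is empty or inverted. */
    static int countInRange(Node root, String start, String end) {
        return Math.max(0, rank(root, end) - rank(root, start));
    }
}
```

For the example list, `rank(root, "CC")` is 3 and `rank(root, "A")` is 0, so the query A, CC yields 3. Each comparison still costs time proportional to the string length, so the per-query cost is closer to O(L log n) for strings of length L.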

Answer 1 posted 2013-08-24 09:39:27
