
How to find the number of strings that are in a particular range FAST?

I was given a question:

Given a list of strings and a list of queries, each consisting of START and END (both strings), I have to find the number of strings that fall in the range [START, END).

For example, given the list of strings:

A, AA, AB, CD, ZS, XYZ

and the list of queries:

A, AA
A, CC
AB, ZZ
AC, CD

The output should be:

1
3
3
0

My approach is this: while iterating through the list of strings, I build an AVL tree by inserting the strings one by one. (At first I used an unbalanced BST, but I exceeded the time limit.) For comparisons, I use the compareTo method of Java's String.

After building the AVL tree, I run a query that counts the strings in [start, end). My method is:

1. let v = root.

2. count(v):

   if v == null -> return 0

   else if v.value < start -> return count(v.right)

   else if v.value >= end -> return count(v.left)

   else return 1 + count(v.right) + count(v.left)
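For reference, the recursion above might look like this in Java. This is only a sketch: the class and method names (`NaiveRangeCount`, `insert`, `build`, `count`) are made up for illustration, and AVL rebalancing is left out to keep it short, so the tree here is a plain BST.

```java
import java.util.Arrays;
import java.util.List;

class NaiveRangeCount {
    // Plain BST node; AVL rotations are omitted in this sketch.
    static final class Node {
        final String value;
        Node left, right;
        Node(String value) { this.value = value; }
    }

    // Standard BST insertion using String.compareTo; equal values go right.
    static Node insert(Node v, String s) {
        if (v == null) return new Node(s);
        if (s.compareTo(v.value) < 0) v.left = insert(v.left, s);
        else v.right = insert(v.right, s);
        return v;
    }

    static Node build(List<String> strings) {
        Node root = null;
        for (String s : strings) root = insert(root, s);
        return root;
    }

    // Counts values in [start, end). Every node inside the range is visited.
    static int count(Node v, String start, String end) {
        if (v == null) return 0;
        if (v.value.compareTo(start) < 0) return count(v.right, start, end);
        if (v.value.compareTo(end) >= 0) return count(v.left, start, end);
        return 1 + count(v.left, start, end) + count(v.right, start, end);
    }

    public static void main(String[] args) {
        Node root = build(Arrays.asList("A", "AA", "AB", "CD", "ZS", "XYZ"));
        System.out.println(count(root, "A", "AA"));  // 1
        System.out.println(count(root, "A", "CC"));  // 3
        System.out.println(count(root, "AC", "CD")); // 0
    }
}
```

Note that the last branch recurses into both children, so a query whose range contains k strings visits at least k nodes, even in a perfectly balanced tree. Also, under `String.compareTo` the query AB, ZZ counts 4 strings (AB, CD, XYZ, ZS), because XYZ sorts before ZZ; the 3 listed in the expected output would match an ordering that compares by length first.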

However, I still got a time limit penalty :(

Therefore, I changed my method: I wrote a hash function that maps each string to a double, and instead of using compareTo, I compared the hash values.

But I still hit the time limit!

So I stored the subtree size in each vertex, and instead of calling count every time, I added more conditional statements, some of which use the stored subtree size rather than calling the count function recursively.

Any suggestions for getting it to run within the time limit? :\

Use an order statistic tree: http://en.wikipedia.org/wiki/Order_statistic_tree

It is basically a modified balanced BST in which each node stores the size of its subtree, which lets you answer queries about how many items come before a given item in O(log n) time. The count for [START, END) is then the number of items before END minus the number of items before START.
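Here is a minimal sketch of that idea in Java. It assumes the whole list of strings is known before the first query, so the tree can be built balanced from a sorted array instead of with AVL rotations; that shortcut, and the names `OrderStatisticCount`, `rank`, and `countInRange`, are illustrative choices rather than anything fixed by the question.

```java
import java.util.Arrays;

class OrderStatisticCount {
    static final class Node {
        final String value;
        final Node left, right;
        final int size; // number of nodes in the subtree rooted here

        Node(String value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
            this.size = 1 + size(left) + size(right);
        }
    }

    static int size(Node v) { return v == null ? 0 : v.size; }

    // Builds a height-balanced BST from the sorted slice [lo, hi).
    static Node build(String[] sorted, int lo, int hi) {
        if (lo >= hi) return null;
        int mid = (lo + hi) >>> 1;
        return new Node(sorted[mid], build(sorted, lo, mid), build(sorted, mid + 1, hi));
    }

    // Assumption: all strings are available up front, so sort once and build.
    static Node build(String[] strings) {
        String[] sorted = strings.clone();
        Arrays.sort(sorted);
        return build(sorted, 0, sorted.length);
    }

    // Number of stored values strictly less than key: a single root-to-leaf walk.
    static int rank(Node v, String key) {
        int r = 0;
        while (v != null) {
            if (v.value.compareTo(key) < 0) {
                r += 1 + size(v.left); // v and its whole left subtree are < key
                v = v.right;
            } else {
                v = v.left;
            }
        }
        return r;
    }

    // Count of values in [start, end); clamped to 0 when start > end.
    static int countInRange(Node root, String start, String end) {
        return Math.max(0, rank(root, end) - rank(root, start));
    }

    public static void main(String[] args) {
        Node root = build(new String[] {"A", "AA", "AB", "CD", "ZS", "XYZ"});
        System.out.println(countInRange(root, "A", "AA"));  // 1
        System.out.println(countInRange(root, "A", "CC"));  // 3
        System.out.println(countInRange(root, "AC", "CD")); // 0
    }
}
```

Each query does two walks down the tree, so it costs O(log n) string comparisons no matter how many strings fall inside the range. If strings had to be inserted between queries, the same `rank` logic would apply to an AVL tree that updates `size` during insertions and rotations.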
