
How to speed up my table conversion algorithm?

I have a task to convert string tables from one format to another.

I use this class to convert the table:

import java.util.ArrayList;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

class TableConverter
{
    public String[] Entities; //here I store descriptive fields like DescField1, DescField2...
    public ArrayList<String> ConvertedList; //here I store converted table rows as separate string lines 

    public TableConverter(ArrayList<String> lines) //we receive table rows as separate string lines
    {
        String[] splitted_first_line = lines.get(0).split("\t"); //split first row to get descriptive fields
        this.Entities = new String[splitted_first_line.length - 2]; //allocate size to hold all descriptive fields. -2 because last two columns is Date and Total
        System.arraycopy(splitted_first_line, 0, this.Entities, 0, this.Entities.length); //copy descriptive fields into my arr     

        //--

        int lines_sz = lines.size(); //save lines size to not recalculate it every iteration 
        Map<String, Integer> k_d_map = new HashMap<String, Integer>(); //map to store indices of every Date column

        for (int i = 1; i < lines_sz; i++)
        {
            if (lines.get(i).isEmpty())
                continue;

            String[] splitted_line = lines.get(i).split("\t"); //splitted line on values    

            if (!k_d_map.containsKey(splitted_line[splitted_line.length - 2])) //if my map does not contain such date
                k_d_map.put(splitted_line[splitted_line.length - 2], 0); //then add it
        }

        String[] known_dates = k_d_map.keySet().toArray(new String[k_d_map.size()]);
        SortStrDates(known_dates); //I sort dates by ASC 
        k_d_map.clear(); //clear map to fill it again with correct indices

        for (int i = 0; i < known_dates.length; i++) //refilling map and now we know every date index
            k_d_map.put(known_dates[i], i);

        //--

        Map<String, EntitySales> ESs_map = new HashMap<String, EntitySales>(); //map for rows

        for (int i = 1; i < lines_sz; i++)
        {
            if (lines.get(i).isEmpty())
                continue;

            String[] splitted_line = lines.get(i).split("\t"); //split row  
            String curr_entity = GetEntityFromLine(splitted_line); //I get set of descriptive fields separated by \t. It looks like this: asd\tqwe\t...\tzxc
            int dti = k_d_map.get(splitted_line[splitted_line.length - 2]); //I get date column index for Date stored in this row (if it was 02.2017 then index will be 0) 

            if (ESs_map.containsKey(curr_entity)) //I check if map contains row with such descriptive fields set
                ESs_map.get(curr_entity).SalesAmounts[dti] = splitted_line[splitted_line.length - 1]; //if contains, we set sale amount at date index (set 5 to 02.2017 column for example)
            else
            {
                EntitySales es = new EntitySales(curr_entity, known_dates.length); //else we create new object to hold row          
                es.SalesAmounts[dti] = splitted_line[splitted_line.length - 1]; //set sales amount at date
                ESs_map.put(curr_entity, es); //and add to map
            }
        }

        //--

        String first_row = ""; //here and below I build first row text representation, I add stored DescFields and unique dates
        this.ConvertedList = new ArrayList<String>();               

        for (int i = 0; i < this.Entities.length; i++)
            first_row += this.Entities[i] + "\t";

        for (int i = 0; i < known_dates.length; i++)
            first_row += i < known_dates.length - 1 ? known_dates[i] + "\t" : known_dates[i];

        this.ConvertedList.add(first_row);

        //--

        for (EntitySales es : ESs_map.values()) //Here I get rows as separate lines 
            this.ConvertedList.add(es.GetAsLine());
    }

    public String GetEntityFromLine(String[] line)
    {
        String[] entities = new String[line.length - 2];
        System.arraycopy(line, 0, entities, 0, entities.length);

        String entity = "";

        for (int i = 0; i < entities.length; i++)
            entity += i < entities.length - 1 ? entities[i] + "\t" : entities[i];

        return entity;
    }

    public void SortStrDates(String[] dates)
    {
        for (int i = 0; i < dates.length; i++)
            for (int j = i + 1; j < dates.length; j++)
            {
                Date dt_i = MyJunk.ConvertStrToDate(dates[i]);
                Date dt_j = MyJunk.ConvertStrToDate(dates[j]);

                if (dt_j.before(dt_i))
                {
                    String temp_i = dates[i];
                    dates[i] = dates[j];
                    dates[j] = temp_i;
                }
            }
    }
}

class EntitySales
{
    public String Entity;
    public String[] SalesAmounts;

    public EntitySales(String entity, int sales_amounts_size)
    {
        this.Entity = entity;
        this.SalesAmounts = new String[sales_amounts_size];
    }

    public String GetAsLine()
    {
        String line = this.Entity + "\t";

        for (int i = 0; i < this.SalesAmounts.length; i++)
        {
            String val = this.SalesAmounts[i] == null || this.SalesAmounts[i].isEmpty() ? "0" : this.SalesAmounts[i];
            line += i < this.SalesAmounts.length - 1 ? val + "\t" : val;
        }

        return line;
    }
}

It works, but it is extremely slow with huge tables. I waited 1 hour and 20 minutes for an 800k-row table to convert and then cancelled the task. A 200k-row table is converted in just 3 minutes. I don't know why I get such a slowdown, but the question is: how can I speed up my algorithm substantially? I tried assigning Integer values to every set of descriptive fields (asd\tqwe\t...\tzxc -> 0, something\telse -> 1) and comparing those integers without Maps, but that was only slower.

While you could improve your overall algorithm, the primary slowdown is probably in your GetAsLine function:

public String GetAsLine()
{
    String line = this.Entity + "\t";

    for (int i = 0; i < this.SalesAmounts.length; i++)
    {
        String val = this.SalesAmounts[i] == null || this.SalesAmounts[i].isEmpty() ? "0" : this.SalesAmounts[i];
        line += i < this.SalesAmounts.length - 1 ? val + "\t" : val;
    }

    return line;
}

Here, you're using string concatenation in a loop to build the line. This is highly inefficient because every pass through the loop allocates a new string: memory is allocated for the new string and the existing contents are copied into it. Your garbage collector gets a lot of exercise.
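To make the cost concrete, here is a minimal, self-contained sketch that builds the same tab-separated line both ways. The class and method names are made up for illustration and are not taken from the question. Both methods return identical strings; the difference is that the `+=` version copies everything built so far on each iteration, so the work grows roughly quadratically with the number of columns.

```java
import java.util.Arrays;

// Hypothetical demo class; names are illustrative only.
final class ConcatDemo {

    // Repeated concatenation: each += allocates a new String and copies
    // all characters accumulated so far.
    static String joinWithConcat(String[] values) {
        String line = "";
        for (int i = 0; i < values.length; i++) {
            line += i < values.length - 1 ? values[i] + "\t" : values[i];
        }
        return line;
    }

    // StringBuilder: characters are appended to one growable buffer.
    static String joinWithBuilder(String[] values) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                line.append('\t'); // separator goes before every value but the first
            }
            line.append(values[i]);
        }
        return line.toString();
    }

    public static void main(String[] args) {
        String[] values = new String[20_000];
        Arrays.fill(values, "0");

        long t0 = System.nanoTime();
        String a = joinWithConcat(values);
        long t1 = System.nanoTime();
        String b = joinWithBuilder(values);
        long t2 = System.nanoTime();

        System.out.println("same result: " + a.equals(b));
        System.out.println("concat ms:  " + (t1 - t0) / 1_000_000);
        System.out.println("builder ms: " + (t2 - t1) / 1_000_000);
    }
}
```

The timings depend on the JVM and the machine, so treat them as a rough indication rather than a benchmark.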

To improve this, create a StringBuilder and construct the string in it:

// start with the entity key, as the original GetAsLine does
StringBuilder line = new StringBuilder(this.Entity).append('\t');
for (int i = 0; i < this.SalesAmounts.length; i++)
{
    String val = this.SalesAmounts[i] == null || this.SalesAmounts[i].isEmpty() ? "0" : this.SalesAmounts[i];
    line.append(val).append('\t'); // two appends, so no temporary string is created
}
// remove final tab character
line.setLength(line.length() - 1);

return line.toString();

This is faster because StringBuilder appends into an internal buffer instead of creating a new string every time you append something, so far fewer characters are copied.
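The same idea applies to other parts of the class that run once per input row or repeat work unnecessarily. As a rough sketch, not a drop-in replacement, the row key built in GetEntityFromLine can be produced with String.join, and the hand-written nested loop in SortStrDates can be replaced with Arrays.sort and a comparator. The helper class name and the month.year date pattern are assumptions: the pattern is inferred from the 02.2017 example in the question's comments, since MyJunk.ConvertStrToDate is not shown.

```java
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper; not part of the original TableConverter.
final class TableConverterHelpers {

    // Assumed date format, e.g. "02.2017" (month.year).
    private static final DateTimeFormatter MONTH_YEAR =
            DateTimeFormatter.ofPattern("MM.uuuu");

    // Joins all columns except the last two (Date and Total) with tabs.
    static String entityKey(String[] line) {
        return String.join("\t", Arrays.asList(line).subList(0, line.length - 2));
    }

    // Sorts date strings in ascending chronological order, in place.
    static void sortDates(String[] dates) {
        Arrays.sort(dates, Comparator.comparing(
                (String s) -> YearMonth.parse(s, MONTH_YEAR)));
    }
}
```

With n distinct dates, the original nested loop parses dates on the order of n² times, while Arrays.sort needs on the order of n log n comparisons. Whether either change matters in practice depends on how many distinct dates and descriptive columns the table has, so it is worth profiling before and after.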
