Java: marshalling to XML with JAXB, how to use multithreading correctly

Question

I am trying to take a very long file of strings and convert it to XML according to a schema I was given. I used JAXB to generate classes from that schema. Since the file is very large, I created a thread pool to improve performance, but since then each thread only processes one line of the file and marshals it to the XML file.

Below is my home class, where I read the file. Each line is a transaction record. For every new user encountered, a list is created to store all of that user's transactions, and each list is put into a HashMap. I made it a ConcurrentHashMap because multiple threads will work on the map simultaneously. Is this the correct thing to do?
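Since only the single reading thread fills the map, and each worker receives its own list only after reading has finished, a plain map is enough for the grouping step (handing a task to an ExecutorService already makes the data it references visible to the worker thread). A minimal sketch, assuming the line layout implied by the question's parsing ("[n] COMMAND,user,..."), that uses `Map.computeIfAbsent` instead of the separate `UserNames` array; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GroupByUser {
    // Groups transaction lines by the user name found in the third field.
    // Assumes the question's layout: "[n] COMMAND,user,...".
    static Map<String, List<String>> groupByUser(Iterable<String> lines) {
        // Only the reading thread writes this map, so a plain map is enough;
        // LinkedHashMap keeps users in first-seen order.
        Map<String, List<String>> byUser = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parsed = line.split(",|\\s+");
            if (parsed.length < 3 || parsed[2].equals("./testLOG")) {
                continue; // skip malformed lines and the ./testLOG entries
            }
            // Creates the list the first time a user is seen, then appends.
            byUser.computeIfAbsent(parsed[2], k -> new ArrayList<>()).add(line);
        }
        return byUser;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "[1] ADD,alice,100.00",
            "[2] QUOTE,bob,ABC",
            "[3] BUY,alice,ABC,50.00");
        Map<String, List<String>> byUser = groupByUser(lines);
        System.out.println(byUser.keySet());            // [alice, bob]
        System.out.println(byUser.get("alice").size()); // 2
    }
}
```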

After the lists are created, a thread is started for each user. Each thread runs ProcessCommands (shown below) and receives from home the list of transactions for its user.

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class home{
  public static File XMLFile = new File("LogFile.xml");
  static Map<String,List<String>> UserMap= new ConcurrentHashMap<String,List<String>>();
  static String[] UserNames =  new String[5000];

  static {
    int numberOfUsers = 0;
    String[] parsed;
    try{
        BufferedReader reader = new BufferedReader(new FileReader("test.txt"));
            String line;
            while ((line = reader.readLine()) != null)
            {
                parsed = line.split(",|\\s+");
                if(!parsed[2].equals("./testLOG")){
                    if(Utilities.checkUserExists(parsed[2], UserNames) == false){ //User does not already exist
                        System.out.println("New User: " + parsed[2]);
                        UserMap.put(parsed[2],new ArrayList<String>());         //Create list of transactions for new user
                        UserMap.get(parsed[2]).add(line);                       //Add first item to new list
                        UserNames[numberOfUsers] = parsed[2];                   //Add new user
                        numberOfUsers++;
                    }
                    else{                                                           //User already existed
                        UserMap.get(parsed[2]).add(line);
                    }
                }
            }
            reader.close();
    } catch (IOException x) {
        System.err.println(x);
    }

    //get start time
    long startTime = new Date().getTime();
    int tCount = numberOfUsers;
    ExecutorService threadPool = Executors.newFixedThreadPool(tCount);
    for(int i = 0; i < numberOfUsers; i++){
        System.out.println("Starting Thread " + i + " for user " + UserNames[i]);
        Runnable worker = new ProcessCommands(UserMap.get(UserNames[i]),UserNames[i], XMLFile);
        threadPool.execute(worker);
    }
    threadPool.shutdown();
    while(!threadPool.isTerminated()){

    }
    System.out.println("Finished all threads");
  }
}

Here is the ProcessCommands class. The thread receives the list for its user and creates a marshaller. From what I understand, marshalling is not thread-safe, so it is best to create one marshaller per thread. Is this the best way to do that?
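One marshaller per thread is the usual pattern: a JAXBContext is documented as thread-safe and can be shared, while Marshaller instances are not meant to be shared between threads. When the workers run on a fixed-size pool, a ThreadLocal gives each pool thread one reusable instance. The sketch below shows that pattern with `java.text.SimpleDateFormat` as a stand-in for the Marshaller (it is likewise not thread-safe), because `javax.xml.bind` is no longer bundled with the JDK since Java 11; the class name is hypothetical:

```java
import java.text.SimpleDateFormat;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PerThreadInstance {
    // Stand-in for a Marshaller. With JAXB, the initializer would instead call
    // createMarshaller() on one shared JAXBContext, wrapping the checked
    // JAXBException, because the context is thread-safe and the marshaller is not.
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
        ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd"));

    static SimpleDateFormat current() {
        // Each thread lazily creates its own instance on first use, then reuses it.
        return FORMAT.get();
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        for (int i = 0; i < 4; i++) {
            // Tasks that land on the same pool thread print the same identity hash.
            pool.execute(() -> System.out.println(
                Thread.currentThread().getName() + " -> "
                    + System.identityHashCode(current())));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```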

When I create the marshallers, I know that each one (one per thread) will want to access the same output file, causing conflicts, so I used synchronized. Is that correct?
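The `synchronized` keyword only excludes threads that lock the same object. A sketch of locking on one static lock object that every task shares, with a hypothetical `Worker` class and an in-memory `ArrayList` standing in for the shared output file:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SharedLockDemo {
    // One lock object shared by every task. synchronized (this) would give each
    // Worker instance its own monitor, so the tasks would never exclude each other.
    private static final Object FILE_LOCK = new Object();

    // Not thread-safe by itself; stands in for the shared output file.
    static final List<String> sharedOutput = new ArrayList<>();

    static class Worker implements Runnable {
        private final String user;

        Worker(String user) { this.user = user; }

        @Override
        public void run() {
            for (int i = 0; i < 1000; i++) {
                synchronized (FILE_LOCK) { // same monitor for all Worker instances
                    sharedOutput.add(user + ":" + i);
                }
            }
        }
    }

    // Runs one Worker per user and returns how many entries were written.
    static int runWorkers(int users) throws InterruptedException {
        synchronized (FILE_LOCK) { sharedOutput.clear(); }
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int u = 0; u < users; u++) {
            pool.execute(new Worker("user" + u));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        synchronized (FILE_LOCK) { return sharedOutput.size(); }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runWorkers(8)); // 8000
    }
}
```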

As the thread iterates through its list, each line triggers a certain case. There are many cases, so for clarity I replaced them with pseudo-cases. Each case calls the function below.

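import java.io.File;
import java.sql.Connection;
import java.sql.Statement;
import java.util.Date;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Stack;
import java.util.concurrent.ConcurrentHashMap;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Marshaller;

public class ProcessCommands implements Runnable{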
private static final boolean DEBUG = false;
private List<String> list = null;
private String threadName;
private File XMLfile = null;
public Thread myThread;


public ProcessCommands(List<String> list, String threadName, File XMLfile){
    this.list = list;
    this.threadName = threadName;
    this.XMLfile = XMLfile;
}

public void run(){
    Date start = null;
    int transactionNumber = 0;
    String[] parsed = new String[8];
    String[] quoteParsed = null;
    String[] universalFormatCommand = new String[9];
    String userCommand = null;
    Connection connection = null;
    Statement stmt = null;
    Map<String, UserObject> usersMap = null;
    Map<String, Stack<BLO>> buyMap = null;
    Map<String, Stack<SLO>> sellMap = null;
    Map<String, QLO> stockCodeMap = null;
    Map<String, BTO> buyTriggerMap = null;
    Map<String, STO> sellTriggerMap = null;
    Map<String, USO> usersStocksMap = null;
    String SQL = null;
    int amountToAdd = 0;
    int tempDollars = 0;
    UserObject tempUO = null;
    BLO tempBLO = null;
    SLO tempSLO = null;
    Stack<BLO> tempStBLO = null;
    Stack<SLO> tempStSLO = null;
    BTO tempBTO = null;
    STO tempSTO = null;
    USO tempUSO = null;
    QLO tempQLO = null;
    String stockCode = null;
    String quoteResponse = null;
    int usersDollars = 0;
    int dollarAmountToBuy = 0;
    int dollarAmountToSell = 0;
    int numberOfSharesToBuy = 0;
    int numberOfSharesToSell = 0;
    int quoteStockInDollars = 0;
    int shares = 0;
    Iterator<String> itr = null;

    int transactionCount = list.size();
    System.out.println("Starting "+threadName+" - listSize = "+transactionCount);

    //UO dollars, reserved
    usersMap  = new HashMap<String, UserObject>(3);  //userName -> UO

    //USO shares
    usersStocksMap = new HashMap<String, USO>(); //userName+stockCode -> shares

    //BLO code, timestamp, dollarAmountToBuy, stockPriceInDollars
    buyMap = new HashMap<String, Stack<BLO>>();  //userName -> Stack<BLO>

    //SLO code, timestamp, dollarAmountToSell, stockPriceInDollars
    sellMap = new HashMap<String, Stack<SLO>>();  //userName -> Stack<SLO>

    //BTO code, timestamp, dollarAmountToBuy, stockPriceInDollars
    buyTriggerMap = new ConcurrentHashMap<String, BTO>();  //userName+stockCode -> BTO

    //STO code, timestamp, dollarAmountToBuy, stockPriceInDollars
    sellTriggerMap = new HashMap<String, STO>();  //userName+stockCode -> STO

    //QLO timestamp, stockPriceInDollars
    stockCodeMap = new HashMap<String, QLO>();  //stockCode -> QLO



    //create user object and initialize stacks
    usersMap.put(threadName, new UserObject(0, 0));
    buyMap.put(threadName, new Stack<BLO>());
    sellMap.put(threadName, new Stack<SLO>());
    try {
        //Marshaller marshaller = getMarshaller();
        synchronized (this){
            Marshaller marshaller = init.jc.createMarshaller();
            marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
            marshaller.setProperty(Marshaller.JAXB_FRAGMENT, true);
            marshaller.marshal(LogServer.Root,XMLfile);
            marshaller.marshal(LogServer.Root,System.out);
        }
    } catch (JAXBException M) {
        M.printStackTrace();
    }

    Date timing = new Date();
    //universalFormatCommand = new String[8];
    parsed = new String[8];
    //iterate through workload file
    itr = this.list.iterator();
    while(itr.hasNext()){
        userCommand = (String) itr.next(); 
        itr.remove();
        parsed = userCommand.split(",|\\s+");
        transactionNumber = Integer.parseInt(parsed[0].replaceAll("\\[", "").replaceAll("\\]", ""));
        universalFormatCommand = Utilities.FormatCommand(parsed, parsed[0]);
        if(transactionNumber % 100 == 0){
            System.out.println(this.threadName + " - " +transactionNumber+ " - "+(new Date().getTime() - timing.getTime())/1000);
        }
        /*System.out.print("UserCommand " +transactionNumber + ": ");
        for(int i = 0;i<8;i++)System.out.print(universalFormatCommand[i]+ " ");
        System.out.print("\n");*/
        //switch for user command
        switch (parsed[1].toLowerCase()) {

        case "one":   // labels are lowercase because the switch uses toLowerCase()
            // ... do stuff ...
            LogServer.create_Log(universalFormatCommand, transactionNumber, CommandType.ADD);
            break;
        case "two":
            // ... do stuff ...
            LogServer.create_Log(universalFormatCommand, transactionNumber, CommandType.ADD);
            break;
        }
     }
  }
}

The function create_Log has multiple cases; as before, for clarity I left just one. The "QUOTE" case calls only one object-creation function, but other cases can create multiple objects. The type 'log' is a complex XML type that defines all the other object types, so each call to create_Log adds to a single log object called Root. The class 'log' generated by JAXB includes a method that returns a list of objects. The statement:

Root.getUserCommandOrQuoteServerOrAccountTransaction().add(quote_QuoteType);

takes the root element I created, gets its list (the generated getter creates the list on first use) and adds the newly created object 'quote_QuoteType' to that list. Before I added threading, this approach successfully built a list of as many objects as I wanted and then marshalled them, so I'm fairly sure the code in class LogServer is not the issue. It has something to do with the marshalling and synchronization in the ProcessCommands class above.

public class LogServer{
    public static log Root = new log();

    public static QuoteServerType Log_Quote(String[] input, int TransactionNumber){
    ObjectFactory factory = new ObjectFactory();
    QuoteServerType quoteCall = factory.createQuoteServerType();

    // ... populate the QuoteServerType object called quoteCall ...

    return quoteCall;
    }

    public static void create_Log(String[] input, int TransactionNumber, CommandType Command){
    System.out.print("TRANSACTION "+TransactionNumber + " is " + Command + ": ");
    for(int i = 0; i<input.length;i++) System.out.print(input[i] + " ");
    System.out.print("\n");
    switch(input[1]){
    case "QUOTE":
        System.out.print("QUOTE CASE");
        QuoteServerType quote_QuoteType = Log_Quote(input,TransactionNumber);
        Root.getUserCommandOrQuoteServerOrAccountTransaction().add(quote_QuoteType);
        break;
        }
    }
}

Answer 1

So you wrote a lot of code, but have you tested whether it actually works? After a quick look, I doubt it. You should test your code logic piece by piece instead of going all the way to the end. It seems you are just starting with Java; I would recommend practising on simple single-threaded applications first. Sorry if I sound harsh, but I will try to be constructive as well:

By convention, class names start with a capital letter and variable names with a lowercase letter; you do it the other way around.
You should put the code in a method of your home (Home) class instead of putting all of it in the static block.
You are reading the whole file into memory instead of processing it line by line. After Home is initialized, literally the whole content of the file will be in the UserMap variable. If the file is really large, you will run out of heap memory. If you expect large files, you cannot do it this way; you have to redesign your app to store partial results somewhere. If your file fits in memory you could keep it like that (but you said it is large).
There is no need for UserNames; UserMap.containsKey will do the job.
Your thread pool size should be on the order of your number of cores, not the number of users, or you will get thread thrashing (if your code has blocking operations, make tCount = 2 * processors; if not, keep it at the number of processors). Once one ProcessCommands task finishes, the executor will start another until all are done, and you will be using all your processor cores efficiently.
Do NOT use while(!threadPool.isTerminated()); this loop will completely consume one processor because it checks constantly. Call awaitTermination instead.
Your ProcessCommands has a few map variables that will only ever hold one entry because, as you said, each instance processes data from one user.
The synchronized(this) in ProcessCommands will not work, because each thread synchronizes on a different object (a different ProcessCommands instance).
I believe creating a marshaller is thread-safe (check it), so there is no need for synchronization at all.
You save your log (whatever it is) before you do the actual processing of the transaction lists.
The marshalling will overwrite the content of the file with the current state of LogServer.Root. If it is shared between your ProcessCommands instances (it seems so), what is the point of saving it in each thread? Do it once you are finished.
You don't need itr.remove();
The log class (for the Root variable!) needs to be thread-safe, because all the threads will call operations on it (so the list inside the log class must be a concurrent list, etc.).
And so on...
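The points about pool size and busy-waiting can be sketched together: a pool sized from `Runtime.availableProcessors()`, one queued task per user, and `awaitTermination` instead of a spin loop. The class name and the one-hour timeout are arbitrary choices for illustration, and the per-user work is left as a placeholder:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedPool {
    // Runs one task per user on a pool sized to the machine, not to the user count.
    static int processAll(List<String> users) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        // CPU-bound work: about one thread per core. If tasks block on I/O,
        // roughly 2 * cores is the rule of thumb suggested above.
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        AtomicInteger done = new AtomicInteger();
        for (String user : users) {
            // Extra tasks wait in the pool's queue until a thread frees up.
            pool.execute(() -> {
                // ... process this user's transactions here (placeholder) ...
                done.incrementAndGet();
            });
        }
        pool.shutdown(); // accept no new tasks; queued ones still run
        // Blocks without spinning, unlike while (!pool.isTerminated()) {}
        if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
            pool.shutdownNow();
        }
        return done.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(processAll(List.of("alice", "bob", "carol"))); // 3
    }
}
```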

I would recommend the following:

Start with a simple single-threaded version that actually works.
Deal with processing line by line (store the results for each user in a separate file; you can keep a cache of the transactions of recently used users so you are not writing to disk all the time; see the Guava cache).
Process each user's transactions into your user log objects with multiple threads (again, if there are a lot of them, you have to save them to disk rather than keep everything in memory).
Write code that combines the logs from the different users into one (again, you may want to do it multithreaded), though it will be mostly IO operations, so there is not much gain and it is trickier to do.
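A rough in-memory sketch of that shape, under stated assumptions: each user's transactions are processed in parallel into a private list (no shared Root while the pool runs), and only after `invokeAll` returns does a single thread write one XML document. Because JAXB may not be on the classpath of a current JDK, the final write uses the JDK's StAX `XMLStreamWriter` as a stand-in for marshalling Root once; the element names, the placeholder processing step, and the class name are hypothetical. It keeps everything in memory, so it does not cover the disk-backed variant recommended for very large files:

```java
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

public class ParallelThenWriteOnce {
    // Each task turns one user's lines into log entries in its own private list,
    // so no shared mutable state is touched while the pool is running.
    static Callable<List<String>> task(List<String> userLines) {
        return () -> {
            List<String> entries = new ArrayList<>();
            for (String line : userLines) {
                entries.add(line.trim()); // placeholder for the real per-command processing
            }
            return entries;
        };
    }

    static String run(Map<String, List<String>> byUser) throws Exception {
        ExecutorService pool =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        List<Callable<List<String>>> tasks = new ArrayList<>();
        for (List<String> lines : byUser.values()) {
            tasks.add(task(lines));
        }
        List<Future<List<String>>> results;
        try {
            results = pool.invokeAll(tasks); // waits for all; futures keep task order
        } finally {
            pool.shutdown();
        }

        // Single-threaded from here on: the whole document is written exactly once.
        StringWriter out = new StringWriter();
        XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        xml.writeStartDocument();
        xml.writeStartElement("log");
        for (Future<List<String>> f : results) {
            for (String entry : f.get()) { // get() rethrows a failed task's exception
                xml.writeStartElement("entry");
                xml.writeCharacters(entry);
                xml.writeEndElement();
            }
        }
        xml.writeEndElement();
        xml.writeEndDocument();
        xml.close();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Map<String, List<String>> byUser = new LinkedHashMap<>();
        byUser.put("alice", List.of("[1] ADD,alice,100.00"));
        byUser.put("bob", List.of("[2] QUOTE,bob,ABC"));
        System.out.println(run(byUser));
    }
}
```

Since `invokeAll` returns the futures in the same order as the task list, a LinkedHashMap keyed by user keeps each user's entries together, in first-seen order.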

Good luck

Answer 1: accepted, score 2, 2015-02-19 13:54:01
