Unix sort command takes much longer depending on where it is executed?! (fastest from ProcessBuilder in program run from IDE, slowest from terminal)

I have a Java program that uses ProcessBuilder to call the Unix sort command. When I run this code within my IDE (IntelliJ) it only takes about a second to sort 500,000 lines. When I package it into an executable jar and run that from the terminal, it takes about 10 seconds. When I run the sort command myself from the terminal, it takes 20 seconds!

Why the vast difference in performance, and is there any way I can get the jar to execute with the same performance? The environment is OS X 10.6.8 and Java 1.6.0_26. The bottom of the sort man page says "sort 5.93 November 2004".

The command it is executing is:

sort -t'    ' -k5,5f -k4,4f -k1,1n /path/to/input/file -o /path/to/output/file

Note that when I run sort from the terminal I need to escape the tab delimiter manually and use the argument -t$'\t' instead of the literal tab character (which I can pass directly to ProcessBuilder).
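The literal tab works from Java because ProcessBuilder hands each argument to the child process verbatim, with no shell in between to interpret quoting. A minimal sketch of what such a call might look like; the class and method names are illustrative rather than taken from the original program, and the paths are placeholders:

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.List;

class SortCommand {
    // Builds the argument list for sort. Each element reaches sort unchanged,
    // so the delimiter is a real tab character appended to -t.
    static List<String> build(String input, String output) {
        return Arrays.asList(
                "sort", "-t\t", "-k5,5f", "-k4,4f", "-k1,1n", input, "-o", output);
    }

    // Runs sort and returns its exit code.
    static int run(String input, String output) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(build(input, output));
        pb.redirectErrorStream(true); // merge stderr into stdout
        Process p = pb.start();
        // Drain any output so the child cannot block on a full pipe.
        InputStream out = p.getInputStream();
        try {
            byte[] buf = new byte[8192];
            while (out.read(buf) != -1) {
                // discard
            }
        } finally {
            out.close();
        }
        return p.waitFor();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 2) {
            long start = System.nanoTime();
            int exit = run(args[0], args[1]);
            long millis = (System.nanoTime() - start) / 1000000L;
            System.out.println("exit=" + exit + " elapsed_ms=" + millis);
        } else {
            // Without arguments, just show the command that would be executed.
            System.out.println(build("/path/to/input/file", "/path/to/output/file"));
        }
    }
}
```

Running it with an input and output path prints the elapsed time, which gives a number to compare against timing the same command in the terminal.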

Looking at ps, everything seems the same, except that when run from the IDE the sort command has a TTY of ?? instead of ttys000, but from this question I don't think that should make a difference. Perhaps Bash is slowing me down? I am running out of ideas and want to close this 20x performance gap!

I'm going to venture two guesses:

  • perhaps you are invoking different versions of sort (do a which sort and use the full absolute path to compare again?)

  • perhaps you are using more complicated locale settings (leading to more complicated character set handling, etc.)? Try

      export LANG=C
      sort -t' ' -k5,5f -k4,4f -k1,1n /input/file -o /output/file

to compare
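The locale guess can also be checked from inside the Java program: print the locale variables the JVM inherited, and force the C locale for the child process only. The sketch below assumes the timing difference comes from the terminal and the IDE exporting different locale variables, which is a hypothesis and not something confirmed in this thread. LocaleCheck and withCLocale are illustrative names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

class LocaleCheck {
    // Returns a ProcessBuilder whose child runs in the C locale.
    // LC_ALL takes precedence over LANG and the other LC_* variables.
    static ProcessBuilder withCLocale(List<String> command) {
        ProcessBuilder pb = new ProcessBuilder(command);
        Map<String, String> env = pb.environment(); // copy of the JVM's environment
        env.put("LC_ALL", "C");
        return pb;
    }

    public static void main(String[] args) {
        // Show what this JVM inherited; run it from the IDE and from the
        // terminal and compare the two outputs.
        for (String name : Arrays.asList("LANG", "LC_ALL", "LC_COLLATE")) {
            System.out.println(name + "=" + System.getenv(name));
        }
    }
}
```

If the values printed under the IDE differ from those printed in the terminal, starting sort through withCLocale should show whether the locale accounts for the gap.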

Have a look at this project: http://code.google.com/p/externalsortinginjava/

It avoids the need to call an external sort entirely.
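For an input that fits in memory, sorting inside the JVM might look like the sketch below. It does not use the API of the linked project; it is a plain Collections.sort with a comparator that approximates the keys from the question (-k5,5f, then -k4,4f, then -k1,1n). Treating non-numeric or missing fields as 0 or empty is an assumption made here, and ties keep their input order, whereas sort falls back to comparing whole lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class TabFieldSort {
    // Returns the 1-based field of a tab-separated line, or "" if it is missing.
    static String field(String line, int n) {
        String[] parts = line.split("\t", -1);
        return n <= parts.length ? parts[n - 1] : "";
    }

    // Parses a field as a whole number; anything unparsable counts as 0 (assumption).
    static long number(String s) {
        try {
            return Long.parseLong(s.trim());
        } catch (NumberFormatException e) {
            return 0L;
        }
    }

    // Field 5 ignoring case, then field 4 ignoring case, then field 1 numerically.
    static final Comparator<String> ORDER = new Comparator<String>() {
        public int compare(String a, String b) {
            int c = field(a, 5).compareToIgnoreCase(field(b, 5));
            if (c != 0) return c;
            c = field(a, 4).compareToIgnoreCase(field(b, 4));
            if (c != 0) return c;
            long x = number(field(a, 1));
            long y = number(field(b, 1));
            return x < y ? -1 : (x > y ? 1 : 0);
        }
    };

    // Returns a sorted copy and leaves the input list untouched.
    static List<String> sorted(List<String> lines) {
        List<String> copy = new ArrayList<String>(lines);
        Collections.sort(copy, ORDER);
        return copy;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "2\tx\ty\tbeta\tAlpha",
                "10\tx\ty\tBeta\talpha",
                "3\tx\ty\talpha\tbravo",
                "1\tx\ty\tBeta\talpha");
        for (String line : sorted(lines)) {
            System.out.println(line);
        }
    }
}
```

The comparator splits each line again on every comparison, which is wasteful for 500,000 lines; parsing the three keys once per line before sorting would be the obvious next step.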
