Sort each line in a text file

I have a text file in which each line contains some words, for example:

stackoverflow coding programming
tag question badges

I must sort the words on each line while preserving the order of the lines. For the example above, the output should be:

coding programming stackoverflow
badges question tag

My solution so far is to create a temp file that holds the sorted lines. The bash script looks like this:

FILE_TMP="$FILE.tmp"
while IFS= read -r line
do
    echo "$line" | xargs -n1 | sort | xargs >> "$FILE_TMP"
done < "$FILE"

mv "$FILE_TMP" "$FILE"

It works fine, but I'm not happy that I must create a duplicate file, especially because the files are big.

So my question is: is there any way to sort each line of the file in place?

Thank you,

Try this (you may need to change the sed if the file is not space-delimited):

while IFS= read -r line; do printf '%s\n' "$line" | sed 's/ /\n/g' | sort | gawk '{line = (NR == 1 ? $0 : line " " $0)} END {print line}'; done < datafile.dat

If Python were an option, this would be quite easy using the in-place support from the fileinput module:

>>> import os
>>> import fileinput
>>> for line in fileinput.input('file.txt', inplace=1):
...     line = line.rstrip(os.linesep)
...     print(' '.join(sorted(line.split())))
...

You could script a text editor (vim or emacs, for example) to do it "in place", but that wouldn't really help you avoid a temp file, since text editors use temp files internally.

If your real problem is that it is slow to run, that is probably because it spawns three separate processes for each line in the source file. You could get around that by using a scripting language like perl, which can go through the file sorting lines without spawning any additional processes. You'd still have an additional file for the output.
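As a rough illustration of the single-process idea above, here is a minimal sketch. It uses Python rather than perl, only because another answer here already uses Python; the function name sort_words_in_lines and the file names are hypothetical. Note that it still writes to a separate output file, and that sorted() orders by code point, which may differ from what sort does under your locale.

```python
def sort_words_in_lines(src_path, dst_path):
    """Sort the whitespace-separated words on each line of src_path
    and write the result to dst_path, all within one process."""
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            # split() with no arguments collapses runs of whitespace,
            # so consecutive blanks come out as a single space.
            dst.write(" ".join(sorted(line.split())) + "\n")
```

It would be called as, say, sort_words_in_lines("input", "output"), with the line order of the input preserved in the output.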

The accepted answer is somewhat slow. Try this:

awk '{ n = split($0, a, " "); asort(a); for (i = 1; i <= n; i++) printf("%s%s", (i > 1 ? " " : ""), a[i]); print "" }' input >output

Note: Your awk must be GNU awk (gawk), which provides asort().

I think the following awk script should do the job:

prompt$ cat foo.awk
{
    n = split($0, words)
    # Bubble sort the n words of the current line.
    do {
        change_occurred = 0
        for (idx = 1; idx < n; ++idx) {
            if (words[idx] > words[idx + 1]) {
                t = words[idx]
                words[idx] = words[idx + 1]
                words[idx + 1] = t
                change_occurred = 1
            }
        }
    } while (change_occurred != 0)
    # Print in index order; "for (idx in words)" has no guaranteed order.
    for (idx = 1; idx <= n; ++idx) {
        printf("%s%s", words[idx], (idx < n ? " " : ""))
    }
    print ""
}
prompt$ awk -f foo.awk <<EOF
heredoc> stackoverflow coding programming
heredoc> tag question badges
heredoc> EOF
coding programming stackoverflow
badges question tag

EDIT: Note that this is not an in-place edit; it acts as a filter from stdin to stdout. You can read and write files from within awk as well, but doing so feels "clunky". If you really want to avoid the temporary file, use something like Perl.

Practically any "reasonable" solution to this problem will write the new contents to a new temporary file and then rename it. Even things like perl "in place" processing (perl -pi...) or text editors actually do that. If you want to do it truly in place, writing to the same physical disk position, it can be done (the new contents occupy exactly the same space as the old), but it's rather painful.
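The write-then-rename pattern described above might look roughly like the following sketch. The function name rewrite_with_sorted_lines is hypothetical, and the code assumes a UTF-8 text file with whitespace-separated words; it is not what perl -pi or any particular editor does internally, just the same general idea.

```python
import os
import tempfile


def rewrite_with_sorted_lines(path):
    """Replace path with a copy whose lines have their words sorted."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so the final
    # rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp, \
             open(path, encoding="utf-8") as src:
            for line in src:
                tmp.write(" ".join(sorted(line.split())) + "\n")
        # Swap the finished file into place under the original name.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        os.unlink(tmp_path)
        raise
```

The disk still briefly holds two copies of the data, which is exactly the cost the question wants to avoid. The file created by mkstemp also does not inherit the original file's permissions, so a real script would probably want to copy those over.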

You can compile the code from this answer into an overwrite executable, and then run the following (WARNING: this is dangerous, back up your file first!):

while IFS= read -r line; do echo "$line" | xargs -n1 | sort | xargs; done < f | ./overwrite f

This is rather fragile. For example, you must be absolutely sure that the sorting done by the script does not mess with blank characters (what about DOS newlines, or consecutive blanks?): the script must emit the same number of bytes per line (or fewer) as it reads.
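To make the byte-count constraint concrete, here is a minimal sketch of a true in-place rewrite in Python. It is not the overwrite program mentioned above; the function name sort_lines_in_place is hypothetical, and it assumes words are separated by single space characters. The same warning applies: it modifies the file directly, so back it up first.

```python
def sort_lines_in_place(path):
    """Overwrite each line of path with its words sorted, writing the
    bytes back to the same offsets so the file size never changes."""
    with open(path, "r+b") as f:
        while True:
            start = f.tell()
            line = f.readline()
            if not line:
                break
            end = f.tell()
            # Keep the original line ending (\n or \r\n) untouched.
            body = line.rstrip(b"\r\n")
            ending = line[len(body):]
            # Splitting on a single space and rejoining with a single
            # space keeps the byte count identical: consecutive blanks
            # produce empty "words" that sort to the front of the line.
            new_line = b" ".join(sorted(body.split(b" "))) + ending
            assert len(new_line) == len(line)
            f.seek(start)
            f.write(new_line)
            f.seek(end)
```

Two design choices follow from the fragility noted above: DOS newlines are preserved because the line ending is set aside before sorting, and extra blanks are moved to the start of the line rather than dropped, since dropping them would shorten the line. The comparison is byte-wise, so the order may differ from what sort produces under your locale.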
