Merging two files column and row-wise in bash

I would like to merge two files element-wise, so that each value is paired with the value at the same row and column in the other file, but I am having difficulty doing so in bash. Here is what I would like to do.

File1:

1 2 3
4 5 6
7 8 9

File2:

2 3 4
5 6 7
8 9 1

Expected output file:

1/2 2/3 3/4
4/5 5/6 6/7
7/8 8/9 9/1

This is just an example. The actual files are two 1000x1000 data matrices.

Any thoughts on how to do this? Thanks!

paste + perl version that works with an arbitrary number of columns without having to hold an entire file in memory:

paste file1.txt file2.txt | perl -MList::MoreUtils=pairwise -lane '
    my @a = @F[0 .. (@F/2 - 1)]; # The values from file1
    my @b = @F[(@F/2) .. $#F]; # The values from file2
    print join(" ", pairwise { "$a/$b" } @a, @b); # Merge them together again'

It uses the non-standard but useful List::MoreUtils module; install through your OS package manager or favorite CPAN client.

Or use paste + awk:

paste file1 file2 | awk '{ n=NF/2; for(i=1; i<=n; i++) printf "%s/%s ", $i, $(i+n); printf "\n"; }'

Note that this script adds a trailing space after the last value on each line. This can be avoided with a more complicated awk script or by piping the output through an additional command, e.g.:

paste file1 file2 | awk '{ n=NF/2; for(i=1; i<=n; i++) printf "%s/%s ", $i, $(i+n); printf "\n"; }' | sed 's/ $//'

An awk solution without the additional sed, thanks to Jonathan Leffler. (I knew it was possible but was too lazy to think it through.)

paste file1 file2 | awk '{ n=NF/2; pad=""; for(i=1; i<=n; i++) { printf "%s%s/%s", pad, $i, $(i+n); pad=" "; } printf "\n"; }'

Assumptions:

  • no blank lines in files
  • both files have the same number of rows
  • both files have the same number of fields
  • the number of rows and fields is not known in advance
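
These assumptions can be checked before merging. The following is a rough sketch rather than part of the original answer: same_shape is a hypothetical helper that succeeds only when both files have the same number of rows and each pair of corresponding rows has the same number of fields. It assumes the first file is not empty.

```shell
# Sample inputs from the question, plus one deliberately short file.
tmp=$(mktemp -d)
printf '1 2 3\n4 5 6\n7 8 9\n' > "$tmp/file1"
printf '2 3 4\n5 6 7\n8 9 1\n' > "$tmp/file2"
printf '2 3 4\n5 6\n'          > "$tmp/ragged"

# same_shape is a hypothetical name; exit status 0 means the shapes match.
same_shape() {
    awk '
        FNR == NR { nf[FNR] = NF; rows = FNR; next }      # first file: record width of each row
        !(FNR in nf) || nf[FNR] != NF { bad = 1; exit }   # second file: extra row or different width
        { seen = FNR }                                    # last matching row of the second file
        END { exit (bad || seen != rows) }                # also fails if the second file is shorter
    ' "$1" "$2"
}

same_shape "$tmp/file1" "$tmp/file2"  && echo "shapes match"
same_shape "$tmp/file1" "$tmp/ragged" || echo "shapes differ"
```

Only the field count per row of the first file is kept in memory, so the check stays cheap even for large matrices.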

One awk solution:

awk '

# first file (FNR==NR):

FNR==NR { for ( i=1 ; i<=NF ; i++)          # loop through fields
              { line[FNR,i]=$(i) }          # store field in array; array index = (row number FNR, field number i)
          next                              # skip to next line in file
        }

# second file:

        { pfx=""                            # init printf prefix as empty string
          for ( i=1 ; i<=NF ; i++)          # loop through fields
              { printf "%s%s/%s",           # print our results:
                    pfx, line[FNR,i], $(i)  # prefix, corresponding field from file #1, "/", current field
                pfx=" "                     # prefix for rest of fields in this line is a space
              }
          printf "\n"                       # append linefeed on end of current line
        }
' file1 file2

NOTES:

  • remove comments to declutter code
  • memory usage will climb as the size of the matrix increases, because the whole first file is stored in the array (probably not an issue given small field values and the OP's comment about a 1000 x 1000 matrix)
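
If memory did become a concern, one possible variation is to read the second file one row at a time with getline instead of storing the first file in an array. This is a sketch under the same assumptions as above, not the original author's method; merge_streaming and the awk variable other are hypothetical names. Because the file name is passed with -v, it should not contain backslashes.

```shell
# Sample inputs from the question, written to a temporary directory.
tmp=$(mktemp -d)
printf '1 2 3\n4 5 6\n7 8 9\n' > "$tmp/file1"
printf '2 3 4\n5 6 7\n8 9 1\n' > "$tmp/file2"

# merge_streaming is a hypothetical name; only one row per file is held at a time.
merge_streaming() {
    awk -v other="$2" '
        {
            # Fetch the matching row of the second file; stop if it runs out.
            if ((getline row < other) <= 0) exit 1
            split(row, b, " ")               # split that row on whitespace into b[1..]
            pfx = ""
            for (i = 1; i <= NF; i++) {      # NF still refers to the row from the first file
                printf "%s%s/%s", pfx, $i, b[i]
                pfx = " "
            }
            printf "\n"
        }
    ' "$1"
}

merge_streaming "$tmp/file1" "$tmp/file2"
```

For the sample files this should produce the same three lines as the array-based script.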

The above generates:

1/2 2/3 3/4
4/5 5/6 6/7
7/8 8/9 9/1
