Match the number of characters in two lines

I have a file that I'm trying to prepare for some downstream analysis, and I need two lines in each record to contain exactly the same number of characters. The file is formatted as below; in each group of four lines, the 2nd line (CTTATAATGCCGCTCCCTAAG) and the 4th line (bbbeeeeegggggiiiiiiiiigghiiiiiiiiiiiiiiiiiigeccccb) need to be the same length.

@HWI-ST:8:1101:3346:2198#GTCCGC/1
CTTATAATGCCGCTCCCTAAG
+HWI-ST:8:1101:3346:2198#GTCCGC/1
bbbeeeeegggggiiiiiiiiigghiiiiiiiiiiiiiiiiiigeccccb
@HWI-ST:8:1101:10491:2240#GTCCGC/1
GAGTAGGGAGTATACATCAG
+HWI-ST:8:1101:10491:2240#GTCCGC/1
abbceeeeggggfiiiiiigg`gfhfhhifhifdgg^ggdf_`_Y[aa_R
@HWI-ST:8:1101:19449:2134#GTCCGC/1
AAGAAGAGATCTGTGGACCA

So far I've pulled out the second line from each set of four and generated a file containing a record of the length of each line using:

grep -v '[^A-Z]' file.fastq |awk '{ print length($0); }' > newfile

Now I'm looking for a way to use this record to tell a sed command how many characters to trim off the end of each 4th line. Something similar to:

sed -r 's/.{n}$//' file

Here n would be replaced with something that looks up the right value in the text file. I wonder if I'm overcomplicating things, but I need the lines to match EXACTLY, so I haven't been able to think of another way to go about it. Any help would be awesome, thanks!

This might be what you're looking for:

awk '
  # If 2nd line of 4-line group, save length as len.
  NR % 4 == 2 { len = length($0) }

  # If 4th line of 4-line group, trim the line to len.
  NR % 4 == 0 { $0 = substr($0, 1, len)}

  # print every line
  { print }
' file

This assumes that the file consists of 4-line groups in which the 2nd and 4th lines are the ones you're interested in. It also assumes that the 2nd line of each group is no longer than its corresponding 4th line; if the 2nd line is longer, substr() returns the 4th line unchanged and the two lines will still differ in length.
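
If you can't rely on that second assumption, something like the following might help. It's only a sketch: trim_fastq and check_fastq are made-up helper names, not standard tools, and it still assumes strict 4-line groups. trim_fastq cuts both the 2nd and the 4th line down to the shorter of the two, and check_fastq reports any group where the lengths still differ.

```shell
# Hypothetical helper: trim lines 2 and 4 of every 4-line group to the
# shorter of the two lengths. Assumes strict 4-line groups.
trim_fastq() {
  awk '
    NR % 4 == 2 { seq = $0; next }     # hold the 2nd line until the 4th is read
    NR % 4 == 3 { plus = $0; next }    # hold the 3rd ("+") line as well
    NR % 4 == 0 {
      n = (length(seq) < length($0)) ? length(seq) : length($0)
      print substr(seq, 1, n)
      print plus
      print substr($0, 1, n)
      next
    }
    { print }                          # 1st line of each group passes through
    END {
      # Flush held lines if the file stops in the middle of a group.
      if (NR % 4 == 2) print seq
      if (NR % 4 == 3) { print seq; print plus }
    }
  ' "$1"
}

# Hypothetical helper: list groups whose 2nd and 4th lines differ in length.
# Exits non-zero if any such group is found.
check_fastq() {
  awk '
    NR % 4 == 2 { len = length($0) }
    NR % 4 == 0 && length($0) != len {
      print "group ending at line " NR ": " len " vs " length($0)
      bad = 1
    }
    END { exit bad + 0 }
  ' "$1"
}
```

Usage would be along the lines of `trim_fastq file.fastq > trimmed.fastq && check_fastq trimmed.fastq`, where trimmed.fastq is just an example output name. Note that shortening the 2nd line throws away bases, so whether that's acceptable depends on your downstream analysis.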
