
Search a CSV file for a value in the first column, if found shift the value of second column one row down

I have CSV files that look like this:

 786,1702
 787,1722
 -,1724
 788,1769
 789,1766

I would like a bash command that searches the first column for a - and, if one is found, shifts the values in the second column down by one row. The - recurs several times in the first column, so processing needs to start from the top to preserve the order of the second column.

The second column of each - row would then be blank.

Desired output:

 786,1702
 787,1722
 -,
 788,1724
 789,1769
 790,1766

So far I have: awk -F ',' '$1 ~ /^-$/' filename.csv to find the hyphens, but shifting the 2nd column down is tricky...

This assumes that the left column continues with incremental IDs past the last row, so that the right column can keep shifting down until no pending values remain:

awk 'BEGIN{start=0;FS=","}$1=="-"{stack[stacklen++]=$2;print $1",";next}stacklen-start{stack[stacklen++]=$2;print $1","stack[start];delete stack[start++];next}1;END{for (i=start;i<stacklen;i++){print $1-start+i+1","stack[i]}}' filename.csv
# or
<filename.csv awk -F, -v start=0 '$1=="-"{stack[stacklen++]=$2;print $1",";next}stacklen-start{stack[stacklen++]=$2;print $1","stack[start];delete stack[start++];next}1;END{for (i=start;i<stacklen;i++){print $1-start+i+1","stack[i]}}'

Or, explained:

Here I use a shifted stack (in effect a first-in, first-out queue) to avoid rewriting indexes: start points to the first useful element, and stacklen is the index just past the last one. This avoids the costly operation of shifting all array elements whenever the first element is removed.
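The two-index idea can be seen in isolation with a minimal sketch. The function name queue_demo and the variable names head and tail are hypothetical stand-ins for start and stacklen; they are not part of the answer's script.

```shell
# Hypothetical demo: enqueue three values, then dequeue them oldest-first.
queue_demo() {
  awk 'BEGIN {
    head = 0; tail = 0        # head: oldest live element; tail: next free slot
    queue[tail++] = "a"       # enqueue by writing at tail, then advancing it
    queue[tail++] = "b"
    queue[tail++] = "c"
    while (tail - head) {     # non-zero while elements remain
      print queue[head]       # read the oldest element
      delete queue[head++]    # drop it and advance head; nothing is re-indexed
    }
  }'
}
queue_demo
```

Removing an element only moves head forward, so the cost per removal does not depend on how many elements are still waiting.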

# chmod +x shift_when_dash
./shift_when_dash filename.csv

with shift_when_dash being an executable file containing:

#!/usr/bin/awk -f
BEGIN {              # Everything in this block is executed once before opening the file
  start = 0          # Needed because we are using it in a scalar context before initialization
  FS = ","           # Input field separator is a comma
}
$1 == "-" {          # We match the special case where the first column is a simple dash
  stack[stacklen++] = $2 # We store the second column on top of our stack
  print $1 ","           # We print the dash without a second column as asked by OP
  next                   # We stop processing the current record and go on to the next record
}
stacklen - start {          # In case we still have something in our stack
  stack[stacklen++] = $2    # We store the current 2nd column on the stack
  print $1 "," stack[start] # We print the current ID with the first stacked element
  delete stack[start++]     # Free up some memory and increment our pointer
  next
}
1                           # We print the line as-is, without any modification.
                            # This applies to lines which were not skipped by the
                            # 'next' statements above, so in our case all lines before
                            # the first dash is encountered.
END {
  for (i=start;i<stacklen;i++) {    # For every element remaining in the stack after the last line
    print $1-start+i+1 "," stack[i] # We print a new incremental id with the stack element
  }
}
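The END block above reads $1 from the last record, which yields 0 if the file happens to end on a dash row. As a hedged variant, not the answer's own method, the sketch below remembers the last numeric ID in a variable instead. The names shift_second_column, queue, head, tail and last_id are hypothetical.

```shell
# Hypothetical variant: same shifting logic, but the trailing IDs are derived
# from the last numeric ID seen rather than from $1 inside END.
shift_second_column() {
  awk -F, '
    BEGIN { head = 0; tail = 0 }
    $1 == "-" {                   # dash row: queue its value, print it blank
      queue[tail++] = $2
      print $1 ","
      next
    }
    { last_id = $1 }              # any non-dash row: remember its ID
    tail - head {                 # values pending: swap in the oldest one
      queue[tail++] = $2
      print $1 "," queue[head]
      delete queue[head++]
      next
    }
    1                             # nothing pending yet: print unchanged
    END {                         # flush leftovers under new incremental IDs
      while (tail - head) {
        last_id++
        print last_id "," queue[head]
        delete queue[head++]
      }
    }
  ' "$@"
}

printf '786,1702\n787,1722\n-,1724\n788,1769\n789,1766\n' | shift_second_column
```

Run on the sample from the question, this prints the desired output, ending with 790,1766. It still assumes the IDs are plain integers that increase by one.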

next is an awk statement similar to continue in other languages, with the difference that it skips to the next input record instead of the next loop iteration. It is useful for emulating a switch-case: each pattern-action block ends with next, so at most one block handles a given line.
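As a small illustration of next acting like switch-case, here is a sketch with made-up categories. The function name classify_lines and the labels it prints are hypothetical.

```shell
# Hypothetical demo: each rule ends with next, so only the first matching
# rule handles a line; the last rule acts as the default case.
classify_lines() {
  awk '
    /^#/       { print "comment: " $0; next }   # case: starts with a hash
    /^$/       { print "blank";        next }   # case: empty line
    /^[0-9]+$/ { print "number: " $0;  next }   # case: digits only
               { print "other: " $0 }           # default: everything else
  '
}

printf '# note\n\n42\nhello\n' | classify_lines
```

Without the next statements, a line matching an earlier rule would also fall through to the final catch-all rule and be printed twice.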
