
Bash one-liner to mask data in file

I have a fairly large file. I need to mask all characters at specific positions, but only in records of a specific type. I have searched everywhere but cannot find a solution to this seemingly simple task. Here is an example.

File name: hello.txt

File:

0120140206INPUT FILE
1032682842 MR SIMPSON
20231458 742 Evergreen Terrace
3034560817 GREEN
1032682842 MR GRIFFIN
20231458 Spooner Street
3034560817 RED
3001 

What I would like to do is mask positions 12-16 of all lines beginning with "10", like this:

0120140206INPUT FILE
1032682842 XXXXXMPSON
20231458 742 Evergreen Terrace
3034560817 GREEN
1032682842 XXXXXIFFIN
20231458 Spooner Street
3034560817 RED
3001

Using sed

sed -r '/^10/ s/^(.{11}).{5}/\1XXXXX/' file

0120140206INPUT FILE
1032682842 XXXXXMPSON
20231458 742 Evergreen Terrace
3034560817 GREEN
1032682842 XXXXXIFFIN
20231458 Spooner Street
3034560817 RED
3001

Explanation

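  • -r (--regexp-extended) turns on extended regular expressions in GNU sed, so ( ) and { } need no backslashes. -E is the equivalent, more portable spelling.
  • /^10/ restricts the substitution to lines beginning with 10.
  • s/^(.{11}).{5}/\1XXXXX/ keeps the first 11 characters (group 1) and replaces the next 5, positions 12-16, with XXXXX.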
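
The same substitution can be generalized to other record types and column ranges. The following is only a sketch: the function name mask_range and its argument order are made up for illustration, it assumes a sed that accepts -E, and the prefix is pasted into the regex as-is, so it must not contain a slash or other regex metacharacters.

```shell
# mask_range PREFIX START END [FILE...]   (hypothetical helper, not from the answer)
# Replace columns START..END (1-based, inclusive) with X on lines
# that begin with PREFIX. Reads stdin when no FILE is given.
mask_range() {
    local prefix=$1 start=$2 end=$3
    shift 3
    local skip=$(( start - 1 ))        # characters to keep before the mask
    local len=$(( end - start + 1 ))   # number of characters to mask
    local mask
    # Build a string of $len X characters.
    mask=$(printf '%*s' "$len" '' | tr ' ' 'X')
    # For prefix=10, start=12, end=16 this expands to:
    #   /^10/ s/^(.{11}).{5}/\1XXXXX/
    sed -E "/^${prefix}/ s/^(.{${skip}}).{${len}}/\\1${mask}/" "$@"
}
```

For the file in the question this would be called as: mask_range 10 12 16 hello.txt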

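The same idea works in awk, provided your awk is gawk, which supports the gensub() function. The third argument of gensub() selects which match to replace (1 for the first, "g" for all), and the target defaults to $0:

awk '{$0=gensub(/^(10.{9}).{5}/,"\\1XXXXX",1)}1' file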

Update: @tripleee provided a shorter one:

sed -r 's/^(10.{9}).{5}/\1XXXXX/' file
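
A quick sanity check, using the sample data from the question, that the shorter form behaves like the address-plus-substitution form. The temporary file and the variable names are arbitrary, and -E is used in place of -r on the assumption that your sed accepts it.

```shell
# Recreate the sample input from the question in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
0120140206INPUT FILE
1032682842 MR SIMPSON
20231458 742 Evergreen Terrace
3034560817 GREEN
1032682842 MR GRIFFIN
20231458 Spooner Street
3034560817 RED
3001
EOF

# Form 1: the address /^10/ selects the lines, then substitute by position.
a=$(sed -E '/^10/ s/^(.{11}).{5}/\1XXXXX/' "$sample")
# Form 2: the record type is folded into the pattern itself.
b=$(sed -E 's/^(10.{9}).{5}/\1XXXXX/' "$sample")

if [ "$a" = "$b" ]; then echo "same output"; else echo "outputs differ"; fi
rm -f "$sample"
```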

Another way is to treat every character as its own field:

$ awk 'BEGIN{FS=OFS=""} $1$2=="10" {for(i=12;i<=16;i++) $i="X"}1' file
0120140206INPUT FILE
1032682842 XXXXXMPSON
20231458 742 Evergreen Terrace
3034560817 GREEN
1032682842 XXXXXIFFIN
20231458 Spooner Street
3034560817 RED
3001

Explanation

  • BEGIN{FS=OFS=""} sets the input and output field separators to the empty string, so each character becomes its own field: the first character is $1, the second is $2, and so on. Splitting on an empty FS is an extension found in gawk and mawk; POSIX leaves it unspecified.
  • $1$2=="10" {for(i=12;i<=16;i++) $i="X"} if the first character is 1 and the second is 0, sets the 12th through 16th characters to X.
  • 1 is an always-true condition with no action, so awk applies its default action, {print $0}.

This awk command works:

awk '/^10/{q=substr($0, 12, 5); gsub(/./, "X", q); $0=substr($0, 1, 11) q substr($0, 17)}1' file

This should do:

awk '/^10/{q=substr($0,1,11);r=substr($0,17); $0=q "XXXXX" r }1' file
0120140206INPUT FILE
1032682842 XXXXXMPSON
20231458 742 Evergreen Terrace
3034560817 GREEN
1032682842 XXXXXIFFIN
20231458 Spooner Street
3034560817 RED
3001

This might work for you (GNU sed):

sed -r '/^10/{s/^(.{0,11})(.{0,5})/\1\n\2\n/;h;s/[^\n]/X/g;G;s/.*\n(.*)\n.*\n(.*)\n.*\n/\2\1/}' file

For lines beginning with 10 : place two markers either side of the intended mask, copy, replace all characters other than the markers with the mask character, append the copy and manipulate the text between the markers to position the mask.

NB This caters for short lines and does not introduce artefacts.
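
To make the short-line point concrete: the simpler s/^(.{11}).{5}/.../ patterns do not match a "10" line that ends before position 16, so such a line passes through unmasked. Below is a minimal awk sketch that masks whatever part of positions 12-16 actually exists. The function name mask_partial and the s/e variables are illustrative only and are not part of the answer above.

```shell
# mask_partial [FILE...]   (hypothetical helper)
# On lines beginning with 10, replace the characters that exist in
# columns 12..16 with X. Shorter lines are masked only as far as they
# reach and are never padded.
mask_partial() {
    awk -v s=12 -v e=16 '/^10/ {
        n = length($0)
        last = (e < n) ? e : n          # last column that really exists
        if (last >= s) {
            m = ""
            for (i = s; i <= last; i++) m = m "X"
            $0 = substr($0, 1, s - 1) m substr($0, last + 1)
        }
    } 1' "$@"
}
```

With this sketch, a truncated record such as "1032682842 MR" has its two trailing characters masked, while a "10" line shorter than 12 characters is printed unchanged.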

You can use gawk's fixed-width field splitting:

gawk -v FIELDWIDTHS="11 5 9999" -v OFS="" '/^10/ { $2 = "XXXXX" } ; { print }' file

See https://www.gnu.org/software/gawk/manual/gawk.html#Constant-Size .

You can use BASH:

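# -r keeps backslashes literal; f1 is the first word, f2 the rest of the line.
while read -r f1 f2; do
    if [[ $f1 =~ ^10 ]]; then
            f2="XXXXX${f2:5}"
    fi
    # Quote the expansions so whitespace and glob characters survive.
    printf '%s\n' "$f1${f2:+ $f2}"
done < hello.txt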

This will work if you only need to replace the first 5 characters of the second field with XXXXX.

If you need to replace the 12th through the 16th characters with XXXXX regardless of field, you could do the longer:

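# IFS= and -r keep leading whitespace and backslashes intact.
while IFS= read -r l; do
    if [[ $l =~ ^10 ]]; then
            # Keep characters 1-11, insert the mask, keep everything from 17 on.
            l="${l:0:11}XXXXX${l:16}"
    fi
    printf '%s\n' "$l"
done < hello.txt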

A Perl alternative (note that -i edits the file in place):

perl -p -i -e 's/^(10\d* )[A-Z ]{5}/$1XXXXX/' filename.txt


 