I have a fairly big file. I need to mask all characters in specific positions, but only for a specific record type. I have searched all over the place but cannot find a solution to this seemingly simple task. Here is an example:
File name: hello.txt
File:
0120140206INPUT FILE
1032682842 MR SIMPSON
20231458 742 Evergreen Terrace
3034560817 GREEN
1032682842 MR GRIFFIN
20231458 Spooner Street
3034560817 RED
3001
What I would like to do is mask positions 12-16 of all lines beginning with "10", like this:
0120140206INPUT FILE
1032682842 XXXXXMPSON
20231458 742 Evergreen Terrace
3034560817 GREEN
1032682842 XXXXXIFFIN
20231458 Spooner Street
3034560817 RED
3001
Using sed
sed -r '/^10/ s/^(.{11}).{5}/\1XXXXX/' file
0120140206INPUT FILE
1032682842 XXXXXMPSON
20231458 742 Evergreen Terrace
3034560817 GREEN
1032682842 XXXXXIFFIN
20231458 Spooner Street
3034560817 RED
3001
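-r (--regexp-extended) is a useful GNU sed option that enables extended regular expressions.
/^10/ selects the lines beginning with 10.
s/^(.{11}).{5}/\1XXXXX/ keeps the first 11 characters and replaces positions 12-16 with XXXXX.
The same idea works in awk, if your awk is gawk and supports the gensub() function: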
awk '{$0=gensub(/^(10.{9}).{5}/,"\\1XXXXX",1)}1' file
Update: @tripleee provided a shorter one:
sed -r 's/^(10.{9}).{5}/\1XXXXX/' file
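If the record type or the column range changes, the same substitution can be built from parameters. The following is only a sketch: the function name mask_cols and its argument order are made up for illustration, and it assumes a sed that accepts -E (GNU and BSD sed both do).

```shell
# mask_cols PREFIX START END [FILE...]
# Hypothetical helper: mask columns START..END (1-based) with X on lines
# that begin with PREFIX. PREFIX is inserted into a regex as-is, so it
# should not contain regex metacharacters or slashes.
# The body runs in a subshell ( ... ) so its variables do not leak.
mask_cols() (
    prefix=$1 start=$2 end=$3
    shift 3
    len=$((end - start + 1))
    # Build a mask string of LEN X characters
    mask=''
    i=0
    while [ "$i" -lt "$len" ]; do
        mask="${mask}X"
        i=$((i + 1))
    done
    # Keep the first START-1 characters, replace the next LEN with the mask.
    # Lines shorter than END characters do not match and are left unchanged.
    sed -E "/^${prefix}/ s/^(.{$((start - 1))}).{${len}}/\\1${mask}/" "$@"
)
```

For the example in the question this would be called as mask_cols 10 12 16 hello.txt; with no file argument it reads standard input.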
This is one way:
$ awk 'BEGIN{FS=OFS=""} $1$2=="10" {for(i=12;i<=16;i++) $i="X"}1' file
0120140206INPUT FILE
1032682842 XXXXXMPSON
20231458 742 Evergreen Terrace
3034560817 GREEN
1032682842 XXXXXIFFIN
20231458 Spooner Street
3034560817 RED
3001
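BEGIN{FS=OFS=""} sets the field separator to the empty string, so that the first character is the first field, the second character is the second field, and so on. Splitting on an empty FS is an extension (gawk supports it) and is not specified by POSIX.
$1$2=="10" {for(i=12;i<=16;i++) $i="X"}: if the first character is 1 and the second is 0, change the 12th through the 16th characters to X.
1 is a true condition, which triggers the default awk action: {print $0}.
This awk works (it masks with * instead of X):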
awk '/^10/{q=substr($0, 12, 5); gsub(/./, "*", q); $0=substr($0, 1, 11) q substr($0, 17)}1' file
This should do:
awk '/^10/{q=substr($0,1,11);r=substr($0,17); $0=q "XXXXX" r }1' file
0120140206INPUT FILE
1032682842 XXXXXMPSON
20231458 742 Evergreen Terrace
3034560817 GREEN
1032682842 XXXXXIFFIN
20231458 Spooner Street
3034560817 RED
3001
This might work for you (GNU sed):
sed -r '/^10/{s/^(.{0,11})(.{0,5})/\1\n\2\n/;h;s/[^\n]/X/g;G;s/.*\n(.*)\n.*\n(.*)\n.*\n/\2\1/}' file
For lines beginning with 10: place two markers either side of the intended mask, copy the line, replace all characters other than the markers with the mask character, append the copy, and manipulate the text between the markers to position the mask.
NB This caters for short lines and does not introduce artefacts.
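The short-line handling described here can also be sketched in plain awk. This is an illustrative alternative, not the sed technique itself: the function name mask_present is hypothetical, and it masks only those characters in positions 12-16 that actually exist on the line.

```shell
# mask_present [FILE...]
# Hypothetical helper: on lines beginning with 10, mask whatever part of
# columns 12-16 is present, without padding or truncating the line.
mask_present() {
    awk '/^10/ {
        n = length($0)
        if (n >= 12) {
            # Last column to mask: 16, or the end of a shorter line
            last = (n < 16) ? n : 16
            m = ""
            for (i = 12; i <= last; i++) m = m "X"
            $0 = substr($0, 1, 11) m substr($0, last + 1)
        }
    } 1' "$@"
}
```

A line such as "1032682842 MR" (13 characters) comes out as "1032682842 XX", while a line with fewer than 12 characters passes through unchanged.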
You can use gawk's fixed-width data reading capability:
gawk -v FIELDWIDTHS="11 5 9999" -v OFS="" '/^10/ { $2 = "XXXXX" } ; { print }' file
See https://www.gnu.org/software/gawk/manual/gawk.html#Constant-Size .
You can use BASH:
while read -r f1 f2; do
    if [[ $f1 =~ ^10 ]]; then
        f2="XXXXX${f2:5}"
    fi
    # Quote the output so spacing inside the second field is preserved
    printf '%s\n' "$f1${f2:+ $f2}"
done < hello.txt
This will work if you only need to replace the first 5 characters of the second field with XXXXX.
If you need to replace the 12th through the 16th characters with XXXXX regardless of field, you could do this instead:
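while IFS= read -r l; do
    # Only touch "10" records that are long enough to contain column 16
    if [[ $l =~ ^10 ]] && (( ${#l} >= 16 )); then
        # Keep columns 1-11, mask 12-16, keep everything from column 17 on
        l="${l:0:11}XXXXX${l:16}"
    fi
    printf '%s\n' "$l"
done < hello.txt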
A Perl alternative (note that -i edits the file in place):
perl -p -i -e 's/^(10.{9}).{5}/$1XXXXX/' filename.txt