
Bash one-liner to mask data in file

I have a file which is quite big. I need to mask all characters in specific positions, and only in records of a specific type. I have searched all over the place but cannot find a solution to this quite simple task. Here is an example:

File name: hello.txt

File:

0120140206INPUT FILE
1032682842 MR SIMPSON
20231458 742 Evergreen Terrace
3034560817 GREEN
1032682842 MR GRIFFIN
20231458 Spooner Street
3034560817 RED
3001 

What I would like to do is mask positions 12-16 of all lines beginning with "10". Like this:

0120140206INPUT FILE
1032682842 XXXXXMPSON
20231458 742 Evergreen Terrace
3034560817 GREEN
1032682842 XXXXXIFFIN
20231458 Spooner Street
3034560817 RED
3001

Using sed:

sed -r '/^10/ s/^(.{11}).{5}/\1XXXXX/' file

0120140206INPUT FILE
1032682842 XXXXXMPSON
20231458 742 Evergreen Terrace
3034560817 GREEN
1032682842 XXXXXIFFIN
20231458 Spooner Street
3034560817 RED
3001

Explanation

  • -r enables extended regular expressions (--regexp-extended in GNU sed), so groups and intervals need no backslashes.
  • /^10/ selects only the lines beginning with 10.
  • s/^(.{11}).{5}/\1XXXXX/ keeps the first 11 characters and replaces positions 12-16 with XXXXX.
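The arithmetic behind the two interval counts can be sketched as a small helper. This is only an illustration: mask_cols is a hypothetical name, it assumes a sed that accepts -E (GNU and BSD sed both do; GNU also accepts -r), and it assumes the prefix contains no regex metacharacters.

```shell
# mask_cols FIRST LAST PREFIX  (hypothetical helper, reads stdin, writes stdout)
# Masks columns FIRST..LAST (1-based, inclusive) on lines starting with PREFIX.
mask_cols() (
    first=$1 last=$2 prefix=$3
    keep=$((first - 1))               # characters kept before the mask, e.g. 11
    width=$((last - first + 1))       # characters to mask, e.g. 5
    mask=$(printf "%${width}s" '' | tr ' ' 'X')   # a run of X of that width
    # Assumption: PREFIX is plain text, not a regular expression.
    sed -E "/^${prefix}/ s/^(.{${keep}}).{${width}}/\\1${mask}/"
)
```

Calling `mask_cols 12 16 10 < hello.txt` would build the same expression as the command above. Lines shorter than 16 characters do not match and pass through unchanged.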

With the same idea, if your awk is gawk (which provides the gensub() function):

awk '{$0=gensub(/^(10.{9}).{5}/,"\\1XXXXX",1)}1' file

Update: @tripleee provided a shorter one:

sed -r 's/^(10.{9}).{5}/\1XXXXX/' file

This is one way to do it:

$ awk 'BEGIN{FS=OFS=""} $1$2=="10" {for(i=12;i<=16;i++) $i="X"}1' file
0120140206INPUT FILE
1032682842 XXXXXMPSON
20231458 742 Evergreen Terrace
3034560817 GREEN
1032682842 XXXXXIFFIN
20231458 Spooner Street
3034560817 RED
3001

Explanation

  • BEGIN{FS=OFS=""} sets the input and output field separators to the empty string, so that the first character is the first field, the second character is the second field, and so on.
  • $1$2=="10" {for(i=12;i<=16;i++) $i="X"} if the first character is 1 and the second is 0, change the 12th through 16th characters to X.
  • 1 is a true condition, which triggers the default awk action: {print $0}.
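To see the per-character fields at work, here is a minimal sketch. char_fields is a hypothetical helper name, and splitting on an empty FS is an extension (supported by gawk and most other awks) rather than guaranteed POSIX behaviour.

```shell
# char_fields: hypothetical helper, reads stdin and shows how an empty FS splits a line
char_fields() {
    awk 'BEGIN { FS = OFS = "" }
         {
             print NF          # number of characters on the line
             print $12 $13     # characters 12 and 13
             $12 = "X"         # assigning a field rebuilds $0, joined with the empty OFS
             print
         }'
}

printf '%s\n' '1032682842 MR SIMPSON' | char_fields
```

With gawk this prints 21, then MR, then the line with only column 12 replaced. Because OFS is also empty, the rebuilt record has no separators inserted between characters.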

This awk should work:

awk '/^10/{q=substr($0, 12, 5); gsub(/./, "X", q); $0=substr($0, 1, 11) q substr($0, 17)}1' file

This should do it:

awk '/^10/{q=substr($0,1,11);r=substr($0,17); $0=q "XXXXX" r }1' file
0120140206INPUT FILE
1032682842 XXXXXMPSON
20231458 742 Evergreen Terrace
3034560817 GREEN
1032682842 XXXXXIFFIN
20231458 Spooner Street
3034560817 RED
3001

This might work for you (GNU sed):

sed -r '/^10/{s/^(.{0,11})(.{0,5})/\1\n\2\n/;h;s/[^\n]/X/g;G;s/.*\n(.*)\n.*\n(.*)\n.*\n/\2\1/}' file

For lines beginning with 10: place two markers either side of the intended mask, copy the line to the hold space, replace every character other than the markers with the mask character, append the copy, and then rearrange the text between the markers to put the mask in position.

NB: This caters for short lines and does not introduce artefacts.
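What an artefact looks like on a short line can be sketched with two awk variants. This is a swapped-in awk illustration, not the marker technique above; fixed_mask and length_aware_mask are hypothetical names.

```shell
# fixed_mask: always inserts five X characters, even if the line ends before column 16
fixed_mask() {
    awk '/^10/ { $0 = substr($0, 1, 11) "XXXXX" substr($0, 17) } 1'
}

# length_aware_mask: masks only the characters that actually exist in columns 12-16
length_aware_mask() {
    awk '/^10/ {
             q = substr($0, 12, 5)     # may hold fewer than 5 characters
             gsub(/./, "X", q)         # one X per character present
             $0 = substr($0, 1, 11) q substr($0, 17)
         }
         1'
}

printf '%s\n' '1032682842 MR' | fixed_mask          # line grows: 1032682842 XXXXX
printf '%s\n' '1032682842 MR' | length_aware_mask   # length kept: 1032682842 XX
```

The first variant lengthens a 13-character record to 16 characters, which may matter if the file is consumed as fixed-width data; the second preserves each line's length.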

You can use gawk's fixed-width data reading capability:

gawk -v FIELDWIDTHS="11 5 9999" -v OFS="" '/^10/ { $2 = "XXXXX" } ; { print }' file

See https://www.gnu.org/software/gawk/manual/gawk.html#Constant-Size .

You can use Bash:

while read -r f1 f2; do
    if [[ $f1 =~ ^10 ]]; then
        f2="XXXXX${f2:5}"
    fi
    # quote the expansions so the shell does not glob or re-split the data
    printf '%s\n' "$f1${f2:+ $f2}"
done < hello.txt

This works if you only need to replace the first 5 characters of the second field with XXXXX.

If you need to replace the 12th through the 16th characters with XXXXX regardless of field, you could use the longer version:

while IFS= read -r l; do
    if [[ $l =~ ^10 ]]; then
        # keep characters 1-11, mask 12-16, keep everything from 17 onward
        l="${l:0:11}XXXXX${l:16}"
    fi
    printf '%s\n' "$l"
done < hello.txt
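When masking by position in a shell loop, each line has to reach the substitution unchanged, since leading blanks or backslashes lost on input would shift the columns. A minimal sketch, assuming Bash (substring expansion and [[ ]] are not POSIX sh); mask_stream is a hypothetical name, and leaving lines shorter than 16 characters untouched is a choice made for this example only.

```shell
# mask_stream: hypothetical helper, reads stdin and writes masked lines to stdout
mask_stream() {
    # IFS= and -r keep leading blanks and backslashes intact;
    # the || test also picks up a final line that lacks a trailing newline
    while IFS= read -r line || [ -n "$line" ]; do
        # Assumption for this sketch: only mask lines that reach column 16
        if [[ $line == 10* && ${#line} -ge 16 ]]; then
            line="${line:0:11}XXXXX${line:16}"   # columns 1-11, mask, column 17 onward
        fi
        printf '%s\n' "$line"                    # printf prints the data literally
    done
}
```

Usage would be `mask_stream < hello.txt`. A pure-shell loop like this tends to be much slower than sed or awk on a big file, so it is best seen as a fallback.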

A Perl alternative (the -i flag edits the file in place):

perl -p -i -e 's/^(10\d* )[A-Z ]{5}(.*)/$1XXXXX$2/' filename.txt


 