
Matching special characters with sed

I have a file with lines like those below. My goal is to mask the values of fields such as Name, DOB, Email Address, Mailing Address, Residential Address, Phone Number, and Other Phone Number with **. The tricky part is that the length of the text before the next field starts may not be predictable, for example where the City ends and the State starts, so maybe use * to know the end point? I am using a .sed file and running it against this log file. The "|" characters are all part of the file as well; it is basically a screen written out to a log file.

    -------------------------------------------------------------------------- --------
    | XXX XX Requested function key not allocated.                                  |
    |     ***** System *****                                                         |
    |                           - Maintain  -              11:55 AM                  |
    | < 1 more  P                                                           3 more > |
    | *Action (A,D,M): _                                                                      |
    |  Office Number: 14                                                             |
    | Case ID:    XXXXXXXXX    Email Address: ___________________________________    |
    | Name: TWENTYFIFTEE MAYSEVEN          DOB: 11111950  *Correspondence Lang: _    |
    |                            Street One                    Street Two            |
    | Mailing Address....: 7 MAY____________________    _________________________    |
    | City...............: DALLAS_________ *State: TX Zip Code: 75062 - ____         |
    |                                                                                |
    | Residential Address: 7 MAY____________________    _________________________    |
    | City...............: DALLAS_________ *State: TX Zip Code: 75062 - ____         |
    | Phone Number...:( ___ ) ___ - ____    Other Phone Number:( ___ ) ___ - ____    |
    | Authorized Rep                     Last      TTL   First   MI                  |
    |                              Name........: ____________ ___ _________ _        |
    | Authorized Representative Phone Number: ( ___ ) ___ - ____                     |
    | Last Updt Dttm......: 05/07/2015 11:55:01 AM   Last Update User: JU14          |
    |                         XXXXXX               XXXXXX                            |
    |                                                                                |
    ----------------------------------------------------------------------------------

so maybe use * to know the end point?

I'm not sure this is a good approach. Not every field is followed by a *, and it does not cover the case where a field's value itself contains a *.

Assuming you can just replace the entire field with * characters, I would break this into multiple sed commands (one for each field you want to replace).

It will also require a bit of manual work, because you need to know the width of each field. Here we replace 30 characters of almost any type (.) with 30 * characters, since that is how many characters the "Name" value field has.

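name_len=30
# bash does not expand {1..$name_len} (brace expansion runs before variable expansion), so use seq
mask=$(printf '*%.0s' $(seq "$name_len"))
sed -r "s/(Name: ).{,$name_len}/\1$mask/g"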

The effect of this on the 9th line of your sample is

| Name: ****************************** DOB: 11111950  *Correspondence Lang: _    |
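
The one-command-per-field idea can be sketched as a small helper that is chained once per field. This is only an illustration under assumptions: `mask_field` is a hypothetical name, the widths (30 for Name, 8 for DOB) are read off the sample screen, and each label is assumed to contain no regex metacharacters.

```shell
#!/usr/bin/env bash
# mask_field LABEL WIDTH (hypothetical helper): replace exactly WIDTH
# characters after LABEL with '*'. Reads stdin, writes stdout.
# LABEL is interpolated into an ERE, so it must not contain regex metacharacters.
mask_field() {
    local label=$1 width=$2 mask
    # Pad an empty string to WIDTH spaces, then turn the spaces into stars.
    mask=$(printf '%*s' "$width" '' | tr ' ' '*')
    sed -E "s/(${label}).{${width}}/\1${mask}/"
}

# Sample line taken from the log in the question.
line='| Name: TWENTYFIFTEE MAYSEVEN          DOB: 11111950  *Correspondence Lang: _    |'

# One call per field; the widths are assumptions based on the sample layout.
printf '%s\n' "$line" | mask_field 'Name: ' 30 | mask_field 'DOB: ' 8
```

Because each mask has the same width as the text it replaces, the line length and the position of the closing "|" stay unchanged. Lines that do not contain the label, or that have fewer than WIDTH characters after it, pass through untouched.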

Another option is to handle each field case by case, using a pair of delimiters:

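# Each entry is "<start delimiter><-><end delimiter>"; the text between the two is masked.
for Balise in '| Name: <-> DOB:' ' DOB: <->   |' ' Email Address: <->   |'
 do
   # Each pass replaces the first non-* character after the start delimiter; t loops until nothing changes.
   sed ":cycle
      s/\(${Balise%<->*}[*]*\)[^*]\(.*${Balise#*<->}\)/\1*\2/
      t cycle" YourFile > TempFile
   mv TempFile YourFile
 done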
  • Use two delimiters per change. Each pair is defined in the word list of the for loop as a single string: the start delimiter, followed by <->, followed by the end delimiter.
    • I included 3 samples in this code.
    • You can use another character sequence as the separator between the delimiters, but adapt the sed part accordingly (that is, the <-> in the ${Balise...} expansions).
  • sed repeatedly replaces the characters between the two delimiters with *, one character per pass, until none are left.
  • With GNU sed you can use the -i option instead of the temporary file, which is used here so that the code works with any sed version.
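
To see how one delimiter string is split and applied, here is a minimal sketch that wraps the same substitution in a function working on stdin. The name `mask_between` and the e-mail value in the test line are made up for illustration; the sed expression is the one from the loop above.

```shell
#!/usr/bin/env bash
# mask_between SPEC (hypothetical helper): SPEC is "<start><-><end>".
# Every character between the two delimiters is replaced with '*'.
# Reads stdin, writes stdout; the delimiters are used as basic regular expressions.
mask_between() {
    local spec=$1
    local start=${spec%<->*}   # text before the <-> separator
    local end=${spec#*<->}     # text after the <-> separator
    sed ":cycle
        s/\(${start}[*]*\)[^*]\(.*${end}\)/\1*\2/
        t cycle"
}

# Made-up line in the layout of the sample screen (the address is fictional).
sample='| Case ID:    XXXXXXXXX    Email Address: someone@example.com________    |'

printf '%s\n' "$sample" | mask_between ' Email Address: <->   |'
```

Since each pass swaps a single character for a single *, the line keeps its original length, and the loop stops once only stars remain between the two delimiters. Note that the end delimiter must occur after the value on the same line, otherwise nothing is masked.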
