Replace multiple strings in a file by tagging them

I would like to replace multiple strings in a file, for example IP addresses, and tag them so that every recurrence of the same string is marked with the same name.

For example, if this is my file:

2018-09-13 19:00:00,317 INFO  -util.SSHUtil: Waiting for channel close
2018-09-13 19:00:01,317 INFO  -util.SSHUtil: Waiting for channel close
2018-09-13 19:00:01,891 INFO  -filters.BasicAuthFilter: Client IP:192.168.100.98
2018-09-13 19:00:01,891 INFO  -filters.BasicAuthFilter: Validating token ... 
2018-09-13 19:00:01,892 INFO  -authentication.Tokenization: Token:192.168.100.98:20180913_183401is present in map
2018-09-13 19:00:01,892 INFO  -configure.ConfigStatusCollector: status.
2018-09-13 19:00:01,909 INFO  -filters.BasicAuthFilter: Client IP:192.168.100.98
2018-09-13 19:00:01,909 INFO  -filters.BasicAuthFilter: Validating token ... 
2018-09-13 19:00:01,910 INFO  -authentication.Tokenization: Token:192.168.100.98:20180913_183401is present in map
2018-09-13 19:00:01,910 INFO  -restadapter.ConfigStatusService: configuration status.
2018-09-13 19:00:01,910 INFO  -configure.Collector: Getting configuration status.
2018-09-13 19:00:02,318 INFO  -util.SSHUtil: Processing the ssh command execution results standard output.
2018-09-13 19:00:02,318 INFO  -util.SSHUtil: Processing the ssh command execution standard error.
2018-09-13 19:00:02,318 INFO  -util.SSHUtil: Remote command using SSH execution status: Host     : [10.2.251.129]   User     : [root]   Password : [***********]    Command  : [shell ntpdate -u 132.132.0.88]  STATUS   : [0]
2018-09-13 19:00:02,318 INFO  -util.SSHUtil:    STDOUT   : [Shell access is granted to root
            14 Sep 01:00:01 ntpdate[16063]: adjust time server 132.132.0.88 offset 0.353427 sec
]
2018-09-13 19:00:02,318 INFO  -util.SSHUtil:    STDERR   : []
2018-09-13 19:00:02,318 INFO  -util.SSHUtil: Successfully executed remote command using SSH.
2018-09-13 19:00:02,318 INFO  Successfully executed the command on VCenter :10.2.251.129

It should become:

2018-09-13 19:00:00,317 INFO  -util.SSHUtil: Waiting for channel close
2018-09-13 19:00:01,317 INFO  -util.SSHUtil: Waiting for channel close
2018-09-13 19:00:01,891 INFO  -filters.BasicAuthFilter: Client IP:IP_1
2018-09-13 19:00:01,891 INFO  -filters.BasicAuthFilter: Validating token ... 
2018-09-13 19:00:01,892 INFO  -authentication.Tokenization: Token:IP_1:20180913_183401is present in map
2018-09-13 19:00:01,892 INFO  -configure.ConfigStatusCollector: status.
2018-09-13 19:00:01,909 INFO  -filters.BasicAuthFilter: Client IP:IP_1
2018-09-13 19:00:01,909 INFO  -filters.BasicAuthFilter: Validating token ... 
2018-09-13 19:00:01,910 INFO  -authentication.Tokenization: Token:IP_1:20180913_183401is present in map
2018-09-13 19:00:01,910 INFO  -restadapter.ConfigStatusService: configuration status.
2018-09-13 19:00:01,910 INFO  -configure.Collector: Getting configuration status.
2018-09-13 19:00:02,318 INFO  -util.SSHUtil: Processing the ssh command execution results standard output.
2018-09-13 19:00:02,318 INFO  -util.SSHUtil: Processing the ssh command execution standard error.
2018-09-13 19:00:02,318 INFO  -util.SSHUtil: Remote command using SSH execution status: Host     : [IP_2]   User     : [root]   Password : [***********]    Command  : [shell ntpdate -u IP_3]  STATUS   : [0]
2018-09-13 19:00:02,318 INFO  -util.SSHUtil:    STDOUT   : [Shell access is granted to root
            14 Sep 01:00:01 ntpdate[16063]: adjust time server IP_3 offset 0.353427 sec
]
2018-09-13 19:00:02,318 INFO  -util.SSHUtil:    STDERR   : []
2018-09-13 19:00:02,318 INFO  -util.SSHUtil: Successfully executed remote command using SSH.
2018-09-13 19:00:02,318 INFO  Successfully executed the command on VCenter :IP_2

The script below does what I want, but it is file-specific:

import typing, re
def change_ips(ips:typing.List[str]) -> typing.Generator[str, None, None]:
   val = {}
   count = 1
   for i in ips:
     if i not in val:
       yield f'IP_{count}'
       val[i] = count
       count += 1
     else:
       yield f'IP_{val[i]}'


with open(r'server.log') as f:
  content = f.read()
  with open(r'logfile2.txt', 'w') as f1:

    f1.write(re.sub('\d+\.\d+\.\d+\.\d+', '{}', content).format(*change_ips(re.findall('\d+\.\d+\.\d+\.\d+', content))))

This works for the file above, but it does not work with other log files. I would like to make it robust, so that it handles any file that contains IP addresses on any line, not just one particular log file.

An example where it doesn't work:

2018-09-15 15:58:20,083 INFO  [Timer-0]-util.SSHUtil:   STDERR   : []
2018-09-15 15:58:20,083 INFO  [Timer-0]-util.SSHUtil: Successfully executed remote command using SSH.
2018-09-15 15:58:20,083 INFO  [Timer-0]-dashboard.KBDash: getProcessSummary -->  processing output line

2018-09-15 15:58:20,083 INFO  [Timer-0]-dashboard.KBDash: getProcessSummary -->  processing output line
---------------------------------------------------------------------
2018-09-15 15:58:20,083 INFO  [Timer-0]-dashboard.KBDash: getProcessSummary -->  processing output line
Validate [33mKBDash2121 Node[0m installation BEGIN:
2018-09-15 15:58:20,083 INFO  [Timer-0]-dashboard.KBDash: getProcessSummary -->  processing output line
Show KBDash2121 system configuration:  [33m1.1.2.371[0m
2018-09-15 15:58:20,083 INFO  [Timer-0]-dashboard.KBDash: getProcessSummary -->  processing output line
*****************************************************************
2018-09-15 15:58:20,090 INFO  [Timer-0]-util.SSHUtil: Connecting to host [10.60.9.44] using provided credentials.
2018-09-15 15:58:20,083 INFO  [Timer-0]-dashboard.KBDash: getProcessSummary -->  processing output line
    "cis_url"               : "https://localhost:441/cis/v1.1",
2018-09-15 15:58:20,083 INFO  [Timer-0]-dashboard.KBDash: getProcessSummary -->  processing output line
    "app_name"              : "KBDash2121",
2018-09-15 15:58:20,083 INFO  [Timer-0]-dashboard.KBDash: getProcessSummary -->  processing output line
    "node_name"             : "idpa-1-dps",
2018-09-15 15:59:40,093 ERROR [Timer-0]-dashboard.DPSDashboard: Unable to validate ssh credential.Host 10.60.9.44 is not reachable.
2018-09-15 15:59:40,093 ERROR [Timer-0]-dashboard.DPSDashboard: loadDataNodeStatus --> unable to find data node process statuscom.common.exception.ApplianceException: Host 10.60.9.44 is not reachable.
2018-09-15 15:58:20,083 INFO  [Timer-0]-dashboard.KBDash: getProcessSummary -->  processing output line
    "system_index_name"     : "system",
2018-09-15 15:58:20,083 INFO  [Timer-0]-dashboard.KBDash: getProcessSummary -->  processing output line
    "worker_id"             : "aWRwYS0xLWRwc3wwMDo1MDo1Njo5RDoyRDo4RSA=",
2018-09-15 15:58:20,083 INFO  [Timer-0]-dashboard.KBDash: getProcessSummary -->  processing output line
    "work_base_folder": "/mnt/KBDash2121_work",
2018-09-15 15:58:20,083 INFO  [Timer-0]-dashboard.KBDash: getProcessSummary -->  processing output line
    "service_work_folder"                          : "tmp/dpworker",
2018-09-15 15:58:20,084 INFO  [Timer-0]-dashboard.KBDash: getProcessSummary -->  processing output line
    "web_download_folder"   : "tmp/dpweb",
2018-09-15 15:58:20,084 INFO  [Timer-0]-dashboard.KBDash: getProcessSummary -->  processing output line
    "admin_api_url"         : "https://localhost:448/admin_api/v1",
2018-09-15 15:58:20,084 INFO  [Timer-0]-dashboard.KBDash: getProcessSummary -->  processing output line
    "search_api_url"        : "https://localhost:449/search_api/v1",
2018-09-15 15:58:20,084 INFO  [Timer-0]-dashboard.KBDash: getProcessSummary -->  processing output line
*****************************************************************
2018-09-15 15:58:20,084 INFO  [Timer-0]-dashboard.KBDash: getProcessSummary -->  processing output line
[32mDirectory: /usr/local/KBDash2121 has been created [0m
2018-09-15 15:58:20,084 INFO  [Timer-0]-dashboard.KBDash: getProcessSummary -->  processing output line
[32mFile: /usr/local/KBDash2121/etc/system.conf has been created [0m
2018-09-15 15:58:20,084 INFO  [Timer-0]-dashboard.KBDash: getProcessSummary -->  processing output line
[32mService: dpworker is on[0m
2018-09-15 15:58:20,084 INFO  [Timer-0]-dashboard.KBDash: getProcessSummary -->  processing output line
[32mService: nginx is on[0m
2018-09-15 15:58:20,084 INFO  [Timer-0]-dashboard.KBDash: getProcessSummary -->  processing output line
[32mProccess: WorkerService is running[0m
2018-09-15 15:58:20,084 INFO  [Timer-0]-dashboard.KBDash: getProcessSummary -->  processing output line
[32mProccess: nginx is running[0m
2018-09-15 15:58:20,084 INFO  [Timer-0]-dashboard.KBDash: getProcessSummary -->  processing output line
[33mchecking admin api url:https://localhost:448......
2018-09-15 15:58:20,084 INFO  [Timer-0]-dashboard.KBDash: getProcessSummary -->  processing output line
[32mOk: {"status":200,"name":"myspace","version":"1.1.2.371","cis":"online","tagline":"none"}[0m
2018-09-15 15:59:40,106 INFO  [Timer-0]-util.SSHUtil: Connecting to host [10.60.9.59] using provided credentials.
2018-09-15 15:59:40,209 INFO  [Timer-0]-util.SSHUtil: Connected to host [10.60.9.59] using provided credentials.
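
One likely reason the script breaks on this file: the log contains literal braces (the JSON on the `Ok: {"status":200,...}` line), and `str.format` treats every `{...}` as a replacement field, not only the `{}` placeholders that `re.sub` inserted. The sketch below reproduces that on single lines and shows one possible workaround, doubling the braces before formatting. The function names `tag_with_format` and `tag_with_escaped_format` are made up for this illustration.

```python
import re

IP_RE = re.compile(r'\d+\.\d+\.\d+\.\d+')

def tag_with_format(text):
    """Same idea as the script above: swap each match for '{}' and fill via str.format."""
    seen = {}
    # Each distinct address gets the next number the first time it is seen.
    tags = [f'IP_{seen.setdefault(ip, len(seen) + 1)}' for ip in IP_RE.findall(text)]
    return IP_RE.sub('{}', text).format(*tags)

def tag_with_escaped_format(text):
    """Workaround: double literal braces so str.format leaves them alone."""
    escaped = text.replace('{', '{{').replace('}', '}}')
    return tag_with_format(escaped)

json_line = 'Ok: {"status":200,"cis":"online"} from 10.60.9.59'

try:
    tag_with_format(json_line)
except (KeyError, IndexError, ValueError) as exc:
    # str.format tries to resolve {"status":...} as a field and fails.
    print('format failed:', type(exc).__name__)

print(tag_with_escaped_format(json_line))
# Ok: {"status":200,"cis":"online"} from IP_1
```

Passing a function to `re.sub` avoids the formatting step entirely, so the braces never need escaping.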

You could keep a list of unique IP addresses, and use each address's position in the list as the substitution value.

In the code below, the \1 in replace_func refers to the first capture group of the regex, that is, the matched IP address. We look that up in the list (adding it if necessary), build the tag from its position, and return it as the substitution value for re.sub.

Something like this:

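import fileinput
import re

ips = []  # unique IP addresses, in order of first appearance

def replace_func(match):
    # \1 expands to the text captured by the parentheses in the pattern below.
    ip = match.expand(r'\1')
    if ip not in ips:
        ips.append(ip)
    # list.index() is zero-based; add 1 so the first address becomes IP_1.
    return 'IP_%d' % (ips.index(ip) + 1)

# inplace=True redirects print() into the file; the original is kept as server.log.bak.
with fileinput.input('server.log', inplace=True, backup='.bak') as file:
    for line in file:
        print(re.sub(r'(\d+\.\d+\.\d+\.\d+)', replace_func, line), end='')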
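
The pattern \d+\.\d+\.\d+\.\d+ also matches dotted strings that are not addresses, such as the 1.1.2.371 version number in the second log. As a variation on the same callback idea, here is a minimal sketch that skips candidates with an octet above 255 and keeps the tags in a dict, so lookups do not scan a list. The names `CANDIDATE`, `make_tagger` and `tag_ips` are hypothetical, and the octet check is an assumption about what should count as an IP address.

```python
import re

# Four dot-separated groups of 1-3 digits that are not part of a longer dotted number.
CANDIDATE = re.compile(r'(?<!\d)(?<!\d\.)\d{1,3}(?:\.\d{1,3}){3}(?!\d|\.\d)')

def make_tagger():
    """Return a re.sub callback that remembers the tag given to each address."""
    tags = {}

    def replace(match):
        candidate = match.group(0)
        # Leave strings such as 1.1.2.371 alone: 371 cannot be an IPv4 octet.
        if any(int(octet) > 255 for octet in candidate.split('.')):
            return candidate
        # A new address gets the next number; a known one keeps its tag.
        return tags.setdefault(candidate, f'IP_{len(tags) + 1}')

    return replace

def tag_ips(text):
    """Tag every plausible IPv4 address in text, numbering from IP_1."""
    return CANDIDATE.sub(make_tagger(), text)

sample = 'Host : [10.2.251.129] ntpdate -u 132.132.0.88 on :10.2.251.129 version 1.1.2.371'
print(tag_ips(sample))
# Host : [IP_1] ntpdate -u IP_2 on :IP_1 version 1.1.2.371
```

To number addresses consistently across a whole file, call `tag_ips` on the full file contents, or create one callback with `make_tagger()` and reuse it for every line. A version string whose parts all happen to be 255 or less would still be tagged, since the text alone cannot tell it apart from an address.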
