
Python regex: find and hash usernames

I would like to hash the usernames in my log file, but my regex does not work as intended.

Example input:

Account Name:  -  Account Domain: - ImportantStuff Account Name:  Foo bar  Account Domain: my.bar
Account Name:  Foo-bar  Supplied Realm Name: my.bar ImportantStuff 
Account Name:  Foo99bar$  Account Domain: my.bar ImportantStuff Account Name:  -  Account Domain: -

My Regex:

(((?!Account Name:\s+-\s+))(Account Name:\s+(\S+.+(?=\s+Account))))|(Account Name:\s+(\S+.+(?=\s+Supplied)))((?!Account Name:\s+-\s+))

I would like to filter as follows:

  • if the pattern is "Account Name: - ", ignore it

  • if the pattern is not "Account Name: - ", extract the username

I can't use "-" as a delimiter because some usernames contain "-", which is why I went with \s-\s inside a negative lookahead (?!...). The same goes for whitespace.

After that, the username is hashed:

result2 = re.sub(r'(((?!Account Name:\s+-\s+))(Account Name:\s+(\S+.+(?=\s+Account))))|(Account Name:\s+(\S+.+(?=\s+Supplied)))((?!Account Name:\s+-\s+))', lambda m: m.group(1) + hashlib.sha512(m.group(2)).hexdigest(), line)

At first I tried [^Account Name:\s+-\s+], but of course a negated character class matches any single character that is not listed inside [^]; its contents are not treated as a string.

Can I do it somehow like this?
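The difference between a negated character class and a negative lookahead can be seen in a small sketch. The shortened class `[^Account Name:]` and the variable names are only illustrative assumptions, not part of the original attempt:

```python
import re

text = 'Account Name:  -  Account Domain: -'

# A negated class matches ONE character that is not in the listed set;
# the letters inside the brackets are not read as the string "Account Name:".
single_chars = re.findall(r'[^Account Name:]', text)
print(single_chars)

# A negative lookahead rejects a whole substring at a given position:
# here, "Account Name:" followed by whitespace, a hyphen, and whitespace.
has_real_name = re.search(r'Account Name:(?!\s+-\s)', text) is not None
print(has_real_name)
```

The first call returns scattered single characters, while the second answers the question actually being asked: whether the line contains an account name other than the "-" placeholder.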

((?!Account Name: - )|Account Name:\s+(.+?(?=\s+Account Domain|Supplied)))

I am running Python 2.7.

You can fail every match where Account Name: is followed by whitespace, a hyphen, and whitespace by using a (?!\s+-\s) negative lookahead:

(Account Name:(?!\s+-\s)\s*)(.*?)(?=\s+(?:Account Domain|Supplied))

See the regex demo.

Details

  • (Account Name:(?!\s+-\s)\s*) - Group 1: Account Name: that is not immediately followed by 1+ whitespaces, a hyphen, and a whitespace (the (?!\s+-\s) lookahead), and then 0+ whitespaces
  • (.*?) - Group 2: any zero or more chars other than line break chars, as few as possible
  • (?=\s+(?:Account Domain|Supplied)) - a positive lookahead that requires 1+ whitespaces followed by the substring Account Domain or Supplied immediately to the right of the current location.
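To see what each group captures, here is a minimal sketch that runs the pattern over the first sample line from the question. The names `PATTERN`, `line`, and `matches` are chosen for illustration only:

```python
import re

# The pattern from the answer, compiled once for reuse.
PATTERN = re.compile(
    r'(Account Name:(?!\s+-\s)\s*)(.*?)(?=\s+(?:Account Domain|Supplied))'
)

line = ('Account Name:  -  Account Domain: - ImportantStuff '
        'Account Name:  Foo bar  Account Domain: my.bar')

# Collect (group 1, group 2) for every match: the label with its trailing
# whitespace, and the username that follows it.
matches = [(m.group(1), m.group(2)) for m in PATTERN.finditer(line)]
print(matches)
```

The first "Account Name:" is skipped because the negative lookahead sees the "-" placeholder, so only the second occurrence produces a match, and the space inside "Foo bar" stays part of group 2.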

See the Python 2 demo:

import re, hashlib
lines = [
    'Account Name:  -  Account Domain: - ImportantStuff Account Name:  Foo bar  Account Domain: my.bar',
    'Account Name:  Foo-bar  Supplied Realm Name: my.bar ImportantStuff',
    'Account Name:  Foo99bar$  Account Domain: my.bar ImportantStuff Account Name:  -  Account Domain: -',
]
for line in lines:
    # Keep the label (group 1) and replace the username (group 2) with its SHA-512 hex digest.
    print(re.sub(r'(Account Name:(?!\s+-\s)\s*)(.*?)(?=\s+(?:Account Domain|Supplied))',
        lambda m: m.group(1) + hashlib.sha512(m.group(2)).hexdigest(), line))

Output:

Account Name:  -  Account Domain: - ImportantStuff Account Name:  45a19ebf5c5c04bf71e9819b29e9a71ee7b4f9b5d3de72615b9788da05eceb526cc47b18e108107a3e53ee2068c4da4fca8209e9e2d87560d6848823eebe803b  Account Domain: my.bar
Account Name:  4ac1e08061b7216e9d3e0a44d6ca6512a25577a1e0675ba7cb439fc243e84d566dd0c1aac33f89c5c23e959fef5dc6a71cdd2adba257c81975caa822be4e5018  Supplied Realm Name: my.bar ImportantStuff
Account Name:  7228cb36d1d3b5cd41d50d150defd13e06441eb2b6a4689f9356012607fb0ebf5680af49f743baf289a590a07f8da6077f5288a5d4000448bfc7fd303869d31f  Account Domain: my.bar ImportantStuff Account Name:  -  Account Domain: -
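The demo targets Python 2, where a plain str is already a byte string. On Python 3, hashlib only accepts bytes, so the username has to be encoded first. The following is a sketch that should run on both versions; the function name `hash_usernames` and the UTF-8 default are assumptions, not part of the original answer:

```python
import hashlib
import re

PATTERN = re.compile(
    r'(Account Name:(?!\s+-\s)\s*)(.*?)(?=\s+(?:Account Domain|Supplied))'
)

def hash_usernames(line, encoding='utf-8'):
    """Replace each real account name in `line` with its SHA-512 hex digest."""
    def _replace(m):
        username = m.group(2)
        # hashlib needs bytes; text strings (str on Python 3, unicode on
        # Python 2) are encoded first, byte strings are passed through.
        if not isinstance(username, bytes):
            username = username.encode(encoding)
        return m.group(1) + hashlib.sha512(username).hexdigest()
    return PATTERN.sub(_replace, line)

print(hash_usernames('Account Name:  Foo-bar  Supplied Realm Name: my.bar ImportantStuff'))
```

Because the username is matched lazily and the trailing whitespace is only checked by the lookahead, the spaces before "Supplied" or "Account Domain" are left in place after the substitution.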

It's better to split this into several smaller problems first.

Since your log lines all share the same structure (I assume here that your usernames do not contain spaces), split each line into blocks first.

The username then always sits in a specific block.

On that block you can apply whatever rule you want, even with a simpler regular expression.
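One possible reading of this block-based approach, sketched under assumptions: each line is split at the "Account Name:" label, and the username is taken to be the text before the next "Account Domain" or "Supplied" field. The function name `hash_by_blocks` is hypothetical, and the answer itself does not spell out how to split:

```python
import hashlib
import re

def hash_by_blocks(line):
    # Split on the label and keep it (with its trailing whitespace) in the
    # result: odd indexes hold labels, even indexes hold the text after them.
    parts = re.split(r'(Account Name:\s+)', line)
    out = [parts[0]]
    for label, block in zip(parts[1::2], parts[2::2]):
        # Within one block a simple rule is enough: the username is whatever
        # precedes the next known field label.
        m = re.match(r'(.*?)(\s+(?:Account Domain|Supplied).*)', block)
        if m and m.group(1) != '-':
            # .encode() is needed on Python 3; on Python 2 it also works
            # for ASCII-only byte strings.
            digest = hashlib.sha512(m.group(1).encode('utf-8')).hexdigest()
            block = digest + m.group(2)
        out.append(label + block)
    return ''.join(out)

print(hash_by_blocks('Account Name:  -  Account Domain: - ImportantStuff '
                     'Account Name:  Foo bar  Account Domain: my.bar'))
```

This trades the single lookahead-based pattern for two simpler steps, and the "-" placeholder check becomes an ordinary string comparison.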
