pyspark regex to match domain\username pattern

Question

I have strings of the form domain\username in an array. I want to match them and replace them.

The strings have the following pattern:

[, DESKTOP-XXQYY56\Adminaccount, ] [, MB4345XX\adminaccount, ]

The code I am using is as follows:

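from pyspark.sql.functions import regexp_replace

# Attempt: replace the domain\username part of column 'str' with 'AB22'
df2 = df1.withColumn(
    'str1',
    regexp_replace(
        'str',
        r'^([A-Za-z0-9]+(-[A-Za-z0-9]+)*)+(\\?([A-Za-z0-9])+)*',
        'AB22'
    )
)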

I am not able to match the pattern correctly. I want to match the string and replace it. Please suggest.

Answer 1

If you want to match that format and replace the domain\user part with XXXX, you might use 2 capturing groups: one for the opening [, and one for the closing , ]

You could omit the anchor ^, since the value does not start with the domain name. In the part ([A-Za-z0-9])+ move the quantifier + onto the character class, [A-Za-z0-9]+, or else you repeat a group that matches a single char.
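
As a small illustration of the quantifier point, here is a sketch using Python's built-in re module rather than Spark. The variable names are made up for the example. It shows that repeating a group that matches one character keeps only the last repetition in the group:

```python
import re

text = "Adminaccount"

# Quantifier outside the group: the group is repeated once per character,
# and only the last repetition is kept as the captured value.
repeated_group = re.match(r"([A-Za-z0-9])+", text)

# Quantifier inside the group: the group matches the whole run once.
single_group = re.match(r"([A-Za-z0-9]+)", text)

print(repeated_group.group(0), repeated_group.group(1))  # Adminaccount t
print(single_group.group(0), single_group.group(1))      # Adminaccount Adminaccount
```

Both patterns match the same text overall; the difference only matters for what ends up in the group and for how much work the engine does.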

If you are not using the capturing groups separately for further processing, you could turn them into non-capturing groups, (?:...)

The pattern might look like

(\[, )[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*(?:\\?[A-Za-z0-9]+)*(, \])

In parts

(\[, ) Capture group 1, match [, followed by a space
[A-Za-z0-9]+ Match 1+ times any of the listed in the character class
(?: Non-capturing group
  -[A-Za-z0-9]+ Match - and match 1+ times any of the listed
)* Close non-capturing group and repeat 0+ times
(?: Non-capturing group
  \\?[A-Za-z0-9]+ Match an optional \ and 1+ times any of the listed
)* Close non-capturing group and repeat 0+ times
(, \]) Capture group 2, match , ]

In the replacement use the 2 capturing groups

$1XXXX$2
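
To check the pattern quickly without a Spark session, here is a minimal sketch with Python's re module on the sample value from the question. The names sample, pattern and masked are only for this example. Note that Python writes group references in the replacement as \1 and \2, whereas Spark's regexp_replace uses Java regex syntax, so there the replacement stays $1XXXX$2.

```python
import re

# Sample value from the question (raw string so the backslash stays literal).
sample = r"[, DESKTOP-XXQYY56\Adminaccount, ] [, MB4345XX\adminaccount, ]"

# Same pattern as above: group 1 is the opening "[, ", group 2 the closing ", ]".
pattern = r"(\[, )[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*(?:\\?[A-Za-z0-9]+)*(, \])"

# Keep the brackets via the two groups and replace what sits between them.
masked = re.sub(pattern, r"\1XXXX\2", sample)
print(masked)  # [, XXXX, ] [, XXXX, ]
```

In PySpark the equivalent call would presumably be regexp_replace('str', pattern, '$1XXXX$2') inside the withColumn from the question; that form is not tested here.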

