Extract required data between characters from a string using regex in python or pyspark

Question

I want to extract the text between certain characters from string data in the rows of a dataframe column. For example, the column contains data like this:

+----------------------------------------------------+
|                                               Azure|
+----------------------------------------------------+
|{ref=[As Tailwind Traders gets, started with Azure]}|
|{ref=first steps}                                   |
|{ref=will be to create}                             |
|{ref=at least one Azure subscription}               |
+----------------------------------------------------+

I want to transform it like this:

+----------------------------------------------------+
|                                               Azure|
+----------------------------------------------------+
|As Tailwind Traders gets, started with Azure        |
|first steps                                         |
|will be to create                                   |
|at least one Azure subscription                     |
+----------------------------------------------------+

So I need to extract the data between "[]", as well as the rows that hold a single element, and put the result back into the same column or a new one using PySpark/Python regex. The things to be removed are 'ref=' and the outer '{}'.

Note: I tried the regexp_replace function, but it also replaces the [], {} inside the required data.

So how can I achieve this using regex in pyspark?

Answer 1

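You can use the following pattern and put a reference to the first capture group in the substitution string. That reference is written \1 on regex101 and in Python's re module; Spark's regexp_replace uses Java regex syntax, where it is written $1 and a literal { has to be escaped as \{.

\{ref=\[?([,\w\s]+)\]?\}

On regex101 the pattern is used with the g and m flags.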

See https://regex101.com/r/OyFBkJ/1
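
As a rough illustration of the capture-group approach, here is a minimal sketch in plain Python using the re module. The pattern is a variant of the one in the answer, not the answerer's exact regex: it is anchored and uses a lazy (.*?) instead of the [,\w\s]+ class, on the assumption that the wanted text may itself contain brackets or braces. The names PATTERN, extract_ref and rows are made up for this example.

```python
import re

# Anchored pattern: drop the outer "{ref=" ... "}" and one optional pair of
# square brackets directly inside it. Group 1 is the text to keep.
# (Variant of the answer's pattern; the lazy ".*?" is an assumption so that
# brackets or braces inside the wanted text survive.)
PATTERN = re.compile(r"^\{ref=\[?(.*?)\]?\}$")


def extract_ref(value: str) -> str:
    """Return the text wrapped by {ref=...} or {ref=[...]}.

    Input that does not match the wrapper is returned unchanged.
    """
    return PATTERN.sub(r"\1", value)


# Sample rows copied from the question.
rows = [
    "{ref=[As Tailwind Traders gets, started with Azure]}",
    "{ref=first steps}",
    "{ref=will be to create}",
    "{ref=at least one Azure subscription}",
]

cleaned = [extract_ref(row) for row in rows]
for line in cleaned:
    print(line)
```

In PySpark the same idea would presumably be written with pyspark.sql.functions.regexp_replace, passing the column, the same pattern string, and "$1" as the replacement, since Spark follows Java's regex conventions for group references.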
