I want to extract the data between certain characters from the strings in a DataFrame column. For example, the column contains data like this:
+----------------------------------------------------+
| Azure|
+----------------------------------------------------+
|{ref=[As Tailwind Traders gets, started with Azure]}|
|{ref=first steps} |
|{ref=will be to create} |
|{ref=at least one Azure subscription} |
+----------------------------------------------------+
I want to transform it into this:
+----------------------------------------------------+
| Azure|
+----------------------------------------------------+
|As Tailwind Traders gets, started with Azure |
|first steps |
|will be to create |
|at least one Azure subscription |
+----------------------------------------------------+
So I need to extract the data between "[]", and also handle the rows that contain a single element without brackets, and put the result back into the same column or a new one, using PySpark/Python regex. The things to remove are 'ref=' and the outer '{}'.
Note: I tried the regexp_replace function, but it also replaces the [] and {} inside the required data.
So how can I achieve this using regex in pyspark?
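You can use the following pattern, with \1 in the substitution string. The trailing gm denotes the global and multiline flags and is not part of the pattern itself. In PySpark's regexp_replace, which uses Java regular expressions, escape the opening brace as \{ and write the group reference as $1 instead of \1.
"{ref=\[?([,\w\s]+)\]?\}"gm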
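As a minimal sketch of how this pattern could be applied, the snippet below first checks it against the four sample rows with Python's built-in re module, then wraps the equivalent PySpark call in a helper. The helper name strip_ref_wrapper and its default column name are hypothetical, chosen only for illustration. It assumes pyspark is installed and that the column holds plain strings.

```python
import re

# Pattern from the answer, with the opening brace escaped so that it is
# also accepted by Java's regex engine (the one Spark uses).
PATTERN = r"\{ref=\[?([,\w\s]+)\]?\}"

rows = [
    "{ref=[As Tailwind Traders gets, started with Azure]}",
    "{ref=first steps}",
    "{ref=will be to create}",
    "{ref=at least one Azure subscription}",
]

# Plain-Python check of the pattern; \1 is the captured inner text.
cleaned = [re.sub(PATTERN, r"\1", row) for row in rows]
for value in cleaned:
    print(value)


def strip_ref_wrapper(df, column="Azure"):
    """Apply the same substitution to a Spark DataFrame column.

    Hypothetical helper: assumes `df` is a pyspark.sql.DataFrame and that
    `column` contains strings. Spark writes the group reference as $1.
    """
    from pyspark.sql import functions as F  # imported lazily; needs pyspark

    return df.withColumn(column, F.regexp_replace(F.col(column), PATTERN, "$1"))
```

Note that the character class [,\w\s] only permits word characters, whitespace, and commas, so a row whose inner text contains other punctuation will not match and is left unchanged. Widen the class if your data needs it.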