
How to remove special characters, including commas and quotes, from a column string in Apache Beam (Google Cloud Dataflow)

I have a few records in my CSV that contain special characters. Consider this example of employee data, with the columns id, name, designation, address, salary:

1001, Peter Occon, Manager, "42, Willis Way St, Waterloo, Ohio, US", 5000

and so on.

As you can see, the 'address' column contains commas and is wrapped in quotes. I need to remove both in Apache Beam.
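To make the problem concrete, here is a small sketch in plain Python (no Beam required) of what happens when the sample record is split on commas without any cleanup. The variable names are illustrative only.

```python
# The sample record from the question, as one raw CSV line.
line = '1001, Peter Occon, Manager, "42, Willis Way St, Waterloo, Ohio, US", 5000'

# A naive split treats the commas inside the quoted address as delimiters.
naive_fields = [f.strip() for f in line.split(',')]

print(len(naive_fields))  # more pieces than the 5 expected columns
print(naive_fields)
```

The address is broken into several pieces, and the first and last pieces still carry a stray quote character.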

This was achieved using the following:

beam.Regex.replace_all(r'"([^"]*)"', lambda x: x.group(1).replace(',', ''))

NOTE: this step must come before the 'split' step in the pipeline. Splitting on commas first would already have broken the quoted address into several fields.
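The same idea can be checked outside a pipeline with the standard `re` module. The sketch below assumes each element is one raw CSV line as a string; `strip_quoted_commas` and `QUOTED_FIELD` are hypothetical names chosen for illustration, not part of Beam.

```python
import re

# Matches a double-quoted field and captures the text between the quotes.
QUOTED_FIELD = re.compile(r'"([^"]*)"')

def strip_quoted_commas(line: str) -> str:
    """Drop the surrounding quotes and any commas inside each quoted field."""
    return QUOTED_FIELD.sub(lambda m: m.group(1).replace(',', ''), line)

line = '1001, Peter Occon, Manager, "42, Willis Way St, Waterloo, Ohio, US", 5000'

# Clean first, then split: the order described in the note above.
cleaned = strip_quoted_commas(line)
fields = [f.strip() for f in cleaned.split(',')]

print(cleaned)
print(fields)
```

In a pipeline, a function like this could be applied with `beam.Map(strip_quoted_commas)` ahead of the split step. Note that this approach assumes quoted fields contain no escaped quotes (`""`). For such input, parsing each line with Python's `csv` module is likely the safer option.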
