I am trying to clean a dataset and came across a column named "production_companies" with about 1,000 values. The column contains unnecessary symbols; for example, its values look like this: [{name: 'Pixar', id: 3}]. I wish to remove the unnecessary symbols such as " {} [] , as well as the text "name" and "id" and the integers.
import re

list1 = data.production_companies
for i in list1:
    # fails: the whole column is passed to re.sub instead of the single value i
    re.sub(r'\d+', '', list1)
The problem is that re.sub does not accept a list as a parameter; it only accepts a string as input. I need to store the production_companies values in a list and iterate through it with a for loop, because the column has many values and I need to remove the symbols and unnecessary text from all of them at once.
Can anyone please tell me what I should do?
Thanks a lot
You can use a list comprehension to create a new list from the existing one, applying re.sub to each item in turn.
list2 = [re.sub(r'\d+', '', item) for item in list1]
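The line above only strips the digits. As a rough sketch, the same list comprehension can also remove the symbols and the words "name" and "id" by using a single combined pattern. The sample values, the `pattern` name, and the `clean` helper below are made up for illustration and assume every cell looks like the example in the question; a pandas column can be iterated the same way as the plain list used here.

```python
import re

# Hypothetical sample values, shaped like the example in the question.
list1 = ["[{name: 'Pixar', id: 3}]", "[{name: 'Walt Disney Pictures', id: 2}]"]

# One pattern for everything to drop: the whole words "name" and "id",
# runs of digits, and the symbols { } [ ] " ' : ,
pattern = re.compile(r"\b(?:name|id)\b|\d+|[{}\[\]\"':,]")

def clean(value):
    # str() keeps re from raising a TypeError on non-string entries
    stripped = pattern.sub("", str(value))
    # collapse the whitespace left behind by the removed tokens
    return " ".join(stripped.split())

list2 = [clean(item) for item in list1]
print(list2)  # ['Pixar', 'Walt Disney Pictures']
```

Note that this removes every digit, so a company name that itself contains numbers would lose them, and a cell listing several companies would end up with the names separated only by spaces. If that matters, the pattern would need to be narrowed to the parts around "id".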