I have a dataset with over 1 million rows, and each row contains a combination of lowercase/uppercase letters, symbols and numbers. I want to clean this data and keep only the last instance where a lowercase letter is immediately followed by a number. For speed, my current plan is to hold the data as an array of strings and then use a .findall operation to keep the letter/number combination I'm looking for.
Here is something along the lines of what I am trying to do:
Input
list = Array(["Nd4","0-0","Nxe4","e8+","e4g2"])
newList = list.findall('[a-z]\d')[len(list.findall('[a-z]\d'))-1]
Expected Output from newList
newList = ("d4","","e4","e8","g2")
It is not recommended to use "list" as a variable name, since it shadows the built-in list type.
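As a small illustration of why the name matters, here is a minimal sketch; the function name `shadowing_demo` is hypothetical and not part of the original answer. Once `list` is bound to a data value, calling `list(...)` in that scope no longer reaches the built-in.

```python
def shadowing_demo():
    # Rebinding the name "list" hides the built-in type inside this function.
    list = ["Nd4", "0-0"]
    try:
        list(("a", "b"))  # tries to call the data, not the built-in
        return "no error"
    except TypeError as exc:
        return str(exc)

message = shadowing_demo()
print(message)
```

Outside the function the built-in is untouched, so a call such as `list(map(...))` still works there.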
import re
import numpy as np
lists = np.array(["Nd4","0-0","Nxe4","e8+","e4g2"])
def findall(i, pattern=r'[a-z]\d'):
    # Return the last lowercase-letter/digit pair in i, or "" if there is none
    matches = re.findall(pattern, i)
    return matches[-1] if matches else ""
newList = [findall(i) for i in lists]
# OR if you want to return an array
newList = np.array(list(map(findall, lists)))
# values: 'd4', '', 'e4', 'e8', 'g2'
This may not be the prettiest way, but I think it gets the job done!
import re
import numpy as np
lists = np.array(["Nd4","0-0","Nxe4","e8+","e4g2"])
def function(i):
    try:
        # Index -1 picks the last lowercase-letter/digit match
        return re.findall(r'[a-z]\d', i)[-1]
    except IndexError:
        # No match in this string
        return ""
newList = [function(i) for i in lists]
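Since the question mentions speed on 1m+ rows, one possible variation is to compile the pattern once and let a greedy prefix find the last pair directly, instead of collecting every match. This is only a sketch: the names `LAST_PAIR` and `last_pair` are made up for illustration, and it assumes rows contain no newline characters. Whether it is actually faster should be timed on the real data.

```python
import re

# Assumption: rows contain no newlines ('.' does not match '\n' by default).
# The greedy '.*' consumes as much as possible, so the captured group is
# the LAST lowercase-letter/digit pair in the string.
LAST_PAIR = re.compile(r'.*([a-z]\d)')

def last_pair(text):
    """Return the last lowercase-letter/digit pair in text, or "" if none."""
    match = LAST_PAIR.match(text)
    return match.group(1) if match else ""

rows = ["Nd4", "0-0", "Nxe4", "e8+", "e4g2"]
cleaned = [last_pair(row) for row in rows]
print(cleaned)  # ['d4', '', 'e4', 'e8', 'g2']
```

The list comprehension works the same way on a NumPy array of strings, since it only iterates over the elements.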