Use “Findall” operation on Python array of strings

I have a dataset of more than 1 million rows, and each row is a combination of lowercase and uppercase letters, symbols and numbers. I want to clean this data and keep only the last place in each row where a lowercase letter is immediately followed by a digit. For speed, my current plan is to hold the data as an array of strings and then use a .findall operation to keep the letter/number combination I'm looking for.

Here is something along the lines of what I am trying to do:

Input

list = Array(["Nd4","0-0","Nxe4","e8+","e4g2"])

newList = list.findall('[a-z]\d')[len(list.findall('[a-z]\d')-1]

Expected Output from newList

newList = ("d4","","e4","e8","g2")

It is not recommended to use "list" as a variable name, since it shadows the built-in list type.
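As a small illustration of why the name matters, here is a minimal sketch (the function name shadowing_demo is made up for this example). Inside the function, list is rebound to a plain list object, so calling it as a constructor fails:

```python
def shadowing_demo():
    # Rebinding the name hides the built-in list type inside this function
    list = ["Nd4", "0-0"]
    try:
        list("abc")  # tries to call the list object, not the built-in type
    except TypeError as exc:
        return str(exc)

message = shadowing_demo()
print(message)  # 'list' object is not callable
```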

import re
import numpy as np

lists = np.array(["Nd4","0-0","Nxe4","e8+","e4g2"])

def findall(i, pattern=r'[a-z]\d'):
    # Return the last lowercase-letter + digit pair in i, or "" if there is none
    matches = re.findall(pattern, i)
    return matches[-1] if matches else ""

newList = [findall(i) for i in lists]
# OR if you want to return an array
newList = np.array(list(map(findall, lists)))

# >>> ['d4', '', 'e4', 'e8', 'g2']

This may not be the prettiest way, but I think it gets the job done!

import re
import numpy as np

lists = np.array(["Nd4","0-0","Nxe4","e8+","e4g2"])

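def function(i):
    # Find every lowercase letter immediately followed by a digit
    matches = re.findall(r'[a-z]\d', i)
    # Keep the last match, or "" when the row has none
    return matches[-1] if matches else ""

newList = [function(i) for i in lists]

# >>> ['d4', '', 'e4', 'e8', 'g2']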
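Since the question mentions speed on 1m+ rows, one possible refinement is to compile the pattern once and reuse it. This is only a sketch using the standard library; the names PAIR, last_pair and moves are made up for the example. The re module already caches recently used patterns, so any gain is likely to be modest and worth measuring on the real data.

```python
import re

# Compile once so the pattern object is reused for every row
PAIR = re.compile(r'[a-z]\d')

def last_pair(text):
    """Return the last lowercase-letter + digit pair in text, or '' if none."""
    matches = PAIR.findall(text)
    return matches[-1] if matches else ""

moves = ["Nd4", "0-0", "Nxe4", "e8+", "e4g2"]
cleaned = [last_pair(m) for m in moves]
print(cleaned)  # ['d4', '', 'e4', 'e8', 'g2']
```

The same function can be passed to map() over a NumPy array of strings, as in the answers above.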
