Use “findall” operation on a Python array of strings

I have a dataset of over 1 million rows, and each row contains a mix of lowercase/uppercase letters, symbols, and numbers. I want to clean this data and keep only the last occurrence of a lowercase letter immediately followed by a digit. For speed, my current plan is to store the data as an array of strings and then use a .findall operation to extract the letter/number combination I'm looking for.

Here is something along the lines of what I am trying to do:

Input:

list = Array(["Nd4","0-0","Nxe4","e8+","e4g2"])

newList = list.findall(r'[a-z]\d')[len(list.findall(r'[a-z]\d'))-1]

Expected output for newList:

newList = ("d4","","e4","e8","g2")

It is not recommended to use "list" as a variable name, since it shadows the built-in list() function.
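A minimal illustration of the shadowing problem (the variable contents are arbitrary): once the name list is rebound, calling list(...) later in the same module fails.

```python
import builtins

shadow_error = ""

# Rebinding "list" at module level hides the built-in list type
list = ["Nd4", "0-0"]

try:
    list("abc")  # calls the list *object*, not the built-in type
except TypeError as err:
    shadow_error = str(err)  # "'list' object is not callable"

# The built-in is still reachable explicitly through the builtins module
restored = builtins.list("abc")

# Deleting the module-level name makes "list" resolve to the built-in again
del list
```

Choosing a different name, such as lists below, avoids the issue entirely.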

import re
import numpy as np

lists = np.array(["Nd4","0-0","Nxe4","e8+","e4g2"])

def findall(i, pattern=r'[a-z]\d'):
    # All lowercase-letter + digit pairs in the string
    matches = re.findall(pattern, i)
    # Keep only the last pair, or "" when the string has none
    return matches[-1] if matches else ""

newList = [findall(i) for i in lists]
# >>> ['d4', '', 'e4', 'e8', 'g2']

# OR if you want to return an array
newList = np.array(list(map(findall, lists)))

This may not be the prettiest way, but I think it gets the job done!

import re
import numpy as np

lists = np.array(["Nd4","0-0","Nxe4","e8+","e4g2"])

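def function(i):
    # All lowercase-letter + digit pairs in the string
    matches = re.findall(r'[a-z]\d', i)
    try:
        # The last pair is the one to keep
        return matches[-1]
    except IndexError:
        # No pair found (e.g. "0-0"): return an empty string
        return ""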

newList = [function(i) for i in lists]
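For a 1m+ row dataset, one possible refinement (a sketch, not taken from either answer) is to let a single precompiled regex locate only the last pair, instead of building the full findall list for every row. The names LAST_PAIR and last_pair are hypothetical. It assumes each row is a single line, since "." does not match newlines by default.

```python
import re

# Greedy ".*" consumes as much of the string as possible; the regex engine
# then backtracks until ([a-z]\d) can match, so group 1 is the LAST
# lowercase-letter + digit pair in the string.
LAST_PAIR = re.compile(r".*([a-z]\d)")

def last_pair(s: str) -> str:
    """Return the last lowercase letter followed by a digit in s, or "" if none."""
    m = LAST_PAIR.match(s)
    return m.group(1) if m else ""

moves = ["Nd4", "0-0", "Nxe4", "e8+", "e4g2"]
new_list = [last_pair(s) for s in moves]  # ['d4', '', 'e4', 'e8', 'g2']
```

Because NumPy string elements are str subclasses, last_pair can be applied to the lists array from the answers the same way, e.g. [last_pair(s) for s in lists]. Whether it is actually faster on your data is worth measuring rather than assuming.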
