How to print occurrences of given strings in a pandas DataFrame column?

Question

I have the following DataFrame.

import pandas as pd

data = [['Alexa',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
df

I want to check whether certain characters are present in the Name column.

mylist=['a','e']
pattern = '|'.join(mylist)
df['contains']=df['Name'].str.contains(pattern)

The code above gives True or False depending on whether any of the mylist values are present.

How can I get a letters column in the output, as shown here?

    Name    Age contains  letters
0   Alexa   10  True      e a 
1   Bob     12  False     
2   Clarke  13  True      a e

Answer 1

You can use set intersection with a list comprehension here, which will be faster than the pandas string methods:

check = set('ae')
df.assign(letters=[set(n.lower()) & check for n in df.Name])

     Name  Age letters
0   Alexa   10  {a, e}
1     Bob   12      {}
2  Clarke   13  {a, e}
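
The question's desired output shows the letters as a space-separated string rather than a set. A minimal sketch of one way to get that from the set intersection, assuming every entry in mylist is a single character (the names letters and result are illustrative, not from the original answer):

```python
import pandas as pd

data = [['Alexa', 10], ['Bob', 12], ['Clarke', 13]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

mylist = ['a', 'e']
check = set(mylist)

# Intersect each lower-cased name with the wanted letters, then join the
# sorted matches into one space-separated string ('' when nothing matches).
letters = [' '.join(sorted(set(name.lower()) & check)) for name in df['Name']]

# A non-empty string is truthy, so the boolean column can be derived from it.
result = df.assign(contains=[bool(s) for s in letters], letters=letters)
print(result)
```

Because the names are lower-cased first, this matching is case-insensitive, unlike the str.contains(pattern) call in the question, which is case-sensitive by default.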

The alternative would be something like:

df.assign(letters=df.Name.str.findall(r'(?i)(a|e)'))

     Name  Age    letters
0   Alexa   10  [A, e, a]
1     Bob   12         []
2  Clarke   13     [a, e]
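
If the regex approach is preferred but repeated letters are unwanted, the matches can be normalized afterwards. A hedged sketch, assuming first-seen order should be kept (the name unique_letters is illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alexa', 'Bob', 'Clarke'], 'Age': [10, 12, 13]})

# findall returns every match in order, so 'Alexa' yields ['A', 'e', 'a'].
matches = df['Name'].str.findall(r'(?i)(a|e)')

# Lower-case each match and drop repeats; dict.fromkeys keeps first-seen order.
unique_letters = matches.apply(
    lambda found: list(dict.fromkeys(m.lower() for m in found))
)

print(df.assign(letters=unique_letters))
```

This adds a Python-level pass over every row on top of the regex work, so it should not be expected to close the speed gap with the set-based version.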

The second approach (a) will include duplicates and (b) will be slower:

In [89]: df = pd.concat([df]*1000)

In [90]: %timeit df.Name.str.findall(r'(?i)(a|e)')
2.34 ms ± 93.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [91]: %timeit [set(n.lower()) & check for n in df.Name]
1.45 ms ± 23.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
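
The %timeit magic only works inside IPython. A rough equivalent for a plain Python script, using the standard timeit module, might look like the sketch below; the helper names and the repeat count of 20 are arbitrary choices, and absolute timings will vary by machine and pandas version:

```python
import timeit

import pandas as pd

df = pd.DataFrame({'Name': ['Alexa', 'Bob', 'Clarke'], 'Age': [10, 12, 13]})
big = pd.concat([df] * 1000, ignore_index=True)  # 3000 rows, as in the benchmark
check = set('ae')


def with_findall():
    # Regex-based approach using the pandas string accessor.
    return big['Name'].str.findall(r'(?i)(a|e)')


def with_sets():
    # Set intersection inside a list comprehension.
    return [set(n.lower()) & check for n in big['Name']]


repeats = 20
t_findall = timeit.timeit(with_findall, number=repeats) / repeats
t_sets = timeit.timeit(with_sets, number=repeats) / repeats

print(f'findall: {t_findall * 1e3:.2f} ms per call')
print(f'sets:    {t_sets * 1e3:.2f} ms per call')
```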

Answer 1 (3, accepted, 2018-10-11 04:27:02)