How to print occurrences of given strings in a pandas DataFrame column?
I have the following DataFrame.
import pandas as pd
data = [['Alexa',10],['Bob',12],['Clarke',13]]
df = pd.DataFrame(data,columns=['Name','Age'])
df
I want to check whether certain characters are present in the Name column.
mylist=['a','e']
pattern = '|'.join(mylist)
df['contains']=df['Name'].str.contains(pattern)
The code above gives True or False depending on whether any of the mylist values are present.
How can I also get a letters column in the output, like this?
     Name  Age  contains letters
0   Alexa   10      True     e a
1     Bob   12     False
2  Clarke   13      True     a e
You can use set intersection here, together with a list comprehension, which will be faster than the pandas string methods:
check = set('ae')
df.assign(letters=[set(n.lower()) & check for n in df.Name])
     Name  Age letters
0   Alexa   10  {a, e}
1     Bob   12      {}
2  Clarke   13  {a, e}
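If you need the letters as a plain string, as in the desired output in the question, one option is to join the intersection into a string. This is only a minimal sketch: the helper name `found_letters` is made up for illustration, and sorting the letters alphabetically is an assumption, since the question does not say which order it expects.

```python
import pandas as pd

data = [['Alexa', 10], ['Bob', 12], ['Clarke', 13]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

check = set('ae')

def found_letters(name, letters=check):
    """Return the letters from `letters` that occur in `name`, space-separated."""
    # Lower-case so that the 'A' in 'Alexa' counts as 'a'; sort for a stable order.
    return ' '.join(sorted(set(name.lower()) & letters))

result = df.assign(
    # True when at least one of the letters is present (case-insensitive here).
    contains=[bool(set(n.lower()) & check) for n in df.Name],
    letters=[found_letters(n) for n in df.Name],
)
print(result)
```

Rows without a match get an empty string, which prints as a blank cell.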
The alternative would be something like:
df.assign(letters=df.Name.str.findall(r'(?i)(a|e)'))
     Name  Age    letters
0   Alexa   10  [A, e, a]
1     Bob   12         []
2  Clarke   13     [a, e]
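If you prefer the regex approach but want each letter listed only once, the matches can be de-duplicated afterwards. A minimal sketch, assuming that case should be ignored and that first-seen order should be kept; `dict.fromkeys` is used here only as an order-preserving way to drop repeats.

```python
import pandas as pd

df = pd.DataFrame([['Alexa', 10], ['Bob', 12], ['Clarke', 13]],
                  columns=['Name', 'Age'])

# Lower-case first so 'A' and 'a' collapse into one entry, then find all matches.
matches = df.Name.str.lower().str.findall(r'[ae]')

# dict.fromkeys keeps the first occurrence of each letter, in order of appearance.
unique_letters = matches.apply(lambda m: list(dict.fromkeys(m)))

result = df.assign(letters=unique_letters)
print(result)
```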
The second approach (A) will include duplicates and (B) will be slower:
In [89]: df = pd.concat([df]*1000)
In [90]: %timeit df.Name.str.findall(r'(?i)(a|e)')
2.34 ms ± 93.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [91]: %timeit [set(n.lower()) & check for n in df.Name]
1.45 ms ± 23.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
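The `%timeit` lines above only work inside IPython. Outside IPython, a comparable measurement can be sketched with the standard-library `timeit` module. The function names, the repeat count and the `ignore_index=True` argument are choices made for this sketch, and the absolute numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

df = pd.DataFrame([['Alexa', 10], ['Bob', 12], ['Clarke', 13]],
                  columns=['Name', 'Age'])
big = pd.concat([df] * 1000, ignore_index=True)  # 3000 rows, as in the session above
check = set('ae')

def with_findall():
    # pandas string method with a case-insensitive regex
    return big.Name.str.findall(r'(?i)(a|e)')

def with_sets():
    # plain Python set intersection in a list comprehension
    return [set(n.lower()) & check for n in big.Name]

runs = 100  # arbitrary repeat count chosen for this sketch
t_findall = timeit.timeit(with_findall, number=runs) / runs
t_sets = timeit.timeit(with_sets, number=runs) / runs

print(f'findall: {t_findall * 1e3:.2f} ms per call')
print(f'sets:    {t_sets * 1e3:.2f} ms per call')
```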