Find index of substring within string from a dataframe
I have a dataframe with two columns (and a lot of rows): one column is the full sequence and the other contains a sub sequence.
I want to find the index where the sub sequence starts within the full sequence and add it as another column.
I have tried this:
df["start"] = df.sequence.index(df.sub_sequence)
But this returns: TypeError: 'RangeIndex' object is not callable
What am I doing wrong?
Here's the df and the df I wish to end up with:
Sample dataframe:
import pandas as pd
data = {"sequence": ["abcde","fghij","klmno"], "sub_sequence": ["cde", "gh", "no"]}
df = pd.DataFrame (data, columns = ['sequence','sub_sequence'])
  sequence sub_sequence
0    abcde          cde
1    fghij           gh
2    klmno           no
Expected result:
data2 = {"sequence": ["abcde","fghij","klmno"], "sub_sequence": ["cde", "gh", "no"], "start": [2,1,3]}
df2 = pd.DataFrame (data2, columns = ['sequence','sub_sequence','start'])
  sequence sub_sequence  start
0    abcde          cde      2
1    fghij           gh      1
2    klmno           no      3
Use zip and str.index in a list comprehension:
df['start'] = [seq.index(sub) for seq, sub in zip(df['sequence'], df['sub_sequence'])]
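Note that str.index raises a ValueError when the sub sequence does not occur in the sequence. If some rows might not match, one possible variation is str.find, which returns -1 instead of raising. The following is only a sketch; the "xyz" value is made-up sample data added to show the non-matching case:

```python
import pandas as pd

# Hypothetical sample: the last sub_sequence ("xyz") is NOT in its sequence
df = pd.DataFrame({
    "sequence": ["abcde", "fghij", "klmno"],
    "sub_sequence": ["cde", "gh", "xyz"],
})

# str.find returns the start position, or -1 when the substring is absent
df["start"] = [seq.find(sub) for seq, sub in zip(df["sequence"], df["sub_sequence"])]
print(df)
```

Rows with -1 can then be filtered out or replaced, depending on how missing matches should be treated.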
Or use DataFrame.apply along axis=1 with str.index:
df['start'] = df[['sequence', 'sub_sequence']].apply(lambda s: str.index(*s), axis=1)
Result:
  sequence sub_sequence  start
0    abcde          cde      2
1    fghij           gh      1
2    klmno           no      3
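As for why the original attempt fails: on a pandas Series, .index is an attribute holding the row labels (a RangeIndex for a default index), not the str.index method, so calling it raises the TypeError from the question. A small sketch that reproduces this, using the sample data from the question (the variable names row_labels and error_message are just for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "sequence": ["abcde", "fghij", "klmno"],
    "sub_sequence": ["cde", "gh", "no"],
})

# Series.index is the row-label object, not a string method
row_labels = df["sequence"].index
print(type(row_labels).__name__)

# Calling it like a function reproduces the error from the question
try:
    df["sequence"].index(df["sub_sequence"])
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```

This is why the solutions above call str.index on each pair of strings rather than on the column itself.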