
Find index of substring within string from a dataframe

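I have a dataframe with two columns (and many rows): one column holds the full sequence, and the other contains a subsequence.

I want to find the index at which the subsequence starts within the full sequence, and add it as another column.

I tried this: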

df["start"] = df.sequence.index(df.sub_sequence)

But this returns: TypeError: 'RangeIndex' object is not callable

What am I doing wrong?

Here are the sample df and the df I hope to get.

Sample dataframe:

import pandas as pd

data = {"sequence": ["abcde", "fghij", "klmno"], "sub_sequence": ["cde", "gh", "no"]}
df = pd.DataFrame(data, columns=['sequence', 'sub_sequence'])

  sequence sub_sequence
0    abcde          cde
1    fghij           gh
2    klmno           no

Expected result:

data2 = {"sequence": ["abcde", "fghij", "klmno"], "sub_sequence": ["cde", "gh", "no"], "start": [2, 1, 3]}
df2 = pd.DataFrame(data2, columns=['sequence', 'sub_sequence', 'start'])

  sequence sub_sequence  start
0    abcde          cde      2
1    fghij           gh      1
2    klmno           no      3

Use zip with str.index in a list comprehension:

df['start'] = [seq.index(sub) for seq, sub in zip(df['sequence'], df['sub_sequence'])]

Or use DataFrame.apply along axis=1 with str.index:

df['start'] = df[['sequence', 'sub_sequence']].apply(lambda s: str.index(*s), axis=1)

Result:

  sequence sub_sequence  start
0    abcde          cde      2
1    fghij           gh      1
2    klmno           no      3
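
One thing to watch for with both approaches: str.index raises a ValueError if a subsequence does not occur in its sequence. If that can happen in your data, a minimal sketch using str.find, which returns -1 when there is no match, might look like the following. The "xyz" value is made up here purely to show the no-match case; it is not part of the original data.

```python
import pandas as pd

# Same layout as the sample dataframe, but the last sub_sequence ("xyz")
# is a hypothetical value that does not occur in its sequence.
df = pd.DataFrame({
    "sequence": ["abcde", "fghij", "klmno"],
    "sub_sequence": ["cde", "gh", "xyz"],
})

# str.find returns the start index of the first match, or -1 if there is none,
# so a missing subsequence does not raise ValueError the way str.index would.
df["start"] = [seq.find(sub) for seq, sub in zip(df["sequence"], df["sub_sequence"])]

print(df)
```

Rows where start is -1 can then be filtered out or handled separately, depending on what a missing subsequence should mean for your data.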
