Find the index of a substring within a string in a dataframe

Question

I have a dataframe with two columns (and a lot of rows): one column holds the full sequence and the other holds a sub-sequence.

I want to find the index at which the sub-sequence starts within the full sequence and add it as another column.

I have tried this:

df["start"] = df.sequence.index(df.sub_sequence)

But this returns: TypeError: 'RangeIndex' object is not callable

What am I doing wrong?

Here's the df I have and the df I wish to end up with:

Sample dataframe:

import pandas as pd

data = {"sequence": ["abcde", "fghij", "klmno"], "sub_sequence": ["cde", "gh", "no"]}
df = pd.DataFrame(data, columns=['sequence', 'sub_sequence'])

  sequence sub_sequence
0    abcde          cde
1    fghij           gh
2    klmno           no

Expected result:

data2 = {"sequence": ["abcde", "fghij", "klmno"], "sub_sequence": ["cde", "gh", "no"], "start": [2, 1, 3]}
df2 = pd.DataFrame(data2, columns=['sequence', 'sub_sequence', 'start'])

  sequence sub_sequence  start
0    abcde          cde      2
1    fghij           gh      1
2    klmno           no      3

Answer 1

Use zip and str.index in a list comprehension:

df['start'] = [seq.index(sub) for seq, sub in zip(df['sequence'], df['sub_sequence'])]
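
For context on the error in the question, here is a minimal, self-contained sketch using the question's sample data. It shows that `df.sequence.index` is the Series' row-label attribute (a `RangeIndex` for this data) rather than the string method `str.index`, which is why calling it raises the `TypeError`, and that pairing the two columns with `zip` gives the expected positions. The variable names `error_name` and `starts` are only for illustration.

```python
import pandas as pd

data = {"sequence": ["abcde", "fghij", "klmno"], "sub_sequence": ["cde", "gh", "no"]}
df = pd.DataFrame(data, columns=["sequence", "sub_sequence"])

# On a Series, .index is the row-label object (here a RangeIndex), not str.index,
# so df.sequence.index(...) tries to call that object and raises TypeError.
try:
    df.sequence.index(df.sub_sequence)
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__

# Pair each sequence with its own sub-sequence and call the built-in str.index.
starts = [seq.index(sub) for seq, sub in zip(df["sequence"], df["sub_sequence"])]

print(error_name, starts)  # TypeError [2, 1, 3]
```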

Or use DataFrame.apply along axis=1 with str.index:

df['start'] = df[['sequence', 'sub_sequence']].apply(lambda s: str.index(*s), axis=1)

Result:

  sequence sub_sequence  start
0    abcde          cde      2
1    fghij           gh      1
2    klmno           no      3
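
One caveat worth noting: `str.index` raises a `ValueError` when the sub-sequence does not occur in the sequence. If some rows might not match, a possible variant is to use `str.find`, which returns -1 instead of raising. The sketch below assumes a made-up non-matching value (`"xyz"`) that is not in the original data.

```python
import pandas as pd

df = pd.DataFrame({
    "sequence": ["abcde", "fghij", "klmno"],
    "sub_sequence": ["cde", "gh", "xyz"],  # "xyz" is a hypothetical non-matching value
})

# str.find returns the start position, or -1 when the substring is absent.
df["start"] = [seq.find(sub) for seq, sub in zip(df["sequence"], df["sub_sequence"])]

print(df["start"].tolist())  # [2, 1, -1]
```

Rows with -1 can then be filtered out or replaced with a missing-value marker, depending on what downstream code expects.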

Answer 1: 4 · Accepted · 2020-07-13 13:40:26