How to match text against the strings in a list and extract the subsections in Python?

Question

I am trying to extract the structure of an earnings call transcript that looks like the following sample:

"Operator

Ladies and gentlemen, thank you for standing by. And welcome to XYZ Fourth Quarter 2019 Earning Conference Call. At this time, all participants are in a listen-only mode. After the speaker presentation, there will be a question-and-answer session. [Operator Instructions] Please be advised that today’s conference is being recorded. [Operator Instructions]
I would now like to hand the conference to your speaker today,Person1, Head of Investor Relations. Please go ahead, ma’am**

Person1

Hello everyone, blablablablabla. Now let's see what Person2 has to say.

Person2

Thank you and hello everyone. Blablablabla

Person3

I have no further remarks....thank you once again"

From this I have generated a list, list1 = ['Person1','Person2','Person3']. I have also created an empty dataframe whose column names are Person1, Person2 and Person3. I now have to extract the text below Person1, Person2 and Person3 based on the values in the list and fill in the dataframe. Is that possible?

Answer 1

text="""OperatorLadies and gentlemen, thank you for standing by. And welcome to XYZ Fourth Quarter 2019 Earning Conference Call. At this time, all participants are in a listen-only mode. After the speaker presentation, there will be a question-and-answer session. [Operator Instructions] Please be advised that today’s conference is being recorded. [Operator Instructions]I would now like to hand the conference to your speaker today,Person1, Head of Investor Relations. Please go ahead, ma’am**Person1Hello everyone, blablablablabla. Now let's see what Person2 has to say.Person2Thank you and hello everyone. BlablablablaPerson3I have no further remarks....thank you once again"""

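import pandas as pd

# Person1 and Person2 each appear twice in text (an inline mention, then the
# speaker heading), so index [2] is everything after the heading.
# rsplit cuts at the last 'Person2' (the next heading), not the inline mention.
say1=text.split('Person1')[2].rsplit('Person2',1)[0] #getting text of person1
say2=text.split('Person2')[2].split('Person3')[0] #getting text of person2
say3=text.split('Person3')[1] #getting text of person3 (the name appears only once)

#converting to a one-row dataframe
pd.DataFrame({'Person1':say1,'Person2':say2,'Person3':say3},index=[1])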
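
The split calls above hard-code each speaker name and index. A minimal sketch of generalizing this to any list of names is shown below. It assumes each speaker heading sits alone on its own line, as in the question's sample. The function name extract_sections and the shortened sample text are hypothetical, introduced only for illustration.

```python
import re

def extract_sections(text, names):
    """Return {name: text found under that name's heading line}."""
    # Match a name only when it is alone on a line, so inline mentions
    # such as "what Person2 has to say" are not treated as headings.
    pattern = r"^[ \t]*(" + "|".join(re.escape(n) for n in names) + r")[ \t]*$"
    parts = re.split(pattern, text, flags=re.MULTILINE)
    # parts = [text before first heading, name, body, name, body, ...]
    sections = {}
    for name, body in zip(parts[1::2], parts[2::2]):
        # A speaker may appear more than once; join their turns.
        sections[name] = (sections.get(name, "") + "\n" + body.strip()).strip()
    return sections

# Shortened stand-in for the transcript in the question (an assumption).
sample = """Operator

I would now like to hand the conference to Person1. Please go ahead.

Person1

Hello everyone. Now let's see what Person2 has to say.

Person2

Thank you and hello everyone.

Person3

I have no further remarks."""

list1 = ["Person1", "Person2", "Person3"]
sections = extract_sections(sample, list1)
```

The resulting dict can then be turned into a one-row dataframe with pd.DataFrame([sections], columns=list1).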

Answer 2

from collections import defaultdict
import pandas as pd

# Assumes Data holds the whole transcript as one string and People is the
# list of speaker names (list1 in the question).
data_list = Data.split("\n")
People_Names = [name.strip() for name in People]

# Collect the lines that follow each speaker heading, keyed by speaker name.
data_dict = defaultdict(list)
key = None
for line in data_list:
    if line.strip() in People_Names:
        key = line.strip()            # a new speaker section starts here
    elif key is not None:
        data_dict[key].append(line)   # lines before the first listed speaker are skipped

# Join each speaker's lines back into one block of text.
df_dict = {}
for key,val in data_dict.items() :
    df_dict[key] = "\n".join(val).strip()

# DataFrame.append was removed in pandas 2.0, so build the single row directly.
df = pd.DataFrame([df_dict], columns = People_Names)
#print(df)
df.to_csv("People_Data.csv")
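
The line-by-line scan above can also be wrapped in a function and checked on a small input. The following is only a sketch under stated assumptions: the name group_lines_by_speaker and the short transcript are hypothetical, and blank lines are dropped rather than preserved.

```python
from collections import defaultdict

def group_lines_by_speaker(transcript, speakers):
    """Map each speaker to the text found under their heading lines."""
    wanted = {name.strip() for name in speakers}
    grouped = defaultdict(list)
    current = None  # no speaker until the first listed heading is seen
    for line in transcript.split("\n"):
        stripped = line.strip()
        if stripped in wanted:
            current = stripped                  # heading line: switch speaker
        elif current is not None and stripped:
            grouped[current].append(stripped)   # non-empty line of the current speaker
    return {name: "\n".join(lines) for name, lines in grouped.items()}

# Hypothetical mini-transcript in which Person1 speaks twice and Person3 never speaks.
transcript = (
    "Operator\n\nWelcome to the call.\n\n"
    "Person1\n\nHello everyone.\n\n"
    "Person2\n\nThank you.\n\n"
    "Person1\n\nOne more point."
)
result = group_lines_by_speaker(transcript, ["Person1", "Person2", "Person3"])
```

Here the Operator preamble is skipped, Person1's two turns are merged, and Person3 has no key. Passing the dict to pd.DataFrame([result], columns=[...]) leaves the Person3 column as NaN.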

2 answers

Answer 1
0 2020-02-21 18:47:38

Answer 2
0 accepted 2020-02-28 05:37:19
