How to match a text based on string from a list and extract the subsection in Python?
I am trying to generate a structure from an earnings call transcript similar to the following example:
"Operator
Ladies and gentlemen, thank you for standing by. And welcome to XYZ Fourth Quarter 2019 Earning Conference Call. At this time, all participants are in a listen-only mode. After the speaker presentation, there will be a question-and-answer session. [Operator Instructions] Please be advised that today’s conference is being recorded. [Operator Instructions]
I would now like to hand the conference to your speaker today,Person1, Head of Investor Relations. Please go ahead, ma’am**
Person1
Hello everyone, blablablablabla. Now let's see what Person2 has to say.
Person2
Thank you and hello everyone. Blablablabla
Person3
I have no further remarks....thank you once again"
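From this I generated a list, list1 = ['Person1','Person2','Person3']. I also created an empty dataframe whose column names are Person1, Person2, and Person3. I now need to extract the text under Person1, Person2, and Person3 based on the values in the list and populate the dataframe. Is that possible?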
text="""OperatorLadies and gentlemen, thank you for standing by. And welcome to XYZ Fourth Quarter 2019 Earning Conference Call. At this time, all participants are in a listen-only mode. After the speaker presentation, there will be a question-and-answer session. [Operator Instructions] Please be advised that today’s conference is being recorded. [Operator Instructions]I would now like to hand the conference to your speaker today,Person1, Head of Investor Relations. Please go ahead, ma’am**Person1Hello everyone, blablablablabla. Now let's see what Person2 has to say.Person2Thank you and hello everyone. BlablablablaPerson3I have no further remarks....thank you once again"""
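import pandas as pd

# Person1 and Person2 each occur twice in the text: once as a mention and once
# as the speaker label, so index [2] selects the text after the speaker label.
say1 = text.split('Person1')[2].rsplit('Person2', 1)[0]  # Person1's speech; rsplit cuts at the last 'Person2' so the in-speech mention is kept
say2 = text.split('Person2')[2].split('Person3')[0]  # Person2's speech
say3 = text.split('Person3')[1]  # Person3's speech (the name occurs only once)

# Convert to a one-row dataframe
df = pd.DataFrame({'Person1': say1, 'Person2': say2, 'Person3': say3}, index=[1])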
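The split calls above hard-code each name and the index of its occurrence. A more general sketch is possible if the transcript keeps every speaker name on its own line, as in the sample from the question (an assumption; it does not hold for the single-line text above). The function name split_by_speaker and the shortened sample string are hypothetical, introduced only for illustration.

```python
import re

def split_by_speaker(text, names):
    """Map each name in `names` to the text under its speaker line."""
    # Match a name only when it fills a whole line, so in-speech mentions
    # such as "what Person2 has to say" are not treated as section starts.
    alternatives = "|".join(re.escape(n) for n in names)
    pattern = r"^[ \t]*(" + alternatives + r")[ \t]*$"
    # With a capturing group, re.split returns
    # [preamble, name, body, name, body, ...]
    parts = re.split(pattern, text, flags=re.MULTILINE)
    sections = {}
    for name, body in zip(parts[1::2], parts[2::2]):
        sections[name] = body.strip()  # a repeated name overwrites its earlier entry
    return sections

# Shortened, hypothetical stand-in for the transcript in the question
sample = """Operator
Welcome. I would now like to hand the conference to Person1.
Person1
Hello everyone. Now let's see what Person2 has to say.
Person2
Thank you and hello everyone.
Person3
I have no further remarks."""

list1 = ['Person1', 'Person2', 'Person3']
sections = split_by_speaker(sample, list1)
print(sections['Person1'])
```

The resulting dictionary can then be turned into a one-row dataframe with pd.DataFrame([sections], columns=list1).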
from collections import defaultdict

import pandas as pd

# Data (the transcript, with each speaker name on its own line) and
# People (the list of speaker names) are assumed to be defined beforehand.
data_list = Data.split("\n")
People_Names = [name.strip() for name in People]
data_dict = defaultdict(list)

key = None  # current speaker; stays None until the first name line is seen
for line in data_list:
    if line.strip() in People_Names:
        key = line.strip()  # a line holding only a name starts a new section
    elif key is not None:
        data_dict[key].append(line)  # any other line belongs to the current speaker

# Join each speaker's lines into a single string per column
df_dict = {key: "\n".join(val) for key, val in data_dict.items()}

# DataFrame.append was removed in pandas 2.0, so build the single row directly
df = pd.DataFrame([df_dict], columns=People_Names)
# print(df)
df.to_csv("People_Data.csv")
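The loop above collects everything a speaker says under a single key. If a speaker talks more than once and the order of the turns matters, a list of turns may be easier to work with. The following is a minimal sketch under the same one-name-per-line assumption; speaker_turns and the sample string are hypothetical names introduced here, not part of the answer.

```python
def speaker_turns(text, names):
    """Return [(speaker, text), ...] in the order the speakers appear."""
    wanted = {n.strip() for n in names}
    turns = []
    for line in text.split("\n"):
        stripped = line.strip()
        if stripped in wanted:
            turns.append([stripped, []])  # a name line opens a new turn
        elif turns:
            turns[-1][1].append(line)  # other lines extend the open turn
        # lines before the first name (e.g. the operator) are skipped
    return [(name, "\n".join(lines).strip()) for name, lines in turns]

# Hypothetical transcript in which Person1 speaks twice
sample = """Operator
Welcome to the call.
Person1
Hello everyone.
Person2
Thank you.
Person1
One more remark."""

turns = speaker_turns(sample, ['Person1', 'Person2'])
for name, speech in turns:
    print(name, '->', speech)
```

Such a list can be loaded in long format with pd.DataFrame(turns, columns=['speaker', 'text']), one row per turn.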