Splitting Bot Records from Chatter Records
I have raw chatbot transcripts and, before doing any sentiment analysis, I would like to separate the Bot records from the Chatter records.
The data is already in a DataFrame and looks like the following:
Conversation_ID | Transcript
abcdef | BOT: Some text. CHATTER: Some text. BOT: Some text. BOT: Some text. CHATTER: Some text. BOT: Some text. BOT: Some text.
The result should look like:
Conversation_ID | Transcript_BOT | Transcript_CHATTER
abcdef | Some text. Some text. Some text. Some text. Some text. | Some text. Some text.
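If I understand correctly, you can split each entry on the colon:
import pandas as pd

df = pd.read_clipboard(sep='|')
# Split each entry on ':'; the part after the colon becomes the transcript text.
df['Transcript'] = df['Conversation_ID'].str.split(':', expand=True)[1]
# The part before the colon (the ID or the speaker label) stays in Conversation_ID.
df['Conversation_ID'] = df['Conversation_ID'].str.split(':', expand=True)[0]
print(df)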
Conversation_ID Transcript
0 abcdef None
1 BOT Some text.
2 CHATTER Some text.
3 BOT Some text.
4 BOT Some text.
5 CHATTER Some text.
6 BOT Some text.
7 BOT Some text.
And for your intended result:
new_df = pd.crosstab(
    df['Conversation_ID'].iloc[0],
    df['Conversation_ID'].iloc[1:],
    values=df['Transcript'],
    aggfunc='sum',
).rename_axis('')
print(new_df)
Conversation_ID BOT CHATTER
abcdef Some text. Some text. Some text. Some text. S... Some text. Some text.
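If the whole conversation really sits in a single Transcript cell per Conversation_ID, as in the question's sample, a regex-based split may be closer to what is needed. The following is only a minimal sketch under some assumptions: the speaker labels are exactly "BOT:" and "CHATTER:", those labels never occur inside the message text itself, and the sample frame is rebuilt by hand from the question. The names TURN and split_speakers are made up for this illustration.

```python
import re
import pandas as pd

# Sample frame rebuilt from the question (assumed layout: one row per conversation).
df = pd.DataFrame({
    "Conversation_ID": ["abcdef"],
    "Transcript": [
        "BOT: Some text. CHATTER: Some text. BOT: Some text. BOT: Some text. "
        "CHATTER: Some text. BOT: Some text. BOT: Some text."
    ],
})

# Capture a speaker label and its text, stopping at the next label or the end of the string.
# Assumption: "BOT:" and "CHATTER:" never appear inside a message.
TURN = re.compile(r"(BOT|CHATTER):\s*(.*?)(?=\s*(?:BOT|CHATTER):|$)", flags=re.S)

def split_speakers(transcript):
    """Return the concatenated BOT text and CHATTER text for one transcript."""
    parts = {"BOT": [], "CHATTER": []}
    for speaker, text in TURN.findall(transcript):
        parts[speaker].append(text.strip())
    return pd.Series({
        "Transcript_BOT": " ".join(parts["BOT"]),
        "Transcript_CHATTER": " ".join(parts["CHATTER"]),
    })

# Apply per row and attach the two new columns next to the conversation ID.
result = pd.concat(
    [df[["Conversation_ID"]], df["Transcript"].apply(split_speakers)],
    axis=1,
)
print(result)
```

This keeps one row per conversation, so the output can be joined back to other per-conversation data by Conversation_ID. If messages can contain the literal labels, the pattern would need a stricter delimiter.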