I have two pandas dataframes with identical columns. Each has a timestamp column. One dataframe holds text data from user A and the other holds text data from user B. When user A is speaking, user B is not, so the data never overlaps. I want to merge them into one dataframe ordered by timestamp.
df_a
start  stop  words
0      2.1   i know honey but what happened we got a job
3.7    6.4   no know but thats a different kind of help but
8.2    11.5  because people that are supposed to be
12.9   15.4  yeah but where else can you go to get one

df_b
start  stop  words
2.2    3.6   but he never said
6.5    8.2   but what?
11.6   12.8  i dont think thats true
15.5   19.2  anywhere i dont know

desired_output
start  stop  words
0      2.1   i know honey but what happened we got a job
2.2    3.6   but he never said
3.7    6.4   no know but thats a different kind of help but
6.5    8.2   but what?
8.2    11.5  because people that are supposed to be
11.6   12.8  i dont think thats true
12.9   15.4  yeah but where else can you go to get one
15.5   19.2  anywhere i dont know
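This should do it (note that DataFrame.append was removed in pandas 2.0, so this line only runs on older versions):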
df = df_a.append(df_b).sort_values(by=['start'])
I would use pd.concat, given that the operation feels more like concatenating than joining:
output = pd.concat([df_a,df_b]).sort_values(['start'])
print(output)
   start  stop  words
0    0.0   2.1  i know honey but what happened we got a job
0    2.2   3.6  but he never said
1    3.7   6.4  no know but thats a different kind of help but
1    6.5   8.2  but what?
2    8.2  11.5  because people that are supposed to be
2   11.6  12.8  i dont think thats true
3   12.9  15.4  yeah but where else can you go to get one
3   15.5  19.2  anywhere i dont know
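Note that the index values above repeat (0, 0, 1, 1, ...) because each row keeps the label it had in its original dataframe. If a clean 0..n-1 index is wanted, something like the following self-contained sketch should work. The dataframes are rebuilt here from the sample data in the question, and the name merged is just an illustrative choice, not something from the original posts.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed dtypes: float, float, str).
df_a = pd.DataFrame({
    "start": [0.0, 3.7, 8.2, 12.9],
    "stop": [2.1, 6.4, 11.5, 15.4],
    "words": [
        "i know honey but what happened we got a job",
        "no know but thats a different kind of help but",
        "because people that are supposed to be",
        "yeah but where else can you go to get one",
    ],
})
df_b = pd.DataFrame({
    "start": [2.2, 6.5, 11.6, 15.5],
    "stop": [3.6, 8.2, 12.8, 19.2],
    "words": [
        "but he never said",
        "but what?",
        "i dont think thats true",
        "anywhere i dont know",
    ],
})

# Stack the two frames, order by start time, then discard the old
# per-frame index labels so the result is numbered 0..n-1.
merged = (
    pd.concat([df_a, df_b])
    .sort_values("start")
    .reset_index(drop=True)
)
print(merged)
```

If it matters later which user said what, adding a speaker column to each dataframe before concatenating is one way to keep that information.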