Merging rows in a dataframe depending on another column
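I have extracted a PDF into a dataframe and would like to merge the text in Column B whenever consecutive rows belong to the same speaker (Column C):
From: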
Index Column B Column C
1 'I am going' Speaker A
2 'to the zoo' Speaker A
3 'I am going' Speaker B
4 'home ' Speaker B
5 'I am going' Speaker A
6 'to the park' Speaker A
To:
Index Column B Column C
1 'I am going to the zoo ' Speaker A
2 'I am going home' Speaker B
3 'I am going to the park' Speaker A
I tried using groupby, but the order matters in the context of the PDF (i.e. the speech).
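A minimal sketch of why a plain groupby falls short, assuming the sample data is loaded without the surrounding quotes: grouping by the value in Column C collapses every Speaker A row into one group, no matter where those rows appear.

```python
import pandas as pd

# Sample data from the question (quotes dropped from the strings)
df = pd.DataFrame({
    'Column B': ['I am going', 'to the zoo', 'I am going', 'home',
                 'I am going', 'to the park'],
    'Column C': ['Speaker A', 'Speaker A', 'Speaker B', 'Speaker B',
                 'Speaker A', 'Speaker A'],
})

# Grouping by the speaker *value* merges Speaker A's two separate turns
naive = df.groupby('Column C')['Column B'].agg(' '.join)
print(naive)
```

Speaker A's two turns end up joined into a single string, which is exactly the loss of ordering described above.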
After creating a series that identifies where Column C changes, you can use GroupBy + agg:
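# key increases by 1 every time the speaker differs from the previous row
res = (df.assign(key=df['Column C'].ne(df['Column C'].shift()).cumsum())
         .groupby('key')
         .agg({'Column C': 'first', 'Column B': ' '.join})  # keep the speaker, join the text
         .reset_index())
print(res)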
key Column C Column B
0 1 Speaker A 'I am going' 'to the zoo'
1 2 Speaker B 'I am going' 'home '
2 3 Speaker A 'I am going' 'to the park'
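To make the key step visible, here is a sketch that builds the run identifier separately, again assuming the question's data without quotes. The name `turn` for the key is only an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    'Column B': ['I am going', 'to the zoo', 'I am going', 'home',
                 'I am going', 'to the park'],
    'Column C': ['Speaker A', 'Speaker A', 'Speaker B', 'Speaker B',
                 'Speaker A', 'Speaker A'],
})

speaker = df['Column C']
# True wherever the speaker differs from the previous row; the first row
# is compared against NaN from shift(), so it is always True
changed = speaker.ne(speaker.shift())
# The running total of the True values gives one id per consecutive run
key = changed.cumsum().rename('turn')
print(key.tolist())  # [1, 1, 2, 2, 3, 3]

res = (df.groupby(key)
         .agg({'Column C': 'first', 'Column B': ' '.join})
         .reset_index(drop=True))
print(res)
```

Because the id only increases when the speaker changes, Speaker A's first and second turns get different ids (1 and 3) and stay separate.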
Note that, given the input you provided, the output contains quotes. They will not appear if the strings are defined without quotes.
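If the PDF extraction really left literal quote characters and stray spaces in the strings (an assumption based on values like `'home '` in the sample), one option is to strip them with `str.strip` before joining:

```python
import pandas as pd

# Assumption: the extracted strings contain literal quotes and extra spaces
df = pd.DataFrame({
    'Column B': ["'I am going'", "'to the zoo'", "'I am going'", "'home '",
                 "'I am going'", "'to the park'"],
    'Column C': ['Speaker A', 'Speaker A', 'Speaker B', 'Speaker B',
                 'Speaker A', 'Speaker A'],
})

# Remove quote characters and spaces from both ends of each string
df['Column B'] = df['Column B'].str.strip("' ")

key = df['Column C'].ne(df['Column C'].shift()).cumsum().rename('turn')
res = df.groupby(key).agg({'Column C': 'first', 'Column B': ' '.join})
print(res['Column B'].tolist())
```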
Use groupby and agg, as follows:
import pandas as pd
from functools import reduce
data = {'col1': [1, 1, 2, 2, 3], 'col2': ['foo', 'bar', 'baz', 'bag', 'bat']}
df = pd.DataFrame(data)
print(df)
# Concatenate the col2 strings within each col1 group, in row order
aggregated = df.groupby('col1').agg(lambda x: reduce(lambda s1, s2: s1 + s2, x))
print(aggregated)
This produces the following output:
col1 col2
0 1 foo
1 1 bar
2 2 baz
3 2 bag
4 3 bat
col2
col1
1 foobar
2 bazbag
3 bat
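Note that `groupby('col1')` groups by value, so it only matches the question's requirement when equal values are already adjacent. As an alternative technique (not part of either answer above), the standard library's `itertools.groupby` starts a new group every time the key changes, which corresponds directly to "consecutive rows by the same speaker". A sketch on the question's data, assuming the strings have no quotes:

```python
from itertools import groupby

import pandas as pd

df = pd.DataFrame({
    'Column B': ['I am going', 'to the zoo', 'I am going', 'home',
                 'I am going', 'to the park'],
    'Column C': ['Speaker A', 'Speaker A', 'Speaker B', 'Speaker B',
                 'Speaker A', 'Speaker A'],
})

# itertools.groupby only merges *adjacent* items with equal keys, so
# Speaker A's two separate turns remain separate rows
rows = [
    {'Column B': ' '.join(text for text, _ in run), 'Column C': speaker}
    for speaker, run in groupby(zip(df['Column B'], df['Column C']),
                                key=lambda pair: pair[1])
]
merged = pd.DataFrame(rows)
print(merged)
```

This works row by row in Python, so on very large frames the vectorized `shift`/`cumsum` key is likely to be faster.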