Create new pandas rows by combining text values from different rows that share the same value in another column
I want to create a new pandas dataframe by concatenating text values that have the same value in another column. For example, I have the following dataframe:
import pandas as pd

example_dct = {
"text": {
"0": "this is my text 1",
"1": "this is my text 2",
"2": "this is my text 3",
"3": "this is my text 4",
"4": "this is my text 5"
},
"article_id": {
"0": "#0001_01_xml",
"1": "#0001_01_xml",
"2": "#0001_02_xml",
"3": "#0001_03_xml",
"4": "#0001_03_xml"
}
}
df_example = pd.DataFrame.from_dict(example_dct)
print(df_example)
                text    article_id
0  this is my text 1  #0001_01_xml
1  this is my text 2  #0001_01_xml
2  this is my text 3  #0001_02_xml
3  this is my text 4  #0001_03_xml
4  this is my text 5  #0001_03_xml
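I want to concatenate the texts in the following way: text1 + '***' + text2
So in this case the rows at idx 0, 1 should be concatenated, and likewise the rows at idx 3, 4.
The resulting dataframe would therefore be: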
   text                                       article_id
0  'this is my text 1 *** this is my text 2'  #0001_01_xml
1  'this is my text 4 *** this is my text 5'  #0001_03_xml
If more than 2 text values share the same id value, for example:
example_dct = {
"text": {
"0": "this is my text 1",
"1": "this is my text 2",
"2": "this is my text 3",
"3": "this is my text 4",
"4": "this is my text 5",
"5": "this is my text 6",
},
"article_id": {
"0": "#0001_01_xml",
"1": "#0001_01_xml",
"2": "#0001_02_xml",
"3": "#0001_03_xml",
"4": "#0001_03_xml",
"5": "#0001_03_xml",
}
}
then the output dataframe should be the result of pairwise (1 x 1) text concatenation:
   text                                       article_id
0  'this is my text 1 *** this is my text 2'  #0001_01_xml
1  'this is my text 4 *** this is my text 5'  #0001_03_xml
2  'this is my text 4 *** this is my text 6'  #0001_03_xml
3  'this is my text 5 *** this is my text 6'  #0001_03_xml
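I have been trying some groupby queries that join all texts with the same column value, i.e. df.groupby('article_id', sort=False)['text'].apply('***'.join)
That creates only one row per article_id, but I want one row per pair, as described above.
Any ideas on how to approach this?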
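For reference, here is a minimal sketch of what the attempt above produces on the first example. The list-based construction of df_example and the variable name joined are assumptions made for brevity; the groupby call itself is the one quoted in the question.

```python
import pandas as pd

# First example from the question, rebuilt from plain lists for brevity
df_example = pd.DataFrame({
    "text": [f"this is my text {i}" for i in range(1, 6)],
    "article_id": ["#0001_01_xml", "#0001_01_xml", "#0001_02_xml",
                   "#0001_03_xml", "#0001_03_xml"],
})

# The attempt from the question: join every text of a group into one string
joined = df_example.groupby("article_id", sort=False)["text"].apply("***".join)
print(joined)
```

Each article_id collapses into a single row, including #0001_02_xml, which has only one text and so has no pair to form. With three or more texts per id, all of them would end up in one string rather than in separate pairwise rows.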
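Use DataFrame.groupby on article_id and apply a custom lambda function that generates all possible string combinations of length=2 in the text column. Finally, use Series.explode and Series.dropna to put each combination on its own row and to drop the groups that produced no pair.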
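from itertools import combinations
# Join every 2-element combination of a group's texts with ' *** '
f = lambda g: [*map(' *** '.join, combinations(g['text'], r=2))]
# df is the question's dataframe; single-row groups give [], which explode() turns into NaN
df = df.groupby('article_id').apply(f).explode().dropna().reset_index(name='text')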
Result:
# example 1
     article_id                                     text
0  #0001_01_xml  this is my text 1 *** this is my text 2
1  #0001_03_xml  this is my text 4 *** this is my text 5
# example 2
     article_id                                     text
0  #0001_01_xml  this is my text 1 *** this is my text 2
1  #0001_03_xml  this is my text 4 *** this is my text 5
2  #0001_03_xml  this is my text 4 *** this is my text 6
3  #0001_03_xml  this is my text 5 *** this is my text 6
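The same idea can also be written as an explicit loop over the groups, which some may find easier to read and adapt. This is only a sketch under assumptions: the helper name pairwise_concat and its parameters key, col and sep are hypothetical, and the input is rebuilt from plain lists to match the second example.

```python
from itertools import combinations

import pandas as pd


def pairwise_concat(df, key="article_id", col="text", sep=" *** "):
    """Return one row per unordered pair of `col` values sharing the same `key`."""
    rows = []
    # sort=False keeps the groups in order of first appearance
    for key_value, texts in df.groupby(key, sort=False)[col]:
        # combinations() yields nothing for a single-row group,
        # so such groups simply contribute no rows
        for left, right in combinations(texts, 2):
            rows.append({key: key_value, col: sep.join((left, right))})
    return pd.DataFrame(rows, columns=[key, col])


# Second example from the question (six rows, three of them sharing one id)
df_example = pd.DataFrame({
    "text": [f"this is my text {i}" for i in range(1, 7)],
    "article_id": ["#0001_01_xml", "#0001_01_xml", "#0001_02_xml",
                   "#0001_03_xml", "#0001_03_xml", "#0001_03_xml"],
})

result = pairwise_concat(df_example)
print(result)
```

A group with n rows contributes n * (n - 1) / 2 rows, so the output grows quadratically with group size; that is worth keeping in mind if some article_id values have many texts.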