簡體   English   中英

Groupby多行並在熊貓中保留兩列

[英]Groupby multiple row and keep two columns in pandas

我有格式name, actor_link, movie_link的數據框,我想通過演員鏈接將其分組到格式name, actor_link, list_of_movies_where_actor_appears

示例輸入

Tom Hanks,https://www.csfd.cz/tvurce/330-tom-hanks/,https://www.csfd.cz/film/10135-forrest-gump/
Tom Hanks,https://www.csfd.cz/tvurce/330-tom-hanks/,https://www.csfd.cz/film/2292-zelena-mile/

示例輸出

Tom Hanks,https://www.csfd.cz/tvurce/330-tom-hanks/,[https://www.csfd.cz/film/10135-forrest-gump/,https://www.csfd.cz/film/2292-zelena-mile/]

我的代碼給出了輸出

Tom Hanks,https://www.csfd.cz/tvurce/330-tom-hanks/,"['Tom Hanks', 'https://www.csfd.cz/film/10135-forrest-gump/', 'Tom Hanks', 'https://www.csfd.cz/film/2292-zelena-mile/']

代碼

original_actor_df = pd.read_csv('actors.csv')
actor_movies_df = original_actor_df.set_index("actor_link").stack().groupby(level=0).apply(list).reset_index(name="movies")
original_actor_df.drop(['movie_link'], axis=1, inplace=True)
original_actor_df.drop_duplicates(inplace=True)
actor_df = pd.merge(original_actor_df, actor_movies_df, on="actor_link")

如何擺脫帶有電影鏈接的列中的演員姓名? 我可以在沒有那么多步驟的情況下使用數據框簡化這個過程嗎? 我認為可以通過調整

original_actor_df.set_index("actor_link").stack().groupby(level=0).apply(list).reset_index(name="movies")

我試過了,我失敗了……

嘿,捷克程序員! <3 這可能會完成工作,我認為......

import pandas as pd

df = pd.read_csv('MOCK_DATA.csv')

# group by actor_link and count the number of movies
g = df.groupby(['actor','actor_link'])['movie'].apply(list).reset_index(name='list_of_movies')
# output res to a csv file
g.to_csv('actor_movies.csv', header=False, index=False)

對此文件進行了測試:

actor,actor_link,movie
Yardley McGilroy,www.csfd/Yardley McGilroy.cz,Sharknado 2: The Second One
Woodman Meese,www.csfd/Woodman Meese.cz,The Amazing Catfish
Marijo Thorburn,www.csfd/Marijo Thorburn.cz,"Soldier, The"
Arron Rosenfield,www.csfd/Arron Rosenfield.cz,Barcelona
Olga Wainscot,www.csfd/Olga Wainscot.cz,Addicted
Bradan Ivashev,www.csfd/Bradan Ivashev.cz,Fury
Lu Manjin,www.csfd/Lu Manjin.cz,"Sea Inside, The (Mar adentro)"
Shelby Kitt,www.csfd/Shelby Kitt.cz,"Set-Up, The"
Barbaraanne Yakushkin,www.csfd/Barbaraanne Yakushkin.cz,"Princess Blade, The (Shura Yukihime)"
Abbey Munkton,www.csfd/Abbey Munkton.cz,Twin Peaks: Fire Walk with Me
Vittoria Clayal,www.csfd/Vittoria Clayal.cz,"Cider House Rules, The"
Vickie Ormrod,www.csfd/Vickie Ormrod.cz,Hustler White
Artemas Solomonides,www.csfd/Artemas Solomonides.cz,Satyricon
Lucita Whittick,www.csfd/Lucita Whittick.cz,"Exterminator, The"
Cullen Kear,www.csfd/Cullen Kear.cz,Non-Stop
Tine Slaney,www.csfd/Tine Slaney.cz,Measuring the World (Die Vermessung der Welt)
Allister Caulcott,www.csfd/Allister Caulcott.cz,Beat the Devil
Dannie Sheara,www.csfd/Dannie Sheara.cz,Casanova's Big Night
Montague Casetti,www.csfd/Montague Casetti.cz,Someone Like Him (Einer wie Bruno)
Dara French,www.csfd/Dara French.cz,Gozu (Gokudô kyôfu dai-gekijô: Gozu)
Debby Winterson,www.csfd/Debby Winterson.cz,Dr. Jekyll and Mr. Hyde
Phaedra Eneas,www.csfd/Phaedra Eneas.cz,Edges of the Lord
Obidiah Bastiman,www.csfd/Obidiah Bastiman.cz,Rendezvous
Lindy Lilbourne,www.csfd/Lindy Lilbourne.cz,Night Moves
Devon Obert,www.csfd/Devon Obert.cz,Rabbit Without Ears 2 (Zweiohrküken)
Conrade Urrey,www.csfd/Conrade Urrey.cz,Heavens Fall
Jaine Chasson,www.csfd/Jaine Chasson.cz,Jimi Hendrix: Hear My Train A Comin'
Filberte Southerton,www.csfd/Filberte Southerton.cz,"Capture of Bigfoot, The"
Ulric Hargitt,www.csfd/Ulric Hargitt.cz,Pilgrimage
Gal Pavia,www.csfd/Gal Pavia.cz,Big Hero 6
Niels Dannell,www.csfd/Niels Dannell.cz,More Than a Game
Kari Jobe,www.csfd/Kari Jobe.cz,Jimi: All Is by My Side
Tonia Hatton,www.csfd/Tonia Hatton.cz,Star Trek: Nemesis
Rozele Kaas,www.csfd/Rozele Kaas.cz,Captive (Cautiva) 
Urson Bourdel,www.csfd/Urson Bourdel.cz,RKO 281
Teddi Mohammed,www.csfd/Teddi Mohammed.cz,Canvas
Blair Mosedale,www.csfd/Blair Mosedale.cz,Escape from Fort Bravo
Cleon Sloley,www.csfd/Cleon Sloley.cz,Survival Quest
Yasmin Snap,www.csfd/Yasmin Snap.cz,Strictly Sexual
Audy Rubinfeld,www.csfd/Audy Rubinfeld.cz,Queen of Montreuil
Shepperd Matusiak,www.csfd/Shepperd Matusiak.cz,"Dark Side of the Heart, The (Lado oscuro del corazón, El)"
Storm Harrowing,www.csfd/Storm Harrowing.cz,"Artist, The"
Vlad Geare,www.csfd/Vlad Geare.cz,Pleasure at Her Majesty's
Stacey Kiff,www.csfd/Stacey Kiff.cz,Marilyn in Manhattan
Darla Dongall,www.csfd/Darla Dongall.cz,Hometown Legend
Nathan Lythgoe,www.csfd/Nathan Lythgoe.cz,Tales of Terror
Krishnah Bernet,www.csfd/Krishnah Bernet.cz,Circus of Horrors
Elnore Haggett,www.csfd/Elnore Haggett.cz,"Thing About My Folks, The"

無法擺脫引號,因為這會破壞 csv 解析,希望這不是太大的問題......

我認為您可以使用 group by 聚合獲得所需的輸出,如下所示。

import pandas as pd
import io   

string = """name,actor_link,movie_link
Tom Hanks,https://www.csfd.cz/tvurce/330-tom-hanks/,https://www.csfd.cz/film/10135-forrest-gump/
Tom Hanks,https://www.csfd.cz/tvurce/330-tom-hanks/,https://www.csfd.cz/film/2292-zelena-mile/"""

df = pd.read_csv(io.StringIO(string), sep=",")
df = df.groupby(['name','actor_link']).agg(list_of_movies=('movie_link',list)).reset_index()

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM