How can I create a pandas data frame in a certain way?
I need to create a pandas dataframe that contains all of the required information, where each row of the dataframe should be one track. I also need to sort the dataframe by popularity score, so that the most popular track is at the top and the least popular is at the bottom. I tried many ways but they did not work. Your help is much appreciated.
I am sharing my nested dictionary.
{'Artist name': ['Paramore', 'Weezer', 'Lizzo'],
'Track name': (['Still into You',
"Ain't It Fun",
'Hard Times',
'Misery Business',
'The Only Exception',
'Ignorance',
'Rose-Colored Boy',
'Fake Happy',
"That's What You Get",
'Brick by Boring Brick'],
['Island In The Sun',
"Say It Ain't So",
'Buddy Holly',
'Beverly Hills',
'Africa',
'The End of the Game',
'Hash Pipe',
'Undone - The Sweater Song',
'My Name Is Jonas',
'Take On Me'],
['Truth Hurts',
'Good As Hell',
'Good As Hell (feat. Ariana Grande) - Remix',
'Juice',
'Boys',
'Tempo (feat. Missy Elliott)',
'Blame It on Your Love (feat. Lizzo)',
'Soulmate',
'Water Me',
'Like A Girl']),
'Release date': (['2013-04-05',
'2013-04-05',
'2017-05-12',
'2007-06-11',
'2009-09-28',
'2009-09-28',
'2017-05-12',
'2017-05-12',
'2007-06-11',
'2009-09-28'],
['2001-05-15',
'1994-05-10',
'1994-05-10',
'2005-05-10',
'2019-01-24',
'2019-09-10',
'2001-05-15',
'1994-05-10',
'1994-05-10',
'2019-01-24'],
['2019-05-03',
'2016-03-09',
'2019-10-25',
'2019-04-19',
'2019-04-18',
'2019-04-19',
'2019-09-13',
'2019-04-19',
'2019-04-18',
'2019-04-19']),
'Popularity score': ([76, 74, 73, 73, 72, 69, 66, 66, 65, 65],
[77, 75, 73, 71, 67, 67, 66, 65, 63, 62],
[94, 90, 86, 84, 72, 78, 68, 72, 58, 71])}
There are definitely more efficient ways, but here's a solution:
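import pandas as pd

def gen_artist_frame(d):
    # The first key holds the artist names; every other key holds one list per artist.
    categories = list(d.keys())
    for idx, artist in enumerate(d['Artist name']):
        # Collect this artist's track names, release dates and popularity scores.
        artist_mat = [d[j][idx] for j in categories[1:]]
        # One row per category, transposed so that each row is one track.
        artist_frame = pd.DataFrame(artist_mat, index=categories[1:]).T
        artist_frame[categories[0]] = artist
        yield artist_frame

def collapse_nested_artist(d):
    # Stack the per-artist frames into a single frame.
    return pd.concat(list(gen_artist_frame(d)))
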
d = {'Artist name': ['Paramore', 'Weezer', 'Lizzo'],
'Track name': (['Still into You',
"Ain't It Fun",
'Hard Times',
'Misery Business',
'The Only Exception',
'Ignorance',
'Rose-Colored Boy',
'Fake Happy',
"That's What You Get",
'Brick by Boring Brick'],
['Island In The Sun',
"Say It Ain't So",
'Buddy Holly',
'Beverly Hills',
'Africa',
'The End of the Game',
'Hash Pipe',
'Undone - The Sweater Song',
'My Name Is Jonas',
'Take On Me'],
['Truth Hurts',
'Good As Hell',
'Good As Hell (feat. Ariana Grande) - Remix',
'Juice',
'Boys',
'Tempo (feat. Missy Elliott)',
'Blame It on Your Love (feat. Lizzo)',
'Soulmate',
'Water Me',
'Like A Girl']),
'Release date': (['2013-04-05',
'2013-04-05',
'2017-05-12',
'2007-06-11',
'2009-09-28',
'2009-09-28',
'2017-05-12',
'2017-05-12',
'2007-06-11',
'2009-09-28'],
['2001-05-15',
'1994-05-10',
'1994-05-10',
'2005-05-10',
'2019-01-24',
'2019-09-10',
'2001-05-15',
'1994-05-10',
'1994-05-10',
'2019-01-24'],
['2019-05-03',
'2016-03-09',
'2019-10-25',
'2019-04-19',
'2019-04-18',
'2019-04-19',
'2019-09-13',
'2019-04-19',
'2019-04-18',
'2019-04-19']),
'Popularity score': ([76, 74, 73, 73, 72, 69, 66, 66, 65, 65],
[77, 75, 73, 71, 67, 67, 66, 65, 63, 62],
[94, 90, 86, 84, 72, 78, 68, 72, 58, 71])}
frame = collapse_nested_artist(d)
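The question also asks for the rows to be ordered by popularity, which the code above does not do. A minimal sketch of one way to build the frame and sort it in one go, assuming the same dictionary layout as in the question. The helper name `flatten_tracks` and the trimmed `sample` dictionary (two artists, three tracks each, taken from the data above) are only illustrative:

```python
import pandas as pd

# Trimmed sample with the same shape as the dictionary in the question,
# used only to keep the sketch short.
sample = {
    'Artist name': ['Paramore', 'Lizzo'],
    'Track name': (['Still into You', "Ain't It Fun", 'Hard Times'],
                   ['Truth Hurts', 'Good As Hell',
                    'Good As Hell (feat. Ariana Grande) - Remix']),
    'Release date': (['2013-04-05', '2013-04-05', '2017-05-12'],
                     ['2019-05-03', '2016-03-09', '2019-10-25']),
    'Popularity score': ([76, 74, 73], [94, 90, 86]),
}

def flatten_tracks(d):
    """Build one row per track, pairing each artist with its lists by position."""
    rows = []
    for idx, artist in enumerate(d['Artist name']):
        per_track = zip(d['Track name'][idx],
                        d['Release date'][idx],
                        d['Popularity score'][idx])
        for name, date, score in per_track:
            rows.append({'Artist name': artist, 'Track name': name,
                         'Release date': date, 'Popularity score': score})
    return pd.DataFrame(rows)

# Most popular track first; reset_index gives a clean 0..n-1 index after sorting.
tracks = (flatten_tracks(sample)
          .sort_values('Popularity score', ascending=False)
          .reset_index(drop=True))
print(tracks)
```

If you keep the `frame` built by `collapse_nested_artist(d)`, the same `sort_values('Popularity score', ascending=False)` call followed by `reset_index(drop=True)` should give the ordering you asked for.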
Dictionaries are easier to turn into dataframes when all the values in the key-value pairs are the same size. If possible, I would reformat your dictionary slightly. For example, nest each column into the artist to avoid assumptions about positions:
import pandas as pd

ex = {'foo': {'title': [1, 2], 'letter': ['a', 'b']},
      'bar': {'title': [3, 4], 'letter': ['c', 'd']},
      'fob': {'title': [5, 6], 'letter': ['e', 'f']},
      }
df = []
for key, value in ex.items():
    # Each inner dict becomes a small frame, tagged with its outer key.
    minidf = pd.DataFrame(value)
    minidf['label'] = key
    df.append(minidf)
pd.concat(df, ignore_index=True)
will return
   title letter label
0      1      a   foo
1      2      b   foo
2      3      c   bar
3      4      d   bar
4      5      e   fob
5      6      f   fob
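Applied to the track data from the question, the same layout could look roughly like the sketch below. The name `by_artist` and the trimmed sample (two artists, two tracks each) are my own choices for illustration, not something from the original dictionary:

```python
import pandas as pd

# Each artist maps to its own columns, so no positional matching is needed.
by_artist = {
    'Paramore': {'Track name': ['Still into You', "Ain't It Fun"],
                 'Release date': ['2013-04-05', '2013-04-05'],
                 'Popularity score': [76, 74]},
    'Weezer': {'Track name': ['Island In The Sun', "Say It Ain't So"],
               'Release date': ['2001-05-15', '1994-05-10'],
               'Popularity score': [77, 75]},
}

frames = []
for artist, columns in by_artist.items():
    mini = pd.DataFrame(columns)      # one row per track for this artist
    mini['Artist name'] = artist
    frames.append(mini)

# Combine all artists, then put the most popular track at the top.
tracks = (pd.concat(frames, ignore_index=True)
          .sort_values('Popularity score', ascending=False)
          .reset_index(drop=True))
print(tracks)
```

Sorting after the concat keeps the per-artist loop simple, and `reset_index(drop=True)` just renumbers the rows once they are in their final order.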