How do I write a function that takes one row and returns a list of 2-tuples?
So I am working on this dataset.
I want to take one row and return a list of 2-tuples. For example, for row 0, it returns: [('Action', 7.9), ('Adventure', 7.9), ('Fantasy', 7.9), ('Sci-Fi', 7.9)], so that every genre of the movie is paired with the same IMDb score.
This is for a school project and I can't think of a way to do it. Can anyone help me?
I'm sorry for the lack of details in this question; I will try to lay out all the details now.
The dataset is movie_metadata.csv. I can't seem to attach the file here.
Once I have the function, I am supposed to apply it to all the rows until I have one list containing all the 2-tuples. Then I have to convert the list of tuples into a DataFrame.
Ideally, I want to create a new dataset named 'genre_score' that has two columns: genre and imdb_score. Each row will have only one genre and the IMDb rating of a movie from that genre. Then I have to calculate the mean IMDb rating per genre and make the following graph.
I can probably figure out everything else except the function. Writing the function is the struggle for me.
Use a list comprehension that flattens the values split by |:
import pandas as pd

df = pd.DataFrame({'genres':['Action|Adventure|Fantasy|Sci-Fi','Action|Adventure|Fantasy'],
                   'imdb_score':[7.9,7.1]})
print (df)
genres imdb_score
0 Action|Adventure|Fantasy|Sci-Fi 7.9
1 Action|Adventure|Fantasy 7.1
row = 0
L = [(x, i) for g,i in df.loc[[row], ['genres','imdb_score']].values for x in g.split('|')]
print (L)
[('Action', 7.9), ('Adventure', 7.9), ('Fantasy', 7.9), ('Sci-Fi', 7.9)]
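The question asks for a function that handles one row and is then applied to every row, so the list comprehension above could be wrapped as follows. This is only a sketch: the name row_to_pairs is made up for illustration, and it assumes every row has a non-missing, '|'-separated genres string (a missing value would make .split fail).

```python
import pandas as pd

def row_to_pairs(df, row):
    """Return [(genre, imdb_score), ...] for the row with index label `row`."""
    genres = df.loc[row, 'genres']
    score = float(df.loc[row, 'imdb_score'])
    # One tuple per genre, each carrying the movie's score
    return [(genre, score) for genre in genres.split('|')]

# Small stand-in for movie_metadata.csv (same sample data as above)
df = pd.DataFrame({'genres': ['Action|Adventure|Fantasy|Sci-Fi', 'Action|Adventure|Fantasy'],
                   'imdb_score': [7.9, 7.1]})

# Apply the function to every row and flatten into one list of tuples
pairs = [pair for row in df.index for pair in row_to_pairs(df, row)]

# Convert the list of tuples into the two-column 'genre_score' DataFrame
genre_score = pd.DataFrame(pairs, columns=['genre', 'imdb_score'])
print(genre_score)
```

From here, genre_score.groupby('genre')['imdb_score'].mean() would give the mean rating per genre.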
EDIT: Use Series.str.get_dummies for indicator columns, replace 0 with missing values, multiply by the scores with DataFrame.mul, and take the mean. Finally, convert the Series to a DataFrame with Series.rename_axis and Series.reset_index:
import numpy as np

df1 = (df['genres'].str.get_dummies()
         .replace(0, np.nan)
         .mul(df['imdb_score'], axis=0)
         .mean()
         .rename_axis('genres')
         .reset_index(name='imdb_score'))
print (df1)
genres imdb_score
0 Action 7.5
1 Adventure 7.5
2 Fantasy 7.5
3 Sci-Fi 7.9
Another solution is to use Series.str.split to get lists, then DataFrame.explode, and finally aggregate with mean:
df1 = (df.assign(genres=df['genres'].str.split('|'))
.explode('genres')
.groupby('genres', as_index=False)['imdb_score']
.mean())
print (df1)
genres imdb_score
0 Action 7.5
1 Adventure 7.5
2 Fantasy 7.5
3 Sci-Fi 7.9
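As a quick sanity check, the two approaches can be run side by side on the same sample data and compared. This is a sketch only; the variable names via_dummies, via_explode, same_genres and same_means are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'genres': ['Action|Adventure|Fantasy|Sci-Fi', 'Action|Adventure|Fantasy'],
                   'imdb_score': [7.9, 7.1]})

# Approach 1: indicator columns, zeros masked out, then column means
via_dummies = (df['genres'].str.get_dummies()
                 .replace(0, np.nan)
                 .mul(df['imdb_score'], axis=0)
                 .mean()
                 .rename_axis('genres')
                 .reset_index(name='imdb_score'))

# Approach 2: one row per (movie, genre), then group means
via_explode = (df.assign(genres=df['genres'].str.split('|'))
                 .explode('genres')
                 .groupby('genres', as_index=False)['imdb_score']
                 .mean())

# Compare genre labels exactly and means up to floating-point tolerance
same_genres = via_dummies['genres'].tolist() == via_explode['genres'].tolist()
same_means = np.allclose(via_dummies['imdb_score'], via_explode['imdb_score'])
print(same_genres, same_means)
```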
Try this:
array = [ (col,val) for col,val in dataframe.iloc[row_num].items() ]
print(array)
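Note that this one-liner yields (column name, value) pairs for the row, not (genre, score) pairs. A small sketch on the sample data from the first answer shows the difference and one possible way to adapt it; the names row and pairs are illustrative, and the column names genres and imdb_score are assumed to match the dataset.

```python
import pandas as pd

dataframe = pd.DataFrame({'genres': ['Action|Adventure|Fantasy|Sci-Fi', 'Action|Adventure|Fantasy'],
                          'imdb_score': [7.9, 7.1]})
row_num = 0

# As written above: one tuple per column of the selected row
array = [(col, val) for col, val in dataframe.iloc[row_num].items()]
print(array)

# Adapted: split the genres string and pair each genre with the row's score
row = dataframe.iloc[row_num]
pairs = [(genre, row['imdb_score']) for genre in row['genres'].split('|')]
print(pairs)
```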
You can use a dictionary inside a dictionary:
dataset = {'R1':{'C1':'V1','C2':'V2','C3':'V3'},
'R2':{'C1':'V1','C2':'V2','C3':'V3'},
'R3':{'C1':'V1','C2':'V2','C3':'V3'}
}
You can write your function like this:
def myFunction(row):
    # Your list: one inner list of (column, value) tuples per row
    mylist = [
        # first row
        [
            ('genres', 'Action|Adventure|Fantasy|Sci-Fi'),
            ('num_user_for_reviews', 3054.0),
        ],
        # second row
        [
            ('genres', 'Action|Adventure|Fantasy'),
            ('num_user_for_reviews', 1238.0),
        ],
    ]
    # Rows are numbered from 1, but list indices start at 0
    return mylist[row - 1]
Then call the function with the row you want:
# returns the first row
myFunction(1)