Creating a new DataFrame from the output of a list obtained from an existing DataFrame

Question

Using the .tolist() function, I created a list called genrelist.

genrelist = movies_1000.Genre.str.split().tolist()
print(genrelist)

Here is the output I get:

[['Action,Crime,Drama'], ['Action,Adventure,Sci-Fi'], 
['Action,Biography,Drama'], ['Adventure,Drama,Sci-Fi'], 
['Animation,Drama,Fantasy'], ['Biography,Comedy,Drama'], 
['Drama,Music'], ['Drama,Mystery,Sci-Fi'], ['Crime,Drama,Thriller'], 
['Drama,Family,Music'], ['Action,Thriller'], ['Drama,Thriller'], 
['Animation,Adventure,Family'], ['Comedy,Drama'], 
['Animation,Drama,Romance']]

Then I used the following code to count the unique items in this nested list.

from collections import Counter

genrecount = Counter()
for arr in genrelist:
    genrecount.update(arr[0].split(','))

print(genrecount)

Here is the output I get:

Counter({'Drama': 12, 'Action': 4, 'Adventure': 3, 'Sci-Fi': 3, 
'Animation': 3, 'Thriller': 3, 'Crime': 2, 'Biography': 2, 'Comedy': 
2, 'Music': 2, 'Family': 2, 'Fantasy': 1, 'Mystery': 1, 'Romance': 
1})

I want to create a new DataFrame from the output above, so I used the following:

genre_df = pd.DataFrame(genrecount.items())

Here is the error I get:

ValueError: DataFrame constructor not properly called!

I also tried it without .items(), as follows:

genre_df = pd.DataFrame(genrecount)

Here is the error I get:

ValueError: If using all scalar values, you must pass an index

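So I am trying to create a new DataFrame from the genrecount output above. Please suggest what I can use to get the desired result, and how to label the columns properly as genre and count. I suspect the word Counter in the genrecount output is causing the problem, but I don't know how to fix it.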

I would also like to know how the logic changes if my input genre list has some whitespace around the strings, as shown below:

[['Action',' Crime','  Drama'], ['Action','  Adventure','Sci-Fi'], 
['  Action',' Biography','Drama'], ['Adventure','Drama',' Sci-Fi'], 
['Animation','Drama','Fantasy'], ['Biography',' Comedy',' Drama'], 
['Drama','   Music   '], ['Drama','Mystery','  Sci-Fi'], 
['Crime  ','Drama',' Thriller'], ['Drama', ' Family ' ,' Music'], 
['Action', 'Thriller'], ['Drama',' Thriller'], 
['Animation',' Adventure',' Family'], ['Comedy',' Drama'], 
['Animation',' Drama',' Romance']]

Thanks in advance!

Answer 1

As suggested in my comment, you can use the from_dict() function, because Counter is a subclass of dict. Here is a complete example:

import pandas as pd
from collections import Counter

genrecount = Counter({'Drama': 12, 'Action': 4, 'Adventure': 3, 'Sci-Fi': 3, 'Animation': 3, 'Thriller': 3, 'Crime': 2, 'Biography': 2, 'Comedy': 2, 'Music': 2, 'Family': 2, 'Fantasy': 1, 'Mystery': 1, 'Romance': 1})

# list() materializes the (genre, count) pairs so this also works on Python 3
genre_df = pd.DataFrame.from_dict(list(genrecount.items()))
genre_df.columns = ["genre", "count"]

print(genre_df)

Output (row order may differ depending on the Python version):

        genre  count
0     Mystery      1
1     Romance      1
2      Sci-Fi      3
3      Family      2
4   Biography      2
5       Crime      2
6       Drama     12
7     Fantasy      1
8   Animation      3
9       Music      2
10  Adventure      3
11     Action      4
12     Comedy      2
13   Thriller      3
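
The question also asks about input where the genre strings carry stray whitespace. A minimal sketch of one way to handle that, assuming the rows are already split into separate strings as in the second list in the question (so arr[0].split(',') is no longer needed), is to strip each item before counting. The sample rows below are a shortened subset of that list, used only for illustration:

```python
from collections import Counter

# A few rows in the shape shown in the question: already split, with stray spaces.
genrelist = [
    ['Action', ' Crime', '  Drama'],
    ['Drama', '   Music   '],
    ['Crime  ', 'Drama', ' Thriller'],
    ['Drama', ' Family ', ' Music'],
]

genrecount = Counter()
for row in genrelist:
    # str.strip() drops leading/trailing whitespace, so ' Drama' and 'Drama'
    # are counted as the same genre.
    genrecount.update(item.strip() for item in row)

# (genre, count) pairs, most frequent first
rows = genrecount.most_common()
print(rows)
```

The resulting genrecount is an ordinary Counter, so it can be passed to the same DataFrame construction used in the example above without further changes.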

Solution 1
1 2019-02-04 09:04:19