Convert groups of pandas DataFrame values into multiple lists

Question

I have a pandas DataFrame in which I listed items and categorised them:

col_name    | col_group
-------------------------
id          | metadata
listing_url | metadata
scrape_id   | metadata
name        | Text
summary     | Text
space       | Text

To reproduce:

import pandas

df = pandas.DataFrame([
    ['id','metadata'],
    ['listing_url','metadata'],
    ['scrape_id','metadata'],
    ['name','Text'],
    ['summary','Text'],
    ['space','Text']],
    columns=['col_name', 'col_group'])

Can you suggest how I can convert this DataFrame into multiple lists based on "col_group":

Metadata = ['id', 'listing_url', 'scrape_id']
Text = ['name','summary','space']

This is so that I can pass these lists of column names to pandas and drop the columns.

I googled a lot and got stuck: all the answers are about converting lists to a DataFrame, not the other way round. Should I aim to convert into a dictionary, or a list of lists?

I have over 100 rows belonging to 10 categories, so I would like to avoid manual hard-coding.

Answer 1

Like this:

# sort=False keeps the groups in order of first appearance
In [245]: res = df.groupby('col_group', sort=False)['col_name'].apply(list)

In [248]: res.tolist()
Out[248]: [['id', 'listing_url', 'scrape_id'], ['name', 'summary', 'space']]
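The list of lists above no longer says which group each inner list came from. As a minimal sketch, assuming the `df` from the question, the grouped Series can be iterated together with its index so that each list stays paired with its `col_group` label (the name `pairs` is only illustrative):

```python
import pandas

# Same frame as in the question.
df = pandas.DataFrame([
    ['id', 'metadata'],
    ['listing_url', 'metadata'],
    ['scrape_id', 'metadata'],
    ['name', 'Text'],
    ['summary', 'Text'],
    ['space', 'Text']],
    columns=['col_name', 'col_group'])

# sort=False keeps the groups in order of first appearance.
res = df.groupby('col_group', sort=False)['col_name'].apply(list)

# The index of `res` holds the group labels, so items() yields
# (label, list_of_column_names) pairs.
pairs = list(res.items())
for group, cols in pairs:
    print(group, cols)
```

This keeps the result as plain Python objects, which may be enough if the lists are only needed inside a loop.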

Answer 2

I tried this code:

import pandas

df = pandas.DataFrame([
    [1, 'url_a', 'scrap_a', 'name_a', 'summary_a', 'space_a'],
    [2, 'url_b', 'scrap_b', 'name_b', 'summary_b', 'space_b'],
    [3, 'url_c', 'scrap_c', 'name_c', 'summary_c', 'space_ac']],
    columns=['id', 'listing_url', 'scrape_id', 'name', 'summary', 'space'])
print(df)

# iterrows() yields (index, row) pairs; only the row is needed here
for _, row in df.iterrows():
    print(row.to_list())

which gives this output:

[1, 'url_a', 'scrap_a', 'name_a', 'summary_a', 'space_a']
[2, 'url_b', 'scrap_b', 'name_b', 'summary_b', 'space_b']
[3, 'url_c', 'scrap_c', 'name_c', 'summary_c', 'space_ac']

You can use

for _, row in df[['name', 'summary', 'space']].iterrows():
    print(row.to_list())

to iterate over specific columns only.
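The same row-by-row idea can also be pointed at the question's two-column frame to collect one list per group. A rough sketch, assuming the `col_name`/`col_group` frame from the question (the names `mapping_df` and `groups` are made up for illustration); `itertuples` is used here instead of `iterrows` because it gives attribute access to the columns:

```python
from collections import defaultdict

import pandas

# The question's frame, renamed here to avoid clashing with the wide `df` above.
mapping_df = pandas.DataFrame([
    ['id', 'metadata'],
    ['listing_url', 'metadata'],
    ['scrape_id', 'metadata'],
    ['name', 'Text'],
    ['summary', 'Text'],
    ['space', 'Text']],
    columns=['col_name', 'col_group'])

# Append each column name to the list of the group it belongs to.
groups = defaultdict(list)
for row in mapping_df.itertuples(index=False):
    groups[row.col_group].append(row.col_name)

print(dict(groups))
```

For around 100 rows an explicit loop like this is fine, although a `groupby` is usually the more idiomatic pandas route.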

Answer 3

my_vars = df.groupby('col_group').agg(list)['col_name'].to_dict()

Output:

>>> my_vars
{'Text': ['name', 'summary', 'space'], 'metadata': ['id', 'listing_url', 'scrape_id']}

The recommended usage is simply my_vars['Text'] to access the Text list, and so on. If you must have these as distinct names, you can force them into your target scope, e.g. globals:

globals().update(df.groupby('col_group').agg(list)['col_name'].to_dict())

Result:

>>> Text
['name', 'summary', 'space']
>>> metadata
['id', 'listing_url', 'scrape_id']

However, I would advise against that, as you might unwittingly overwrite some of your other objects, or the names might not end up in the scope you actually need (e.g. locals).
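Keeping the dictionary also fits the original goal of dropping whole groups of columns. A minimal sketch, assuming a wide frame whose column names match the `col_name` values (the frame `listings`, its values, and the name `trimmed` are invented for illustration):

```python
import pandas

# The question's mapping of column names to groups.
df = pandas.DataFrame([
    ['id', 'metadata'],
    ['listing_url', 'metadata'],
    ['scrape_id', 'metadata'],
    ['name', 'Text'],
    ['summary', 'Text'],
    ['space', 'Text']],
    columns=['col_name', 'col_group'])

my_vars = df.groupby('col_group').agg(list)['col_name'].to_dict()

# Hypothetical wide frame with one column per col_name entry.
listings = pandas.DataFrame(
    [[1, 'url_a', 'scrape_a', 'name_a', 'summary_a', 'space_a']],
    columns=['id', 'listing_url', 'scrape_id', 'name', 'summary', 'space'])

# Drop every column that belongs to the 'Text' group.
trimmed = listings.drop(columns=my_vars['Text'])
print(trimmed.columns.tolist())
```

Looking the list up by key at the point of use avoids creating any module-level names at all.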
