Convert pandas dataframe group of values to multiple lists
I have a pandas dataframe in which I have listed items and categorised them:
col_name |col_group
-------------------------
id | Metadata
listing_url | Metadata
scrape_id | Metadata
name | Text
summary | Text
space | Text
To reproduce:
import pandas
df = pandas.DataFrame([
['id','metadata'],
['listing_url','metadata'],
['scrape_id','metadata'],
['name','Text'],
['summary','Text'],
['space','Text']],
columns=['col_name', 'col_group'])
Can you suggest how I can convert this dataframe to multiple lists based on "col_group":
Metadata = ['id','listing_url','scrape_id']
Text = ['name','summary','space']
This is so that I can pass these lists of columns to pandas and drop columns.
I googled a lot and got stuck: all the answers are about converting lists to a dataframe, not vice versa. Should I aim to convert into a dictionary, or a list of lists?
I have over 100 rows, belonging to 10 categories, so I would like to avoid manual hard-coding.
Like this:
In [245]: res = df.groupby('col_group', sort=False)['col_name'].apply(list)
In [248]: res.tolist()
Out[248]: [['id', 'listing_url', 'scrape_id'], ['name', 'summary', 'space']]
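A list of lists drops the group names, so it may help to see which group each inner list belongs to. The following is only a minimal sketch that reuses the question's sample frame; the names `labels` and `lists` are illustrative, and `sort=False` is an assumption made here to keep the groups in order of first appearance:

```python
import pandas as pd

df = pd.DataFrame(
    [['id', 'metadata'], ['listing_url', 'metadata'], ['scrape_id', 'metadata'],
     ['name', 'Text'], ['summary', 'Text'], ['space', 'Text']],
    columns=['col_name', 'col_group'])

# One list of column names per group, indexed by the group label.
res = df.groupby('col_group', sort=False)['col_name'].apply(list)

labels = res.index.tolist()  # group names, in the same order as the lists
lists = res.tolist()         # the list of lists itself

for label, names in zip(labels, lists):
    print(label, names)
```

The index of `res` carries the group labels, so position `i` in `lists` always corresponds to `labels[i]`.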
I tried this code:
import pandas
df = pandas.DataFrame([
[1, 'url_a', 'scrap_a', 'name_a', 'summary_a', 'space_a'],
[2, 'url_b', 'scrap_b', 'name_b', 'summary_b', 'space_b'],
[3, 'url_c', 'scrap_c', 'name_c', 'summary_c', 'space_ac']],
columns=['id', 'listing_url', 'scrape_id', 'name', 'summary', 'space'])
print(df)
for row in df.iterrows():
    print(row[1].to_list())
which gives this output:
[1, 'url_a', 'scrap_a', 'name_a', 'summary_a', 'space_a']
[2, 'url_b', 'scrap_b', 'name_b', 'summary_b', 'space_b']
[3, 'url_c', 'scrap_c', 'name_c', 'summary_c', 'space_ac']
You can use
for row in df[['name', 'summary', 'space']].iterrows():
to iterate over specific columns only.
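As a rough illustration of iterating over a column subset, here is a self-contained sketch. The shortened sample data and the name `text_cols` are assumptions for the example, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 'url_a', 'scrap_a', 'name_a', 'summary_a', 'space_a'],
     [2, 'url_b', 'scrap_b', 'name_b', 'summary_b', 'space_b']],
    columns=['id', 'listing_url', 'scrape_id', 'name', 'summary', 'space'])

text_cols = ['name', 'summary', 'space']  # illustrative subset of columns

# Select the subset first, then iterate; each row is an (index, Series) pair.
rows = []
for _, row in df[text_cols].iterrows():
    rows.append(row.to_list())

print(rows)
```

Each inner list then holds only the values from the selected columns, in the order given by `text_cols`.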
my_vars = df.groupby('col_group').agg(list)['col_name'].to_dict()
Output:
>>> my_vars
{'Text': ['name', 'summary', 'space'], 'metadata': ['id', 'listing_url', 'scrape_id']}
The recommended usage is simply my_vars['Text'] to access the Text list, and so on. If you must have these as distinct names, you can force them into your target scope, e.g. globals:
globals().update(df.groupby('col_group').agg(list)['col_name'].to_dict())
Result:
>>> Text
['name', 'summary', 'space']
>>> metadata
['id', 'listing_url', 'scrape_id']
However, I would advise against that, as you might unwittingly overwrite some of your other objects, or the names might not end up in the scope you need (e.g. locals).
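Since the stated goal is to drop columns by category, the dictionary can be used directly without touching globals. A minimal sketch, assuming a hypothetical second frame `data` whose columns are the names listed in the lookup table (`groups`, `data` and `cols_by_group` are illustrative names):

```python
import pandas as pd

# Lookup table from the question: column name -> group.
groups = pd.DataFrame(
    [['id', 'metadata'], ['listing_url', 'metadata'], ['scrape_id', 'metadata'],
     ['name', 'Text'], ['summary', 'Text'], ['space', 'Text']],
    columns=['col_name', 'col_group'])

# Hypothetical data frame whose columns are described by the lookup table.
data = pd.DataFrame(
    [[1, 'url_a', 'scrap_a', 'name_a', 'summary_a', 'space_a']],
    columns=['id', 'listing_url', 'scrape_id', 'name', 'summary', 'space'])

# Map each group label to the list of column names it contains.
cols_by_group = groups.groupby('col_group')['col_name'].agg(list).to_dict()

# Drop every column in the 'Text' group; `data` itself is left unchanged.
without_text = data.drop(columns=cols_by_group['Text'])
print(without_text.columns.tolist())
```

Keeping the lists in one dictionary scales to any number of categories, and a lookup such as `cols_by_group['Text']` cannot clash with other variable names.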