Splitting a data frame by a column using a dynamic list of that column's unique values in Python
Very new Python user here. I have a data frame that I am trying to subset by whatever unique values are in the column "Level". I would like each subset to end up in a list or in its own data frame. In this example I have Levels 1, 2, 3, 4 and 5, so I would want either 5 separate data frames with only one unique level in each, or a list with 5 entries, one per level. Here is the data frame:
import pandas as pd
import numpy as np
data = [['Bill', 21, 'Level 1'], ['Joe', 25, 'Level 1'],['Sam', 22, 'Level 2'],['Ash', 19, 'Level 3'],['Mike', 28, 'Level 3'],['Ang', 20, 'Level 4'],['Paul', 25, 'Level 4'],['Kathy', 29, 'Level 5']]
df = pd.DataFrame(data, columns = ['Name', 'Age', 'Level'])
I can get the desired results if I know the names of the different levels and hard-code them. My problem is that I do not always know what will be in the 'Level' column. The code would need to be smart enough to detect the different levels, split on them, and save the results in data frames or a list. I am not really sure how to get started.
Thank you!
See whether this solves your problem.
To get all the unique levels in your data:
df = pd.DataFrame(data, columns = ['Name', 'Age', 'Level'])
levels = set(df['Level'])  # a set keeps one copy of each value
print(levels) # gives you all the unique levels (1 to 5)
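A set does not keep any particular order. If the order in which the levels first appear matters, `Series.unique()` is one alternative. A minimal sketch, assuming the same sample data as in the question:

```python
import pandas as pd

data = [['Bill', 21, 'Level 1'], ['Joe', 25, 'Level 1'], ['Sam', 22, 'Level 2'],
        ['Ash', 19, 'Level 3'], ['Mike', 28, 'Level 3'], ['Ang', 20, 'Level 4'],
        ['Paul', 25, 'Level 4'], ['Kathy', 29, 'Level 5']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Level'])

# unique() returns the distinct values in order of first appearance.
levels = df['Level'].unique().tolist()
print(levels)
```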
To get the data for each level (all together):
data = [['Bill', 21, 'Level 1'], ['Joe', 25, 'Level 1'],['Sam', 22, 'Level 2'],['Ash', 19, 'Level 3'],['Mike', 28, 'Level 3'],['Ang', 20, 'Level 4'],['Paul', 25, 'Level 4'],['Kathy', 29, 'Level 5']]
df = pd.DataFrame(data, columns = ['Name', 'Age', 'Level'])
levels = set(df['Level'])  # unique levels: {'Level 1', 'Level 2', 'Level 3', 'Level 4', 'Level 5'}
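for level in levels:  # set iteration order is arbitrary
    # keep only the rows whose 'Level' matches the current value
    df_level = df.loc[df['Level'] == level]
    print("Data for Level:" + level)
    print(df_level[['Name', 'Age']])
    print("======================")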
Output
Data for Level:Level 4
Name Age
5 Ang 20
6 Paul 25
======================
Data for Level:Level 5
Name Age
7 Kathy 29
======================
Data for Level:Level 3
Name Age
3 Ash 19
4 Mike 28
======================
Data for Level:Level 1
Name Age
0 Bill 21
1 Joe 25
======================
Data for Level:Level 2
Name Age
2 Sam 22
======================
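The loop above only prints each subset. If the goal is to keep each subset as its own data frame, one option is `groupby`, which detects the distinct values itself. This is a sketch rather than the only way to do it, assuming the same sample data; the names `frames_by_level` and `frames_list` are just illustrative:

```python
import pandas as pd

data = [['Bill', 21, 'Level 1'], ['Joe', 25, 'Level 1'], ['Sam', 22, 'Level 2'],
        ['Ash', 19, 'Level 3'], ['Mike', 28, 'Level 3'], ['Ang', 20, 'Level 4'],
        ['Paul', 25, 'Level 4'], ['Kathy', 29, 'Level 5']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Level'])

# Iterating over a groupby yields (key, sub-frame) pairs, one per distinct 'Level'.
# No level names are hard-coded; keys come out sorted by default.
frames_by_level = {level: group for level, group in df.groupby('Level')}

# The same sub-frames as a plain list, in the dict's (sorted-key) order.
frames_list = list(frames_by_level.values())

print(list(frames_by_level))       # the levels that were found
print(frames_by_level['Level 3'])  # the data frame for one level
```

The dict gives lookup by level name; the list works if only the pieces themselves are needed.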