Pandas DataFrame: convert columns to size classes and counts
I'm pretty new to Python and pandas. I have a dataframe with columns that hold the number of seedlings in each size class:
Plot  Location  Year  Species  6-12_in  12-24_in  24-54_in  GreaterThan_54_in
1     VMU       2015  BC       3                  8
What I want to do is convert that dataframe to a format like the one below, where each size class (6-12_in, 12-24_in, 24-54_in, and GreaterThan_54_in) is numbered 1-4 and split into Size_Class and Count columns:
Plot  Location  Year  Species  Size_Class  Count
1     VMU       2015  BC       1           3
1     VMU       2015  BC       3           8
I arbitrarily numbered the size columns from the first dataframe, so:
6-12_in = 1
12-24_in = 2
24-54_in = 3
GreaterThan_54_in = 4
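If the numbering simply follows column order, it can be derived instead of typed by hand. A minimal sketch, assuming every column other than Plot, Location, Year and Species is a size-class column and the columns are already in size order:

```python
import numpy as np
import pandas as pd

# Sample data matching the question.
df = pd.DataFrame({"Plot": [1], "Location": ["VMU"], "Year": [2015], "Species": ["BC"],
                   "6-12_in": [3], "12-24_in": [np.nan], "24-54_in": [8],
                   "GreaterThan_54_in": [np.nan]})

id_cols = ["Plot", "Location", "Year", "Species"]
# Assumption: every non-identifier column is a size class, in ascending size order.
size_cols = [c for c in df.columns if c not in id_cols]
# Number the size classes 1..4 by their position.
size_class = {col: i for i, col in enumerate(size_cols, start=1)}
print(size_class)
```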
I could easily write this by looping through each row and building the new dataframe with if statements, but I feel like there must be a more efficient map/apply solution. I found this thread, which is somewhat similar, but I'm not sure how to map the column names and create multiple rows: Merge multiple column values into one column in python pandas

Any help getting started is appreciated. Thank you!
You can use melt to create a new row for each of your size columns. Then group by Plot and assign each row an incremental id using cumcount. Drop the null values afterwards and you should get your desired result.
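import pandas as pd
import numpy as np

df = pd.DataFrame({'Plot': [1],
                   'Location': ['VMU'],
                   'Year': [2015],
                   'Species': ['BC'],
                   '6-12_in': [3],
                   '12-24_in': [np.nan],
                   '24-54_in': [8],
                   'GreaterThan_54_in': [np.nan]})
# one row per (original row, size column); size columns keep their original order
df = df.melt(id_vars=['Plot', 'Location', 'Year', 'Species'],
             var_name='Size_Class',
             value_name='Count')
# number each Plot's rows 1..4 in column order (only correct with one original row per Plot)
df['Size_Class'] = df.groupby('Plot')['Size_Class'].cumcount() + 1
# drop size classes with no seedlings
df = df.dropna()
print(df)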
Output:
   Plot Location  Year Species  Size_Class  Count
0     1      VMU  2015      BC           1    3.0
2     1      VMU  2015      BC           3    8.0
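Grouping by Plot alone numbers every melted row in that Plot, so if one Plot has several rows (for example two species), the ids run past 4 and no longer match the size columns. A minimal sketch that groups on all identifier columns instead, assuming the Plot/Location/Year/Species combination is unique per row; the second species code "XX" is made up for illustration:

```python
import numpy as np
import pandas as pd

id_cols = ["Plot", "Location", "Year", "Species"]
# Hypothetical data: two species recorded in the same Plot.
df = pd.DataFrame({
    "Plot": [1, 1],
    "Location": ["VMU", "VMU"],
    "Year": [2015, 2015],
    "Species": ["BC", "XX"],  # "XX" is a made-up species code
    "6-12_in": [3, np.nan],
    "12-24_in": [np.nan, 5],
    "24-54_in": [8, np.nan],
    "GreaterThan_54_in": [np.nan, 1],
})

long = df.melt(id_vars=id_cols, var_name="Size_Class", value_name="Count")
# melt emits the rows column by column, so within each original row the
# size columns appear in their original order; number them 1..4 per row.
long["Size_Class"] = long.groupby(id_cols).cumcount() + 1
# Keep only size classes that were actually counted.
long = long.dropna(subset=["Count"])
print(long)
```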
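Another answer: melt the dataframe, then map each column name to its size class with a dictionary.
import numpy as np
import pandas as pd

# sample data matching the question
df = pd.DataFrame({'Plot': [1], 'Location': ['VMU'], 'Year': [2015], 'Species': ['BC'],
                   '6-12_in': [3], '12-24_in': [np.nan], '24-54_in': [8],
                   'GreaterThan_54_in': [np.nan]})
# define dictionary to map the sizes to size class
d = {'6-12_in': 1,
     '12-24_in': 2,
     '24-54_in': 3,
     'GreaterThan_54_in': 4}
# melt the dataframe: one row per (original row, size column)
df2 = df.melt(id_vars=['Plot', 'Location', 'Year', 'Species'],
              var_name='size_class',
              value_name='count')
# apply map: replace each column name with its size-class number
df2['size_class'] = df2['size_class'].map(d)
# drop rows where count is null
df2 = df2[df2['count'].notna()]
print(df2)
Output:
   Plot Location  Year Species  size_class  count
0     1      VMU  2015      BC           1    3.0
2     1      VMU  2015      BC           3    8.0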
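Since the dictionary already encodes the numbering, it could also be applied to the column labels before melting, so no map is needed afterwards. A minimal sketch using DataFrame.rename; the final cast of Count to int is an added assumption that counts are always whole numbers:

```python
import numpy as np
import pandas as pd

# Sample data matching the question.
df = pd.DataFrame({"Plot": [1], "Location": ["VMU"], "Year": [2015], "Species": ["BC"],
                   "6-12_in": [3], "12-24_in": [np.nan], "24-54_in": [8],
                   "GreaterThan_54_in": [np.nan]})

d = {"6-12_in": 1, "12-24_in": 2, "24-54_in": 3, "GreaterThan_54_in": 4}
id_cols = ["Plot", "Location", "Year", "Species"]

out = (
    df.rename(columns=d)                 # size columns become 1..4 before reshaping
      .melt(id_vars=id_cols, var_name="Size_Class", value_name="Count")
      .dropna(subset=["Count"])          # keep only size classes that were counted
      .astype({"Count": int})            # assumption: counts are whole numbers
      .reset_index(drop=True)
)
print(out)
```

Renaming first keeps the reshaping to a single chain; the trade-off is that Size_Class is taken straight from the column labels, so any column missing from the dictionary keeps its original name.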