
Pandas Dataframe - convert columns to size classes and counts

I'm pretty new to Python and pandas. I have a dataframe with columns that hold the number of seedlings for a particular size class:

Plot    Location    Year    Species 6-12_in 12-24_in    24-54_in    GreaterThan_54_in
1       VMU         2015    BC      3                   8   

What I want to do is convert that dataframe to a format where each size class (6-12_in, 12-24_in, 24-54_in, and GreaterThan_54_in) is numbered 1-4 and put into Size_Class/Count columns, like this:

Plot    Location    Year    Species Size_Class  Count
1       VMU         2015    BC      1           3
1       VMU         2015    BC      3           8

I numbered the size columns from the first dataframe arbitrarily, so:

6-12_in = 1
12-24_in = 2
24-54_in = 3
GreaterThan_54_in = 4

I could easily write this by looping through each row and building the new dataframe with if statements, but I feel like there must be a more efficient map/apply solution. I found this thread, which is somewhat similar, but I'm not sure how to map the column names and produce multiple rows: Merge multiple column values into one column in python pandas

Any help getting started is appreciated. Thank you!

You can use melt to create a new row for each of your size columns. Then group by Plot and use cumcount to assign each row an incremental id. Drop the null values afterwards and you should get your desired result.

import pandas as pd
import numpy as np

df = pd.DataFrame({'Plot': [1],
 'Location': ['VMU'],
 'Year': [2015],
 'Species': ['BC'],
 '6-12_in': [3],
 '12-24_in': [np.nan],
 '24-54_in': [8],
 'GreaterThan_54_in': [np.nan]})

# reshape wide to long: one row per size column
df = df.melt(id_vars=['Plot','Location','Year','Species'],
             var_name='Size_Class',
             value_name='Count')
# number the size columns 1-4, in their original order, within each plot
df['Size_Class'] = df.groupby('Plot')['Size_Class'].cumcount()+1
# drop the size classes that have no count
df.dropna()

Output

   Plot Location  Year Species  Size_Class  Count
0     1      VMU  2015      BC           1    3.0
2     1      VMU  2015      BC           3    8.0
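
One caveat with the cumcount step: it numbers rows by position within each Plot, so it only lines up with size classes 1-4 when every plot has a single row in the original dataframe. The following is a minimal sketch of one way around that, grouping on all identifier columns instead. The second species ('RO') and its counts are invented for illustration, and the sketch still assumes each Plot/Location/Year/Species combination appears only once.

```python
import numpy as np
import pandas as pd

id_cols = ['Plot', 'Location', 'Year', 'Species']

# Hypothetical data: two species recorded in the same plot (the 'RO' row is invented).
df = pd.DataFrame({'Plot': [1, 1],
                   'Location': ['VMU', 'VMU'],
                   'Year': [2015, 2015],
                   'Species': ['BC', 'RO'],
                   '6-12_in': [3, np.nan],
                   '12-24_in': [np.nan, 5],
                   '24-54_in': [8, np.nan],
                   'GreaterThan_54_in': [np.nan, 2]})

long_df = df.melt(id_vars=id_cols, var_name='Size_Class', value_name='Count')

# Grouping by Plot alone would number these rows 1..8; grouping by every
# identifier column restarts the counter for each original row.
long_df['Size_Class'] = long_df.groupby(id_cols).cumcount() + 1

result = long_df.dropna(subset=['Count']).sort_values(id_cols + ['Size_Class'])
print(result)
```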

Alternatively, map the column names to size classes with a dictionary:

# define dictionary to map the size columns to size classes
d = {'6-12_in': 1,
     '12-24_in': 2,
     '24-54_in': 3,
     'GreaterThan_54_in': 4}

# melt the original (wide) dataframe
df2 = df.melt(id_vars=['Plot','Location','Year','Species'],
              var_name='size_class',
              value_name='count')
df2

# map each column name to its size class number
df2['size_class'] = df2['size_class'].map(d)

# keep only the rows where count is not null
df2[df2['count'].notna()]
   Plot Location  Year Species  size_class  count
0     1      VMU  2015      BC           1    3.0
2     1      VMU  2015      BC           3    8.0
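
If the size columns are already in ascending order, the dictionary does not have to be typed by hand. Below is a rough sketch, not part of either answer, that builds the mapping from the column order with enumerate and casts the counts back to integers once the nulls are gone. The names id_cols, size_cols, size_map, and tidy are illustrative, and the sketch assumes every non-identifier column is a size class.

```python
import numpy as np
import pandas as pd

id_cols = ['Plot', 'Location', 'Year', 'Species']

df = pd.DataFrame({'Plot': [1],
                   'Location': ['VMU'],
                   'Year': [2015],
                   'Species': ['BC'],
                   '6-12_in': [3],
                   '12-24_in': [np.nan],
                   '24-54_in': [8],
                   'GreaterThan_54_in': [np.nan]})

# Assumption: every column that is not an identifier is a size class,
# listed left to right from smallest to largest.
size_cols = [c for c in df.columns if c not in id_cols]
size_map = {col: i for i, col in enumerate(size_cols, start=1)}

tidy = (df.melt(id_vars=id_cols, var_name='Size_Class', value_name='Count')
          .dropna(subset=['Count'])
          .assign(Size_Class=lambda d: d['Size_Class'].map(size_map),
                  Count=lambda d: d['Count'].astype(int))
          .reset_index(drop=True))
print(tidy)
```

Mapping by column name rather than by position keeps the size class correct no matter how many rows each plot has.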
