![](/img/trans.png)
[英]Create a new column in pandas depending on values from two other columns
[英]Python - pandas: create new columns and transpose depending on column names
给定以下 dataframe 定义,
data = pd.DataFrame({"sec":["dim1", "dim2", "dim3"], # Construct a pandas DataFrame
"m1vc_15":[1,2,3],
"m1vu_15":[10,11,12],
"m2vc_15":[20,21,22],
"m2vu_15":[30,31,32],
"m1vc_16":[4,5,6],
"m1vu_16":[13,14,15],
"m2vc_16":[23,24,25],
"m2vu_16":[33,34,35]})
生成此表结构:
sec m1vc_15 m1vu_15 m2vc_15 m2vu_15 m1vc_16 m1vu_16 m2vc_16 m2vu_16
dim1 1 10 20 30 4 13 23 33
dim2 2 11 21 31 5 14 24 34
dim3 3 12 22 32 6 15 25 35
是否有任何function 或 package允许我创建此预期结果?
data2 = pd.DataFrame({"sec":np.repeat(["dim1", "dim2", "dim3", "dim1", "dim2", "dim3"],2), # Construct a pandas DataFrame
"type":np.repeat(['vc','vu'],6),
"year":["2015", "2016", "2015", "2016", "2015", "2016",
"2015", "2016", "2015", "2016", "2015", "2016"],
"m1":[1, 4, 2, 5, 3, 6, 10, 13, 11, 14, 12, 15],
"m2":[20, 23, 21, 24, 22, 25, 30, 33, 31, 34, 32, 35]})
"""
# Output
sec type year m1 m2
dim1 vc 2015 1 20
dim1 vc 2016 4 23
dim2 vc 2015 2 21
dim2 vc 2016 5 24
dim3 vc 2015 3 22
dim3 vc 2016 6 25
dim1 vu 2015 10 30
dim1 vu 2016 13 33
dim2 vu 2015 11 31
dim2 vu 2016 14 34
dim3 vu 2015 12 32
dim3 vu 2016 15 35
"""
转置的逻辑是:
_integer
。 integer 是年份的最后两位数字。vu_
或vc_
。 此字符串区分不同类型的对象。 数据源中没有其他类型的值。您可以使用wide_to_long
和几个步骤来清理列:
df = pd.wide_to_long(data, stubnames=['m1', 'm2'], i=['sec'], j='type_year', suffix='\w+').reset_index()
df[['type', 'year']] = df.type_year.str.split('_', expand=True)
df.year = '20' + df.year
df.drop('type_year', axis=1, inplace=True)
df
sec m1 m2 type year
0 dim1 1 20 vc 2015
1 dim2 2 21 vc 2015
2 dim3 3 22 vc 2015
3 dim1 10 30 vu 2015
4 dim2 11 31 vu 2015
5 dim3 12 32 vu 2015
6 dim1 4 23 vc 2016
7 dim2 5 24 vc 2016
8 dim3 6 25 vc 2016
9 dim1 13 33 vu 2016
10 dim2 14 34 vu 2016
11 dim3 15 35 vu 2016
一种有效的选择是使用pyjanitor的pivot_longer :
# pip install pyjanitor
import pandas as pd
import janitor
(data
.pivot_longer(
index = 'sec',
names_to = ('.value', 'type', 'year'),
names_pattern = r"(.\d)(.+)_(\d+)")
.assign(year = lambda df: ("20"+df.year).astype(int))
)
sec type year m1 m2
0 dim1 vc 2015 1 20
1 dim2 vc 2015 2 21
2 dim3 vc 2015 3 22
3 dim1 vu 2015 10 30
4 dim2 vu 2015 11 31
5 dim3 vu 2015 12 32
6 dim1 vc 2016 4 23
7 dim2 vc 2016 5 24
8 dim3 vc 2016 6 25
9 dim1 vu 2016 13 33
10 dim2 vu 2016 14 34
11 dim3 vu 2016 15 35
.value
确定哪些列保留为 header,在本例中为m1/m2
; rest 集中在type
和year
列中
更快的选择是使用names_transform
选项将year
列转换为pivot_longer
中的 integer :
(data
.pivot_longer(
index = 'sec',
names_to = ('.value', 'type', 'year'),
names_pattern = r"(.\d)(.+)_(\d+)",
names_transform = {'year': int})
.assign(year = lambda df: df.year + 2000)
)
sec type year m1 m2
0 dim1 vc 2015 1 20
1 dim2 vc 2015 2 21
2 dim3 vc 2015 3 22
3 dim1 vu 2015 10 30
4 dim2 vu 2015 11 31
5 dim3 vu 2015 12 32
6 dim1 vc 2016 4 23
7 dim2 vc 2016 5 24
8 dim3 vc 2016 6 25
9 dim1 vu 2016 13 33
10 dim2 vu 2016 14 34
11 dim3 vu 2016 15 35
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.