Create new column depending on values from other column
I have a DataFrame that looks something like this:
import numpy as np
import pandas as pd
df=pd.DataFrame([['vt 40462',5,6],[5,6,6],[5,5,8],[4,3,1],['vl 6450',5,6],[5,6,7],
[1,2,3],['vt 40462',5,6],[5,5,8],['vl 658',6,7],[5,5,8],[4,3,1],['vt 40461',5,6],[5,5,8],
[7,8,5]],columns=['A','B','C'])
df
           A  B  C
0   vt 40462  5  6
1          5  6  6
2          5  5  8
3          4  3  1
4    vl 6450  5  6
5          5  6  7
6          1  2  3
7   vt 40462  5  6
8          5  5  8
9     vl 658  6  7
10         5  5  8
11         4  3  1
12  vt 40461  5  6
13         5  5  8
14         7  8  5
I want to label every row with the most recent vt or vl value that appears at or above it in column A, and put the result in a new column D:
           A  B  C         D
0   vt 40462  5  6  vt 40462
1          5  6  6  vt 40462
2          5  5  8  vt 40462
3          4  3  1  vt 40462
4    vl 6450  5  6   vl 6450
5          5  6  7   vl 6450
6          1  2  3   vl 6450
7   vt 40462  5  6  vt 40462
8          5  5  8  vt 40462
9     vl 658  6  7    vl 658
10         5  5  8    vl 658
11         4  3  1    vl 658
12  vt 40461  5  6  vt 40461
13         5  5  8  vt 40461
14         7  8  5  vt 40461
Use str.split, which returns NaN for the entries of A that are not strings; then use ffill to fill the NaN values, join the fields back together, and assign the result to 'D':
#Thanks @user3483203 for the upgrade in syntax
df['D'] = df['A'].str.split().ffill().apply(' '.join)
print(df)
Output:
           A  B  C         D
0   vt 40462  5  6  vt 40462
1          5  6  6  vt 40462
2          5  5  8  vt 40462
3          4  3  1  vt 40462
4    vl 6450  5  6   vl 6450
5          5  6  7   vl 6450
6          1  2  3   vl 6450
7   vt 40462  5  6  vt 40462
8          5  5  8  vt 40462
9     vl 658  6  7    vl 658
10         5  5  8    vl 658
11         4  3  1    vl 658
12  vt 40461  5  6  vt 40461
13         5  5  8  vt 40461
14         7  8  5  vt 40461
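To make the one-liner easier to follow, here is a minimal sketch that splits it into its three steps on a shortened version of the frame. The intermediate names parts and filled are only illustrative and do not appear in the original answer.

```python
import pandas as pd

# Shortened version of the question's frame (assumption: same mixed str/int column A).
df = pd.DataFrame(
    [['vt 40462', 5, 6], [5, 6, 6], [5, 5, 8], ['vl 6450', 5, 6], [1, 2, 3]],
    columns=['A', 'B', 'C'],
)

# Step 1: .str.split() gives a list of tokens for string entries
# and NaN for the integer entries.
parts = df['A'].str.split()

# Step 2: forward-fill copies the last token list down over the NaN rows.
filled = parts.ffill()

# Step 3: join each token list back into a single string.
df['D'] = filled.apply(' '.join)

print(df)
```

The split and join are only a device for turning the non-string entries into NaN so that ffill has something to fill.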
Another way would be to assign to column D all values of A that start with a letter, and then use df.ffill() to get rid of the NaNs:
df.assign(D=df.loc[df.A.str.contains('^[A-Za-z]', na=False), 'A']).ffill()
           A  B  C         D
0   vt 40462  5  6  vt 40462
1          5  6  6  vt 40462
2          5  5  8  vt 40462
3          4  3  1  vt 40462
4    vl 6450  5  6   vl 6450
5          5  6  7   vl 6450
6          1  2  3   vl 6450
7   vt 40462  5  6  vt 40462
8          5  5  8  vt 40462
9     vl 658  6  7    vl 658
10         5  5  8    vl 658
11         4  3  1    vl 658
12  vt 40461  5  6  vt 40461
13         5  5  8  vt 40461
14         7  8  5  vt 40461
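Note that df.ffill() forward-fills every column, not just D. That is harmless for the frame in the question, but if B or C could contain missing values that should be kept, a possible variant is to fill only the new column. The sketch below assumes such a frame (the NaN entries in B and C are made up for illustration) and uses Series.where to build the masked column; the name is_label is hypothetical.

```python
import numpy as np
import pandas as pd

# Illustrative frame with missing values in B and C that should survive.
df = pd.DataFrame(
    [['vt 40462', 5, 6], [5, np.nan, 6], ['vl 6450', 5, 6], [1, 2, np.nan]],
    columns=['A', 'B', 'C'],
)

# True where A is a string starting with a letter.
# na=False maps the non-string (integer) entries to False instead of NaN.
is_label = df['A'].str.contains('^[A-Za-z]', na=False)

# Keep A only on the label rows (NaN elsewhere), then forward-fill that one Series.
df['D'] = df['A'].where(is_label).ffill()

print(df)
```

Here the NaN values in B and C are left untouched because only the D Series is filled.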
Or, more or less equivalently, but in 2 steps:
df.loc[df.A.astype(str).str.contains('^[A-Za-z]'), 'D'] = df.A
df = df.ffill()  # ffill returns a new DataFrame, so reassign to keep the result