Conditionally fill column based off values in other columns in a pandas df
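This question is similar to several existing questions about conditionally filling columns, but my df is a little more complicated.
I have a df whose columns contain floats and strings. I'm trying to conditionally fill the float columns based on the strings.
Given the df below:
If the value in Code starts with A, I want to keep the values as they are.
If the value in Code starts with B, I want to keep the first value and return nan's for the following rows until the next value in Code.
If the value in Code starts with C, I want to keep the same first value until the next float in ['Numx','Numy'].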
import pandas as pd
import numpy as np
d = ({
'Code' :['A1','A1','','B1','B1','A2','A2','','B2','B2','','A3','A3','A3','','B1','','B4','B4','A2','A2','A1','A1','','B4','B4','C1','C1','','','D1','','B2'],
'Numx' : [30.2,30.5,30.6,35.6,40.2,45.5,46.1,48.1,48.5,42.2,'',30.5,30.6,35.6,40.2,45.5,'',48.1,48.5,42.2, 40.1,48.5,42.2,'',48.5,42.2,43.1,44.1,'','','','',45.1],
'Numy' : [1.9,2.3,2.5,2.2,2.5,3.1,3.4,3.6,3.7,5.4,'',2.3,2.5,2.2,2.5,3.1,'',3.6,3.7,5.4,6.5,8.5,2.2,'',8.5,2.2,2.3,2.5,'','','','',3.2]
})
df = pd.DataFrame(data = d)
Output:
    Code  Numx  Numy
0     A1  30.2   1.9
1     A1  30.5   2.3
2         30.6   2.5
3     B1  35.6   2.2
4     B1  40.2   2.5
5     A2  45.5   3.1
6     A2  46.1   3.4
7         48.1   3.6
8     B2  48.5   3.7
9     B2  42.2   5.4
10         nan   nan
11    A3  30.5   2.3
12    A3  30.6   2.5
13    A3  35.6   2.2
14        40.2   2.5
15    B1  45.5   3.1
16         nan   nan
17    B4  48.1   3.6
18    B4  48.5   3.7
19    A2  42.2   5.4
20    A2  40.1   6.5
21    A1  48.5   8.5
22    A1  42.2   2.2
23         nan   nan
24    B4  48.5   8.5
25    B4  42.2   2.2
26    C1  43.1   2.3
27    C1  44.1   2.5
28         nan   nan
29         nan   nan
30    D1   nan   nan
31         nan   nan
32    B2  45.1   3.2
For the case where the Code value starts with B, I was thinking of something like this:
df['Numx'] = np.where(df['Code'] == 'B-'.ffill())
df['Numy'] = np.where(df['Code'] == 'B-'.ffill())
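As a side note, the two lines above will not run as written: `.ffill()` is a pandas method rather than a string method, and to fill a column `np.where` needs a condition plus two alternatives. A minimal sketch of the B rule on a small made-up frame (the frame and the names `demo`, `is_b`, and `repeat_b` are illustrative assumptions, not part of the question's data) could look like this:

```python
import numpy as np
import pandas as pd

# Small hypothetical frame, not the full data from the question.
demo = pd.DataFrame({
    "Code": ["A1", "B1", "B1", "A2"],
    "Numx": [30.2, 35.6, 40.2, 45.5],
})

# True where Code starts with "B"; na=False treats missing codes as no match.
is_b = demo["Code"].str.startswith("B", na=False)

# A repeated B row is a B row whose previous row carries the same Code.
repeat_b = is_b & (demo["Code"] == demo["Code"].shift())

# np.where(condition, value_if_true, value_if_false)
demo["Numx_new"] = np.where(repeat_b, np.nan, demo["Numx"])
print(demo)
```

Here only the second `B1` row is blanked; the first `B1` row and the A rows keep their values.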
So my intended output would be:
    Code  Numx  Numy
0     A1  30.2   1.9
1     A1  30.5   2.3
2         30.6   2.5
3     B1  35.6   2.2
4     B1   nan   nan
5     A2  45.5   3.1
6     A2  46.1   3.4
7         48.1   3.6
8     B2  48.5   3.7
9     B2   nan   nan
10         nan   nan
11    A3  30.5   2.3
12    A3  30.6   2.5
13    A3  35.6   2.2
14        40.2   2.5
15    B1  45.5   3.1
16         nan   nan
17    B4  48.1   3.6
18    B4   nan   nan
19    A2  42.2   5.4
20    A2  40.1   6.5
21    A1  48.5   8.5
22    A1  42.2   2.2
23         nan   nan
24    B4  48.5   8.5
25    B4   nan   nan
26    C1  43.1   2.3
27    C1  43.1   2.3
28        43.1   2.3
29        43.1   2.3
30    D1  43.1   2.3
31        43.1   2.3
32    B2  45.1   3.2
I believe you need:
df['Code_new'] = df['Code'].where(df['Code'].isin(['AA','BB'])).ffill()
df[['Numx','Numy']] = df[['Numx','Numy']].mask(df['Code_new'].duplicated())
mask = df['Code_new'] == 'BB'
df.loc[mask, ['Numx','Numy']] = df.loc[mask, ['Numx','Numy']].ffill()
print (df)
   Code  Numx  Numy  Code_new
0    AA  30.2   1.9        AA
1         NaN   NaN        AA
2         NaN   NaN        AA
3    BB  35.6   2.2        BB
4        35.6   2.2        BB
5        35.6   2.2        BB
6        35.6   2.2        BB
7    CC  35.6   2.2        BB
8        35.6   2.2        BB
9    DD  35.6   2.2        BB
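To see what the `where(...).ffill()` and `duplicated()` steps do in isolation, here is a small sketch on a made-up Series (the values and the names `code`, `code_new`, and `is_repeat` are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical Code column: blanks and an unlisted label ("CC") in between.
code = pd.Series(["AA", "", "BB", "", "CC", ""])

# Keep only the listed labels (everything else becomes NaN), then carry
# the last kept label forward over the gaps.
code_new = code.where(code.isin(["AA", "BB"])).ffill()
print(code_new.tolist())   # ['AA', 'AA', 'BB', 'BB', 'BB', 'BB']

# duplicated() marks every occurrence of a label after its first one.
is_repeat = code_new.duplicated()
print(is_repeat.tolist())  # [False, True, False, True, True, True]
```

Note that `duplicated()` looks at the whole column, so a label that reappears later in a separate block is also flagged as a repeat.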
Or:
df = df.replace('nan', np.nan)
df['Code_new'] = df['Code'].where(df['Code'].isin(['AA','BB'])).ffill()
m1 = df['Code_new'].duplicated() & (df['Code_new'] == 'AA')
df[['Numx','Numy']] = df[['Numx','Numy']].mask(m1)
m2 = df['Code_new'] == 'BB'
df.loc[m2, ['Numx','Numy']] = df.loc[m2, ['Numx','Numy']].ffill()
print (df)
   Code  Numx  Numy  Code_new
0    AA  30.2   1.9        AA
1         NaN   NaN        AA
2         NaN   NaN        AA
3    BB  35.6   2.2        BB
4        40.2   2.5        BB
5        45.5   3.1        BB
6        45.5   3.1        BB
7    CC  45.5   3.1        BB
8        45.5   3.1        BB
9    DD  42.2   5.4        BB
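The snippets above use the labels AA and BB rather than the A*, B*, C* codes from the question. As a rough sketch only, the same ideas (forward-filling Code, blanking repeated rows, then forward-filling values) might be adapted to prefix-based rules as follows. The function name `fill_by_code_prefix`, the helper variables, and the reading of the three rules are assumptions made for illustration, not part of the original answer. In particular, it assumes that blank Code rows belong to the preceding code and that a C value is carried forward until the next row that already holds a number.

```python
import numpy as np
import pandas as pd

def fill_by_code_prefix(df, value_cols=("Numx", "Numy")):
    """Hypothetical helper: apply the A/B/C prefix rules from the question."""
    out = df.copy()
    cols = list(value_cols)

    # Treat empty strings as missing and make the value columns numeric.
    out["Code"] = out["Code"].mask(out["Code"] == "")
    for col in cols:
        out[col] = pd.to_numeric(out[col], errors="coerce")

    # Assumption: a blank Code row belongs to the code above it.
    code_ff = out["Code"].ffill()
    first_of_run = code_ff != code_ff.shift()
    prefix = code_ff.str[0]

    # B and C: blank every row of a run except its first one.
    later_rows = prefix.isin(["B", "C"]) & ~first_of_run
    out.loc[later_rows, cols] = np.nan

    # C: carry the first value forward until a row that has its own numbers.
    has_value = out[cols].notna().all(axis=1)
    source_prefix = prefix.where(has_value).ffill()
    from_c = source_prefix == "C"
    out.loc[from_c, cols] = out[cols].ffill().loc[from_c]
    return out

# Data copied from the question.
df = pd.DataFrame({
    "Code": ['A1','A1','','B1','B1','A2','A2','','B2','B2','','A3','A3','A3',
             '','B1','','B4','B4','A2','A2','A1','A1','','B4','B4','C1','C1',
             '','','D1','','B2'],
    "Numx": [30.2,30.5,30.6,35.6,40.2,45.5,46.1,48.1,48.5,42.2,'',30.5,30.6,
             35.6,40.2,45.5,'',48.1,48.5,42.2,40.1,48.5,42.2,'',48.5,42.2,
             43.1,44.1,'','','','',45.1],
    "Numy": [1.9,2.3,2.5,2.2,2.5,3.1,3.4,3.6,3.7,5.4,'',2.3,2.5,2.2,2.5,3.1,
             '',3.6,3.7,5.4,6.5,8.5,2.2,'',8.5,2.2,2.3,2.5,'','','','',3.2],
})

result = fill_by_code_prefix(df)
print(result)
```

Under these assumptions the A rows are left untouched, the second row of each B run becomes NaN, and rows 27 to 31 take the first C1 values (43.1 and 2.3). Two adjacent runs with an identical code would be treated as one run, so the rule for detecting runs may need adjusting for other data.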