Fastest way to fill multiple columns based on a condition on other columns in pandas
I'm working with a very long dataframe, so I'm looking for the fastest way to fill several columns at once given certain conditions. So let's say you have this dataframe:
```python
import pandas as pd

data = {
    'col_A1': [1, '', '', ''],
    'col_A2': ['', '', '', ''],
    'col_A3': ['', '', '', ''],
    'col_B1': ['', '', 1, ''],
    'col_B2': ['', '', '', ''],
    'col_B3': ['', '', '', ''],
    'col_C1': [1, 1, '', ''],
    'col_C2': ['', '', '', ''],
    'col_C3': ['', '', '', ''],
}
df = pd.DataFrame(data)
print(df)
```
Input:

|   | col_A1 | col_A2 | col_A3 | col_B1 | col_B2 | col_B3 | col_C1 | col_C2 | col_C3 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 |  |  |  |  |  | 1 |  |  |
| 1 |  |  |  |  |  |  | 1 |  |  |
| 2 |  |  |  | 1 |  |  |  |  |  |
| 3 |  |  |  |  |  |  |  |  |  |
We want to find all the 1 values in columns A1, B1 and C1 and then, in each matching row, also fill the other columns of the same group (A2, A3; B2, B3; C2, C3) with their column number:
Output:

|   | col_A1 | col_A2 | col_A3 | col_B1 | col_B2 | col_B3 | col_C1 | col_C2 | col_C3 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2 | 3 |  |  |  | 1 | 2 | 3 |
| 1 |  |  |  |  |  |  | 1 | 2 | 3 |
| 2 |  |  |  | 1 | 2 | 3 |  |  |  |
| 3 |  |  |  |  |  |  |  |  |  |
I am currently iterating over the groups: for A, I look for the rows where A1 == 1 and replace the values of A2 and A3 in those rows, then do the same for B, C, and so on. But speed is important, so I'm wondering whether I can do this for all columns at once, or in a more vectorized way.
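For reference, a per-group loop like the one described above might look roughly like this. The asker's actual code isn't shown, so this is a hypothetical reconstruction: `fill_groups_loop` and its `letters`/`others` parameters are made-up names, and it assumes the `col_<letter><number>` naming from the example.

```python
import pandas as pd

# Example data from the question
data = {
    'col_A1': [1, '', '', ''], 'col_A2': ['', '', '', ''], 'col_A3': ['', '', '', ''],
    'col_B1': ['', '', 1, ''], 'col_B2': ['', '', '', ''], 'col_B3': ['', '', '', ''],
    'col_C1': [1, 1, '', ''], 'col_C2': ['', '', '', ''], 'col_C3': ['', '', '', ''],
}
df = pd.DataFrame(data)


def fill_groups_loop(df, letters=('A', 'B', 'C'), others=(2, 3)):
    """Hypothetical per-group loop: for each letter, find the rows where
    col_<letter>1 == 1 and write the column number into col_<letter>2, ..."""
    out = df.copy()
    for letter in letters:
        hit = out[f'col_{letter}1'].eq(1)  # boolean row mask for this group
        for n in others:
            out.loc[hit, f'col_{letter}{n}'] = n
    return out


print(fill_groups_loop(df))
```

Each group costs one boolean mask plus one `.loc` assignment per target column, so the number of pandas calls grows with the number of groups; that per-call overhead is what a vectorized approach tries to avoid.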
You can use:
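```python
import numpy as np
import pandas as pd

# extract letters/numbers from column names
nums = df.columns.str.extract(r'(\d+)$', expand=False)
# ['1', '2', '3', '1', '2', '3', '1', '2', '3']
letters = df.columns.str.extract(r'_(\D)', expand=False)
# ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C']
# or in a single line
# letters, nums = df.columns.str.extract(r'(\D)(\d+)$').T.to_numpy()

# compute a mask of values to fill: within each letter group, cummax makes
# every column to the right of a non-empty cell True as well
# (groupby(..., axis=1) is deprecated since pandas 2.1, so transpose instead)
mask = df.ne('').T.groupby(letters).cummax().T
# NB. alternatively use df.eq(1) to only start filling at cells equal to 1

# set the values: the column number where the mask is True, '' elsewhere
df2 = pd.DataFrame(np.where(mask, nums.to_numpy(), ''),
                   index=df.index, columns=df.columns)
```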
output:

```
   col_A1  col_A2  col_A3  col_B1  col_B2  col_B3  col_C1  col_C2  col_C3
0       1       2       3                               1       2       3
1                                                       1       2       3
2                               1       2       3
3
```
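The `cummax` mask above starts filling at any non-empty cell and carries it to the right within its group, which gives the requested result for this data. If only a 1 in the first column of each group should trigger the fill, and the columns are laid out regularly (every group has the same number of columns, ordered A1, A2, A3, B1, ...), a pure NumPy reshape avoids the groupby entirely. This is a sketch under those assumptions, not benchmarked against the version above; `fill_from_first_column` and `n_per_group` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Example data from the question
data = {
    'col_A1': [1, '', '', ''], 'col_A2': ['', '', '', ''], 'col_A3': ['', '', '', ''],
    'col_B1': ['', '', 1, ''], 'col_B2': ['', '', '', ''], 'col_B3': ['', '', '', ''],
    'col_C1': [1, 1, '', ''], 'col_C2': ['', '', '', ''], 'col_C3': ['', '', '', ''],
}
df = pd.DataFrame(data)


def fill_from_first_column(df, n_per_group):
    """Hypothetical helper. Assumes the columns are ordered group by group
    (A1, A2, A3, B1, B2, B3, ...) and every group has n_per_group columns."""
    n_rows, n_cols = df.shape
    assert n_cols % n_per_group == 0, "columns must split evenly into groups"
    # View the values as (rows, groups, columns-per-group)
    cube = df.to_numpy(dtype=object).reshape(n_rows, n_cols // n_per_group, n_per_group)
    # True where the first column of a group (A1, B1, C1, ...) holds 1
    trigger = cube[:, :, 0] == 1
    # Column numbers '1', '2', ... broadcast across rows and groups
    labels = np.arange(1, n_per_group + 1).astype(str)
    filled = np.where(trigger[:, :, None], labels, '')
    return pd.DataFrame(filled.reshape(n_rows, n_cols), index=df.index, columns=df.columns)


print(fill_from_first_column(df, 3))
```

With `n_per_group=3` this reproduces the expected output from the question. Because it only looks at the first column of each group, a stray value in, say, `col_A2` would not trigger a fill here, whereas the `cummax` mask would also fill `col_A3` from it.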
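Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.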