How to populate a column based on the values of multiple columns in Python?

Question

I have a DataFrame and I am trying to populate one column based on the values in the other columns. There should be only one Sxx per row, but it can be in a different column in each row: in row 1, S11 might be in column 4, while in row 2, S15 might be in column 5. I want to collect all of the Sxx values into a single column at the end. Any help would be much appreciated!

Here is an example of the DataFrame. The second-to-last and last rows are typical of the problem I am trying to solve: the Sxx value is in Entity4 in one row and in Entity3 in the next.

       Entity1 Entity2 Entity3 Entity4 Entity5 Entity6 Entity7 Entity8 Entity9 School
0          C12     CAD     G01     S09    None    None    None    None    None    NaN
1          C12     CAD     G01     S09    None    None    None    None    None    NaN
2          C12     CAD     G01     S09    None    None    None    None    None    NaN
3          C12     CAD     G01     S09    None    None    None    None    None    NaN
4          C12     CAD     G01     S09    None    None    None    None    None    NaN
...        ...     ...     ...     ...     ...     ...     ...     ...     ...    ...
322976     C07     CAD     G01     S09    None    None    None    None    None    NaN
322977     C13     CAD     G01     S06    None    None    None    None    None    NaN
322978     C13     CAD     G01     S06    None    None    None    None    None    NaN
322979     C13     CAD     G01     S06    None    None    None    None    None    NaN
322980     CAD     G01     S14     W04    None    None    None    None    None    NaN
322981 rows × 10 columns
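A vectorized sketch of the task described above, filling the existing School column with the first Sxx found in each row. It assumes an Sxx code is the letter S followed by exactly two digits and that only columns whose names start with "Entity" should be searched. The sample rows are modelled on the printout (Entity6 to Entity9 omitted), and the third row containing X01 is hypothetical, added to show that a row with no Sxx gets a missing value rather than a wrong pick.

```python
import numpy as np
import pandas as pd

# Sample rows modelled on the question's printout (Entity6-9 omitted for brevity).
# The third row is hypothetical and contains no Sxx code.
df = pd.DataFrame(
    [
        ["C13", "CAD", "G01", "S06", None],
        ["CAD", "G01", "S14", "W04", None],
        ["C12", "CAD", "G01", "X01", None],
    ],
    columns=["Entity1", "Entity2", "Entity3", "Entity4", "Entity5"],
)
df["School"] = np.nan

entity_cols = [c for c in df.columns if c.startswith("Entity")]

# astype(str) guarantees every column supports .str; na=False guards any
# remaining missing values. The result is a boolean array (rows x entity columns).
mask = (
    df[entity_cols]
    .astype(str)
    .apply(lambda col: col.str.fullmatch(r"S\d{2}", na=False))
    .to_numpy()
)

values = df[entity_cols].to_numpy()
first_hit = mask.argmax(axis=1)  # column index of the first True in each row
picked = values[np.arange(len(df)), first_hit]

# argmax returns 0 for a row with no True, so blank those rows out explicitly.
df["School"] = np.where(mask.any(axis=1), picked, None)
print(df)
```

The boolean mask plus argmax avoids a Python-level loop over all 322,981 rows; the mask.any(axis=1) check is needed because argmax cannot distinguish "first column matched" from "nothing matched".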

Answer 1

Here is one way of doing it with a regular expression; it is probably not the most efficient approach. It assumes each row contains exactly one Sxx and that your DataFrame is named data_df:

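import re

import pandas as pd

last_col = []
for _, row in data_df.iterrows():
    for cell in row.to_list():
        # Skip None/NaN cells: re.match() raises TypeError on non-strings.
        if isinstance(cell, str) and re.match(r'S[0-9]+', cell):
            last_col.append(cell)
            break
    else:
        # No Sxx found in this row: append None so the list stays
        # the same length as the DataFrame.
        last_col.append(None)

data_df['last_col'] = last_col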
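The same row-by-row idea can also be written as a function passed to DataFrame.apply(axis=1), which avoids building the list by hand. This is a sketch rather than the answerer's code: first_s_code and S_CODE are hypothetical names, and the pattern uses fullmatch with exactly two digits, which is stricter than re.match('S[0-9]+') (re.match only anchors at the start of the string, so it would also accept a value such as S123X).

```python
import re

import pandas as pd

# Sxx = "S" followed by exactly two digits (assumed from the sample data).
S_CODE = re.compile(r"S[0-9]{2}")


def first_s_code(row):
    """Return the first cell in the row that is an Sxx code, or None."""
    for cell in row:
        # None/NaN cells are not strings and cannot be Sxx codes.
        if isinstance(cell, str) and S_CODE.fullmatch(cell):
            return cell
    return None


# Sample rows modelled on the question's printout (Entity6-9 omitted).
data_df = pd.DataFrame(
    [
        ["C12", "CAD", "G01", "S09", None],
        ["C13", "CAD", "G01", "S06", None],
        ["CAD", "G01", "S14", "W04", None],
    ],
    columns=["Entity1", "Entity2", "Entity3", "Entity4", "Entity5"],
)

data_df["last_col"] = data_df.apply(first_s_code, axis=1)
print(data_df)
```

Like the loop, apply(axis=1) still runs Python code once per row, so it is not expected to be faster; its advantages are readability and returning None for rows that have no Sxx.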

Answer 2

Find the locations of the Sxx values with apply() and a lambda that calls str.match, propagate them to the last column with ffill(axis=1), then add that last column (.iloc[:,-1:]) to the DataFrame.

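# na=False makes None/NaN cells count as non-matches, so the mask is all booleans
mask = df.apply(lambda x: x.str.match('S[0-9]{2}', na=False), axis=1)
# Keep only the Sxx cells, forward-fill them across each row, take the last column
df['Last_S_Col'] = df[mask].ffill(axis=1).iloc[:, -1:]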

df:

  Entity1 Entity2 Entity3 Entity4 Entity5 Entity6 Entity7 Entity8 Entity9  \
0     C12     CAD     S12     G01    None    None    None    None    None   
1     C12     CAD     G01     S09    None    None    None    None    None   
2     C12     CAD     G01     S09    None    None    None    None    None   
3     C12     S14     G01     CAD    None    None    None    None    None   
4     S01     CAD     G01     C12    None    None    None    None    None   

   School Last_S_Col  
0     NaN        S12  
1     NaN        S09  
2     NaN        S09  
3     NaN        S14  
4     NaN        S01
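A self-contained reproduction of this answer on the five sample rows it prints, assuming Entity5 to Entity9 are None and School is NaN, as shown. It passes na=False to str.match so missing cells become False in the mask, and it takes the last column with .iloc[:, -1] (a Series) instead of .iloc[:, -1:] (a one-column DataFrame), which keeps the assignment unambiguous.

```python
import pandas as pd

# The five sample rows from the answer's printout; Entity5-Entity9 are None
# and School is NaN, so School is the last column.
rows = [
    ["C12", "CAD", "S12", "G01"],
    ["C12", "CAD", "G01", "S09"],
    ["C12", "CAD", "G01", "S09"],
    ["C12", "S14", "G01", "CAD"],
    ["S01", "CAD", "G01", "C12"],
]
df = pd.DataFrame(
    [r + [None] * 5 + [float("nan")] for r in rows],
    columns=[f"Entity{i}" for i in range(1, 10)] + ["School"],
)

# True where a cell starts with S plus two digits; missing cells count as False.
mask = df.apply(lambda x: x.str.match("S[0-9]{2}", na=False), axis=1)

# Blank out non-Sxx cells, carry each row's Sxx rightwards, keep the last column.
df["Last_S_Col"] = df[mask].ffill(axis=1).iloc[:, -1]
print(df["Last_S_Col"].tolist())
```

Because ffill carries values to the right, a row with several Sxx values would end up with the rightmost one, which matches the Last_S_Col name.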

2 answers

Answer 1: accepted, 0 votes, posted 2021-04-13 22:28:30
Answer 2: 0 votes, posted 2021-04-13 23:29:58
