
How to populate a column based on values from multiple columns in python?

I have a data frame where I am trying to populate a column based on the values in the other columns. There should only be one Sxx per row, but it could be in a different column in each row. So in row 1, S11 could be in column 4, but in row 2, S15 could be in column 5. What I want to do is take all of the Sxx values and put them in one column at the end. Any help here would be much appreciated!

Here is an example of the data frame. The second-to-last and last rows show what I am trying to solve: the Sxx value sits in a different column in each of them.

        Entity1 Entity2 Entity3 Entity4 Entity5 Entity6 Entity7 Entity8 Entity9 School
0       C12     CAD     G01     S09     None    None    None    None    None    NaN
1       C12     CAD     G01     S09     None    None    None    None    None    NaN
2       C12     CAD     G01     S09     None    None    None    None    None    NaN
3       C12     CAD     G01     S09     None    None    None    None    None    NaN
4       C12     CAD     G01     S09     None    None    None    None    None    NaN
...     ...     ...     ...     ...     ...     ...     ...     ...     ...     ...
322976  C07     CAD     G01     S09     None    None    None    None    None    NaN
322977  C13     CAD     G01     S06     None    None    None    None    None    NaN
322978  C13     CAD     G01     S06     None    None    None    None    None    NaN
322979  C13     CAD     G01     S06     None    None    None    None    None    NaN
322980  CAD     G01     S14     W04     None    None    None    None    None    NaN

322981 rows × 10 columns

Here is one way of doing it using regex; it is probably not the most efficient. It assumes there is always one Sxx in each row. Assuming your DataFrame is data_df:

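import pandas as pd
import re

last_col = list()
for index, row in data_df.iterrows():
    for cell in row.to_list():
        # Skip None/NaN cells: re.match raises TypeError on non-strings
        if isinstance(cell, str) and re.match('S[0-9]+', cell):
            last_col.append(cell)
            break
    else:
        # No Sxx found in this row; keep last_col aligned with the rows
        last_col.append(None)

data_df['last_col'] = last_col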
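
The same row-by-row idea can also be written as a helper passed to apply. This is only a minimal sketch: the sample frame and the name first_school_code are made up for illustration, and it uses fullmatch (stricter than match) so that only cells consisting entirely of "S" plus digits count.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical sample mirroring the question's layout (not the real data).
data_df = pd.DataFrame(
    {
        "Entity1": ["C12", "C13", "CAD"],
        "Entity2": ["CAD", "CAD", "G01"],
        "Entity3": ["G01", "G01", "S14"],
        "Entity4": ["S09", "S06", "W04"],
        "Entity5": [None, None, None],
        "School": [np.nan, np.nan, np.nan],
    }
)

SCHOOL_CODE = re.compile(r"S[0-9]+")


def first_school_code(row):
    """Return the first cell in the row that looks like Sxx, else None."""
    for cell in row:
        # None/NaN cells are skipped because the regex only accepts strings.
        if isinstance(cell, str) and SCHOOL_CODE.fullmatch(cell):
            return cell
    return None


data_df["last_col"] = data_df.apply(first_school_code, axis=1)
print(data_df["last_col"].tolist())
```

Rows without any Sxx value end up with None in last_col instead of raising an error.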

Find the locations of the Sxx values with apply and a lambda that calls str.match, then propagate those values to the last column with ffill(axis=1), then add that last column ( .iloc[:,-1:] ) to the DF.

df['Last_S_Col'] = \
    df[df.apply(lambda x: x.str.match('S[0-9]{2}'), axis=1)].ffill(axis=1).iloc[:,-1:]

Resulting df:

  Entity1 Entity2 Entity3 Entity4 Entity5 Entity6 Entity7 Entity8 Entity9  \
0     C12     CAD     S12     G01    None    None    None    None    None   
1     C12     CAD     G01     S09    None    None    None    None    None   
2     C12     CAD     G01     S09    None    None    None    None    None   
3     C12     S14     G01     CAD    None    None    None    None    None   
4     S01     CAD     G01     C12    None    None    None    None    None   

   School Last_S_Col  
0     NaN        S12  
1     NaN        S09  
2     NaN        S09  
3     NaN        S14  
4     NaN        S01  
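
To try the mask-and-fill idea end to end, here is a self-contained sketch. It rebuilds a small frame from the output above (Entity6 to Entity9 are left out for brevity) and, as an assumption of this sketch rather than part of the answer, restricts the search to columns whose names start with "Entity", builds the mask column by column, and uses bfill so the match lands in the first column instead of the last.

```python
import numpy as np
import pandas as pd

# Small frame rebuilt from the output shown above (Entity6..Entity9 omitted).
df = pd.DataFrame(
    {
        "Entity1": ["C12", "C12", "C12", "C12", "S01"],
        "Entity2": ["CAD", "CAD", "CAD", "S14", "CAD"],
        "Entity3": ["S12", "G01", "G01", "G01", "G01"],
        "Entity4": ["G01", "S09", "S09", "CAD", "C12"],
        "Entity5": [None] * 5,
        "School": [np.nan] * 5,
    }
)

# Assumption: only the Entity* columns can hold an Sxx value.
entity_cols = [c for c in df.columns if c.startswith("Entity")]
entities = df[entity_cols]

# True where a cell is exactly "S" plus two digits; None/NaN count as no match.
mask = entities.apply(lambda col: col.str.fullmatch(r"S[0-9]{2}", na=False))

# Blank out non-matching cells, pull the surviving value to the left edge, take it.
df["Last_S_Col"] = entities.where(mask).bfill(axis=1).iloc[:, 0]
print(df["Last_S_Col"].tolist())
```

Taking .iloc[:, 0] returns a Series, so the assignment does not rely on pandas accepting a one-column DataFrame.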
