How to populate a column based on values from multiple columns in Python?
I have a data frame in which I am trying to populate a column based on the values in the other columns. There should be only one Sxx per row, but it can be in a different column in each row: in row 1, S11 could be in column 4, while in row 2, S15 could be in column 5. What I want to do is take all of the Sxx values and put them in one column at the end. Any help here would be much appreciated!
Here is an example of the data frame. The last row and the second-to-last row illustrate the problem I am trying to solve.
Entity1 Entity2 Entity3 Entity4 Entity5 Entity6 Entity7 Entity8 Entity9 School
0 C12 CAD G01 S09 None None None None None NaN
1 C12 CAD G01 S09 None None None None None NaN
2 C12 CAD G01 S09 None None None None None NaN
3 C12 CAD G01 S09 None None None None None NaN
4 C12 CAD G01 S09 None None None None None NaN
... ... ... ... ... ... ... ... ... ... ...
322976 C07 CAD G01 S09 None None None None None NaN
322977 C13 CAD G01 S06 None None None None None NaN
322978 C13 CAD G01 S06 None None None None None NaN
322979 C13 CAD G01 S06 None None None None None NaN
322980 CAD G01 S14 W04 None None None None None NaN
322981 rows × 10 columns
Here is one way of doing it with a regex; it is probably not the most efficient. It assumes there is exactly one Sxx in each row and that your DataFrame is named data_df:
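import re
import pandas as pd

last_col = []
for index, row in data_df.iterrows():
    for cell in row.to_list():
        # Skip None/NaN cells: re.match raises TypeError on non-strings
        if isinstance(cell, str) and re.match(r'S[0-9]+', cell):
            last_col.append(cell)
            break
    else:
        # No Sxx found in this row; keep last_col aligned with the rows
        last_col.append(None)
data_df['last_col'] = last_col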
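The row-by-row loop above can be slow on a frame with several hundred thousand rows. As a rough sketch of a vectorized alternative (not part of the original answer), the Entity columns can be stacked into one long Series, filtered with a regex, and reduced back to one value per row. The sample frame, the `entity_cols` list and the `s_codes` name are made up for illustration, and the code assumes the columns to search all start with "Entity".

```python
import pandas as pd

# Hypothetical sample mirroring the question's layout (not the real data).
data_df = pd.DataFrame(
    {
        "Entity1": ["C12", "C13", "CAD", "C07"],
        "Entity2": ["CAD", "CAD", "G01", "CAD"],
        "Entity3": ["G01", "G01", "S14", "G01"],
        "Entity4": ["S09", "S06", "W04", "W02"],
        "Entity5": [None, None, None, None],
    }
)

# Assumption: only the Entity* columns should be searched.
entity_cols = [c for c in data_df.columns if c.startswith("Entity")]

# stack() reshapes to one value per (row, column) pair; drop the missing cells.
cells = data_df[entity_cols].stack().dropna()

# Keep only the cells that are exactly 'S' followed by digits.
s_codes = cells[cells.str.fullmatch(r"S[0-9]+")]

# First match per original row; rows without a match become NaN on assignment.
data_df["School"] = s_codes.groupby(level=0).first()

print(data_df)
```

Unlike the loop, this leaves NaN in School for a row that contains no Sxx value (the last sample row here) instead of relying on every row having one.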
Find the locations of the Sxx values with apply and a lambda that calls str.match, propagate those values to the last column with ffill(axis=1), then add that last column ( .iloc[:,-1:] ) to the DataFrame.
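# Mask the Sxx cells (missing cells count as no match), forward-fill along each row, take the last column
mask = df.apply(lambda x: x.str.match('S[0-9]{2}', na=False), axis=1)
df['Last_S_Col'] = df[mask].ffill(axis=1).iloc[:,-1:]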
df:
Entity1 Entity2 Entity3 Entity4 Entity5 Entity6 Entity7 Entity8 Entity9 \
0 C12 CAD S12 G01 None None None None None
1 C12 CAD G01 S09 None None None None None
2 C12 CAD G01 S09 None None None None None
3 C12 S14 G01 CAD None None None None None
4 S01 CAD G01 C12 None None None None None
School Last_S_Col
0 NaN S12
1 NaN S09
2 NaN S09
3 NaN S14
4 NaN S01
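To try the mask-and-ffill idea in isolation, here is a minimal self-contained sketch built from the sample rows in the output above. It is an illustration under assumptions rather than the answerer's exact code: the mask is computed column by column over string-valued Entity columns only (the all-NaN School column is left out, since `.str` is not available on a float column), and `codes_per_row` is a hypothetical helper name used to check the one-Sxx-per-row assumption before relying on it.

```python
import pandas as pd

# Sample rows copied from the output above (Entity1 to Entity5 only).
df = pd.DataFrame(
    {
        "Entity1": ["C12", "C12", "C12", "C12", "S01"],
        "Entity2": ["CAD", "CAD", "CAD", "S14", "CAD"],
        "Entity3": ["S12", "G01", "G01", "G01", "G01"],
        "Entity4": ["G01", "S09", "S09", "CAD", "C12"],
        "Entity5": [None] * 5,
    }
)

# Boolean mask of the cells that look like Sxx; missing cells count as False.
mask = df.apply(lambda col: col.str.match(r"S[0-9]{2}", na=False))

# Number of Sxx codes found in each row; the approach assumes this is 1.
codes_per_row = mask.sum(axis=1)
assert (codes_per_row == 1).all()

# Blank out everything else, forward-fill along each row, keep the last column.
df["Last_S_Col"] = df.where(mask).ffill(axis=1).iloc[:, -1]

print(df["Last_S_Col"].tolist())
```

If a row held two Sxx values, the forward fill would keep the rightmost one, so the count check is worth running once on the real data.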