Pandas dataframe, in a row, to find the max in selected column, and find value of another column based on that
I have a dataframe like this:
import pandas as pd
df = pd.DataFrame({'x1':[20,25],'y1':[5,8],'x2':[22,27],'y2':[10,2]})
   x1  y1  x2  y2
0  20   5  22  10
1  25   8  27   2
X and Y pair together. I need to compare y1 and y2 and get the max in every row, then find the corresponding x. Hence the max of row [0] is y2 (=10), and the corresponding x is x2 (=22). For the second row it is y1 (=8) and x1 (=25).
Expected result, new columns x and y:
   x1  y1  x2  y2   x   y
0  20   5  22  10  22  10
1  25   8  27   2  25   8
This is a simple dataframe I made to illustrate the question. In my case there can be as many as 30 X and Y pairs.
import numpy as np

# get a hold on "y*" columns
y_cols = df.filter(like="y")
# get the maximal y-values' suffixes, and then add from front "x" to them
max_x_vals = y_cols.idxmax(axis=1).str.extract(r"(\d+)$", expand=False).radd("x")
# get the locations of those x* values
max_x_ids = df.columns.get_indexer(max_x_vals)
# now we have the indexes of x*'s in the columns; NumPy's indexing
# helps to get a cross section
df["max_xs"] = df.to_numpy()[np.arange(len(df)), max_x_ids]
# for y*'s, it's directly the maximum per row
df["max_ys"] = y_cols.max(axis=1)
to get
>>> df
   x1  y1  x2  y2  max_xs  max_ys
0  20   5  22  10      22      10
1  25   8  27   2      25       8
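Since the question mentions up to 30 pairs, here is a minimal sketch of the same steps on a made-up frame. The random data, the seed, and the names `n_pairs` and `n_rows` are assumptions for illustration only, not from the question; it also assumes every column is named `x<k>` or `y<k>`.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 30 (x, y) pairs, 5 rows, random integers
rng = np.random.default_rng(0)
n_pairs, n_rows = 30, 5
data = {}
for i in range(1, n_pairs + 1):
    data[f"x{i}"] = rng.integers(0, 100, n_rows)
    data[f"y{i}"] = rng.integers(0, 100, n_rows)
df = pd.DataFrame(data)

# All "y*" columns
y_cols = df.filter(like="y")
# Name of the winning y-column per row -> numeric suffix -> matching x-column name
max_x_names = "x" + y_cols.idxmax(axis=1).str.extract(r"(\d+)$", expand=False)
# Positions of those x-columns, then pick one value per row
max_x_ids = df.columns.get_indexer(max_x_names)
df["max_xs"] = df.to_numpy()[np.arange(len(df)), max_x_ids]
df["max_ys"] = y_cols.max(axis=1)
```

Because the x-column is looked up by its suffix, this should not depend on the order in which the columns appear.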
You can do it with the help of the .apply function.
import pandas as pd
import numpy as np
df = pd.DataFrame({'x1':[20,25],'y1':[5,8],'x2':[22,27],'y2':[10,2]})
y_cols = [col for col in df.columns if col[0] == 'y']
x_cols = [col for col in df.columns if col[0] == 'x']
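def find_corresponding_x(row):
    # position of the largest y within this row's y-columns
    max_y_index = np.argmax(row[y_cols].to_numpy())
    # x_cols and y_cols are in the same pair order, so the position carries over
    return row[x_cols[max_y_index]]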
df['corresponding_x'] = df.apply(find_corresponding_x, axis = 1)
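The function above returns only the x value. If you also want the winning y in the same pass, a variant could look like the sketch below. The names `max_pair` and `pairs` are made up for illustration, and it assumes the x and y columns appear in the same pair order.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x1': [20, 25], 'y1': [5, 8], 'x2': [22, 27], 'y2': [10, 2]})

# Collect the column names before adding any new columns
y_cols = [col for col in df.columns if col.startswith('y')]
x_cols = [col for col in df.columns if col.startswith('x')]

def max_pair(row):
    # Position of the largest y in this row (first one wins on ties)
    pos = int(np.argmax(row[y_cols].to_numpy()))
    # Return both members of the winning pair
    return pd.Series({'x': row[x_cols[pos]], 'y': row[y_cols[pos]]})

# apply with axis=1 and a Series return value yields a DataFrame
pairs = df.apply(max_pair, axis=1)
df['x'] = pairs['x']
df['y'] = pairs['y']
```

Note that apply with axis=1 calls the function once per row, so on large frames it is likely to be slower than the vectorised approaches.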
This is one solution:
a = df[df['y1'] < df['y2']].drop(columns=['y1','x1']).rename(columns={'y2':'y', 'x2':'x'})
b = df[df['y1'] >= df['y2']].drop(columns=['y2','x2']).rename(columns={'y1':'y', 'x1':'x'})
result = pd.concat([a,b])
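If you need to keep the original row order, sort by the original index after concatenation. The boolean filtering keeps the original index labels, so result.sort_index() is enough; alternatively, add a column holding the original index and sort by it.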
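A minimal sketch of the order-preserving variant, for two pairs only. The third row in the data is an assumption added here so that the concatenation actually scrambles the order; it is not part of the question.

```python
import pandas as pd

# Question's data plus one hypothetical third row
df = pd.DataFrame({'x1': [20, 25, 30], 'y1': [5, 8, 1],
                   'x2': [22, 27, 33], 'y2': [10, 2, 7]})

# Rows where the second pair wins, renamed to the common x/y names
a = df[df['y1'] < df['y2']].drop(columns=['y1', 'x1']).rename(columns={'y2': 'y', 'x2': 'x'})
# Rows where the first pair wins (ties go to the first pair)
b = df[df['y1'] >= df['y2']].drop(columns=['y2', 'x2']).rename(columns={'y1': 'y', 'x1': 'x'})

# After concat the index is [0, 2, 1]; sorting by index restores the row order
result = pd.concat([a, b]).sort_index()
```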
You can use the function below. Remember to import pandas and numpy like I did in this code. Import your data set and use the Max_number function.
import pandas as pd
import numpy as np
df = pd.DataFrame({'x1':[20,25],'y1':[5,8],'x2':[22,27],'y2':[10,2]})
def Max_number(df):
    columns = list(df.columns)
    rows = df.shape[0]
    max_value = []
    column_name = []
    for i in range(rows):
        # values of the i-th row as a plain list
        row_array = list(np.array(df[i:i+1])[0])
        maximum = max(row_array)
        max_value.append(maximum)
        # name of the first column holding that maximum
        index = row_array.index(maximum)
        column_name.append(columns[index])
    return pd.DataFrame({"column": column_name, "max_value": max_value})
returns this:
row index | column | max_value |
---|---|---|
0 | x2 | 22 |
1 | x2 | 27 |
If the x1 column comes first, then y1, then x2, y2 and so on, you can just try:
a = df.columns.get_indexer(y_cols.idxmax(axis=1))
df[['y', 'x']] = df.to_numpy()[np.arange(len(df)), [a, a - 1]].T
I hope it works for your solution:
import pandas as pd
df = pd.DataFrame({'x1':[20,25],'y1':[5,8],'x2':[22,27],'y2':[10,2]})
df['x_max'] = df[['x1', 'x2']].max(axis=1)
df['y_max'] = df[['y1', 'y2']].max(axis=1)
df