
Generating a new variable based on the values of other variables

I have the following data set:

import pandas as pd

df = pd.DataFrame({"ID": [1,1,1,1,1,2,2,2,2,2],
                   "TP1": [1,2,3,4,5,9,8,7,6,5],
                   "TP2": [11,22,32,43,53,94,85,76,66,58],
                   "TP10": [114,222,324,443,535,94,385,76,266,548],
                   "count": [1,2,3,4,10,1,2,3,4,10]})
print(df)

I want a "final" variable in df that is based on the ID, TP and count variables.

The final result should look like the following:

import pandas as pd
import numpy as np
df = pd.DataFrame({"ID": [1,1,1,1,1,2,2,2,2,2], "TP1": [1,2,3,4,5,9,8,7,6,5],
                   "TP2": [11,22,32,43,53,94,85,76,66,58], "TP10": [114,222,324,443,535,94,385,76,266,548],
                   "count": [1,2,3,4,10,1,2,3,4,10],
                   "final" : [71,1836,np.nan,np.nan,1993,291,1832,np.nan,np.nan,1810]})
print (df)

So, for example, the loop with the if statement should do the following:

  1. It will look at the ID.
  2. Then, for the first ID, it should look at the value of count. If the value of count is 1,
  3. it should look at the variable TP1, and its first value should be placed in the "final" variable.

The loop will then look at count 2 for ID 1, and the value of TP2 should go into the "final" variable, and so on.

I hope my question is clear. I am looking for a loop because there are 1000 TP variables in the original dataset.

I tried writing something like the following, but it is utterly rubbish:

for col in df.columns:
    if col.startswith('TP') and count == int(col[2:])
        df["Final"] = count
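For reference, the attempt above cannot work as written: the if line is missing its colon, count is never defined (the column is df["count"], and combining a whole Series with "and" raises "the truth value of a Series is ambiguous"), and it assigns the count instead of the TP value. A minimal loop sketch of the same idea, assuming every TP column is named "TP" followed only by digits (so TP1 through TP1000 would all be picked up), could look like this:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"ID": [1,1,1,1,1,2,2,2,2,2],
                   "TP1": [1,2,3,4,5,9,8,7,6,5],
                   "TP2": [11,22,32,43,53,94,85,76,66,58],
                   "TP10": [114,222,324,443,535,94,385,76,266,548],
                   "count": [1,2,3,4,10,1,2,3,4,10]})

# Start with NaN everywhere; rows whose count has no matching TP column stay NaN.
df["final"] = np.nan

for col in df.columns:
    # Only columns named "TP" followed by digits, e.g. TP1, TP2, TP10.
    if col.startswith("TP") and col[2:].isdigit():
        n = int(col[2:])
        mask = df["count"] == n  # rows whose count equals this TP number
        # Copy this TP column's value into "final" for those rows only.
        df.loc[mask, "final"] = df.loc[mask, col]

print(df)
```

Each pass writes only the rows whose count matches the current TP number, so rows with count 3 or 4 (no TP3 or TP4 column in this sample) remain NaN.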

Thanks

If my understanding is correct: if count=1, pick TP1; if count=2, pick TP2; and so on.
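Read literally, that rule is a per-row lookup of the column whose name is "TP" followed by the row's count. A minimal row-wise sketch of just the rule, using DataFrame.apply with Series.get so that a missing column (TP3 and TP4 in this sample) yields NaN instead of a KeyError; this is an illustration of the mapping only and will be slow on large frames:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"ID": [1,1,1,1,1,2,2,2,2,2],
                   "TP1": [1,2,3,4,5,9,8,7,6,5],
                   "TP2": [11,22,32,43,53,94,85,76,66,58],
                   "TP10": [114,222,324,443,535,94,385,76,266,548],
                   "count": [1,2,3,4,10,1,2,3,4,10]})

# For each row, fetch the value in column "TP<count>"; Series.get returns
# the NaN default when that column does not exist.
df["final"] = df.apply(lambda row: row.get(f"TP{int(row['count'])}", np.nan), axis=1)

print(df)
```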

This can be done with numpy.select(). Note that I have added the condition if f"TP{x}" in df.columns because not all of the columns TP1, TP2, TP3, ..., TP10 are present in the dataframe. If all of them are present in your actual dataframe, this if clause is not required.

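import numpy as np

# One boolean condition per existing TP column: count == x selects TPx
conds = [df["count"] == x for x in range(1,11) if f"TP{x}" in df.columns]
# The matching choices, in the same order as the conditions
output = [df[f"TP{x}"] for x in range(1,11) if f"TP{x}" in df.columns]
# Rows where no condition is true get the default, NaN
df["final"] = np.select(conds, output, np.nan)

print(df)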

Output:

   ID  TP1  TP2  TP10  count  final
0   1    1   11   114      1    1.0
1   1    2   22   222      2   22.0
2   1    3   32   324      3    NaN
3   1    4   43   443      4    NaN
4   1    5   53   535     10  535.0
5   2    9   94    94      1    9.0
6   2    8   85   385      2   85.0
7   2    7   76    76      3    NaN
8   2    6   66   266      4    NaN
9   2    5   58   548     10  548.0
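With about 1000 TP columns, the hard-coded range(1,11) would have to become range(1, 1001) and still rely on the membership check. A sketch that instead derives the TP numbers from the column names themselves (assuming the names are exactly "TP" followed by digits) and then feeds the same numpy.select call might look like:

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame({"ID": [1,1,1,1,1,2,2,2,2,2],
                   "TP1": [1,2,3,4,5,9,8,7,6,5],
                   "TP2": [11,22,32,43,53,94,85,76,66,58],
                   "TP10": [114,222,324,443,535,94,385,76,266,548],
                   "count": [1,2,3,4,10,1,2,3,4,10]})

# Map each TP number to its column name by parsing the names,
# so no upper bound has to be hard-coded.
tp_cols = {}
for col in df.columns:
    match = re.fullmatch(r"TP(\d+)", col)
    if match:
        tp_cols[int(match.group(1))] = col

# One condition/choice pair per existing TP column; np.select takes the
# choice of the first true condition and falls back to NaN otherwise.
conds = [df["count"] == n for n in tp_cols]
choices = [df[col] for col in tp_cols.values()]
df["final"] = np.select(conds, choices, default=np.nan)

print(df)
```

np.select builds one boolean mask per TP column, so with 1000 columns it holds 1000 masks of the frame's length at once; for very large frames, looping over the TP columns and assigning with df.loc one column at a time avoids that memory cost.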
