Generating a new variable based on the values of other variables
I have the following data set:
import pandas as pd
df = pd.DataFrame({"ID": [1,1,1,1,1,2,2,2,2,2],
"TP1": [1,2,3,4,5,9,8,7,6,5],
"TP2": [11,22,32,43,53,94,85,76,66,58],
"TP10": [114,222,324,443,535,94,385,76,266,548],
"count": [1,2,3,4,10,1,2,3,4,10]})
print (df)
I want a "Final" variable in the df that is based on the ID, TP and count variables.
The final result should look like the following.
import pandas as pd
import numpy as np
df = pd.DataFrame({"ID": [1,1,1,1,1,2,2,2,2,2], "TP1": [1,2,3,4,5,9,8,7,6,5],
"TP2": [11,22,32,43,53,94,85,76,66,58], "TP10": [114,222,324,443,535,94,385,76,266,548],
"count": [1,2,3,4,10,1,2,3,4,10],
"final" : [71,1836,np.nan,np.nan,1993,291,1832,np.nan,np.nan,1810]})
print (df)
So, for example, the loop should do the following: for ID 1 with count 2, the value of TP2 should go into the "final" variable, and so on.
I hope my question is clear. I am looking for a loop because there are 1000 TP variables in the original dataset.
I tried to write code like the following, but it is utterly rubbish.
for col in df.columns:
    if col.startswith('TP') and count == int(col[2:]):
        df["Final"] = count
Thanks
If my understanding is correct, if count=1 then pick TP1, if count=2 then pick TP2, etc.
This can be done with numpy.select(). Note that I have added the condition if f"TP{x}" in df.columns because not all of the columns TP1, TP2, TP3, ... TP10 are available in the dataframe. If all of them are available in your actual dataframe, then this if statement is not required.
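import numpy as np

# One boolean condition per TP column that exists in df: count == x
conds = [df["count"] == x for x in range(1,11) if f"TP{x}" in df.columns]
# The matching TP column supplies the values for each condition
output = [df[f"TP{x}"] for x in range(1,11) if f"TP{x}" in df.columns]
# Rows whose count matches no existing TP column get NaN
df["final"] = np.select(conds, output, np.nan)
print(df)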
Output:
   ID  TP1  TP2  TP10  count  final
0   1    1   11   114      1    1.0
1   1    2   22   222      2   22.0
2   1    3   32   324      3    NaN
3   1    4   43   443      4    NaN
4   1    5   53   535     10  535.0
5   2    9   94    94      1    9.0
6   2    8   85   385      2   85.0
7   2    7   76    76      3    NaN
8   2    6   66   266      4    NaN
9   2    5   58   548     10  548.0
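Since the question mentions about 1000 TP columns, one possible variation is to read the TP numbers from the column names rather than hardcoding range(1,11). This is only a sketch and not part of the answer above: the name tp_cols and the regular expression are illustrative assumptions, and it assumes every relevant column is named TP followed by digits.

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "TP1": [1, 2, 3, 4, 5, 9, 8, 7, 6, 5],
    "TP2": [11, 22, 32, 43, 53, 94, 85, 76, 66, 58],
    "TP10": [114, 222, 324, 443, 535, 94, 385, 76, 266, 548],
    "count": [1, 2, 3, 4, 10, 1, 2, 3, 4, 10],
})

# Map each TP number to its column name (hypothetical helper dict),
# e.g. {1: "TP1", 2: "TP2", 10: "TP10"} for the sample data.
tp_cols = {
    int(m.group(1)): col
    for col in df.columns
    if (m := re.fullmatch(r"TP(\d+)", col))
}

# Same numpy.select() approach, driven by the columns that actually exist.
conds = [df["count"] == n for n in tp_cols]
choices = [df[col] for col in tp_cols.values()]
df["final"] = np.select(conds, choices, default=np.nan)
print(df)
```

On the sample data this should give the same "final" column as the output above. The walrus operator (:=) requires Python 3.8 or later.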