How can I get a selection of one data frame for each row in another data frame based on conditions in that row?
I have the following two dataframes:
df1
   x  p  s
0  2  1  1
1  4  2  1
2  6  1  3
3  8  2  4
df2
     ts   1   2
0  1000  45  44
1  1001  46  46
2  1002  47  46
3  1003  48  48
4  1004  49  48
5  1005  50  50
6  1006  51  50
7  1007  52  52
8  1008  53  52
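For anyone who wants to reproduce this, the two frames can be built roughly as follows. This is a sketch that assumes both frames use the default integer index and that the value columns of df2 are labeled with the integers 1 and 2, which is what the lookup df2[row['p']] used later in the question implies.

```python
import pandas as pd

# df1: one row per requested selection (x = last index, p = column, s = step)
df1 = pd.DataFrame({"x": [2, 4, 6, 8], "p": [1, 2, 1, 2], "s": [1, 1, 3, 4]})

# df2: the data to select from; column labels 1 and 2 are assumed to be integers
df2 = pd.DataFrame({
    "ts": list(range(1000, 1009)),
    1: list(range(45, 54)),
    2: [44, 46, 46, 48, 48, 50, 50, 52, 52],
})
```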
I would like to create a third data frame with the same number of rows as df1, using values from df2 selected according to the column values in df1. For example, for each row of df1, I want to take every 's'-th row of column 'p' in df2, from index 0 up to and including index 'x'. I know how to do that using df.apply(), as shown below, but it is too slow for the program I am writing.
def foo(row):
    return str(df2[row['p']].iloc[0:row['x']+1:row['s']].to_list())

df3 = df1.apply(lambda x: foo(x), axis=1)
df3
0 [45, 46, 47]
1 [44, 46, 46, 48, 48]
2 [45, 48, 51]
3 [44, 48, 52]
I'm not sure how large the datasets are, but try the following:
# We need to do "CROSS JOIN" so we add a dummy key to both datasets to allow this
df1["temp_key"] = 0
df2["temp_key"] = 0
# Next we need to shift the index into the DataFrame and call it row_number
df2 = df2.reset_index().rename(columns={"index":"row_number"})
# Now we perform the "CROSS JOIN"
df = df1.merge(df2, on="temp_key").drop(columns=["temp_key"])
df should now have 7 columns: ["x", "p", "s", "row_number", "ts", 1, 2]
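As an aside, on pandas 1.2 or newer the dummy key is not needed, because merge accepts how="cross". A minimal sketch using small stand-in frames (left, right and crossed are hypothetical names, not from the question):

```python
import pandas as pd

# Small stand-ins for df1 and df2; integer column labels 1 and 2 are assumed
left = pd.DataFrame({"x": [2, 4], "p": [1, 2], "s": [1, 1]})
right = pd.DataFrame({"ts": [1000, 1001, 1002], 1: [45, 46, 47], 2: [44, 46, 46]})

# Move the positional index into a column, then pair every row of `left`
# with every row of `right` (requires pandas >= 1.2)
right = right.reset_index().rename(columns={"index": "row_number"})
crossed = left.merge(right, how="cross")
```

Either way, the result has len(df1) * len(df2) rows, so memory use grows quickly with the size of both frames.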
# We can now apply the 'x' logic
df = df[df["row_number"] <= df["x"]]
# And then the 's' logic
df = df[df["row_number"].mod(df["s"]) == 0]
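# Next we choose the appropriate column based on the p value
# (this assumes df2's column labels are the integers 1 and 2, as df2[row['p']] in the question implies)
df = df.copy()  # work on a copy of the filtered frame to avoid SettingWithCopyWarning
df["value"] = df[1]
df.loc[df["p"] == 2, "value"] = df[2]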
# Finally we can group the DataFrame by the 'x' value and create the lists
# Note: I've made the assumption that x is unique in df1
df = df.groupby(["x"])["value"].apply(list).reset_index()
This should return a DataFrame with two columns, ["x", "value"], with x corresponding to the x value in df1 and value being the list of values, similar to df3 in your example.
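Putting the steps together, here is a self-contained sketch that can be run to check the idea against the sample data. It makes several assumptions: pandas 1.2+ (for how="cross"), integer column labels 1 and 2 in df2, unique x values in df1, and only two value columns. The names pairs, result and expected are illustrative. Whether it actually beats apply() depends on the data sizes, since the cross join materializes every df1/df2 row pair.

```python
import pandas as pd

df1 = pd.DataFrame({"x": [2, 4, 6, 8], "p": [1, 2, 1, 2], "s": [1, 1, 3, 4]})
df2 = pd.DataFrame({
    "ts": list(range(1000, 1009)),
    1: list(range(45, 54)),
    2: [44, 46, 46, 48, 48, 50, 50, 52, 52],
})

# Cross join, keeping df2's positional index as row_number
pairs = df1.merge(
    df2.reset_index().rename(columns={"index": "row_number"}), how="cross"
)

# Keep rows up to and including index x, then every s-th row
pairs = pairs[pairs["row_number"] <= pairs["x"]]
pairs = pairs[pairs["row_number"] % pairs["s"] == 0].copy()

# Pick column 1 or column 2 according to p
pairs["value"] = pairs[1].where(pairs["p"] == 1, pairs[2])

# One list per df1 row (relies on x being unique in df1)
result = pairs.groupby("x")["value"].apply(list).reset_index()

# Reference result from the row-wise slice in the question (without str())
expected = [
    df2[p].iloc[0 : x + 1 : s].to_list()
    for x, p, s in df1[["x", "p", "s"]].itertuples(index=False)
]
assert result["value"].tolist() == expected
```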