How to select from a dataframe for each row in another dataframe based on conditions in that row?

Question

I have the following 2 dataframes:

df1
   x  p  s
0  2  1  1
1  4  2  1
2  6  1  3
3  8  2  4

df2
     ts   1   2
0  1000  45  44
1  1001  46  46
2  1002  47  46
3  1003  48  48
4  1004  49  48
5  1005  50  50
6  1006  51  50
7  1007  52  52
8  1008  53  52

I would like to create a third data frame with the same number of rows as df1, using values from df2 but selected according to the column values in df1. For each row of df1, I want every 's'-th row of the df2 column named by 'p', starting at index 0 and going up to and including index 'x'. For example, the first row of df1 (x=2, p=1, s=1) should give rows 0 to 2 of column 1, i.e. [45, 46, 47]. I know how to do this with df.apply(), as shown below, but it is too slow for the program I am writing.

def foo(row):
    # Column row['p'] of df2, positions 0..row['x'] inclusive, every row['s']-th one
    return str(df2[row['p']].iloc[0:row['x']+1:row['s']].to_list())

df3 = df1.apply(foo, axis=1)
df3
0            [45, 46, 47]
1    [44, 46, 46, 48, 48]
2            [45, 48, 51]
3            [44, 48, 52]
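For comparison, here is a sketch of the same per-row slicing with less overhead. It assumes df2's value columns are labelled with the integers 1 and 2, which is what the question's df2[row['p']] implies. Each column is converted to a NumPy array once and sliced inside a list comprehension. This avoids building a pandas Series for every row, which is usually where much of apply's cost comes from. It has not been benchmarked here, and it returns real lists rather than the str(...) output of foo.

```python
import pandas as pd

# Sample data from the question; assumes df2's value columns are the ints 1 and 2.
df1 = pd.DataFrame({"x": [2, 4, 6, 8], "p": [1, 2, 1, 2], "s": [1, 1, 3, 4]})
df2 = pd.DataFrame({
    "ts": range(1000, 1009),
    1: [45, 46, 47, 48, 49, 50, 51, 52, 53],
    2: [44, 46, 46, 48, 48, 50, 50, 52, 52],
})

# Convert each value column to a NumPy array once, keyed by its label.
arrays = {col: df2[col].to_numpy() for col in (1, 2)}

# Positional slice per df1 row: positions 0..x inclusive of column p, step s.
df3 = pd.Series(
    [arrays[int(p)][0 : x + 1 : s].tolist()
     for x, p, s in zip(df1["x"], df1["p"], df1["s"])],
    index=df1.index,
)
print(df3)
```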

Answer 1

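I'm not sure how large your datasets are, but try the following. Note that the cross join creates len(df1) × len(df2) rows before any filtering, so memory use grows quickly with the data.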

# We need to do "CROSS JOIN" so we add a dummy key to both datasets to allow this
df1["temp_key"] = 0
df2["temp_key"] = 0

# Next we need to shift the index into the DataFrame and call it row_number
df2 = df2.reset_index().rename(columns={"index":"row_number"})

# Now we perform the "CROSS JOIN"
df = df1.merge(df2, on="temp_key").drop(columns=["temp_key"])

df should now have 7 columns: ["x", "p", "s", "row_number", "ts", 1, 2]

# We can now apply the 'x' logic: keep df2 rows 0..x
df = df[df["row_number"] <= df["x"]]

# And then the 's' logic: keep every s-th of those rows
# (.copy() avoids a SettingWithCopyWarning when we add a column below)
df = df[df["row_number"].mod(df["s"]) == 0].copy()

# Next we choose the appropriate column based on the p value
# (the value columns are the integer labels 1 and 2, as used by df2[row['p']] in the question)
df["value"] = df[1]
df.loc[df["p"] == 2, "value"] = df[2]
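If p can take more than two values, writing one .loc assignment per column gets repetitive. Below is a minimal sketch of a positional lookup instead. It assumes the candidate columns are exactly the labels that p can take (here the integers 1 and 2). The small frame is made-up stand-in data for the filtered cross join, not output from the steps above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the filtered cross join: one row per (df1 row, df2 row) pair.
df = pd.DataFrame({
    "p": [1, 2, 2],
    "row_number": [0, 0, 4],
    1: [45, 45, 49],
    2: [44, 44, 48],
})

value_cols = [1, 2]  # assumption: every value of p names one of these columns

# Position of each row's p within value_cols; get_indexer returns -1 for unknown labels.
positions = pd.Index(value_cols).get_indexer(df["p"])
if (positions < 0).any():
    raise KeyError("some p values do not name a value column")

# Fancy indexing: row i takes the element at column position positions[i].
df["value"] = df[value_cols].to_numpy()[np.arange(len(df)), positions]
print(df)
```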

# Finally we can group the DataFrame by the 'x' value and create the lists
# Note: I've made the assumption that x is unique in df1
df = df.groupby(["x"])["value"].apply(list).reset_index()

This should return a DataFrame with two columns, ["x", "value"], where x corresponds to the x value in df1 and value is the list of selected values, similar to df3 in your example.
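To check the approach end to end, here is a self-contained sketch on the question's sample data. It assumes pandas 1.2 or later, where merge(how="cross") builds the Cartesian product directly and makes the temp_key column unnecessary. It also assumes df2's value columns are the integers 1 and 2. Grouping by df1's original row label (carried in a hypothetical df1_row column) instead of by x removes the assumption that x is unique.

```python
import pandas as pd

# Sample data from the question (df2's value columns assumed to be the ints 1 and 2).
df1 = pd.DataFrame({"x": [2, 4, 6, 8], "p": [1, 2, 1, 2], "s": [1, 1, 3, 4]})
df2 = pd.DataFrame({
    "ts": range(1000, 1009),
    1: [45, 46, 47, 48, 49, 50, 51, 52, 53],
    2: [44, 46, 46, 48, 48, 50, 50, 52, 52],
})

# Keep df1's row label so the final grouping does not depend on x being unique.
left = df1.reset_index().rename(columns={"index": "df1_row"})
right = df2.reset_index().rename(columns={"index": "row_number"})

# Cartesian product of the two frames (pandas >= 1.2).
df = left.merge(right, how="cross")

# Rows 0..x of df2, then every s-th of those.
mask = (df["row_number"] <= df["x"]) & (df["row_number"] % df["s"] == 0)
df = df[mask].copy()

# Take column 1 where p == 1, otherwise column 2.
df["value"] = df[1].where(df["p"] == 1, df[2])

# One list per original df1 row, in row_number order.
df = df.sort_values(["df1_row", "row_number"])
df3 = df.groupby("df1_row")["value"].apply(list)
print(df3)
```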

Answer 1 (0, 2022-06-01 08:02:04)
