Can I use newly created variables in the following expressions in `polars`?
In R (and in particular in `dplyr::mutate()`), I'm used to using newly created variables in the expressions that follow, like so:
library(dplyr, warn.conflicts = FALSE)
head(iris) |>
  mutate(
    sp1 = Sepal.Length + 1,
    sp2 = sp1 + 1
  )
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species sp1 sp2
#> 1 5.1 3.5 1.4 0.2 setosa 6.1 7.1
#> 2 4.9 3.0 1.4 0.2 setosa 5.9 6.9
#> 3 4.7 3.2 1.3 0.2 setosa 5.7 6.7
#> 4 4.6 3.1 1.5 0.2 setosa 5.6 6.6
#> 5 5.0 3.6 1.4 0.2 setosa 6.0 7.0
#> 6 5.4 3.9 1.7 0.4 setosa 6.4 7.4
I'm now trying to learn `polars`, and it seems I can't reproduce this behaviour (I'm using the Python version here to stay as close to the source as possible, since the R version is not very complete yet):
import polars as pl
df = pl.DataFrame({"nrs": [1, 2, 3, None, 5]})
mod = df.with_columns(
    (pl.col("nrs") + 1).alias("nrs+1"),
    (pl.col("nrs+1") + 1).alias("nrs+2")
)
Traceback (most recent call last):
File "<PATH>", line 6, in <module>
mod = df.with_columns(
^^^^^^^^^^^^^^^^
File "<PATH>", line 7260, in with_columns
.collect(no_optimization=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<PATH>", line 1501, in collect
return wrap_df(ldf.collect())
^^^^^^^^^^^^^
exceptions.ColumnNotFoundError: nrs+1
`pip show polars`:
Name: polars
Version: 0.18.0
Is this feature unavailable in `polars`, or am I missing something?
You need multiple `.with_columns` calls:
df = pl.DataFrame({"nrs": [1, 2, 3, None, 5]})
(df.with_columns((pl.col("nrs") + 1).alias("nrs+1"))
   .with_columns((pl.col("nrs+1") + 1).alias("nrs+2"))
)
shape: (5, 3)
┌──────┬───────┬───────┐
│ nrs ┆ nrs+1 ┆ nrs+2 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞══════╪═══════╪═══════╡
│ 1 ┆ 2 ┆ 3 │
│ 2 ┆ 3 ┆ 4 │
│ 3 ┆ 4 ┆ 5 │
│ null ┆ null ┆ null │
│ 5 ┆ 6 ┆ 7 │
└──────┴───────┴───────┘
Perhaps relevant: https://github.com/pola-rs/polars/issues/9062
Unlike `dplyr::mutate()`, polars runs every expression in a single `with_columns` in parallel, while `mutate()` does everything sequentially. Since dplyr computes each column in order, by the time it gets to your second column definition the first one has already been created, so it just works. With polars, each expression is evaluated against the original frame at the same time, each on its own thread/process (I'm not sure which), so the second one doesn't know anything about the first. This is why, with polars, you have to execute it as two `with_columns` calls.
For example, in `dplyr`, doing:
head(iris) |>
  mutate(
    sp1 = Sepal.Length + 1,
    sp2 = sp1 + 1
  )
is the same as doing:
head(iris) |>
  mutate(
    sp1 = Sepal.Length + 1
  ) |>
  mutate(
    sp2 = sp1 + 1
  )
In terms of the amount of code, the polars way may be more cumbersome, and with smaller data that may be the most important consideration. You can monkey patch in a `mutate` method that gives you the `dplyr` behaviour of doing everything sequentially, so that the syntax you're used to remains, although you give up parallelism in the process.
The monkey patch is this:
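def mutate(self, *args, **kwargs):
    # Apply each expression in its own with_columns call, so that later
    # expressions can refer to columns created by earlier ones.
    lazydf = self.lazy()
    for value in args:
        lazydf = lazydf.with_columns(value)
    for key, value in kwargs.items():
        lazydf = lazydf.with_columns(**{key: value})
    return lazydf.collect()

# Attach the function as a DataFrame method, then drop the module-level name.
pl.DataFrame.mutate = mutate
del mutate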
That allows you to do:
df.mutate(
    (pl.col("nrs") + 1).alias("nrs+1"),
    (pl.col("nrs+1") + 1).alias("nrs+2"),
    **{'nrs+3': pl.col('nrs+2') + 1},
    nrs4=pl.col('nrs+3') + 1
)
shape: (5, 5)
┌──────┬───────┬───────┬───────┬──────┐
│ nrs ┆ nrs+1 ┆ nrs+2 ┆ nrs+3 ┆ nrs4 │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ i64 ┆ i64 │
╞══════╪═══════╪═══════╪═══════╪══════╡
│ 1 ┆ 2 ┆ 3 ┆ 4 ┆ 5 │
│ 2 ┆ 3 ┆ 4 ┆ 5 ┆ 6 │
│ 3 ┆ 4 ┆ 5 ┆ 6 ┆ 7 │
│ null ┆ null ┆ null ┆ null ┆ null │
│ 5 ┆ 6 ┆ 7 ┆ 8 ┆ 9 │
└──────┴───────┴───────┴───────┴──────┘