Tag [python-polars] - Stack Overflow

Can I use newly created variables in the following expressions in `polars`?

In R (specifically in dplyr::mutate()), I am used to referring to newly created variables in subsequent expressions, like this: library(dplyr, warn.conflicts = FALSE) head(iris) |> mutate( sp1 = Sepal.Length + 1, ...

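What is the fastest way to do "indexed" look-ups in Polars?

I am working with large Polars DataFrames that are fully loaded in memory. Each row is uniquely indexed by the columns entityId (Int64) and entryDate (date). I know Polars has no indexes, but I still need to perform ad-hoc data look-ups against these tables, and this happens so frequently that it accounts for a significant share of my application's run time. Currently I am using. ...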

convert 2 columns of polars dataframe to dictionary having its key as first column elements and second column elements as values

I am trying to convert the following dataframe into a dictionary of a specific format. However, I get the error TypeError: unhashable type: 'Series' ...

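Querying last row of sorted column where value is less than specific amount from parquet file

I have a large parquet file in which the data in one column is sorted. Below is a very simplified example. I am interested in querying, in the most efficient way using Python, the last value of column Y for which X is less than some amount. Column X is guaranteed to be sorted in ascending order. For example, given X less than 11, I expect the Y value "green". I tried the following: The code above "works", but ...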

Finding first index of where value in column B is greater than a value in column A

I want to know the first occurrence (index) where the value in column A is greater than the value in column B. Currently I use a for loop (and it is very slow), but I imagine this could be done in a rolling window. df = polars.DataFrame({"idx": [i for i in range(5)], "col_a": [1,2,3,4,4], "c ...

Construct date column from year, month and day in Polars

Consider the following Polars dataframe: import polars as pl df = pl.DataFrame({'year': [2023], 'month': [2], 'day': [1]}) I want to construct a date column from year, month and day. I know this can be done by first concatenating into a string ...

strip entire polars dataframe

I want to strip leading and trailing whitespace from a polars dataframe with this line of code: But it did not work. How can I strip the entire polars dataframe? ...

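Given a data frame with n columns of numbers, how could you calculate the Pearson correlation of all column-pair combinations?

Say I have a Polars data frame like this: I am looking to compute the Pearson correlation between every pair combination of all columns (except the date one). The result would look like this: My intuition is that I need to do the following: take the Cartesian product of columns [1..] as a new data frame. Using Polars expressions, compute the pearson_corr of each pair of series. I ...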

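Polars Case Statement

I am trying to pick up the polars package in Python. I come from an R background, so please understand that this may be a very simple question. I want to implement a case statement that flags a row as 1 if any of the following conditions is true, and 0 otherwise. My new column will be called "my_new_column_flag". However, I get the error message Traceback (most recent call last): ...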

How to read a SQLite database file using polars package in Python

I want to read a SQLite database file (database.sqlite) using the polars package. I tried the following without success: The following error occurred: Any suggestions? ...

How to fill column with condition in polars

I want to add a new column using values from other columns, subject to a condition. In pandas, I do it as follows: import pandas as pd df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}) df['c'] = df['a'] df.loc[df['b']==4, 'c'] = df['b' ...

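How to Write Poisson CDF as Python Polars Expression

I have a collection of polars expressions used to generate features for an ML model. I would like to add a Poisson CDF feature to this collection while keeping lazy execution (with its advantages of speed, caching, etc.). So far I have not found a simple way to achieve this. I have been able to get the result I want outside the desired lazy expression framework: However, what I actually want is for it to look like: ...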

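How to list, concatenate, and evaluate polars expressions?

I would like to store many different filters in an object (a list, a dictionary, or something else), and then be able to select the filters I want and evaluate them in the .filter() method. Here is an example: What is the correct way to do (" & ").join(filters)? ...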

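Sort polars DataFrame using column with text and numericals

If I have a DataFrame like this, how can I sort it by numeric value? That is, I want to extract only the numeric part of the strings in the Name column and sort on it. I can't see the polars expression required, and I am not sure whether you can pass in a custom Python function. Thanks ...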

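String manipulation in polars

I have a record in polars that has no header so far. This header should refer to the first row of the record. Before setting this row as the header, I want to manipulate the entries. First, I want to replace line breaks and spaces between words with underscores. In addition, I want to pad camel case with underscores (e.g. TestTest -> Test ...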

Example of zero-copy share of a Polars dataframe between Python and Rust?

I have a Python function such as def add_data(input_df): """ some operations on input_df (a Polars dataframe), e.g. filling some columns with new values """ I want to use this function from a Rust function. input_df may have tens ...

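Polars message: eval_binary_same_type!(left_aexpr, +, right_aexpr) = None

While running some simple polars code, I encountered the message in the title. Sample code and its output are provided below: I am curious what this message means. The first expression gave me two such messages. I suspected it had something to do with a type mismatch. So, in the second expression, I cast them to the same type, but this time I still get one such message (though fewer than the 2 from the first ...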

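Polars equivalent to SQL `COUNT(DISTINCT expr,[expr...])`, or other method of checking uniqueness

When working with data, I often add a check after each step to verify that the data still has the unique key I think it has. For example, I might check that my data is still unique on (a, b). To do this, I usually check that the number of distinct combinations of columns a and b equals the total number of rows. In polars, to get COUNT(DISTINCT...) I can do ( df .select ...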

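Does Polars module not have a method for appending DataFrames to output files?

Sorry for the question, but I am just getting started with the polars library. I was reading the documentation for Polars DataFrame and found that none of the .write_* methods has a mode parameter, whereas the pandas DataFrame has the .to_csv() method with a mode parameter available, which allows appending a DataFrame to a file. ...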

Converting datetime timezone aware column to string with UTC time offset

I have the following dataframe: df = ( pl.DataFrame( { "int": [1, 2, 3], "date": ["2010-01-31T23:00:00+00:00","2010-02-01T00:00:00+ ...