Tag [python-polars] - Stack Overflow

Can I use newly created variables in the following expressions in `polars`?

In R (specifically in dplyr::mutate()), I am used to using newly created variables in the following expressions, like this: library(dplyr, warn.conflicts = FALSE) head(iris) |> mutate( sp1 = Sepal.Length + 1, ...

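What is the fastest way to do "indexed" look-ups in Polars?

I am working with large Polars dataframes that are fully loaded in memory. Each row is uniquely indexed by the columns entityId (Int64) and entryDate (date). I know Polars has no indexes, but I still need to perform ad-hoc data look-ups on these tables, and they are frequent enough to take up a large share of my application's runtime. Currently I am using ...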

Convert 2 columns of a polars dataframe to a dictionary with the first column's elements as keys and the second column's elements as values

I am trying to convert the following dataframe to a dictionary in a specific format. However, I get the error TypeError: unhashable type: 'Series' ...

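Querying last row of sorted column where value is less than specific amount from parquet file

I have a large Parquet file in which the data in one column is sorted. A very simplified example is below. I am interested in querying the last value of column Y where X is less than some amount, in the most efficient way using Python. Column X is guaranteed to be sorted in ascending order. For example, given that X is less than 11, I expect the Y value to be "green". I tried the following: The code above "works", but ...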

Finding first index of where value in column B is greater than a value in column A

I want to find the first occurrence (index) where the value in column A is greater than the value in column B. Currently I use a for loop (and it is very slow), but I imagine this could be done with a rolling window. df = polars.DataFrame({"idx": [i for i in range(5)], "col_a": [1,2,3,4,4], "c ...

Construct date column from year, month and day in Polars

Consider the following Polars dataframe: import polars as pl df = pl.DataFrame({'year': [2023], 'month': [2], 'day': [1]}) I want to construct a date column from year, month and day. I know this can be done by first concatenating into a string ...

Strip entire polars dataframe

I would like to strip leading and trailing whitespace from a Polars dataframe with this line of code: but it did not work. How can I strip the entire Polars dataframe? ...

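Given a data frame with n columns of numbers, how could you calculate the Pearson correlation of all column-pair combinations?

Suppose I have a Polars data frame like this: I am looking to compute the Pearson correlation between every pair combination of all columns (except the date one). The result would look like this: My intuition is that I need to do the following: get the Cartesian product of columns [1..] as a new data frame. Using Polars expressions, compute the pearson_corr of each pair of series. I ...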

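Polars Case Statement

I am trying to pick up the Python package Polars. I come from an R background, so please understand that this may be a very simple question. I want to implement a case statement that flags a row as 1 if any of the following conditions is true, and as 0 otherwise. My new column will be called "my_new_column_flag". But I get the error message Traceback (most recent call last): ...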

How to read a SQLite database file using polars package in Python

I want to read a SQLite database file (database.sqlite) using the polars package. I tried the following without success: The following error occurred: Any suggestions? ...

How to fill column with condition in polars

I want to add a new column using the values of other columns, subject to a condition. In pandas, I do this as follows: import pandas as pd df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}) df['c'] = df['a'] df.loc[df['b']==4, 'c'] = df['b' ...

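How to Write Poisson CDF as Python Polars Expression

I have a collection of Polars expressions used to generate features for an ML model. I would like to add a Poisson CDF feature to this collection while keeping lazy execution (with its benefits of speed, caching, etc.). So far I have not found an easy way to achieve this. I have been able to get the result I want outside the desired lazy expression framework: But really I would like it to look like: ...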

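How to list, concatenate, and evaluate polars expressions?

I want to store many different filters in an object (a list, dictionary, or something else) and then be able to select the filters I want and evaluate them in the .filter() method. Here is an example: What is the correct way to do (" & ").join(filters)? ...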

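Sort polars DataFrame using column with text and numericals

If I have a DataFrame like How can I sort it by the numeric value? That is, I want to take the strings from the Name column and use only the numeric element when sorting. I cannot see the Polars expression needed, and I am not sure whether you can pass a custom Python function. Thanks ...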

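String manipulation in polars

I have a record in Polars that has no header so far. This header should refer to the first row of the record. Before making this row the header, I want to manipulate the entries. First, I want to replace line breaks and spaces between words with underscores. In addition, I want to pad camel case with underscores (e.g. TestTest -> Test ...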

Example of zero-copy share of a Polars dataframe between Python and Rust?

I have a Python function such as def add_data(input_df): """ some operations on input_df (a Polars dataframe), such as filling some columns with new values """ I want to use this function from a Rust function. input_df may have tens ...

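Polars message: eval_binary_same_type!(left_aexpr, +, right_aexpr) = None

While running some simple Polars code, I came across the message in the title. Example code and its output are provided below: I am curious what this message means. The first expression gives me two such messages. I suspected it had something to do with a type difference. So, in the second expression, I cast them to the same type, but this time I still get one such message (though fewer than the 2 from the first ...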

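Polars equivalent to SQL `COUNT(DISTINCT expr,[expr...])`, or other method of checking uniqueness

When working with data, I often add a check after each step to verify that the data still has the unique key I think it has. For example, I might check that my data is still unique on (a, b). To do this, I usually check that the number of distinct combinations of columns a and b equals the total number of rows. In Polars, to get COUNT(DISTINCT...) I can do ( df .select ...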

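Does Polars module not have a method for appending DataFrames to output files?

Sorry for the question, but I am just starting with the Polars library. I am reading the documentation for the Polars DataFrame and found that none of the .write_* methods has a mode parameter, whereas the pandas DataFrame has the .to_csv() method with a mode parameter available, which allows appending a DataFrame to a file. ...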

Converting datetime timezone aware column to string with UTC time offset

I have the following dataframe: df = ( pl.DataFrame( { "int": [1, 2, 3], "date": ["2010-01-31T23:00:00+00:00","2010-02-01T00:00:00+ ...