Counting the number of elements in lists within a DataFrame column

Question

I have a DataFrame in the following format:

ID	ID_links
0	[10]
1	[11, 12, 13]
2	[14, 15]
3	[16]
4	[17, 18, 19, 20]

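How can I find out how many IDs have an ID_links list containing more than one element? In other words, what percentage of IDs have more than one value in ID_links?

Pseudocode:

Go through each row of the DataFrame
Count the elements in the list; if there is only one element, do nothing, otherwise add 1 to a counter
Divide the counter by the length of the DataFrame

How can I implement this in Python/Spark?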

Answer 1

Rather than iterating over every row of the DataFrame, it is better to use numpy.where, like this:

import numpy as np
import pandas as pd
df = pd.DataFrame({'ID': {0: 0, 1: 1, 2: 2, 3: 3, 4: 4},
                   'ID_links': {0: [10], 1: [11, 12, 13], 2: [14, 15], 3: [16], 4: [17, 18, 19, 20]}})
np.where(df.ID_links.map(len) > 1)

Output:

(array([1, 2, 4]),)

You can divide the number of elements in that array by the number of rows in the DataFrame to get the desired output:

where = np.where(df.ID_links.map(len) > 1)
len(where[0]) / len(df.index)  # 0.6 = 60%
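
As a possible alternative, the same ratio can be computed with pandas alone, since the mean of a boolean Series is the fraction of its True values. The sketch below rebuilds the sample data from the question; the variable names has_multiple and share are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data matching the table in the question.
df = pd.DataFrame({
    "ID": [0, 1, 2, 3, 4],
    "ID_links": [[10], [11, 12, 13], [14, 15], [16], [17, 18, 19, 20]],
})

# Series.str.len() also works on list-valued columns and returns each list's length.
has_multiple = df["ID_links"].str.len() > 1

# Averaging booleans gives the share of rows whose list has more than one element.
share = has_multiple.mean()
print(share)  # 0.6
```

This avoids the intermediate index array and an explicit division, at the cost of not returning the positions of the matching rows.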

Let me know if you have any questions :)

Solution 1
0 2022-09-30 19:14:25
