Group by and merge a numpy array column in a pandas DataFrame

Question

I have a CSV file in which one column looks like a numpy array. The first few rows look like this:

first,second,third
170.0,2,[19 234 376]
170.0,3,[19 23 23]
162.0,4,[1 2 3]
162.0,5,[1 3 4]

I load this CSV into a pandas DataFrame with the following code:

data = pd.read_csv('myfile.csv', converters = {'first': np.float64, 'second': np.int64, 'third': np.array})

Now I want to group by the 'first' column and merge the 'third' column, keeping each value once. After doing this, my DataFrame should look like:

170.0, [19 23 234 376]
162.0, [1 2 3 4]

How can I achieve this? I tried several approaches along the lines of the following, but none of them worked.

group_data = data.groupby('first')
group_data['third'].apply(lambda x: np.unique(np.concatenate(x)))

Answer 1

In your current CSV file, the 'third' column comes through as a string, not as a list.
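
To see this for yourself, here is a minimal sketch. It assumes the sample rows from the question, inlined through io.StringIO instead of read from a real file. It suggests that np.array, used as a converter, only wraps the raw text in a zero-dimensional string array and does not parse the numbers, which would explain why np.concatenate has nothing usable to join.

```python
import io

import numpy as np
import pandas as pd

# The sample rows from the question, inlined so the sketch is self-contained.
csv_text = """first,second,third
170.0,2,[19 234 376]
170.0,3,[19 23 23]
162.0,4,[1 2 3]
162.0,5,[1 3 4]
"""

# Plain read_csv: the bracketed column is loaded as ordinary strings.
plain = pd.read_csv(io.StringIO(csv_text))
first_cell = plain["third"].iloc[0]
print(type(first_cell), repr(first_cell))

# Calling np.array on that text does not parse it; the result is a
# zero-dimensional array holding the whole string.
wrapped = np.array(first_cell)
print(wrapped.shape, wrapped.dtype.kind)
```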

There may be a better way to convert it to a list, but here goes...

from ast import literal_eval

import numpy as np
import pandas as pd

data = pd.read_csv('test_groupby.csv')

# Convert to a string representation of a list...
data['third'] = data['third'].str.replace(' ', ',')

# Convert string to list...
data['third'] = data['third'].apply(literal_eval)

group_data=data.groupby('first')

# Two secrets here revealed
# x.values instead of x since x is a Series
# list(...) to return an aggregated value
#     (np.array should work here, but...?)
ans = group_data.aggregate(
      {'third': lambda x: list(np.unique(
                               np.concatenate(x.values)))})

print(ans)
                    third
first                    
162          [1, 2, 3, 4]
170    [19, 23, 234, 376]
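
As one possible alternative to the string replacement above, the column could be parsed while the file is read. The sketch below is only an illustration under assumptions: parse_int_array is a hypothetical helper (not part of pandas or numpy), the cells are assumed to always look like '[19 234 376]' with whitespace-separated integers, and the sample data is inlined instead of read from disk. The merged arrays are collected in a plain dict to sidestep the question of how groupby wraps array results.

```python
import io

import numpy as np
import pandas as pd

# Sample rows from the question, inlined for a self-contained sketch.
csv_text = """first,second,third
170.0,2,[19 234 376]
170.0,3,[19 23 23]
162.0,4,[1 2 3]
162.0,5,[1 3 4]
"""


def parse_int_array(text):
    """Hypothetical helper: turn '[19 234 376]' into an integer numpy array."""
    # split() without arguments tolerates repeated spaces between numbers.
    return np.array(text.strip("[]").split(), dtype=np.int64)


# The converter runs on each raw cell of 'third' while the CSV is parsed.
data = pd.read_csv(io.StringIO(csv_text), converters={"third": parse_int_array})

# Iterate over the groups and merge each group's arrays into one sorted,
# de-duplicated array.
merged = {
    key: np.unique(np.concatenate(group.to_list()))
    for key, group in data.groupby("first")["third"]
}

for key, values in merged.items():
    print(key, values)
```

For the sample rows this should give the merged values asked for in the question. If a DataFrame is needed afterwards, the dict can be turned into one, though that step is left out here.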

Answer 1
1 accepted 2015-08-21 23:24:53