Pandas DataFrame columns with list values

Question

I have a Pandas DataFrame whose columns contain lists of values, like the one below.

            A                           B                           
0   ['x','x','y','y','z']           ['m','m','n','n','p']

I want to create a separate column for each unique item in the lists, and put the count of each item under its new column.

            A                           B                       x   y   z   m   n   p           
0   ['x','x','y','y','z']           ['m','m','n','n','p']       2   2   1   2   2   1

Can anyone help me with the code?
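For anyone who wants to try the answers, here is a small sketch that rebuilds the sample frame as `df` (the name the answers assume) and writes the expected counts from the table above as a plain dict for checking results. The name `expected` is an illustrative helper, not part of any answer.

```python
import pandas as pd

# Rebuild the sample frame: each cell of A and B holds a Python list
df = pd.DataFrame({
    "A": [["x", "x", "y", "y", "z"]],
    "B": [["m", "m", "n", "n", "p"]],
})

# Hypothetical helper: the per-item counts the question expects for row 0
expected = {"x": 2, "y": 2, "z": 1, "m": 2, "n": 2, "p": 1}

print(df)
```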

Answer 1

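Use:

# to_frame(0) names the single row 0 so it lines up with df's row 0
# (needed on pandas >= 2.0, where value_counts() returns a Series named "count")
pd.concat([df, df.stack().explode().value_counts().to_frame(0).T], axis=1)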

Output:

                 A                B  m  x  y  n  z  p
0  [x, x, y, y, z]  [m, m, n, n, p]  2  2  2  2  1  1
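To see why this works, it helps to look at each intermediate step. A sketch on the sample frame, where `stacked`, `flat` and `counts` are illustrative names: `stack()` turns the two list cells into a Series indexed by (row, column), `explode()` gives one entry per list element, and `value_counts()` tallies them.

```python
import pandas as pd

df = pd.DataFrame({"A": [["x", "x", "y", "y", "z"]],
                   "B": [["m", "m", "n", "n", "p"]]})

# One entry per (row label, column name) pair; each value is still a list
stacked = df.stack()
# One entry per list element; the (row, column) labels are repeated
flat = stacked.explode()
# Occurrences of each item across A and B combined
counts = flat.value_counts()

print(stacked)
print(flat.tolist())
print(counts.to_dict())
```

Because `value_counts()` sorts by frequency, the new columns come out in count order rather than in the order the items appear in the lists.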

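If you want to keep the order in which the items appear in the lists:

s = df.stack().explode()
# s.unique() lists the items in first-appearance order
pd.concat([df, s.value_counts().reindex(s.unique()).to_frame(0).T], axis=1)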

                 A                B  x  y  z  m  n  p
0  [x, x, y, y, z]  [m, m, n, n, p]  2  2  1  2  2  1
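Another way to keep first-appearance order, swapped in here rather than taken from the answer, is `collections.Counter`: since Python 3.7, dicts (and therefore `Counter`) remember insertion order. A sketch for the single-row sample frame, where `counts`, `row` and `result` are illustrative names:

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({"A": [["x", "x", "y", "y", "z"]],
                   "B": [["m", "m", "n", "n", "p"]]})

# Walk A, then B, and count items; Counter keys stay in first-seen order
counts = Counter(item for col in ["A", "B"] for item in df.loc[0, col])
# A one-row frame with the same index as df, so concat lines it up with row 0
row = pd.DataFrame([counts], index=df.index)
result = pd.concat([df, row], axis=1)
print(result)
```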

With more than one row:

pd.concat([df,df.stack().explode().groupby(level=0).value_counts().unstack()],axis=1)

                 A                B    m    n    p    q    x    y    z
0  [x, x, y, y, z]  [m, m, n, n, p]  2.0  2.0  1.0  NaN  2.0  2.0  1.0
1  [y, y, y, y, z]  [p, q, n, n, p]  NaN  2.0  2.0  1.0  NaN  4.0  1.0
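The `NaN` values appear because `unstack()` has nothing to put where an item does not occur in a row, and that also turns the counts into floats. Passing `fill_value=0` to `unstack()` keeps them as integers. A sketch on the two-row frame from the output above:

```python
import pandas as pd

# The two-row frame shown in the output above
df = pd.DataFrame({
    "A": [["x", "x", "y", "y", "z"], ["y", "y", "y", "y", "z"]],
    "B": [["m", "m", "n", "n", "p"], ["p", "q", "n", "n", "p"]],
})

counts = (
    df.stack()                # (row, column) -> list
      .explode()              # (row, column) -> single item
      .groupby(level=0)       # group by the original row label
      .value_counts()         # (row, item) -> count
      .unstack(fill_value=0)  # one column per item; absent items become 0
)
result = pd.concat([df, counts], axis=1)
print(result)
```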

Answer 2

This does it for you:

import pandas as pd

df = pd.DataFrame([[0, ['x','x','y','y','z'], ['m','m','n','n','p']]], columns=['index', 'A', 'B'])

unique_vals = set([i for l in df['A'] for i in l] + [i for l in df['B'] for i in l])  # get all unique values
for val in unique_vals:
    df[val] = df[['A', 'B']].apply(lambda row: sum([row[i].count(val) for i in row.index]), axis=1)  # count occurrences across all columns for each row

Output

print(df.to_string())

   index                A                B  m  x  p  n  y  z
0      0  [x, x, y, y, z]  [m, m, n, n, p]  2  2  1  2  2  1
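Because a `set` has no defined order, the new columns come out in an arbitrary order (`m x p n y z` above). If first-appearance order matters, one option, swapped in here for the `set`, is `dict.fromkeys`, which drops duplicates but keeps insertion order. A sketch of the same loop on the sample frame, without the extra `index` column:

```python
import pandas as pd

df = pd.DataFrame({"A": [["x", "x", "y", "y", "z"]],
                   "B": [["m", "m", "n", "n", "p"]]})

# dict.fromkeys removes duplicates while keeping first-seen order (x, y, z, m, n, p)
unique_vals = list(dict.fromkeys(item for col in ["A", "B"] for lst in df[col] for item in lst))

for val in unique_vals:
    # For each row, count val in every list of that row and add the counts up
    df[val] = df[["A", "B"]].apply(lambda row: sum(lst.count(val) for lst in row), axis=1)

print(df)
```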

Answer 3

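I assume your real data has more than one row, so I use collections.Counter to build a new DataFrame of counts and join it back.

On your sample df:

import pandas as pd
from collections import Counter

# Summing a row of lists concatenates them; Counter then tallies each item.
# index=df.index keeps df_t aligned with df for the join.
df_t = pd.DataFrame(df.sum(axis=1).map(Counter).tolist(), index=df.index)
df_final = df.join(df_t)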

Out[109]:
                 A                B  x  y  z  m  n  p
0  [x, x, y, y, z]  [m, m, n, n, p]  2  2  1  2  2  1
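`df.sum(1)` works here because adding two Python lists concatenates them, so each row's A and B lists are joined into one list before `Counter` counts it. A sketch of the same idea written out explicitly with element-wise `df["A"] + df["B"]` (swapped in for `df.sum(1)`, and assuming only these two list columns):

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({"A": [["x", "x", "y", "y", "z"]],
                   "B": [["m", "m", "n", "n", "p"]]})

# Element-wise list concatenation: row 0 becomes A's list followed by B's list
combined = df["A"] + df["B"]
# One Counter (item -> count) per row, turned into a frame of counts
df_t = pd.DataFrame(combined.map(Counter).tolist(), index=df.index)
df_final = df.join(df_t)
print(df_final)
```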

On a sample DataFrame with more than one row:

df_more
Out[110]:
                 A                B
0  [x, x, y, y, z]  [m, m, n, n, p]
1  [y, y, y, y, z]  [p, q, n, n, p]

from collections import Counter

df_t = pd.DataFrame(df_more.sum(axis=1).map(Counter).tolist(), index=df_more.index)
df_final = df_more.join(df_t)

Out[115]:
                 A                B    x  y  z    m  n  p    q
0  [x, x, y, y, z]  [m, m, n, n, p]  2.0  2  1  2.0  2  1  NaN
1  [y, y, y, y, z]  [p, q, n, n, p]  NaN  4  1  NaN  2  2  1.0
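The `NaN` entries appear because a row's `Counter` has no key for items that never occur in that row (for example `q` in row 0), and a column containing `NaN` is stored as float. Filling with 0 and casting back restores integer counts. A sketch on the two-row frame, using element-wise `df_more["A"] + df_more["B"]` in place of `df_more.sum(1)`:

```python
from collections import Counter

import pandas as pd

df_more = pd.DataFrame({
    "A": [["x", "x", "y", "y", "z"], ["y", "y", "y", "y", "z"]],
    "B": [["m", "m", "n", "n", "p"], ["p", "q", "n", "n", "p"]],
})

# Each row's lists concatenated, then counted item by item
counters = (df_more["A"] + df_more["B"]).map(Counter).tolist()
# Missing keys become NaN; fill them with 0 and cast back to int
df_t = pd.DataFrame(counters, index=df_more.index).fillna(0).astype(int)
df_final = df_more.join(df_t)
print(df_final)
```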

Answer 4

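You can use the functions chain.from_iterable and Counter:

import pandas as pd
from collections import Counter
from itertools import chain

# For each row, flatten the A and B lists into one stream and count the items
df.join(df.apply(lambda x: pd.Series(Counter(chain.from_iterable(x))), axis=1))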
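Inside the lambda, `x` is one row (a Series holding the A and B lists), and `chain.from_iterable(x)` walks through those lists one after another without building a combined list first. A sketch that shows the flattened stream for row 0 and the resulting counts, assuming `df` is the single-row sample frame (`row`, `flat`, `counts` and `result` are illustrative names):

```python
from collections import Counter
from itertools import chain

import pandas as pd

df = pd.DataFrame({"A": [["x", "x", "y", "y", "z"]],
                   "B": [["m", "m", "n", "n", "p"]]})

row = df.loc[0]                          # Series: A -> list, B -> list
flat = list(chain.from_iterable(row))    # items of A, then items of B
counts = pd.Series(Counter(flat))        # item -> count, in first-seen order

result = df.join(df.apply(lambda x: pd.Series(Counter(chain.from_iterable(x))), axis=1))
print(flat)
print(result)
```

With more than one row, items missing from a row show up as `NaN` here too, so the same `fillna(0).astype(int)` clean-up applies.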

Problem description

4 solutions

Solution 1
4 2019-12-04 19:22:35

Solution 2
1 2019-12-04 19:20:29

Solution 3
1 2019-12-04 20:28:50

Solution 4
0 2019-12-04 20:26:11