Pivot a Pandas list column

Question

I have a pandas dataframe with one column whose values are lists and another column that holds dates. I would like to create a dataframe that counts the elements of the lists by date.

The dataframe looks like this:

(Image of the dataframe. I don't have enough reputation to post images directly.)

import pandas as pd

# col1 holds string representations of lists; one row is an empty string.
df = pd.DataFrame(
    data={
        "col1": ["['a','b']", "['b','c']", "['a','c']", "", "['b']"],
        "col2": ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04", "2020-01-05"],
    },
    index=[0, 1, 2, 3, 4],
)

What I would like the dataframe to look like is:

(Image of the desired dataframe.)

pd.DataFrame(
    data={"a": [1, 0, 1, 0, 0], "b": [1, 1, 0, 0, 1], "c": [0, 1, 1, 0, 0]},
    index=["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04", "2020-01-05"],
)

Any thoughts on how to do this kind of transformation?

Answer 1

You can use extractall to extract the values inside the single quotes, then count them per row with groupby and value_counts:

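out = (
    df.col1.str.extractall("'([^']*)'")   # one row per quoted value
    .groupby(level=0)[0].value_counts()   # count values per original row
    .unstack(level=1, fill_value=0)       # values become columns
    .reindex(df.index, fill_value=0)      # restore rows with no matches
)

out.index = df['col2']
print(out)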

Output:

0           a  b  c
col2               
2020-01-01  1  1  0
2020-01-02  0  1  1
2020-01-03  1  0  1
2020-01-04  0  0  0
2020-01-05  0  1  0
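For readers who want to run this end to end, the following is a minimal self-contained sketch of the same extractall approach, using the data from the question. The intermediate names `matches` and `counts` are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "col1": ["['a','b']", "['b','c']", "['a','c']", "", "['b']"],
        "col2": ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04", "2020-01-05"],
    }
)

# One row per quoted token; the index is (original row, match number).
matches = df["col1"].str.extractall(r"'([^']*)'")

# Count tokens per original row, then spread the tokens into columns.
counts = (
    matches.groupby(level=0)[0]
    .value_counts()
    .unstack(level=1, fill_value=0)
    .reindex(df.index, fill_value=0)  # the empty-string row has no matches
)
counts.index = df["col2"]
print(counts)
```

The reindex step matters because extractall produces no rows at all for the empty string, so without it the 2020-01-04 date would silently disappear from the result.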

Answer 2

You can use pd.crosstab here.

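# Raw string so that \w is not treated as an invalid escape sequence.
df['col1'] = df['col1'].str.findall(r'\w+')
df_ = df.explode('col1')
# crosstab drops the date whose list is empty; reindex brings it back as NaN.
pd.crosstab(df_['col2'], df_['col1']).reindex(df_['col2'].unique()).fillna(0)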

col1          a    b    c
col2                     
2020-01-01  1.0  1.0  0.0
2020-01-02  0.0  1.0  1.0
2020-01-03  1.0  0.0  1.0
2020-01-04  0.0  0.0  0.0
2020-01-05  0.0  1.0  0.0
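The counts above come out as floats because reindex introduces NaN before fillna runs. If integer counts are preferred, one possible variation (a sketch, not part of the original answer) is to pass fill_value=0 to reindex directly. The names `tokens`, `long_df`, and `counts` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "col1": ["['a','b']", "['b','c']", "['a','c']", "", "['b']"],
        "col2": ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04", "2020-01-05"],
    }
)

# Pull out the word-like tokens; the empty string yields an empty list.
tokens = df["col1"].str.findall(r"\w+")

# One row per token; the empty list becomes a single NaN row.
long_df = df.assign(col1=tokens).explode("col1")

# crosstab ignores the NaN row, so reindex restores that date with zeros.
counts = pd.crosstab(long_df["col2"], long_df["col1"]).reindex(
    df["col2"].unique(), fill_value=0
)
print(counts)
```

Note that the `\w+` pattern only works here because the list elements are plain alphanumeric tokens; values containing spaces or punctuation would be split into several tokens.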

Answer 3

You could do it this way:

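# Note: here col1 holds real Python lists, not strings as in the question.
df = pd.DataFrame(
    data={
        "col1": [['a','b'], ['b','c'], ['a','c'], ['c'], ['b']],
        "col2": ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04", "2020-01-05"],
    }
)
# One row per list element.
df2 = df.explode('col1').reset_index(drop=True)
df2["value"] = 1
# aggfunc always returns 1, so each cell marks presence rather than a count.
pd.pivot_table(df2, values="value", index=["col2"], columns=["col1"], aggfunc=lambda x: 1, fill_value=0)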
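The snippet above assumes col1 already contains real lists, whereas the question's column holds strings such as "['a','b']" plus one empty string. A minimal sketch of one way to bridge that gap is to parse the strings with ast.literal_eval first. The helper `parse_list` is hypothetical, and aggfunc="sum" is swapped in here so that the cells hold counts rather than a presence flag.

```python
import ast

import pandas as pd

df = pd.DataFrame(
    {
        "col1": ["['a','b']", "['b','c']", "['a','c']", "", "['b']"],
        "col2": ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04", "2020-01-05"],
    }
)


def parse_list(text):
    # Hypothetical helper: an empty string is treated as an empty list.
    return ast.literal_eval(text) if text else []


# Convert the strings to lists, then produce one row per list element.
long_df = df.assign(col1=df["col1"].map(parse_list)).explode("col1")
long_df["value"] = 1

counts = pd.pivot_table(
    long_df, values="value", index="col2", columns="col1",
    aggfunc="sum", fill_value=0,
)
# The date with an empty list has no elements to pivot, so add it back.
counts = counts.reindex(df["col2"], fill_value=0)
print(counts)
```

ast.literal_eval only evaluates Python literals, which makes it a safer choice than eval for text of this kind.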

3 answers

Answer 1
2 2020-11-12 15:43:19

Answer 2
2 Accepted 2020-11-12 15:53:44

Answer 3
2 2020-11-12 16:21:47
