Pivot Pandas Column of Lists
I have a pandas DataFrame with one column whose values are lists and another column that holds dates. I would like to create a DataFrame that counts the elements of the lists by date.
The DataFrame looks like:
import pandas as pd

df = pd.DataFrame(
    data={
        "col1": ["['a','b']", "['b','c']", "['a','c']", "", "['b']"],
        "col2": ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04", "2020-01-05"],
    },
    index=[0, 1, 2, 3, 4],
)
What I would like the DataFrame to look like is:
pd.DataFrame(
    data={"a": [1, 0, 1, 0, 0], "b": [1, 1, 0, 0, 1], "c": [0, 1, 1, 0, 0]},
    index=["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04", "2020-01-05"],
)
Any thoughts on how to do this kind of transformation?
You can use extractall to extract the values inside the single quotes, then count the values per row with groupby:
out = (
    df.col1.str.extractall("'([^']*)'")
    .groupby(level=0)[0].value_counts()
    .unstack(level=1, fill_value=0)
    .reindex(df.index, fill_value=0)
)
out.index = df['col2']
print(out)
Output:
0           a  b  c
col2
2020-01-01  1  1  0
2020-01-02  0  1  1
2020-01-03  1  0  1
2020-01-04  0  0  0
2020-01-05  0  1  0
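To see why the reindex step is needed, it can help to keep the intermediate result of extractall in its own variable. The sketch below rebuilds the question's frame, assuming col1 really holds strings such as "['a','b']" rather than Python lists; the names matches and counts are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    data={
        "col1": ["['a','b']", "['b','c']", "['a','c']", "", "['b']"],
        "col2": ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04", "2020-01-05"],
    }
)

# One row per quoted token; the index is (original row label, match number).
# Row 3 (the empty string) produces no matches, so it is absent here.
matches = df["col1"].str.extractall(r"'([^']*)'")

# Count tokens per original row, then spread the tokens into columns.
counts = (
    matches.groupby(level=0)[0]
    .value_counts()
    .unstack(level=1, fill_value=0)
    .reindex(df.index, fill_value=0)  # brings back rows that had no matches
)
counts.index = df["col2"]
print(counts)
```

Without the reindex call, the 2020-01-04 row would be missing from the result, and assigning df["col2"] as the index would fail because the lengths differ.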
You can use pd.crosstab here.
df['col1'] = df['col1'].str.findall(r'\w+')
df_ = df.explode('col1')
pd.crosstab(df_['col2'], df_['col1']).reindex(df_['col2'].unique()).fillna(0)
col1          a    b    c
col2
2020-01-01  1.0  1.0  0.0
2020-01-02  0.0  1.0  1.0
2020-01-03  1.0  0.0  1.0
2020-01-04  0.0  0.0  0.0
2020-01-05  0.0  1.0  0.0
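The counts above come out as floats because reindex introduces NaN for the date with no elements before fillna runs. A possible variant, sketched here under the same assumption that col1 holds strings, passes fill_value=0 to reindex so the counts stay integers. The names exploded and table are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    data={
        "col1": ["['a','b']", "['b','c']", "['a','c']", "", "['b']"],
        "col2": ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04", "2020-01-05"],
    }
)

# Turn each string into a list of word tokens, then give every token its own row.
# The empty string yields an empty list, which explode turns into NaN.
exploded = df.assign(col1=df["col1"].str.findall(r"\w+")).explode("col1")

# crosstab drops the NaN row, so reindex on the original dates restores it.
# fill_value=0 avoids the NaN-then-fillna round trip that forces floats.
table = pd.crosstab(exploded["col2"], exploded["col1"]).reindex(
    df["col2"], fill_value=0
)
print(table)
```

This version also leaves the original df untouched, since assign returns a copy instead of overwriting col1 in place.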
You could do it this way:
df = pd.DataFrame(
    data={
        "col1": [['a','b'], ['b','c'], ['a','c'], ['c'], ['b']],
        "col2": ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04", "2020-01-05"],
    }
)
df2 = df.explode('col1').reset_index(drop=True)
df2["value"] = 1
pd.pivot_table(df2, values="value", index=["col2"], columns=["col1"], aggfunc=lambda x: 1, fill_value=0)
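Note that this answer builds col1 from real Python lists, whereas the question's frame stores them as strings and includes an empty value. A minimal sketch of how the same pivot_table idea might be applied to the question's data follows. It assumes the non-empty strings are valid Python list literals, so ast.literal_eval can parse them, and it swaps in aggfunc="sum" so repeated elements would be counted rather than reported as present. The names parsed, long_df and wide are illustrative.

```python
import ast

import pandas as pd

df = pd.DataFrame(
    data={
        "col1": ["['a','b']", "['b','c']", "['a','c']", "", "['b']"],
        "col2": ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04", "2020-01-05"],
    }
)

# Parse each string into a list; treat the empty string as an empty list.
parsed = df["col1"].apply(lambda s: ast.literal_eval(s) if s else [])

# One row per list element, with a helper column of ones to add up.
long_df = df.assign(col1=parsed).explode("col1")
long_df["value"] = 1

wide = pd.pivot_table(
    long_df,
    values="value",
    index="col2",
    columns="col1",
    aggfunc="sum",  # counts occurrences; lambda x: 1 would only flag presence
    fill_value=0,
).reindex(df["col2"], fill_value=0)  # restore the date whose list was empty
print(wide)
```

For the sample data the two aggregation choices give the same table, because no element is repeated within a single date.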