I have a dataframe that contains occasional entries for each symbol, each with a count. I would like to expand the dataframe so that every symbol has a row for every date in the dataframe's date range. Where a symbol has no entry on a given date, I want the Count to be 0.
My dataframe:
import pandas as pd

dates = ['2021-01-01', '2021-01-02', '2021-01-03']
symbol = ['a', 'b', 'a']
count = [1, 2, 3]
df = pd.DataFrame({'Mention Datetime': dates,
                   'Symbol': symbol,
                   'Count': count})
Mention Datetime Symbol Count
0 2021-01-01 a 1
1 2021-01-02 b 2
2 2021-01-03 a 3
What I want it to look like:
Mention Datetime Symbol Count
0 2021-01-01 a 1
1 2021-01-02 a 0
2 2021-01-03 a 3
3 2021-01-01 b 0
4 2021-01-02 b 2
5 2021-01-03 b 0
Use pivot_table, then stack:
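# One row per date, one column per symbol; missing (date, symbol) pairs become 0.
# stack() then turns the symbol columns back into rows.
df = df.pivot_table(index='Mention Datetime',
                    columns='Symbol', fill_value=0
                    ).stack().reset_index()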
Output:
Mention Datetime Symbol Count
0 2021-01-01 a 1
1 2021-01-01 b 0
2 2021-01-02 a 0
3 2021-01-02 b 2
4 2021-01-03 a 3
5 2021-01-03 b 0
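One thing to watch: pivot_table aggregates with the mean by default, so if a symbol can appear more than once on the same date, its counts would be averaged rather than added. A minimal sketch, assuming summing duplicates is what you want, that also sorts the rows into the symbol-first order from the question (the explicit values/aggfunc arguments and the final sort are my additions, not part of the answer above):

```python
import pandas as pd

df = pd.DataFrame({'Mention Datetime': ['2021-01-01', '2021-01-02', '2021-01-03'],
                   'Symbol': ['a', 'b', 'a'],
                   'Count': [1, 2, 3]})

# Wide table: one row per date, one column per symbol; missing pairs become 0.
# aggfunc='sum' adds up duplicate (date, symbol) rows instead of averaging them.
wide = df.pivot_table(index='Mention Datetime', columns='Symbol',
                      values='Count', aggfunc='sum', fill_value=0)

# Back to long format, then order by symbol first to match the desired output.
result = (wide.stack()
              .rename('Count')
              .reset_index()
              .sort_values(['Symbol', 'Mention Datetime'], ignore_index=True))
print(result)
```

For the question's data this yields the same six rows, in the same order, as the desired output.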
You can reindex with a new MultiIndex built from the unique values of the columns in question.
import pandas as pd
from io import StringIO

# Sample data from the question (comma-separated so it parses reliably)
s = '''Mention Datetime,Symbol,Count
2021-01-01,a,1
2021-01-02,b,2
2021-01-03,a,3
'''
df = pd.read_csv(StringIO(s))
df = df.set_index(['Mention Datetime', 'Symbol'])
df
Count
Mention Datetime Symbol
2021-01-01 a 1
2021-01-02 b 2
2021-01-03 a 3
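# Every (date, symbol) combination; rows missing from df get NaN, then 0
df = df.reindex(
    pd.MultiIndex.from_product(
        [
            df.index.get_level_values('Mention Datetime').unique(),
            df.index.get_level_values('Symbol').unique()
        ],
        names=df.index.names  # keep the level names on the new index
    )
).fillna(0)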
df
Count
Mention Datetime Symbol
2021-01-01 a 1.0
b 0.0
2021-01-02 a 0.0
b 2.0
2021-01-03 a 3.0
b 0.0
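Note that unique() only uses dates that already occur in the data. If "the entire date range" should also include calendar days on which no symbol was mentioned at all, one option is to build the date level with pd.date_range instead. A minimal sketch, assuming daily frequency and dates that pd.to_datetime can parse; the sample below is hypothetical (the last date is changed to 2021-01-04 so that 2021-01-03 is missing entirely). Passing fill_value=0 to reindex also keeps Count as integers instead of floats:

```python
import pandas as pd

# Hypothetical variant of the question's data with a gap on 2021-01-03
df = pd.DataFrame({'Mention Datetime': ['2021-01-01', '2021-01-02', '2021-01-04'],
                   'Symbol': ['a', 'b', 'a'],
                   'Count': [1, 2, 3]})
df['Mention Datetime'] = pd.to_datetime(df['Mention Datetime'])
df = df.set_index(['Mention Datetime', 'Symbol'])

# Every calendar day between the first and last mention (assumes daily data)
dates = df.index.get_level_values('Mention Datetime')
all_dates = pd.date_range(dates.min(), dates.max(), freq='D')
all_symbols = df.index.get_level_values('Symbol').unique()

# Symbol as the outer level so each symbol's rows end up grouped together
full_index = pd.MultiIndex.from_product([all_symbols, all_dates],
                                        names=['Symbol', 'Mention Datetime'])

result = (df.reorder_levels(['Symbol', 'Mention Datetime'])
            .reindex(full_index, fill_value=0)  # missing pairs get Count 0
            .reset_index())
result = result[['Mention Datetime', 'Symbol', 'Count']]
print(result)
```

With the hypothetical gap, 2021-01-03 appears for both symbols with a Count of 0.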