So I'm new to Pandas and I'm trying to convert my old code to DataFrames and Series. I have a data frame that looks like this:
time data param
t0 -1 x
t1 0 z
t2 -1 y
t3 1 x
t4 -1 y
I need to insert an intermediate row for every 1 to -1 and -1 to 1 transition. This row should contain the backfilled time and param (copied from the following row), and its data value should be zero.
This is how it should look after that operation:
time data param
t0 -1 x
t1 0 z
t2 -1 y
t3 0 x <-- added row
t3 1 x
t4 0 y <-- added row
t4 -1 y
So how can I achieve this? I guess I could create a new DataFrame by scanning the original one row by row, comparing the last saved data value with the current one, and yielding an additional zero row when needed. Can you suggest better solutions that avoid row-by-row iteration?
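For reference, the row-by-row approach described above could look something like this. It is a minimal sketch: the function name `insert_zero_rows_loop` is hypothetical, and the frame is built inline instead of being read from text.

```python
import pandas as pd

# Toy frame from the question; time labels kept as plain strings.
df = pd.DataFrame({
    "time": ["t0", "t1", "t2", "t3", "t4"],
    "data": [-1, 0, -1, 1, -1],
    "param": ["x", "z", "y", "x", "y"],
})

def insert_zero_rows_loop(frame):
    """Hypothetical row-by-row baseline: emit a zero row before each sign flip."""
    rows = []
    last = None  # last data value seen
    for row in frame.itertuples(index=False):
        # With values in {-1, 0, 1}, a 1 -> -1 or -1 -> 1 step is the only
        # case where consecutive values have a negative product.
        if last is not None and last * row.data < 0:
            # The extra row takes time/param from the current (following) row.
            rows.append({"time": row.time, "data": 0, "param": row.param})
        rows.append({"time": row.time, "data": row.data, "param": row.param})
        last = row.data
    return pd.DataFrame(rows, columns=frame.columns)

print(insert_zero_rows_loop(df))
```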
UPDATE
After reading Primer's answer I have come up with another solution:
Reading data:
import io
import numpy as np
import pandas as pd
df = pd.read_csv(io.StringIO("""time data param
t0 -1 x
t1 0 z
t2 -1 y
t3 1 x
t4 -1 y"""), sep=r'\s+')
df
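Find the 1 -> -1 and -1 -> 1 transitions (rows where the product of consecutive data values is negative), count them cumulatively, shift the index by that count, and reindex with the full range to introduce the missing rows:
df.index += (df.data * df.data.shift() < 0).astype(int).cumsum().to_numpy()
df = df.reindex(np.arange(df.index[-1] + 1))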
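To make the index trick concrete, here are the intermediate values it produces on the example data. This sketch recomputes them from scratch; `flips`, `offset`, `new_index` and `missing` are just illustrative names.

```python
import io
import numpy as np
import pandas as pd

df = pd.read_csv(io.StringIO("""time data param
t0 -1 x
t1 0 z
t2 -1 y
t3 1 x
t4 -1 y"""), sep=r"\s+")

# True where the sign flips between consecutive rows (product is negative).
flips = df["data"] * df["data"].shift() < 0
# Running count of flips so far = how many slots each row moves down.
offset = flips.astype(int).cumsum()
new_index = df.index + offset.to_numpy()

positions = new_index.tolist()
# Positions that no original row occupies; reindex fills these with NaN rows.
missing = np.setdiff1d(np.arange(new_index[-1] + 1), new_index).tolist()
print(positions)  # [0, 1, 2, 4, 6]
print(missing)    # [3, 5]
```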
Fill missing values
df[['time','param']] = df[['time','param']].bfill()
df['data'] = df['data'].fillna(0).astype(int)  # reindex turned data into float
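Wrapped up as a function, the same steps can be checked on an input where a 0 already separates two opposite signs. This is a sketch: the name `insert_zero_rows` is hypothetical, and it assumes a non-empty frame whose `data` column holds -1, 0 and 1 as in the question. No row is inserted before the 0 -> -1 step, because the product of neighbouring values there is 0 rather than negative.

```python
import numpy as np
import pandas as pd

def insert_zero_rows(df):
    """Hypothetical wrapper around the index-shifting steps (non-empty df assumed)."""
    out = df.reset_index(drop=True)
    # Shift each row down by the number of sign flips seen so far.
    flips = out["data"] * out["data"].shift() < 0
    out.index = out.index + flips.astype(int).cumsum().to_numpy()
    # Reindexing with the full range creates all-NaN rows in the gaps.
    out = out.reindex(np.arange(out.index[-1] + 1))
    # New rows borrow time/param from the next original row...
    out[["time", "param"]] = out[["time", "param"]].bfill()
    # ...and get data = 0; reindex made the column float, so cast back.
    out["data"] = out["data"].fillna(0).astype(int)
    return out

sample = pd.DataFrame({
    "time": ["a", "b", "c", "d", "e"],
    "data": [1, -1, 1, 0, -1],
    "param": ["p", "q", "r", "s", "t"],
})
print(insert_zero_rows(sample))
```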
I'm still looking for better solutions. Please share your ideas.
You could do it like this:
import io
import numpy as np
import pandas as pd
df = pd.read_csv(io.StringIO("""time data param
t0 -1 x
t1 0 z
t2 -1 y
t3 1 x
t4 -1 y"""), sep=r'\s+')
df['count'] = np.arange(df.shape[0])
df
Setup filters for changes from -1 to 1 and back:
d1to_1 = (df.data == -1) & (df.data.shift() == 1)
d_1to1 = (df.data == 1) & (df.data.shift() == -1)
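On the example data, the two masks pick out rows t4 and t3 respectively. Assuming `data` only ever holds -1, 0 and 1, their union matches the product test from the question's update (consecutive values whose product is negative), which this small check illustrates:

```python
import pandas as pd

# Only the data column matters for the masks.
df = pd.DataFrame({"data": [-1, 0, -1, 1, -1]})

d1to_1 = (df.data == -1) & (df.data.shift() == 1)   # 1 -> -1
d_1to1 = (df.data == 1) & (df.data.shift() == -1)   # -1 -> 1
sign_flip = df.data * df.data.shift() < 0           # product test

print(d1to_1.tolist())  # [False, False, False, False, True]
print(d_1to1.tolist())  # [False, False, False, True, False]
print((d1to_1 | d_1to1).tolist() == sign_flip.tolist())  # True
```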
Copy the data to new dataframes (to avoid SettingWithCopyWarning):
df1to_1 = df.loc[d1to_1].copy(deep=True)
df_1to1 = df.loc[d_1to1].copy(deep=True)
Modify the new data according to your needs, changing the counter to ensure new rows are above old ones:
df_1to1['data'] = 0
df_1to1['count'] = df_1to1['count'] - 1
df1to_1['data'] = 0
df1to_1['count'] = df1to_1['count'] - 1
Concatenate the old and new dataframes, sort by time and counter, and then reset the index:
df = pd.concat([df, df1to_1, df_1to1], ignore_index=True).sort_values(['time','count']).reset_index(drop=True)
del df['count']
df
This should produce the desired output:
time data param
0 t0 -1 x
1 t1 0 z
2 t2 -1 y
3 t3 0 x
4 t3 1 x
5 t4 0 y
6 t4 -1 y
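If you can live with the new rows appearing after the old ones, you can skip the counter part, as long as the sort is stable (for example `sort_values('time', kind='mergesort')`); the default quicksort does not guarantee that the original rows stay first.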
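A minimal sketch of that counter-free variant, reusing the answer's masks and assuming that sorting by `time` restores the original row order (true for the t0 to t4 labels, or for increasing timestamps):

```python
import pandas as pd

df = pd.DataFrame({
    "time": ["t0", "t1", "t2", "t3", "t4"],
    "data": [-1, 0, -1, 1, -1],
    "param": ["x", "z", "y", "x", "y"],
})

# Same transition masks as above: 1 -> -1 and -1 -> 1.
d1to_1 = (df.data == -1) & (df.data.shift() == 1)
d_1to1 = (df.data == 1) & (df.data.shift() == -1)

# Copy the transition rows and zero their data.
zeros = df.loc[d1to_1 | d_1to1].copy()
zeros["data"] = 0

# concat puts all originals before all copies; a stable sort on time keeps
# that relative order, so within each time value originals precede zero rows.
out = (
    pd.concat([df, zeros], ignore_index=True)
    .sort_values("time", kind="mergesort")
    .reset_index(drop=True)
)
print(out)
```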