python - insert intermediate values into Pandas DataFrame

I'm new to pandas and I'm trying to convert my old code to DataFrames and Series. I have a data frame that looks like this:

time    data    param
t0      -1      x
t1       0      z
t2      -1      y
t3       1      x
t4      -1      y

I need to insert an intermediate row for every 1 to -1 and -1 to 1 transition. Each new row should contain the backfilled time and param (copied from the row that follows it), and its data value should be zero.

This is how it should look after that operation:

time    data    param
t0      -1      x
t1       0      z
t2      -1      y
t3       0      x       <-- added row
t3       1      x
t4       0      y       <-- added row
t4      -1      y

How can I achieve this? I could build a new DataFrame by scanning the original one row by row, comparing the last saved data value with the current one and yielding an additional zero row when needed. Can you suggest a better solution that avoids row-by-row iteration?
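For reference, the row-by-row approach described above could be sketched as follows. This is only a baseline to compare vectorized solutions against; the function name `insert_zero_rows_iterative` is made up for illustration, and it assumes a "transition" means a direct flip between two consecutive rows.

```python
import pandas as pd

def insert_zero_rows_iterative(df):
    """Row-by-row baseline: emit a zero row before each direct 1 <-> -1 flip."""
    rows = []
    prev = None  # data value of the previous original row
    for row in df.to_dict("records"):
        # 1 * -1 and -1 * 1 are the only products below zero here.
        if prev is not None and prev * row["data"] < 0:
            rows.append({**row, "data": 0})  # same time and param, data = 0
        rows.append(row)
        prev = row["data"]
    return pd.DataFrame(rows, columns=df.columns)

df = pd.DataFrame({
    "time": ["t0", "t1", "t2", "t3", "t4"],
    "data": [-1, 0, -1, 1, -1],
    "param": ["x", "z", "y", "x", "y"],
})
result = insert_zero_rows_iterative(df)
print(result)
```

It is easy to read, but it loops in Python over every row, which is what the question is trying to avoid.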

UPDATE

After reading Primer's answer I came up with another solution:

Reading data:

import io
import pandas as pd
df = pd.read_csv(io.StringIO("""time    data    param
t0      -1      x
t1       0      z
t2      -1      y
t3       1      x
t4      -1      y"""), sep=r'\s+')
df

Find the 1 -> -1 and -1 -> 1 transitions, shift each row's index by the number of transitions seen so far, then reindex over the full range to introduce the missing rows:

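import numpy as np
# the product of neighbouring values is negative only on a direct 1 <-> -1 flip
df.index += (df.data * df.data.shift() < 0).astype(int).cumsum().to_numpy()
df = df.reindex(np.arange(df.index[-1] + 1))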

Fill the missing values (reindexing turns data into floats, so it is cast back to int):

df[['time','param']] = df[['time','param']].bfill()
df['data'] = df['data'].fillna(0).astype(int)
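Put together, the steps above might look like the sketch below. The function name `insert_zero_rows` is hypothetical, and the code assumes a non-empty frame whose columns are named `time`, `data` and `param` as in the question.

```python
import numpy as np
import pandas as pd

def insert_zero_rows(df):
    """Insert a data == 0 row before every direct 1 <-> -1 flip."""
    out = df.reset_index(drop=True)
    # True where the value flips sign relative to the previous row.
    flips = out["data"] * out["data"].shift() < 0
    # Push each row down by the number of flips seen so far, leaving gaps.
    out.index = out.index + flips.cumsum().to_numpy()
    # Reindex over the full range so the gaps become all-NaN rows.
    out = out.reindex(np.arange(out.index[-1] + 1))
    # New rows take time and param from the row that follows them.
    out[["time", "param"]] = out[["time", "param"]].bfill()
    out["data"] = out["data"].fillna(0).astype(int)
    return out

df = pd.DataFrame({
    "time": ["t0", "t1", "t2", "t3", "t4"],
    "data": [-1, 0, -1, 1, -1],
    "param": ["x", "z", "y", "x", "y"],
})
result = insert_zero_rows(df)
print(result)
```

Working on a copy with a fresh index keeps the caller's frame untouched and makes the index arithmetic independent of whatever index the input had.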

I'm still looking for better solutions. Please share your ideas.

You could do it like this:

import io
import numpy as np
import pandas as pd
df = pd.read_csv(io.StringIO("""time    data    param
t0      -1      x
t1       0      z
t2      -1      y
t3       1      x
t4      -1      y"""), sep=r'\s+')
# helper counter that remembers the original row order
df['count'] = np.arange(df.shape[0])
df

Set up filters for changes from 1 to -1 and from -1 to 1:

d1to_1 = (df.data == -1) & (df.data.shift() == 1)
d_1to1 = (df.data == 1) & (df.data.shift() == -1)
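To see what these filters select, here is a small self-contained check on the sample data. Because both sets of new rows later receive identical changes, the two masks could arguably be merged into one; the combined name `flips` is an assumption for illustration and is not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({
    "time": ["t0", "t1", "t2", "t3", "t4"],
    "data": [-1, 0, -1, 1, -1],
    "param": ["x", "z", "y", "x", "y"],
})

prev = df["data"].shift()  # previous row's value; NaN for the first row
d1to_1 = (df["data"] == -1) & (prev == 1)   # 1 -> -1
d_1to1 = (df["data"] == 1) & (prev == -1)   # -1 -> 1
flips = d1to_1 | d_1to1                     # either direction

print(df.loc[flips, "time"].tolist())  # ['t3', 't4']
```

Comparisons against NaN evaluate to False, so the first row can never be flagged as a transition.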

Copy the data to new dataframes (to avoid SettingWithCopyWarning):

df1to_1 = df.loc[d1to_1].copy(deep=True)
df_1to1 = df.loc[d_1to1].copy(deep=True)

Modify the new data according to your needs, changing the counter to ensure new rows are above old ones:

df_1to1['data'] = 0
df_1to1['count'] = df_1to1['count'] - 1
df1to_1['data'] = 0
df1to_1['count'] = df1to_1['count'] - 1

Concatenate the old and new dataframes, sort by time and counter, and then reset the index:

df = pd.concat([df, df1to_1, df_1to1], ignore_index=True).sort_values(['time','count']).reset_index(drop=True)
del df['count']
df

This should produce the desired output:

  time  data param
0   t0    -1     x
1   t1     0     z
2   t2    -1     y
3   t3     0     x
4   t3     1     x
5   t4     0     y
6   t4    -1     y

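If you can live with the new rows coming after the old ones, you can skip the counter part. In that case sort by time alone with a stable algorithm (for example kind='mergesort'), since the default sort does not guarantee the order of rows with equal keys.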
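A minimal sketch of that counter-free variant, assuming the same sample data. The names `flips`, `new_rows` and `result` are illustrative, and `kind="stable"` is used so that rows sharing a time value keep their concatenation order (originals first, inserted rows second).

```python
import pandas as pd

df = pd.DataFrame({
    "time": ["t0", "t1", "t2", "t3", "t4"],
    "data": [-1, 0, -1, 1, -1],
    "param": ["x", "z", "y", "x", "y"],
})

prev = df["data"].shift()
flips = ((df["data"] == -1) & (prev == 1)) | ((df["data"] == 1) & (prev == -1))

# Copies of the transition rows with data set to zero.
new_rows = df.loc[flips].assign(data=0)

result = (
    pd.concat([df, new_rows], ignore_index=True)
    .sort_values("time", kind="stable")  # stable: originals stay ahead of copies
    .reset_index(drop=True)
)
print(result)
```

Like the original answer, this relies on the time column sorting correctly as plain strings, which holds for t0 to t4 but would need a proper key for labels such as t10.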
