
Add IDs to dataframe with random Noise

My initial dataframe looks as follows:

import pandas as pd
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2],
    "time": [1, 2, 3, 4, 5, 6],
    "x": [1, 2, 3, 4, 9, 11],
    "y": [5, 6, 7, 8, 3, 2],
})

So I have two IDs (1 and 2), i.e. two different time series. Now I want to add random noise to the x and y values of each ID and append the result to the initial df under new IDs (each with the same length as its source):

# Noise
import numpy as np
original = df["x"].to_numpy()                  # signal to perturb, e.g. the x column
noise = np.random.normal(0, 1, len(original))  # one noise sample per element
new_signal = original + noise
# https://stackoverflow.com/questions/14058340/adding-noise-to-a-signal-in-python

So the resulting df would look something like the following (the values are just an example of what the output could be):

df = pd.DataFrame({
  "id":[1,1,1,1,2,2      ,3,3,3,3,    4,4],
   "time": [1,2,3,4,5,6  ,7,8,9,10,    11,12      ],
   "x": [1,2,3,4,9,11,    1.0005,2.3256,3.1256,4.5647,   9.6514,11.4567 ],
   "y": [5,6,7,8,3,2,  5.0505,6.0276,7.1056,8.5607,   3.6014,2.4567],
})

As you can see, two new IDs (3 and 4) have been added, along with their noisy values.

Currently I am trying this with various loops, but it is getting quite complicated. Any suggestions?

Bonus question: how can I add noisy copies three times instead of duplicating the data just once?

You can build a second dataframe with shifted ids and times and noisy x and y values, then concat it with the original:

import numpy as np

# Shift id and time past their current maxima and add Gaussian noise to x and y
df1 = pd.concat([df['id'] + df['id'].max(),
                 df['time'] + df['time'].max(),
                 df['x'] + np.random.normal(0, 1, len(df)),
                 df['y'] + np.random.normal(0, 1, len(df))], axis=1) \
        .set_index(df.index + len(df))  # continue the index after the original rows

out = pd.concat([df, df1])

Output:

>>> out
    id  time          x         y
0    1     1   1.000000  5.000000
1    1     2   2.000000  6.000000
2    1     3   3.000000  7.000000
3    1     4   4.000000  8.000000
4    2     5   9.000000  3.000000
5    2     6  11.000000  2.000000
6    3     7   1.479734  5.720535
7    3     8   0.076273  6.256060
8    3     9   2.856642  6.845974
9    3    10   4.119396  7.738969
10   4    11   9.220569  2.710783
11   4    12  10.451495  1.245976
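
For the bonus question, the same concat idea can be wrapped in a loop. The following is a minimal sketch rather than part of the answer above: `add_noisy_copies`, `n_copies`, `scale` and `seed` are hypothetical names, and it assumes ids and times are positive integers, so that offsetting by multiples of their maxima yields values that are not already in use.

```python
import numpy as np
import pandas as pd


def add_noisy_copies(df, n_copies=1, scale=1.0, seed=None):
    """Append n_copies noisy copies of df (hypothetical helper, not a pandas API)."""
    rng = np.random.default_rng(seed)  # seeded generator for reproducible noise
    id_step = df["id"].max()           # assumption: ids are positive integers
    time_step = df["time"].max()       # assumption: times are positive integers
    parts = [df]
    for k in range(1, n_copies + 1):
        copy = df.copy()
        copy["id"] = df["id"] + k * id_step        # fresh ids for this copy
        copy["time"] = df["time"] + k * time_step  # continue the time axis
        # Independent Gaussian noise for every x and y value
        copy[["x", "y"]] = df[["x", "y"]] + rng.normal(0, scale, size=(len(df), 2))
        parts.append(copy)
    return pd.concat(parts, ignore_index=True)


df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2],
    "time": [1, 2, 3, 4, 5, 6],
    "x": [1, 2, 3, 4, 9, 11],
    "y": [5, 6, 7, 8, 3, 2],
})
out = add_noisy_copies(df, n_copies=3, seed=0)
```

With `n_copies=3` the result holds the six original rows followed by three noisy copies of six rows each, and `ignore_index=True` gives a fresh 0..23 index without computing offsets by hand.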

You can reindex the dataframe to replicate its rows, then add an array that increments id and time and puts noise on x and y.

This works for an arbitrary number of repeats:

import numpy as np

N = 3
(df.reindex(np.tile(df.index, N))  # replicate the dataframe N times
   .add(np.c_[np.repeat(np.arange(N), len(df))*df['id'].max(), # increment id past existing ids
              np.repeat(np.arange(N), len(df))*len(df), # increment time
              np.r_[np.zeros((len(df), 2)),             # no noise for first
                    np.random.normal(size=(len(df)*(N-1), 2))] # extra noise
              ])
)

Example with N=3 :

    id  time          x         y
0  1.0   1.0   1.000000  5.000000
1  1.0   2.0   2.000000  6.000000
2  1.0   3.0   3.000000  7.000000
3  1.0   4.0   4.000000  8.000000
4  2.0   5.0   9.000000  3.000000
5  2.0   6.0  11.000000  2.000000
0  3.0   7.0   0.651240  4.713942
1  3.0   8.0   1.426533  5.446687
2  3.0   9.0   3.187928  7.430646
3  3.0  10.0   2.998382  9.421992
4  4.0  11.0  10.282871  2.108504
5  4.0  12.0  10.531258  2.439636
0  5.0  13.0  -0.200542  5.286711
1  5.0  14.0   0.350241  8.114173
2  5.0  15.0   1.843902  6.725896
3  5.0  16.0   3.831534  7.964400
4  6.0  17.0   7.612370  2.737872
5  6.0  18.0  12.129517  2.809689
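
In the example output, id and time come back as floats and the index labels 0 to 5 repeat, because a single float array is added to every column. If integer ids and a unique index are preferred, one option is to update the columns separately. This is a minimal sketch under stated assumptions: the seeded generator `rng` and the names `reps` and `noise` are illustrative additions, and ids and times are assumed to be positive integers.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2],
    "time": [1, 2, 3, 4, 5, 6],
    "x": [1, 2, 3, 4, 9, 11],
    "y": [5, 6, 7, 8, 3, 2],
})

N = 3                                        # total blocks, including the original
rng = np.random.default_rng(0)               # seeded for reproducibility (assumption)
reps = np.repeat(np.arange(N), len(df))      # 0 for the original rows, then 1, 2, ...

# Replicate the rows N times and replace the repeated labels with a fresh index
out = df.reindex(np.tile(df.index, N)).reset_index(drop=True)

# Integer offsets keep id and time as integer columns
out["id"] = out["id"] + reps * df["id"].max()
out["time"] = out["time"] + reps * df["time"].max()

# Noise everywhere except the first (original) block
noise = rng.normal(size=(len(out), 2))
noise[:len(df)] = 0
out[["x", "y"]] = out[["x", "y"]] + noise
```

Only x and y become floats here; id and time stay integers because they are never combined with the float noise array.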
