`numpy.tile()` sorts automatically - is there an alternative?

I'd like to initialize a pandas DataFrame so that I can populate it with multiple time series.

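```python
from string import ascii_uppercase

import numpy as np
import pandas as pd

# pd.to_datetime replaces pd.tseries.tools.to_datetime, which newer pandas removed
dt_rng = pd.date_range(start=pd.to_datetime('2012-12-31'),
                       end=pd.to_datetime('2014-12-28'),
                       freq='D')
df = pd.DataFrame(index=range(len(dt_rng) * 10),
                  columns=['product', 'dt', 'unit_sales'])
# Bracket assignment fills the column; `df.product = ...` would shadow the
# DataFrame.product method instead of filling the column
df['product'] = sorted(np.tile(list(ascii_uppercase[:10]), len(dt_rng)))
# The step in question: meant to repeat the whole date range ten times
df['dt'] = np.tile(dt_rng, 10)
# randint's upper bound is exclusive, so (0, 26) matches random_integers(0, 25)
df['unit_sales'] = np.random.randint(0, 26, len(dt_rng) * 10)
```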

However, when I check the first few values of `df.dt`, all the values in the column already appear to be sorted: `df.dt[:10]` yields `2012-12-31` ten times. I'd like the first ten values to be `2012-12-31`, `2013-01-01`, ..., `2013-01-08`, `2013-01-09`.
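A minimal sketch of the desired layout, assuming a current NumPy/pandas: tile the plain `datetime64[ns]` array behind the index (via `DatetimeIndex.to_numpy()`) rather than the `DatetimeIndex` object itself. The post doesn't say why the original call misbehaved, so this is only one way to get the intended order, not a diagnosis.

```python
import numpy as np
import pandas as pd

dt_rng = pd.date_range(start='2012-12-31', end='2014-12-28', freq='D')

# Tile the underlying datetime64[ns] array: the whole range appears
# back to back (day 1..728, then day 1..728 again, and so on).
dt_col = np.tile(dt_rng.to_numpy(), 10)

# The first ten values should be ten consecutive days starting 2012-12-31.
expected = pd.date_range(start='2012-12-31', periods=10, freq='D')
print(pd.DatetimeIndex(dt_col[:10]).equals(expected))  # True
```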

In general, I'm looking for behavior similar to R's "recycling", where a shorter vector is repeated until it matches the length of a longer one.
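For the general recycling case, `numpy.resize` is one close NumPy analogue: it fills the requested length by cycling through its input, even when that length is not an exact multiple of the input length. This is a sketch of that function, not something used in the original thread; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Recycling to a length that is not a multiple of the input length.
letters = np.array(['A', 'B', 'C'])
print(np.resize(letters, 7))  # ['A' 'B' 'C' 'A' 'B' 'C' 'A']

# The same idea applied to the question's dates: the range is cycled
# until the target length (ten full copies) is reached.
dt_rng = pd.date_range(start='2012-12-31', end='2014-12-28', freq='D')
dt_col = np.resize(dt_rng.to_numpy(), len(dt_rng) * 10)
```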

A combination of `reduce()` and the `append()` method of a `DatetimeIndex` (`pandas.tseries.index.DatetimeIndex` in the pandas version used here) did the trick:

```python
from functools import reduce  # a builtin on Python 2; lives in functools on Python 3
from string import ascii_uppercase

import numpy as np
import pandas as pd

dt_rng = pd.date_range(start=pd.to_datetime('2012-12-31'),
                       end=pd.to_datetime('2014-12-28'),
                       freq='D')
df = pd.DataFrame(index=range(len(dt_rng) * 10),
                  columns=['product', 'dt', 'unit_sales'])
df['product'] = sorted(np.tile(list(ascii_uppercase[:10]), len(dt_rng)))
# Append the date range to itself until it appears ten times back to back
df['dt'] = reduce(lambda x, y: x.append(y), [dt_rng] * 10)
df['unit_sales'] = np.random.randint(0, 26, len(dt_rng) * 10)  # values 0..25
```
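To make the `reduce()` step concrete: it folds the list left to right, `((dt_rng.append(dt_rng)).append(dt_rng))...`, building a new index at every step. `Index.append` also accepts a list of indexes, so a single call gives the same result. The sketch below also shows a different technique that was not part of the answer: building the frame directly from the Cartesian product of products and dates with `pd.MultiIndex.from_product`, which avoids preallocating an empty frame. It assumes a recent pandas and NumPy (`MultiIndex.to_frame(index=False)`, `numpy.random.default_rng`).

```python
from functools import reduce
from string import ascii_uppercase

import numpy as np
import pandas as pd

dt_rng = pd.date_range(start='2012-12-31', end='2014-12-28', freq='D')
products = list(ascii_uppercase[:10])

# The answer's approach: a left fold of pairwise appends.
via_reduce = reduce(lambda x, y: x.append(y), [dt_rng] * len(products))

# Index.append also takes a list, so one call concatenates all copies at once.
via_list = dt_rng.append([dt_rng] * (len(products) - 1))

# Alternative (not from the answer): every product paired with every date,
# with products varying slowest, turned into ordinary columns.
idx = pd.MultiIndex.from_product([products, dt_rng], names=['product', 'dt'])
df = idx.to_frame(index=False)
# Generator.integers excludes the upper bound by default, giving values 0..25.
df['unit_sales'] = np.random.default_rng().integers(0, 26, size=len(df))

print(via_reduce.equals(via_list))                  # True
print(pd.DatetimeIndex(df['dt']).equals(via_list))  # True
```

The single-call `append` avoids copying the growing index ten times. The `from_product` version states the "every product × every date" layout directly, so the row order doesn't depend on sorting a separately tiled product column.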
