Creating new dataframe columns from existing dataframe column names
I have a dataframe (stockData), indexed by date, that holds three fields for each stock: LAST, VOLUME and MKTCAP.
Here is the head of the dataframe (stockData):
BBG.XLON.BTA.S_LAST BBG.XLON.BTA.S_VOLUME BBG.XLON.BTA.S_MKTCAP \
date
2001-01-02 572 26605510 37494.60
2001-01-03 560 24715470 36708.00
2001-01-04 613 52781855 40182.15
2001-01-05 630 56600152 41296.50
2001-01-08 633 41014402 41493.15
BBG.XLON.VOD.S_LAST BBG.XLON.VOD.S_VOLUME BBG.XLON.VOD.S_MKTCAP
date
2001-01-02 NaN NaN NaN
2001-01-03 225.00 444328736 145216.0020
2001-01-04 239.00 488568000 154251.6643
2001-01-05 242.25 237936704 156349.2288
2001-01-08 227.75 658059776 146990.8642
Is there a way to take one of these fields for every stock and create a new set of columns from it with a new suffix (_HOLIDAY), so that I end up with:
BBG.XLON.BTA.S_LAST BBG.XLON.BTA.S_VOLUME BBG.XLON.BTA.S_MKTCAP BBG.XLON.BTA.S_HOLIDAY \
date
2001-01-02 572 26605510 37494.60 NaN
2001-01-03 560 24715470 36708.00 NaN
2001-01-04 613 52781855 40182.15 NaN
2001-01-05 630 56600152 41296.50 NaN
2001-01-08 633 41014402 41493.15 NaN
BBG.XLON.VOD.S_LAST BBG.XLON.VOD.S_VOLUME BBG.XLON.VOD.S_MKTCAP BBG.XLON.VOD.S_HOLIDAY
date
2001-01-02 NaN NaN NaN NaN
2001-01-03 225.00 444328736 145216.0020 NaN
2001-01-04 239.00 488568000 154251.6643 NaN
2001-01-05 242.25 237936704 156349.2288 NaN
2001-01-08 227.75 658059776 146990.8642 NaN
Any assistance would be much appreciated.
Is that what you want?
In [56]: newcols = df.columns.str.replace(r'\.S_.*', '.S_HOLIDAY', regex=True).unique().tolist()
In [57]: newcols
Out[57]: ['BBG.XLON.BTA.S_HOLIDAY', 'BBG.XLON.VOD.S_HOLIDAY']
Then you can easily add the new columns:
In [65]: for col in newcols:
....:     df[col] = np.nan
....:
In [66]: df
Out[66]:
BBG.XLON.BTA.S_LAST BBG.XLON.BTA.S_VOLUME BBG.XLON.BTA.S_MKTCAP \
2001-01-02 572 26605510 37494.60
2001-01-03 560 24715470 36708.00
2001-01-04 613 52781855 40182.15
2001-01-05 630 56600152 41296.50
2001-01-08 633 41014402 41493.15
BBG.XLON.VOD.S_LAST BBG.XLON.VOD.S_VOLUME BBG.XLON.VOD.S_MKTCAP \
2001-01-02 NaN NaN NaN
2001-01-03 225.00 444328736.0 145216.0020
2001-01-04 239.00 488568000.0 154251.6643
2001-01-05 242.25 237936704.0 156349.2288
2001-01-08 227.75 658059776.0 146990.8642
BBG.XLON.BTA.S_HOLIDAY BBG.XLON.VOD.S_HOLIDAY
2001-01-02 NaN NaN
2001-01-03 NaN NaN
2001-01-04 NaN NaN
2001-01-05 NaN NaN
2001-01-08 NaN NaN
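The loop above could also be written as a single reindex call. The following is only a sketch on a cut-down, two-column stand-in for the question's frame (the sample values are copied from the question, the reduced shape is an assumption for brevity). It relies on DataFrame.reindex creating any requested column that does not exist yet and filling it with NaN.

```python
import numpy as np
import pandas as pd

# Cut-down stand-in for the question's frame (two dates, one field per stock).
df = pd.DataFrame(
    {
        "BBG.XLON.BTA.S_LAST": [572.0, 560.0],
        "BBG.XLON.VOD.S_LAST": [np.nan, 225.0],
    },
    index=pd.to_datetime(["2001-01-02", "2001-01-03"]),
)

# regex=True is passed explicitly because the default has differed between pandas versions.
newcols = df.columns.str.replace(r"\.S_.*", ".S_HOLIDAY", regex=True).unique().tolist()

# reindex keeps the existing columns and creates the missing ones filled with NaN.
df = df.reindex(columns=df.columns.tolist() + newcols)
```

Unlike the loop, reindex returns a new frame rather than modifying df in place, so the result has to be assigned back.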
If the order of the columns matters to you, you can reorder them like this:
df = df[ordered_column_list]
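The answer does not say how ordered_column_list is built. One possible way, assuming each stock's HOLIDAY column should sit directly after that stock's other fields, is a stable sort on the part of the name before the last dot. The helper name stock_prefix is made up for this sketch, and the frame is an all-NaN stand-in that only reproduces the column layout.

```python
import numpy as np
import pandas as pd

# Stand-in frame with the HOLIDAY columns appended at the end, as in the output above.
cols = [
    "BBG.XLON.BTA.S_LAST", "BBG.XLON.BTA.S_VOLUME", "BBG.XLON.BTA.S_MKTCAP",
    "BBG.XLON.VOD.S_LAST", "BBG.XLON.VOD.S_VOLUME", "BBG.XLON.VOD.S_MKTCAP",
    "BBG.XLON.BTA.S_HOLIDAY", "BBG.XLON.VOD.S_HOLIDAY",
]
df = pd.DataFrame(np.nan, index=pd.to_datetime(["2001-01-02", "2001-01-03"]), columns=cols)

def stock_prefix(col):
    # Hypothetical helper: everything before the last dot, e.g. "BBG.XLON.BTA".
    return col.rsplit(".", 1)[0]

# sorted() is stable, so columns sharing a prefix keep their current relative
# order and HOLIDAY lands after LAST, VOLUME, MKTCAP of the same stock.
ordered_column_list = sorted(df.columns, key=stock_prefix)
df = df[ordered_column_list]
```

Note that this also sorts the stocks alphabetically by prefix, which matches the sample data but may not be what you want if the original stock order matters.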
You can use DataFrame.columns.values to get the column names and then strip the substring after and including the last dot (.):
names=[s[:s.rfind('.')] for s in df.columns.values]
Here I assume that your dataframe is called df. This will produce duplicate names (one each for .S_LAST, .S_VOLUME and .S_MKTCAP). Now you can use numpy.unique to remove the duplicates:
import numpy as np
uNames=np.unique(names)
Now you can add the new column <name>.S_HOLIDAY, assigning it a NaN value:
for n in uNames:
    df[n+'.S_HOLIDAY']=np.nan
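Put together, the steps of this answer might look like the sketch below. It uses the first two rows from the question as sample data. As a deliberate deviation from the answer, it deduplicates with dict.fromkeys instead of numpy.unique, because numpy.unique returns the names sorted while dict.fromkeys keeps them in first-seen order. The variable name prefixes is chosen for this sketch.

```python
import numpy as np
import pandas as pd

# First two rows of the question's data.
df = pd.DataFrame(
    {
        "BBG.XLON.BTA.S_LAST": [572, 560],
        "BBG.XLON.BTA.S_VOLUME": [26605510, 24715470],
        "BBG.XLON.BTA.S_MKTCAP": [37494.60, 36708.00],
        "BBG.XLON.VOD.S_LAST": [np.nan, 225.00],
        "BBG.XLON.VOD.S_VOLUME": [np.nan, 444328736],
        "BBG.XLON.VOD.S_MKTCAP": [np.nan, 145216.0020],
    },
    index=pd.to_datetime(["2001-01-02", "2001-01-03"]),
)

# Part of each name before the last dot, deduplicated in first-seen order.
prefixes = list(dict.fromkeys(c[: c.rfind(".")] for c in df.columns))

# Add one all-NaN HOLIDAY column per stock.
for p in prefixes:
    df[p + ".S_HOLIDAY"] = np.nan
```

The new columns are appended at the right-hand edge of the frame, so a reordering step is still needed if each HOLIDAY column should sit next to its stock.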