Create new dataframe columns from existing dataframe column names

Question

I have a dataframe (stockData), indexed by date, which holds three types of data for each stock: LAST, VOLUME and MKTCAP.

Here is the head of the dataframe (stockData):

                           BBG.XLON.BTA.S_LAST  BBG.XLON.BTA.S_VOLUME  BBG.XLON.BTA.S_MKTCAP  \
date                                                                            
2001-01-02                  572               26605510               37494.60   
2001-01-03                  560               24715470               36708.00   
2001-01-04                  613               52781855               40182.15   
2001-01-05                  630               56600152               41296.50   
2001-01-08                  633               41014402               41493.15   

            BBG.XLON.VOD.S_LAST  BBG.XLON.VOD.S_VOLUME  BBG.XLON.VOD.S_MKTCAP  
date                                                                           
2001-01-02                  NaN                    NaN                    NaN  
2001-01-03               225.00              444328736            145216.0020  
2001-01-04               239.00              488568000            154251.6643  
2001-01-05               242.25              237936704            156349.2288  
2001-01-08               227.75              658059776            146990.8642

Is there a way to take one of these fields for all of the stocks and create a new set of columns from it with a new suffix (_HOLIDAY), so that I end up with:

              BBG.XLON.BTA.S_LAST  BBG.XLON.BTA.S_VOLUME  BBG.XLON.BTA.S_MKTCAP  BBG.XLON.BTA.S_HOLIDAY  \
date                                                                            
2001-01-02                  572               26605510               37494.60                   NaN  
2001-01-03                  560               24715470               36708.00                   NaN  
2001-01-04                  613               52781855               40182.15                   NaN  
2001-01-05                  630               56600152               41296.50                   NaN  
2001-01-08                  633               41014402               41493.15                   NaN  

            BBG.XLON.VOD.S_LAST  BBG.XLON.VOD.S_VOLUME  BBG.XLON.VOD.S_MKTCAP  BBG.XLON.VOD.S_HOLIDAY  
date                                                                           
2001-01-02                  NaN                    NaN                    NaN                   NaN  
2001-01-03               225.00              444328736            145216.0020                   NaN  
2001-01-04               239.00              488568000            154251.6643                   NaN  
2001-01-05               242.25              237936704            156349.2288                   NaN  
2001-01-08               227.75              658059776            146990.8642                   NaN

Any assistance would be much appreciated.

Answer 1

Is this what you want?

In [56]: newcols = df.columns.str.replace(r'\.S_.*', '.S_HOLIDAY', regex=True).unique().tolist()

In [57]: newcols
Out[57]: ['BBG.XLON.BTA.S_HOLIDAY', 'BBG.XLON.VOD.S_HOLIDAY']

Then you can easily add the new columns:

In [65]: for col in newcols:
   ....:         df[col] = np.nan
   ....:

In [66]: df
Out[66]:
            BBG.XLON.BTA.S_LAST  BBG.XLON.BTA.S_VOLUME  BBG.XLON.BTA.S_MKTCAP  \
2001-01-02                  572               26605510               37494.60
2001-01-03                  560               24715470               36708.00
2001-01-04                  613               52781855               40182.15
2001-01-05                  630               56600152               41296.50
2001-01-08                  633               41014402               41493.15

            BBG.XLON.VOD.S_LAST  BBG.XLON.VOD.S_VOLUME  BBG.XLON.VOD.S_MKTCAP  \
2001-01-02                  NaN                    NaN                    NaN
2001-01-03               225.00            444328736.0            145216.0020
2001-01-04               239.00            488568000.0            154251.6643
2001-01-05               242.25            237936704.0            156349.2288
2001-01-08               227.75            658059776.0            146990.8642

            BBG.XLON.BTA.S_HOLIDAY  BBG.XLON.VOD.S_HOLIDAY
2001-01-02                     NaN                     NaN
2001-01-03                     NaN                     NaN
2001-01-04                     NaN                     NaN
2001-01-05                     NaN                     NaN
2001-01-08                     NaN                     NaN
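The question also mentions taking one of the existing fields for every stock, so here is a minimal sketch of filling the new columns from a field instead of with NaN. It assumes LAST is the field to copy (the question does not say which one) and uses a small stand-in frame built from the first rows of the sample data.

```python
import numpy as np
import pandas as pd

# Small stand-in for the question's frame (values from its first two rows).
df = pd.DataFrame(
    {
        "BBG.XLON.BTA.S_LAST": [572.0, 560.0],
        "BBG.XLON.BTA.S_VOLUME": [26605510.0, 24715470.0],
        "BBG.XLON.VOD.S_LAST": [np.nan, 225.0],
        "BBG.XLON.VOD.S_VOLUME": [np.nan, 444328736.0],
    },
    index=pd.to_datetime(["2001-01-02", "2001-01-03"]),
)

# Select every column ending in .S_LAST (assumed source field).
source = df.filter(regex=r"\.S_LAST$")

# Rename the suffix so each copy becomes <stock>.S_HOLIDAY.
holiday = source.rename(columns=lambda c: c.replace(".S_LAST", ".S_HOLIDAY"))

# Attach the new columns to the right of the existing ones.
df = pd.concat([df, holiday], axis=1)
print(df.columns.tolist())
```

Swapping the regex and the replaced suffix for VOLUME or MKTCAP selects a different source field.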

If the order of the columns matters to you, you can reorder them like this:

df = df[ordered_column_list]
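The answer leaves ordered_column_list undefined. One possible way to build it, so that each stock's HOLIDAY column sits next to its other fields as in the question's desired output, is sketched below. The field_order list and the sort_key helper are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Column layout after the new columns were appended at the end.
cols = [
    "BBG.XLON.BTA.S_LAST", "BBG.XLON.BTA.S_VOLUME", "BBG.XLON.BTA.S_MKTCAP",
    "BBG.XLON.VOD.S_LAST", "BBG.XLON.VOD.S_VOLUME", "BBG.XLON.VOD.S_MKTCAP",
    "BBG.XLON.BTA.S_HOLIDAY", "BBG.XLON.VOD.S_HOLIDAY",
]
df = pd.DataFrame(np.nan, index=pd.to_datetime(["2001-01-02"]), columns=cols)

# Assumed field order within each stock; adjust as needed.
field_order = ["LAST", "VOLUME", "MKTCAP", "HOLIDAY"]

def sort_key(col):
    # Split "BBG.XLON.BTA.S_LAST" into ("BBG.XLON.BTA", "LAST").
    stock, field = col.rsplit(".S_", 1)
    return (stock, field_order.index(field))

# Group by stock first, then by the assumed field order.
ordered_column_list = sorted(df.columns, key=sort_key)
df = df[ordered_column_list]
print(df.columns.tolist())
```

This sorts stocks alphabetically; a column whose field is not listed in field_order would raise a ValueError.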

Answer 2

You can use DataFrame.columns.values to get the column names and then strip everything from the last dot (.) onward:

names=[s[:s.rfind('.')] for s in df.columns.values]

Here I assume that your dataframe is called df. This will result in duplicate names (one each for .S_LAST, .S_VOLUME and .S_MKTCAP). Now you can use numpy.unique to remove the duplicates:

import numpy as np
uNames=np.unique(names)

And now you can add the new column <name>.S_HOLIDAY for each name, assigning it NaN:

for n in uNames:
    df[n + '.S_HOLIDAY'] = np.nan
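Note that numpy.unique returns its result sorted, so the new columns are added in alphabetical order of stock name rather than in the order the stocks appear in the frame. If the original order matters, a small variation (an assumption on my part, not part of the answer above) is to deduplicate with dict.fromkeys, which keeps first-seen order. The frame below is a made-up stand-in with VOD deliberately placed before BTA.

```python
import numpy as np
import pandas as pd

# Stand-in frame; VOD comes first so the ordering difference is visible.
cols = ["BBG.XLON.VOD.S_LAST", "BBG.XLON.VOD.S_VOLUME",
        "BBG.XLON.BTA.S_LAST", "BBG.XLON.BTA.S_VOLUME"]
df = pd.DataFrame(np.zeros((2, 4)), columns=cols)

# Strip everything from the last dot onward, as in the answer.
names = [s[:s.rfind(".")] for s in df.columns]

# dict.fromkeys drops duplicates while keeping first-seen order.
unique_names = list(dict.fromkeys(names))

for n in unique_names:
    df[n + ".S_HOLIDAY"] = np.nan

print(df.columns.tolist())
```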

Answer 1: score 1, accepted, 2016-04-24 10:02:02

Answer 2: score 0, 2016-04-24 09:36:14
