
Creating new dataframe columns from existing dataframe column names

I have a dataframe (stockData), indexed by date, that holds three fields for each stock: LAST, VOLUME and MKTCAP.

Here is the head of the dataframe (stockData):

                           BBG.XLON.BTA.S_LAST  BBG.XLON.BTA.S_VOLUME  BBG.XLON.BTA.S_MKTCAP  \
date                                                                            
2001-01-02                  572               26605510               37494.60   
2001-01-03                  560               24715470               36708.00   
2001-01-04                  613               52781855               40182.15   
2001-01-05                  630               56600152               41296.50   
2001-01-08                  633               41014402               41493.15   

            BBG.XLON.VOD.S_LAST  BBG.XLON.VOD.S_VOLUME  BBG.XLON.VOD.S_MKTCAP  
date                                                                           
2001-01-02                  NaN                    NaN                    NaN  
2001-01-03               225.00              444328736            145216.0020  
2001-01-04               239.00              488568000            154251.6643  
2001-01-05               242.25              237936704            156349.2288  
2001-01-08               227.75              658059776            146990.8642 

Is there a way to take one of these fields for all of the stocks and create a new set of columns from it with a new suffix (_HOLIDAY), so that I end up with:

              BBG.XLON.BTA.S_LAST  BBG.XLON.BTA.S_VOLUME  BBG.XLON.BTA.S_MKTCAP  BBG.XLON.BTA.S_HOLIDAY  \
date                                                                            
2001-01-02                  572               26605510               37494.60                   NaN  
2001-01-03                  560               24715470               36708.00                   NaN  
2001-01-04                  613               52781855               40182.15                   NaN  
2001-01-05                  630               56600152               41296.50                   NaN  
2001-01-08                  633               41014402               41493.15                   NaN  

            BBG.XLON.VOD.S_LAST  BBG.XLON.VOD.S_VOLUME  BBG.XLON.VOD.S_MKTCAP  BBG.XLON.VOD.S_HOLIDAY  
date                                                                           
2001-01-02                  NaN                    NaN                    NaN                   NaN  
2001-01-03               225.00              444328736            145216.0020                   NaN  
2001-01-04               239.00              488568000            154251.6643                   NaN  
2001-01-05               242.25              237936704            156349.2288                   NaN  
2001-01-08               227.75              658059776            146990.8642                   NaN 

Any assistance would be much appreciated.

Is this what you want?

In [56]: newcols = df.columns.str.replace(r'\.S_.*', '.S_HOLIDAY', regex=True).unique().tolist()

In [57]: newcols
Out[57]: ['BBG.XLON.BTA.S_HOLIDAY', 'BBG.XLON.VOD.S_HOLIDAY']

Then you can easily add the new columns:

In [65]: for col in newcols:
   ....:         df[col] = np.nan
   ....:

In [66]: df
Out[66]:
            BBG.XLON.BTA.S_LAST  BBG.XLON.BTA.S_VOLUME  BBG.XLON.BTA.S_MKTCAP  \
2001-01-02                  572               26605510               37494.60
2001-01-03                  560               24715470               36708.00
2001-01-04                  613               52781855               40182.15
2001-01-05                  630               56600152               41296.50
2001-01-08                  633               41014402               41493.15

            BBG.XLON.VOD.S_LAST  BBG.XLON.VOD.S_VOLUME  BBG.XLON.VOD.S_MKTCAP  \
2001-01-02                  NaN                    NaN                    NaN
2001-01-03               225.00            444328736.0            145216.0020
2001-01-04               239.00            488568000.0            154251.6643
2001-01-05               242.25            237936704.0            156349.2288
2001-01-08               227.75            658059776.0            146990.8642

            BBG.XLON.BTA.S_HOLIDAY  BBG.XLON.VOD.S_HOLIDAY
2001-01-02                     NaN                     NaN
2001-01-03                     NaN                     NaN
2001-01-04                     NaN                     NaN
2001-01-05                     NaN                     NaN
2001-01-08                     NaN                     NaN
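
For reference, here is a self-contained sketch of the same two steps that can be pasted into a script. The two-row frame is only a stand-in built from the values in the question, and `reindex` is swapped in for the loop as one possible alternative. `regex=True` is passed explicitly because recent pandas versions treat the pattern as a literal string by default.

```python
import numpy as np
import pandas as pd

# Small stand-in for the question's frame (first two rows only).
df = pd.DataFrame(
    {
        "BBG.XLON.BTA.S_LAST": [572, 560],
        "BBG.XLON.BTA.S_VOLUME": [26605510, 24715470],
        "BBG.XLON.BTA.S_MKTCAP": [37494.60, 36708.00],
        "BBG.XLON.VOD.S_LAST": [np.nan, 225.00],
        "BBG.XLON.VOD.S_VOLUME": [np.nan, 444328736],
        "BBG.XLON.VOD.S_MKTCAP": [np.nan, 145216.0020],
    },
    index=pd.to_datetime(["2001-01-02", "2001-01-03"]).rename("date"),
)

# Replace everything from ".S_" onwards with ".S_HOLIDAY", then de-duplicate.
newcols = (
    df.columns.str.replace(r"\.S_.*", ".S_HOLIDAY", regex=True)
    .unique()
    .tolist()
)

# reindex keeps the existing columns and adds the missing ones filled with NaN.
df = df.reindex(columns=df.columns.tolist() + newcols)
print(df.columns.tolist())
```

The new columns are appended at the end, exactly as in the loop version.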

If the order of the columns matters to you, you can reorder them like this:

df = df[ordered_column_list]
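
`ordered_column_list` is not defined above, so here is one possible way to build it. This is a sketch that assumes every column name has the form `<stock>.S_<FIELD>` and that the desired field order is LAST, VOLUME, MKTCAP, HOLIDAY; the `fields` list and the `sort_key` helper are illustrative names, not part of pandas.

```python
import numpy as np
import pandas as pd

# Desired order of the fields within each stock (an assumption).
fields = ["LAST", "VOLUME", "MKTCAP", "HOLIDAY"]

# Columns as they stand after appending the HOLIDAY columns at the end.
cols = [
    "BBG.XLON.BTA.S_LAST", "BBG.XLON.BTA.S_VOLUME", "BBG.XLON.BTA.S_MKTCAP",
    "BBG.XLON.VOD.S_LAST", "BBG.XLON.VOD.S_VOLUME", "BBG.XLON.VOD.S_MKTCAP",
    "BBG.XLON.BTA.S_HOLIDAY", "BBG.XLON.VOD.S_HOLIDAY",
]
df = pd.DataFrame(np.nan, index=pd.to_datetime(["2001-01-02"]), columns=cols)


def sort_key(col):
    # Split "BBG.XLON.BTA.S_LAST" into ("BBG.XLON.BTA", "LAST").
    stock, field = col.rsplit(".S_", 1)
    return stock, fields.index(field)


ordered_column_list = sorted(df.columns, key=sort_key)
df = df[ordered_column_list]
print(df.columns.tolist())
```

Note that this sorts the stocks alphabetically, which matches the layout in the question but may differ from the original column order in other frames.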

You can use DataFrame.columns.values to get the column names and then strip everything from the last dot (.) onwards:

names=[s[:s.rfind('.')] for s in df.columns.values]

Here I assume that your dataframe is called df. This will result in duplicate names (one each for .S_LAST, .S_VOLUME and .S_MKTCAP). Now you can use numpy.unique to remove the duplicates:

import numpy as np
uNames=np.unique(names)

And now you can add your new column <name>.S_HOLIDAY, assigning it a NaN value:

for n in uNames:
   df[n+'.S_HOLIDAY']=np.nan
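
One thing to be aware of is that numpy.unique returns its result sorted, so the new columns are added in alphabetical order of the stock names. If the first-seen order matters, a possible variant is to de-duplicate with `dict.fromkeys` instead. The sketch below uses a made-up one-row frame with VOD deliberately placed before BTA to show the difference; `unique_names` is an illustrative name.

```python
import numpy as np
import pandas as pd

# Stand-in frame with VOD listed before BTA on purpose.
cols = [
    "BBG.XLON.VOD.S_LAST", "BBG.XLON.VOD.S_VOLUME",
    "BBG.XLON.BTA.S_LAST", "BBG.XLON.BTA.S_VOLUME",
]
df = pd.DataFrame(0.0, index=pd.to_datetime(["2001-01-02"]), columns=cols)

# Strip everything from the last dot onwards: "BBG.XLON.VOD.S_LAST" -> "BBG.XLON.VOD".
names = [s[:s.rfind(".")] for s in df.columns]

# dict.fromkeys drops duplicates and keeps first-seen order (np.unique would sort).
unique_names = list(dict.fromkeys(names))

for n in unique_names:
    df[n + ".S_HOLIDAY"] = np.nan

print(df.columns.tolist())
```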
