
Looping Pandas Column Names To Create New Data Frame

I want to loop over the columns of a dataframe and, when a column name meets a criterion, create a new dataframe from it and/or add it to an existing dataframe. For example, my current dataframe has the following column names:

open high low IVV volume open high low EWH volume open high low INDY volume open high low EWG volume open high low ENZL volume

I want a loop that finds IVV, EWH, INDY, EWG, and ENZL and adds them to their own dataframe.

I have tried the following:

Indexlist = ['IVV', 'EWH', 'INDY', 'EWG', 'ENZL']

Attempt to drop the values columns:

for column in data:
    print(column)
    if column != Indexlist:
        data.drop([column], axis=0))

Attempt to del the columns:

for column in data:
    print(column)
    if column != Indexlist:
        del data[column]

Attempt to select the columns:

data_sample = data[column].isin(Indexlist)

All of these methods throw errors.

I think you need to check for substrings in the column names using str.contains with a regex; join all values of the list with | (regex OR):

data1 = data.loc[:, data.columns.str.contains('|'.join(Indexlist))]
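As a rough illustration of the substring approach, here is a minimal sketch. The sample values are made up, and the frame only imitates the repeated open/high/low/volume layout from the question; only `data` and `Indexlist` come from the original post.

```python
import pandas as pd

# Hypothetical one-row sample; column names repeat as in the question.
data = pd.DataFrame(
    [[1.0, 2.0, 0.5, 1.5, 100, 3.0, 4.0, 2.5, 3.5, 200]],
    columns=["open", "high", "low", "IVV", "volume",
             "open", "high", "low", "EWH", "volume"],
)
Indexlist = ["IVV", "EWH", "INDY", "EWG", "ENZL"]

# Build the regex 'IVV|EWH|INDY|EWG|ENZL' and test every column name against it.
pattern = "|".join(Indexlist)
mask = data.columns.str.contains(pattern)

# Keep all rows, and only the columns where the mask is True.
data1 = data.loc[:, mask]
print(list(data1.columns))
```

Because this is a substring match, a column named something like IVV_close would also be selected. If the names could contain regex metacharacters, escaping each one with re.escape before joining is probably safer.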

If you need to select by exact column names, use a subset:

data1 = data[Indexlist]

You can use pd.Index.isin with pd.DataFrame.loc for Boolean indexing:

data_sample = data.loc[:, data.columns.isin(Indexlist)]
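For illustration, a small sketch of the isin approach on made-up data. The IVV_adj column is hypothetical and is included only to show that isin matches whole column names rather than substrings.

```python
import pandas as pd

# Hypothetical sample; 'IVV_adj' is an invented column name for contrast.
data = pd.DataFrame(
    {"open": [1.0], "IVV": [1.5], "IVV_adj": [1.4], "EWH": [3.5], "volume": [100]}
)
Indexlist = ["IVV", "EWH", "INDY", "EWG", "ENZL"]

# True only where the column name is exactly one of the list entries.
mask = data.columns.isin(Indexlist)

# List entries that are not columns (INDY, EWG, ENZL here) are simply ignored.
data_sample = data.loc[:, mask]
print(list(data_sample.columns))
```

Unlike the regex approach, this does not pick up IVV_adj, and it does not complain about list entries that are absent from the dataframe.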

Or direct indexing, if you know in advance that all list elements exist as columns in your dataframe:

data_sample = data[Indexlist]
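The caveat about all list elements existing can be seen in a short sketch. The data below is made up, and the fallback of filtering the list first is one possible workaround rather than the only one.

```python
import pandas as pd

# Hypothetical sample in which 'INDY' is deliberately missing.
data = pd.DataFrame({"open": [1.0], "IVV": [1.5], "EWH": [3.5]})
Indexlist = ["IVV", "EWH", "INDY"]

raised = False
try:
    # Direct indexing requires every name in the list to be a column.
    data_sample = data[Indexlist]
except KeyError:
    raised = True
    # Fall back to the names that actually exist, keeping the list order.
    present = [name for name in Indexlist if name in data.columns]
    data_sample = data[present]

print(raised, list(data_sample.columns))
```

When every name is known to be present, plain data[Indexlist] is the shortest option and also returns the columns in the order given by the list.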


 