
Looping through columns using number in column name

I have the following columns in my pandas dataframe: client_1_name, client_2_name, client_3_name, ... all the way to client_10_name.

I want to loop through the column names, using the number in each name, to check whether that column contains the substring "Nike".

How I would ideally approach the problem:

for i in range(1,10):
 df['Nike'] = df['Client_'+i+'_name'].str.contains('Nike', regex = True)

but I got the following error

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-85-28926af604a8> in <module>()
          2 
          3 for i in range(1,10):
    ----> 4     df_nike['Nike'] = df_nike['client_'+i+'_name'].str.contains('Nike', regex = True)

TypeError: can only concatenate str (not "int") to str

Suggestions on how to do this?

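I'm not sure what your end goal is, but the simple fix is to wrap `i` in `str()`:

for i in range(1, 11):  # the end of range() is exclusive, so 11 is needed to reach Client_10_name
    # note: this assigns to the same 'Nike' column on every pass, so only the last result survives
    df['Nike'] = df['Client_'+str(i)+'_name'].str.contains('Nike', regex=True)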

You may want one result column per client instead:

for i in range(1, 11):
    df['Nike'+str(i)] = df['Client_'+str(i)+'_name'].str.contains('Nike', regex=True)

You have to convert the integer to a string before concatenating:

for i in range(1, 11):  # 11, because the end of range() is exclusive
    # added `str()` around the `i`
    df['Nike'] = df['Client_'+str(i)+'_name'].str.contains('Nike', regex=True)

If you are using Python 3.6+, you can use f-strings:

for i in range(1, 11):
    # added `f` at the beginning of the string and {} around `i`
    df['Nike'] = df[f'Client_{i}_name'].str.contains('Nike', regex=True)

As @Wen-Ben mentioned in the second part of his answer, looping through the columns this way overwrites your new "Nike" column on every iteration. If you truly want to check all of the columns without overwriting "Nike", add `i` to the new column name like so:

for i in range(1, 11):
    # added {i} to the new column name so each client gets its own result column
    df[f'Nike{i}'] = df[f'Client_{i}_name'].str.contains('Nike', regex=True)
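
If what you actually need is a single "Nike" flag per row (True when any client column mentions Nike), the per-column results can be combined instead of stored separately. This is a minimal sketch, not something from the answers above: the sample data is made up, and it assumes the columns follow the `Client_<n>_name` pattern.

```python
import pandas as pd

# Hypothetical sample data; only the column naming pattern comes from the question.
df = pd.DataFrame({
    'Client_1_name': ['Nike Inc', 'Adidas', 'Puma'],
    'Client_2_name': ['Puma', 'Reebok', 'Asics'],
    'Client_3_name': ['Asics', 'Nike Store', 'Fila'],
})

# Build the column names from the numbers, as in the loops above.
cols = [f'Client_{i}_name' for i in range(1, 4)]

# Test every column for the substring, then combine the results row by row.
# na=False treats missing values as "no match".
df['Nike'] = df[cols].apply(lambda s: s.str.contains('Nike', na=False)).any(axis=1)

print(df['Nike'].tolist())
```

With the real data, `range(1, 4)` would become `range(1, 11)` to cover all ten client columns.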

Consider this DataFrame (assuming `import pandas as pd` and `import numpy as np`):

df = pd.DataFrame(data = np.random.choice(list('ABCDEFGH')+['Nike'], 100).reshape(10,10), columns = ['Client_'+str(i)+'_name' for i in range(1,11)])

You can check whether each column contains a cell exactly equal to 'Nike' using:

df.eq('Nike').any()

Client_1_name      True
Client_2_name     False
Client_3_name     False
Client_4_name      True
Client_5_name     False
Client_6_name      True
Client_7_name      True
Client_8_name      True
Client_9_name      True
Client_10_name     True
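
Note that `eq` tests for exact equality, while the question asks about a substring. If the cells hold longer strings such as "Nike Inc", a substring test per column may be closer to what is wanted. A small sketch of the difference, using made-up data and the hypothetical name `df_sub`:

```python
import pandas as pd

# Hypothetical data where 'Nike' appears only as part of a longer string.
df_sub = pd.DataFrame({
    'Client_1_name': ['Nike Inc', 'Adidas'],
    'Client_2_name': ['Puma', 'Reebok'],
})

# Exact match: no cell equals 'Nike', so every column is False.
exact = df_sub.eq('Nike').any()

# Substring match: apply str.contains to each column, then reduce per column.
partial = df_sub.apply(lambda s: s.str.contains('Nike', na=False)).any()

print(exact.tolist())
print(partial.tolist())
```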

If you want to extract the column names, try

s = df.eq('Nike').any()
s[s].index

Index(['Client_1_name', 'Client_4_name', 'Client_6_name', 'Client_7_name',
   'Client_8_name', 'Client_9_name', 'Client_10_name'],
  dtype='object')

If you want to extract only the number, try

s[s].index.str.extract(r'(\d+)').astype(int).values.ravel().tolist()

[1, 4, 6, 7, 8, 9, 10]
