I have the following columns in my pandas dataframe - client_1_name, client_2_name, client_3_name... all the way to client_10_name.
I want to loop through the column names, using the number in each name, to identify whether the specific column contains the substring "Nike".
How I would ideally approach the problem:
for i in range(1,10):
    df['Nike'] = df['Client_'+i+'_name'].str.contains('Nike', regex = True)
but I got the following error
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-85-28926af604a8> in <module>()
2
3 for i in range(1,10):
----> 4 df_nike['Nike'] = df_nike['client_'+i+'_name'].str.contains('Nike', regex = True)
TypeError: can only concatenate str (not "int") to str
Suggestions on how to do this?
I am not sure exactly what you need, but the simple fix is to wrap i in str():
for i in range(1, 11):  # range(1, 11) covers columns 1 through 10; range(1, 10) stops at 9
    # note: this assigns to the same 'Nike' column on every iteration, so only the last result is kept
    df['Nike'] = df['Client_'+str(i)+'_name'].str.contains('Nike', regex=True)
You may instead want one result column per client column:
for i in range(1, 11):
    df['Nike'+str(i)] = df['Client_'+str(i)+'_name'].str.contains('Nike', regex=True)
You have to convert the integer to a string before concatenating:
for i in range(1, 11):  # range(1, 11) covers columns 1 through 10
    # added `str()` around the `i`
    df['Nike'] = df['Client_'+str(i)+'_name'].str.contains('Nike', regex=True)
If you are using Python 3.6+, you can use f-strings:
for i in range(1, 11):
    # added `f` at the beginning of the string and {} around `i`
    df['Nike'] = df[f'Client_{i}_name'].str.contains('Nike', regex=True)
As @Wen-Ben mentioned in the second part of his answer, looping through the columns this way overwrites your new "Nike" column on every iteration. If you truly want to check all of the columns without overwriting "Nike", you should add i to the new column name, like so:
for i in range(1, 11):
    # added {i} to the new column name so each client column gets its own result column
    df[f'Nike{i}'] = df[f'Client_{i}_name'].str.contains('Nike', regex=True)
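If what you actually want is a single "Nike" column that is True when any of the client columns mentions Nike, the per-column results can be combined row-wise instead of being stored separately. The following is only a minimal sketch under assumptions: the sample data is made up, only two client columns are used to keep it short, and `regex=False` and `na=False` are my choices (plain substring match, missing values treated as no match) rather than something the question specifies.

```python
import pandas as pd

# Hypothetical sample data; column names follow the pattern from the question.
df = pd.DataFrame({
    "Client_1_name": ["Nike Inc", "Adidas", "Puma"],
    "Client_2_name": ["Reebok", "Nike Europe", None],
})

# Build the column names from the number, as in the loop above.
client_cols = [f"Client_{i}_name" for i in range(1, 3)]

# One boolean Series per client column, placed side by side.
mask = pd.concat(
    [df[c].str.contains("Nike", regex=False, na=False) for c in client_cols],
    axis=1,
)

# True for a row if any client column in that row contains "Nike".
df["Nike"] = mask.any(axis=1)
print(df)
```

For the real dataframe you would use `range(1, 11)` to cover all ten columns.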
Consider this DataFrame (the values are random, so your output will differ):
import numpy as np
import pandas as pd
df = pd.DataFrame(data = np.random.choice(list('ABCDEFGH')+['Nike'], 100).reshape(10,10), columns = ['Client_'+str(i)+'_name' for i in range(1,11)])
You can check whether each column contains a cell exactly equal to 'Nike' (eq compares whole values, not substrings) using
df.eq('Nike').any()
Client_1_name True
Client_2_name False
Client_3_name False
Client_4_name True
Client_5_name False
Client_6_name True
Client_7_name True
Client_8_name True
Client_9_name True
Client_10_name True
If you want to extract the column names, try
s = df.eq('Nike').any()
s[s].index
Index(['Client_1_name', 'Client_4_name', 'Client_6_name', 'Client_7_name',
'Client_8_name', 'Client_9_name', 'Client_10_name'],
dtype='object')
If you want to extract only the number, try
s[s].index.str.extract(r'(\d+)').astype(int).values.ravel().tolist()
[1, 4, 6, 7, 8, 9, 10]
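Since the question asks about a substring rather than an exact value, the same per-column check can be done with `str.contains` applied to every column. This is a rough sketch, not a definitive approach: the sample data is invented, and it assumes every column holds strings (or missing values) and that the names follow the `Client_<number>_name` pattern.

```python
import pandas as pd

# Hypothetical sample data with the same naming pattern as above.
df = pd.DataFrame({
    "Client_1_name": ["Nike Inc", "Adidas"],
    "Client_2_name": ["Puma", "Reebok"],
    "Client_3_name": ["Asics", "Nike Europe"],
})

# Substring test on every column, then reduce each column to a single boolean.
has_nike = df.apply(
    lambda col: col.str.contains("Nike", regex=False, na=False)
).any()

# Names of the columns with at least one match.
matching_cols = has_nike[has_nike].index.tolist()

# The number embedded in each matching column name.
numbers = [int(name.split("_")[1]) for name in matching_cols]

print(matching_cols)
print(numbers)
```

Unlike `df.eq('Nike')`, this also matches cells such as "Nike Inc" where "Nike" is only part of the value.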