How to check if the first four characters of a column are 'http' or not?

I have a dataframe like:

df['Web']

I just want to check whether the first four characters of each value in df['Web'] are 'http'.

I don't want to check whether df['Web'] is in URL format.

And how can I use an if condition like this:

if firstfour == 'http':
    print("starts with http")
else:
    print("doesn't start with http")

You can use str.startswith(). However, you should note that it also matches https.

If you need to match http but not https, you could use a regex instead.

import pandas as pd

df = pd.DataFrame({'Web': ['htt', 'http', 'https', 'www']})
df['match'] = df.Web.apply(lambda x: x.startswith('http'))

     Web  match
0    htt  False
1   http   True
2  https   True
3    www  False

Regex, using a negative lookahead (?!s) so that https is excluded:

df['match'] = df['Web'].str.match(r'^http(?!s)')

     Web  match
0    htt  False
1   http   True
2  https  False
3    www  False
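
To connect this back to the if/else in the question, here is a minimal sketch. It assumes the column holds only strings; the names first_four_is_http and messages are made up for illustration. A plain if needs a single True/False value, so the check is computed for the whole column first and the branch is then taken row by row.

```python
import pandas as pd

df = pd.DataFrame({'Web': ['htt', 'http', 'https', 'www']})

# Literal "first four characters" check, vectorised over the column.
first_four_is_http = df['Web'].str[:4] == 'http'

# Walk the values together with their boolean result and branch per row.
messages = []
for value, matched in zip(df['Web'], first_four_is_http):
    if matched:
        messages.append(f"{value}: starts with http")
    else:
        messages.append(f"{value}: doesn't start with http")

print("\n".join(messages))
```

Like startswith('http'), the slice comparison also treats https as a match, since its first four characters are http.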

Use Series.str.startswith:

df['match'] = df.Web.str.startswith('http')

Or use Series.str.contains with ^ for start of string:

df['match'] = df.Web.str.contains('^http')
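
If the column may contain missing values, both methods accept an na argument that sets the result for those rows. The sketch below assumes a made-up column with one missing entry and that missing values should count as "no match"; the example URLs and the match_contains column name are placeholders.

```python
import pandas as pd

df = pd.DataFrame({'Web': ['http://a.example', None, 'www.b.example']})

# na=False reports missing values as False instead of leaving them missing.
df['match'] = df['Web'].str.startswith('http', na=False)

# Same idea with a regex anchored at the start of the string.
df['match_contains'] = df['Web'].str.contains(r'^http', na=False)

print(df)
```

Without na=False, the missing row would stay missing in the result, which gets in the way when the result is used as a boolean mask.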
