I am stuck: I want to search a DataFrame for every cell that contains a URL and collect those cells into a different DataFrame, i.e.
Input:
A B C
0 1 2 https://123
1 https://432 333 qq
2 https://567 rt q4
Output:
R
0 https://123
1 https://432
2 https://567
I am trying to search all the columns for cells containing the string "http", but it's not working.
Try:
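import pandas as pd
output_df = pd.DataFrame(columns=['R'])
for col in df.columns:
    # astype(str) keeps numeric cells from producing NaN in the boolean mask
    mask = df[col].astype(str).str.contains('https')
    output_df = pd.concat([output_df, df.loc[mask, [col]].rename(columns={col: 'R'})], ignore_index=True)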
You can stack() your DataFrame and use the method contains() to search for cells with URLs:
stacked = df.stack()
# na=False treats non-string cells as non-matches instead of NaN
stacked[stacked.str.contains('http', na=False)].to_frame('R').reset_index(drop=True)
Output:
R
0 https://123
1 https://432
2 https://567
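For anyone who wants to run the stack() approach end to end, here is a minimal sketch. The sample frame is rebuilt from the question under the assumption that the numeric cells are real integers rather than strings, and the names stacked, urls and result are only illustrative.

```python
import pandas as pd

# Sample frame rebuilt from the question (assumption: 1, 2 and 333 are integers)
df = pd.DataFrame({
    "A": [1, "https://432", "https://567"],
    "B": [2, 333, "rt"],
    "C": ["https://123", "qq", "q4"],
})

# stack() flattens the frame row by row into a single Series
stacked = df.stack()

# .str.contains yields NaN for non-string cells; na=False turns those into False
urls = stacked[stacked.str.contains("http", na=False)]

# Drop the (row, column) MultiIndex and name the single column R
result = urls.to_frame("R").reset_index(drop=True)
print(result)
```

Because stack() walks the frame row by row, the URLs come out in row order, which is the order shown in the expected output of the question.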
You can join each row with commas and use a regex to find the URLs in each row. This works even if a row contains multiple URLs. If a row has no URL, the result for that row is an empty list.
df.apply(lambda row: ",".join(row.astype(str)), axis=1).str.findall(r"https?://[^,]*")
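The findall() call returns one list per row, so getting the single-column frame from the question takes one more step. The sketch below is one possible way to do that, using explode() and dropna(); the sample frame and the example URLs are hypothetical and were chosen to include a row with two URLs and a row with none.

```python
import pandas as pd

# Hypothetical frame: row 0 has two URLs, row 1 has one, row 2 has none
df = pd.DataFrame({
    "A": ["https://a.example", 1, "x"],
    "B": ["http://b.example", "https://c.example", 2],
})

# Cast each row to str so join() also accepts numeric cells
joined = df.apply(lambda row: ",".join(row.astype(str)), axis=1)

# One list of matches per row; a row without a URL gives an empty list
found = joined.str.findall(r"https?://[^,]*")

# explode() puts each URL on its own row; empty lists become NaN and are dropped
result = found.explode().dropna().to_frame("R").reset_index(drop=True)
print(result)
```

Note that the pattern stops at the first comma, so a URL that itself contains a comma would be cut short; that limitation comes from joining the cells with commas.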