How to search all the values in a dataframe with a particular string

I am stuck: I want to search a DataFrame for every cell that contains a URL and collect those cells in a separate DataFrame, e.g.

Input:

             A    B            C
0            1    2  https://123
1  https://432  333           qq
2  https://567   rt           q4

Output:

             R
0  https://123
1  https://432
2  https://567

I tried searching every column for the string "http", but it is not working.

Try:

matches = []
for col in df.columns:
    # astype(str) so .str.contains also works on columns that hold numbers
    mask = df[col].astype(str).str.contains('https')
    matches.append(df.loc[mask, col])
output_df = pd.concat(matches).to_frame('R')

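You can stack() the DataFrame into a single Series and use str.contains() to keep only the cells that hold a URL:

stacked = df.stack()
# na=False treats non-string cells (e.g. integers) as non-matches
stacked[stacked.str.contains('http', na=False)].to_frame('R').reset_index(drop=True)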

Output:

             R
0  https://123
1  https://432
2  https://567
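
For anyone who wants to run this end to end, here is a minimal sketch of the stack() approach on the sample data from the question. The helper name find_urls and the way the sample frame is built (integers mixed with strings) are assumptions made for illustration; they are not part of the original answer.

```python
import pandas as pd

# Sample data from the question; the numeric cells are assumed to be real integers.
df = pd.DataFrame({
    "A": [1, "https://432", "https://567"],
    "B": [2, 333, "rt"],
    "C": ["https://123", "qq", "q4"],
})

def find_urls(frame, needle="http"):
    """Return a one-column DataFrame 'R' with every cell containing `needle`."""
    # stack() flattens the frame row by row into a single Series.
    stacked = frame.stack()
    # na=False makes non-string cells count as "no match" instead of NaN.
    mask = stacked.str.contains(needle, na=False)
    return stacked[mask].to_frame("R").reset_index(drop=True)

result = find_urls(df)
print(result)
```

Because stack() walks the frame row by row, the matches come out in row order, which is the order shown in the expected output.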

You can also join each row into one comma-separated string and use a regex to find the URLs in it. This works even when a row contains multiple URLs; a row with no URL gets an empty list.

# astype(str) is needed so that join() also accepts numeric cells
df.apply(lambda row: ",".join(row.astype(str)), axis=1).str.findall(r"https?://[^,]*")
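
As a rough illustration of the per-row regex approach, the sketch below uses a small made-up frame (one row with two URLs, one row with none; these values are hypothetical, not from the question) and then flattens the per-row lists into a single column R with explode().

```python
import pandas as pd

# Hypothetical data: row 1 has two URLs, row 2 has none.
df = pd.DataFrame({
    "A": [1, "https://432", "x"],
    "B": [2, 333, "rt"],
    "C": ["https://123", "https://999", "q4"],
})

# Join each row into one comma-separated string; astype(str) handles numeric cells.
joined = df.apply(lambda row: ",".join(row.astype(str)), axis=1)

# One list of URLs per row; rows without a URL yield an empty list.
per_row = joined.str.findall(r"https?://[^,]*")

# explode() puts each URL on its own row; empty lists become NaN and are dropped.
flat = per_row.explode().dropna().to_frame("R").reset_index(drop=True)
print(flat)
```

Note that the pattern stops at the first comma, so a URL that itself contains a comma would be cut short with this joining scheme.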

