I have a CSV of input keywords and want an output CSV: each keyword generates several URLs that I want to append to the existing DataFrame

I have a CSV file with some keywords. For each keyword, I generate 10 to 15 URLs using the SERP API. I want to add those URLs to a URL column next to the keyword they belong to. Because one keyword has multiple URLs, the keyword row should be repeated once per URL.

test.csv file

keyword            value         cost
electrician         15            3.6
plumber             12            5.9

I want a result like this in the output.csv file

keyword        value     cost         url
electrician     15       3.6        www.ex1.com
electrician     15       3.6        www.ex2.com
electrician     15       3.6        www.ex3.com
...
plumber         12       5.9         www.ee.com
plumber         12       5.9         www.ee2.com
...

This is the code I am using; I have tried it with pandas but it appends only the last URL.

df = pd.read_csv("test.csv")                                                                                  
col = df['keyword']
for k in col:                                                                                                 
    print(k)                                                                                                  
    client_params = {                                                                                         
      "q": k,                                                                                                 
      "google_domain": "google.co.uk",                                                                        
      "location": "United+Kingdom",                                                                           
      "hl": "en",                                                                                             
      "gl": "uk",                                                                                             
      "num": "15",                                                                                            
      "serp_api_key": "********************",                     
    }                                                                                                         
    client = GoogleSearchResults(client_params)                                                               
    json_results = client.get_json()                                                                          
    #print(json_results)                                                                                      
    #print(type(json_results))                                                                                
    results = pd.DataFrame()                                                                                  
    print(results)                                                                                            
    for title in json_results['organic_results']:                                                             
        print(title['link'])                                                                                  
        df['URL'] = title['link']                                                                             
        results = results.append(df).reset_index(drop = True)                                                 
    # df.to_csv('output.csv',index=False, header= False)                                                      
results.to_csv('output.csv',index=False)  

But this is the result I got:

keyword       value    cost       urls
electrician    15      3.6      www.ex.xom
plumber        12      5.9      www.ex.xom

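This is because you create results inside your loop. Each pass over a keyword replaces it with a new, empty DataFrame, so only what was written during the last iteration survives. In addition, df['URL'] = title['link'] assigns the same link to every row of df, not just to the row for the current keyword. You need to initialise the accumulator once, before entering the loop, so that you can keep appending to it, and build each output row from the current keyword's own values. So something like: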

NOTE: You probably don't want to share your API key publicly, so I took it out of your question. You'll need to put in your own api_key.

import pandas as pd
from lib.google_search_results import GoogleSearchResults

df = pd.read_csv("test.csv")

# Initialise the accumulator once, before the loop, so it is never reset.
# Rows are collected in a list because DataFrame.append was removed in pandas 2.0.
rows = []
for idx, row in df.iterrows():
    k = row['keyword']
    value = row['value']
    cost = row['cost']
    client_params = {
      "q": k,
      "google_domain": "google.co.uk",
      "location": "United+Kingdom",
      "hl": "en",
      "gl": "uk",
      "num": "15",
      "serp_api_key": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
    }
    client = GoogleSearchResults(client_params)
    json_results = client.get_json()

    # One output row per organic result, repeating the keyword's own values.
    for result in json_results['organic_results']:
        link = result['link']
        print(link)
        rows.append([k, value, cost, link])

# Build the final DataFrame once, after all keywords have been processed.
results = pd.DataFrame(rows, columns=['keyword', 'value', 'cost', 'URL'])
results.to_csv('output.csv', index=False)
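
If you would rather avoid the explicit row loop, the same "one row per URL" shape can probably be produced with DataFrame.explode. The sketch below is only an illustration and does not call the SERP API: fetch_links and FAKE_RESULTS are hypothetical stand-ins that return canned links, and the input frame is built inline instead of being read from test.csv. In real use you would replace the body of fetch_links with the API call and return the list of links from json_results['organic_results'].

```python
import pandas as pd

# Hypothetical stand-in for the SERP API call: canned links per keyword.
FAKE_RESULTS = {
    "electrician": ["www.ex1.com", "www.ex2.com", "www.ex3.com"],
    "plumber": ["www.ee.com", "www.ee2.com"],
}


def fetch_links(keyword):
    """Return the list of result URLs for one keyword (stubbed here)."""
    return FAKE_RESULTS.get(keyword, [])


# Same columns as test.csv, built inline so the example is self-contained.
df = pd.DataFrame(
    {"keyword": ["electrician", "plumber"], "value": [15, 12], "cost": [3.6, 5.9]}
)

# Store one list of URLs per row, then expand each list element into its own row.
df["URL"] = df["keyword"].map(fetch_links)
results = df.explode("URL", ignore_index=True)

print(results)
```

One thing to watch for: a keyword whose list of links is empty still keeps its row after explode, with a missing value in the URL column, so you may want to drop those rows before writing output.csv.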
