I am trying to use the scrape_linkedin package. I followed the section on the GitHub page on how to set up the package and the LinkedIn li_at key (which I paste here for clarity).
Getting LI_AT
Navigate to www.linkedin.com and log in
Open browser developer tools (Ctrl-Shift-I or right click -> inspect element)
Select the appropriate tab for your browser (Application on Chrome, Storage on Firefox)
Click the Cookies dropdown on the left-hand menu, and select the www.linkedin.com option
Find and copy the li_at value
Once I have collected the li_at value from my LinkedIn account, I run the following code:
from scrape_linkedin import ProfileScraper

with ProfileScraper(cookie='myVeryLong_li_at_Code_which_has_characters_like_AQEDAQNZwYQAC5_etc') as scraper:
    profile = scraper.scrape(url='https://www.linkedin.com/in/justintrudeau/')
print(profile.to_dict())
I have two questions (I am originally an R user).
How can I input a list of profiles:
https://www.linkedin.com/in/justintrudeau/
https://www.linkedin.com/in/barackobama/
and scrape the profiles? (In R, I would use the map function from the purrr package to apply the function to each of the LinkedIn profiles.)
{'personal_info': {'name': 'Steve Wozniak', 'headline': 'Fellow at Apple', 'company': None, 'school': None, 'location': 'San Francisco Bay Area', 'summary': '', 'image': '', 'followers': '', 'email': None, 'phone': None, 'connected': None, 'websites': [], 'current_company_link': 'https://www.linkedin.com/company/sandisk/'}, 'experiences': {'jobs': [{'title': 'Chief Scientist', 'company': 'Fusion-io', 'date_range': 'Jul 2014 – Present', 'location': 'Primary Data', 'description': "I'm looking into future technologies applicable to servers and storage, and helping this company, which I love, get noticed and get a lead so that the world can discover the new amazing technology they have developed. My role is principally a marketing one at present but that will change over time.", 'li_company_url': 'https://www.linkedin.com/company/sandisk/'}, {'title': 'Fellow', 'company': 'Apple', 'date_range': 'Mar 1976 – Present', 'location': '1 Infinite Loop, Cupertino, CA 94015', 'description': 'Digita l Design engineer.', 'li_company_url': ''}, {'title': 'President & CTO', 'company': 'Wheels of Zeus', 'date_range': '2002 – 2005', 'location': None, 'description': None, 'li_company_url': 'https://www.linkedin.com/company/wheels-of-zeus/'}, {'title': 'diagnostic programmer', 'company': 'TENET Inc.', 'date_range': '1970 – 1971', 'location': None, 'description': None, 'li_company_url': ''}], 'education': [{'name': 'University of California, Berkeley', 'degree': 'BS', 'grades': None, 'field_of_study': 'EE & CS', 'date_range': '1971 – 1986', 'activities': None}, {'name': 'University of Colorado Boulder', 'degree': 'Honorary PhD.', 'grades': None, 'field_of_study': 'Electrical and Electronics Engineering', 'date_range': '1968 – 1969', 'activities': None}], 'volunteering': []}, 'skills': [], 'accomplishments': {'publications': [], 'certifications': [], 'patents': [], 'courses': [], 'projects': [], 'honors': [], 'test_scores': [], 'languages': [], 'organizations': []}, 'interests': 
['Western Digital', 'University of Colorado Boulder', 'Western Digital Data Center Solutions', 'NEW Homebrew Computer Club', 'Wheels of Zeus', 'SanDisk®']}
Firstly, you can write a custom function that scrapes a single profile and then use Python's built-in map function to apply it to each profile link.
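A minimal sketch of that idea, assuming only what the question's code already shows (a scraper object with scrape(url=...) whose result has to_dict()). The helper name scrape_profiles is hypothetical and not part of scrape_linkedin. Note that, unlike purrr's map, Python's map is lazy, so the result is wrapped in list() to force the scraping to happen while the scraper is still open.

```python
def scrape_profiles(scraper, urls):
    """Scrape every URL in `urls` and return a list of profile dicts.

    `scraper` is assumed to behave like the ProfileScraper in the question:
    scraper.scrape(url=...) returns an object with a to_dict() method.
    """
    # map() is lazy in Python 3; list() forces evaluation immediately.
    return list(map(lambda url: scraper.scrape(url=url).to_dict(), urls))


urls = [
    'https://www.linkedin.com/in/justintrudeau/',
    'https://www.linkedin.com/in/barackobama/',
]

# Usage with the real package (commented out because it needs a valid li_at cookie):
# from scrape_linkedin import ProfileScraper
# with ProfileScraper(cookie='your_li_at_value') as scraper:
#     profiles = scrape_profiles(scraper, urls)
```

A list comprehension, [scraper.scrape(url=u).to_dict() for u in urls], does the same thing and is what many Python users would write instead of map.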
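Secondly, to create a pandas DataFrame from a dictionary, you can pass the dictionary directly to pd.DataFrame.
For example, to create a DataFrame df from a dictionary named profile_dict (better not to call it dict, since that shadows the Python built-in), you can do this:
import pandas as pd
df = pd.DataFrame(profile_dict)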
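Because the dictionary shown in the question is nested (for example, personal_info is itself a dictionary), a flattened table with one row per profile may be closer to what you want. The following is a sketch using pd.json_normalize, which is a swapped-in alternative to plain pd.DataFrame, not something the package prescribes. The two records are shortened placeholders that only mimic the shape of the real output; the second one is entirely made up for illustration.

```python
import pandas as pd

# Shortened placeholder records shaped like profile.to_dict() output.
profiles = [
    {
        'personal_info': {'name': 'Steve Wozniak', 'headline': 'Fellow at Apple'},
        'skills': [],
    },
    {
        # Hypothetical second profile, for illustration only.
        'personal_info': {'name': 'Example Person', 'headline': 'Example headline'},
        'skills': [],
    },
]

# Nested dictionaries become dotted column names such as 'personal_info.name';
# list values (e.g. 'skills') are kept as Python lists in a single column.
df = pd.json_normalize(profiles)
print(df[['personal_info.name', 'personal_info.headline']])
```

Deeper lists such as the jobs under experiences stay as lists of dictionaries in one cell; they would need a separate json_normalize call with record_path if you want one row per job.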