
How can I extract the rows that contain a specific keyword from my JSON dataset using pandas in Python?

Sorry, this might be a very simple question, but I am new to Python, JSON and everything else. I am trying to filter my Twitter JSON data set on user_location/country_code, keeping the rows where it is gb, but I have no idea how to do this. I have tried several approaches without success. I have attached my data set and some of the code I have used here. I would appreciate any help.

Here is the attempt that got closest; however, I do not know how to make it run over the whole data set and print the tweet_id of every match:

import pandas as pd

df = pd.read_json('example.json', lines=True)
if df['user_location'][4]['country_code'] == 'th':
    print(df.tweet_id[4])
else:
    print('false')

This code prints the tweet_id 1223489829817577472; however, I couldn't extend it to the whole data set.

I have tried this code as well, still with no luck:

dataset = df[df['user_location'].isin(["gb"])].copy()
print(dataset)

This is what my data set looks like:

I would split the user_location column into multiple columns using the following:

df = pd.concat([df, df.pop('user_location').apply(pd.Series)], axis=1)

Running this should give you one column for each key in the user_location JSON object. It should then be easy to print the tweet_ids for a given country_code using:

df[df['country_code'] == 'th']['tweet_id']
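
To see both steps working together, here is a minimal sketch on a few made-up rows. The sample frame, the 'state' key and the short tweet IDs are assumptions for illustration only; your real df would come from pd.read_json('example.json', lines=True) as in the question.

```python
import pandas as pd

# Hypothetical sample rows standing in for example.json
df = pd.DataFrame({
    "tweet_id": [1223489829817577472, 1, 2],
    "user_location": [
        {"country_code": "th", "state": "Bangkok"},
        {"country_code": "gb", "state": "England"},
        {"country_code": "gb", "state": "Scotland"},
    ],
})

# Expand the dictionaries in user_location into their own columns
df = pd.concat([df, df.pop("user_location").apply(pd.Series)], axis=1)

# Boolean mask over the whole column, then pick the tweet_id values
gb_ids = df.loc[df["country_code"] == "gb", "tweet_id"].tolist()
print(gb_ids)  # [1, 2]
```

Using .loc with a boolean mask selects the matching rows and the tweet_id column in one step; swap 'gb' for 'th' to get the Thai tweets instead.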

An explanation of what is actually happening here:

  • df.pop('user_location') removes the 'user_location' column from df and returns it at the same time
  • With the returned column, we use the .apply method to apply a function to each of its elements
  • pd.Series converts each JSON object (a dictionary) into a Series, and .apply assembles those Series into a DataFrame with one column per key
  • pd.concat concatenates the original df (now without the 'user_location' column) with the new columns created from the 'user_location' data
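
The steps above can be unpacked into separate statements to make each intermediate result visible. This is only a sketch: the two-row frame and the variable names locations, expanded and result are made up for illustration.

```python
import pandas as pd

# Hypothetical two-row frame; real data would come from pd.read_json(...)
df = pd.DataFrame({
    "tweet_id": [10, 11],
    "user_location": [{"country_code": "th"}, {"country_code": "gb"}],
})

locations = df.pop("user_location")    # removes the column from df and returns it
expanded = locations.apply(pd.Series)  # one column per dictionary key
result = pd.concat([df, expanded], axis=1)

print(list(df.columns))        # ['tweet_id']
print(list(expanded.columns))  # ['country_code']
print(list(result.columns))    # ['tweet_id', 'country_code']
```

Note that df.pop modifies df in place, so re-running the one-liner on the same df raises a KeyError because user_location is already gone.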


 