
TypeError: an integer is required when selecting a subset of rows from a pandas DataFrame

 {'contributors': None,
 'coordinates': None,
 'created_at': 'Tue Aug 02 19:51:58 +0000 2016',
 'entities': {'hashtags': [],
  'symbols': [],
  'urls': [],
  'user_mentions': [{'id': 873491544,
    'id_str': '873491544',
    'indices': [0, 13],
    'name': 'Kenel M',
    'screen_name': 'KxSweaters13'}]},
 'favorite_count': 1,
 'favorited': False,
 'geo': None,
 'id': 760563814450491392,
 'id_str': '760563814450491392',
 'in_reply_to_screen_name': 'KxSweaters13',
 'in_reply_to_status_id': None,
 'in_reply_to_status_id_str': None,
 'in_reply_to_user_id': 873491544,
 'in_reply_to_user_id_str': '873491544',
 'is_quote_status': False,
 'lang': 'en',
 'metadata': {'iso_language_code': 'en', 'result_type': 'recent'},
 'place': {'attributes': {},
  'bounding_box': {'coordinates': [[[-71.813501, 42.4762],
     [-71.702186, 42.4762],
     [-71.702186, 42.573956],
     [-71.813501, 42.573956]]],
   'type': 'Polygon'},
  'contained_within': [],
  'country': 'Australia',
  'country_code': 'AUS',
  'full_name': 'Melbourne, V',
  'id': 'c4f1830ea4b8caaf',
  'name': 'Melbourne',
  'place_type': 'city',
  'url': 'https://api.twitter.com/1.1/geo/id/c4f1830ea4b8caaf.json'},
 'retweet_count': 0,
 'retweeted': False,
 'source': '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>',
 'text': '@KxSweaters13 are you the kenelx13 I see owning leominster for team valor?',
 'truncated': False,
 'user': {'contributors_enabled': False,
  'created_at': 'Thu Apr 21 17:09:52 +0000 2011',
  'default_profile': False,
  'default_profile_image': False,
  'description': "Arbys when it's cold. Kimballs when it's warm. @Ally__09 all year. Comp sci classes sometimes.",
  'entities': {'description': {'urls': []}},
  'favourites_count': 1106,
  'follow_request_sent': None,
  'followers_count': 167,
  'following': None,
  'friends_count': 171,
  'geo_enabled': True,
  'has_extended_profile': False,
  'id': 285715182,
  'id_str': '285715182',
  'is_translation_enabled': False,
  'is_translator': False,
  'lang': 'en',
  'listed_count': 2,
  'location': 'MA',
  'name': 'Steve',
  'notifications': None,
  'profile_background_color': '131516',
  'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme14/bg.gif',
  'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme14/bg.gif',
  'profile_background_tile': True,
  'profile_banner_url': 'https://pbs.twimg.com/profile_banners/285715182/1462218226',
  'profile_image_url': 'http://pbs.twimg.com/profile_images/727223698332200961/bGPjGjHK_normal.jpg',
  'profile_image_url_https': 'https://pbs.twimg.com/profile_images/727223698332200961/bGPjGjHK_normal.jpg',
  'profile_link_color': '4A913C',
  'profile_sidebar_border_color': 'FFFFFF',
  'profile_sidebar_fill_color': 'EFEFEF',
  'profile_text_color': '333333',
  'profile_use_background_image': True,
  'protected': False,
  'screen_name': 'StephenBurke_',
  'statuses_count': 5913,
  'time_zone': 'Eastern Time (US & Canada)',
  'url': None,
  'utc_offset': -14400,
  'verified': False}}

I have a JSON file that contains a list of JSON objects, each with the structure shown above.

So I read it into a dataframe:

df = pd.read_json('data.json')

Then I try to select all the rows whose place type is 'city':

df = df[df['place']['place_type'] == 'city']

but I get 'TypeError: an integer is required', followed by: During handling of the above exception, another exception occurred: KeyError: 'place_type'

Then I tried:

df['place'].head(3)
=>
0    {'id': '01864a8a64df9dc4', 'url': 'https://api...
1    {'id': '01864a8a64df9dc4', 'url': 'https://api...
2    {'id': '0118c71c0ed41109', 'url': 'https://api...
Name: place, dtype: object

So df['place'] returns a Series whose keys are the row index labels, which is why indexing it with 'place_type' fails.
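To make the failure concrete, here is a minimal sketch on a made-up three-row frame (the data and the names `lookup_failed` and `first_type` are illustrative, not taken from the real file). It shows that a string key on the Series is treated as an index label, while indexing a single cell first returns the dict.

```python
import pandas as pd

# Hypothetical miniature of the data: each 'place' cell holds a dict (or None).
df = pd.DataFrame({
    "text": ["a", "b", "c"],
    "place": [
        {"name": "Melbourne", "place_type": "city"},
        {"name": "Victoria", "place_type": "admin"},
        None,
    ],
})

place = df["place"]  # a Series whose index labels are 0, 1, 2

# A string key is looked up among the index labels, not inside the dicts.
try:
    place["place_type"]
    lookup_failed = False
except (KeyError, TypeError):
    lookup_failed = True

# Selecting one row first yields the dict itself, so its keys are reachable.
first_type = df.iloc[0]["place"]["place_type"]
```

The exact exception type depends on the pandas version, which is why the sketch catches both KeyError and TypeError.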

I've also tried to select the place_type of the first row and it works just fine:

df.iloc[0]['place']['place_type']
=>
city

The question is: how can I filter the rows in this case?

Solution:

Okay, so the problem is that pd.read_json does not flatten nested JSON: a nested object such as 'place' is kept as a dict inside a single column. What I did instead was normalize the JSON objects:

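import json
import pandas as pd

with open('data.json') as jsonfile:
    data = json.load(jsonfile)

# Flatten nested objects into dotted column names such as 'place.place_type'.
# pd.io.json.json_normalize is the older, deprecated spelling of this function.
df = pd.json_normalize(data)

df = df[df['place.place_type'] == 'city']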
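As a rough illustration of what the normalization does, the sketch below uses a small inline list instead of data.json (the records and the names `flat` and `cities` are assumptions for the example). It assumes pandas 1.0 or later, where the function is available as pd.json_normalize.

```python
import pandas as pd

# Hypothetical records standing in for the parsed contents of data.json.
records = [
    {"id": 1, "place": {"name": "Melbourne", "place_type": "city"}},
    {"id": 2, "place": {"name": "Victoria", "place_type": "admin"}},
    {"id": 3, "place": None},  # tweets without a location have a null place
]

# Nested keys become dotted column names: 'place.name', 'place.place_type'.
flat = pd.json_normalize(records)

# Rows with no place hold NaN in 'place.place_type', so they compare unequal
# to 'city' and drop out of the result.
cities = flat[flat["place.place_type"] == "city"]
```

After this, the filter is an ordinary column comparison, which is why the approach avoids the original error.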

You can use a list comprehension to build the filter you need.

mask = [isinstance(p, dict) and p.get('place_type') == 'city' for p in df['place']]
df = df[mask]

This gives you a DataFrame containing only the rows whose place_type is 'city'. The isinstance check skips rows where 'place' is missing.
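If the DataFrame is not needed at all, the same kind of comprehension can run over the parsed JSON records directly. The sketch below uses an inline JSON string in place of the file, and the names `raw` and `city_tweets` are made up for the example.

```python
import json

# Hypothetical stand-in for the contents of data.json.
raw = """
[
  {"id": 1, "place": {"place_type": "city"}},
  {"id": 2, "place": null},
  {"id": 3, "place": {"place_type": "admin"}}
]
"""
data = json.loads(raw)

# Keep only the tweets whose place is a city. `or {}` turns a null place
# into an empty dict so that .get() is always safe to call.
city_tweets = [
    tweet for tweet in data
    if (tweet.get("place") or {}).get("place_type") == "city"
]
```

The result is a plain list of dicts, which can still be handed to pandas afterwards if a table is wanted.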

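I'm not sure whether place_type has to be used as the index in order to show all the rows that contain 'city'.

"and then I try to get all the rows which are the city type by:"

Comparing the 'place' column directly to 'city' matches nothing, because each cell holds a dict rather than a string. Pulling the place_type key out of each dict first gets all the rows whose place is a city:

df = df[df['place'].str.get('place_type') == 'city']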
