TypeError: an integer is required when selecting a subset of rows in a pandas DataFrame

 {'contributors': None,
 'coordinates': None,
 'created_at': 'Tue Aug 02 19:51:58 +0000 2016',
 'entities': {'hashtags': [],
  'symbols': [],
  'urls': [],
  'user_mentions': [{'id': 873491544,
    'id_str': '873491544',
    'indices': [0, 13],
    'name': 'Kenel M',
    'screen_name': 'KxSweaters13'}]},
 'favorite_count': 1,
 'favorited': False,
 'geo': None,
 'id': 760563814450491392,
 'id_str': '760563814450491392',
 'in_reply_to_screen_name': 'KxSweaters13',
 'in_reply_to_status_id': None,
 'in_reply_to_status_id_str': None,
 'in_reply_to_user_id': 873491544,
 'in_reply_to_user_id_str': '873491544',
 'is_quote_status': False,
 'lang': 'en',
 'metadata': {'iso_language_code': 'en', 'result_type': 'recent'},
 'place': {'attributes': {},
  'bounding_box': {'coordinates': [[[-71.813501, 42.4762],
     [-71.702186, 42.4762],
     [-71.702186, 42.573956],
     [-71.813501, 42.573956]]],
   'type': 'Polygon'},
  'contained_within': [],
  'country': 'Australia',
  'country_code': 'AUS',
  'full_name': 'Melbourne, V',
  'id': 'c4f1830ea4b8caaf',
  'name': 'Melbourne',
  'place_type': 'city',
  'url': 'https://api.twitter.com/1.1/geo/id/c4f1830ea4b8caaf.json'},
 'retweet_count': 0,
 'retweeted': False,
 'source': '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>',
 'text': '@KxSweaters13 are you the kenelx13 I see owning leominster for team valor?',
 'truncated': False,
 'user': {'contributors_enabled': False,
  'created_at': 'Thu Apr 21 17:09:52 +0000 2011',
  'default_profile': False,
  'default_profile_image': False,
  'description': "Arbys when it's cold. Kimballs when it's warm. @Ally__09 all year. Comp sci classes sometimes.",
  'entities': {'description': {'urls': []}},
  'favourites_count': 1106,
  'follow_request_sent': None,
  'followers_count': 167,
  'following': None,
  'friends_count': 171,
  'geo_enabled': True,
  'has_extended_profile': False,
  'id': 285715182,
  'id_str': '285715182',
  'is_translation_enabled': False,
  'is_translator': False,
  'lang': 'en',
  'listed_count': 2,
  'location': 'MA',
  'name': 'Steve',
  'notifications': None,
  'profile_background_color': '131516',
  'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme14/bg.gif',
  'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme14/bg.gif',
  'profile_background_tile': True,
  'profile_banner_url': 'https://pbs.twimg.com/profile_banners/285715182/1462218226',
  'profile_image_url': 'http://pbs.twimg.com/profile_images/727223698332200961/bGPjGjHK_normal.jpg',
  'profile_image_url_https': 'https://pbs.twimg.com/profile_images/727223698332200961/bGPjGjHK_normal.jpg',
  'profile_link_color': '4A913C',
  'profile_sidebar_border_color': 'FFFFFF',
  'profile_sidebar_fill_color': 'EFEFEF',
  'profile_text_color': '333333',
  'profile_use_background_image': True,
  'protected': False,
  'screen_name': 'StephenBurke_',
  'statuses_count': 5913,
  'time_zone': 'Eastern Time (US & Canada)',
  'url': None,
  'utc_offset': -14400,
  'verified': False}}

I have a JSON file that contains a list of JSON objects, each with the structure shown above.

So I read it into a DataFrame:

df = pd.read_json('data.json')

and then I try to get all the rows whose place type is 'city' with:

df = df[df['place']['place_type'] == 'city']

but this raises 'TypeError: an integer is required', followed by: During handling of the above exception, another exception occurred: KeyError: 'place_type'

Then I tried:

df['place'].head(3)
=>
0    {'id': '01864a8a64df9dc4', 'url': 'https://api...
1    {'id': '01864a8a64df9dc4', 'url': 'https://api...
2    {'id': '0118c71c0ed41109', 'url': 'https://api...
Name: place, dtype: object

So df['place'] returns a Series keyed by the row index. That means df['place']['place_type'] looks for a row labelled 'place_type' rather than for a key inside each dict, which is why I got the error.

I also tried selecting the place_type of the first row, and that works fine:

df.iloc[0]['place']['place_type']
=>
city
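
The difference between the two lookups can be reproduced on a tiny Series. This is only a sketch: the two dicts are made-up stand-ins for the real 'place' values, and it assumes a reasonably recent pandas, where the failed label lookup surfaces as a KeyError.

```python
import pandas as pd

# Hypothetical miniature of the 'place' column: one dict per row.
place = pd.Series(
    [{"place_type": "city"}, {"place_type": "admin"}],
    name="place",
)

# Indexing a Series with a string looks for a row *label*, not a dict key,
# and no row is labelled 'place_type'.
try:
    place["place_type"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Selecting a single row first returns the dict itself,
# which can then be indexed by key.
first_type = place.iloc[0]["place_type"]

print(lookup_failed, first_type)
```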

The question is: how can I filter the rows in this case?

Solution:

Okay, so the problem is that pd.read_json does not flatten nested JSON: each nested object is left as a dict inside a single column. What I did instead was normalize the JSON objects:

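import json
import pandas as pd

with open('data.json') as jsonfile:
    data = json.load(jsonfile)

# Flatten nested objects into dotted column names such as 'place.place_type'
# (pd.json_normalize is the top-level name since pandas 1.0)
df = pd.json_normalize(data)

df = df[df['place.place_type'] == 'city']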
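
To see what the normalization step produces, here is a small self-contained sketch. The three records are invented stand-ins for the contents of data.json (including one with no place at all), and it assumes pandas 1.0 or later for the top-level pd.json_normalize.

```python
import pandas as pd

# Hypothetical records standing in for the list loaded from data.json.
records = [
    {"id": 1, "place": {"place_type": "city", "name": "Melbourne"}},
    {"id": 2, "place": {"place_type": "admin", "name": "Victoria"}},
    {"id": 3, "place": None},  # tweets without a place are common
]

# Nested dicts become dotted columns: 'place.place_type', 'place.name'.
flat = pd.json_normalize(records)

# The nested field is now an ordinary column, so a boolean mask works.
# Rows with no place hold NaN in that column and compare as False.
cities = flat[flat["place.place_type"] == "city"]

print(cities)
```

Once the nested field is a real column, the usual boolean indexing applies with no per-row Python code.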

You can use a list comprehension to do the filtering you need. Iterate over the list of records loaded from the file (data, as returned by json.load) rather than over the DataFrame, because iterating over a DataFrame yields its column names:

cities = [d for d in data if d['place']['place_type'] == 'city']

This gives you a list of the records whose place_type is 'city'.
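
If some records have no place at all, indexing into d['place'] fails on None. A slightly more defensive variant might look like the sketch below; the sample records and the name city_records are illustrative and not taken from the original data.

```python
# Hypothetical records standing in for the result of json.load(...).
data = [
    {"id": 1, "place": {"place_type": "city"}},
    {"id": 2, "place": {"place_type": "admin"}},
    {"id": 3, "place": None},  # no place attached
    {"id": 4},                 # key missing entirely
]

# (d.get("place") or {}) substitutes an empty dict when the key is
# missing or None, so the second .get() never runs on None.
city_records = [
    d for d in data
    if (d.get("place") or {}).get("place_type") == "city"
]

print([d["id"] for d in city_records])
```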

I'm not sure whether you need to go through place_type itself to show all the rows that contain city. You wrote:

"and then I try to get all the rows which are the city type by:"

Each value in the place column is a dict, so comparing the column to 'city' directly matches nothing. Test the place_type key of each value instead to get all the rows whose place is a city:

df = df[df['place'].apply(lambda p: isinstance(p, dict) and p.get('place_type') == 'city')]
