Python List comprehension and JSON parsing
I'm new to Python and trying to figure out the best way to parse the values of a JSON object into an array, using a list comprehension.
Here is my code. I'm querying the publicly available iNaturalist API and would like to take specific parts of the JSON object that it returns and put them into a NumPy array:
import json
import urllib2
#Set Observations URL request for Resplendent Quetzal of Costa Rica
query = urllib2.urlopen("http://api.inaturalist.org/v1/observations?place_id=6924&taxon_id=20856&per_page=200&order=desc&order_by=created_at")
obSet = json.load(query)
#Print out Lat Long of observation
n = obSet['total_results']
for i in range(n):
    print obSet['results'][i]['location']
This all works fine and gives the following output:
9.5142456535,-83.8011438905
10.2335478381,-84.8517773638
10.3358965682,-84.9964271008
10.3744851815,-84.9871494128
10.2468720343,-84.9298072822
...
What I'd like to do next is replace the for loop with a list comprehension, and store the location value in a tuple. I'm struggling with the syntax; I'm guessing it's something like this:
[(long,lat) for i in range(n) for (long,lat) in obSet['results'][i]['location']]
But this doesn't work... thanks for any help.
The direct translation of your code into a list comprehension is:
positions = [obSet['results'][i]['location'] for i in range(obSet['total_results'])]
The obSet['total_results'] value is informative but not needed; you could just loop over obSet['results'] directly and use each resulting dictionary:
positions = [res['location'] for res in obSet['results']]
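A small sketch of why looping over the list itself is the safer habit. It uses a hand-made dictionary rather than real API output, and it assumes that total_results counts every matching observation while results holds only the current page. That is an assumption about the API; the output shown in the question does not confirm it. The example is written for Python 3.

```python
# Hypothetical, hand-made response: 5 matches in total, but only 2 on this page.
obSet = {
    "total_results": 5,
    "results": [
        {"location": "9.5142456535,-83.8011438905"},
        {"location": "10.2335478381,-84.8517773638"},
    ],
}

# Iterating over the list itself can never run past its end.
positions = [res["location"] for res in obSet["results"]]
print(positions)

# Indexing with range(total_results) fails once i reaches len(results).
try:
    by_index = [obSet["results"][i]["location"]
                for i in range(obSet["total_results"])]
except IndexError as exc:
    by_index = None
    print("IndexError:", exc)
```

If the two numbers happen to be equal, as they seem to be in the question, both forms give the same list.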
Now you have a list of strings, however, as each 'location' is still the long,lat formatted string you printed before.
Split that string and convert the result into a sequence of floats:
positions = [map(float, res['location'].split(',')) for res in obSet['results']]
Now you have a list of lists with floating point values:
>>> [map(float, res['location'].split(',')) for res in obSet['results']]
[[9.5142456535, -83.8011438905], [10.2335478381, -84.8517773638], [10.3358965682, -84.9964271008], [10.3744851815, -84.9871494128], [10.2468720343, -84.9298072822], [10.3456659939, -84.9451804822], [10.3611732346, -84.9450302597], [10.3174360636, -84.8798676791], [10.325110706, -84.939710318], [9.4098152454, -83.9255607577], [9.4907141714, -83.9240819199], [9.562637289, -83.8170178428], [9.4373885911, -83.8312881263], [9.4766746409, -83.8120952573], [10.2651190176, -84.6360466565], [9.6572995298, -83.8322965118], [9.6997991784, -83.9076919066], [9.6811177044, -83.8487647156], [9.7416717045, -83.929327673], [9.4885099275, -83.9583968683], [10.1233252667, -84.5751029683], [9.4411815757, -83.824401543], [9.4202687169, -83.9550344212], [9.4620656621, -83.665183105], [9.5861809119, -83.8358881552], [9.4508914243, -83.9054016165], [9.4798058284, -83.9362558497], [9.5970449879, -83.8969131893], [9.5855562829, -83.8354434596], [10.2366179555, -84.854847472], [9.718459702, -83.8910277016], [9.4424384874, -83.8880459793], [9.5535916157, -83.9578166199], [10.4124554163, -84.9796942349], [10.0476688795, -84.298227929], [10.2129436252, -84.8384097435], [10.2052632717, -84.6053701877], [10.3835784147, -84.8677930134], [9.6079669672, -83.9084281155], [10.3583643315, -84.8069762134], [10.3975986735, -84.9196996767], [10.2060835381, -84.9698814407], [10.3322929317, -84.8805587129], [9.4756504472, -83.963818143], [10.3997876964, -84.9127311339], [10.1777433853, -84.0673088686], [10.3346128571, -84.9306278215], [9.5193346195, -83.9404786293], [9.421538224, -83.7689452093], [9.430427837, -83.9532672942], [10.3243212895, -84.9653175843], [10.021698503, -83.885674888]]
If you must have tuples rather than lists, add a tuple() call:
positions = [tuple(map(float, res['location'].split(',')))
             for res in obSet['results']]
The latter also makes sure the expression works in Python 3 (where map() returns an iterator, not a list); you'd otherwise have to use a nested list comprehension:
# produce a list of lists in Python 3
positions = [[float(p) for p in res['location'].split(',')] for res in obSet['results']]
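The syntax attempted in the question tried to unpack each location directly inside the comprehension. That fails because iterating over a string yields single characters, not two values. A minimal Python 3 sketch that keeps the unpacking shape is below. The sample data is hand-made, and the names first and second are placeholders chosen here; the sample values look like latitude first, so check the order before naming them long and lat.

```python
# Hand-made sample in the shape of the question's data (not real API output).
obSet = {
    "results": [
        {"location": "9.5142456535,-83.8011438905"},
        {"location": "10.2335478381,-84.8517773638"},
    ]
}

# The inner generator splits each string into two parts;
# the outer comprehension unpacks those two parts and converts them.
positions = [
    (float(first), float(second))
    for first, second in (res["location"].split(",") for res in obSet["results"])
]
print(positions)
```

This produces the same list of float tuples as the tuple(map(float, ...)) version; which one reads better is a matter of taste.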
You can iterate over the list of results directly:
print([tuple(result['location'].split(',')) for result in obSet['results']])
>> [('9.5142456535', '-83.8011438905'), ('10.2335478381', '-84.8517773638'), ... ]
[tuple(obSet['results'][i]['location'].split(',')) for i in range(n)]
This will return a list of tuples; the elements of the tuples are unicode strings.
If you want the elements of the tuples as floats, do the following:
[tuple(map(float,obSet['results'][i]['location'].split(','))) for i in range(n)]
Another way to get a list of [long, lat] without a list comprehension:
In [14]: map(lambda x: obSet['results'][x]['location'].split(','), range(obSet['total_results']))
Out[14]:
[[u'9.5142456535', u'-83.8011438905'],
[u'10.2335478381', u'-84.8517773638'],
[u'10.3358965682', u'-84.9964271008'],
[u'10.3744851815', u'-84.9871494128'],
...
If you would like a list of tuples instead:
In [14]: map(lambda x: tuple(obSet['results'][x]['location'].split(',')), range(obSet['total_results']))
Out[14]:
[(u'9.5142456535', u'-83.8011438905'),
(u'10.2335478381', u'-84.8517773638'),
(u'10.3358965682', u'-84.9964271008'),
(u'10.3744851815', u'-84.9871494128'),
...
If you want to convert to floats too:
In [17]: map(lambda x: tuple(map(float, obSet['results'][x]['location'].split(','))), range(obSet['total_results']))
Out[17]:
[(9.5142456535, -83.8011438905),
(10.2335478381, -84.8517773638),
(10.3358965682, -84.9964271008),
(10.3744851815, -84.9871494128),
(10.2468720343, -84.9298072822),
(10.3456659939, -84.9451804822),
...
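The sessions above are Python 2, where map() returns a list. In Python 3 it returns a lazy map object, so the same approach needs a list() call around it. A minimal sketch with hand-made sample data (not real API output), mapping over the result dictionaries directly instead of over indices:

```python
# Hand-made sample in the shape of the question's data.
obSet = {
    "results": [
        {"location": "9.5142456535,-83.8011438905"},
        {"location": "10.2335478381,-84.8517773638"},
    ]
}

# In Python 3 this is a lazy map object, not a list.
lazy = map(
    lambda res: tuple(map(float, res["location"].split(","))),
    obSet["results"],
)

# list() consumes the iterator and builds the actual list of tuples.
positions = list(lazy)
print(positions)
```

The inner map() needs no list() of its own because tuple() already consumes it.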
The correct way to get a list of tuples using list comprehensions would be:
def to_tuple(coords_str):
    return tuple(coords_str.split(','))
output_list = [to_tuple(obSet['results'][i]['location']) for i in range(obSet['total_results'])]
You can of course replace to_tuple() with a lambda function; I just wanted to make the example clear. Moreover, you could use map() to get a tuple of floats instead of strings: return tuple(map(float, coords_str.split(','))).
obSet['results'] is a list, so there is no need to use range to iterate over it:
for item in obSet['results']:
    print(item['location'])
To turn this into a list comprehension you can write:
[item['location'] for item in obSet['results']]
But each location is encoded as a string, instead of a list or tuple of floats. To get it into the proper format, use
[tuple(float(coord) for coord in item['location'].split(','))
 for item in obSet['results']]
That is, split the item['location'] string into parts using , as the delimiter, then convert each part into a float, and make a tuple of these float coordinates.
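The split, convert, and tuple steps described above can also be pulled into a small helper, which leaves room for a filter in the comprehension. The sketch below is Python 3 with hand-made data. The helper name parse_location and the record with a missing location are hypothetical; whether the API ever returns a record without coordinates is an assumption here, not something shown in the question.

```python
def parse_location(location):
    """Turn a string like 'a,b' into a tuple of floats."""
    return tuple(float(coord) for coord in location.split(","))


# Hand-made sample; the None entry is a hypothetical record without coordinates.
results = [
    {"location": "9.5142456535,-83.8011438905"},
    {"location": None},
    {"location": "10.2335478381,-84.8517773638"},
]

# The trailing `if` skips records whose location is missing or empty.
positions = [
    parse_location(item["location"])
    for item in results
    if item.get("location")
]
print(positions)
```

Without the filter, the None entry would raise an AttributeError on .split().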
Let's give this a shot, starting with just 1 location:
>>> (long, lat) = obSet['results'][0]['location']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: too many values to unpack
Alright, so that didn't work, but why? It's because the longitude and latitude coordinates are just 1 string, so you can't unpack it immediately as a tuple. We must first separate it into two different strings.
>>> (long, lat) = obSet['results'][0]['location'].split(",")
From here we will want to iterate through the whole set of results, which we know are indexed from 0 to n - 1.
tuple(obSet['results'][i]['location'].split(",")) will give us the tuple of longitude, latitude for the result at index i, so:
>>> [tuple(obSet['results'][i]['location'].split(",")) for i in range(n)]
ought to give us the list of tuples we want.
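The unpacking failure described above can be reproduced without calling the API. A minimal Python 3 sketch, using one location string copied from the question's output (the names first and second are placeholders chosen here):

```python
location = "9.5142456535,-83.8011438905"

error_message = None
try:
    # Unpacking a string iterates over its characters, far more than two.
    first, second = location
except ValueError as exc:
    error_message = str(exc)
    print("ValueError:", error_message)

# Splitting on the comma yields exactly two strings, so unpacking works.
first, second = location.split(",")
print(first, second)
```

In Python 3 the error message also states how many values were expected, but the cause is the same as in the Python 2 traceback.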