
How to iterate through DataFrame rows and grab values from dicts in cols?

Features_Frame

Each feature frame will be a batch of data. I would like to extract all values for the key 'coordinates' in the geometry col and iteratively insert them into another df.

In that same df, I would also like to store data extracted from the properties col. The properties col has many keys.

Each source frame will have both 'geometry':'coordinates' and 'properties', which will consist of various keys.

Each col in this new DataFrame will be a key inside either 'geometry' or 'properties'.

For example:

      coordinates          name
0      [-108.600,39.09]    'Target'
1      [51.459,82.04]      'Costco'
2      [-35.459,82.04]     'BJ's Wholesale Club'
3      [98.459,12.07]      'Walgreens'
4      [105.404,96.04]     'Walmart'

I can access both cols with the below:

coord_frame = features_frame['geometry'][:]
properties_frame = features_frame['properties'][:]

But that only splits the frame in two. I would expect that if I did:

features_frame['geometry'][:]['coordinates']

I'd get the values for the coordinates key in the geometry col for all rows, and if I did:

features_frame['properties'][:]['name']

I'd get the value for the name key in the properties col for all rows.

Instead I just get a KeyError saying that name or coordinates don't exist.

Feed list of dicts to pd.DataFrame constructor

pd.Series.apply is just a Python-level loop, and it usually underperforms even a simple list comprehension. A much better idea is to use the optimised code in the pd.DataFrame constructor. Here's an example:

import pandas as pd

df = pd.DataFrame({'geometry': [{'coordinates': [-108.600,39.09], 'name': 'Target'},
                                {'coordinates': [51.459,82.04], 'name': 'Costco'}]})

print(df)

                                            geometry
0  {'coordinates': [-108.6, 39.09], 'name': 'Targ...
1  {'coordinates': [51.459, 82.04], 'name': 'Cost...

res = pd.DataFrame(df['geometry'].values.tolist())

print(res)

       coordinates    name
0  [-108.6, 39.09]  Target
1  [51.459, 82.04]  Costco
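
As a quick check that the three approaches mentioned above agree, here is a minimal sketch on the same sample data. The variable names (`via_apply`, `via_listcomp`, `via_constructor`) are illustrative only, and no timings are shown because they depend on data size and pandas version.

```python
import pandas as pd

df = pd.DataFrame({'geometry': [{'coordinates': [-108.600, 39.09], 'name': 'Target'},
                                {'coordinates': [51.459, 82.04], 'name': 'Costco'}]})

# 1. Series.apply: one Python-level function call per row
via_apply = df['geometry'].apply(lambda d: d['name'])

# 2. List comprehension: the same loop written directly
via_listcomp = pd.Series([d['name'] for d in df['geometry']], index=df.index)

# 3. DataFrame constructor: expands every key into a column in one call
via_constructor = pd.DataFrame(df['geometry'].tolist(), index=df.index)['name']

# All three should yield the same values
same = via_apply.tolist() == via_listcomp.tolist() == via_constructor.tolist()
print(same)
```

To compare speed on your own data, each of the three statements can be wrapped in `timeit`.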

Use concat for multiple series of dictionaries

The above can be extended to arbitrary series of dictionaries:

df = pd.DataFrame({'geometry': [{'coordinates': [-108.600,39.09], 'name': 'Target'},
                                {'coordinates': [51.459,82.04], 'name': 'Costco'}],
                   'properties': [{'osm_id': 288700723, 'osm_tye': 'W'},
                                  {'osm_id': 52734154, 'osm_tye': 'W'}]})

res = pd.concat((pd.DataFrame(df[col].values.tolist()) for col in df), axis=1)

print(res)

       coordinates    name     osm_id osm_tye
0  [-108.6, 39.09]  Target  288700723       W
1  [51.459, 82.04]  Costco   52734154       W
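
Two details may matter on real data: the constructor call above builds a fresh 0..n-1 index, and keys that occur in both columns produce duplicate column names. The following is a sketch of one way to handle both, assuming every cell is a dict. The helper name `expand_dict_columns`, the non-default index, and the made-up `'name'` key in `properties` are all hypothetical and only serve the illustration.

```python
import pandas as pd

# Made-up data: a non-default index and a 'name' key present in both columns
df = pd.DataFrame(
    {'geometry': [{'coordinates': [-108.600, 39.09], 'name': 'Target'},
                  {'coordinates': [51.459, 82.04], 'name': 'Costco'}],
     'properties': [{'osm_id': 288700723, 'name': 'W'},
                    {'osm_id': 52734154, 'name': 'W'}]},
    index=['a', 'b'])


def expand_dict_columns(frame):
    """Expand each dict-valued column, keeping the index and prefixing keys."""
    parts = [
        # index=frame.index keeps the original row labels;
        # add_prefix makes 'name' from different source columns distinguishable
        pd.DataFrame(frame[col].tolist(), index=frame.index).add_prefix(col + '_')
        for col in frame.columns
    ]
    return pd.concat(parts, axis=1)


res = expand_dict_columns(df)
print(res.columns.tolist())
```

The prefix is a design choice; if the keys are known to be unique across columns, it can be dropped.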

What about

df_new = pd.DataFrame()

and then, e.g.,

df_new['coordinates'] = features_frame['geometry'].apply(lambda x: x['coordinates'])

or

df_new['name'] = features_frame['properties'].apply(lambda x: x['name'])

And if you want to do that for all keys, you can loop over the keys of the dict in the first row, taken as an example:

for key in features_frame.geometry[0]:
    df_new[key] = features_frame.geometry.apply(lambda x: x[key])

for key in features_frame.properties[0]:
    df_new[key] = features_frame.properties.apply(lambda x: x[key])
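
Put together, the loops can be run end to end as in the sketch below. The sample `features_frame` is made up to resemble the question's data. `.iloc[0]` is used instead of `[0]` so the first row is found even if the index has no label 0, and `k=key` binds the key explicitly inside the lambda; both are stylistic choices rather than requirements.

```python
import pandas as pd

# Hypothetical sample data shaped like the question's features_frame
features_frame = pd.DataFrame({
    'geometry': [{'type': 'Point', 'coordinates': [-108.600, 39.09]},
                 {'type': 'Point', 'coordinates': [51.459, 82.04]}],
    'properties': [{'name': 'Target', 'osm_id': 288700723},
                   {'name': 'Costco', 'osm_id': 52734154}],
})

# Start from an empty frame that shares the source index
df_new = pd.DataFrame(index=features_frame.index)

for col in ['geometry', 'properties']:
    # Keys of the first row's dict serve as the list of columns to create
    for key in features_frame[col].iloc[0]:
        df_new[key] = features_frame[col].apply(lambda d, k=key: d[k])

print(df_new)
```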

Supplemental:
...and in case the geometry and properties dicts share identical keys, you can prefix the new column names to prevent overwriting:

for key in features_frame.geometry[0]:
    df_new['geom_' + key] = features_frame.geometry.apply(lambda x: x[key])
for key in features_frame.properties[0]:
    df_new['prop_' + key] = features_frame.properties.apply(lambda x: x[key])

EDIT:

If some dictionaries in a column don't have all keys, a default value, e.g. None, should be returned.
To achieve that, use the dict get method, which allows a default value to be defined, in the lambda functions instead of indexing:

lambda x: x.get(key, None)
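
A small sketch of this on made-up data where the second dict lacks one key. The key list is written out by hand here, which is an assumption for the example. Depending on the pandas version, the missing value may appear as None or NaN, so no printed output is shown.

```python
import pandas as pd

# Hypothetical data: the second row has no 'osm_id'
features_frame = pd.DataFrame({
    'properties': [{'name': 'Target', 'osm_id': 288700723},
                   {'name': 'Costco'}],
})

df_new = pd.DataFrame(index=features_frame.index)
for key in ['name', 'osm_id']:
    # dict.get returns None instead of raising KeyError for a missing key
    df_new[key] = features_frame['properties'].apply(lambda d, k=key: d.get(k))

print(df_new)
```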

This at least guards against key errors.
However, if the dict in the first row is not representative of all dicts, the loop will not cover every key, so a list of all keys has to be built first.
There are several ways to get this list:

  1. Ideally you already know all the keys from elsewhere. Then you can put them in a list and iterate over it instead of over the first dict.

  2. Perhaps you know that at least one dict has the most keys, that this longest dict contains all keys, and that the keys of shorter dicts in the same column are always subsets of it. Then you can find the longest dict with:

     longest_dict = sorted(df.geometry, key=len)[-1]

  3. Perhaps you know nothing at all about the keys. Then you have to collect all the different keys that appear anywhere in the column:

     all_keys = []
     for d in df.geometry:
         all_keys.extend(d)
     all_keys = set(all_keys)
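
The third option can be combined with the get method as in the following sketch. The data is made up, and the keys are gathered into a list in first-seen order rather than a set, purely so that the resulting column order is deterministic.

```python
import pandas as pd

# Hypothetical data where no single dict contains every key
df = pd.DataFrame({
    'geometry': [{'coordinates': [-108.600, 39.09]},
                 {'coordinates': [51.459, 82.04], 'type': 'Point'},
                 {'type': 'Point'}],
})

# Collect every key that appears anywhere in the column, in first-seen order
all_keys = []
for d in df['geometry']:
    for key in d:
        if key not in all_keys:
            all_keys.append(key)

# Build one column per key; missing keys fall back to None via dict.get
res = pd.DataFrame(index=df.index)
for key in all_keys:
    res[key] = df['geometry'].apply(lambda d, k=key: d.get(k))

print(all_keys)
```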
