How to dynamically generate Pipes in Python Pandas?

I'm building a data augmentation pipeline using dataframes. I've created a function, h3_int, that takes an int input and appends a column of hex values to the dataframe. Here's the implementation of h3_int:

from h3.unstable import vect
def h3_int(df, level):
    df['h3_' + str(level)] = vect.geo_to_h3(df.lat.values, df.lng.values, level).tolist()
    return df

df consists of a lat and a lng column:

    lat lng
0   43.64617    -79.42451
1   43.64105    -79.37628
2   43.66724    -79.41598
3   43.69602    -79.45468
4   43.66890    -79.32592
... ... ...
9515    36.10644    -115.16711
9516    36.00814    -115.17496
9517    36.10711    -115.16607
9518    36.03119    -115.05352
9519    36.13554    -115.11541

Here's the simple usage of h3_int:

df.pipe(h3_int, 8)

Since the input is dynamic, I'd like to dynamically generate the pipes as well, but I've been having difficulty implementing this. The code:

(df.pipe(h3_int, i) for i in range(8, 10))

returns:

<generator object <genexpr> at 0x7fd4858557b0>

while:

(df.pipe((h3_int, i) for i in range(8, 10)))

returns:

TypeError: 'generator' object is not callable

What's the correct method for implementing dynamic pipes in pandas? Unfortunately I've found the documentation and SO lacking in answers.

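A comprehension written inside parentheses is a generator expression. It is lazy, so no df.pipe call runs until the generator is iterated, and it is not indexable. In your second attempt the generator itself is passed to df.pipe as the function argument, and pipe tries to call it, which is why you get TypeError: 'generator' object is not callable. Instead, you can use square brackets to create a list, which is evaluated immediately and is indexable: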

>>> [df.pipe(h3_int, i) for i in range(8, 9)][0]
        lat        lng                h3_8
0  43.64617  -79.42451  613256717813153791
1  43.64105  -79.37628  613256717559398399
2  43.66724  -79.41598  613256718316470271
3  43.69602  -79.45468  613256716607291391
4  43.66890  -79.32592  613256718037549055
5  36.10644 -115.16711  613220086766895103
6  36.00814 -115.17496  613220073288499199
7  36.10711 -115.16607  613220086766895103
8  36.03119 -115.05352  613220075656183807
9  36.13554 -115.11541  613220087052107775
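
To see why the parenthesized version appeared to do nothing, here is a minimal sketch of generator laziness. The function record is a hypothetical stand-in for df.pipe(h3_int, level); it only logs that it was called, so the example runs without pandas or h3.

```python
calls = []


def record(level):
    # Hypothetical stand-in for df.pipe(h3_int, level): logs each call.
    calls.append(level)
    return level


# Parentheses build a generator expression; record() has not run yet.
gen = (record(i) for i in range(8, 10))
calls_before = list(calls)

# Iterating the generator (here via list) is what triggers the calls.
values = list(gen)

print(calls_before)  # []
print(values)        # [8, 9]
```

Square brackets do the iteration for you, which is why the list version actually runs the pipes.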

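Note that df was modified in place, because your function h3_int does not copy it before modifying it. Since h3_int returns that same df, every element of the list is the same DataFrame object, so with more than one level each element would show all of the added columns. That's not bad, it's just something to keep in mind.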
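
If the goal is a single dataframe with one column per level, one option is to feed each result into the next pipe call. The sketch below is only an illustration: add_level is a hypothetical stand-in for h3_int that writes the level number instead of calling the h3 package, and it copies the frame so the original df is left untouched.

```python
from functools import reduce

import pandas as pd


def add_level(df, level):
    # Hypothetical stand-in for h3_int: adds one column per level
    # without needing the h3 package.
    df = df.copy()  # copy so the caller's frame is not modified
    df['level_' + str(level)] = level
    return df


df = pd.DataFrame({'lat': [43.64617, 36.10644],
                   'lng': [-79.42451, -115.16711]})

# Loop form: each pipe call receives the result of the previous one.
result = df
for level in range(8, 10):
    result = result.pipe(add_level, level)

# Equivalent form using functools.reduce.
result_reduce = reduce(lambda acc, level: acc.pipe(add_level, level),
                       range(8, 10), df)

print(list(result.columns))  # ['lat', 'lng', 'level_8', 'level_9']
```

Swapping add_level for h3_int should give the h3_8 and h3_9 columns in one frame, with the difference that h3_int as written also modifies df itself.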
