
Expanding pandas dataframe with range of numpy array

I have the following dataframe (sample):

        min   max    lat    lon
16744  1000  1000  60.75  25.75
18738   875   950  64.00  13.75
2811    925  1000  41.00  20.00
12361  1000  1000  54.00  -1.25
19257  1000  1000  64.75  42.00

and an array pressure:

pressure=['1000','975','950','925','900','875','850','825','800','775','750','700','650']

I want to extend the dataframe with rows of pressure level values, based on the range from min to max. Rows are added according to the members of the pressure array. For example, if min, max is 1000, 1000, a new row with value 975 is to be added, with all other cells the same as in the original record. I have partially solved this, though not with pandas, and I now have performance issues because the dataframe is large. Here is what I did:

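import numpy as np

pressure = np.array(pressure).astype(int)     # compare numbers, not strings
mini = sample['min'].to_numpy()
maksi = sample['max'].to_numpy()
for i, ma in enumerate(maksi):
    poc = np.where(pressure == ma)[0][0]       # position of the max level
    kr = np.where(pressure == mini[i])[0][0]   # position of the min level
    pock = max(poc - 1, 0)   # start one level above max, if there is one
    kraj = kr + 2            # stop one level below min; slicing clamps at the end
    levels = pressure[pock:kraj]
    print(levels)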

and the printout of the above code:

[1000  975]
[975 950 925 900 875 850]
[1000  975  950  925  900]
[1000  975]
[1000  975]

What I need to do is integrate the above arrays into the records of the sample dataframe.

Desired output:

        pre   lat    lon
16744  1000  60.75  25.75
16744   975  60.75  25.75
18738   975  64.00  13.75
18738   950  64.00  13.75
18738   900  64.00  13.75
18738   875  64.00  13.75
18738   850  64.00  13.75
2811   1000  41.00  20.00
2811    975  41.00  20.00
2811    950  41.00  20.00
2811    925  41.00  20.00
2811    900  41.00  20.00
12361  1000  54.00  -1.25
12361   975  54.00  -1.25
19257  1000  64.75  42.00
19257   975  64.75  42.00

Can I do all this in a vectorized manner, with pandas alone? Any help is appreciated.

Let's cross-merge and filter:

(df.assign(min=lambda x: x['min']-25,dummy=1)
   .reset_index()
   .merge(pd.DataFrame({'pre':pressure, 'dummy':1}).astype(int),
          on='dummy')
   .loc[lambda x: x['pre'].between(x['min'],x['max'])]
   .set_index('index')
   .reindex(['pre','lat','lon'], axis=1)
)
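
The chain above relies on a dummy key for the cross join and on min - 25 to reach one level below min, which holds only where the levels are 25 apart. A minimal sketch of the same idea follows, assuming pandas 1.2 or later (for how="cross"), a pressure list sorted in descending order, and min/max values that all occur in it. The names sample, levels, pos, top, bottom and result are illustrative; the answer calls the frame df. For the sample data it should select the same rows as the chain above.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (the answer calls this frame `df`).
sample = pd.DataFrame(
    {"min": [1000, 875, 925, 1000, 1000],
     "max": [1000, 950, 1000, 1000, 1000],
     "lat": [60.75, 64.00, 41.00, 54.00, 64.75],
     "lon": [25.75, 13.75, 20.00, -1.25, 42.00]},
    index=[16744, 18738, 2811, 12361, 19257],
)
pressure = ['1000', '975', '950', '925', '900', '875', '850',
            '825', '800', '775', '750', '700', '650']

# One row per pressure level, with its position in the descending list.
levels = pd.DataFrame({"pre": np.array([int(p) for p in pressure])})
levels["pos"] = np.arange(len(levels))
pos = levels.set_index("pre")["pos"]      # level value -> position

result = (
    sample.assign(top=sample["max"].map(pos),          # position of the max level
                  bottom=sample["min"].map(pos) + 1)   # one level below min
    .reset_index()
    .merge(levels, how="cross")                        # every record x every level
    .loc[lambda x: x["pos"].between(x["top"], x["bottom"])]
    .set_index("index")
    .reindex(["pre", "lat", "lon"], axis=1)
)
print(result)
```

Filtering on positions rather than on values keeps the "one level below min" rule valid where the spacing changes to 50 near the end of the list.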

Output:

        pre    lat    lon
index                    
16744  1000  60.75  25.75
16744   975  60.75  25.75
18738   950  64.00  13.75
18738   925  64.00  13.75
18738   900  64.00  13.75
18738   875  64.00  13.75
18738   850  64.00  13.75
2811   1000  41.00  20.00
2811    975  41.00  20.00
2811    950  41.00  20.00
2811    925  41.00  20.00
2811    900  41.00  20.00
12361  1000  54.00  -1.25
12361   975  54.00  -1.25
19257  1000  64.75  42.00
19257   975  64.75  42.00
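
Since the question mentions performance on a large dataframe: a cross-merge builds len(df) × len(pressure) rows before filtering. The following sketch builds only the rows that are kept, using np.repeat. It assumes that pressure is sorted in descending order and that every min and max value occurs in it. The names sample, levels, pos, start, stop and expanded are illustrative, and whether it is actually faster on real data has not been measured here.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
sample = pd.DataFrame(
    {"min": [1000, 875, 925, 1000, 1000],
     "max": [1000, 950, 1000, 1000, 1000],
     "lat": [60.75, 64.00, 41.00, 54.00, 64.75],
     "lon": [25.75, 13.75, 20.00, -1.25, 42.00]},
    index=[16744, 18738, 2811, 12361, 19257],
)
pressure = ['1000', '975', '950', '925', '900', '875', '850',
            '825', '800', '775', '750', '700', '650']
levels = np.array([int(p) for p in pressure])    # descending pressure levels

# Map each level value to its position in `levels`.
pos = pd.Series(np.arange(len(levels)), index=levels)
start = sample["max"].map(pos).to_numpy()                  # first level to keep
stop = np.minimum(sample["min"].map(pos).to_numpy() + 2,   # one level below min,
                  len(levels))                             # clamped at the end
counts = stop - start                                      # rows per record

# Offsets 0, 1, ..., counts[i]-1 within each record, laid out flat.
offsets = np.arange(counts.sum()) - np.repeat(counts.cumsum() - counts, counts)
flat = np.repeat(start, counts) + offsets                  # positions into `levels`

expanded = (
    sample.loc[sample.index.repeat(counts), ["lat", "lon"]]  # repeat each record
    .assign(pre=levels[flat])
    .reindex(["pre", "lat", "lon"], axis=1)
)
print(expanded)
```

For the sample data this should give the same 16 rows as the output above. To also include one level above max, as the loop in the question does, start could be replaced by np.maximum(start - 1, 0).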
