
Expanding pandas dataframe with range of numpy array

I have the following dataframe (sample):

        min   max    lat    lon
16744  1000  1000  60.75  25.75
18738   875   950  64.00  13.75
2811    925  1000  41.00  20.00
12361  1000  1000  54.00  -1.25
19257  1000  1000  64.75  42.00

and an array, pressure:

pressure=['1000','975','950','925','900','875','850','825','800','775','750','700','650']

I want to extend the dataframe with rows holding pressure-level values, based on the range from the min to the max value. The new values are taken from the pressure array, including one extra level below min: e.g. if min and max are both 1000, a new row with the value 975 is to be added, with all other cells the same as in the original record. I have partially solved this, though not with pandas, and now I have performance issues because the dataframe is large. Here is what I did:

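import numpy as np

# convert the string levels to integers so they compare with min/max
pressure = np.array([int(p) for p in pressure])

mini = sample['min'].to_numpy()
maksi = sample['max'].to_numpy()
for i, ma in enumerate(maksi):
    poc = np.where(pressure == ma)[0][0]        # position of max in pressure
    kr = np.where(pressure == mini[i])[0][0]    # position of min in pressure
    # start one level above max, unless max is already the first level
    pock = max(poc - 1, 0)
    # stop one level below min (slicing past the end of the array is safe)
    kraj = kr + 2
    levels = pressure[pock:kraj]
    print(levels)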

and the printout of the above code:

[1000  975]
[975 950 925 900 875 850]
[1000  975  950  925  900]
[1000  975]
[1000  975]

What I need to do is integrate the above arrays into the records of the sample dataframe.
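If the per-row level arrays from the loop are kept, pandas can attach them to the records with DataFrame.explode, which turns each list element into its own row and repeats the original index. A minimal sketch, assuming the sample frame is rebuilt by hand from the table above and that every min and max value occurs in pressure; the helper name levels_for is hypothetical. It still calls a Python function per row, so it addresses the integration step rather than the speed problem.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt by hand from the question's table.
sample = pd.DataFrame(
    {"min": [1000, 875, 925, 1000, 1000],
     "max": [1000, 950, 1000, 1000, 1000],
     "lat": [60.75, 64.00, 41.00, 54.00, 64.75],
     "lon": [25.75, 13.75, 20.00, -1.25, 42.00]},
    index=[16744, 18738, 2811, 12361, 19257],
)
# Integer version of the question's pressure list (assumption: ints, not strings).
pressure = np.array([1000, 975, 950, 925, 900, 875, 850, 825, 800, 775, 750, 700, 650])


def levels_for(row):
    """Hypothetical helper: one level above max (if any) down to one level below min."""
    poc = np.flatnonzero(pressure == row["max"])[0]  # position of max
    kr = np.flatnonzero(pressure == row["min"])[0]   # position of min
    return list(pressure[max(poc - 1, 0):kr + 2])


out = (
    sample.assign(pre=sample.apply(levels_for, axis=1))  # one list of levels per row
          .explode("pre")                                # one row per level, index repeated
          .astype({"pre": int})                          # explode leaves an object column
          [["pre", "lat", "lon"]]
)
print(out)
```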

Desired output:

        pre   lat    lon
16744  1000  60.75  25.75
16744   975  60.75  25.75
18738   975  64.00  13.75
18738   950  64.00  13.75
18738   900  64.00  13.75
18738   875  64.00  13.75
18738   850  64.00  13.75
2811   1000  41.00  20.00
2811    975  41.00  20.00
2811    950  41.00  20.00
2811    925  41.00  20.00
2811    900  41.00  20.00
12361  1000  54.00  -1.25
12361   975  54.00  -1.25
19257  1000  64.75  42.00
19257   975  64.75  42.00

Can I do all this in a vectorized manner, with pandas alone? Any help is appreciated.
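One fully vectorized way to reproduce the loop's slices is to look up each row's max and min positions once, repeat every row as many times as it has levels, and compute the level position of each repeated row with NumPy arithmetic. This is a sketch of that Index.repeat technique, not the cross-merge shown in the answer below. It assumes the sample frame is rebuilt by hand and that every min and max value occurs in pressure (pd.Index.get_indexer returns -1 for missing values, which this code does not check); the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt by hand from the question's table.
sample = pd.DataFrame(
    {"min": [1000, 875, 925, 1000, 1000],
     "max": [1000, 950, 1000, 1000, 1000],
     "lat": [60.75, 64.00, 41.00, 54.00, 64.75],
     "lon": [25.75, 13.75, 20.00, -1.25, 42.00]},
    index=[16744, 18738, 2811, 12361, 19257],
)
pressure = np.array([1000, 975, 950, 925, 900, 875, 850, 825, 800, 775, 750, 700, 650])

# Position of each row's max and min in the pressure array.
level_index = pd.Index(pressure)
pos_max = level_index.get_indexer(sample["max"])
pos_min = level_index.get_indexer(sample["min"])

# Same slice bounds as the loop: one level above max, one level below min.
start = np.maximum(pos_max - 1, 0)
stop = np.minimum(pos_min + 2, len(pressure))
counts = stop - start

# Offset of each repeated row within its own group (0, 1, 2, ... per original row).
offsets = np.arange(counts.sum()) - np.repeat(np.cumsum(counts) - counts, counts)

rows = sample.index.repeat(counts)  # original labels, each repeated counts times
out = (
    sample.loc[rows, ["lat", "lon"]]
          .assign(pre=pressure[np.repeat(start, counts) + offsets])
          [["pre", "lat", "lon"]]
)
print(out)
```

Unlike the per-row version, no Python function runs per record; the only loop-like work happens inside np.repeat and np.arange.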

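Let's cross-merge and filter. Subtracting 25 from min makes each range include the next pressure level below min (this relies on the 25-unit spacing between 1000 and 750):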

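import pandas as pd

# df is the sample dataframe; pressure is the list of level strings
(df.assign(min=lambda x: x['min']-25, dummy=1)   # widen each range by one 25-unit step
   .reset_index()                                # keep the original index as a column named 'index'
   .merge(pd.DataFrame({'pre':pressure, 'dummy':1}).astype(int),
          on='dummy')                            # cross join through the constant key
   .loc[lambda x: x['pre'].between(x['min'],x['max'])]   # keep levels within [min-25, max]
   .set_index('index')
   .reindex(['pre','lat','lon'], axis=1)
)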

Output:

        pre    lat    lon
index                    
16744  1000  60.75  25.75
16744   975  60.75  25.75
18738   950  64.00  13.75
18738   925  64.00  13.75
18738   900  64.00  13.75
18738   875  64.00  13.75
18738   850  64.00  13.75
2811   1000  41.00  20.00
2811    975  41.00  20.00
2811    950  41.00  20.00
2811    925  41.00  20.00
2811    900  41.00  20.00
12361  1000  54.00  -1.25
12361   975  54.00  -1.25
19257  1000  64.75  42.00
19257   975  64.75  42.00
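On pandas 1.2 or later, merge accepts how='cross', so the dummy column is not needed. A sketch of the same filter under that version assumption, with the sample frame rebuilt by hand from the question's table; the name levels is illustrative.

```python
import pandas as pd

# Sample data rebuilt by hand from the question's table.
sample = pd.DataFrame(
    {"min": [1000, 875, 925, 1000, 1000],
     "max": [1000, 950, 1000, 1000, 1000],
     "lat": [60.75, 64.00, 41.00, 54.00, 64.75],
     "lon": [25.75, 13.75, 20.00, -1.25, 42.00]},
    index=[16744, 18738, 2811, 12361, 19257],
)
pressure = ['1000', '975', '950', '925', '900', '875', '850',
            '825', '800', '775', '750', '700', '650']

levels = pd.DataFrame({"pre": pressure}).astype(int)

out = (
    sample.assign(min=lambda x: x["min"] - 25)   # same 25-unit widening as the answer
          .reset_index()                         # original index becomes column 'index'
          .merge(levels, how="cross")            # every row paired with every level
          .loc[lambda x: x["pre"].between(x["min"], x["max"])]
          .set_index("index")
          [["pre", "lat", "lon"]]
)
print(out)
```

It keeps the answer's filtering rule, so the result has the same 16 rows as the output above.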
