Expanding pandas dataframe with range of numpy array
I have the following dataframe (sample):
        min   max    lat    lon
16744  1000  1000  60.75  25.75
18738   875   950  64.00  13.75
2811    925  1000  41.00  20.00
12361  1000  1000  54.00  -1.25
19257  1000  1000  64.75  42.00
and an array, pressure:
pressure=['1000','975','950','925','900','875','850','825','800','775','750','700','650']
I want to extend the dataframe with rows holding pressure-level values, based on the range from min to max. The added values are to be taken from the members of the pressure array. I.e., if min, max is 1000, 1000, a new row with value 975 is to be added, with all other cells the same as in the original record. I have partially solved this, though not with pandas, and I now have performance issues because the dataframe is large. Here is what I did:
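import numpy as np

# assumed: pressure has to be a numeric array for the == comparisons to match
pressure = np.array(pressure).astype(int)
mini = sample['min'].to_numpy()
maksi = sample['max'].to_numpy()
for i, ma in enumerate(maksi):
    poc = np.where(pressure == ma)        # position of max in pressure
    kr = np.where(pressure == mini[i])    # position of min in pressure
    # start one level above max, unless max is already the first level
    if poc[0][0] == 0:
        pk = 0
    else:
        pk = -1
    # end one level below min (the slice end is exclusive, hence +2)
    if kr[0][0] == len(pressure):
        kk = 0
    else:
        kk = 2
    pock = poc[0][0] + pk
    kraj = kr[0][0] + kk
    pk = 0
    kk = 0
    levels = pressure[pock:kraj]
    print(levels)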
and the printout of the above code:
[1000 975]
[975 950 925 900 875 850]
[1000 975 950 925 900]
[1000 975]
[1000 975]
What I need to do is integrate the above arrays into the records of the sample dataframe.
Desired output:
        pre    lat    lon
16744  1000  60.75  25.75
16744   975  60.75  25.75
18738   975  64.00  13.75
18738   950  64.00  13.75
18738   900  64.00  13.75
18738   875  64.00  13.75
18738   850  64.00  13.75
2811   1000  41.00  20.00
2811    975  41.00  20.00
2811    950  41.00  20.00
2811    925  41.00  20.00
2811    900  41.00  20.00
12361  1000  54.00  -1.25
12361   975  54.00  -1.25
19257  1000  64.75  42.00
19257   975  64.75  42.00
Can I do all this in a vectorized manner, with pandas alone? Any help is appreciated.
Let's cross-merge and filter:
(df.assign(min=lambda x: x['min']-25, dummy=1)
   .reset_index()
   .merge(pd.DataFrame({'pre':pressure, 'dummy':1}).astype(int),
          on='dummy')
   .loc[lambda x: x['pre'].between(x['min'],x['max'])]
   .set_index('index')
   .reindex(['pre','lat','lon'], axis=1)
)
Output:
        pre    lat    lon
index
16744  1000  60.75  25.75
16744   975  60.75  25.75
18738   950  64.00  13.75
18738   925  64.00  13.75
18738   900  64.00  13.75
18738   875  64.00  13.75
18738   850  64.00  13.75
2811   1000  41.00  20.00
2811    975  41.00  20.00
2811    950  41.00  20.00
2811    925  41.00  20.00
2811    900  41.00  20.00
12361  1000  54.00  -1.25
12361   975  54.00  -1.25
19257  1000  64.75  42.00
19257   975  64.75  42.00