In Python pandas, how can I re-sample and interpolate a DataFrame?

I have a pandas DataFrame, typically in this format:

   1       2          3          4  
0.1100 0.0000E+00 1.0000E+00 5.0000E+00  
0.1323 7.7444E-05 8.7935E-01 1.0452E+00  
0.1545 4.3548E-04 7.7209E-01 4.5432E-01  
0.1768 1.2130E-03 6.7193E-01 2.6896E-01  
0.1990 2.5349E-03 5.7904E-01 1.8439E-01  
0.2213 4.5260E-03 4.9407E-01 1.3771E-01 

What I would like to do is re-sample the column 1 (index) values from a list, for example:

indexList = numpy.linspace(0.11, 0.25, 8)

Then I need the values for columns 2, 3 and 4 to be linearly interpolated from the input DataFrame (it is always only column 1 that I re-sample/reindex) and, if necessary, extrapolated, since the min/max values of my list are not necessarily within the range of my existing column 1 (index). However, the key point is the interpolation part. I am quite new to Python, but I was thinking of an approach like this:

  1. output_df = df.reindex(index=indexList), which will give me mostly NaNs for columns 2-4.
  2. for index, row in output_df.iterrows():
    "function that calculates interpolated/extrapolated values from the DataFrame and inserts them at the correct row/column"
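Step 2 does not necessarily need a row-by-row loop. A minimal sketch, assuming column 1 is the index and the sample data is typed in by hand: numpy.interp interpolates a whole column at once between the two nearest original points. Note that np.interp repeats the first/last value outside the original range rather than extrapolating.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with column 1 as the index
df = pd.DataFrame(
    {
        "2": [0.0, 7.7444e-05, 4.3548e-04, 1.2130e-03, 2.5349e-03, 4.5260e-03],
        "3": [1.0, 8.7935e-01, 7.7209e-01, 6.7193e-01, 5.7904e-01, 4.9407e-01],
        "4": [5.0, 1.0452, 4.5432e-01, 2.6896e-01, 1.8439e-01, 1.3771e-01],
    },
    index=pd.Index([0.1100, 0.1323, 0.1545, 0.1768, 0.1990, 0.2213], name="1"),
)
indexList = np.linspace(0.11, 0.25, 8)

# One np.interp call per column: linear between the two nearest original
# index values; points past the last original value get that last value.
output_df = pd.DataFrame(
    {col: np.interp(indexList, df.index.to_numpy(), df[col].to_numpy()) for col in df.columns},
    index=pd.Index(indexList, name="1"),
)
print(output_df)
```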

Somehow it feels like I should be able to use the .interpolate functionality, but I cannot figure out how. I cannot use it straightforwardly: it would be too inaccurate, since after re-indexing most of my entries in columns 2-4 will be NaNs. The interpolation should be done between the two closest values of my initial DataFrame. Any good tips? (And if my format/intention is unclear, please let me know...)
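A quick check with the sample data (typed in by hand here) illustrates the problem with step 1: reindexing onto indexList alone discards the original rows, so interpolate has almost nothing left to work from.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with column 1 as the index
df = pd.DataFrame(
    {
        "2": [0.0, 7.7444e-05, 4.3548e-04, 1.2130e-03, 2.5349e-03, 4.5260e-03],
        "3": [1.0, 8.7935e-01, 7.7209e-01, 6.7193e-01, 5.7904e-01, 4.9407e-01],
        "4": [5.0, 1.0452, 4.5432e-01, 2.6896e-01, 1.8439e-01, 1.3771e-01],
    },
    index=pd.Index([0.1100, 0.1323, 0.1545, 0.1768, 0.1990, 0.2213], name="1"),
)
indexList = np.linspace(0.11, 0.25, 8)

# reindex() keeps only the labels in indexList: the original rows at
# 0.1323, 0.1545, ... are dropped, and only 0.11 matches an existing label.
naive = df.reindex(indexList)
print(naive)

# With a single non-NaN row left, interpolate() can only repeat that row.
filled = naive.interpolate(method="index")
print(filled)
```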

Assuming column 1 is the index, you can reindex your DataFrame with the union of the original index values and the list you created, and then use interpolate to fill in the NaNs. Because the original rows are kept, each new point is interpolated between its two nearest original neighbours.

import numpy as np
df1 = df.reindex(df.index.union(np.linspace(.11, .25, 8)))
df1.interpolate(method='index')

               2         3         4
0.1100  0.000000  1.000000  5.000000
0.1300  0.000069  0.891794  1.453094
0.1323  0.000077  0.879350  1.045200
0.1500  0.000363  0.793832  0.574093
0.1545  0.000435  0.772090  0.454320
0.1700  0.000976  0.702472  0.325482
0.1768  0.001213  0.671930  0.268960
0.1900  0.001999  0.616698  0.218675
0.1990  0.002535  0.579040  0.184390
0.2100  0.003517  0.537127  0.161364
0.2213  0.004526  0.494070  0.137710
0.2300  0.004526  0.494070  0.137710
0.2500  0.004526  0.494070  0.137710
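Note that interpolate does not extrapolate: the rows at 0.23 and 0.25 above simply repeat the last original row (0.2213). If linear extrapolation is really needed, one option is a small numpy-only helper such as the sketch below. interp_extrap is a hypothetical name, and extending the straight line through the two outermost points on each side is an assumption about what "extrapolated" should mean here.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with column 1 as the index
df = pd.DataFrame(
    {
        "2": [0.0, 7.7444e-05, 4.3548e-04, 1.2130e-03, 2.5349e-03, 4.5260e-03],
        "3": [1.0, 8.7935e-01, 7.7209e-01, 6.7193e-01, 5.7904e-01, 4.9407e-01],
        "4": [5.0, 1.0452, 4.5432e-01, 2.6896e-01, 1.8439e-01, 1.3771e-01],
    },
    index=pd.Index([0.1100, 0.1323, 0.1545, 0.1768, 0.1990, 0.2213], name="1"),
)
indexList = np.linspace(0.11, 0.25, 8)


def interp_extrap(new_x, x, y):
    """Hypothetical helper: linear interpolation inside [x[0], x[-1]],
    linear extrapolation outside it using the two outermost points.
    Assumes x is sorted ascending and has at least two points."""
    new_x = np.atleast_1d(np.asarray(new_x, dtype=float))
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    out = np.interp(new_x, x, y)  # clamps to y[0] / y[-1] outside the range...
    below = new_x < x[0]
    above = new_x > x[-1]
    # ...so overwrite the out-of-range points with straight-line extensions
    out[below] = y[0] + (new_x[below] - x[0]) * (y[1] - y[0]) / (x[1] - x[0])
    out[above] = y[-1] + (new_x[above] - x[-1]) * (y[-1] - y[-2]) / (x[-1] - x[-2])
    return out


extrap = pd.DataFrame(
    {col: interp_extrap(indexList, df.index, df[col]) for col in df.columns},
    index=pd.Index(indexList, name=df.index.name),
)
print(extrap)
```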

Before we begin, some spells (imports and a constant):

import pandas as pd
import numpy

LENGTH=8

Let's start by loading your data (we'll convert it to CSV because it's easier):

x="""   1       2          3          4
0.1100 0.0000E+00 1.0000E+00 5.0000E+00
0.1323 7.7444E-05 8.7935E-01 1.0452E+00
0.1545 4.3548E-04 7.7209E-01 4.5432E-01
0.1768 1.2130E-03 6.7193E-01 2.6896E-01
0.1990 2.5349E-03 5.7904E-01 1.8439E-01
0.2213 4.5260E-03 4.9407E-01 1.3771E-01
"""
nx = ""
for l in x.split('\n'):
    nx += ','.join(l.split()) + '\n'
import io
df = pd.read_csv(io.StringIO(nx))
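As an aside, pandas can usually parse whitespace-separated text like this directly, without the manual CSV conversion. A sketch, assuming a pandas version that accepts sep=r"\s+" (treated as "split on runs of whitespace"); index_col=0 additionally makes column 1 the index in the same step.

```python
import io

import numpy as np
import pandas as pd

x = """   1       2          3          4
0.1100 0.0000E+00 1.0000E+00 5.0000E+00
0.1323 7.7444E-05 8.7935E-01 1.0452E+00
0.1545 4.3548E-04 7.7209E-01 4.5432E-01
0.1768 1.2130E-03 6.7193E-01 2.6896E-01
0.1990 2.5349E-03 5.7904E-01 1.8439E-01
0.2213 4.5260E-03 4.9407E-01 1.3771E-01
"""

# Split on whitespace and use the first column ("1") as the index
df = pd.read_csv(io.StringIO(x), sep=r"\s+", index_col=0)
print(df)
```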

Now, you want a new DataFrame interpolated from the same data, but at an array of 8 values between 0.11 and 0.25:

indexList = numpy.linspace(0.11, 0.25, LENGTH)

We will use column 1 as the index, and reindex on the union of the old and new index values:

df = df.set_index('1')
df_interpolated = df.reindex(df.index.union(indexList)).interpolate(method='index')
df_interpolated.head(LENGTH)

               2         3         4
1
0.1100  0.000000  1.000000  5.000000
0.1300  0.000069  0.891794  1.453094
0.1323  0.000077  0.879350  1.045200
0.1500  0.000363  0.793832  0.574093
0.1545  0.000435  0.772090  0.454320
0.1700  0.000976  0.702472  0.325482
0.1768  0.001213  0.671930  0.268960
0.1900  0.001999  0.616698  0.218675
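The same union/reindex/interpolate steps can be wrapped in a small function that also drops the original rows, so only the requested points remain. This is a minimal sketch; resample_interpolate is a hypothetical name, and as with plain interpolate, points beyond the last original index value just repeat that last value.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with column 1 as the index
df = pd.DataFrame(
    {
        "2": [0.0, 7.7444e-05, 4.3548e-04, 1.2130e-03, 2.5349e-03, 4.5260e-03],
        "3": [1.0, 8.7935e-01, 7.7209e-01, 6.7193e-01, 5.7904e-01, 4.9407e-01],
        "4": [5.0, 1.0452, 4.5432e-01, 2.6896e-01, 1.8439e-01, 1.3771e-01],
    },
    index=pd.Index([0.1100, 0.1323, 0.1545, 0.1768, 0.1990, 0.2213], name="1"),
)
indexList = np.linspace(0.11, 0.25, 8)


def resample_interpolate(frame, new_index):
    """Hypothetical helper: interpolate every column of `frame` onto
    `new_index`, using the numeric index values as the x-axis."""
    new_index = pd.Index(new_index, name=frame.index.name)
    # 1) add the new labels alongside the original ones (original rows are kept)
    combined = frame.reindex(frame.index.union(new_index))
    # 2) fill NaNs linearly in the index values; trailing NaNs get the last value
    filled = combined.interpolate(method="index")
    # 3) keep only the requested labels
    return filled.reindex(new_index)


result = resample_interpolate(df, indexList)
print(result)
```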
