Generate numpy array using multiple columns of pandas dataframe

Sorry for the long post. I'm using Python 3.6 on Windows 10. I have a pandas data frame that contains around 100,000 rows. From this data frame I need to generate four numpy arrays. The first 5 relevant rows of my data frame look like this:

A          B      x      UB1     LB1     UB2    LB2
0.2134  0.7866  0.2237  0.1567  0.0133  1.0499  0.127
0.24735 0.75265 0.0881  0.5905  0.422   1.4715  0.5185
0.0125  0.9875  0.1501  1.3721  0.5007  2.0866  2.0617
0.8365  0.1635  0.0948  1.9463  1.0854  2.4655  1.9644
0.1234  0.8766  0.0415  2.7903  2.2602  3.5192  3.2828

Column B is (1 - column A). Column B is not actually in my data frame; I have added it only to explain my problem. From this data frame, I need to generate four arrays.

My first array, c, looks like

array([-0.2134, -0.7866, -0.24735, -0.75265, -0.0125, -0.9875, -0.8365, -0.1635, -0.1234, -0.8766], dtype=float32)

where the first element is the first row of column A with a negative sign added; similarly, the second element is the negated first row of column B, the third element is from the second row of column A, the fourth element is from the second row of column B, and so on.

My second array, UB, looks like

array([0.2237, 0.0881, 0.1501, 0.0948, 0.0415], dtype=float32)

where the elements are the rows of column x.

My third array, bounds, looks like

array([[0.0133, 0.1567],
       [0.127 , 1.0499],
       [0.422 , 0.5905],
       [0.5185, 1.4715],
       [0.5007, 1.3721],
       [2.0617, 2.0866],
       [1.0854, 1.9463],
       [1.9644, 2.4655],
       [2.2602, 2.7903],
       [3.2828, 3.5192]])

where bounds[0][0] is the first row of LB1 and bounds[0][1] is the first row of UB1; bounds[1][0] is the first row of LB2 and bounds[1][1] is the first row of UB2. Then bounds[2][0] is the second row of LB1, and so on.

My fourth array looks like

array([[-1,  1,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 0,  0, -1,  1,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0, -1,  1,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0, -1,  1,  0,  0],
       [ 0,  0,  0,  0,  0,  0,  0,  0, -1,  1]])

It has the same number of rows as the data frame, and twice as many columns as the data frame has rows.

Can you please tell me an efficient way to generate these arrays for 100,000 rows of records?

This should be rather straightforward:

from io import StringIO
import pandas as pd
import numpy as np

data = """A          B      x      UB1     LB1     UB2    LB2
0.2134  0.7866  0.2237  0.1567  0.0133  1.0499  0.127
0.24735 0.75265 0.0881  0.5905  0.422   1.4715  0.5185
0.0125  0.9875  0.1501  1.3721  0.5007  2.0866  2.0617
0.8365  0.1635  0.0948  1.9463  1.0854  2.4655  1.9644
0.1234  0.8766  0.0415  2.7903  2.2602  3.5192  3.2828"""

df = pd.read_csv(StringIO(data), sep=r'\s+', header=0)

# Pair each A with 1 - A row by row, flatten to interleave them, then negate
c = -np.stack([df['A'], 1 - df['A']], axis=1).ravel()
print(c)
# [-0.2134  -0.7866  -0.24735 -0.75265 -0.0125  -0.9875  -0.8365  -0.1635
#  -0.1234  -0.8766 ]

ub = df['x'].values
print(ub)
# [0.2237 0.0881 0.1501 0.0948 0.0415]

# Each data frame row yields two (lower, upper) pairs: (LB1, UB1), then (LB2, UB2)
bounds = np.stack([df['LB1'], df['UB1'], df['LB2'], df['UB2']], axis=1).reshape((-1, 2))
print(bounds)
# [[0.0133 0.1567]
#  [0.127  1.0499]
#  [0.422  0.5905]
#  [0.5185 1.4715]
#  [0.5007 1.3721]
#  [2.0617 2.0866]
#  [1.0854 1.9463]
#  [1.9644 2.4655]
#  [2.2602 2.7903]
#  [3.2828 3.5192]]

# Row i gets -1 in column 2*i and 1 in column 2*i + 1
n = len(df)
fourth = np.zeros((n, 2 * n))
idx = np.arange(n)
fourth[idx, 2 * idx] = -1
fourth[idx, 2 * idx + 1] = 1
print(fourth)
# [[-1.  1.  0.  0.  0.  0.  0.  0.  0.  0.]
#  [ 0.  0. -1.  1.  0.  0.  0.  0.  0.  0.]
#  [ 0.  0.  0.  0. -1.  1.  0.  0.  0.  0.]
#  [ 0.  0.  0.  0.  0.  0. -1.  1.  0.  0.]
#  [ 0.  0.  0.  0.  0.  0.  0.  0. -1.  1.]]
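
One caveat for the 100,000-row case: the dense fourth array has n x 2n entries, which for n = 100,000 is 2 x 10^10 values, or roughly 160 GB as float64. Only 2n of those entries are nonzero, so storing (row, column, value) triplets is far smaller. The following is a minimal sketch and not part of the original answer; the helper name `constraint_triplets` is hypothetical, and it uses only NumPy.

```python
import numpy as np


def constraint_triplets(n):
    """Return (rows, cols, vals) describing the n x 2n matrix of -1, 1 pairs.

    Hypothetical helper: row i has -1 in column 2*i and 1 in column 2*i + 1.
    """
    rows = np.repeat(np.arange(n), 2)                     # 0, 0, 1, 1, 2, 2, ...
    cols = np.arange(2 * n)                               # 0, 1, 2, 3, 4, 5, ...
    vals = np.tile(np.array([-1, 1], dtype=np.int8), n)   # -1, 1, -1, 1, ...
    return rows, cols, vals


# Sanity check against a dense array for a small n.
n = 5
rows, cols, vals = constraint_triplets(n)
dense = np.zeros((n, 2 * n), dtype=np.int8)
dense[rows, cols] = vals
print(dense)
```

If SciPy is available, these three arrays are in the form accepted by `scipy.sparse.coo_matrix((vals, (rows, cols)), shape=(n, 2 * n))`; whether a sparse matrix is usable depends on what consumes the array afterwards, which the question does not state.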

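The arrays in the question are shown with dtype=float32, while the code above produces float64. A minimal sketch of one way to get float32 results, assuming the same column names as the sample data; the small frame is rebuilt inline (first two rows only, without the illustrative column B) just to keep the example self-contained. `to_numpy` requires pandas 0.24 or later; on older versions `.values.astype(np.float32)` should be equivalent.

```python
import numpy as np
import pandas as pd

# First two rows of the sample data from the question.
df = pd.DataFrame({
    'A': [0.2134, 0.24735],
    'x': [0.2237, 0.0881],
    'UB1': [0.1567, 0.5905],
    'LB1': [0.0133, 0.422],
    'UB2': [1.0499, 1.4715],
    'LB2': [0.127, 0.5185],
})

# Convert once, so every later operation stays in float32.
a = df['A'].to_numpy(dtype=np.float32)

# Interleave A and 1 - A row by row, then negate.
c = -np.stack([a, 1 - a], axis=1).ravel()

ub = df['x'].to_numpy(dtype=np.float32)

# Selecting the columns in (LB1, UB1, LB2, UB2) order and reshaping gives
# one (lower, upper) pair per output row.
bounds = (
    df[['LB1', 'UB1', 'LB2', 'UB2']]
    .to_numpy(dtype=np.float32)
    .reshape(-1, 2)
)

print(c.dtype, ub.dtype, bounds.dtype)
# float32 float32 float32
```

Halving the element size also halves the memory of each array, which may matter at 100,000 rows.
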