
Python 3.7 + NumPy + pandas: selecting array data within a range

I have a CSV file containing wavelength and amplitude data; a screenshot of it is included here.

[Image: CSV data]

I want to select only the data whose wavelength (wave) is between 500 nm and 800 nm:

import pandas as pd
import numpy as np

excelfile = pd.read_csv('Files/660nm.csv')
excelfile.head()
wave = np.array(excelfile['Longitud'])  # wavelength in nm
X = np.array(excelfile['Amplitud'])     # amplitude
wave = wave[(wave > 500) & (wave < 800)]  # filters wave only; X keeps its full length

This does what I want at first, but I want to extend the selection to the amplitude column (X) so that I end up with two arrays of the same length. In my actual code I have to build an index by hand to select the data in the amplitude array (X):

indices = np.arange(382,775,1)
X = np.take(X, indices)

But this is not good practice. If I could extend the selection on the first column to the amplitude column, I would not have to build a second array to index X or check its length by hand. Any ideas? Thanks.

As @ALollz pointed out, you shouldn't split the DataFrame up. Instead, filter the whole DataFrame on wavelength. See the docs for DataFrame.loc.

import pandas as pd
import numpy as np

# some dummy data
excelfile = pd.DataFrame({'Longitud': np.random.random(100) * 1000,
                          'Amplitud': np.arange(100)})

wave = excelfile['Longitud']
excelfile_filtered = excelfile.loc[(wave > 500) & (wave < 800)]
X = excelfile_filtered['Amplitud'].values  # yields an array
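
If you do need two separate NumPy arrays, the same idea can be sketched by building the boolean mask once and applying it to both arrays, so they stay aligned without hard-coded indices. This is only an illustration: the DataFrame below is made-up stand-in data for the CSV, the names mask, wave_sel and X_sel are mine, and the Series.between call with inclusive="neither" assumes pandas 1.3 or later.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the CSV; column names follow the question.
df = pd.DataFrame({
    "Longitud": np.linspace(300.0, 1000.0, 15),  # wavelength in nm, step of 50
    "Amplitud": np.arange(15, dtype=float),
})

wave = df["Longitud"].to_numpy()
X = df["Amplitud"].to_numpy()

# Build the boolean mask once, then index both arrays with it.
mask = (wave > 500) & (wave < 800)
wave_sel = wave[mask]
X_sel = X[mask]

# pandas alternative: inclusive="neither" matches the strict inequalities above
# (assumes pandas >= 1.3).
filtered = df[df["Longitud"].between(500, 800, inclusive="neither")]
```

Because both arrays are indexed with the same mask, wave_sel and X_sel always have the same length, whatever the range limits are.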


 