Why is there a difference between n1 and n2?
I read CSV data in two ways and get different results. One way extracts the 'value' column directly from the CSV in one step using pandas; the other extracts 'value' class by class and appends the pieces together.
Ideally the two results should be the same, but I do see a difference. The classes appear in the order U1 U2 U7 U8 U9 U10 U98 U5 U4 U3; I'm not sure whether the order matters.
Any ideas?
input.csv is available at https://drive.google.com/file/d/1qND1NM6BK3py2ZjYw294GjhJVDzIOlHj/view?usp=sharing
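import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

inputfilename = 'input.csv'
data = []
df = pd.read_csv(inputfilename)
classes = pd.unique(df['class'])
for c in classes:
    # collect the values of one class at a time
    df2 = df[df['class'] == c]
    data += list(df2['value'].values)
n1 = np.array(data)  # values grouped class by class
n2 = df['value']     # values in file order
plt.plot(n1 - n2)
plt.show()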
The two arrays will only be the same if all the rows with the same class are grouped together in the CSV.
n1 is created by grouping all the values with the same class together. So it contains all U1 values, then all U2 values, and so on.
n2 just has all the values in the order that they appear in the CSV.
The classes are contiguous for U1, U2, U7, U8, U9, U10, and U98, but U3, U4, and U5 are all mixed together. You have a sequence of rows starting like this:
U4,-0.6
U4,-0.8
U4,-0.1
U4,-0.6
U3,-0.2
U3,0.2
U5,-0.3
U5,0.1
U3,0
U5,0.2
U5,-0.2
These will be ordered differently in the two arrays.
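To make the difference concrete, here is a minimal sketch that rebuilds just the sample rows above as a small DataFrame (the frame is typed in by hand for illustration, not read from input.csv) and computes both orderings the same way the question does:

```python
import numpy as np
import pandas as pd

# Sample rows copied from the excerpt above (U3, U4, U5 interleaved).
df = pd.DataFrame({
    "class": ["U4", "U4", "U4", "U4", "U3", "U3", "U5", "U5", "U3", "U5", "U5"],
    "value": [-0.6, -0.8, -0.1, -0.6, -0.2, 0.2, -0.3, 0.1, 0.0, 0.2, -0.2],
})

# n1: values gathered class by class, classes in order of first appearance.
n1 = np.concatenate([
    df.loc[df["class"] == c, "value"].to_numpy()
    for c in pd.unique(df["class"])
])

# n2: values in the order the rows appear.
n2 = df["value"].to_numpy()

print(n1)
print(n2)
print(np.array_equal(n1, n2))  # False: the stray U3 row moves ahead of the U5 rows in n1
```

Both arrays hold the same numbers; only their positions differ, which is why n1 - n2 is nonzero from the first interleaved row onward.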
You could solve this by sorting the dataframe by class first.
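A rough sketch of that fix, again using a hand-typed copy of the sample rows rather than the real input.csv. The name df_sorted and the choice of kind="mergesort" (a stable sort, so rows keep their original relative order within each class) are my assumptions, not something from the question:

```python
import numpy as np
import pandas as pd

# Hand-typed copy of the sample rows (U3, U4, U5 interleaved).
df = pd.DataFrame({
    "class": ["U4", "U4", "U4", "U4", "U3", "U3", "U5", "U5", "U3", "U5", "U5"],
    "value": [-0.6, -0.8, -0.1, -0.6, -0.2, 0.2, -0.3, 0.1, 0.0, 0.2, -0.2],
})

# Sort so that all rows of a class are contiguous.
df_sorted = df.sort_values("class", kind="mergesort").reset_index(drop=True)

# Same two computations as in the question, applied to the sorted frame.
n1 = np.concatenate([
    df_sorted.loc[df_sorted["class"] == c, "value"].to_numpy()
    for c in pd.unique(df_sorted["class"])
])
n2 = df_sorted["value"].to_numpy()

print(np.array_equal(n1, n2))  # True
```

Note that the class column holds strings, so the sort is lexicographic (for example, U10 comes before U2). That changes the overall order relative to the file, but it does not matter for checking that the two extraction methods agree.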