Reading a pandas data frame having unequal columns in observations

I am trying to read this small data file: https://drive.google.com/open?id=1nAS5mpxQLVQn9s_aAKvJt8tWPrP_DUiJ

I am using this code:

df = pd.read_table('/Data/123451_date.csv', sep=';', index_col=0,  engine='python', error_bad_lines=False)

The file uses ';' as a separator, and in some observations (rows) the values for some columns are missing.

How can I read it properly? The dataframe I currently get is not loaded correctly.

It looks like the data you are using contains some garbage. Specifically, rows 1-33 (inclusive) include additional, unnecessary (non-GPS) information. You can either fix the file by manually removing the unneeded information, or use the following code snippet to skip the rows that contain it:

from pandas import read_table

# Row 0 is the header; skip rows 1-33, then drop the empty trailing column
data = read_table('34_2017-02-06.gpx.csv', sep=';', skiprows=list(range(1, 34))).drop("Unnamed: 28", axis=1)

The drop("Unnamed: 28", axis=1) call is only there to remove an extra column, which is probably created because each row in your file ends with a ; (pandas reads the empty field after the final separator as data).

The result of print(data.head()) is then as follows:

   index  cumdist   ele    ...     esttotalpower        lat       lon
0     49      340 -34.8    ...                 9  52.077362  5.114530
1     51      350 -34.8    ...                17  52.077468  5.114543
2     52      360 -35.0    ...               -54  52.077521  5.114551
3     53      370 -35.0    ...              -173  52.077603  5.114505
4     54      380 -34.8    ...               335  52.077677  5.114387

[5 rows x 28 columns]
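
The row-skipping part of the snippet above can be tried without the original file. The following is a minimal sketch on a made-up in-memory sample (the column names and the "extra" fields are assumptions for illustration, not the contents of the real file). It shows that skiprows counts from the top of the file, so row 0 (the header) is kept and only the malformed rows are dropped.

```python
import io

import pandas as pd

# Made-up sample: a header, two rows carrying extra fields, then two clean rows.
sample = (
    "index;lat;lon\n"
    "1;extra;info;more;stuff\n"
    "2;extra;info;more;stuff\n"
    "49;52.077362;5.114530\n"
    "51;52.077468;5.114543\n"
)

# Row 0 is the header, so skipping rows 1 and 2 removes only the malformed rows.
data = pd.read_csv(io.StringIO(sample), sep=';', skiprows=list(range(1, 3)))
print(data)
```

With the real file, the range would have to match the rows that actually carry the extra information (rows 1-33 according to the answer).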

To illustrate the role of drop further, here is what happens without it (notice the odd last column):

   index  cumdist   ele     ...             lat       lon  Unnamed: 28
0     49      340 -34.8     ...       52.077362  5.114530          NaN
1     51      350 -34.8     ...       52.077468  5.114543          NaN
2     52      360 -35.0     ...       52.077521  5.114551          NaN
3     53      370 -35.0     ...       52.077603  5.114505          NaN
4     54      380 -34.8     ...       52.077677  5.114387          NaN

[5 rows x 29 columns]
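
The trailing-separator effect described above can also be reproduced on a small sample. This is only a sketch with made-up data; dropping every all-empty column with dropna is an alternative to naming "Unnamed: 28" explicitly, not what the answer itself uses, and it would also remove any genuine column that happens to be completely empty.

```python
import io

import pandas as pd

# Made-up sample in which every line ends with ';', as in the question's file.
sample = "index;lat;lon;\n49;52.077362;5.114530;\n51;52.077468;5.114543;\n"

raw = pd.read_csv(io.StringIO(sample), sep=';')
print(list(raw.columns))    # ['index', 'lat', 'lon', 'Unnamed: 3']

# Drop any column that contains only missing values, without hard-coding its name.
clean = raw.dropna(axis=1, how='all')
print(list(clean.columns))  # ['index', 'lat', 'lon']
```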
