
Skip rows during import

I know how to skip rows and how to handle a single header, but how can I handle a header that is repeated several times within the same file?

In my example, I have a CSV file like this:

Header_row1; Header_row2;....
2 ;3 ;...
2 ;3 ;...
2 ;3 ;...
2 ;3 ;...
Header_row1; Header_row2;....
2 ;3 ;...
2 ;3 ;...
2 ;3 ;...
Header_row1; Header_row2;....
2 ;3 ;...
2 ;3 ;...

I want a pandas DataFrame with only one header for my further calculations.

Thanks a lot.

Edit, after a few comments: here is part of my code:

for h in range(len(dpath)):
  path = lidar_save + dpath[h]

  # Combine separate files into one file over the period.
  data_month = pd.DataFrame()
  data_month_std = pd.DataFrame()
  wind_rec_gz = glob.glob(path+'/*.csv')
  print('Read: ', wind_rec_gz[0])

  df = pd.read_csv(wind_rec_gz, header=0, sep=';',encoding = 'unicode_escape')

I'm not sure there's a way to do this while importing, but you can do it after import:

df = pd.read_csv('your_csv_file', sep=';')

# True for data rows, False for rows that merely repeat the column names
s = df.ne(df.columns, axis=1).any(axis=1)

# s is
#0      True
#1      True
#2      True
#3      True
#4     False
#5      True
#6      True
#7      True
#8     False
#9      True
#10     True
#dtype: bool


# keep only those rows
df = df[s]

Output:

   Header_row1  Header_row2 ....
0           2            3   ...
1           2            3   ...
2           2            3   ...
3           2            3   ...
5           2            3   ...
6           2            3   ...
7           2            3   ...
9           2            3   ...
10          2            3   ...
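The filter above runs after parsing, so the repeated header strings have already been read into the data. A possible alternative, which is not part of the answer above, is to drop the repeated header lines from the text before pandas parses it. The sketch below assumes every repeated header line is an exact copy of the first line; `read_csv_single_header` and the sample data are hypothetical names made up for illustration.

```python
import io

import pandas as pd


def read_csv_single_header(source, sep=";"):
    """Parse CSV lines whose header repeats, keeping only the first header.

    `source` is any iterable of text lines, for example an open file.
    Hypothetical helper for illustration; it is not part of pandas.
    """
    lines = iter(source)
    header = next(lines)
    # Keep the header once, then every line that is not a copy of it.
    kept = [header] + [ln for ln in lines if ln.strip() != header.strip()]
    return pd.read_csv(io.StringIO("".join(kept)), sep=sep)


# Hypothetical sample with the header repeated in the middle of the file.
sample = io.StringIO("a;b\n1;2\n3;4\na;b\n5;6\n")
df = read_csv_single_header(sample)
```

With a real file, the same helper could be called as `read_csv_single_header(open(path))` inside a `with` block. Because the header strings never reach the parser, pandas can infer numeric dtypes as usual.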

The variable wind_rec_gz is a list, ['/media/..../rge/merge_2019-04-04.csv'] (note the brackets), because glob.glob always returns a list. pd.read_csv expects a single path, so pass the path as a plain string and it should work (along with Quang Hoang's answer to get rid of the extra header rows).

wind_rec_gz = r'/media/..../rge/merge_2019-04-04.csv'

df = pd.read_csv(wind_rec_gz, sep=';', header=0)
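If the folder holds several files, as the comment in the question's code suggests, one option is to read each path from the glob result separately and concatenate the frames. This is only a sketch: `read_all_csv`, the temporary folder and the file contents are assumptions made up for the demonstration.

```python
import glob
import os
import tempfile

import pandas as pd


def read_all_csv(folder, sep=";"):
    """Read every *.csv file in `folder` and stack them into one DataFrame."""
    # glob.glob returns a list of paths, so read them one at a time.
    paths = sorted(glob.glob(os.path.join(folder, "*.csv")))
    frames = [pd.read_csv(p, sep=sep, header=0) for p in paths]
    return pd.concat(frames, ignore_index=True)


# Small demonstration with two throwaway files (hypothetical data).
with tempfile.TemporaryDirectory() as tmp:
    for name, body in [("a.csv", "x;y\n1;2\n"), ("b.csv", "x;y\n3;4\n")]:
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write(body)
    combined = read_all_csv(tmp)
```

Note that `pd.concat` raises a `ValueError` when the list of frames is empty, so a folder without any matching files would need to be handled separately.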

As I mentioned in my earlier comment, Quang Hoang's approach gets rid of the extra header rows, but every column is imported as object instead of integer, because the repeated header strings are parsed alongside the numbers. Fixing that could be more work if you have many columns with different data types. One solution might then be to export the frame back to CSV and import it again.
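The export and re-import does not have to touch the disk. Here is a minimal sketch of the round trip using an in-memory buffer, assuming the data fits comfortably in memory; the sample data and the names `raw`, `buffer` and `typed` are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical sample with the header repeated in the middle of the file.
raw = "a;b\n1;2.5\n3;4.5\na;b\n5;6.5\n"
df = pd.read_csv(io.StringIO(raw), sep=";")

# Drop the repeated header rows; the remaining values are still strings.
df = df[df.ne(df.columns, axis=1).any(axis=1)]

# Round trip through CSV text in memory so pandas infers the dtypes again.
buffer = io.StringIO(df.to_csv(index=False))
typed = pd.read_csv(buffer)
```

If every column is known to be numeric, `df.apply(pd.to_numeric)` is another way to convert the columns without the round trip.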
