[英]how to store bytes like b'PK\x03\x04\x14\x00\x08\x08\x08\x009bwR\x00\x00\x00\x00\x00\x00\x00 to dataframe or csv in python
[英]How to read a csv with rows of NUL, ('\x00'), into pandas?
我有一組 csv 文件,日期和時間作為前兩列(文件中沒有標題)。 這些文件在 Excel 中打開得很好,但是當我嘗試使用 Pandas read_csv 將它們讀入 Python 時,無論我是否嘗試類型轉換,都只返回第一個日期。
當我在記事本中打開時,它不是簡單的逗號分隔,並且在第 1 行之后的每一行之前都有大量空格; 我試過skipinitialspace = True
無濟於事
我也嘗試過各種類型轉換,但都沒有奏效。 我目前正在使用parse_dates = [['Date','Time']], infer_datetime_format = True, dayfirst = True
示例輸出(無轉換):
0 1 2 3 4 ... 12 13 14 15 16
0 02/03/20 15:13:39 5.5 5.8 42.84 ... 30.0 79.0 0.0 0.0 0.0
1 NaN 15:13:49 5.5 5.8 42.84 ... 30.0 79.0 0.0 0.0 0.0
2 NaN 15:13:59 5.5 5.7 34.26 ... 30.0 79.0 0.0 0.0 0.0
3 NaN 15:14:09 5.5 5.7 34.26 ... 30.0 79.0 0.0 0.0 0.0
4 NaN 15:14:19 5.5 5.4 17.10 ... 30.0 79.0 0.0 0.0 0.0
... ... ... ... ... ... ... ... ... ... ... ...
39451 NaN 01:14:27 5.5 8.4 60.00 ... 30.0 68.0 0.0 0.0 0.0
39452 NaN 01:14:37 5.5 8.4 60.00 ... 30.0 68.0 0.0 0.0 0.0
39453 NaN 01:14:47 5.5 8.4 60.00 ... 30.0 68.0 0.0 0.0 0.0
39454 NaN 01:14:57 5.5 8.4 60.00 ... 30.0 68.0 0.0 0.0 0.0
39455 NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN
並使用 parse_dates 等:
Date_Time pH1 SP pH Ph1 PV pH ... 1 2 3
0 02/03/20 15:13:39 5.5 5.8 ... 0.0 0.0 0.0
1 nan 15:13:49 5.5 5.8 ... 0.0 0.0 0.0
2 nan 15:13:59 5.5 5.7 ... 0.0 0.0 0.0
3 nan 15:14:09 5.5 5.7 ... 0.0 0.0 0.0
4 nan 15:14:19 5.5 5.4 ... 0.0 0.0 0.0
... ... ... ... ... ... ... ...
39451 nan 01:14:27 5.5 8.4 ... 0.0 0.0 0.0
39452 nan 01:14:37 5.5 8.4 ... 0.0 0.0 0.0
39453 nan 01:14:47 5.5 8.4 ... 0.0 0.0 0.0
39454 nan 01:14:57 5.5 8.4 ... 0.0 0.0 0.0
39455 nan nan NaN NaN ... NaN NaN NaN
從記事本復制的數據(每行前面實際上有更多空格,但在這里不起作用):
67.csv
數據02/03/20,15:13:39,5.5,5.8,42.84,7.2,6.8,10.63,60.0,0.0,300,1,30,79,0.0,0.0, 0.0
02/03/20,15:13:49,5.5,5.8,42.84,7.2,6.8,10.63,60.0,0.0,300,1,30,79,0.0,0.0, 0.0
02/03/20,15:13:59,5.5,5.7,34.26,7.2,6.8,10.63,60.0,22.3,300,1,30,79,0.0,0.0, 0.0
02/03/20,15:14:09,5.5,5.7,34.26,7.2,6.8,10.63,60.0,15.3,300,45,30,79,0.0,0.0, 0.0
02/03/20,15:14:19,5.5,5.4,17.10,7.2,6.8,10.63,60.0,50.2,300,86,30,79,0.0,0.0, 0.0
在 Excel 中(所以我知道信息在那里並且可讀):
import sys
import numpy as np
import pandas as pd
from datetime import datetime
from tkinter import filedialog
from tkinter import *
def import_file(filename):
print('\nOpening ' + filename + ":")
##Read the data in the file
df = pd.read_csv(filename, header = None, low_memory = False)
print(df)
df['Date_Time'] = pd.to_datetime(df[0] + ' ' + df[1])
df.drop(columns=[0, 1], inplace=True)
print(df)
filenames=[]
print('Select files to read, Ctrl or Shift for Multiples')
TkWindow = Tk()
TkWindow.withdraw() # we don't want a full GUI, so keep the root window from appearing
## Show an "Open" dialog box and return the path to the selected file
filenames = filedialog.askopenfilename(title='Open data file', filetypes=(("Comma delimited", "*.csv"),), multiple=True)
TkWindow.destroy()
if len(filenames) == 0:
print('No files selected - Exiting program.')
sys.exit()
else:
print('\n'.join(filenames))
##Read the data from the specified file/s
print('\nReading data file/s')
dfs=[]
for filename in filenames:
dfs.append(import_file(filename))
if len(dfs) > 1:
print('\nCombining data files.')
NUL
, '\\x00'
,需要刪除。pandas.DataFrame
從d
加載數據。import pandas as pd
import string # to make column names
# the issue is the the file is filled with NUL not whitespace
def import_file(filename):
# open the file and clean it
with open(filename) as f:
d = list(f.readlines())
# replace NUL, strip whitespace from the end of the strings, split each string into a list
d = [v.replace('\x00', '').strip().split(',') for v in d]
# remove some empty rows
d = [v for v in d if len(v) > 2]
# load the file with pandas
df = pd.DataFrame(d)
# convert column 0 and 1 to a datetime
df['datetime'] = pd.to_datetime(df[0] + ' ' + df[1])
# drop column 0 and 1
df.drop(columns=[0, 1], inplace=True)
# set datetime as the index
df.set_index('datetime', inplace=True)
# convert data in columns to floats
df = df.astype('float')
# give character column names
df.columns = list(string.ascii_uppercase)[:len(df.columns)]
# reset the index
df.reset_index(inplace=True)
return df.copy()
# call the function
dfs = list()
filenames = ['67.csv']
for filename in filenames:
dfs.append(import_file(filename))
display(df)
A B C D E F G H I J K L M N O
datetime
2020-02-03 15:13:39 5.5 5.8 42.84 7.2 6.8 10.63 60.0 0.0 300.0 1.0 30.0 79.0 0.0 0.0 0.0
2020-02-03 15:13:49 5.5 5.8 42.84 7.2 6.8 10.63 60.0 0.0 300.0 1.0 30.0 79.0 0.0 0.0 0.0
2020-02-03 15:13:59 5.5 5.7 34.26 7.2 6.8 10.63 60.0 22.3 300.0 1.0 30.0 79.0 0.0 0.0 0.0
2020-02-03 15:14:09 5.5 5.7 34.26 7.2 6.8 10.63 60.0 15.3 300.0 45.0 30.0 79.0 0.0 0.0 0.0
2020-02-03 15:14:19 5.5 5.4 17.10 7.2 6.8 10.63 60.0 50.2 300.0 86.0 30.0 79.0 0.0 0.0 0.0
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.