
Dropping a "garbage" column from pandas dataframe

I am trying to plot data from a particularly badly formatted file. I cannot change the format of the files, so I have to work around the issues I run into. I am trying to import the data from the file and remove the garbage I do not need, such as error messages, but I am struggling.

Here is the function that lets me open the file I want to work with, along with some workarounds for getting it into a dataframe:

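import pandas as pd
from tkinter.filedialog import askopenfilename

headers = ['Date','Time','Pressure','Temperature','Bias','RefTemp', 'Garbage']
def plotDigitalFunction():
    # Ask the user which file to load
    infile=askopenfilename()

    # Skip the file's own header row, apply custom names, and merge Date and Time
    df = pd.read_csv(infile,sep="\t",names=headers, skiprows=1, parse_dates=[['Date','Time']])
    # Discard the trailing column that holds error messages
    df = df.drop('Garbage', axis=1)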

The top of my file looks something like this:

Date    Time    Pressure    Temperature Bias    Error
06.02.12    13:42:19:549         -2689      895524     1842052        27.0  ERROR: T1B1

So here I have 6 headers and 7 columns. I skip the first row, set my own headers, and combine Date and Time so that I have 6 columns (I need the date and time stamp in the same column).
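For illustration, the steps described above can be sketched against an in-memory copy of the sample rows. This is only a sketch under assumptions: the fields are assumed to be tab-separated (as `sep="\t"` suggests), the second data row is made up to show an empty error field, and the date is assumed to be in day.month.year order. It uses `usecols` to skip the last column at read time instead of dropping it afterwards, and combines Date and Time by hand rather than through `parse_dates`.

```python
import io
import pandas as pd

headers = ['Date', 'Time', 'Pressure', 'Temperature', 'Bias', 'RefTemp', 'Garbage']

# Hypothetical stand-in for the real file: 6 header names, 7 data fields.
sample = (
    "Date\tTime\tPressure\tTemperature\tBias\tError\n"
    "06.02.12\t13:42:19:549\t-2689\t895524\t1842052\t27.0\tERROR: T1B1\n"
    "06.02.12\t13:42:20:546\t-2689\t895467\t1841921\t27.0\t\n"
)

df = pd.read_csv(
    io.StringIO(sample),
    sep="\t",
    names=headers,          # 7 names, one per data field
    skiprows=1,             # skip the file's own 6-name header row
    usecols=headers[:-1],   # never load the 'Garbage' column
    dtype={'Date': str, 'Time': str},
)

# Merge the two text columns into one and place it first.
date_time = df.pop('Date') + ' ' + df.pop('Time')
df.insert(0, 'Date_Time', date_time)

# Assumption: dates are dd.mm.yy and the last time field is a fraction of a second.
df['Date_Time'] = pd.to_datetime(df['Date_Time'], format='%d.%m.%y %H:%M:%S:%f')

print(df)
```

Reading from `io.StringIO` keeps the example independent of any file dialog, which may also help to check whether the parsing step or the file selection step is where the behaviour differs.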

I tried this code in a Jupyter notebook, and it works flawlessly. I get something like this:

        Date_Time   Pressure    Temperature Bias    RefTemp
    0   06.02.12 13:42:19:549   -2689   895524  1842052 27.0
    1   06.02.12 13:42:20:546   -2689   895467  1841921 27.0
    2   06.02.12 13:42:21:544   -2689   895388  1841817 27.0
    3   06.02.12 13:42:22:543   -2691   895287  1841672 27.0

But when I run the same code in Python 3.6.2, it seems as if only the column header gets deleted, and the data beneath it gets shifted under the column to its left. This won't work, and I am struggling to figure out what I am doing wrong.

I previously had a solution that opened the file and wrote a temporary CSV file, which I then read from. That worked, but there is quite a lot of data to run through, so it takes twice as long to process.
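As a possible alternative to the temporary file, the same cleanup can be done in memory. The sketch below is an assumption about what that cleanup step looked like, not the original code: the helper name `read_first_fields` is hypothetical, and it simply keeps the first six tab-separated fields of each line before handing the text to `pd.read_csv` through `io.StringIO`.

```python
import io
import pandas as pd

def read_first_fields(lines, n_fields=6, sep="\t"):
    """Keep only the first n_fields of every line and parse the result in memory."""
    trimmed = (sep.join(line.rstrip("\n").split(sep)[:n_fields]) for line in lines)
    buffer = io.StringIO("\n".join(trimmed))
    # header=None: column names are not taken from the file; the first row is skipped.
    return pd.read_csv(buffer, sep=sep, header=None, skiprows=1)

# Hypothetical sample lines; with a real file, pass the open file object instead.
sample_lines = [
    "Date\tTime\tPressure\tTemperature\tBias\tError\n",
    "06.02.12\t13:42:19:549\t-2689\t895524\t1842052\t27.0\tERROR: T1B1\n",
    "06.02.12\t13:42:20:546\t-2689\t895467\t1841921\t27.0\t\n",
]
trimmed_df = read_first_fields(sample_lines)
print(trimmed_df)
```

This still passes over the data twice, so it may not be faster than reading the file directly. It only avoids writing to disk.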

I hope this is explained well enough. Don't hesitate to ask questions if I need to elaborate.

Thanks in advance!

Edit: I just tried the same code in the console, with the filename hardcoded instead of using "infile" from tkinter's askopenfilename(). That worked fine. Could that be the cause of the problem?

It is probably related to the Python version you are using. Check which Python version Jupyter is using and use the same one.
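One way to make that comparison is to run the same short snippet in both the notebook and the script, then compare the output. This is a minimal sketch; the function name `environment_summary` is made up for illustration. It also reports the pandas version, since the two environments may have different pandas installations as well as different interpreters.

```python
import sys
import pandas as pd

def environment_summary():
    """Collect the interpreter and pandas details worth comparing across environments."""
    return {
        "python": sys.version.split()[0],   # e.g. the '3.6.2' part of sys.version
        "executable": sys.executable,       # which interpreter is actually running
        "pandas": pd.__version__,
    }

print(environment_summary())
```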
