
Subtracting the rows of a column from the preceding rows in a python pandas dataframe

I have a .dat file with thousands of rows in one column (say the column is time, t). I want to find the interval between consecutive rows of that column, that is, take the difference between the second row and the first row, and so on (to find dt). I then want to make a new column with those interval values and plot it against the original column. If a language other than Python would help here, I would appreciate suggestions for that too.
I have written some pseudo-Python code for this:

import csv
from sys import argv

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# the path to the .dat file is passed on the command line
script, filename = argv

# read the whitespace-separated .dat file into a list of lists
with open(filename) as f:
    datContent = [line.strip().split() for line in f]

# write it as a new CSV file (not over the original .dat file)
with open("./flash.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(datContent)

columns_to_keep = ['#time']
dataframe = pd.read_csv("./flash.csv", usecols=columns_to_keep)

# copy the previous row's time next to each row, then subtract;
# the first row has no previous row, so its interval is NaN
dataframe["prev_time"] = [np.nan] + dataframe.iloc[:-1]["#time"].tolist()
dataframe["time_delta"] = dataframe["#time"] - dataframe["prev_time"]

pd.set_option('display.max_rows', 1000)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

# plot the interval against the original time column
dataframe.plot(x='#time', y='time_delta', style='r')

print(dataframe)

plt.show()

I have updated my code, and I am also sharing the .dat file I am working on: https://www.dropbox.com/s/w4jbxmln9e83355/flash.dat?dl=0

One easy way to perform an operation involving values from different rows is simply to copy the required values onto the same row and then apply a simple row-wise operation.

For instance, in your example, we'd have a dataframe with one time column and some other data, like so:

import pandas as pd
import numpy as np 

df = pd.DataFrame({"time":  pd.date_range("24 sept 2016",  periods=5*24, freq="1h")})
df["time"] = df["time"]  + [pd.Timedelta(minutes=m) for m in np.random.choice(a=range(60), size=df.shape[0])]
df["value"] = np.random.normal(size=df.shape[0])


If you want to compute the time delta from the previous (or next, or any other) row, you can simply copy the value from it and then perform the subtraction:

df["prev_time"] = [np.nan] + df.iloc[:-1]["time"].tolist()
df["time_delta"] = df.time - df.prev_time
df
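
The copy-then-subtract step can also be written with pandas' built-in `Series.shift()` and `Series.diff()`. The following is a minimal sketch rather than part of the original answer: it rebuilds a frame like the one above (the seeded random generator and the column name `time_delta_diff` are assumptions made for illustration) and checks that both approaches give the same intervals.

```python
import numpy as np
import pandas as pd

# Same kind of frame as above: hourly timestamps plus a random minute offset.
# The fixed seed is only there to make the sketch reproducible.
rng = np.random.default_rng(0)
df = pd.DataFrame({"time": pd.date_range("2016-09-24", periods=5 * 24, freq="h")})
df["time"] = df["time"] + pd.to_timedelta(rng.integers(0, 60, size=len(df)), unit="min")

# shift(1) moves every value down one row, so row i holds the time of row i-1
# and the first row holds NaT.
df["prev_time"] = df["time"].shift(1)
df["time_delta"] = df["time"] - df["prev_time"]

# diff() performs the shift and the subtraction in a single call.
df["time_delta_diff"] = df["time"].diff()

print(df.head())
```

Both columns hold the same values; the first row is `NaT` in each because it has no previous row to subtract.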

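For a numeric time column read from a file, the same idea might look like the sketch below. The layout of the actual flash.dat is not shown in the question, so the sample text here is a hypothetical stand-in: it assumes whitespace-separated columns with a header line whose first field is `#time`, and the `value` column and the name `dt` are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for flash.dat (assumed layout, not the real file).
sample = io.StringIO(
    "#time value\n"
    "0.0 1.5\n"
    "0.5 1.7\n"
    "1.5 1.2\n"
    "3.0 0.9\n"
)

# sep=r"\s+" splits on any run of whitespace, so no intermediate CSV is needed.
data = pd.read_csv(sample, sep=r"\s+")

# dt for each row is its time minus the previous row's time;
# the first row has no previous row, so it gets NaN.
data["dt"] = data["#time"].diff()

print(data)
```

With a real file, the path would be passed to `pd.read_csv` in place of `sample`. If matplotlib is installed, `data.plot(x="#time", y="dt")` should then plot the intervals against the original time column.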

