I have a .dat file with thousands of rows in one column (say the column is time, t). I want to find the interval between consecutive rows in that column, that is, the difference between the second row and the first row, and so on (to find dt). Then I want to make a new column with those interval values and plot it against the original column. If a language other than Python would help here, I would appreciate suggestions too.
I have written some pseudo-Python code for that:
import csv
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Read flash.dat into a list of lists (whitespace-separated fields)
with open("./flash.dat") as f:
    datContent = [line.strip().split() for line in f]

# Write it out as a new CSV file (not back onto flash.dat)
with open("./flash.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(datContent)

# Keep only the time column
columns_to_keep = ['#time']
df = pd.read_csv("./flash.csv", usecols=columns_to_keep)

# Copy the previous row's time next to each row, then subtract to get dt
df["prev_time"] = [np.nan] + df.iloc[:-1]["#time"].tolist()
df["time_delta"] = df["#time"] - df["prev_time"]

pd.set_option('display.max_rows', 1000)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
print(df)

# Plot dt against the original time column
df.plot(x='#time', y='time_delta', style='r')
plt.show()
I have updated my code and am also sharing the .dat file I am working on: https://www.dropbox.com/s/w4jbxmln9e83355/flash.dat?dl=0
One easy way to perform an operation involving values from different rows is to copy the required values onto the same row and then apply a simple row-wise operation.
For instance, in your example, we'd have a dataframe with one time
column and some other data, like so:
import pandas as pd
import numpy as np
df = pd.DataFrame({"time": pd.date_range("24 sept 2016", periods=5*24, freq="1h")})
df["time"] = df["time"] + [pd.Timedelta(minutes=m) for m in np.random.choice(a=range(60), size=df.shape[0])]
df["value"] = np.random.normal(size=df.shape[0])
If you want to compute the time delta from the previous (or next, or whatever) row, you can simply copy the value from it, and then perform the subtraction:
df["prev_time"] = [np.nan] + df.iloc[:-1]["time"].tolist()
df["time_delta"] = df.time - df.prev_time
df
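As a side note, pandas can do the copy step for you. The sketch below assumes a numeric time column (the values are made up for illustration and merely stand in for the `#time` column of the .dat file); it uses `Series.shift` to copy the previous row's value and `Series.diff` to get the same delta in a single call.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric time values, standing in for the real '#time' column
df = pd.DataFrame({"time": [0.0, 0.5, 1.5, 3.0, 3.25]})

# Copy-then-subtract, with shift(1) doing the copy of the previous row
df["prev_time"] = df["time"].shift(1)
df["time_delta"] = df["time"] - df["prev_time"]

# Same delta in one step; the first row has no predecessor, so it is NaN
df["time_delta_diff"] = df["time"].diff()

print(df)
```

Either column can then be plotted against the original one, for example with `df.plot(x="time", y="time_delta")`.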