Python CSV: How to extract data matching a condition from a DataFrame, edit the extracted data, then put it back into the DataFrame

Sample csv data:

ID,AC_Input_Voltage,AC_Input_Current,DC_Output_Voltage,DC_Output_Current,DC_Output_Power,Input_Active_Power,Input_Reactive_Power,Input_Apparent_Power,Line_Frequency,DC_Ref,AC_Ref,Time_Stamp
8301,418,13.2,34.4,136,4673,1,-1,5524.5,0,49,0,22/6/2017 05:11:00
8301,419.3,2.3,0.7,-0.9,-0.6,1,-1,946.2,0,50,0,22/6/2017 05:11:01
8301,417.7,15.2,30.3,196.5,5962,1,-1,6355,0,49,0,22/6/2017 05:11:02
8301,418.7,2.3,0.7,-0.9,-0.6,1,-1,944.7,0,50,0,22/6/2017 05:11:03
8301,419.3,3.4,53.6,10.8,580.2,1,-1,1432.8,0,49,0,22/6/2017 05:11:04
8301,417.7,13.6,30.1,170.4,5122.7,1,-1,5681.8,0,50,0,22/6/2017 05:11:05
8301,418,11.5,41.2,105,4328.2,1,-1,4796.9,0,49,0,22/6/2017 05:11:07
8301,419.7,2.3,0.8,-0.9,-0.7,1,-1,946.9,0,51,0,22/6/2017 05:11:08
8301,419.7,2.3,40.6,-0.7,-27.9,1,-1,974,0,49,0,22/6/2017 05:11:09
8301,417.4,14.9,30.4,194.4,5903.8,1,-1,6215.4,0,51,0,22/6/2017 05:11:10
8301,417.7,14.7,30.5,186.2,5682.9,1,-1,6139.5,0,49,0,22/6/2017 05:11:11
8301,418,12,31.5,141.5,4456.9,1,-1,5012.5,0,51,0,22/6/2017 05:11:12
8301,419,2.3,0.7,-1.4,-0.9,1,-1,945.4,0,49,0,22/6/2017 05:11:13
8301,419,2.3,0.7,-0.9,-0.6,1,-1,945.4,0,50,0,22/6/2017 05:11:14
8301,419.7,2.3,0.8,-0.9,-0.7,1,-1,946.9,0,50,0,22/6/2017 05:11:15
8301,419,2.3,0.7,-0.9,-0.6,1,-1,945.4,0,49,0,22/6/2017 05:11:16
8301,419,2.3,32.9,-0.2,-5.7,1,-1,972.4,0,51,0,22/6/2017 05:11:17
8301,419.3,2.3,50.3,0.3,17.3,1,-1,973.2,0,49,0,22/6/2017 05:11:18
8301,417.4,15.2,30.5,197.4,6010.5,1,-1,6350,0,50,0,22/6/2017 05:11:19
8301,418.7,2.3,0.9,-0.9,-0.7,1,-1,944.7,0,49,0,22/6/2017 05:11:20
8301,419,2.3,42.9,-0.2,-7.4,1,-1,972.4,0,50,0,22/6/2017 05:11:21
8301,417.4,13.9,30.4,180,5477.6,1,-1,5811.8,0,49,0,22/6/2017 05:11:22
8301,419.7,2.3,0.9,-0.9,-0.8,1,-1,946.9,0,50,0,22/6/2017 05:11:23
8301,418.7,2.3,0.7,-0.9,-0.6,1,-1,944.7,0,50,0,22/6/2017 05:11:24
8301,418.3,2.3,0.6,-0.9,-0.5,1,-1,943.9,0,49,0,22/6/2017 05:11:25

I've tried the following code and managed to edit the data and put it into a new DataFrame (df_filter2):

import pandas as pd

df = pd.read_csv('Data.csv')
# convert to datetime; the dates are day-first (22/6/2017)
df["Time_Stamp"] = pd.to_datetime(df["Time_Stamp"], dayfirst=True)

def getMask(start, end):
    # True for rows with start < Time_Stamp <= end
    return (df['Time_Stamp'] > start) & (df['Time_Stamp'] <= end)

start = '2017-06-22 05:00:00'
end = '2017-06-22 05:20:00'
timerange = df.loc[getMask(start, end)]

df_filter = timerange[timerange["AC_Input_Current"].le(3.0)] # new df with AC_Input_Current <= 3.0
#print(df_filter)

# Find where the gap to the previous low-current row is > 1 second, then step back 1 second
where = df_filter.loc[df_filter["Time_Stamp"].diff().dt.total_seconds() > 1, "Time_Stamp"] - pd.Timedelta("1s")
df_filter2 = timerange[timerange["Time_Stamp"].isin(where)].copy() # Create new df with those rows (copy avoids SettingWithCopyWarning)
#print(df_filter2)
df_filter2["AC_Input_Current"] = 0.0 # Set AC_Input_Current to 0.0

# display spikes (rows with a high possibility of being a spike)
for index, row in df_filter2.iterrows():
    values = row.astype(str).tolist()
    print(','.join(values))

Output (note: the edited rows below are in the DataFrame df_filter2):

8301,418.0,0.0,34.4,136.0,4673.0,1,-1,5524.5,0,49,0,2017-06-22 05:11:00
8301,417.7,0.0,30.3,196.5,5962.0,1,-1,6355.0,0,49,0,2017-06-22 05:11:02
8301,418.0,0.0,41.2,105.0,4328.2,1,-1,4796.9,0,49,0,2017-06-22 05:11:07
8301,418.0,0.0,31.5,141.5,4456.9,1,-1,5012.5,0,51,0,2017-06-22 05:11:12
8301,417.4,0.0,30.5,197.4,6010.5,1,-1,6350.0,0,50,0,2017-06-22 05:11:19
8301,417.4,0.0,30.4,180.0,5477.6,1,-1,5811.8,0,49,0,2017-06-22 05:11:22

What I want is to put the output (from df_filter2) back into the main DataFrame df, replacing the rows in df that have the same Time_Stamp with the rows from df_filter2. How do I do this?

Make Time_Stamp the index of both DataFrames, then assign the values of df_filter2 to the rows of df with matching index labels.
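
A minimal sketch of the idea, using two small hypothetical frames rather than the OP's data. Assigning a DataFrame through .loc aligns the right-hand side on index and columns, so each edited row lands on the row with the same Time_Stamp:

```python
import pandas as pd

# Hypothetical toy stand-ins for df and df_filter2 (not the OP's full data)
df = pd.DataFrame({
    "Time_Stamp": pd.to_datetime(["2017-06-22 05:11:00",
                                  "2017-06-22 05:11:01",
                                  "2017-06-22 05:11:02"]),
    "AC_Input_Current": [13.2, 2.3, 15.2],
})
df_filter2 = df.iloc[[0, 2]].copy()
df_filter2["AC_Input_Current"] = 0.0

# Use Time_Stamp as the row label in both frames
df = df.set_index("Time_Stamp")
df_filter2 = df_filter2.set_index("Time_Stamp")

# .loc assignment aligns df_filter2 on index and columns before writing
df.loc[df_filter2.index] = df_filter2

print(df["AC_Input_Current"].tolist())  # [0.0, 2.3, 0.0]
```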

First, make sure Time_Stamp has the same dtype (datetime) in both DataFrames and that the column names are identical. For the sample data provided, I used:

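import pandas as pd

# copy df sample data from OP (timestamps are day-first, e.g. 22/6/2017)
df = pd.read_clipboard(sep=",", parse_dates=["Time_Stamp"], dayfirst=True)
# now copy df_filter2 sample data (no header row; column 12 is Time_Stamp)
df_filter2 = pd.read_clipboard(sep=",", header=None, names=df.columns, parse_dates=[12])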
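
If the clipboard is not available (for example on a headless machine), the same sample can be loaded from a string instead. This sketch embeds three of the OP's rows and reads them through io.StringIO; the explicit format string is an assumption chosen to match the 22/6/2017 05:11:00 layout:

```python
import io
import pandas as pd

# Three rows of the OP's sample CSV, embedded so no clipboard is needed
csv_text = """ID,AC_Input_Voltage,AC_Input_Current,DC_Output_Voltage,DC_Output_Current,DC_Output_Power,Input_Active_Power,Input_Reactive_Power,Input_Apparent_Power,Line_Frequency,DC_Ref,AC_Ref,Time_Stamp
8301,418,13.2,34.4,136,4673,1,-1,5524.5,0,49,0,22/6/2017 05:11:00
8301,419.3,2.3,0.7,-0.9,-0.6,1,-1,946.2,0,50,0,22/6/2017 05:11:01
8301,417.7,15.2,30.3,196.5,5962,1,-1,6355,0,49,0,22/6/2017 05:11:02
"""

df = pd.read_csv(io.StringIO(csv_text))
# Day-first timestamps: give the format explicitly instead of letting pandas guess
df["Time_Stamp"] = pd.to_datetime(df["Time_Stamp"], format="%d/%m/%Y %H:%M:%S")

print(df["Time_Stamp"].iloc[0])  # 2017-06-22 05:11:00
```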

Now, set Time_Stamp as index and replace matching rows:

df = df.set_index("Time_Stamp")
df_filter2 = df_filter2.set_index("Time_Stamp")
df.loc[df_filter2.index] = df_filter2
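
An alternative for the replacement step is DataFrame.update, which also aligns on index and columns but modifies df in place and skips NaN values in the other frame (plain .loc assignment would copy NaN over). A sketch on hypothetical toy frames that are already indexed by Time_Stamp:

```python
import pandas as pd

# Hypothetical toy frames, already indexed by Time_Stamp
idx = pd.DatetimeIndex(["2017-06-22 05:11:00",
                        "2017-06-22 05:11:01",
                        "2017-06-22 05:11:02"], name="Time_Stamp")
df = pd.DataFrame({"AC_Input_Current": [13.2, 2.3, 15.2],
                   "DC_Output_Voltage": [34.4, 0.7, 30.3]}, index=idx)

df_filter2 = df.iloc[[0, 2]].copy()
df_filter2["AC_Input_Current"] = 0.0

# update() writes df_filter2's non-NaN values into df at matching labels, in place
df.update(df_filter2)

print(df["AC_Input_Current"].tolist())  # [0.0, 2.3, 0.0]
```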

UPDATE (per comments)
To be explicit, here's a full working example: it starts with a data dict, builds df, and uses the OP's code to generate df_filter2. Only slight modifications were made (e.g. defining Time_Stamp as pd.Timestamp in the original data, and adding .loc in places).

# sample data
import pandas as pd
from pandas import Timestamp

data = {'AC_Input_Current': {0: 13.199999999999999, 1: 2.2999999999999998,2: 15.199999999999999,3: 2.2999999999999998,4: 3.3999999999999999,5: 13.6,6: 11.5,7: 2.2999999999999998,8: 2.2999999999999998,9: 14.9,10: 14.699999999999999,11: 12.0,12: 2.2999999999999998,13: 2.2999999999999998,14: 2.2999999999999998,15: 2.2999999999999998,16: 2.2999999999999998,17: 2.2999999999999998,18: 15.199999999999999,19: 2.2999999999999998,20: 2.2999999999999998,21: 13.9,22: 2.2999999999999998,23: 2.2999999999999998,24: 2.2999999999999998},
'AC_Input_Voltage': {0: 418.0,1: 419.30000000000001,2: 417.69999999999999,3: 418.69999999999999,4: 419.30000000000001,5: 417.69999999999999,6: 418.0,7: 419.69999999999999,8: 419.69999999999999,9: 417.39999999999998,10: 417.69999999999999,11: 418.0,12: 419.0,13: 419.0,14: 419.69999999999999,15: 419.0,16: 419.0,17: 419.30000000000001,18: 417.39999999999998,19: 418.69999999999999,20: 419.0,21: 417.39999999999998,22: 419.69999999999999,23: 418.69999999999999,24: 418.30000000000001},
'DC_Output_Current': {0: 136.0,1: -0.90000000000000002,2: 196.5,3: -0.90000000000000002,4: 10.800000000000001,5: 170.40000000000001,6: 105.0,7: -0.90000000000000002,8: -0.69999999999999996,9: 194.40000000000001,10: 186.19999999999999,11: 141.5,12: -1.3999999999999999,13: -0.90000000000000002,14: -0.90000000000000002,15: -0.90000000000000002,16: -0.20000000000000001,17: 0.29999999999999999,18: 197.40000000000001,19: -0.90000000000000002,20: -0.20000000000000001,21: 180.0,22: -0.90000000000000002,23: -0.90000000000000002,24: -0.90000000000000002},
'DC_Output_Power': {0: 4673.0,1: -0.59999999999999998,2: 5962.0,3: -0.59999999999999998,4: 580.20000000000005,5: 5122.6999999999998,6: 4328.1999999999998,7: -0.69999999999999996,8: -27.899999999999999,9: 5903.8000000000002,10: 5682.8999999999996,11: 4456.8999999999996,12: -0.90000000000000002,13: -0.59999999999999998,14: -0.69999999999999996,15: -0.59999999999999998,16: -5.7000000000000002,17: 17.300000000000001,18: 6010.5,19: -0.69999999999999996,20: -7.4000000000000004,21: 5477.6000000000004,22: -0.80000000000000004,23: -0.59999999999999998,24: -0.5},
'DC_Output_Voltage': {0: 34.399999999999999,1: 0.69999999999999996,2: 30.300000000000001,3: 0.69999999999999996,4: 53.600000000000001,5: 30.100000000000001,6: 41.200000000000003,7: 0.80000000000000004,8: 40.600000000000001,9: 30.399999999999999,10: 30.5,11: 31.5,12: 0.69999999999999996,13: 0.69999999999999996,14: 0.80000000000000004,15: 0.69999999999999996,16: 32.899999999999999,17: 50.299999999999997,18: 30.5,19: 0.90000000000000002,20: 42.899999999999999,21: 30.399999999999999,22: 0.90000000000000002,23: 0.69999999999999996,24: 0.59999999999999998},
 'DC_Ref': {0: 49,1: 50,2: 49,3: 50,4: 49,5: 50,6: 49,7: 51,8: 49,9: 51,10: 49,11: 51,12: 49,13: 50,14: 50,15: 49,16: 51,17: 49,18: 50,19: 49,20: 50,21: 49,22: 50,23: 50,24: 49},
 'Input_Apparent_Power': {0: 5524.5,1: 946.20000000000005,2: 6355.0,3: 944.70000000000005,4: 1432.8,5: 5681.8000000000002,6: 4796.8999999999996,7: 946.89999999999998,8: 974.0,9: 6215.3999999999996,10: 6139.5,11: 5012.5,12: 945.39999999999998,13: 945.39999999999998,14: 946.89999999999998,15: 945.39999999999998,16: 972.39999999999998,17: 973.20000000000005,18: 6350.0,19: 944.70000000000005,20: 972.39999999999998,21: 5811.8000000000002,22: 946.89999999999998,23: 944.70000000000005,24: 943.89999999999998},
'Time_Stamp': {0: Timestamp('2017-06-22 05:11:00'),1: Timestamp('2017-06-22 05:11:01'),2: Timestamp('2017-06-22 05:11:02'),3: Timestamp('2017-06-22 05:11:03'),4: Timestamp('2017-06-22 05:11:04'),5: Timestamp('2017-06-22 05:11:05'),6: Timestamp('2017-06-22 05:11:07'),7: Timestamp('2017-06-22 05:11:08'),8: Timestamp('2017-06-22 05:11:09'),9: Timestamp('2017-06-22 05:11:10'),10: Timestamp('2017-06-22 05:11:11'),11: Timestamp('2017-06-22 05:11:12'),12: Timestamp('2017-06-22 05:11:13'),13: Timestamp('2017-06-22 05:11:14'),14: Timestamp('2017-06-22 05:11:15'),15: Timestamp('2017-06-22 05:11:16'),16: Timestamp('2017-06-22 05:11:17'),17: Timestamp('2017-06-22 05:11:18'),18: Timestamp('2017-06-22 05:11:19'),19: Timestamp('2017-06-22 05:11:20'),20: Timestamp('2017-06-22 05:11:21'),21: Timestamp('2017-06-22 05:11:22'),22: Timestamp('2017-06-22 05:11:23'),23: Timestamp('2017-06-22 05:11:24'),24: Timestamp('2017-06-22 05:11:25')}}
df = pd.DataFrame(data)

The remaining columns have constant values in the sample, so add them directly:

df["AC_Ref"] = 0
df["ID"] = 8301
df["Input_Active_Power"] = 1
df["Input_Reactive_Power"] = -1
df["Line_Frequency"] = 0

Now construct df_filter2:

def getMask(start, end):
    # True for rows with start < Time_Stamp <= end
    return (df['Time_Stamp'] > start) & (df['Time_Stamp'] <= end)

start = '2017-06-22 05:00:00'
end = '2017-06-22 05:20:00'
timerange = df.loc[getMask(start, end)]
df_filter = timerange.loc[timerange["AC_Input_Current"].le(3.0)]
# keep the shifted values as Timestamps so isin() compares like with like
where = df_filter.loc[df_filter["Time_Stamp"].diff().dt.total_seconds() > 1, "Time_Stamp"] - pd.Timedelta("1s")
df_filter2 = timerange.loc[timerange["Time_Stamp"].isin(where)].copy()
df_filter2["AC_Input_Current"] = 0.0
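
If the only goal is to zero the spike rows, the extract/edit/put-back round trip can be skipped by building a boolean mask on df itself and assigning through .loc. This sketch reuses the OP's detection rule (low-current rows at <= 3.0, a gap of more than 1 s, step back 1 s) on the first nine sample rows; the time-range filter is omitted for brevity:

```python
import pandas as pd

# First nine rows of the OP's sample (only the columns the rule needs)
df = pd.DataFrame({
    "Time_Stamp": pd.to_datetime([
        "2017-06-22 05:11:00", "2017-06-22 05:11:01", "2017-06-22 05:11:02",
        "2017-06-22 05:11:03", "2017-06-22 05:11:04", "2017-06-22 05:11:05",
        "2017-06-22 05:11:07", "2017-06-22 05:11:08", "2017-06-22 05:11:09",
    ]),
    "AC_Input_Current": [13.2, 2.3, 15.2, 2.3, 3.4, 13.6, 11.5, 2.3, 2.3],
})

# Timestamps of low-current rows, as in the OP's df_filter
low = df.loc[df["AC_Input_Current"].le(3.0), "Time_Stamp"]

# A gap > 1 s between consecutive low rows means a higher reading sits in
# between; the OP's rule treats the row 1 s before the gap's end as the spike
spike_times = low[low.diff().dt.total_seconds() > 1] - pd.Timedelta("1s")

# Edit df directly through a boolean mask; nothing has to be put back
df.loc[df["Time_Stamp"].isin(spike_times), "AC_Input_Current"] = 0.0

print(df["AC_Input_Current"].tolist())
# [13.2, 2.3, 0.0, 2.3, 3.4, 13.6, 0.0, 2.3, 2.3]
```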

Finally, replace the rows in df with the matching rows (by Time_Stamp) from df_filter2:

df = df.set_index("Time_Stamp")
df_filter2 = df_filter2.set_index("Time_Stamp")
df.loc[df_filter2.index] = df_filter2
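
Since Time_Stamp is now the index, df.reset_index() turns it back into an ordinary column if the result is to be written out as CSV again. A sketch on a hypothetical two-row frame; the column reordering and the date_format string are assumptions chosen to resemble the input file (note that %m zero-pads the month, giving 22/06/2017 rather than 22/6/2017):

```python
import io
import pandas as pd

# Hypothetical result of the replacement step: df indexed by Time_Stamp
idx = pd.DatetimeIndex(["2017-06-22 05:11:00", "2017-06-22 05:11:01"],
                       name="Time_Stamp")
df = pd.DataFrame({"ID": [8301, 8301], "AC_Input_Current": [0.0, 2.3]},
                  index=idx)

# reset_index() moves Time_Stamp back to a column (placed first);
# reorder so it is last again, as in the original CSV
out = df.reset_index()[["ID", "AC_Input_Current", "Time_Stamp"]]

# date_format controls how the datetime column is rendered in the CSV
buf = io.StringIO()
out.to_csv(buf, index=False, date_format="%d/%m/%Y %H:%M:%S")
print(buf.getvalue())
```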

We can check to be sure the replacement has occurred:

assert (df.loc[df_filter2.index, "AC_Input_Current"] == 0.0).all()

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint, please cite the site URL or the original address. For questions, contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM