Python data-frame using pandas

I have a dataset that looks like this:

  [25/May/2015:23:11:15  000]
  [25/May/2015:23:11:15  000]
  [25/May/2015:23:11:16  000]
  [25/May/2015:23:11:16  000]

I have loaded this into a DataFrame, so df[0] holds [25/May/2015:23:11:15 and df[1] holds 000]. I want to write all rows whose timestamps end with the same seconds value to the same file. In the example above the seconds are 15 and 16, so every row ending in 15 should go into one file, every row ending in 16 into another, and so on for any other values.

I have tried the code below:

   import pandas as pd
   data = pd.read_csv('apache-access-log.txt', sep=" ", header=None)
   df = pd.DataFrame(data)
   print(df[0],df[1].str[-2:])

Converting that column to a datetime would make it easier to work with, e.g.:

df['date'] = pd.to_datetime(df['date'], format='%d/%b/%Y:%H:%M:%S')

(%b is the abbreviated month name and %M is minutes; %m would be read as the month number.)

Then you can simply iterate over a groupby(), e.g.:

In []:
for k, frame in df.groupby(df['date'].dt.second):
    # frame.to_csv('file{}.csv'.format(k))
    print('{}\n{}\n'.format(k, frame))

Out[]:
15
                 date  value
0 2015-05-25 23:11:15      0
1 2015-05-25 23:11:15      0

16
                 date  value
2 2015-05-25 23:11:16      0
3 2015-05-25 23:11:16      0
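For completeness, here is a minimal end-to-end sketch of the same idea, starting from raw lines shaped like the ones in the question. It assumes the leading "[" has to be stripped before parsing, and the column names, the temporary output directory and the seconds_NN.csv file names are made up for illustration.

```python
import io
import os
import tempfile

import pandas as pd

# Sample lines copied from the question; a real run would read the log file instead.
raw = """[25/May/2015:23:11:15  000]
[25/May/2015:23:11:15  000]
[25/May/2015:23:11:16  000]
[25/May/2015:23:11:16  000]
"""

# Split on any run of whitespace; keep everything as strings for now.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", names=["date", "value"], dtype=str)

# Drop the leading "[" (assumption about the raw format), then parse the timestamp.
df["date"] = pd.to_datetime(df["date"].str.lstrip("["), format="%d/%b/%Y:%H:%M:%S")

out_dir = tempfile.mkdtemp()  # hypothetical output location
written = {}
for second, frame in df.groupby(df["date"].dt.second):
    # One CSV per distinct seconds value, e.g. seconds_15.csv
    path = os.path.join(out_dir, "seconds_{:02d}.csv".format(int(second)))
    frame.to_csv(path, index=False)
    written[int(second)] = path
```

Grouping on dt.second puts rows from different minutes or hours into the same file as long as their seconds match, which seems to be what the question asks for.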

You can set the datetime column as the index of the DataFrame and then use pandas' loc and to_csv. As the other answers point out, you should first convert the date column to datetime when reading the data.

Example:

df = df.set_index('date')
df.loc['2015-05-25 23:11:15':'2015-05-25 23:11:15'].to_csv('df_data.csv')
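Note that a slice like this selects one exact timestamp. If the aim is every row whose seconds component matches, regardless of minute or hour, the index's second attribute can be used as a boolean mask instead. A small sketch, with made-up sample rows spanning two minutes:

```python
import pandas as pd

# Hypothetical sample data: the same seconds values occur in two different minutes.
df = pd.DataFrame(
    {"value": [0, 0, 0, 0]},
    index=pd.to_datetime(
        [
            "2015-05-25 23:11:15",
            "2015-05-25 23:11:16",
            "2015-05-25 23:12:15",
            "2015-05-25 23:12:16",
        ]
    ),
)
df.index.name = "date"

# DatetimeIndex exposes .second, so a boolean mask picks all rows ending in :15.
mask = df.index.second == 15
at_15 = df.loc[mask]

# at_15.to_csv("seconds_15.csv")  # hypothetical output file name
```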

Try this:

## Create a new column holding the seconds value
df['seconds'] = df.apply(lambda row: row[0].split(":")[3].split(" ")[0], axis=1)

for sec in df['seconds'].unique():
    ## Filter the rows by seconds value
    print("Result ", df[df['seconds'] == sec])
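The same string-based idea can also be written without apply, using the vectorised .str accessor. This is only a sketch: it assumes, as in the question, that column 0 holds the timestamp text and therefore ends with the two seconds digits (column 1 ends with "]", so slicing it would not give the seconds).

```python
import pandas as pd

# Columns 0 and 1 as described in the question.
df = pd.DataFrame(
    {
        0: [
            "[25/May/2015:23:11:15",
            "[25/May/2015:23:11:15",
            "[25/May/2015:23:11:16",
            "[25/May/2015:23:11:16",
        ],
        1: ["000]"] * 4,
    }
)

# The last two characters of column 0 are the seconds.
df["seconds"] = df[0].str[-2:]

# One sub-frame per seconds value; each could be written out with to_csv.
groups = {sec: frame for sec, frame in df.groupby("seconds")}
```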
