Python csv: find the latest record with a condition
I have a csv with the following sample data:
id bb_id cc_id datetime
-------------------------
1 11 44 2019-06-09
2 33 55 2020-06-09
3 22 66 2020-06-09
4 11 44 2019-06-09
5 11 44 2020-02-22
Suppose the condition is bb_id == 11 and cc_id == 44.
I want to get the latest record, i.e.:
11 44 2020-02-22
How can I get this from the csv?
What I have done:
import csv

with open('sample.csv') as csv_file:
    for indx, data in enumerate(csv.DictReader(csv_file)):
        # check if the conditional data is in the file?
        # (DictReader yields every field as a string, so compare against '11' and '44')
        if data['bb_id'] == '11' and data['cc_id'] == '44':
            # sort the data by date? or should I store all the relevant data beforehand in a data structure like a list and then sort it? could I avoid that? I need to perform this interactively multiple times
            pass
Put all the selected records in a list, then use the max() function with the date as the key.
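import csv

selected_rows = []
with open('sample.csv') as csv_file:
    for data in csv.DictReader(csv_file):
        # DictReader yields every field as a string, so compare against '11' and '44'
        if data['bb_id'] == '11' and data['cc_id'] == '44':
            selected_rows.append(data)

# ISO dates (YYYY-MM-DD) compare correctly as strings; default=None covers the no-match case
latest = max(selected_rows, key=lambda x: x['datetime'], default=None)
print(latest)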
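If storing the matching rows is a concern, max() can also consume a generator, so no intermediate list is built. The following is a minimal sketch of that variation, not the answerer's code: the function name latest_match and the comma-separated SAMPLE string are assumptions made for illustration, and io.StringIO stands in for the real file so the snippet runs on its own.

```python
import csv
import io

def latest_match(csv_file, bb_id, cc_id):
    """Return the newest row matching both ids, or None if nothing matches."""
    rows = (
        row for row in csv.DictReader(csv_file)
        if row['bb_id'] == bb_id and row['cc_id'] == cc_id
    )
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    return max(rows, key=lambda row: row['datetime'], default=None)

# Assumed comma-separated version of the sample data from the question.
SAMPLE = """id,bb_id,cc_id,datetime
1,11,44,2019-06-09
2,33,55,2020-06-09
3,22,66,2020-06-09
4,11,44,2019-06-09
5,11,44,2020-02-22
"""

# The ids are passed as strings because DictReader does not convert types.
latest = latest_match(io.StringIO(SAMPLE), '11', '44')
print(latest)
```

With a real file, pass the object returned by open('sample.csv', newline='') instead of the StringIO wrapper.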
If you really want to do this in plain Python, it is as simple as the following:
import csv

with open('sample.csv') as csv_file:
    list_of_dates = []
    for data in csv.DictReader(csv_file):
        # DictReader yields every field as a string, so compare against '11' and '44'
        if data['bb_id'] == '11' and data['cc_id'] == '44':
            list_of_dates.append(data['datetime'])

# list.sort() sorts in place and returns None, so index the list itself
list_of_dates.sort()
print(list_of_dates[-1])  # you already know the values for bb and cc
Also try:
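import csv

def sort_func(e):
    # sort key: the row's date string (ISO dates order correctly as text)
    return e['datetime']

with open('sample.csv') as csv_file:
    list_of_rows = []
    for data in csv.DictReader(csv_file):
        # DictReader yields every field as a string, so compare against '11' and '44'
        if data['bb_id'] == '11' and data['cc_id'] == '44':
            list_of_rows.append(data)

# list.sort() sorts in place and returns None, so index the list itself
list_of_rows.sort(key=sort_func)
print(list_of_rows[-1])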
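The question mentions running this lookup interactively many times. One option, sketched here under assumptions rather than taken from any of the answers, is to read the file once and keep only the newest row per (bb_id, cc_id) pair in a dict, so each later lookup is a dictionary access. The helper name build_latest_index and the comma-separated SAMPLE string are hypothetical.

```python
import csv
import io

def build_latest_index(csv_file):
    """Map each (bb_id, cc_id) pair to its newest row in one pass over the file."""
    latest = {}
    for row in csv.DictReader(csv_file):
        key = (row['bb_id'], row['cc_id'])
        current = latest.get(key)
        # ISO dates (YYYY-MM-DD) compare correctly as strings.
        if current is None or row['datetime'] > current['datetime']:
            latest[key] = row
    return latest

# Assumed comma-separated version of the sample data from the question.
SAMPLE = """id,bb_id,cc_id,datetime
1,11,44,2019-06-09
2,33,55,2020-06-09
3,22,66,2020-06-09
4,11,44,2019-06-09
5,11,44,2020-02-22
"""

index = build_latest_index(io.StringIO(SAMPLE))
print(index[('11', '44')])      # newest row for bb_id 11, cc_id 44
print(index.get(('99', '99')))  # None when the pair never occurs
```

The trade-off is memory for one row per distinct pair, and the index has to be rebuilt if the file changes.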
The simplest way I know:
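import pandas as pd
import pandasql as ps

# 'sample.csv' is the file name from the question; replace it with your own path
sample_df = pd.read_csv('sample.csv')

# sqldf looks up sample_df by name in the namespace passed as the second argument
result = ps.sqldf("""select *
                     from (select *
                           from sample_df
                           where bb_id = 11
                             and cc_id = 44
                           order by datetime desc) limit 1""", locals())
print(result)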
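The same query can probably be written with pandas alone, without pandasql. This is a minimal sketch assuming the file is comma-separated and the datetime column parses as dates; the SAMPLE string is a stand-in for the real file.

```python
import io
import pandas as pd

# Assumed comma-separated version of the sample data from the question.
SAMPLE = """id,bb_id,cc_id,datetime
1,11,44,2019-06-09
2,33,55,2020-06-09
3,22,66,2020-06-09
4,11,44,2019-06-09
5,11,44,2020-02-22
"""

df = pd.read_csv(io.StringIO(SAMPLE), parse_dates=['datetime'])

# Boolean mask for the condition; read_csv infers the id columns as integers.
matches = df[(df['bb_id'] == 11) & (df['cc_id'] == 44)]

# idxmax gives the index label of the newest date; guard against no matches.
latest = matches.loc[matches['datetime'].idxmax()] if not matches.empty else None
print(latest)
```

Here latest is a pandas Series holding the selected row, or None when nothing matches the condition.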