What's the fastest way to compare datetime in pandas?
I have two large CSV files with different numbers of rows, which I import as follows:
tdata = pd.read_csv(tfilepath, sep=',', parse_dates=['date_1'])
print(tdata.iloc[:, [0,3]])
TBA date_1
0 0 2010-01-04
1 9 2010-01-05
2 0 2010-01-06
3 8 2010-01-07
4 0 2010-01-08
5 0 2010-01-09
pdata = pd.read_csv(pfilepath, sep=',', parse_dates=['date_2'])
print(pdata.iloc[:, [0,3]])
TBA date_2
0 3 2011-01-04
1 5 2010-01-09
2 0 2012-02-03
3 9 2010-03-17
4 1 2010-11-08
5 2 2010-01-05
Now I want to replace TBA in the first dataframe with the corresponding TBA from the second dataframe wherever the dates match. Where no date matches, the value should default to 0. So I am iterating through the rows as follows:
for i, row1 in tdata.iterrows():
    for j, row2 in pdata.iterrows():
        if row1['date_1'] == row2['date_2']:
            tdata.loc[i, 'TBA'] = row2['TBA']
            break
        else:
            tdata.loc[i, 'TBA'] = 0
The problem is that this takes very long (around 11 minutes). I want to compare one CSV with 160 other CSVs and then run some tree-based models. I am a newbie with little coding background, so pardon me if this is a 'dirty' way. Any help would be appreciated. Thanks!
If you call set_index on pdata to make date_2 the index, you can pass the resulting TBA column as the argument to map, call that on the tdata['date_1'] column, and then replace the unmatched values with fillna:
In [51]:
tdata['TBA'] = tdata['date_1'].map(pdata.set_index('date_2')['TBA'])
tdata.fillna(0, inplace=True)
tdata
Out[51]:
TBA date_1
0 0 2010-01-04
1 2 2010-01-05
2 0 2010-01-06
3 0 2010-01-07
4 0 2010-01-08
5 5 2010-01-09
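For anyone who wants to try this without the CSV files, here is a minimal self-contained sketch of the same map approach. The two dataframes are rebuilt by hand from the sample rows in the question, and the name lookup as well as the drop_duplicates and astype(int) steps are my own additions (assumptions, not part of the answer above): map needs a unique index, and fillna otherwise leaves the column as floats.

```python
import pandas as pd

# Sample rows copied from the question; only the two relevant columns are kept.
tdata = pd.DataFrame({
    "TBA": [0, 9, 0, 8, 0, 0],
    "date_1": pd.to_datetime([
        "2010-01-04", "2010-01-05", "2010-01-06",
        "2010-01-07", "2010-01-08", "2010-01-09",
    ]),
})
pdata = pd.DataFrame({
    "TBA": [3, 5, 0, 9, 1, 2],
    "date_2": pd.to_datetime([
        "2011-01-04", "2010-01-09", "2012-02-03",
        "2010-03-17", "2010-11-08", "2010-01-05",
    ]),
})

# Build a date -> TBA lookup Series (hypothetical name).
# Assumption: if a date appears more than once in pdata, keep the first row.
lookup = pdata.drop_duplicates(subset="date_2").set_index("date_2")["TBA"]

# Look up every date_1 in one vectorized pass; unmatched dates become NaN,
# which are then replaced by the default 0 and cast back to integers.
tdata["TBA"] = tdata["date_1"].map(lookup).fillna(0).astype(int)

print(tdata)
```

This replaces the nested row-by-row comparison with a single index lookup per row of tdata, which is where the speedup should come from. Note that filling only the TBA column, rather than calling fillna on the whole dataframe, leaves missing values in any other columns untouched.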