Alternatives to loop and apply in Pandas
I have two data frames: dfa is 80,000 rows long and 37 columns wide, and dfb is 90 rows long. I need to check the 90 values in dfb for one that matches a value in a row of dfa, and then write the corresponding value from dfb into a column of that dfa row.
However, there are 10 columns in dfa that need to be compared, and my current code works but is extremely slow.
I have tried vectorizing the data but had the same speed problem, and I don't know how to use the apply function in this situation either, so I have just been using .iterrows(). I have thought about trying parallel computing, but I feel that implementing it would be another nightmare.
I am open to any suggestions on how to speed this code up.
def add_scale():
    # iloc is zero-based, so the row counter has to start at 0
    x = 0
    for i in dfa.iterrows():
        # every dfa row is compared against all 90 dfb rows
        for f in dfb.iterrows():
            # positional columns 2, 4, ..., 20 of dfa are checked against
            # column 0 of dfb; a match writes dfb column 8 into columns 27..36
            if i[1][2] == f[1][0]:
                dfa.iloc[x, 27] = f[1][8]
            if i[1][4] == f[1][0]:
                dfa.iloc[x, 28] = f[1][8]
            if i[1][6] == f[1][0]:
                dfa.iloc[x, 29] = f[1][8]
            if i[1][8] == f[1][0]:
                dfa.iloc[x, 30] = f[1][8]
            if i[1][10] == f[1][0]:
                dfa.iloc[x, 31] = f[1][8]
            if i[1][12] == f[1][0]:
                dfa.iloc[x, 32] = f[1][8]
            if i[1][14] == f[1][0]:
                dfa.iloc[x, 33] = f[1][8]
            if i[1][16] == f[1][0]:
                dfa.iloc[x, 34] = f[1][8]
            if i[1][18] == f[1][0]:
                dfa.iloc[x, 35] = f[1][8]
            if i[1][20] == f[1][0]:
                dfa.iloc[x, 36] = f[1][8]
        x += 1
        print(x)
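Most of the time in a nested loop like this goes to the inner pass over dfb: each of the 80,000 rows triggers 90 x 10 Python-level comparisons. One possible intermediate step is to build a dictionary from dfb once and look each item up directly. The following is only a sketch under assumptions: it uses the column names from the printed heads (item_1, it1, item_id, scale) instead of positions, the toy frames are invented stand-ins, and the helper name add_scale_single_loop is hypothetical.

```python
import pandas as pd

# Toy stand-ins for dfa and dfb (assumed shapes; the real frames are larger).
dfa = pd.DataFrame({
    "item_1": ["a", "b", "c"],
    "item_2": ["b", None, "z"],
    "it1": [float("nan")] * 3,
    "it2": [float("nan")] * 3,
})
dfb = pd.DataFrame({"item_id": ["a", "b"], "scale": [3, 1]})

# Build the lookup once: item_id -> scale.
scale_by_item = dict(zip(dfb["item_id"], dfb["scale"]))


def add_scale_single_loop(frame, lookup, pairs):
    """Fill each target column from its source column via a dict lookup."""
    for row_label, row in frame.iterrows():
        for src, dst in pairs:
            scale = lookup.get(row[src])  # None when the item is not in dfb
            if scale is not None:
                frame.at[row_label, dst] = scale
    return frame


# (source column, target column) pairs; the question has ten of them.
pairs = [("item_1", "it1"), ("item_2", "it2")]
add_scale_single_loop(dfa, scale_by_item, pairs)
print(dfa)
```

This still iterates over rows in Python, so it is likely to remain much slower than a column-wise approach; it only removes the repeated scan of dfb.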
The head of dfa:
id
2d11489f-919c-436d-8e7d-e25df44d9dfb
a747fe55-7bb0-4877-b080-9a3f89855c02
9688cb3c-57a0-4e23-b10b-c674e346cce5
a042f8e6-d433-4229-8b6b-304a1c14df98
fe4918d7-6e23-4605-8158-e5a89afc0614
item_1 quantity_1 \
c2f332de-1cdb-43ce-9cb1-f61a06e51d65 1
6ebaafde-8652-4fb8-bea2-a08d661bd56b 1
063b51ab-a8b8-4714-8adc-992d507fd222 2
b6ab20b1-be59-4592-9447-d12fc7a4f405 1
9bdd10b9-2356-494c-958f-04a35514178e 1
item_2 quantity_2 \
1f672f37-50d9-40ff-a063-122bdcd7da2a 1.0
16c36c7a-a6b0-4aca-9f6e-a178074dc15e 1.0
e2341b46-b323-4b41-9865-cbf1625ee810 3.0
c34eab5c-1772-422c-8773-e00c10b10b1c 1.0
4e720d54-fbb0-4c9d-bb99-dc2b17004bf2 1.0
item_3 quantity_3 \
NaN NaN
33671e62-f1d4-4284-b08b-1e4813b9cb4c 3.0
2192e8c2-c66f-4650-9f6e-b5b12a2e8587 1.0
60fddb6f-c6a3-41e1-90ed-febdd13ffbdf 1.0
9493337b-8843-40fe-b97f-4cca3b687ebc 2.0
item_4 quantity_4 \
NaN NaN
e2341b46-b323-4b41-9865-cbf1625ee810 2.0
b6ab20b1-be59-4592-9447-d12fc7a4f405 1.0
NaN NaN
b6ab20b1-be59-4592-9447-d12fc7a4f405 3.0
item_5 quantity_5 \
NaN NaN
257db03b-3711-4e98-9b8f-68890d433a18 1.0
7d2fc54e-c92e-47e4-830e-c434cdd70ffc 1.0
NaN NaN
NaN NaN
item_6 quantity_6 \
NaN NaN
9493337b-8843-40fe-b97f-4cca3b687ebc 3.0
c34eab5c-1772-422c-8773-e00c10b10b1c 2.0
NaN NaN
NaN NaN
item_7 quantity_7 \
NaN NaN
3b10f6f0-6412-4366-bd8d-483c88368511 1.0
d9962506-1685-4502-b3f1-4c885eeb5457 1.0
NaN NaN
NaN NaN
item_8 quantity_8 \
NaN NaN
75929f5e-f3fb-42b1-9e71-0aed7f8e5066 3.0
1f7b73b1-c995-46cf-9781-ed1bcc336345 1.0
NaN NaN
NaN NaN
item_9 quantity_9 \
NaN NaN
819069be-ef26-4670-aa2c-321c53ed6c94 1.0
a708ab2d-b79e-4577-80fb-a30fa155445f 1.0
NaN NaN
NaN NaN
item_10 quantity_10 datetime \
NaN NaN 2019-09-03 10:56
NaN NaN 2019-09-04 21:59
1a747199-994e-4e82-a8ea-1fbc1029256c 1.0 2019-09-04 12:50
NaN NaN 2019-09-05 20:48
NaN NaN 2019-09-05 14:06
food_prep_time_minutes minutes.inday days item_val it1 it2 it3 it4 \
13 23 2 2 NaN NaN NaN NaN
38 45 3 9 NaN NaN NaN NaN
24 27 3 0 NaN NaN NaN NaN
14 43 4 3 NaN NaN NaN NaN
25 29 4 4 NaN NaN NaN NaN
it5 it6 it7 it8 it9 it10
NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN
The head of dfb, with unused columns removed:
item_id scale
0 5445da62-e213-4f71-b2b0-8b6073647102 3
1 16c36c7a-a6b0-4aca-9f6e-a178074dc15e 1
2 a708ab2d-b79e-4577-80fb-a30fa155445f 1
3 024f545f-a8af-4244-8c9e-da2b92633d59 2
4 8e3e855c-918c-4761-b2d6-0f4aae1c5e0d 3
Well, I feel stupid; @furas is correct. I removed the additional loops. Is there still a way to do this without loops, though?
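One way to avoid row loops entirely is to turn dfb into a Series indexed by item_id and pass it to Series.map for each item column. The sketch below is not a tested answer for the real data: it assumes the column names from the printed heads (item_1 ... item_10, it1 ... it10, item_id, scale), uses small invented frames, and drops duplicate item_id values because map needs a unique index on the lookup Series.

```python
import pandas as pd

# Toy stand-ins for dfa and dfb (assumed shapes; the real frames are larger).
dfa = pd.DataFrame({
    "item_1": ["a", "b", "c"],
    "item_2": ["b", None, "z"],
})
dfb = pd.DataFrame({"item_id": ["a", "b"], "scale": [3, 1]})

# Lookup Series: index is item_id, values are scale.
scale_by_item = dfb.drop_duplicates("item_id").set_index("item_id")["scale"]

n_items = 2  # would be 10 for the frames in the question
for n in range(1, n_items + 1):
    # map() looks up every value of the column at once;
    # items that are missing from dfb (or NaN) come back as NaN.
    dfa[f"it{n}"] = dfa[f"item_{n}"].map(scale_by_item)

print(dfa)
```

The remaining loop runs over the 10 column pairs rather than over the 80,000 rows, so the per-row work happens inside pandas instead of in Python.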