
Understanding CPI and cache access

These are previous homework problems, but I am using them as exam review. I have changed the numbers from what is actually in the problems. I just want to make sure I have a grasp of the concepts. I already have the answers; I just need confirmation that I understand them. This is review work, not homework.

Anyway, this focuses on aspects of CPI.

The first problem:

An application running on a 1GHz processor has 30% load-store instructions, 30% arithmetic, and 40% branch instructions. The individual CPIs are 3 for load-store, 4 for arithmetic, 5 for branch instructions. Determine the overall CPI of this program on the given processor.

My answer: The overall CPI is the weighted sum of the sub-CPIs, each multiplied by the fraction of instructions in which it occurs, i.e. 3*0.3 + 4*0.3 + 5*0.4 = 0.9 + 1.2 + 2 = 4.1
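The weighted sum above can be checked with a short Python sketch. The dictionary keys and the helper name `weighted_cpi` are illustrative choices, not part of the original problem.

```python
# Instruction mix and per-class CPI taken from the problem statement.
mix = {"load_store": 0.30, "arithmetic": 0.30, "branch": 0.40}
cpi = {"load_store": 3, "arithmetic": 4, "branch": 5}


def weighted_cpi(mix, cpi):
    """Sum over instruction classes of (fraction of instructions) * (class CPI)."""
    return sum(mix[name] * cpi[name] for name in mix)


overall_cpi = weighted_cpi(mix, cpi)
print(round(overall_cpi, 2))  # 4.1
```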

Now, the processor is enhanced to run at 1.6GHz. The CPIs of the branch instructions remain the same but load-store and arithmetic instruction CPIs both increase to 6 cycles. A new compiler is in use which eliminates 30% of branch instructions and 10% of load-stores. Determine the new overall CPI and the factor by which the application will be faster or slower.

My answer: Once again, the new CPI is just the weighted sum of its parts. However, the parts have changed and this must be accounted for. Branch instructions will drop by 30% (0.4*0.7=0.28) and load-stores will drop by 10% (0.3*0.9=0.27); arithmetic instructions will now account for the rest of the instructions (1-0.28-0.27=0.45), or 45%. These will be multiplied by the new sub-CPIs to get: 6*0.45+6*0.27+5*0.28=5.72.

Now, the enhanced processor's clock is 60% faster, and the CPI is greater by (5.72-4.1)/4.1 = 39.5%. Thus, the application will run roughly 0.6*0.395 = 23.7% faster.
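One way to cross-check this step is the standard relation execution time = instruction count * CPI / clock rate. The Python sketch below applies it under a stated assumption that differs from the mix used above: the compiler only removes instructions, so the arithmetic count stays at 0.30 of the original program and the new program has 0.85 of the original instruction count. The names `total_cycles` and `exec_time` are illustrative.

```python
# Relative instruction counts, with the original program normalized to 1.0.
old_counts = {"load_store": 0.30, "arithmetic": 0.30, "branch": 0.40}
old_cpi = {"load_store": 3, "arithmetic": 4, "branch": 5}

# Assumption: the compiler removes 10% of load-stores and 30% of branches
# and leaves the number of arithmetic instructions unchanged.
new_counts = {"load_store": 0.30 * 0.9, "arithmetic": 0.30, "branch": 0.40 * 0.7}
new_cpi = {"load_store": 6, "arithmetic": 6, "branch": 5}


def total_cycles(counts, cpi):
    """Cycles spent per original instruction: sum of count * class CPI."""
    return sum(counts[name] * cpi[name] for name in counts)


def exec_time(counts, cpi, clock_hz):
    """Execution time in seconds per original instruction."""
    return total_cycles(counts, cpi) / clock_hz


# CPI of the new program is cycles divided by the instructions that remain.
new_overall_cpi = total_cycles(new_counts, new_cpi) / sum(new_counts.values())

# Speedup is old time over new time; a value above 1 means faster.
speedup = exec_time(old_counts, old_cpi, 1.0e9) / exec_time(new_counts, new_cpi, 1.6e9)

print(round(new_overall_cpi, 2))  # 5.67
print(round(speedup, 2))          # 1.36
```

Under this reading the speedup comes from the ratio of execution times rather than from multiplying the two percentage changes together.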

Now, the second problem:

A new processor with a load/store architecture has an ideal CPI of 1.25. Typical applications on this processor are a mix of 50% arithmetic and logic, 25% conditional branching and 25% load/store. Memory is accessed via a separate data and instruction cache, with a 5% instruction cache miss rate and 10% data miss rate. The penalty of any cache miss is 100 cycles and hits don't produce any penalties.

What is the effective CPI?

My answer: The effective CPI is the ideal CPI, plus the stall cycles per instruction due to cache misses. The ideal CPI is, as given, 1.25. The stall cycles per instruction are (0.1*100*0.25) + (0.05*100*1) = 7.5. 0.1*100*0.25 is the data miss rate multiplied by the miss penalty, which is also multiplied by the load/store fraction (which is where the data accesses take place); 0.05*100*1 is the instruction cache miss rate times the miss penalty, and since an instruction fetch takes place for 100% of the instructions, this is multiplied by 1. Following from this, the effective CPI is 1.25 + 7.5 = 8.75.
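The same calculation as a small Python sketch, using the numbers from the problem statement. The variable names are illustrative; the sketch assumes one instruction fetch per instruction and one data access per load/store instruction, as in the reasoning above.

```python
ideal_cpi = 1.25
miss_penalty = 100           # cycles per cache miss

i_miss_rate = 0.05           # instruction cache miss rate
d_miss_rate = 0.10           # data cache miss rate

i_accesses_per_instr = 1.0   # every instruction is fetched once
d_accesses_per_instr = 0.25  # only load/store instructions touch the data cache

# Stall cycles per instruction: accesses * miss rate * penalty, for each cache.
stall_cpi = (i_accesses_per_instr * i_miss_rate * miss_penalty
             + d_accesses_per_instr * d_miss_rate * miss_penalty)

effective_cpi = ideal_cpi + stall_cpi
print(round(stall_cpi, 2), round(effective_cpi, 2))  # 7.5 8.75
```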

What is the misses per 1000 instruction for typical applications and what is the average memory access time (in clock cycles) for typical applications?

My answers: The misses per 1000 instructions is equal to the stall cycles per instruction due to cache misses (as given above: 7.5), divided by 1000, which equals 7.5/1000 = 0.0075

When discussing the average memory access time (AMAT), we first must talk about the total number of accesses per instruction, which is the fraction of data accesses (25%) plus the fraction of instruction accesses (100%), or 125% = 1.25. The data accesses are .25/1.25 of the total and the instruction accesses are 1/1.25.

The AMAT equals the fraction of data accesses (.25/1.25) multiplied by the sum of the hit time (1) and the data miss rate times the miss penalty (0.1*100), or (.25/1.25)(1+0.1*100). This is added to the fraction of instruction accesses (1/1.25) multiplied by the sum of the hit time (1) and the instruction miss rate times the miss penalty (0.05*100), or (1/1.25)(1+0.05*100). Put together, the AMAT is (.25/1.25)(1+0.1*100)+(1/1.25)(1+0.05*100)=7.
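A Python sketch of this per-access AMAT, weighting each cache by its share of all memory accesses. The hit time of 1 cycle is the assumption made in the paragraph above (the problem statement does not give one), and the variable names are illustrative.

```python
hit_time = 1        # cycles; assumed, not given in the problem statement
miss_penalty = 100  # cycles

i_miss_rate = 0.05
d_miss_rate = 0.10

# Memory accesses per instruction: one fetch plus 0.25 data accesses.
i_accesses = 1.0
d_accesses = 0.25
total_accesses = i_accesses + d_accesses  # 1.25

# AMAT for each cache: hit time + miss rate * miss penalty.
i_amat = hit_time + i_miss_rate * miss_penalty  # 6 cycles
d_amat = hit_time + d_miss_rate * miss_penalty  # 11 cycles

# Weight each cache by its fraction of all accesses.
amat = (i_accesses / total_accesses) * i_amat + (d_accesses / total_accesses) * d_amat
print(round(amat, 2))  # 7.0
```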

Once again, sorry for the wall of text. If I am wrong, please help me understand how I am wrong. I tried to show all my work to make it as easy as possible to understand. Thanks in advance.

There's an error in the last part of your question. When they ask:

What is the misses per 1000 instruction for typical applications and what is the average memory
access time (in clock cycles) for typical applications?

what's needed here is the number of misses you will get for every 1000 instructions, which in this case would be 1000*1*0.05 = 50 instruction cache misses and 1000*0.25*0.1 = 25 data cache misses. Together this equals 75 misses per 1000 instructions.
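The count above can be reproduced with a few lines of Python; the variable names are illustrative.

```python
instructions = 1000

i_miss_rate = 0.05          # instruction cache miss rate
d_miss_rate = 0.10          # data cache miss rate
load_store_fraction = 0.25  # fraction of instructions that access data

# Every instruction is fetched; only load/stores access the data cache.
i_misses = instructions * 1.0 * i_miss_rate
d_misses = instructions * load_store_fraction * d_miss_rate

misses_per_1000 = i_misses + d_misses
print(round(i_misses), round(d_misses), round(misses_per_1000))  # 50 25 75
```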

To calculate the AMAT, you use the formula AMAT = hit time + (miss rate * miss penalty)

In this case, your miss rate is 75/1000 and your miss penalty is 100 cycles. The hit time is given as 1.25 cycles (your ideal CPI!).
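Plugging these values into the formula, as a minimal Python sketch. The hit time of 1.25 cycles and the per-instruction miss rate of 75/1000 are the choices stated above; the variable names are illustrative.

```python
hit_time = 1.25         # cycles; the ideal CPI used as the hit time
miss_rate = 75 / 1000   # misses per instruction
miss_penalty = 100      # cycles per miss

# AMAT = hit time + (miss rate * miss penalty)
amat = hit_time + miss_rate * miss_penalty
print(round(amat, 2))  # 8.75
```

With these inputs the result is the same number as the effective CPI computed in the question, because the miss rate here is expressed per instruction rather than per memory access.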

Hope this helps and all the best for your exam!
