Finding the largest float in an array using C and inline assembler

Question

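I'm new to C and assembler (to all programming, really), and this has been bugging me for a day or two.

Here is my assignment (I've already done number 4; this is the extra credit, which is what I'm having trouble with):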

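For each problem, you must write a C (not C++) program with three parts:

A. Some C code to read the input (using scanf).

B. An inline assembler segment to do the computation.

C. Some C code to write the output (using printf).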

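maxi.c: Read a count n, then read a list of n integers into an array (allocated with malloc), then use assembler to find the largest of all the integers, and output it. You will need a conditional jump opcode like jg. To use such an opcode, you usually first use an opcode like dec or sub to set bits in the flags register. Then you use jg to jump somewhere (e.g. the top of a loop) if the result of the previous operation was greater than zero. You can also set the flags register with the cmp opcode. You will also need to access the array with the base + scaled-index addressing mode: mov eax, [ebx + ecx*4].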

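Extra credit: 5. maxf.c: Same as above, but with floats instead of integers.

Here is what I have so far. When I enter the list, it outputs the first number in the list, whether or not it is the largest.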

// maxi.cpp : Defines the entry point for the console application.
//

#include "stdafx.h"
#include "malloc.h"

int n;     // length of list
float *nums; // the list
int max;   // the result

int main()
{
    int i;
    float arr[10];
    printf("How many integers?  ");
    scanf_s("%d", &n);
    nums = (float*)malloc(n * sizeof(float));
    for (i = 0; i<n; i++)
    {
        printf("Enter next number:  ");
        scanf_s("%d", &arr[i]);
    }

    __asm
    {
        mov eax, n;    // A is the count
        dec eax;       // A becomes the end of the list
        mov ebx, nums; // B is the beginning of the list
        mov ecx, arr;    // C is the current largest integer

    top:
        cmp eax, 0;
        jl done;

        mov edx, [ebx + eax * 4];
        cmp edx, ecx;
        jg Change;
        jmp Increment;

    Increment:
        //mov eax,[ebx+ecx*4];
        dec eax;
        jmp top;

    Change:
        mov ecx, edx;
        jmp Increment;
    done:
        mov max, ecx;
    }
    printf("The largest integer in your list is: %d\n", max);
    return 0;
}

Answer 1

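I think you are mixing up the two variables arr and nums. You allocate nums with the correct size, but you read the numbers into arr, which is preallocated to hold only 10 numbers. Fix that first, then look at the assembly.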
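A minimal sketch of the input part with that fix applied: the numbers go straight into the malloc'd buffer, so the array the assembly later walks is the same one scanf filled. It assumes the float element type from the question. `read_numbers` is a hypothetical helper name, and it reads from any `FILE *` (pass `stdin` in the real program).

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read n floats from `in` into a new malloc'd array.
   Returns NULL if n <= 0, if allocation fails, or if any conversion fails.
   The caller frees the result. */
float *read_numbers(FILE *in, int n)
{
    if (n <= 0)
        return NULL;
    float *nums = malloc((size_t)n * sizeof *nums);
    if (nums == NULL)
        return NULL;
    for (int i = 0; i < n; i++) {
        /* %f is the conversion that matches a float * argument */
        if (fscanf(in, "%f", &nums[i]) != 1) {
            free(nums);
            return NULL;
        }
    }
    return nums;
}
```

In the assignment's `main` this would be called as `nums = read_numbers(stdin, n);`, which removes the fixed-size `arr` entirely.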

Answer 2

Are you entering actual floating-point numbers?

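If you are, the first scanf_s("%d", &arr[i]); will parse the integer part of the first number, and every subsequent call will fail because '.' is not part of an integer. Since you don't test the return value of scanf_s, you ignore this failure, and the array will contain indeterminate values after the first entry.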
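The failure mode can be reproduced without typing anything by running the same conversions over a string with `sscanf`, checking the return value each time. This is only a sketch: `count_ints` and `count_floats` are hypothetical helpers, and `%n` is used to advance past what each successful conversion consumed. An interactive stream behaves the same way, because the unread `'.'` stays in the input and makes every later `%d` fail.

```c
#include <stdio.h>

/* Hypothetical helper: how many %d conversions succeed on s before the
   first failure (at most max). */
int count_ints(const char *s, int max)
{
    int count = 0, used = 0, v;
    while (count < max && sscanf(s, "%d%n", &v, &used) == 1) {
        s += used;          /* skip the characters this conversion consumed */
        count++;
    }
    return count;
}

/* Same loop with %f, the conversion that matches float input. */
int count_floats(const char *s, int max)
{
    int count = 0, used = 0;
    float v;
    while (count < max && sscanf(s, "%f%n", &v, &used) == 1) {
        s += used;
        count++;
    }
    return count;
}
```

On the input `"3.5 7.25 1.0"`, `count_ints` stops after one value (it reads `3`, then fails at `.5`), while `count_floats` reads all three.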

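The assembly initializes ecx from arr (that is, from its first element), but the loop compares against the other values through nums instead of arr. Those are also indeterminate, because malloc does not initialize the memory it returns. They could quite possibly all be zero, but that is not guaranteed.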

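The assembly loop looks inefficient but correct to me. Because the arrays contain indeterminate values, though, max may well end up as the first value if the rest of the array happens to contain zeros.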

Note that you treat the contents of arr as integers both in scanf_s and in the assembly loop, so you get the same result as if you had declared it as int.
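Suppose the floats are read correctly with `%f` but the integer `cmp`/`jg` loop is kept. The loop would then compare IEEE-754 bit patterns as signed integers. The sketch below assumes 32-bit IEEE-754 `float`, which holds on x86, and `float_bits` is a hypothetical helper. It shows that this ordering is right for non-negative floats but reversed for negative ones, which is why maxf needs a real float comparison.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: the float's bit pattern viewed as a signed 32-bit
   integer, i.e. what an integer cmp sees after a plain mov of float data.
   memcpy is the well-defined way to reinterpret the bytes in C. */
int32_t float_bits(float f)
{
    int32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

For example, `float_bits(2.0f) > float_bits(1.0f)` holds as expected, but `float_bits(-2.0f) > float_bits(-1.0f)` also holds, so an integer max over negative floats would pick the wrong element.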

Answer 3

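The structure of the jumps in your asm is really ugly, and it could be a lot more efficient. jmp Increment is a complete waste, since there are no instructions between the jump and the label. Remember that execution falls through to the next instruction on its own, even if you leave a blank line in the source!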

int n;
// scanf..
int *nums = malloc(n * sizeof(int));
// loop scanf to set nums[0..n-1]
__asm
{
    // MSVC inline asm sucks, and forces you to write load instructions instead of letting you ask for things to already be in registers.
    //  Nothing you can do about that, other than write the whole function in asm yourself.
    mov   ecx, [n];    // ECX is the count (and array index)
    mov   edx, [nums]; // load the pointer.  ESI is the conventional choice for source pointers, but we haven't yet run out of registers that can be used without saving
                // If you wanted to get the address of a local array, IDK if you'd use OFFSET in MSVC, or what.
    mov   eax, [edx + ecx*4 - 4];  // Dereference the pointer to get the last list element (which we used to check first).  The other option is to start with the most-negative possible integer, like you'd initialize a sum=0 instead of loading the first element.

    sub   ecx, 2;  // start from the 2nd-last list element.
    jl    .done;   // n == 1: nothing else to compare (jle would wrongly skip element 0 when n == 2).  Leave this out if you can assume n >= 2

.top:    // use local labels so they don't show up in the symbol table.
    cmp   eax, [edx + ecx*4];  // it is actually more efficient to compare with memory directly here.  It saves an instruction, and unless the list is mostly increasing, the nochange branch will usually be taken.
    jge .nochange;             // max >= current: keep max
    mov   eax, [edx + ecx*4];  // skipped if the current isn't > max
.nochange:
    dec   ecx; // put the loop condition at the bottom to save branch insns
    jge .top;
    // jnz is a common choice, but this avoids a long loop if you accidentally call with n = 0, even if we didn't check for this at the top.

.done:
    mov   [max], eax;
}
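For comparison, here is a sketch of the same loop shape in C. It starts from the last element and walks the indices from n-2 down to 0, with the loop condition at the bottom as a do-while, the way `dec ecx` / `jge .top` does. It assumes n >= 1. `max_countdown` is a hypothetical name, and this is not necessarily what a compiler would emit for it.

```c
/* Hypothetical helper: max of nums[0..n-1], n >= 1, scanning backwards. */
int max_countdown(const int *nums, int n)
{
    int max = nums[n - 1];      /* like loading the last element into eax */
    int i = n - 2;              /* like sub ecx, 2 */
    if (i < 0)                  /* n == 1: nothing left to compare */
        return max;
    do {
        if (nums[i] > max)      /* the update is the usually-skipped path */
            max = nums[i];
    } while (--i >= 0);         /* condition at the bottom, like dec/jge */
    return max;
}
```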

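However, using cmp with a memory operand might not actually help on Intel SnB-family CPUs. On those CPUs, loading into a temporary register first can actually be better.

To speed up the common case, nums[ecx] <= max, you can structure things so that it is the not-taken case. In the taken case, you would jump to: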

.change:
    mov   eax, [edx + ecx*4];
    jmp .back_into_loop;    // to the same place as .nochange is in the above version

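Normally you would put this block after the ret instruction at the end of the function, so you don't have to jump over it. MSVC inline asm prevents that, but you can still put it outside the loop.

For float, going branchless is very easy and efficient: use the MAXSS instruction. Or vectorize and use MAXPS. Even better, use 3 or 4 accumulator registers to overlap the latency of MAXPS: it has 3c latency but one-per-1c throughput, so on Intel Haswell 3 can be in flight at once.

I didn't use cmov to do the equivalent in the integer loop because, before Broadwell, it is a 2-uop instruction with 2-cycle latency, and the loop-carried data dependency would limit the loop to one iteration per 2 clocks instead of one per clock.

SSE4.1 PMAXSD does a max of signed dword integers with 1c latency (and one-per-0.5c throughput, so two accumulators could saturate p1 and p5, along with both load ports). Again, these are Haswell numbers from http://agner.org/optimize/.

For tiny arrays like your test case of 10, there is obviously hardly any scope for vectorization, and most of the work would be the horizontal max at the end.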
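A sketch of that idea using the SSE intrinsics `_mm_max_ps` (MAXPS) and `_mm_max_ss` (MAXSS) from `<xmmintrin.h>`, with four independent accumulators, a horizontal max at the end, and a scalar tail. It assumes n >= 1, ignores NaN handling, and falls back to a plain loop when SSE is not available. `max_float` and `HAVE_SSE` are hypothetical names.

```c
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1
#endif

/* Hypothetical helper: max of a[0..n-1], n >= 1. NaN inputs not handled. */
float max_float(const float *a, int n)
{
#ifdef HAVE_SSE
    int i = 0;
    __m128 first = _mm_set1_ps(a[0]);   /* any element is a safe start value */
    __m128 m0 = first, m1 = first, m2 = first, m3 = first;
    /* four independent MAXPS chains so their latencies can overlap */
    for (; i + 16 <= n; i += 16) {
        m0 = _mm_max_ps(m0, _mm_loadu_ps(a + i));
        m1 = _mm_max_ps(m1, _mm_loadu_ps(a + i + 4));
        m2 = _mm_max_ps(m2, _mm_loadu_ps(a + i + 8));
        m3 = _mm_max_ps(m3, _mm_loadu_ps(a + i + 12));
    }
    __m128 m = _mm_max_ps(_mm_max_ps(m0, m1), _mm_max_ps(m2, m3));
    /* horizontal max: lanes {0,1} vs {2,3}, then lane 0 vs lane 1 */
    m = _mm_max_ps(m, _mm_movehl_ps(m, m));
    m = _mm_max_ss(m, _mm_shuffle_ps(m, m, 1));
    /* leftover elements, one scalar MAXSS each */
    for (; i < n; i++)
        m = _mm_max_ss(m, _mm_load_ss(a + i));
    return _mm_cvtss_f32(m);
#else
    float best = a[0];
    for (int i = 1; i < n; i++)
        if (a[i] > best)
            best = a[i];
    return best;
#endif
}
```

For an array as short as the 10-element test case, the vector loop never runs and everything goes through the scalar MAXSS tail.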
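For reference, the branchless form in C is just a conditional select, as in the sketch below. A compiler may turn it into CMOV, or vectorize it, but whether it does depends on the compiler and flags, so check the generated assembly. `best` is carried from one iteration to the next, and that loop-carried dependency is what limits the loop to one iteration per CMOV latency. `max_int_branchless` is a hypothetical name, and it assumes n >= 1.

```c
/* Hypothetical helper: max of a[0..n-1], n >= 1, written as a select. */
int max_int_branchless(const int *a, int n)
{
    int best = a[0];
    for (int i = 1; i < n; i++)
        best = (a[i] > best) ? a[i] : best;   /* candidate for CMOV */
    return best;
}
```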
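A sketch of the PMAXSD version with two accumulators, using the SSE4.1 intrinsic `_mm_max_epi32` from `<smmintrin.h>`. The vector path is compiled only when SSE4.1 is enabled (for example `gcc -msse4.1`). Otherwise only the scalar loop runs. `max_int_sse41` is a hypothetical name, and it assumes n >= 1.

```c
#if defined(__SSE4_1__)
#include <smmintrin.h>
#endif

/* Hypothetical helper: max of a[0..n-1], n >= 1. */
int max_int_sse41(const int *a, int n)
{
    int i = 0;
    int best = a[0];
#if defined(__SSE4_1__)
    __m128i m0 = _mm_set1_epi32(a[0]), m1 = m0;
    /* two independent PMAXSD chains, 8 ints per iteration */
    for (; i + 8 <= n; i += 8) {
        m0 = _mm_max_epi32(m0, _mm_loadu_si128((const __m128i *)(a + i)));
        m1 = _mm_max_epi32(m1, _mm_loadu_si128((const __m128i *)(a + i + 4)));
    }
    __m128i m = _mm_max_epi32(m0, m1);
    /* horizontal max: swap 64-bit halves, then adjacent dwords */
    m = _mm_max_epi32(m, _mm_shuffle_epi32(m, _MM_SHUFFLE(1, 0, 3, 2)));
    m = _mm_max_epi32(m, _mm_shuffle_epi32(m, _MM_SHUFFLE(2, 3, 0, 1)));
    best = _mm_cvtsi128_si32(m);
#endif
    /* scalar tail (or the whole array without SSE4.1) */
    for (; i < n; i++)
        if (a[i] > best)
            best = a[i];
    return best;
}
```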

3 answers

Answer 1
3 votes, accepted, 2016-01-21 20:43:49

Answer 2
3 votes, 2016-01-21 21:16:12

Answer 3
2 votes, 2016-01-22 03:01:31
