How do I use mergesort to sort the names that have the same surname?

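Our instructor gave us a CSV file, and we are supposed to sort the names alphabetically by surname. However, some of the names share the same surname, and my code only compares surnames. I don't know what to add so that entries with the same surname are sorted by first name.

Here is my code: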

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define people 11

struct list_people {
    char FirstName[20];
    char LastName[20];
    char Name[20];
    char Age[5];
};

typedef struct list_people Details;

int merge_sort();
int merge(Details *[11], int, int, int);

int main() {
    FILE *info;
    int i, j;
    Details A[people];
    Details *cell[people];
    
    for (i = 0; i < (people + 1); i++) {
       cell[i] = &A[i];
    }
    
    info = fopen("people.csv", "r");
    for (i = 0; i < people; i++) {
       fscanf(info," %[^,], %[^,], %8s", A[i].LastName, A[i].FirstName, A[i].Age);
    }
    fclose(info);
    
    merge_sort(cell, 1, people - 1);
    for (i = 1; i < people; i ++) {
        printf( "\t %-20s %-20s Age:%-20s \n", A[i].FirstName, A[i].LastName, A[i].Age);
    }
}
    
int merge_sort(Details *A[], int low, int high) {
    int mid;
    if (low < high) {
        mid = (low + high) / 2;
        merge_sort(A, low, mid);
        merge_sort(A, mid + 1, high);
        merge(A, low, mid, high);
    }
    return 0;
}
    
int merge(Details *A[], int low, int mid, int high) {
    int leftIndex = low;
    int rightIndex = mid + 1;
    int combinedIndex = low;
    int i, j;
    Details tempA[people];
    
    while (leftIndex <= mid && rightIndex <= high) {
        if (strcasecmp((A[leftIndex]->LastName), (A[rightIndex]->LastName)) <= 0) {
            tempA[combinedIndex] = *(*(A + leftIndex));
            combinedIndex++;
            leftIndex++;
        } else {
            tempA[combinedIndex] = *(*(A + rightIndex));
            combinedIndex++;
            rightIndex++;
        }
    }
    if (leftIndex == mid + 1) {
        while (rightIndex <= high) {
            tempA[combinedIndex] = *(*(A + rightIndex));
            combinedIndex++;
            rightIndex++;
        }
    } else {
        while (leftIndex <= mid) {
            tempA[combinedIndex] = *(*(A + leftIndex));
            combinedIndex++;
            leftIndex++;
        }
    }
    
    for (i = low; i <= high; i++) {
       *(*(A + i)) = tempA[i];
    }
    return 0;
}

You can compare the elements like this:

        int cmp;
        // compare the last names
        cmp = strcasecmp( ( A[leftIndex]->LastName), ( A[rightIndex]->LastName) );
        if (cmp == 0)
        {
            // last names are identical so compare the first names
            cmp = strcasecmp( ( A[leftIndex]->FirstName), ( A[rightIndex]->FirstName) );
        }
        if (cmp <= 0)
        {
            // ...
        }
        else
        {
            // ...
        }

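There are multiple problems in the code:

  • The array size is people, so the initialization loop must exclude the index value people. Instead of for (i = 0; i < (people + 1); i++) you should write:

     for (i = 0; i < people; i++)

  • To avoid buffer overflows, you should tell fscanf() the maximum number of characters to store in each destination array (one less than the array size, leaving room for the null terminator):

     fscanf(info," %19[^,], %19[^,], %4s", A[i].LastName, A[i].FirstName, A[i].Age);

    You should also check the return value of fscanf() to detect invalid input.

  • To order items that have the same last name, you should use a comparison function that compares LastName, FirstName, and Age in turn, returning the first result that is not equal.

Here is an example: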

int compareDetails(const Details *a, const Details *b) {
    int res, age_a, age_b;
    if ((res = strcasecmp(a->LastName, b->LastName)) != 0)
        return res;
    if ((res = strcasecmp(a->FirstName, b->FirstName)) != 0)
        return res;
    age_a = atoi(a->Age);
    age_b = atoi(b->Age);
    return (age_a > age_b) - (age_a < age_b);
}
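
One way to check a comparator like this independently of the merge sort is to hand it to the standard library's qsort through a small adapter. The sketch below is self-contained and makes some assumptions: the Person type, compare_people, and sort_people are hypothetical names that mirror Details and compareDetails, and strcasecmp is taken from <strings.h> as on POSIX systems. Note that qsort is a different algorithm from the merge sort in the question and is not guaranteed to be stable.

```c
#include <stdlib.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical record type mirroring the fields of Details. */
typedef struct {
    char FirstName[20];
    char LastName[20];
    char Age[5];
} Person;

/* Same ordering as compareDetails: last name, first name, then numeric age. */
int compare_people(const Person *a, const Person *b) {
    int res;
    if ((res = strcasecmp(a->LastName, b->LastName)) != 0)
        return res;
    if ((res = strcasecmp(a->FirstName, b->FirstName)) != 0)
        return res;
    int age_a = atoi(a->Age);
    int age_b = atoi(b->Age);
    return (age_a > age_b) - (age_a < age_b);
}

/* qsort passes const void *, so a thin adapter casts back to Person. */
int compare_people_qsort(const void *a, const void *b) {
    return compare_people((const Person *)a, (const Person *)b);
}

/* Sort n records with the library's qsort (not guaranteed stable). */
void sort_people(Person *p, size_t n) {
    qsort(p, n, sizeof *p, compare_people_qsort);
}
```

Because the comparator falls through to Age, two records only compare equal when all three fields match, so the lack of stability in qsort does not change the visible order here.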

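Arrays are zero-based in C, so index values should start at 0. It is also idiomatic in C to specify an array slice with the first index included and the last index excluded. This allows for empty slices and makes the merge_sort code simpler, avoiding error-prone +1 / -1 adjustments.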
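
To see the half-open convention on its own, here is a minimal sketch that sorts plain int values over slices [low, high). The names merge_ints, sort_slice, and sort_ints are made up for this illustration, and the scratch buffer is heap-allocated once rather than created per merge; that is a choice of this sketch, not something the answer prescribes.

```c
#include <stdlib.h>
#include <string.h>

/* Merge the sorted halves [low, mid) and [mid, high) of a[] through tmp[]. */
void merge_ints(int a[], int tmp[], size_t low, size_t mid, size_t high) {
    size_t left = low, right = mid, out = low;
    while (left < mid && right < high)
        tmp[out++] = (a[left] <= a[right]) ? a[left++] : a[right++];
    while (left < mid)
        tmp[out++] = a[left++];
    while (right < high)
        tmp[out++] = a[right++];
    memcpy(a + low, tmp + low, (high - low) * sizeof a[0]);
}

/* Sort the half-open slice [low, high); length 0 or 1 is already sorted. */
void sort_slice(int a[], int tmp[], size_t low, size_t high) {
    if (high - low < 2)
        return;
    size_t mid = low + (high - low) / 2;
    sort_slice(a, tmp, low, mid);   /* left half:  [low, mid)  */
    sort_slice(a, tmp, mid, high);  /* right half: [mid, high) */
    merge_ints(a, tmp, low, mid, high);
}

/* Sort n ints; returns 0 on success, -1 if the scratch buffer cannot be allocated. */
int sort_ints(int a[], size_t n) {
    if (n < 2)
        return 0;
    int *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    sort_slice(a, tmp, 0, n);
    free(tmp);
    return 0;
}
```

The slice length is always high - low, the two halves share mid without any adjustment, and an empty array (n equal to 0) needs no special case.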

The cell pointer array is redundant because you sort the Details array in place. You can simplify the code this way:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <strings.h>   /* strcasecmp (POSIX) */

#define PEOPLE 11

struct list_people {
    char FirstName[20];
    char LastName[20];
    char Name[20];
    char Age[5];
};

typedef struct list_people Details;

int merge_sort(Details A[], int low, int high);

int main() {
    FILE *info;
    int i, j, n;
    Details A[PEOPLE];
    
    info = fopen("people.csv", "r");
    if (info == NULL)
        return 1;

    for (n = 0; n < PEOPLE; n++) {
       if (fscanf(info," %19[^,], %19[^,], %4s", A[n].LastName, A[n].FirstName, A[n].Age) != 3)
           break;
    }
    fclose(info);
    
    merge_sort(A, 0, n);
    for (i = 0; i < n; i++) {
        printf( "\t %-20s %-20s Age:%-20s \n", A[i].FirstName, A[i].LastName, A[i].Age);
    }
    return 0;
}
    
int compareDetails(const Details *a, const Details *b) {
    int res, age_a, age_b;

    if ((res = strcasecmp(a->LastName, b->LastName)) != 0)
        return res;
    if ((res = strcasecmp(a->FirstName, b->FirstName)) != 0)
        return res;
    age_a = atoi(a->Age);
    age_b = atoi(b->Age);
    return (age_a > age_b) - (age_a < age_b);
}

void merge(Details A[], int low, int mid, int high) {
    int leftIndex = low;
    int rightIndex = mid;
    int combinedIndex = 0;
    int i, j;
    Details tempA[high - low];
    
    while (leftIndex < mid && rightIndex < high) {
        if (compareDetails(&A[leftIndex], &A[rightIndex]) <= 0) {
            tempA[combinedIndex] = A[leftIndex];
            combinedIndex++;
            leftIndex++;
        } else {
            tempA[combinedIndex] = A[rightIndex];
            combinedIndex++;
            rightIndex++;
        }
    }
    while (leftIndex < mid) {
        tempA[combinedIndex] = A[leftIndex];
        combinedIndex++;
        leftIndex++;
    }
    while (rightIndex < high) {
        tempA[combinedIndex] = A[rightIndex];
        combinedIndex++;
        rightIndex++;
    }
    for (i = low; i < high; i++) {
        A[i] = tempA[i - low];
    }
}

int merge_sort(Details A[], int low, int high) {
    if (high - low >= 2) {
        int mid = low + (high - low) / 2;
        merge_sort(A, low, mid);
        merge_sort(A, mid, high);
        merge(A, low, mid, high);
    }
    return 0;
}
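
The field widths and the return-value check used in the reading loop can be tried in isolation with sscanf on a single line of text. This is only a sketch: parse_person is a hypothetical helper, and the "Last, First, Age" line layout is assumed from the format string in the question.

```c
#include <stdio.h>

/* Hypothetical helper: parse "Last, First, Age" into fixed-size buffers.
 * The widths 19, 19 and 4 leave room for the null terminator in each array.
 * Returns 1 if all three fields were converted, 0 otherwise. */
int parse_person(const char *line, char last[20], char first[20], char age[5]) {
    return sscanf(line, " %19[^,], %19[^,], %4s", last, first, age) == 3;
}
```

With this format, a line such as "Smith, John, 42" fills all three buffers. A line with a missing field, or with a surname longer than 19 characters, stops the conversion early, so the helper returns 0 instead of writing past the end of a buffer.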

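If you have two identical last names, then you need to compare against another attribute and use that to determine how they are ordered.

First, to make things a little clearer, let's pull the calls to strcasecmp out of the conditional expression and save the result to a variable: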

while ( leftIndex <= mid  &&  rightIndex <= high )
{
  int lastNameResult = strcasecmp( A[leftIndex]->LastName, A[rightIndex]->LastName );

Then, as a first pass, we'll split the <= 0 case into two separate cases:

  if ( lastNameResult < 0 )
  {
    // left < right, process as before
  }
  else if ( lastNameResult == 0 )
  {
    // new case to handle same last names
  }
  else
  {
    // left > right, process as before 
  }

Now, in this new case, we need to compare against a second attribute; the usual choice is FirstName:

  ...
  else if ( lastNameResult == 0 )
  {
    int firstNameResult = strcasecmp( A[leftIndex]->FirstName, A[rightIndex]->FirstName );

    if ( firstNameResult < 0 )
    {
      // left < right, handle the same way you did for LastName
    }
    else if ( firstNameResult == 0 )
    {
      // left == right, need to compare against another attribute
    }
    else
    {
      // left > right, handle the same way you did for LastName
    }
  }
  ...

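If you run into two entries with the same first name as well, then you need to pick another attribute to compare (such as Age) and put that comparison in the else if ( firstNameResult == 0 ) branch. Alternatively, you can combine the < and == cases and let duplicate entries fall where they may.

Obviously, this approach doesn't scale well to multiple attributes, and you wind up duplicating code. A better approach is to separate the comparison from the merge. Add a new variable, which we'll call mergeLeft, and set it to true (non-zero) if you need to merge from the left list, or to false (0) if you need to merge from the right. We can compute it without the whole if-else chain: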

while ( leftIndex <= mid  &&  rightIndex <= high )
{
  int lastNameResult = strcasecmp( A[leftIndex]->LastName, A[rightIndex]->LastName );
  int firstNameResult = strcasecmp( A[leftIndex]->FirstName, A[rightIndex]->FirstName );

  /**
   * Yes, you can use logical expressions outside of an if condition.  The
   * parentheses aren't necessary in this case, but it should make the
   * expression easier to understand.  The result of this expression
   * will either be 0 or 1.  
   */
  int mergeLeft = lastNameResult < 0 || (lastNameResult == 0 && firstNameResult < 0);

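The mergeLeft variable will be set to true (1) if the left LastName is less than the right LastName, or if the last names are equal and the left FirstName is less than the right FirstName. Poof, that ugly nested if statement magically goes away, and we're left with this as the body of the while loop: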

  if ( mergeLeft )
  {
    tempA[combinedIndex++] = *A[leftIndex++];  // I've combined the updates
  }                                            // of combinedIndex and leftIndex
  else                                         // within these statements, 
  {                                            // just to save a little space
    tempA[combinedIndex++] = *A[rightIndex++]; // I also replaced pointer
  }                                            // notation with array notation
}                                              // because it's less eye-stabby
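
Putting the fragments together, a complete merge step along these lines might look like the sketch below. It keeps the inclusive low/mid/high bounds and the array of pointers from the question. The Entry type, the merge_entries name, and the caller-supplied tempA scratch array are assumptions made for this illustration. It also uses firstNameResult <= 0 rather than < 0, a small variation that takes the left entry on a full tie so that equal entries keep their original relative order.

```c
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical record type with just the two fields being compared. */
typedef struct {
    char FirstName[20];
    char LastName[20];
} Entry;

/* Merge the sorted runs A[low..mid] and A[mid+1..high] (inclusive bounds).
 * tempA must have room for at least high + 1 entries. */
void merge_entries(Entry *A[], Entry tempA[], int low, int mid, int high) {
    int leftIndex = low, rightIndex = mid + 1, combinedIndex = low;

    while (leftIndex <= mid && rightIndex <= high) {
        int lastNameResult = strcasecmp(A[leftIndex]->LastName, A[rightIndex]->LastName);
        int firstNameResult = strcasecmp(A[leftIndex]->FirstName, A[rightIndex]->FirstName);
        /* Variation: <= 0 on the first name prefers the left entry on a tie. */
        int mergeLeft = lastNameResult < 0 || (lastNameResult == 0 && firstNameResult <= 0);

        if (mergeLeft)
            tempA[combinedIndex++] = *A[leftIndex++];
        else
            tempA[combinedIndex++] = *A[rightIndex++];
    }
    /* Copy whatever remains of either run. */
    while (leftIndex <= mid)
        tempA[combinedIndex++] = *A[leftIndex++];
    while (rightIndex <= high)
        tempA[combinedIndex++] = *A[rightIndex++];

    /* Write the merged run back through the pointers. */
    for (int i = low; i <= high; i++)
        *A[i] = tempA[i];
}
```

Adding a third attribute only means extending the mergeLeft expression; the copying code below it stays the same.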
