C++ sorting a vector or linked list

I have an input file that I want to sort by timestamp, which is a substring of each record. I want to store multiple attributes of the

The list is currently about 1000 records, but I want it to be able to scale up a bit just in case.

When I did it with a linked list, searching the entire list for each insertion, it took about 20 seconds. Now, just filling up a vector and outputting to file takes 4 seconds (does that sound too long?).

I would like to use merge sort or quick sort (merge sort appears a little easier to me). The trouble I'm running into is that I don't see many examples of implementing these sorts using objects rather than primitive data types.

I could use either a vector or a linked list. The feedback that I've gotten from this site has been most helpful so far. I'm hoping that someone can sprinkle on the magic pixie dust to make this easier on me :)

Any links or examples of the easiest way to do this with pretty decent performance would be most appreciated. I'm getting stuck on how to implement these sorts with objects because I'm a newbie at C++ :)

Here's what my new code looks like (no sorting yet):

#include <iostream>
#include <string>
#include <vector>

using namespace std;

class CFileInfo
{
public:
    std::string m_PackLine;
    std::string m_FileDateTime;
    int m_NumDownloads;
};

int main()
{
    CFileInfo packInfo;
    vector<CFileInfo> unsortedFiles;
    vector<CFileInfo>::iterator Iter;

    packInfo.m_PackLine = "Sample Line 1";
    packInfo.m_FileDateTime = "06/22/2008 04:34";
    packInfo.m_NumDownloads = 0;
    unsortedFiles.push_back(packInfo);

    packInfo.m_PackLine = "Sample Line 2";
    packInfo.m_FileDateTime = "12/05/2007 14:54";
    packInfo.m_NumDownloads = 1;
    unsortedFiles.push_back(packInfo);

    // Print every record in insertion order (nothing is sorted yet).
    for (Iter = unsortedFiles.begin(); Iter != unsortedFiles.end(); ++Iter)
    {
        cout << " " << Iter->m_PackLine;
    }
}

I'm not sure I understood your question correctly. Is your problem defining the sort functor? The STL sort is generally implemented as an introspective sort (introsort), which is very good for most cases.

struct sort_functor
{
    bool operator()(const CFileInfo & a, const CFileInfo & b) const
    {
        // may be a little bit more subtle depending on what your strings look like
        return a.m_FileDateTime < b.m_FileDateTime;
    }
};

std::sort(unsortedFiles.begin(), unsortedFiles.end(), sort_functor());

or using boost::lambda

// needs <boost/lambda/lambda.hpp> and <boost/lambda/bind.hpp>, plus: using namespace boost::lambda;
std::sort(unsortedFiles.begin(),
    unsortedFiles.end(),
    bind(&CFileInfo::m_FileDateTime, _1) < bind(&CFileInfo::m_FileDateTime, _2));

Was it the needed information?
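On the "more subtle" comment above: the question's timestamps look like "MM/DD/YYYY HH:MM", which does not sort chronologically as plain text. A minimal sketch of one way around that, assuming every timestamp is zero-padded to exactly that 16-character layout and that a C++11 compiler is available for the lambda. FileInfo, sortableKey and sortByTimestamp are hypothetical names used only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct FileInfo {              // hypothetical stand-in for the question's CFileInfo
    std::string packLine;
    std::string fileDateTime;  // assumed format: "MM/DD/YYYY HH:MM", zero-padded
    int numDownloads;
};

// Rearranges "MM/DD/YYYY HH:MM" into "YYYYMMDDHHMM", which sorts
// chronologically when compared as plain text.
std::string sortableKey(const std::string& stamp) {
    assert(stamp.size() == 16);  // relies on the fixed-width assumption above
    return stamp.substr(6, 4)    // YYYY
         + stamp.substr(0, 2)    // MM (month)
         + stamp.substr(3, 2)    // DD
         + stamp.substr(11, 2)   // HH
         + stamp.substr(14, 2);  // MM (minute)
}

// Sorts records from oldest to newest using the rearranged key.
void sortByTimestamp(std::vector<FileInfo>& files) {
    std::sort(files.begin(), files.end(),
              [](const FileInfo& a, const FileInfo& b) {
                  return sortableKey(a.fileDateTime) < sortableKey(b.fileDateTime);
              });
}
```

With the two sample records from the question, sortByTimestamp puts "Sample Line 2" (12/05/2007) ahead of "Sample Line 1" (06/22/2008). The key is rebuilt on every comparison, which is fine for a sketch; storing it once per record would be cheaper.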

Sorting a linked list by searching for the insertion point of every element is O(N^2). A linked list can be merge-sorted in O(NlogN), but it has no random access, so the general-purpose std::sort cannot be used on it.

Vectors have random-access storage. So do arrays. Sorting can be O(NlogN).
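To make the container difference concrete: std::sort requires random-access iterators, so it accepts a vector or an array but not a std::list, which provides its own member sort() instead. A small sketch with sortable timestamp strings as the elements; sortedVector and sortedList are hypothetical helper names.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <string>
#include <vector>

// std::sort needs random-access iterators, so it works on vectors and arrays.
std::vector<std::string> sortedVector(std::vector<std::string> stamps) {
    std::sort(stamps.begin(), stamps.end());
    assert(std::is_sorted(stamps.begin(), stamps.end()));
    return stamps;
}

// std::list only has bidirectional iterators, so it supplies a member sort().
std::list<std::string> sortedList(std::list<std::string> stamps) {
    stamps.sort();
    assert(std::is_sorted(stamps.begin(), stamps.end()));
    return stamps;
}
```

Both helpers take their argument by value, so the caller's container is left untouched and the sorted copy is returned.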

At 1000 elements you will begin to see a difference between O(N^2) and O(NlogN). At 1,000,000 elements you'll definitely notice the difference!

It is possible in very special situations to get O(N) sorting. (For example: sorting a deck of playing cards. We can create a function(card) that maps each card to its sorted position.)
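The playing-card case can be sketched as follows, assuming a full 52-card deck with no duplicates. The Card encoding (suit 0..3, rank 0..12) and the names sortedPosition and sortDeck are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Card {
    int suit;  // 0..3  (hypothetical encoding)
    int rank;  // 0..12
};

// Maps each card directly to its slot in a sorted deck.
std::size_t sortedPosition(const Card& card) {
    assert(card.suit >= 0 && card.suit < 4 && card.rank >= 0 && card.rank < 13);
    return static_cast<std::size_t>(card.suit * 13 + card.rank);
}

// Places every card in its final slot in a single pass: O(N), no comparisons.
// Assumes a complete deck, so every slot is written exactly once.
std::vector<Card> sortDeck(const std::vector<Card>& shuffled) {
    assert(shuffled.size() == 52);
    std::vector<Card> sorted(52);
    for (const Card& card : shuffled) {
        sorted[sortedPosition(card)] = card;
    }
    return sorted;
}
```

This only works because the final position of each element is known without looking at any other element, which is not true of arbitrary timestamps.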

But in general, O(NlogN) is as good as it gets. So you might as well use STL's sort()!
Just add #include <algorithm>

All you'll need to add is an operator<(). Or a sort functor.

But one suggestion: For god's sake man, if you are going to sort on a date, either encode it as a long int representing seconds-since-epoch (mktime?), or at the very least use a "year/month/day-hour:minute:second.fraction" format. (And MAKE SURE everything is 2 (or 4) digits with leading zeros!) Comparing "6/22/2008-4:34" and "12/5/2007-14:54" will require parsing! Comparing "2008/06/22-04:34" with "2007/12/05-14:54" is much easier. (Though still much less efficient than comparing two integers!)
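A rough sketch of that normalisation step, assuming the incoming text looks like "month/day/year-hour:minute" with optional leading zeros. toSortableStamp is a hypothetical helper, and it does not range-check the fields.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Converts e.g. "6/22/2008-4:34" into "2008/06/22-04:34".
// Returns an empty string if the input does not match the assumed layout.
std::string toSortableStamp(const std::string& raw) {
    int month = 0, day = 0, year = 0, hour = 0, minute = 0;
    if (std::sscanf(raw.c_str(), "%d/%d/%d-%d:%d",
                    &month, &day, &year, &hour, &minute) != 5) {
        return std::string();
    }
    char buffer[64];  // ample room for five formatted ints plus separators
    const int written = std::snprintf(buffer, sizeof buffer,
                                      "%04d/%02d/%02d-%02d:%02d",
                                      year, month, day, hour, minute);
    assert(written > 0 && written < static_cast<int>(sizeof buffer));
    (void)written;  // avoids an unused-variable warning when asserts are disabled
    return std::string(buffer);
}
```

For the two strings in the paragraph above, toSortableStamp("6/22/2008-4:34") gives "2008/06/22-04:34" and toSortableStamp("12/5/2007-14:54") gives "2007/12/05-14:54", which then compare correctly with a plain string operator<.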


Rich wrote: the other answers seem to get into syntax more, which is what I'm really lacking.

Ok. With a basic "int" type we have:

#include <algorithm>
#include <iostream>
#include <vector>

using namespace std;

#define PRINT(DATA,N) for(int i=0; i<N; i++) { cout << (i>0?", ":"") << DATA[i]; } cout << endl;

int
main()  
{
    // Creating and Sorting a stack-based array.
  int d [10] = { 1, 4, 0, 2, 8, 6, 3, 5, 9, 7 };
  PRINT(d,10);
  sort( d, d+10 );
  PRINT(d,10);

  cout << endl;

    // Creating a vector.
  int eData [10] = { 1, 4, 0, 2, 8, 6, 3, 5, 9, 7 };
  vector<int> e;
  for(int i=0; i<10; i++ )
    e.push_back( eData[i] );

    // Sorting a vector.
  PRINT(e,10);
  sort(e.begin(), e.end());
  PRINT(e,10);
}

With your own type we have:

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

using namespace std;

class Data
{  
public:  
  string m_PackLine;  
  string m_FileDateTime;  
  int    m_NumberDownloads;

    /* Let's simplify creating Data elements down below. */
  Data( const string & thePackLine  = "",
        const string & theDateTime  = "",
        int            theDownloads = 0 )
      : m_PackLine        ( thePackLine  ),
        m_FileDateTime    ( theDateTime  ),
        m_NumberDownloads ( theDownloads )
    { }

    /* Can't use constructor with arrays */
  void set( const string & thePackLine,
            const string & theDateTime,
            int            theDownloads = 0 )
    {
      m_PackLine        = thePackLine;
      m_FileDateTime    = theDateTime;
      m_NumberDownloads = theDownloads;
    }

    /* Let's simplify printing out down below. */
  ostream & operator<<( ostream & theOstream ) const
    {
      theOstream << "PackLine=\"" << m_PackLine
                 << "\"   fileDateTime=\"" << m_FileDateTime
                 << "\"   downloads=" << m_NumberDownloads;
      return theOstream;
    }


    /*
     * This is IT!  All you need to add to use sort()!
     *  Note:  Sort is just on m_FileDateTime.  Everything else is superfluous.
     *  Note:  Assumes "YEAR/MONTH/DAY HOUR:MINUTE" format.
     */
  bool operator< ( const Data & theOtherData ) const
    { return m_FileDateTime < theOtherData.m_FileDateTime; }

};

    /* Rest of simplifying printing out down below. */ 
ostream & operator<<( ostream & theOstream, const Data & theData )
  { return theData.operator<<( theOstream ); }


    /* Printing out data set. */
#define PRINT(DATA,N) for(int i=0; i<N; i++) { cout << "[" << i << "]  " << DATA[i] << endl; }  cout << endl;

int
main()
{  
    // Creating a stack-based array.
  Data d [10];
  d[0].set( "Line 1", "2008/01/01 04:34", 1 );
  d[1].set( "Line 4", "2008/01/04 04:34", 4 );
  d[2].set( "Line 0", "2008/01/00 04:34", 0 );
  d[3].set( "Line 2", "2008/01/02 04:34", 2 );
  d[4].set( "Line 8", "2008/01/08 04:34", 8 );
  d[5].set( "Line 6", "2008/01/06 04:34", 6 );
  d[6].set( "Line 3", "2008/01/03 04:34", 3 );
  d[7].set( "Line 5", "2008/01/05 04:34", 5 );
  d[8].set( "Line 9", "2008/01/09 04:34", 9 );
  d[9].set( "Line 7", "2008/01/07 04:34", 7 );

    // Sorting a stack-based array.
  PRINT(d,10);
  sort( d, d+10 );
  PRINT(d,10);

  cout << endl;

    // Creating a vector.
  vector<Data> e;
  e.push_back( Data( "Line 1", "2008/01/01 04:34", 1 ) );
  e.push_back( Data( "Line 4", "2008/01/04 04:34", 4 ) );
  e.push_back( Data( "Line 0", "2008/01/00 04:34", 0 ) );
  e.push_back( Data( "Line 2", "2008/01/02 04:34", 2 ) );
  e.push_back( Data( "Line 8", "2008/01/08 04:34", 8 ) );
  e.push_back( Data( "Line 6", "2008/01/06 04:34", 6 ) );
  e.push_back( Data( "Line 3", "2008/01/03 04:34", 3 ) );
  e.push_back( Data( "Line 5", "2008/01/05 04:34", 5 ) );
  e.push_back( Data( "Line 9", "2008/01/09 04:34", 9 ) );
  e.push_back( Data( "Line 7", "2008/01/07 04:34", 7 ) );

    // Sorting a vector.
  PRINT(e,10);
  sort(e.begin(), e.end());
  PRINT(e,10);
}

The STL has a sort algorithm in the header <algorithm>.

Here's a link to the SGI manual.

Use std::sort from the <algorithm> header.

If you define operator< for CFileInfo, it should work without a problem.

Alternatively, define a functor performing the comparison, and pass that as a separate argument to the sort function.
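Both options might look roughly like this for the question's CFileInfo class. This is a sketch only: it assumes m_FileDateTime already holds a sortable "YYYY/MM/DD HH:MM" string, and ByDownloads, sortByDate and sortByDownloads are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

class CFileInfo {
public:
    std::string m_PackLine;
    std::string m_FileDateTime;  // assumed here to be "YYYY/MM/DD HH:MM"
    int m_NumDownloads;
};

// Option 1: an operator< lets std::sort work with no extra argument.
bool operator<(const CFileInfo& a, const CFileInfo& b) {
    return a.m_FileDateTime < b.m_FileDateTime;
}

// Option 2: a functor passed as the third argument, here ordering by downloads.
struct ByDownloads {
    bool operator()(const CFileInfo& a, const CFileInfo& b) const {
        return a.m_NumDownloads < b.m_NumDownloads;
    }
};

void sortByDate(std::vector<CFileInfo>& files) {
    std::sort(files.begin(), files.end());  // uses operator<
    assert(std::is_sorted(files.begin(), files.end()));
}

void sortByDownloads(std::vector<CFileInfo>& files) {
    std::sort(files.begin(), files.end(), ByDownloads());
}
```

The functor route is handy when the same records need more than one ordering, since operator< can only describe one of them.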

Rich, to answer your more recent question (and not your original question), it's probably best/simplest to just parse out the date with sscanf(). Ideally you want to store it numerically to begin with.

With a "YYYY/MM/DD-HH:MM" string, you can just compare the strings. All the strings are the same length, and you are going from largest time increment to smallest time increment as you read left-to-right.

But comparing strings is very inefficient!

Usually dates are stored as time_t (integer) values measured in seconds since the Epoch (00:00:00 UTC, January 1, 1970).

mktime() or timegm() (if you have timegm) will construct a time_t value from a "struct tm" you supply.

Sample code:

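#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <ctime>
#include <iostream>
#include <string>

using namespace std;

#define SHOW(X)  cout << # X " = " << (X)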

int
main()
{
  const string s = "2008/12/03 12:48";
  struct tm    datetime;
  time_t       t;

  memset( & datetime, 0, sizeof(datetime) );

  if ( 5 != sscanf( s.c_str(), "%d/%d/%d %d:%d",
                    & datetime.tm_year,
                    & datetime.tm_mon,
                    & datetime.tm_mday,
                    & datetime.tm_hour,
                    & datetime.tm_min  ) )
  {
    cout << "FAILED to parse:  \"" << s << "\"" << endl;
    exit(-1);
  }

    /* tm_year - The number of years since 1900. */
  datetime.tm_year -= 1900;

    /* tm_mon - The number of months since January, in the range 0 to 11. */
  datetime.tm_mon --;

    /* tm_mday - The day of the month, in the range 1 to 31. */
    /* tm_hour - The number of hours past midnight, in the range 0 to 23. */
    /* tm_min - The number of minutes after the hour, in the range 0 to 59. */
  // No change.

  /* If using mktime, you may need these to force UTC time:
   *   setenv("TZ","",1);
   *   tzset();
   */

  t = mktime( & datetime );

  SHOW( t ) << endl;
  SHOW( asctime( & datetime ) );
  SHOW( ctime( & t ) );
}

Now, given two time (date) values, e.g. time_t t1, t2, you can compare them with just t1 < t2.
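Putting the pieces together, one possible arrangement is to parse each timestamp once into a time_t and sort on that. The sketch below assumes "YYYY/MM/DD HH:MM" input interpreted as local time; TimedRecord, parseStamp, makeRecord, earlier and sortByTime are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <ctime>
#include <string>
#include <vector>

struct TimedRecord {       // hypothetical record type
    std::string line;
    std::time_t when;      // parsed once, compared many times
};

// Parses "YYYY/MM/DD HH:MM" as local time; returns (time_t)-1 on failure.
std::time_t parseStamp(const std::string& s) {
    std::tm datetime = std::tm();  // zero-initialise every field
    if (std::sscanf(s.c_str(), "%d/%d/%d %d:%d",
                    &datetime.tm_year, &datetime.tm_mon, &datetime.tm_mday,
                    &datetime.tm_hour, &datetime.tm_min) != 5) {
        return static_cast<std::time_t>(-1);
    }
    datetime.tm_year -= 1900;  // tm_year counts years since 1900
    datetime.tm_mon  -= 1;     // tm_mon runs from 0 to 11
    datetime.tm_isdst = -1;    // let mktime work out daylight saving time
    return std::mktime(&datetime);
}

TimedRecord makeRecord(const std::string& line, const std::string& stamp) {
    const std::time_t when = parseStamp(stamp);
    // Sketch only: real code should report the bad line instead of asserting.
    assert(when != static_cast<std::time_t>(-1));
    TimedRecord record = { line, when };
    return record;
}

// Comparison function for std::sort: a single arithmetic comparison.
bool earlier(const TimedRecord& a, const TimedRecord& b) {
    return a.when < b.when;
}

void sortByTime(std::vector<TimedRecord>& records) {
    std::sort(records.begin(), records.end(), earlier);
}
```

Because the parsing cost is paid once per record rather than once per comparison, this layout should scale better than comparing strings inside the sort.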
