
More efficient way to write this algorithm?

I'm currently working on a library simulator assignment. Everything is working fine, but I would like to know something just for the sake of knowing it.

In this program there are 3 classes: Book, Patron, and Library. The Library class contains 3 private data members: a vector of pointers to Books, a vector of pointers to Patrons, and a currentDate int.

The function in question is below:

void Library::incrementCurrentDate()
{
  currentDate++;

  for (int i = 0; i < members.size(); i++)
  {
    vector<Book*> ptr = members.at(i)->getCheckedOutBooks();

     for (int j = 0; j < ptr.size(); j++)
      {
        if (currentDate > ptr.at(j)->getCheckOutLength())
            members.at(i)->amendFine(.10);
      }
  }
} 

The requirements for the function are these:

increment current date; increase each Patron's fines by 10 cents for each overdue Book they have checked out (using amendFine)

The way I have written it above works fine now. As I am just in my first semester of my computer science program, we cannot use anything that we have not covered, which I know is a lot. With that being said, would there be a more efficient way to implement this function using more advanced C++ programming methods?

  1. Use std::vector if the size is not extremely large.

Pointers always have a cost associated with them because of the indirection involved. Looking up an address and then accessing it in memory often cannot be optimized away by the compiler, so it costs an extra memory access. Memory access is often the performance bottleneck, so it's best to keep related data close together in memory and to structure your program so that it touches memory as little as possible.

  2. Use a database system, like SQL, if the data gets extremely large.

On the other hand, we can forgo all of the dirty work and use an established database library or program. Something like MySQL can easily manage a lot of data, and SQL gives you a capable language for querying and managing it. Certain databases, like PostgreSQL, can scale to large sets of data. Getting familiar with one can also be quite helpful. Even mobile apps commonly use a SQL database; Android, for example, ships with SQLite.

  3. Use the modern (C++11 or later) for loop iteration syntax.

The current for loop syntax is quite opaque and carries a lot of cruft. C++11 introduced a cleaner for loop syntax to iterate across standard library containers like map or vector. Use: for (auto item : vector_name). Note that item is a copy of each element, not an iterator; if you need to modify the elements, or want to avoid copying them, declare it as a reference: for (auto& item : vector_name).
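
As a rough sketch of what that could look like for the function in the question: the `Book` and `Patron` classes below are simplified stand-ins (the real ones from the assignment are not shown, so their members are assumptions), and `applyDailyFines` is a hypothetical free function that mirrors the body of `incrementCurrentDate`.

```cpp
#include <vector>

// Simplified stand-ins for the assignment's classes (assumed, not the real ones).
struct Book {
    int checkOutLength;
    int getCheckOutLength() const { return checkOutLength; }
};

struct Patron {
    std::vector<Book*> checkedOut;
    double fine = 0.0;
    const std::vector<Book*>& getCheckedOutBooks() const { return checkedOut; }
    void amendFine(double amount) { fine += amount; }
};

// Same logic as the original nested loops, written with range-based for.
void applyDailyFines(std::vector<Patron*>& members, int currentDate) {
    for (Patron* member : members) {  // copies a pointer, which is cheap
        for (const Book* book : member->getCheckedOutBooks()) {
            if (currentDate > book->getCheckOutLength())
                member->amendFine(0.10);
        }
    }
}
```

Because the elements here are raw pointers, copying each one into the loop variable costs nothing extra; for containers of larger objects, `const auto&` would avoid the per-element copy.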

  4. Use pre-increment syntax for a possible, minimal speedup.

++i and i++ are slightly different. ++i modifies the object in place, and the expression evaluates to the updated value. i++ creates a copy of the object, increments the original, and returns the copy. Creating a copy has a cost in C++, so avoiding it can help in certain cases (mainly iterators and other class types; for a plain int the compiler generates the same code when the result is unused), and it is a good convention anyway.
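
To make the difference concrete, here is a minimal sketch of how the two operators are conventionally written for a class type. `Counter` is a hypothetical type used only for illustration; it is not part of the question's code.

```cpp
// A hypothetical counter type, used only to show the two operator forms.
struct Counter {
    int value = 0;

    // Pre-increment: modify in place and return a reference to *this.
    Counter& operator++() {
        ++value;
        return *this;
    }

    // Post-increment: save a copy, increment the original, return the old copy.
    Counter operator++(int) {
        Counter old = *this;
        ++(*this);
        return old;
    }
};
```

The extra `Counter old` copy in the post-increment form is the cost being discussed. For a type this small it is negligible, but for heavier iterator types it can matter.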

  5. Pass by const &, not just by regular reference.

Function arguments are passed by value by default in C++. This means that C++ makes a copy of the object. However, when a function needs to change the caller's object, say, to update the value of an integer over time, you may want to pass by reference. A reference refers to the "real" object, meaning that any changes you make through the reference are made to the "real" object.

Now, why pass a non-modifiable object? Because it avoids the copy without giving up the guarantee that the caller's object stays unchanged: the compiler rejects any attempt to modify the argument through the reference. It can occasionally help optimization as well, but that benefit is limited, since the object could still be modified through another alias. Treat const & mainly as a way to skip the copy and to document intent.
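
A small sketch of the idea, using a hypothetical helper `countOverdue` that is not part of the assignment and that assumes due dates are stored as plain ints:

```cpp
#include <vector>

// Hypothetical helper: counts overdue books given their due dates.
// Taking the vector by const reference avoids copying it on every call
// and makes it a compile error to modify it inside the function.
int countOverdue(const std::vector<int>& dueDates, int currentDate) {
    int overdue = 0;
    for (int due : dueDates) {
        if (currentDate > due)
            ++overdue;
    }
    return overdue;
}
```

The `int currentDate` parameter is still passed by value; for small built-in types a copy is at least as cheap as a reference.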

  6. Use a std::unique_ptr or std::shared_ptr.

Smart pointers are another nice feature introduced with C++11: pointers that automatically release what they point to by tying its lifetime to a scope. In other words, there is no need to call delete (or, with std::make_shared and C++14's std::make_unique, new) yourself; just create the pointers and you shouldn't have to keep track of when to release memory. This can get complicated in certain situations, but in general using smart pointers leads to better safety and less chance of memory management problems, which is why they were added to the standard library in the first place.
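
One possible shape for this, as a sketch only: the `Book` and `Library` types below are simplified, hypothetical versions of the assignment's classes, and `addBook` is an invented member. It assumes C++14 for `std::make_unique`.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified Book; the assignment's real class is not shown.
struct Book {
    std::string title;
    explicit Book(std::string t) : title(std::move(t)) {}
};

// The library owns its books. Each Book is freed automatically when the
// vector (or the Library holding it) is destroyed.
struct Library {
    std::vector<std::unique_ptr<Book>> holdings;

    // Returns a non-owning pointer that patrons can keep in their own lists.
    Book* addBook(const std::string& title) {
        holdings.push_back(std::make_unique<Book>(title));  // C++14
        return holdings.back().get();
    }
};
```

The design choice here is single ownership: the library owns every book, and patrons only hold raw, non-owning pointers. If books could outlive the library or be shared between owners, `std::shared_ptr` would be the closer fit.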

There are a couple of questions to answer here, I think. The first being: can this algorithm be more efficient? And the other being: can my implementation of the algorithm in C++ be more efficient?

To the first question, I would answer no. Based on the problem, it sounds to me like you have no further information that would allow you to do any better than O(n^2).

As mentioned in the comments, you could iterate over every person and sort their books by due date. In practice this could save some time, but in theory book lookup would still be linear time, O(n). Plus you add the overhead of sorting, making your algorithm now O(m n log(n)), where m is the number of patrons and n is the number of books. If you know you have few patrons with many books each, then sorting could be beneficial. If you have many patrons with few books, it would be much less beneficial.

As for the second question: there are a few small tweaks (and a few large tweaks) that could make your code more efficient, although I would argue that the vast majority of the time they would not be necessary. One major thing I notice is that you create a new vector object on every iteration of your first for loop, which is unnecessary overhead. The call depends on i, so it has to stay inside the loop, but if getCheckedOutBooks() returns a reference you can bind the result to a const reference instead of copying it. Try instead this pseudo-code:

currentDate++;
for (....)   // outer loop over members
{
    const vector<Book*>& books = members.at(i)->getCheckedOutBooks(); // reference, no copy
    for (....)
}

Another optimization that could be a large overhaul would be to drop std::vector. A vector in C++ can be resized on the fly and carries other overhead that is unnecessary for your task. Simply using an array would be more memory efficient, although equivalent in running time.

You mentioned being in your first semester, so you probably have not been introduced to Big O notation yet.

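If that is the only operation you want to optimize, keep a vector of tuple<int, Book*, Patron*> sorted by the int, which represents the due date of every checked-out book. Then iterate until the due date is greater than the current date, applying the fine to the associated patron as you go.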
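
A minimal sketch of that idea follows. `Book` and `Patron` are placeholder types, and `Loan`, `addLoan`, and `applyDailyFines` are hypothetical names. It also assumes the question's rule that a book is overdue once the current date is strictly greater than its due date.

```cpp
#include <algorithm>
#include <tuple>
#include <vector>

struct Book {};  // placeholder for the assignment's Book
struct Patron {  // placeholder for the assignment's Patron
    double fine = 0.0;
    void amendFine(double amount) { fine += amount; }
};

using Loan = std::tuple<int, Book*, Patron*>;  // (due date, book, borrower)

// Insert a loan so that `loans` stays sorted by due date.
void addLoan(std::vector<Loan>& loans, int dueDate, Book* book, Patron* patron) {
    Loan loan{dueDate, book, patron};
    auto pos = std::upper_bound(loans.begin(), loans.end(), loan,
        [](const Loan& a, const Loan& b) { return std::get<0>(a) < std::get<0>(b); });
    loans.insert(pos, loan);
}

// Walk from the earliest due date and stop at the first loan that is not overdue.
void applyDailyFines(const std::vector<Loan>& loans, int currentDate) {
    for (const Loan& loan : loans) {
        if (std::get<0>(loan) >= currentDate)
            break;  // everything after this point is not overdue either
        std::get<2>(loan)->amendFine(0.10);
    }
}
```

The daily pass then touches only the overdue loans, at the price of keeping the vector sorted on every check-out and removing entries on every return.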

If you have n checked out books, m of which are overdue, your algorithm takes O(n) time to add the fines. This is because your data structure stores information like this:

member -> list(checked out books)
book -> check-out length // presumably the due date for returning the book

If in addition to your members collection you also store the following information:

check-out length -> list(checked out books with that due date)
book -> member who checked it out

then you can use a sorted tree that stores all checked-out books by their due date to look up all overdue books in O(log n). Thus the total asymptotic run-time of your algorithm would reduce from O(n) to O(log n + m).
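
One way to sketch this is with std::multimap, which is typically implemented as a balanced tree. The `Loan` struct, the `LoansByDueDate` alias, and `applyDailyFines` are hypothetical names, and `Book` and `Patron` are placeholders for the assignment's classes.

```cpp
#include <map>

struct Book {};  // placeholder for the assignment's Book
struct Patron {  // placeholder for the assignment's Patron
    double fine = 0.0;
    void amendFine(double amount) { fine += amount; }
};

// One checked-out book together with the member who checked it out.
struct Loan {
    Book* book;
    Patron* borrower;
};

// Due date -> loans with that due date, kept sorted by the tree.
using LoansByDueDate = std::multimap<int, Loan>;

// Fines every loan whose due date is strictly before currentDate and
// returns how many loans were fined. lower_bound finds the first
// non-overdue entry in O(log n); the loop then visits only the m
// overdue entries.
int applyDailyFines(LoansByDueDate& loans, int currentDate) {
    int fined = 0;
    auto firstNotOverdue = loans.lower_bound(currentDate);
    for (auto it = loans.begin(); it != firstNotOverdue; ++it) {
        it->second.borrower->amendFine(0.10);
        ++fined;
    }
    return fined;
}
```

Storing the borrower next to the book in each entry plays the role of the "book -> member" mapping above, so no second lookup is needed while fining.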

You might consider replacing the vector with a std::map container. Maps are stored as sorted trees. If you define a comparator function comparing checkout length (or, more likely, the due date), you do not need to scan the entire list every time.

A more complicated solution would store all books in a single tree of pointers sorted by their expiration time. Then you wouldn't have to iterate over members at all. Instead, iterate over books until you find a book that has not expired yet.

This is more complicated because adding or removing books for each member (or even iterating over all books a member owns) becomes more difficult, and may require maintaining a vector of pointers for each user, as in your current approach, in addition to the global book map.

It's been a while since I've used C++, but the standard library will almost always be faster than your own implementation. For your reference, check out the standard functions that are associated with std::vector (this site is extremely useful).

You might be able to slim down the ptr.size() through some other filtering logic so you don't have to iterate over people who don't have late books (maybe some sorting on books and their due dates?).

Right now you amend fines in O(n) per patron (n being getCheckedOutBooks().size()), but you can do it in O(log(n)), since you only need the number of late books and not the book objects themselves. Once you have that number, multiply it by .10 and make a single amendFine call to do it all.

So here is the way I suggest:
If you keep each patron's getCheckedOutBooks() vector sorted by getCheckOutLength(), you can find the overdue books with a binary search. Since a book is overdue when currentDate is greater than its getCheckOutLength(), std::lower_bound gives you the first element that is not overdue, and the number of elements before it is the number of books that should be fined. Here is the code:

// Needs <algorithm>. Compares a book's check-out length against a date,
// in the (element, value) form that std::lower_bound expects.
bool checkedDateComparator(Book* book, int date){
    return book->getCheckOutLength() < date;
}
void Library::incrementCurrentDate()
{
    currentDate++;

    for (size_t i = 0; i < members.size(); i++)
    {
        // Assumes this vector is kept sorted by getCheckOutLength().
        vector<Book*> ptr = members.at(i)->getCheckedOutBooks();
        // Everything before the first book with length >= currentDate is overdue.
        int overdue = lower_bound(ptr.begin(), ptr.end(), currentDate, checkedDateComparator) - ptr.begin();
        members.at(i)->amendFine(overdue * .10);
    }
}

Let's take a step back and look at the requirements. When you go to the library and have some possibly-late checked-out books, you probably ask the librarian what you owe. The librarian looks up your account and tells you. It is at that point that you should be calculating the fees. What you're doing now is recalculating the fees every midnight (I'm assuming). That's the part that's inefficient.

Let's instead have this use-case:

  1. Librarian attempts to check out a patron's books
  2. System calculates fees
  3. Patron pays any outstanding fees
  4. Librarian checks out the books

The relevant part for your question would be step #2. Here's pseudocode:

float CalculateFees(Patron patron)
{
    float result = 0;
    foreach(checkedOutBook in patron.GetCheckedOutBooks())
    {
        result += CalculateFee(checkedOutBook.CheckOutDate(), today);
    }
    return result;
}

float CalculateFee(Date checkOutDate, Date today)
{
    return (today.Day() - checkOutDate.Day()) * 0.10;
}

The whole use-case could be as simple as:

void AttemptCheckout(Patron patron, BookList books)
{
    float fees = CalculateFees(patron);
    if(fees == 0 || (fees > 0 && PatronPaysFees(patron, fees)))
    {
        Checkout(patron, books);
    }
    else
    {
        RejectCheckout(patron);
    }
}

I've written this in a way that makes it easy to change the fee formula. Some types of materials accrue fines differently than other types of materials. Fines may be capped at a certain amount.
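
For reference, a rough C++ rendering of the fee calculation might look like the sketch below. It makes a few assumptions that go beyond the pseudocode: each loan records its due date as an integer day number (the hypothetical `Loan` struct), no fee accrues before the due date, and amounts are kept in whole cents to sidestep floating-point rounding.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical record for one checked-out book; dates are integer day numbers.
struct Loan {
    int dueDate;
};

// Fee for a single loan in cents: 10 cents per day past the due date,
// never negative.
int calculateFeeCents(const Loan& loan, int today) {
    return std::max(0, today - loan.dueDate) * 10;
}

// Total owed by a patron, computed only when someone asks for it.
int calculateFeesCents(const std::vector<Loan>& loans, int today) {
    int total = 0;
    for (const Loan& loan : loans)
        total += calculateFeeCents(loan, today);
    return total;
}
```

Keeping the per-loan formula in its own function is what makes it easy to vary by material type or to add a cap later.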
