What are the implications of using a static member function with pthread_create()?

Question

I'm helping a student with their homework, which is a basic threading exercise. Unfortunately, though they're required to use C++11, they're forbidden from using std::thread. I don't see the rationale, but it's not my homework.

Here's the class:

class VaccineInfo {
public:
  VaccineInfo(const std::string &t_input_filename):
    input_filename(t_input_filename)
  { }
  VaccineInfo() = delete;

  static void *count_vaccines(void *t_vi);

  int v1_count() { return vaccine_count["v1"]; }
  int v2_count() { return vaccine_count["v2"]; }
  int v3_count() { return vaccine_count["v3"]; }

private:
  std::string input_filename;

  std::map<std::string, int> vaccine_count {
    { "v1", 0 },
    { "v2", 0 },
    { "v3", 0 }
  };

};

void *VaccineInfo::count_vaccines(void *t_vi) {
  VaccineInfo *vi = reinterpret_cast<VaccineInfo*>(t_vi);
  std::ifstream input_file;
  std::string input_line;

  input_file.open(vi->input_filename);
  if (!input_file.good()) {
    std::cerr << "No such file " << vi->input_filename << std::endl;
    return nullptr;
  }

  while (std::getline(input_file, input_line)) {
    vi->vaccine_count[input_line]++;
  }

  return nullptr;
}

And here's where pthreads comes in.

std::vector<std::string> filenames = find_filenames(".");
std::vector<pthread_t> thread_handles;
std::vector<VaccineInfo> vi_vector;

vi_vector.reserve(filenames.size());

for (const std::string &filename : filenames) {
  pthread_t tid;
  thread_handles.push_back(tid);
  vi_vector.emplace_back(VaccineInfo(filename));
  pthread_create(
      &thread_handles.back(), nullptr, &VaccineInfo::count_vaccines,
      static_cast<void*>(&vi_vector.back()));
}

for (const pthread_t tid : thread_handles) {
  pthread_join(tid, nullptr);
}

It's a pretty basic exercise, except for how much fluff you have to do to get the old and the new to play nice. And that's what's got me wondering - does using a static member method as the start_routine argument to pthread_create have any undesirable side effects? I know static member variables and functions don't "belong" to any objects, but I normally think of static variables as being one-per-class, regardless of the number of objects. If there's only one copy of the static member function, as well, that seems like you'd be shooting yourself in the foot for parallelization.

Would it just be better, in this case, to make vaccine_count public and make count_vaccines() a global function?

Do hit me with whatever detail you can muster; I'm very curious. =) And, as always, thank you all for your time and effort.

Answer 1

except for how much fluff you have to do to get the old and the new to play nice.

Well, that's essentially what std::thread does internally on common standard library implementations. If you create a thread, force a stack trace inside it, and look at that stack, you'll see a fair amount of pointer juggling with this and pthread_create (or CreateThread on Windows).

That being said, it's not unusual in any way to use a static function of a class that then calls a private member of that class on an object instance, even with std::thread. It really just depends on what you need those functions to do.

does using a static member method as the start_routine argument to pthread_create have any undesirable side effects?

No. At least not from the perspective of functionality; that is, creating a thread on a static member won't cause any UB or crashes directly just because you are using a static function.
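To make the pattern concrete, here is a minimal sketch of a static "trampoline" that forwards to a private, non-static member function. The Worker class, its members, and run_workers are hypothetical names invented for this illustration (they are not from the question), and error handling is omitted for brevity.

```cpp
#include <pthread.h>
#include <cstddef>
#include <vector>

// Hypothetical class for illustration: each object carries its own state.
class Worker {
public:
    explicit Worker(int limit) : limit_(limit) {}

    // Static member: it has no hidden `this` parameter, so its type is the
    // plain `void *(*)(void *)` shape that pthread_create expects.
    static void *entry(void *arg) {
        Worker *self = static_cast<Worker *>(arg);
        self->run();  // forward to the non-static, private work function
        return nullptr;
    }

    long result() const { return sum_; }

private:
    // Sums 1..limit_ into this object's own sum_.
    void run() {
        for (int i = 1; i <= limit_; ++i) sum_ += i;
    }

    int limit_;
    long sum_ = 0;
};

// Starts one thread per limit, joins them all, and returns each result.
// Return values of pthread_create/pthread_join are ignored in this sketch.
std::vector<long> run_workers(const std::vector<int> &limits) {
    std::vector<Worker> workers;
    workers.reserve(limits.size());  // no reallocation, so addresses stay valid
    for (int limit : limits) workers.emplace_back(limit);

    std::vector<pthread_t> threads(workers.size());
    for (std::size_t i = 0; i < workers.size(); ++i)
        pthread_create(&threads[i], nullptr, &Worker::entry, &workers[i]);
    for (pthread_t t : threads) pthread_join(t, nullptr);

    std::vector<long> results;
    for (const Worker &w : workers) results.push_back(w.result());
    return results;
}
```

Each thread only touches the object it was handed, so no locking is needed here; that would change if the threads shared an object.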

You do have to account for the fact that you're operating on a static member function, but that's no different from having to account for constructors/destructors or any other feature of the language itself. Since this is a homework assignment, it's likely the professor is trying to teach "how things work" more than "how to use C++11".

Would it just be better, in this case, to make vaccine_count public and make count_vaccines() a global function?

Yes and no. Having vaccine_count as a private member then means that count_vaccines must be a friend or static function, and given that vaccine_count seems like an "important" data point that you wouldn't want a "user of the code" inadvertently setting, it's probably better to keep it private.

You could add getters and setters, but that might complicate the code unnecessarily.

You could also just make it a public variable if you trust the users of the code to protect that variable (unlikely), and you could also just make count_vaccines a free/global function, but then you need to have the function after the class declaration. And if the class is a complex class (maybe has templates or some other C++ notion), then it can really complicate the code in how you operate on the class.
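For comparison, here is a rough sketch of the free-function route that keeps the data private by using a friend. The names Counter, counter_entry, and run_counter_once are made up for this example. The function is declared extern "C" on the assumption that you want to match the C linkage pthread_create is declared with; strictly speaking, the C++ standard treats C and C++ function linkage as distinct types, although common compilers such as GCC and Clang accept a static member function here as well.

```cpp
#include <pthread.h>

// Free function with C linkage, declared before the class so that the
// class can name it as a friend. (Hypothetical name.)
extern "C" void *counter_entry(void *arg);

// Hypothetical class for illustration.
class Counter {
public:
    int value() const { return value_; }

private:
    // Redeclares the function above and grants it access to value_.
    friend void *counter_entry(void *arg);
    int value_ = 0;
};

// The thread start routine: recovers the object and updates its private state.
extern "C" void *counter_entry(void *arg) {
    Counter *self = static_cast<Counter *>(arg);
    self->value_ += 1;
    return nullptr;
}

// Runs counter_entry on one thread and returns the counter afterwards,
// or -1 if the thread could not be created.
int run_counter_once() {
    Counter c;
    pthread_t tid;
    if (pthread_create(&tid, nullptr, counter_entry, &c) != 0) return -1;
    pthread_join(tid, nullptr);
    return c.value();
}
```

Whether this is nicer than a static member is mostly a matter of taste; the static member keeps everything inside the class, while the friend version needs the extra forward declaration.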

So yes, it could go that way, but the professor is likely trying to teach the idea of what a static function is, how threads operate on the class and how pointers work within the constructs of this exercise, among other things.

If you have a static member variable, all objects access that variable.

That's not quite what static means in this context. For a class member, the static keyword in C++ means that the member belongs to the class itself rather than to any one object, so you do not need an object to reach it. A static member variable can therefore be accessed not just by any object but by any code that is allowed to see it. Take this example:

class Data {
    public:
        static int x_val;
        int y_val;
};

int Data::x_val; // a static data member needs an out-of-class definition


int main(int argc, char* argv[]) {
    Data::x_val = 10; // works because it's static.
    Data::y_val = 10; // error: accessing a non-static member
    
    Data obj;
    obj.y_val = 10; // ok because it's a member variable
    obj.x_val = 20; // this works as the compiler ultimately translates this to `Data::x_val = 20`
    // BUT, referencing a static member/function on an object instance is "bad form"

    return 0;
}

If you have a static member function... can it be called on more than one core simultaneously?

The static keyword has no effect on which core, or thread, said function is called on, or on whether it can be run in parallel.

A CPU core executes machine instructions, not C++ functions. When a C++ program is compiled, assembled and linked, what runs on the core (or cores) of your CPU is the machine code generated from the source you wrote; the static keyword does not exist at that level.

That static function is just an address in memory that can be called from any number of threads, on whichever CPU cores the OS schedules them at any given time in your program.
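As a small illustration of that point, the sketch below runs one static member function on several threads at once. Tally, work, calls, and run_tallies are hypothetical names chosen for this example. It assumes the distinction that matters is the data, not the code: locals live on each thread's own stack, per-object members are separate per object, and only the static data member is shared (so it is made atomic here).

```cpp
#include <pthread.h>
#include <atomic>
#include <vector>

// Hypothetical class for illustration only.
struct Tally {
    static std::atomic<int> calls;  // static data: one instance shared by all threads
    int count = 0;                  // per-object data: one per Tally

    // Static function: a single copy of the code, callable from any thread.
    static void *work(void *arg) {
        Tally *self = static_cast<Tally *>(arg);
        int local = 0;              // local variable: lives on the calling thread's stack
        for (int i = 0; i < 100000; ++i) ++local;
        self->count = local;        // each thread writes only to its own object
        calls.fetch_add(1);         // shared state needs synchronization (atomic here)
        return nullptr;
    }
};

std::atomic<int> Tally::calls{0};   // out-of-class definition of the static data member

// Runs Tally::work on n threads at once and reports whether every object
// ended up with its own complete count. Error handling is omitted.
bool run_tallies(int n) {
    std::vector<Tally> tallies(n);
    std::vector<pthread_t> threads(n);
    for (int i = 0; i < n; ++i)
        pthread_create(&threads[i], nullptr, &Tally::work, &tallies[i]);
    for (pthread_t t : threads) pthread_join(t, nullptr);

    for (const Tally &t : tallies)
        if (t.count != 100000) return false;
    return true;
}
```

The threads do not interfere with one another through the function itself; the only contention is on the shared static counter, which is exactly where the synchronization sits.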

Yes, you could call an OS API that pins that thread of execution calling that function to a specific core, but that's a different subject.

And for a last little bit of fun for you: at the assembly level, C++ member functions basically get compiled into C-like functions (an extreme oversimplification, but merely for demonstration):

C++

class Data {
    public:
        void increment() {
            this->y_val += 1024;
        }
        int y_val; // public here so that main() can assign it directly
};

int main() {
    Data obj;
    obj.y_val = 42;
    obj.increment(); // obj.y_val == 1066
    return 0;
}

C

struct Data {
    int y_val;
};

void Data_increment(struct Data* this) {
    this->y_val += 1024;
}

int main() {
    struct Data obj;
    obj.y_val = 42;
    Data_increment(&obj); // obj.y_val == 1066
    return 0;
}

Again, an oversimplification, but the point is to illustrate how it all builds to assembly and what the assembly does.

solution1
1 ACCEPTED 2021-05-06 07:03:50
