Strange behavior of clone

Question

This is a fairly simple application that creates a lightweight process (thread) with a clone() call.

#define _GNU_SOURCE

#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
#include <errno.h>
#include <stdlib.h>
#include <time.h>

#define STACK_SIZE 1024*1024

int func(void* param) {
    printf("I am func, pid %d\n", getpid());    
    return 0;
}

int main(int argc, char const *argv[]) {
    printf("I am main, pid %d\n", getpid());
    void* ptr = malloc(STACK_SIZE);

    printf("I am calling clone\n");             
    int res = clone(func, ptr + STACK_SIZE, CLONE_VM, NULL);
    // works fine with sleep() call
    // sleep(1);

    if (res == -1) {
        printf("clone error: %d", errno);       
    } else {
        printf("I created child with pid: %d\n", res);      
    }

    printf("Main done, pid %d\n", getpid());        
    return 0;
}

The results are as follows:

Run 1:

➜  LFD401 ./clone
I am main, pid 10974
I am calling clone
I created child with pid: 10975
Main done, pid 10974
I am func, pid 10975

Run 2:

➜  LFD401 ./clone
I am main, pid 10995
I am calling clone
I created child with pid: 10996
I created child with pid: 10996
I am func, pid 10996
Main done, pid 10995

Run 3:

➜  LFD401 ./clone
I am main, pid 11037
I am calling clone
I created child with pid: 11038
I created child with pid: 11038
I am func, pid 11038
I created child with pid: 11038
I am func, pid 11038
Main done, pid 11037

Run 4:

➜  LFD401 ./clone
I am main, pid 11062
I am calling clone
I created child with pid: 11063
Main done, pid 11062
Main done, pid 11062
I am func, pid 11063

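What is going on here? Why is the "I created child" message sometimes printed several times?

Also, I noticed that adding a delay after the clone call "fixes" the problem.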

Answer 1

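You have a race condition (i.e. you don't have the implied thread safety of stdio).

The problem is even more severe than that. You can also get duplicate "func" messages.

The problem is that using clone does not give the same guarantees as pthread_create (i.e. you do not get the thread-safe variants of printf).

I don't know for sure, but, IMO, the wording about stdio streams and thread safety applies, in practice, only when using pthreads.

So, you'll have to handle your own inter-thread locking.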
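As a rough illustration of what "your own locking" could look like, here is a minimal sketch that serializes output with a C11 atomic flag and plain write(2) calls instead of stdio. It is not taken from the original program: the names out_lock and locked_say are hypothetical, the lock simply spins, and it assumes both tasks share memory (as they do with CLONE_VM) so that they see the same flag.

```c
#include <stdatomic.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical lock shared by every task that shares this address space. */
static atomic_flag out_lock = ATOMIC_FLAG_INIT;

static void out_lock_acquire(void)
{
    /* Busy-wait until the flag was previously clear; fine for a sketch
       with very short critical sections. */
    while (atomic_flag_test_and_set_explicit(&out_lock, memory_order_acquire))
        ;
}

static void out_lock_release(void)
{
    atomic_flag_clear_explicit(&out_lock, memory_order_release);
}

/* Writes "tag: msg\n" with several write(2) calls while holding the lock,
   so another caller of locked_say cannot interleave its own pieces.
   Returns the number of bytes written, or -1 on error. */
static ssize_t locked_say(int fd, const char *tag, const char *msg)
{
    const char *parts[] = { tag, ": ", msg, "\n" };
    ssize_t total = 0;

    out_lock_acquire();
    for (size_t i = 0; i < sizeof parts / sizeof parts[0]; i++) {
        ssize_t n = write(fd, parts[i], strlen(parts[i]));
        if (n < 0) {
            total = -1;
            break;
        }
        total += n;
    }
    out_lock_release();
    return total;
}
```

Both the parent and the clone child would call something like locked_say(STDOUT_FILENO, "func", "hello") in place of printf. Because nothing is buffered in user space, there is no shared stdio buffer left to be flushed twice.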

Here is a version of your program recoded to use pthread_create. It seems to work without incident:

#define _GNU_SOURCE

#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
#include <errno.h>
#include <stdlib.h>
#include <time.h>
#include <pthread.h>

#define STACK_SIZE 1024*1024

void *func(void* param) {
    printf("I am func, pid %d\n", getpid());
    return (void *) 0;
}

int main(int argc, char const *argv[]) {
    printf("I am main, pid %d\n", getpid());
    void* ptr = malloc(STACK_SIZE);

    printf("I am calling clone\n");

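    pthread_t tid;
    // pthread_create returns 0 on success or an error number on failure
    // (it does not set errno), so res doubles as the value printed below
    int res = pthread_create(&tid,NULL,func,NULL);
    //int res = clone(func, ptr + STACK_SIZE, CLONE_VM, NULL);

    // works fine with sleep() call
    // sleep(1);

    if (res != 0) {
        printf("pthread_create error: %d\n", res);
        return 1;
    } else {
        printf("I created child with pid: %d\n", res);
    }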

    pthread_join(tid,NULL);
    printf("Main done, pid %d\n", getpid());
    return 0;
}

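Here is the test script I've been using to check for errors [it's a little rough, but should be okay]. Run against your version, it will abort quickly. The pthread_create version seems to pass just fine.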

#!/usr/bin/perl
# clonetest -- clone test
#
# arguments:
#   "-p0" -- suppress check for duplicate parent messages
#   "-c0" -- suppress check for duplicate child messages
#   1 -- base name for program to test (e.g. for xyz.c, use xyz)
#   2 -- [optional] number of test iterations (DEFAULT: 100000)

master(@ARGV);
exit(0);

# master -- master control
sub master
{
    my(@argv) = @_;
    my($arg,$sym);

    while (1) {
        $arg = $argv[0];
        last unless (defined($arg));

        last unless ($arg =~ s/^-(.)//);
        $sym = $1;

        shift(@argv);

        $arg = 1
            if ($arg eq "");

        $arg += 0;
        ${"opt_$sym"} = $arg;
    }

    $opt_p //= 1;
    $opt_c //= 1;
    printf("clonetest: p=%d c=%d\n",$opt_p,$opt_c);

    $xfile = shift(@argv);
    $xfile //= "clone1";
    printf("clonetest: xfile='%s'\n",$xfile);

    $itermax = shift(@argv);
    $itermax //= 100000;
    $itermax += 0;
    printf("clonetest: itermax=%d\n",$itermax);

    system("cc -o $xfile -O2 $xfile.c -lpthread");
    $code = $? >> 8;
    die("master: compile error\n")
        if ($code);

    $logf = "/tmp/log";

    for ($iter = 1;  $iter <= $itermax;  ++$iter) {
        printf("iter: %d\n",$iter)
            if ($opt_v);
        dotest($iter);
    }
}

# dotest -- perform single test
sub dotest
{
    my($iter) = @_;
    my($parcnt,$cldcnt);
    my($xfsrc,$bf);

    system("./$xfile > $logf");

    open($xfsrc,"<$logf") or
        die("dotest: unable to open '$logf' -- $!\n");

    while ($bf = <$xfsrc>) {
        chomp($bf);

        if ($opt_p) {
            while ($bf =~ /created/g) {
                ++$parcnt;
            }
        }

        if ($opt_c) {
            while ($bf =~ /func/g) {
                ++$cldcnt;
            }
        }
    }

    close($xfsrc);

    if (($parcnt > 1) or ($cldcnt > 1)) {
        printf("dotest: fail on %d -- parcnt=%d cldcnt=%d\n",
            $iter,$parcnt,$cldcnt);
        system("cat $logf");
        exit(1);
    }
}

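UPDATE:

Were you able to recreate OP's problem with clone?

Absolutely. Before I created the pthreads version, in addition to testing OP's original version, I also created versions that:

(1) added setlinebuf to the start of main

(2) added fflush just before the clone call and __fpurge as the first statement of func

(3) added an fflush in func before the return 0

Version (2) eliminated the duplicate parent messages, but the duplicate child messages remained.

If you'd like to see this for yourself, download OP's version from the question, my version, and the test script. Then, run the test script on OP's version.

I posted enough information and files so that anyone can recreate the problem.

Note that due to differences between my system and OP's, I couldn't reproduce the problem in just 3-4 tries. That's why I created the script.

The script does 100,000 test runs, and the problem usually manifests itself within 5000-15000 of them.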

Answer 2

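Both of your processes use the same stdout (that is, the C standard library FILE struct), which includes a buffer that is now accidentally shared. That is undoubtedly causing problems.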

Answer 3

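I can't recreate OP's problem, but I don't think printf is actually the problem.

From the glibc docs:

The POSIX standard requires that by default the stream operations are atomic. I.e., issuing two stream operations for the same stream in two threads at the same time will cause the operations to be executed as if they were issued sequentially. The buffer operations performed while reading or writing are protected from other uses of the same stream. To do this each stream has an internal lock object which has to be (implicitly) acquired before any work can be done.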
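The internal lock described in that passage can also be taken explicitly with flockfile and funlockfile, which is useful when several stream calls must appear as one unit. The following is only a sketch of that usage with a hypothetical helper name (print_record). The guarantee applies to threads created through pthreads, which is the case the documentation describes.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Writes "[id] msg\n" using two separate stream calls. Holding the
   stream lock across both keeps other pthreads that use fp from
   interleaving their output between the two parts.
   Returns the number of characters written, or -1 on error. */
static int print_record(FILE *fp, int id, const char *msg)
{
    flockfile(fp);                      /* acquire the stream's lock */
    int a = fprintf(fp, "[%d] ", id);   /* implicit locking nests inside */
    int b = fprintf(fp, "%s\n", msg);
    funlockfile(fp);                    /* release it again */

    return (a < 0 || b < 0) ? -1 : a + b;
}
```

A call such as print_record(stdout, 1, "hello") prints the whole record without another thread's output landing in the middle of it.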

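EDIT:

Although the above is true for threads, as rici points out, there is a comment on sourceware:

Basically, you can't safely use CLONE_VM unless the child restricts itself to pure computation and direct syscalls (via sys/syscall.h). If you use any of the standard library, you risk the parent and child clobbering each other's internal states. You also have issues like the fact that glibc caches the pid/tid in userspace, and the fact that glibc expects to always have a valid thread pointer, which your call to clone is unable to initialize correctly because it does not know (and should not know) the internal implementation of threads.

Apparently, glibc isn't designed to work with clone if CLONE_VM is set but CLONE_THREAD | CLONE_SIGHAND are not.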
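Following the advice in that comment, a CLONE_VM child that only issues a direct system call might look like the sketch below. This is an illustration, not the question's program: raw_child and run_raw_child are hypothetical names, the output descriptor is passed in as a parameter, and SIGCHLD is added to the flags (the question used CLONE_VM alone) so that a plain waitpid can collect the child. Waiting also keeps the parent out of the standard library while the child runs.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (1024 * 1024)

static const char child_msg[] = "I am func\n";

/* Child entry point: no stdio and no malloc, only a direct system call. */
static int raw_child(void *arg)
{
    int fd = *(int *) arg;
    syscall(SYS_write, fd, child_msg, sizeof child_msg - 1);
    return 0;
}

/* Runs raw_child in a task that shares the parent's memory and waits
   for it. Returns the child's exit status, or -1 on failure. */
static int run_raw_child(int fd)
{
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* The stack grows downward on most architectures, so pass its top. */
    pid_t pid = clone(raw_child, stack + CHILD_STACK_SIZE,
                      CLONE_VM | SIGCHLD, &fd);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);  /* child is done after this */
    free(stack);                              /* safe only after the wait */

    if (waited != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling run_raw_child(STDOUT_FILENO) prints the child's message once. Any stdio output the parent has pending should be flushed before the call if ordering matters, since the child's write bypasses the stdio buffer.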

Answer 4

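As everyone has suggested, this really does seem to be a problem of (how shall I put it in the case of clone()?) process safety. With a rough sketch of a locking version of printf (using write(2)), the output is as expected.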

#define _GNU_SOURCE

#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
#include <errno.h>
#include <stdlib.h>
#include <time.h>

#define STACK_SIZE 1024*1024

// VERY rough attempt at a thread-safe printf
#include <stdarg.h>
#define SYNC_REALLOC_GROW 64
int sync_printf(const char *format, ...)
{
  int n, all = 0;
  int size = 256;
  char *p, *np;
  va_list args;

  if ((p = malloc(size)) == NULL)
    return -1;

  for (;;) {
    // the va_list has to be restarted for every vsnprintf attempt
    va_start(args, format);
    n = vsnprintf(p, size, format, args);
    va_end(args);
    if (n < 0) {
      free(p);          // do not leak the buffer on a formatting error
      return -1;
    }
    all = n;            // length of the formatted text, not a running sum
    if (n < size)
      break;
    // buffer was too small: grow it and format again
    size = n + SYNC_REALLOC_GROW;
    if ((np = realloc(p, size)) == NULL) {
      free(p);
      return -1;
    } else {
      p = np;
    }
  }
  // write(2) should be thread-safe, so the lock is just in case
  flockfile(stdout);
  n = (int) write(fileno(stdout), p, all);
  fflush(stdout);
  funlockfile(stdout);
  free(p);
  return n;
}


int func(void *param)
{
  sync_printf("I am func, pid %d\n", getpid());
  return 0;
}

int main()
{

  sync_printf("I am main, pid %d\n", getpid());
  void *ptr = malloc(STACK_SIZE);

  sync_printf("I am calling clone\n");
  int res = clone(func, ptr + STACK_SIZE, CLONE_VM, NULL);
  // works fine with sleep() call
  // sleep(1);

  if (res == -1) {
    sync_printf("clone error: %d", errno);
  } else {
    sync_printf("I created child with pid: %d\n", res);
  }
  sync_printf("Main done, pid %d\n\n", getpid());
  return 0;
}

For the third time: it is only a sketch, and I have no time for a robust version, but that shouldn't stop you from writing one.

Answer 5

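As evaitl points out, glibc's documentation says that printf is thread-safe. BUT, this typically assumes that you are using the designated glibc function to create threads (that is, pthread_create()). If you do not, then you are on your own.

The lock taken by printf() is recursive (see flockfile). This means that if the lock is already taken, the implementation checks the owner of the lock against the locker. If the locker is the same as the owner, the locking attempt succeeds.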
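The recursive behavior is easy to observe from a single thread. The sketch below (the helper name nested_lock_depth is made up for illustration) takes the stream lock and then tries to take it again with ftrylockfile, which returns 0 on success. Because the owner and the locker are the same thread, the second attempt succeeds instead of blocking.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Returns how many times the calling thread held fp's lock at once. */
static int nested_lock_depth(FILE *fp)
{
    int depth = 0;

    flockfile(fp);                 /* first acquisition */
    depth++;

    if (ftrylockfile(fp) == 0) {   /* same owner: succeeds, count goes up */
        depth++;
        funlockfile(fp);           /* undo the nested acquisition */
    }

    funlockfile(fp);               /* release the outer acquisition */
    return depth;
}
```

If two tasks are indistinguishable to the library, each one looks like "the owner" to this check, so the lock no longer keeps them apart.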

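To distinguish between different threads, TLS has to be set up properly, which you do not do but pthread_create() does. My guess is that in your case the TLS variable that identifies the thread is the same for both threads, so both of them end up taking the lock.

TL;DR: please use pthread_create()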

5 answers

Answer 1  5  accepted  2016-07-20 23:01:44
Answer 2  3  2016-07-20 21:29:40
Answer 3  3  2016-07-20 22:16:29
Answer 4  2  2016-07-20 22:20:37
Answer 5  2  2016-07-20 23:07:29
