Multithreaded C Lua module causes a segfault in the Lua script

Question

I wrote a very simple C library for Lua. It consists of a single function that starts a thread, and that thread does nothing but loop:

#include "lua.h"
#include "lauxlib.h"
#include <pthread.h>
#include <stdio.h>

pthread_t handle;
void* mythread(void* args)
{
    printf("In the thread !\n");
    while(1);
    pthread_exit(NULL);
}

int start_mythread()
{
    return pthread_create(&handle, NULL, mythread, NULL);
}

int start_mythread_lua(lua_State* L)
{
    lua_pushnumber(L, start_mythread());
    return 1;
}

static const luaL_Reg testlib[] = {
    {"start_mythread", start_mythread_lua},
    {NULL, NULL}
};

int luaopen_test(lua_State* L)
{
/*
    //for lua 5.2
    luaL_newlib(L, testlib);
    lua_setglobal(L, "test");
*/
    luaL_register(L, "test", testlib);
    return 1;
}

Now, if I write a very simple Lua script that just does:

require("test")
test.start_mythread()

Running the script with lua myscript.lua sometimes causes a segfault. Here is what GDB has to say about the core dump:

Program terminated with signal 11, Segmentation fault.
#0  0xb778b75c in ?? ()
(gdb) thread apply all bt

Thread 2 (Thread 0xb751c940 (LWP 29078)):
#0  0xb75b3715 in _int_free () at malloc.c:4087
#1  0x08058ab9 in l_alloc ()
#2  0x080513a2 in luaM_realloc_ ()
#3  0x0805047b in sweeplist ()
#4  0x080510ef in luaC_freeall ()
#5  0x080545db in close_state ()
#6  0x0804acba in main () at lua.c:389

Thread 1 (Thread 0xb74efb40 (LWP 29080)):
#0  0xb778b75c in ?? ()
#1  0xb74f6efb in start_thread () from /lib/i386-linux-gnu/i686/cmov/libpthread.so.0
#2  0xb7629dfe in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:129

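The main thread's stack varies a little from run to run.
It seems that the start_thread function wants to jump to a given address (b778b75c in this case) that sometimes happens to belong to inaccessible memory.
Edit
I also have a valgrind output: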

==642== Memcheck, a memory error detector
==642== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==642== Using Valgrind-3.10.0 and LibVEX; rerun with -h for copyright info
==642== Command: lua5.1 go.lua
==642== 
In the thread !
In the thread !
==642== Thread 2:
==642== Jump to the invalid address stated on the next line
==642==    at 0x403677C: ???
==642==    by 0x46BEEFA: start_thread (pthread_create.c:309)
==642==    by 0x41C1DFD: clone (clone.S:129)
==642==  Address 0x403677c is not stack'd, malloc'd or (recently) free'd
==642== 
==642== 
==642== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==642==  Access not within mapped region at address 0x403677C
==642==    at 0x403677C: ???
==642==    by 0x46BEEFA: start_thread (pthread_create.c:309)
==642==    by 0x41C1DFD: clone (clone.S:129)
==642==  If you believe this happened as a result of a stack
==642==  overflow in your program's main thread (unlikely but
==642==  possible), you can try to increase the size of the
==642==  main thread stack using the --main-stacksize= flag.
==642==  The main thread stack size used in this run was 8388608.
==642== 
==642== HEAP SUMMARY:
==642==     in use at exit: 1,296 bytes in 6 blocks
==642==   total heap usage: 515 allocs, 509 frees, 31,750 bytes allocated
==642== 
==642== LEAK SUMMARY:
==642==    definitely lost: 0 bytes in 0 blocks
==642==    indirectly lost: 0 bytes in 0 blocks
==642==      possibly lost: 136 bytes in 1 blocks
==642==    still reachable: 1,160 bytes in 5 blocks
==642==         suppressed: 0 bytes in 0 blocks
==642== Rerun with --leak-check=full to see details of leaked memory
==642== 
==642== For counts of detected and suppressed errors, rerun with: -v
==642== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Killed

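However, everything has worked fine so far when I just open the Lua interpreter and type the same instructions by hand, one after another.
Likewise, a C program that does the same thing using the same lib: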

int start_mythread();

int main()
{
    int ret = start_mythread();
    return ret;
}

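has never failed in my tests, as expected.
I have tried both Lua 5.1 and 5.2, and it made no difference.
Edit: I should point out that I tested all this on a single-core eeePC running 32-bit Debian Wheezy (Linux 3.2).
I have just tested again on my main machine (4-core, 64-bit Arch Linux), and there launching the script with lua myscript.lua segfaults every time... Entering the commands at the interpreter prompt works fine, though, as does the C program above.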

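The reason I wrote this small lib is that I am writing a bigger library, which is where I first ran into this problem. After hours of fruitless debugging, including removing every shared structure/variable one by one (yes, I was that desperate), I boiled it down to this piece of code.
So my guess is that I am doing something wrong with Lua, but what could it be? I have searched for this problem as thoroughly as I could, but what I found was mostly people having trouble using the Lua API from several threads (which is not what I am trying to do here).
If you have any idea, any help would be greatly appreciated.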

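Edit
To be more precise, I would like to know whether I should take extra precautions with threads when writing a C library meant to be used from Lua scripts. Do threads created from a dynamically loaded library need to be terminated when Lua "unloads" the library?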

Answer 1

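Why does the segfault happen in the Lua module?

Your Lua script exits before the thread has finished, and that causes the segfault. During normal interpreter shutdown the Lua module is unloaded with dlclose(), so the thread's instructions are removed from memory and it segfaults when it fetches its next instruction.

What are the options?

Any solution that stops the thread before the module is unloaded will work. Calling pthread_join() in the main thread waits for the thread to finish (you may want to kill long-running threads with pthread_cancel()). Calling pthread_exit() in the main thread before the module is unloaded also prevents the crash (because it prevents the dlclose()), but it also aborts the Lua interpreter's normal cleanup/shutdown process.

Here are some examples that work: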

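#include <math.h>   /* for pow() */

/* Ends only the calling (main) thread, so the interpreter never
   reaches the point where it unloads the module. */
int pexit(lua_State* L)
{
    pthread_exit(NULL);
    return 0;   /* not reached */
}

/* Blocks until the thread started by start_mythread() has finished. */
int join(lua_State* L)
{
    pthread_join(handle, NULL);
    return 0;
}

static const luaL_Reg testlib[] = {
    {"start_mythread", start_mythread_lua},
    {"join", join},
    {"exit", pexit},
    {NULL, NULL}
};

/* A long but finite computation instead of while(1),
   so that join() eventually returns. */
void* mythread(void* args)
{
    int i, j, k;
    printf("In the thread !\n");
    for (i = 0; i < 10000; ++i) {
        for (j = 0; j < 10000; ++j) {
            for (k = 0; k < 10; ++k) {
                pow(1, i);
            }
        }
    }
    pthread_exit(NULL);
}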

Now the script exits cleanly:

require('test')
test.start_mythread()
print("launched thread")
test.join() -- or test.exit()
print("thread joined")
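
For a thread that never finishes by itself, the pthread_cancel() route mentioned above could look roughly like the sketch below. The names worker, worker_main, start_worker and stop_worker are made up for this illustration and are not part of the original module. One caveat: with the default (deferred) cancellation type, a cancel request only takes effect at a cancellation point, so a bare while(1); loop would never notice it. The sketch assumes the loop can call pthread_testcancel() on each iteration.

```c
#include <pthread.h>

/* Hypothetical handle, playing the role of `handle` in the question. */
static pthread_t worker;

/* Long-running worker. pthread_testcancel() is an explicit cancellation
 * point; a pure busy loop without one would ignore the cancel request. */
static void *worker_main(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_testcancel();
    }
    return NULL; /* not reached */
}

/* Returns 0 on success, or the pthread_create() error code. */
int start_worker(void)
{
    return pthread_create(&worker, NULL, worker_main, NULL);
}

/* Requests cancellation, then waits until the thread is really gone.
 * Returns 0 if the thread ended because of the cancel request. */
int stop_worker(void)
{
    void *result = NULL;
    int err = pthread_cancel(worker);
    if (err != 0)
        return err;
    err = pthread_join(worker, &result);
    if (err != 0)
        return err;
    return result == PTHREAD_CANCELED ? 0 : -1;
}
```

A Lua-facing wrapper would call stop_worker() where the example above calls pthread_join(). The join after the cancel matters: pthread_cancel() only sends the request, and the module is safe to unload only once the thread has actually terminated.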

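To do this automatically, you can hook into the garbage collector, because all objects in the module are released before the shared object is unloaded (as suggested by Greatwolf).

Discussion on calling pthread_exit() from main(): There is a definite problem if main() finishes before the threads it spawned if you don't call pthread_exit() explicitly. All of the threads it created will terminate because main() is done and no longer exists to support the threads. By having main() explicitly call pthread_exit() as the last thing it does, main() will block and be kept alive to support the threads it created until they are done.

(This quote is a bit misleading: returning from main() is roughly equivalent to calling exit(), which terminates the process including all running threads. This may or may not be exactly the behavior you want. Calling pthread_exit() in the main thread, on the other hand, exits the main thread but keeps all other threads running until they stop on their own or someone else kills them. Again, this may or may not be the behavior you want. Neither is a problem unless you choose the wrong option for your use case.)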

Answer 2

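So it seems I have to make sure that all my threads have finished by the time Lua unloads my lib.

A solution

I can set up a cleanup function to be called when the library is unloaded.
In this function I can make sure that every thread started by my lib has terminated. If I still have detached threads running, calling pthread_exit from it might be the easy way out, but I am not sure how safe/clean that is, since it would abruptly interrupt Lua.
In any case, I can achieve this by creating a metatable whose __gc field is set to my cleanup function, and then assigning that metatable to my lib table in Lua 5.2.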

int cleanup(lua_State* L)
{
    /*Do the cleaning*/
    return 0;
}

int luaopen_test(lua_State* L)
{
    //for lua 5.2
    //metatable with cleanup method for the lib
    luaL_newmetatable(L, "test.cleanup");
    //set our cleanup method as the __gc callback
    lua_pushstring(L, "__gc");
    lua_pushcfunction(L, cleanup);
    lua_settable(L, -3);
    //open our test lib
    luaL_newlib(L, testlib);
    //associate it with our metatable
    luaL_setmetatable(L, "test.cleanup");

    return 1;
}
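
As for what the /*Do the cleaning*/ placeholder might contain, here is one possible sketch. It assumes a single joinable (not detached) worker thread and uses a cooperative stop flag instead of pthread_cancel(). All names in it (lib_worker, lib_worker_main, lib_start_worker, lib_cleanup) are hypothetical and do not come from the code above.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

static pthread_t lib_worker;
static atomic_bool stop_requested;  /* static storage, so it starts out false */
static bool worker_running;         /* only touched by the thread running Lua */

/* Worker loop: checks the stop flag once per iteration. */
static void *lib_worker_main(void *arg)
{
    (void)arg;
    while (!atomic_load(&stop_requested)) {
        /* one bounded unit of work per iteration would go here */
    }
    return NULL;
}

/* Starts the worker if it is not already running.
 * Returns 0 on success, or the pthread_create() error code. */
int lib_start_worker(void)
{
    if (worker_running)
        return 0;
    atomic_store(&stop_requested, false);
    int err = pthread_create(&lib_worker, NULL, lib_worker_main, NULL);
    if (err == 0)
        worker_running = true;
    return err;
}

/* Possible body for the cleanup step: asks the worker to stop and waits
 * for it. Calling it again, or before any start, is a harmless no-op. */
int lib_cleanup(void)
{
    if (!worker_running)
        return 0;
    atomic_store(&stop_requested, true);
    int err = pthread_join(lib_worker, NULL);
    if (err == 0)
        worker_running = false;
    return err;
}
```

With something like this in place, cleanup(lua_State* L) would only need to call lib_cleanup() and return 0. The flag approach lets the thread stop at a point of its own choosing, which avoids interrupting it in the middle of an operation, at the cost of having to check the flag regularly.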

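In Lua 5.1, the __gc metamethod only works on userdata. There are several ways to make it work in my case:
- Lua shutdown/End of the program execution callback
- http://lua-users.org/wiki/LuaFaq (see "Why don't the __gc and __len metamethods work on tables?")
- Greatwolf's solution, which is to have a global object with said metatable attached.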
