
thread 'tokio-runtime-worker' has overflowed its stack

The following code is meant to map each WebSocket response (inside .for_each) onto an AggTrade struct and append it asynchronously to a Polars DataFrame, df (held as a LazyFrame).

I have noticed that the point at which the program crashes depends on how it was built and on the length of df. Compiled in release mode with cargo run --release, it runs longer (crashing at a height of about 400) than in debug mode (a height of around 60). My guess is that duplicate values are not being dropped somewhere, but I can't find where in the code.

Is this due to how I used std::sync::{Arc, Mutex}?

When I disable the lines:

append_df(&*df, df2).await;

write_parquet(&*df).await;

the code runs fine. Here is the rest of the code:

async fn new_stream(endpoint: &str) -> WebSocketStream<MaybeTlsStream<TcpStream>> {
    let url = "wss://fstream.binance.com/ws/".to_owned() + endpoint;
    let (ws_stream, _) = connect_async(&url).await.expect("Failed to connect");
    ws_stream
}

pub async fn new_handler(endpoint: &str) -> tokio::task::JoinHandle<()> {
    let df = Arc::new(Mutex::new(DataFrame::empty().lazy()));
    
    let stream = new_stream(endpoint).await;
    let handle = tokio::spawn(async move {
        let df = Arc::clone(&df);
        stream
            .for_each(|msg| async {
                match msg {
                    Ok(msg) => {
                        println!("{}", &msg);
                        let jsonmsg: Result<AggTrade, serde_json::Error> =
                            serde_json::from_str(&msg.to_string());
                            
                        let df2 = DataFrame::new(vec![jsonmsg])
                            .expect("Failed to create new dataframe")
                            .lazy();

                        append_df(&*df, df2).await;

                        write_parquet(&*df).await;
                    }
                    Err(e) => {
                        println!("stream error: {}", e);
                    }
                }
            })
            .await
    });
    handle
}

async fn append_df(df: &Mutex<LazyFrame>, df2: LazyFrame) {
    let mut df_edit = df.lock().await;

    *df_edit = concat([df_edit.clone(), df2], false, true).unwrap();
    println!("{:?}", df_edit.clone().collect().unwrap());
}

async fn write_parquet(df: &Mutex<LazyFrame>) {
    let mut df_write = df.lock().await;
    if &df_write.clone().collect().unwrap().height() <= &1000 {
        return;
    }
    println!("Writing to disk now!");

    let time = std::time::SystemTime::now()
        .duration_since(std::time::UNIX_EPOCH)
        .unwrap()
        .as_secs();

    let mut file = std::fs::File::create(&(time.to_string().trim() + ".parquet")).unwrap();

    ParquetWriter::new(&mut file)
        .with_compression(ParquetCompression::Snappy)
        .with_statistics(true)
        .finish(&mut df_write.clone().collect().unwrap())
        .unwrap();

    *df_write = DataFrame::empty().lazy();
}

Error I receive upon crashing:

thread 'tokio-runtime-worker' has overflowed its stack
fatal runtime error: stack overflow
[1]    29186 abort      cargo run -r
2023-01-13 22:20:14.377 osascript[29344:6199784] NSNotificationCenter connection invalid
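
One possible explanation, not confirmed anywhere in this post: each call to append_df wraps the previous LazyFrame inside a new concat, so the lazy query plan gets one level deeper per message, and anything that walks that plan recursively needs more stack as df grows. That would fit the crash arriving later in release mode, where stack frames are usually smaller. The sketch below is a toy model of that nesting only. Plan, append_nested, nested_depth_after and flat_depth_after are hypothetical names invented for illustration, not Polars types or functions.

```rust
// Toy model only: this is NOT Polars' real query plan type.
enum Plan {
    Scan,
    Union(Vec<Plan>),
}

// Recursive walk: stack usage is proportional to the depth of the plan.
fn depth(plan: &Plan) -> usize {
    match plan {
        Plan::Scan => 1,
        Plan::Union(inputs) => 1 + inputs.iter().map(depth).max().unwrap_or(0),
    }
}

// Mirrors `*df = concat([df.clone(), df2], ...)`: the old plan becomes an
// input of the new one, so every append adds one level of nesting.
fn append_nested(acc: Plan, new: Plan) -> Plan {
    Plan::Union(vec![acc, new])
}

// Depth of the plan after `n` appends done the nested way.
fn nested_depth_after(n: usize) -> usize {
    let mut acc = Plan::Scan;
    for _ in 0..n {
        acc = append_nested(acc, Plan::Scan);
    }
    depth(&acc)
}

// Depth when all inputs are collected first and concatenated once.
fn flat_depth_after(n: usize) -> usize {
    let mut inputs = vec![Plan::Scan];
    for _ in 0..n {
        inputs.push(Plan::Scan);
    }
    depth(&Plan::Union(inputs))
}

fn main() {
    println!("nested depth after 400 appends: {}", nested_depth_after(400));
    println!("flat depth after 400 appends:   {}", flat_depth_after(400));
}
```

If this is indeed the cause, materializing the frame after each append, or buffering rows and concatenating once per batch, would keep the plan shallow.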

UPDATE: I dropped the Polars DataFrame and did exactly the same thing with my own struct of Vec, Vec and so on, and it works fine. I have no idea why the polars crate has this issue, but I suspect it is related to its concat function.
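
A minimal sketch of what such a struct-of-Vec buffer could look like, using only the standard library. The fields of AggTrade (price, quantity, timestamp) and the names TradeBuffer and take_if_over are assumptions made for illustration, since the post does not show the actual struct. The threshold check mirrors the height() <= 1000 test in write_parquet.

```rust
// Hypothetical fields: the real AggTrade definition is not shown in the post.
#[derive(Debug, Clone, PartialEq)]
struct AggTrade {
    price: f64,
    quantity: f64,
    timestamp: u64,
}

// Column-oriented buffer: one Vec per field instead of a DataFrame.
#[derive(Debug, Default)]
struct TradeBuffer {
    price: Vec<f64>,
    quantity: Vec<f64>,
    timestamp: Vec<u64>,
}

impl TradeBuffer {
    // Appending is a plain push per column; nothing is nested or re-planned.
    fn push(&mut self, trade: &AggTrade) {
        self.price.push(trade.price);
        self.quantity.push(trade.quantity);
        self.timestamp.push(trade.timestamp);
    }

    fn len(&self) -> usize {
        self.timestamp.len()
    }

    // Returns the buffered rows and resets the buffer once it holds more
    // than `threshold` rows; returns None otherwise.
    fn take_if_over(&mut self, threshold: usize) -> Option<TradeBuffer> {
        if self.len() <= threshold {
            return None;
        }
        Some(std::mem::take(self))
    }
}

fn main() {
    let mut buffer = TradeBuffer::default();
    for i in 0..1001u64 {
        let trade = AggTrade { price: 100.0, quantity: 1.0, timestamp: i };
        buffer.push(&trade);
        if let Some(batch) = buffer.take_if_over(1000) {
            // A real program would write `batch` to disk at this point.
            println!("flushing {} rows", batch.len());
        }
    }
    println!("rows left in buffer: {}", buffer.len());
}
```

The returned batch could be converted into a single DataFrame just before writing, so that a frame is built once per batch rather than once per message.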
