
How do you keep track of all strings allocated in Forth and free them on time?

I see a lot of Forth code just doing s" Hello " s" world" s+ like it's nothing, but now that I think about it, this actually allocates three buffers and loses two of them to the great nothingness. The same problem comes up with most uses of slurp-file.

But if I need to put every single string address I allocate into a temporary location to free it later, like s" foo" over >r ( ... do something ... ) r> free throw, I'm gonna lose my mind. What is the best practice on this?

I don't see a lot of Forth code around that takes memory allocation into account, and the stack-based style seems to encourage a kind of "fire and forget" mindset.

The practical case

I'm working on a web server which serves HTML files, and while the request is saved in a reusable pad, the response, on the other hand, is a mix of slurped files and string concatenations.

Which means that if I let the server run on the internet for some time and let the assorted soup of robots you find there play with it, I might lose a considerable amount of memory just telling them to go away.

The question

So I'm turning to the vibrant Forth community around here to ask you for the best practice.

Should I:

  1. Run after every memory allocation in my program and check that I free them sometime
  2. Let the program run and restart it once a limit has been reached
  3. Use the gforth garbage collector extension
  4. Prepare a big lot of memory dedicated to a request and free everything at once at the end of the response

(1) is a scenario in my worst nightmares
(2) is the lazy way, but not that bad
(3) I looked at the code and it seems overkill for me
(4) is what I'd really like to go for, but is a bit ambitious

Bonus: What I'd do if I had to implement solution (4)

  • I would allocate a big chunk of memory and save the pointer in a variable.
  • Then have an equivalent of the here word to point at the next free location in it.
  • Then write new versions for s+ and other string manipulation words that just get here and increment it by their size
  • At the end of the server answer, I would free the initial pointer.

Is this a good strategy? Am I missing something?

Short answer: you keep a list of them.

s" does not always allocate new memory

I was wrong in my interpretation of how s" works. In interpretation state (when Gforth reads your file or you type at the interactive terminal), it does allocate memory so that you get a string on the stack. But when s" is compiled into a word, the string is copied into existing dictionary space with allot at compile time, and executing the word allocates nothing.

Gforth 0.7.3
see s" 
  34 parse save-mem ;            \ interpret
  34 parse  POSTPONE SLiteral ;  \ compile

see save-mem 
  swap >r dup allocate throw swap 2dup r> -rot move ;

see SLiteral 
  tuck 2>r  POSTPONE AHEAD here 2r> mem, align >r  POSTPONE THEN r>  POSTPONE Literal  POSTPONE Literal ;

see mem, 
  here over allot swap move ;

POSTPONE AHEAD compiles a forward branch over the string data, which mem, stores inline in the definition. The copy therefore happens only once, while the definition is being compiled; at execution the branch skips the stored characters and goes directly to the two literals, which push the address and length on the stack.

This means strings inlined in code are compiled in place and don't need to be freed.

s" This string should be freed"  \ on the heap

: hello  ( -- addr u )
  s" This one should not." ;     \ in dictionary space

s" is implementation defined

Some Forths reuse the same buffer for all their s" calls, while others give you access to two or three strings at the same time, after which the next one overwrites existing data.

So you should not take a s" string for granted and should copy it if you want to keep it.
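As a minimal sketch of "copy it if you want to keep it": the word names keep-string and release-string below are hypothetical, and the code assumes only the standard allocate, move and free words. In Gforth, save-mem (shown above) already does the same job as keep-string.

```forth
\ Copy a transient string into freshly allocated heap memory.
\ keep-string and release-string are hypothetical helper names.
: keep-string  ( c-addr u -- c-addr' u )
  dup allocate throw     ( c-addr u c-addr' )
  swap 2dup 2>r          ( c-addr c-addr' u ) ( R: c-addr' u )
  move 2r> ;             \ copy the characters, return the copy

\ Give the copy back to the heap once it is no longer needed.
: release-string  ( c-addr u -- )
  drop free throw ;

: demo  ( -- )
  s" transient" keep-string   \ private copy, safe from later s" calls
  2dup type cr
  release-string ;
demo
```

The copy is owned by the caller, so it still has to be freed (or registered for freeing) like any other allocated string.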

How to keep track of all strings allocated

The main issue is therefore not the use of s", but mostly s+ and slurp-file, which both call allocate internally.

I solved it using a so-called "free list". Every time I use s+ or slurp-file, I keep a reference to the returned pointer and store it in a linked list to be freed later.

The code

\ a simple linked-list keeping track of allocated strings

variable STRBUF-POINTER  \ the current head of the list
0 STRBUF-POINTER !

struct
  cell% field strbuf-prev  \ previous entry
  cell% field strbuf-addr  \ the string allocated
end-struct strbuf%

: add-strbuf  ( addr -- )
  strbuf% %alloc >r
  ( addr )         r@ strbuf-addr !
  STRBUF-POINTER @ r@ strbuf-prev !
  r> STRBUF-POINTER ! ;                \ become the new head

: (?free)  ( addr -- )
  dup if free throw else drop then ;

: free-strbuf  ( -- )   \ walk up the list and free strings
  begin
    STRBUF-POINTER @
  while
    STRBUF-POINTER @ >r
    r@ strbuf-addr @ (?free)           \ free the string
    r@ strbuf-prev @ STRBUF-POINTER !  \ prev becomes new head
    r> (?free)                         \ free the struct itself
  repeat ;

Usage

: my-s+  ( $1 $2 -- $3 )
  s+ over add-strbuf ;

: my-slurp-file  ( $path -- $content )
  slurp-file over add-strbuf ;

: main-process
  begin
    listen  \ wait for client request
    ( ... use my-s+ and my-slurp-file ... )
    send-response
    free-strbuf   \ we free everything we used
  again ;

This solution was enough to drastically reduce memory usage in my case. In some cases, though, you might want to improve it by implementing regions: instead of creating a new list element for every string, have the list keep track of big reusable buffers, as I described in solution (4).
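A region along the lines of solution (4) could be sketched as follows. This is not code from my server: every name here (REGION-SIZE, region-init, region-alloc, region-s+, region-reset) is hypothetical, the capacity is an arbitrary assumption, and it assumes Gforth's { } locals and a single request being served at a time.

```forth
\ Region (bump allocator) sketch. All names are hypothetical.
65536 constant REGION-SIZE          \ assumed capacity in bytes
variable region-base                \ start of the big buffer
variable region-here                \ next free address inside it

: region-init  ( -- )               \ allocate the buffer once
  REGION-SIZE allocate throw dup region-base ! region-here ! ;

: region-reset  ( -- )              \ "free" everything at once
  region-base @ region-here ! ;

: region-alloc  ( u -- addr )       \ reserve u bytes in the region
  region-here @ over +  region-base @ REGION-SIZE +  u>
  abort" region exhausted"
  region-here @  swap region-here +! ;

\ Like s+ but the result lives in the region, not on the heap.
: region-s+  { a1 u1 a2 u2 -- a3 u3 }
  u1 u2 + region-alloc { a3 }
  a1 a3 u1 move                     \ copy first string
  a2 a3 u1 + u2 move                \ append second string
  a3 u1 u2 + ;

region-init
: demo  ( -- )
  s" Hello " s" world" region-s+ type cr
  region-reset ;                    \ end of "request"
demo
```

Resetting the pointer after each response avoids calling allocate and free per request; the buffer itself only needs region-base @ free throw when the server shuts down. The sketch does not grow the region, so a response larger than REGION-SIZE aborts.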

I used the following approaches in memory management:

  1. Have multiple heaps; free a whole heap with all allocated memory.

    • For example, a heap per request; when a request has been served, the corresponding heap is freed. You have to be sure that you don't allocate an object that should live longer than the heap.
    • (I no longer use this approach.)
  2. Have multiple data spaces (each managed via its own here, allot and , words; NB: I use the same names in different namespaces); free a part of a data space (via allot with a negative argument).

    • For example, a special data space is designated for the operation of including files (with nesting). File related actions may allot memory in this data space, and this memory is automatically freed on the end of the file.
  3. Use callbacks. So you allocate memory, pass it to the callback, and then free it.

    • For example: for-filename-content ( sd.filename xt -- ) \ xt ( sd.content -- ) where the symbol sd means a string represented by a ( c-addr u ) pair. So the phrase s" /tmp/foo.txt" ['] type for-filename-content prints the content of the file /tmp/foo.txt.
  4. Introduce special named variables, so when you store a new value, the previous one is freed. This approach can be used for strings, file handles, etc.

    • In my case a defining word creates several words: one to get a value (getter), one to set a value (setter), and one to join a value. It looks like prop x creates the words x ( -- sd ), set-x ( sd -- ), and join-x ( sd -- ). So s" a" set-x s" b" join-x s" c" join-x x type prints "abc".
  5. Introduce a mechanism of events, providing a way to subscribe to an event, and fire an event. So each long-living object can bind its memory-freeing actions to the "cleanup" event.

  6. Introduce string interpolation (not about memory management, but convenience).

  7. Use buffering to join many small strings (not about memory management, but performance).

    • For example, for serialization of an XML document.
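The callback approach from item 3 can be sketched like this. Only the name and stack effect of for-filename-content come from the description above; the body is my assumption of how such a word might look in Gforth, using slurp-file for the allocation and catch so the buffer is freed even if the callback throws.

```forth
\ Sketch of a callback-style word: the caller never owns the buffer.
\ The body is an assumption; only the stack effect is given above.
: for-filename-content  ( c-addr u xt -- )  \ xt ( c-addr2 u2 -- )
  >r slurp-file          ( c-addr2 u2 ) ( R: xt )
  over r> swap >r        ( c-addr2 u2 xt ) ( R: c-addr2 )
  catch                  ( 0 | x x n )
  dup if nip nip then    ( ior )   \ drop leftovers after a throw
  r> free throw          \ always release the file buffer
  throw ;                \ re-raise the callback's error, if any

\ Usage, as in the example above (not executed here):
: show-foo  ( -- )
  s" /tmp/foo.txt" ['] type for-filename-content ;
```

The callback must not keep the address it receives, because the buffer is gone as soon as for-filename-content returns.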

An approach that I thought about but did not implement is the idea of ownership and scopes like in Rust. So, by default, all dynamic strings that are created in a word are freed on exit/throw. If you need to pass a string as a result or save it into a static variable, you have to mark it in some way to change its ownership/scope.
