I'm playing with the idea of writing a Roff-like text formatter in
Forth. The input is lines of text "blah blech, and this that the
other...". The text lines can be arbitrarily long so I don't want to
read the entire line into a memory buffer using something like REFILL.
Let's say I don't have to worry about individual words overflowing
memory though (segfault is not allowed, but it's ok to panic and quit).
So the main loop will be to copy an input word to the output buffer and
maybe flush the output buffer. The output buffer can be of fixed size.
Also, some input lines will be formatting commands like ".i\n" (change
font to italic). Those lines should be given to the Forth text
interpreter.
I guess I could use the FILE word set to write something like getc()
with its own buffering, but that seems messy. I'm wondering if this is
a common situation and there's an idiomatic solution.
No, it's not common. If you have too little memory, the idiomatic
solution is to limit the line length (see VFX64, even though it has
enough memory).
- anton
Why not? For something like Roff (or TeX or Markdown etc.) the whole
input file easily fits into RAM, so a line would fit, too. The
question is if the Forth system supports long lines in REFILL.
And here's the significance of REFILLing. You could pass everything to
the text interpreter, and install the following recognizer sequence:
First one that recognizes things like ".i\n", and second one that
recognizes everything and then processes the line as ordinary words.
As Anton has already noted, lxf uses this internally for parsing source files.
Mapping the file uses memory regions outside the current process.
It is like first allocating memory and then reading in the entire file.
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
> Why not? For something like Roff (or TeX or Markdown etc.) the whole
> input file easily fits into RAM, so a line would fit, too. The
> question is if the Forth system supports long lines in REFILL.

The target processor might not have that much RAM.
I was thinking of accommodating modern WYSIWYG editors which
don't have line breaks except at the end of paragraphs. Maybe that's
not worthwhile.
One obvious approach is to use READ-LINE, but this unfortunately seems
to throw away the newline at the end of the line read, so it's hard to
tell if a complete line has been read, or if the buffer has simply
gotten full.
Testing with gforth, if the buffer size is exactly the
line length, then FILE-POSITION points to just after the line, and the
next call to READ-LINE returns 0 chars. No idea about other Forths.
> And here's the significance of REFILLing. You could pass everything to
> the text interpreter, and install the following recognizer sequence:
> First one that recognizes things like ".i\n", and second one that
> recognizes everything and then processes the line as ordinary words.

I'll see if I can figure out how to do that, though the target Forth
might not have recognizers. What I wanted is a loop like

  LOOP
    READ a line;
    IF line begins with ".", then pass the line to the text interpreter;
    ELSE loop through the words on the line, copying them to the output
         buffer or maybe to the output device
  END LOOP

Getting words from the line should preferably use Forth's built-in
parser.
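
The loop described above could be sketched roughly as follows, using only standard words. This is a sketch under assumptions, not anyone's posted code: MAX-LINE, LINE-BUF, EMIT-WORD, COMMAND?, TEXT-LINE and FORMAT-FILE are made-up names, a Forth word called .i is assumed to exist for the ".i" command, and lines longer than MAX-LINE are simply processed in pieces (so a word may get split). It splits words by hand instead of using the built-in parser.

```forth
256 CONSTANT MAX-LINE                 \ assumed limit, not from the thread
CREATE LINE-BUF  MAX-LINE 2 + ALLOT   \ READ-LINE wants room for u1+2 chars

: .i ( -- ) ;                         \ hypothetical "italic" command, does nothing here

: EMIT-WORD ( c-addr u -- )  TYPE SPACE ;   \ placeholder for the real formatter

: COMMAND? ( c-addr u -- flag )       \ true if the line starts with "."
  IF C@ [CHAR] . = ELSE DROP FALSE THEN ;

: SKIP-BLANKS ( c-addr u -- c-addr' u' )    \ advance over blanks and controls
  BEGIN DUP WHILE OVER C@ BL > 0= WHILE 1 /STRING REPEAT THEN ;

: SKIP-WORD ( c-addr u -- c-addr' u' )      \ advance over non-blanks
  BEGIN DUP WHILE OVER C@ BL > WHILE 1 /STRING REPEAT THEN ;

: TEXT-LINE ( c-addr u -- )           \ pass each blank-delimited word on
  BEGIN SKIP-BLANKS DUP WHILE
    OVER >R  SKIP-WORD                \ R: start of word; stack: just past it
    OVER R@ -  R> SWAP  EMIT-WORD     \ word is ( start len )
  REPEAT 2DROP ;

: FORMAT-FILE ( fileid -- )
  >R
  BEGIN LINE-BUF MAX-LINE R@ READ-LINE THROW WHILE   \ leaves u2
    LINE-BUF SWAP
    2DUP COMMAND? IF EVALUATE ELSE TEXT-LINE THEN
  REPEAT
  DROP R> DROP ;

\ Usage (file name is only an example):
\   S" input.txt" R/O OPEN-FILE THROW  DUP FORMAT-FILE  CLOSE-FILE THROW
```

Handing command lines to EVALUATE means any Forth word can appear on a "." line, which is convenient but also means the input file is trusted.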
peter <peter.noreply@tin.it> writes:
> As Anton has already noted lxf uses this internally for parsing source files.
> Mapping the file uses memory regions outside the current process.
> It is like first allocating memory and then reading in the entire file

Yes mmap is a virtual memory thing though. My current thought is to use
READ-LINE with some fixed buffer size, but have two contiguous buffers
and call READ-LINE twice, to handle the case where a word is split
across two buffers. Then process all words (whitespace terminated)
until the last whitespace is found. Anything left gets copied back
to the beginning of the double buffer, before calling READ-LINE again.
It's not worth bothering with true double buffering.

"Might"? If you have a concrete target with small RAM (say, one of
the Mecrisp targets),
If you have a machine where you run a WYSIWYG editor, you also have
enough RAM for keeping one line (and probably the whole text).
READ-LINE and the case where u1=line length have been discussed
several times, so apparently it's not so clear to some how a system
should behave,
Alternatively, you could go for using READ-LINE. In that case I would
treat too-long lines as errors
That means going through INCLUDE, EVALUATE, EXECUTE-PARSING, or EXECUTE-PARSING-FILE at some point.
...
> READ-LINE and the case where u1=line length have been discussed
> several times, so apparently it's not so clear to some how a system
> should behave,

That sounds like a deficiency in the standard, but anyway yes, there are
ways to get around it.
On 20/11/2025 12:43 pm, Paul Rubin wrote:
> ...
>> READ-LINE and the case where u1=line length have been discussed
>> several times, so apparently it's not so clear to some how a system
>> should behave,
>
> That sounds like a deficiency in the standard, but anyway yes, there are
> ways to get around it.

When you say 'get around it', do you mean a broken line? If so, what can
be done with a broken line? AFAIK not much. I replaced 'flag' in
READ-LINE with a trinary and I'm now looking for uses :-)

dxf <dxforth@gmail.com> writes:
> When you say 'get around it', do you mean a broken line?

The trouble distinguishing between a broken and an unbroken line when
u1=line length. I think for roff though, it's ok to abandon the wish to
handle arbitrarily long lines. The deficiency in the standard is not
explaining READ-LINE's exact behaviour in this situation. I was able to
experimentally resolve the issue in gforth, but other Forths might vary.
I had for a while liked the idea of running the entire input document
through the text interpreter (wordlists or some other scheme would stop
non-formatting-commands from being looked up as Forth words). But I
later mostly lost interest in that.
I guess in the text interpreter, the interpreter is at the beginning of
a line iff >IN @ gives 0, and maybe Roff could use that.

Paul Rubin <no.email@nospam.invalid> writes:
> dxf <dxforth@gmail.com> writes:
>> When you say 'get around it', do you mean a broken line?
>
> The trouble distinguishing between a broken and an unbroken line when
> u1=line length.

There is no such trouble in standard systems. Such a line will be
broken on such a system.

> The deficiency in the standard is not
> explaining READ-LINE's exact behaviour in this situation.

The behaviour is specified exactly:

|If a line terminator was received before u1 characters were read, then
|u2 is the number of characters, not including the line terminator,
|actually read [...]. When u1 = u2 the line terminator has
|yet to be reached.

So the first sentence tells you what happens if line length < u1. And
the second sentence tells you what happens if line length >= u1.

The deficiency in the standard is in the part that I elided: It says:
(0<=u2<=u1). It does not really contradict that text, but it has
misled a number of people (including you) into thinking that the first
sentence also includes line length = u1. And the many questions about
this issue show this deficiency.

- anton
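
Read that way, the three possible outcomes of one READ-LINE call can be told apart from u1, u2 and the flag alone. A minimal sketch, assuming a system that follows the quoted wording; the names AT-EOF, WHOLE-LINE, BROKEN-LINE, LINE-STATUS and READ-CHUNK are made up for illustration and are not standard words (nor the "trinary" mentioned earlier in the thread):

```forth
0 CONSTANT AT-EOF        \ flag was false: nothing more to read
1 CONSTANT WHOLE-LINE    \ u2 < u1: the line ended within the buffer
2 CONSTANT BROKEN-LINE   \ u2 = u1: terminator has yet to be reached

: LINE-STATUS ( u1 u2 flag -- status )
  0= IF 2DROP AT-EOF EXIT THEN
  = IF BROKEN-LINE ELSE WHOLE-LINE THEN ;

\ READ-LINE wrapper; the buffer at c-addr should have room for u1+2 chars.
: READ-CHUNK ( c-addr u1 fileid -- u2 status )
  OVER >R                 \ keep u1 for the comparison
  READ-LINE THROW         \ u2 flag
  OVER SWAP               \ u2 u2 flag
  R> ROT ROT              \ u2 u1 u2 flag
  LINE-STATUS ;
```

With this reading, a line of exactly u1 characters comes back as BROKEN-LINE, and the following call returns u2 = 0 with WHOLE-LINE, which matches the gforth observation reported earlier in the thread.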
In article <2025Nov21.091552@mips.complang.tuwien.ac.at>,
Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
> The behaviour is specified exactly:
>
> |If a line terminator was received before u1 characters were read, then
> |u2 is the number of characters, not including the line terminator,
> |actually read [...]. When u1 = u2 the line terminator has
> |yet to be reached.

The problem is that it builds on the '70s CORE and ideas. It is awkward.

On 18-11-2025 00:25, Paul Rubin wrote:
> I'm playing with the idea of writing a Roff-like text formatter in
> Forth. The input is lines of text "blah blech, and this that the
> other...". The text lines can be arbitrarily long so I don't want to
> read the entire line into a memory buffer using something like REFILL.
> Let's say I don't have to worry about individual words overflowing
> memory though (segfault is not allowed, but it's ok to panic and quit).
> So the main loop will be to copy an input word to the output buffer and
> maybe flush the output buffer. The output buffer can be of fixed size.
> Also, some input lines will be formatting commands like ".i\n" (change
> font to italic). Those lines should be given to the Forth text
> interpreter.
> I guess I could use the FILE word set to write something like getc()
> with its own buffering, but that seems messy. I'm wondering if this is
> a common situation and there's an idiomatic solution.

I don't know if it's "idiomatic", but it works. In essence, it reads the
file binary. If there is something left at the end of the buffer, it
copies that to the start, adjusts the buffer address and size and
continues. It doesn't return a word per call; you open the file and it
applies a quotation to each word parsed (a n --).
No, it's not beautiful, but it works. BTW, if you happen to be German
and your prose contains words that exceed 256 characters, you're on your
own.
Hans Bezemer
In article <87o6oyrc7i.fsf@nightsong.com>,
Paul Rubin <no.email@nospam.invalid> wrote:
> peter <peter.noreply@tin.it> writes:
>> As Anton has already noted lxf uses this internally for parsing source files
>> Mapping the file uses memory regions outside the current process.
>> It is like first allocating memory and then reading in the entire file
>
> Yes mmap is a virtual memory thing though. My current thought is to use
> READ-LINE with some fixed buffer size, but have two contiguous buffers
> and call READ-LINE twice, to handle the case where a word is split
> across two buffers. Then process all words (whitespace terminated)
> until the last whitespace is found. Anything left gets copied back
> to the beginning of the double buffer, before calling READ-LINE again.
> It's not worth bothering with true double buffering.

The ciforth approach is more sensible. The terminal input buffer is
filled from the input stream. It is large say 16K.
Now you carve lines out of the buffer, and use a parse pointer,
maybe >IN. As soon as you find that there are no more line endings
in the remaining buffer, you copy the remainder to the start of
the buffer and fill the buffer to the brim.
In this case you will not copy more than is necessary.
On Wed, 19 Nov 2025 11:22:08 +0100
albert@spenarnc.xs4all.nl wrote:
> In article <87o6oyrc7i.fsf@nightsong.com>,
> Paul Rubin <no.email@nospam.invalid> wrote:
> The ciforth approach is more sensible. The terminal input buffer is
> filled from the input stream. It is large say 16K.
> Now you carve lines out of the buffer, and use a parse pointer,
> maybe >IN. As soon as you find that there are no more line endings
> in the remaining buffer, you copy the remainder to the start of
> the buffer and fill the buffer to the brim.
> In this case you will not copy more than is necessary.

A good approach, IMHO.
But how do you deal with the case where there's no line ending at EoF?
ISTM an additional action is required for the last line.
----
Bah, and indeed Humbug.
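
For illustration, the carve-and-refill scheme discussed above, including the extra step for a final line without a line ending, might look something like this in standard Forth. It is only a sketch and not ciforth's actual code: BUF-SIZE, BUF, FILLED, HANDLE-LINE, FIND-LF, CARVE, FILL-BUF and EACH-LINE are invented names, the buffer is smaller than the 16K suggested, lines are assumed to end in LF (a CR before it would be passed along), and a line that does not fit the buffer aborts.

```forth
4096 CONSTANT BUF-SIZE              \ assumed size for this sketch
CREATE BUF  BUF-SIZE ALLOT
VARIABLE FILLED   0 FILLED !        \ number of valid chars in BUF

: HANDLE-LINE ( c-addr u -- )  TYPE CR ;   \ placeholder consumer

: FIND-LF ( c-addr u -- c-addr' u' )  \ advance to the first LF; u'=0 if none
  BEGIN DUP WHILE OVER C@ 10 <> WHILE 1 /STRING REPEAT THEN ;

: CARVE ( -- )   \ pass every complete line on, move the tail to the start
  BUF FILLED @                      \ a u : unprocessed part
  BEGIN 2DUP FIND-LF DUP WHILE      \ a u a' u' : a' points at an LF
    2SWAP DROP                      \ a' u' a
    2 PICK OVER -  HANDLE-LINE      \ line is ( a  a'-a )
    1 /STRING                       \ step over the LF
  REPEAT
  2DROP                             \ a u : tail with no line ending yet
  DUP FILLED !  BUF SWAP MOVE ;

: FILL-BUF ( fileid -- u )          \ top BUF up, return chars read
  >R  BUF FILLED @ +  BUF-SIZE FILLED @ -  R> READ-FILE THROW
  DUP FILLED +! ;

: EACH-LINE ( fileid -- )
  0 FILLED !
  BEGIN DUP FILL-BUF WHILE
    CARVE
    FILLED @ BUF-SIZE = ABORT" line longer than buffer"
  REPEAT DROP
  FILLED @ IF BUF FILLED @ HANDLE-LINE THEN ;  \ last line without LF
```

The final line of EACH-LINE is the additional action asked about: whatever is left in the buffer at end of file is handed over as the last line.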