Forum: War Ensemble BBS

Using file descriptors

From Luis Mendes@luisXXXlupe@gmail.com to comp.lang.awk on Thu Nov 6 08:35:14 2025

From Newsgroup: comp.lang.awk

Hi all,

I've built a small gawk script that is intended to give the user three outputs, all in the END block.
This is running on Linux.
Preferably, I'd like a solution that is not specific to gawk.

The script is invoked as:
find . -name '*.yml' | xargs awk -f script.awk

Schematics of the script:
do some stuff for some lines of each file

END {
    do some stuff
    print "title1"
    for - some cycle
        print "details"

    print "title2"
    for - another cycle
        printf("%12s,%6s\n", k_arr[1], somefunc(k_arr[2])) | "sort -t -k2,2"

    print "title3"
    for - third cycle
        print detail
}

As it is, it prints everything from the first for cycle, the title of the second, the third cycle, and only afterwards the details of the second cycle.
As far as I have searched and understood, the sort is only done after all of the printfs.

So I thought of using file descriptors to get the prints in order.
I modified the printf line to:
printf("%12s,%6s\n", k_arr[1], somefunc(k_arr[2])) | "sort -t -k2,2 | cat>&10"

And added below it:
while ((getline < "/dev/fd/10") > 0)
    print $0

But I get either "bad file descriptor" or "file descriptor not found".
What should be modified?

Another question: since some file descriptors are already in use, how do I find one that is free to be used?

Thanks,

Luís Mendes
--- Synchronet 3.21a-Linux NewsLink 1.2

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Thu Nov 6 10:00:07 2025

This is not literally answering your question, but I think a better alternative...

After the "sort" command, before the "title 3", close that output
channel for the sort pipe...
(I'm using a variable for the "sort" command to simplify that.)

BEGIN { cmd = "sort -t -k2,2" ; ... }
...
END { ...
for - another cycle
printf "%12s,%6s\n", k_arr[1], somefunc(k_arr[2]) | cmd

close (cmd) # <<<<<<<<

print "title3"
...
}

(I think this is standard Awk, not a GNU Awk feature. - CMIIW.)
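
A minimal runnable sketch of the close() approach described above. The sample input, the function name close_demo, and the comma delimiter in `sort -t, -k2,2n` are assumptions made for illustration (the delimiter is guessed from the "%12s,%6s" format); none of them come from the original script.

```sh
# Hypothetical demo: sorted details appear between the two titles
# because the pipe to sort is closed before "title3" is printed.
close_demo() {
    printf 'b,2\na,3\nc,1\n' | awk -F, '
    { val[$1] = $2 }                    # collect one value per key
    END {
        cmd = "sort -t, -k2,2n"         # assumed delimiter: comma
        print "title2"
        for (k in val)                  # arbitrary order, sort fixes it
            printf "%s,%s\n", k, val[k] | cmd
        close(cmd)                      # sort sees EOF and writes its output here
        print "title3"
    }'
}
close_demo
```

Without the close(cmd) line the sorted details would be expected to show up only after "title3", which is the symptom reported in the question.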

HTH.

Janis


From gazelle@gazelle@shell.xmission.com (Kenny McCormack) to comp.lang.awk on Thu Nov 6 17:17:49 2025


In article <690c5dc2$0$665$14726298@news.sunsite.dk>,
Luis Mendes <luisXXXlupe@gmail.com> wrote:

> Hi all,
>
> I've built a small gawk script that is intended to provide the user three
> outputs, all at the END block.
> This is running in Linux.
> Preferably, I'd like some non-specific gawk awk.
>
> The script is invoked as:
> find . -name '*.yml' | xargs awk -f script.awk
>
> Schematics of the script:
> do some stuff for some lines of each file

I think Janis has the right idea here - which is that you have to close the sort command before it will generate any output. In your script, without
an explicit close(), the sort gets closed automatically when AWK exits,
which results in the output coming out then (after everything else has
already been printed by the AWK script).

This is kind of the "first principle" of "sort" - the fact that it cannot generate any output until it is sure that it has read all the input. Note
that in GAWK, "sort" is the "poster child" for why there is an optional
second arg to close() - so that you can run "sort" as a co-process and
close the input side - allowing "sort" to now generate output - without
closing the output side. This allows the GAWK program to read back the
output from "sort".
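
A hedged sketch of the co-process variant mentioned above. It needs gawk (the `|&` operator and the two-argument close() are gawk extensions); the sample words and the function name sorted_via_coprocess are made up for illustration.

```sh
# Hypothetical demo: run sort as a gawk co-process, close only the
# write end, then read the sorted lines back into the gawk program.
sorted_via_coprocess() {
    command -v gawk >/dev/null 2>&1 || { echo "gawk not found" >&2; return 0; }
    printf 'pear\napple\nfig\n' | gawk '
    BEGIN { cmd = "sort" }
    { print |& cmd }                    # |& opens a two-way pipe (gawk only)
    END {
        close(cmd, "to")                # close the input side: sort sees EOF
        while ((cmd |& getline line) > 0)
            print "sorted: " line       # read the output of sort back in
        close(cmd)                      # now shut the co-process down fully
    }'
}
sorted_via_coprocess
```

Because the sorted lines come back through gawk itself, the program decides exactly where they go in its output, at the cost of portability to other awks.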
--
"Every time Mitt opens his mouth, a swing state gets its wings."

(Should be on a bumper sticker)

From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Thu Nov 6 23:50:15 2025

Just be aware that the effect can be seen not only with external
commands that wait for all input (like sort) but also with commands
that are able to output their processed input immediately; e.g. in

{ print | "cat -" }
END { print "END" }

if fed with, say, 'seq 10' you will first see the "END". Awk just
cannot know whether another output will subsequently get fed into
that pipe, as in

{ print | "cat -" }
END { print "END"
print "PS" | "cat -" }

so it has to wait until the program terminates unless explicitly
closed.

So for correct sequencing of the output in cases where _external_
commands are triggered, a close() seems mandatory.

The insight is, IMO, that the piped command string is acting like
a _handle_ (associated with some file descriptor), and if there's
somewhere in the program another such handle it's addressing the
same channel; e.g. in this simple case

{ print | "sort -r" }
{ print | "sort -r" }

the printed output is processed by the identical "sort -r" command.
Whereas if your handle differs, as in

{ print | "sort -r" }
{ print | "sort  -r" }    # two spaces: a different command string

you'll get the output of two instances. - That's also one reason
why a variable should be used instead of a literal string
if you refer to the identical command instance; it avoids typos
that could change the operational semantics.
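
A small sketch of the handle behaviour described above, assuming nothing beyond a POSIX awk and sort. The function names and the three-line input are hypothetical, and the explicit close() calls are added here only so that the order of the two outputs is fixed.

```sh
# One command string, one pipe: both rules feed the same sort process.
same_handle() {
    printf '1\n2\n3\n' | awk '
    BEGIN { cmd = "sort -r" }
    { print | cmd }
    { print | cmd }
    END { close(cmd) }'
}

# Two different strings (note the extra space), two sort processes.
two_handles() {
    printf '1\n2\n3\n' | awk '
    BEGIN { a = "sort -r"; b = "sort  -r" }
    { print | a }
    { print | b }
    END { close(a); close(b) }'
}

same_handle    # one sorted stream holding every line twice
two_handles    # two separate sorted streams, one after the other
```

Holding the command in a variable, as in same_handle, means the string used for print and for close() cannot drift apart.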

Janis


From Luis Mendes@luisXXXlupe@gmail.com to comp.lang.awk on Thu Nov 6 23:02:09 2025

Thank you very much Janis and Kenny for your help.
Doing just the sort command without the pipe stuff and adding a close of
the sort command solved the issue.
Yes, it's better to use a variable instead of typing a string twice.

All the best,

Luís

From Ed Morton@mortonspam@gmail.com to comp.lang.awk on Sun Nov 9 10:03:42 2025

As has already been pointed out you can solve this specific problem by
just closing the pipeline to `sort` after the loop in which you use it,
but FYI a general approach to producing output sorted in various ways
that doesn't require you to spawn a subshell from awk to call sort
(thereby letting awk focus on what it does best, manipulate text, while
the shell does what it does best, sequence calls to tools) and also
solves this problem is the Decorate-Sort-Undecorate idiom (https://rosettacode.org/wiki/Decorate-sort-undecorate_idiom), e.g.
untested:

awk ' # Decorate
...
END {
    do some stuff
    OFS = "-"
    sectNr = 0
    lineNr = 0
    print ++sectNr, ++lineNr, "title1"
    for - some cycle
        print sectNr, ++lineNr, "details"

    print ++sectNr, 0, "title2"
    for - another cycle
        print sectNr, 1, sprintf("%12s,%6s", k_arr[1], somefunc(k_arr[2]))

    lineNr = 0
    print ++sectNr, ++lineNr, "title3"
    for - third cycle
        print sectNr, ++lineNr, "details"
}
' |
sort -t- -k1,1n -k2,2n -k3 | # Sort
cut -d- -f3- # Undecorate

The first awk command decorates the output by prefixing it with:

a) a section number for each section of output you apparently want so we
can initially sort on that to keep the sections in order, and
b) a line number within the first and third sections to keep those lines
in that order when sorted, and
c) line number 0 for the title of the middle section and the same line
number, 1, for all of its detail lines, as we want the title to stay
first and the details to be sorted by the subsequent content, not by
the original order of the output lines in that section.

The sort command then sorts by the section number, the line numbers
(which only affect the first and third sections), then the content
(which only affects the middle section).

The cut command then removes the section and line numbers added by the
first awk.

That will work with any awk, sort, and cut (or you can replace cut with
a second awk if you prefer).
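
A self-contained sketch of the Decorate-Sort-Undecorate pipeline, for anyone who wants something to run. The input data, the function name dsu_demo, and the section contents are invented for illustration; the middle section uses line number 0 for its title and 1 for every detail line, so the details are ordered by their content.

```sh
# Hypothetical demo: awk decorates each line as section-line-content,
# sort orders the lines, cut strips the decoration again.
dsu_demo() {
    printf 'b 2\na 3\nc 1\n' |
    awk '                                   # Decorate
    { val[$1] = $2 }
    END {
        OFS = "-"
        print 1, 0, "title1"
        print 1, 1, "first section detail"
        print 2, 0, "title2"
        for (k in val)                      # arbitrary order, sort fixes it
            print 2, 1, sprintf("%s,%s", k, val[k])
        print 3, 0, "title3"
    }' |
    sort -t- -k1,1n -k2,2n -k3 |            # Sort
    cut -d- -f3-                            # Undecorate
}
dsu_demo
```

This assumes the section and line numbers never contain the "-" delimiter; the content itself may, since `-k3` and `-f3-` both run to the end of the line.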

Ed.
