• Using file descriptors

    From Luis Mendes@luisXXXlupe@gmail.com to comp.lang.awk on Thu Nov 6 08:35:14 2025
    From Newsgroup: comp.lang.awk

    Hi all,

    I've built a small gawk script that is intended to provide the user with three outputs, all in the END block.
    This is running on Linux.
    Preferably, I'd like a solution that is not gawk-specific.

    The script is invoked as:
    find . -name '*.yml' | xargs awk -f script.awk

    Schematics of the script:
    do some stuff for some lines of each file

    END {
    do some stuff
    print "title1"
    for - some cycle
    print "details"

    print "title2"
    for - another cycle
    printf("%12s,%6s\n", k_arr[1], somefunc(k_arr[2])) | "sort -t -k2,2"

    print "title3"
    for - third cycle
    print detail
    }

    As it is, it prints everything from the first for loop, then the title of the second, then the third loop, and only afterwards the details of the second loop.
    From what I have searched and understood, the sort is done after all of the printfs.

    So I thought of using file descriptors to get the prints in order, and
    modified the printf line to:
    printf("%12s,%6s\n", k_arr[1], somefunc(k_arr[2])) | "sort -t -k2,2 | cat>&10"

    And added, below it:
    while ((getline < "/dev/fd/10") >0)
    print $0


    But I get either "bad file descriptor" or "file descriptor not found".
    What should be modified?

    Another question: since some file descriptors are already in use, how do I find one that is free to use?

    Thanks,


    Luís Mendes
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Thu Nov 6 10:00:07 2025

    On 06.11.2025 09:35, Luis Mendes wrote:
    [...]

    This doesn't literally answer your question, but I think it is a better alternative...

    After the loop that feeds the "sort" command, and before "title3", close
    that output channel of the sort pipe...
    (I'm using a variable for the "sort" command to simplify that.)

    BEGIN { cmd = "sort -t -k2,2" ; ... }
    ...
    END { ...
    for - another cycle
    printf "%12s,%6s\n", k_arr[1], somefunc(k_arr[2]) | cmd

    close (cmd) # <<<<<<<<

    print "title3"
    ...
    }

    (I think this is standard Awk, not a GNU Awk feature. - CMIIW.)
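A runnable sketch of the close() fix in plain POSIX awk, assuming made-up "key,value" input, a comma as the sort separator and a hypothetical function name three_sections. Note that sort's -t option takes the next word as its separator, so "sort -t -k2,2" as it appears in this thread would hand sort "-k2,2" as the separator; the comma here is an assumption chosen to match the "%12s,%6s" format. The fflush() call is there because awk and sort share the same standard output: it pushes awk's own buffered lines out before sort gets a chance to write.

```shell
#!/bin/sh
# Sketch only: made-up input, comma separator and sort key 2 are assumptions.
three_sections() {
  printf 'b,2\na,3\nc,1\n' |
  awk -F, '
    { k[NR] = $1; v[NR] = $2 }        # main rule: just collect the fields
    END {
      cmd = "sort -t, -k2,2"          # one variable, so one pipe handle
      print "title1"
      for (i = 1; i <= NR; i++) print "detail " i
      print "title2"
      fflush()                        # flush awk stdout before sort can write
      for (i = 1; i <= NR; i++) printf "%s,%s\n", k[i], v[i] | cmd
      close(cmd)                      # sort gets EOF, prints and exits here
      print "title3"
      print "done"
    }'
}

three_sections
```

Running it should print title1, the three detail lines, title2, the rows ordered by their second field (c,1 then b,2 then a,3), and only then title3 and done.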

    HTH.

    Janis


  • From gazelle@gazelle@shell.xmission.com (Kenny McCormack) to comp.lang.awk on Thu Nov 6 17:17:49 2025

    In article <690c5dc2$0$665$14726298@news.sunsite.dk>,
    Luis Mendes <luisXXXlupe@gmail.com> wrote:
    [...]

    I think Janis has the right idea here - which is that you have to close the sort command before it will generate any output. In your script, without
    an explicit close(), the sort gets closed automatically when AWK exits,
    which results in the output coming out then (after everything else has
    already been printed by the AWK script).

    This is kind of the "first principle" of "sort" - the fact that it cannot generate any output until it is sure that it has read all the input. Note
    that in GAWK, "sort" is the "poster child" for why there is an optional
    second arg to close() - so that you can run "sort" as a co-process and
    close the input side - allowing "sort" to now generate output - without
    closing the output side. This allows the GAWK program to read back the
    output from "sort".
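A sketch of the gawk co-process variant described here, assuming gawk is installed (the |& operator and the two-argument close() are gawk extensions, not POSIX awk). The input words and the function name coprocess_sort are made up for illustration.

```shell
#!/bin/sh
# gawk-only sketch: run sort as a co-process and read its output back.
coprocess_sort() {
  printf 'pear\napple\nfig\n' |
  gawk '
    { words[NR] = $0 }                     # collect the input lines
    END {
      cmd = "sort"
      for (i = 1; i <= NR; i++)
        print words[i] |& cmd              # write to sort via the two-way pipe
      close(cmd, "to")                     # close only sort input: sort sees EOF
      while ((cmd |& getline line) > 0)    # read the sorted lines back
        print "sorted:", line
      close(cmd)                           # close the remaining (read) side
      print "after sort"
    }'
}
```

Calling coprocess_sort should print "sorted: apple", "sorted: fig", "sorted: pear" and then "after sort": since gawk itself prints the sorted lines, their position in the output is fixed by the program order.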
    --
    "Every time Mitt opens his mouth, a swing state gets its wings."

    (Should be on a bumper sticker)
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Thu Nov 6 23:50:15 2025

    On 06.11.2025 18:17, Kenny McCormack wrote:
    [...]

    Just be aware that the effect can be seen not only with external
    commands that wait for all input (like sort) but also with commands
    that are able to output their processed input immediately; e.g. in

    { print | "cat -" }
    END { print "END" }

    if fed with, say, 'seq 10' you will first see the "END". Awk just
    cannot know whether another output will subsequently get fed into
    that pipe, as in

    { print | "cat -" }
    END { print "END"
    print "PS" | "cat -" }

    so it has to wait until the program terminates unless explicitly
    closed.

    So, to get correctly sequenced output when _external_ commands are
    triggered, a close() seems mandatory.
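The "cat -" example with an explicit close() added to END, as a sketch; seq 3 stands in for real input and cat_then_end is a hypothetical name.

```shell
#!/bin/sh
# Sketch: close the pipe in END so its output comes before "END".
cat_then_end() {
  seq 3 | awk '
    { print | "cat -" }      # every input line goes to cat through a pipe
    END {
      close("cat -")         # flush and close the pipe; wait for cat to exit
      print "END"            # so END really is printed last
    }'
}

cat_then_end
```

It should print 1, 2 and 3 on separate lines, followed by END.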

    The insight is, IMO, that the piped command string acts like
    a _handle_ (associated with some file descriptor), and if the same
    handle appears somewhere else in the program, it addresses the
    same channel; e.g. in this simple case

    { print | "sort -r" }
    { print | "sort -r" }

    the printed output is processed by the identical "sort -r" command.
    Whereas if your handle differs, as in

    { print | "sort -r" }
    { print | "sort  -r" }

    you'll get the output of two instances. - That's also one reason
    why a variable should be used instead of a literal string when
    you refer to the same command instance; it avoids typos that
    could change the operational semantics.
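A sketch of the handle point: awk identifies a pipe by its exact command string, so two strings that differ only by a trailing space (an assumed example of such a typo) start two separate sort processes, while a shared variable guarantees a single one. The function names two_handles and one_handle are hypothetical.

```shell
#!/bin/sh
# Two spellings of "the same" command are two handles: two sort processes.
two_handles() {
  printf 'b\na\n' | awk '
    { print | "sort" }        # handle 1: the string "sort"
    { print | "sort " }       # handle 2: trailing space, a different string
    END { close("sort"); close("sort ") }'
}

# With a variable, both rules use one handle, so one sort sees every line.
one_handle() {
  printf 'b\na\n' | awk '
    BEGIN { cmd = "sort" }
    { print | cmd }
    { print | cmd }
    END { close(cmd) }'
}

two_handles   # each sort sorts its own copy: a b a b
one_handle    # a single sort gets all four lines: a a b b
```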

    Janis

  • From Luis Mendes@luisXXXlupe@gmail.com to comp.lang.awk on Thu Nov 6 23:02:09 2025

    On Thu, 6 Nov 2025 23:50:15 +0100, Janis Papanagnou wrote:

    [...]

    Thank you very much, Janis and Kenny, for your help.
    Piping just to the sort command (without the file descriptor
    redirection) and adding a close() of the sort command solved the issue.
    Yes, it's better to use a variable instead of typing the same string twice.

    All the best,

    Luís
  • From Ed Morton@mortonspam@gmail.com to comp.lang.awk on Sun Nov 9 10:03:42 2025

    On 11/6/2025 2:35 AM, Luis Mendes wrote:
    [...]

    As has already been pointed out, you can solve this specific problem by
    just closing the pipe to `sort` after the loop in which you use it.
    FYI, though, a general approach to producing output sorted in various
    ways, which doesn't require you to spawn a subshell from awk to call sort
    (thereby letting awk focus on what it does best, manipulating text, while
    the shell does what it does best, sequencing calls to tools) and which
    also solves this problem, is the Decorate-Sort-Undecorate idiom
    (https://rosettacode.org/wiki/Decorate-sort-undecorate_idiom), e.g.
    untested:

    awk ' # Decorate
        ...
        END {
            do some stuff
            OFS = "-"
            sectNr = 0
            lineNr = 0
            print ++sectNr, ++lineNr, "title1"
            for - some cycle
                print sectNr, ++lineNr, "details"

            print ++sectNr, 0, "title2"
            for - another cycle
                print sectNr, 0, sprintf("%12s,%6s", k_arr[1], somefunc(k_arr[2]))

            lineNr = 0
            print ++sectNr, ++lineNr, "title3"
            for - third cycle
                print sectNr, ++lineNr, "details"
        }
    ' |
    sort -t- -k1,1n -k2,2n -k4,4 |   # Sort
    cut -d- -f3-                      # Undecorate

    The first awk command decorates the output by prefixing it with:

    a) a section number for each section of output you apparently want so we
    can initially sort on that to keep the sections in order, and
    b) a line number within the first and third sections to keep those lines
    in that order when sorted, and
    c) the same line number, 0, for all lines within the middle section as
    we want to sort that by the subsequent content, not by the original
    order of the output lines in that section.

    The sort command then sorts by the section number, the line numbers
    (which only affect the first and third sections), then the content
    (which only affects the middle section).

    The cut command then removes the section and line numbers added by the
    first awk.

    That will work with any awk, sort, and cut (or you can replace cut with
    a second awk if you prefer).
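A runnable sketch of this Decorate-Sort-Undecorate pipeline, with made-up "name-number" input lines and a hypothetical function name dsu_demo. Two choices differ from the untested version, and both are assumptions: the middle title gets line number 0 and its rows get 1, so the title can never be sorted in among its rows, and key 4 is compared numerically (-k4,4n) because the assumed sort key is a number.

```shell
#!/bin/sh
# Decorate-Sort-Undecorate sketch; the input data is invented.
dsu_demo() {
  printf 'x-3\ny-1\nz-2\n' |
  awk '
    { key[NR] = $0 }                      # collect rows like "name-number"
    END {
      OFS = "-"                           # decoration fields joined by "-"
      sect = 0
      print ++sect, 1, "title1"           # section 1 keeps its original order
      for (i = 1; i <= NR; i++) print sect, i + 1, "detail " i
      print ++sect, 0, "title2"           # title 0, rows 1: rows sort by content
      for (i = 1; i <= NR; i++) print sect, 1, key[i]
      print ++sect, 1, "title3"           # section 3 keeps its original order
      print sect, 2, "done"
    }' |
  sort -t- -k1,1n -k2,2n -k4,4n |   # Sort: section, line number, then field 4
  cut -d- -f3-                      # Undecorate: drop the two added fields
}

dsu_demo
```

dsu_demo should print title1, the three detail lines, title2, then y-1, z-2 and x-3 (ordered by the number after the dash), then title3 and done.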

    Ed.


