• Re: Reverse Polish Notation Parser

    From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Tue Nov 7 14:02:27 2023
    From Newsgroup: comp.lang.awk

    On 07.11.2023 12:48, Ed Morton wrote:
    On 11/6/2023 5:14 AM, Janis Papanagnou wrote:

    switch (some_variable) {
    case 42: ...
    case "string": ...
    case /pattern/: ...

    Nitpick - it's

    And I expected the nitpick on my typo using '=' (instead of '==') in
    the 'if' comparison. ;-)


    case /regexp/:

    rather than:

    case /pattern/

    The word "pattern" is ambiguous and misused all over awk documentation.

    You are right that the historic (and, sadly, surviving) documentation
    speaks about 'pattern' and '/regexp/'.

    You can make some argument for it in:

    pattern { action }

    since that includes `BEGIN`, integers, etc. I'd argue that should be:

    condition { action }

    Decades ago I was the first one suggesting the term 'condition' here so
    you certainly don't need to teach me. (Myself I was using that term in
    my Awk courses ever since the 1990s.)


    but in the "case" statement above what goes inside `/.../` is simply and always a regexp.

    Yes. Sorry for my sloppiness here. Thanks for the nitpick.
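
For readers without gawk, the three kinds of case label discussed above (number, string, regexp) can be approximated with a plain if/else chain. This is only a minimal sketch: the function name classify and the sample regexp are made up for illustration, and it is not what gawk or any preprocessor actually generates.

```shell
# Hypothetical helper: classify one value the way the switch sketch would.
classify() {
    printf '%s\n' "$1" | awk '
    # Portable stand-in for the three case kinds; each branch ends the chain,
    # so there is no fallthrough.
    function classify(v) {
        if (v == 42)            return "number 42"
        else if (v == "string") return "literal string"
        else if (v ~ /^ab+c$/)  return "regexp match"
        return "default"
    }
    { print classify($0) }'
}

classify abbc
```

What goes between the slashes in the third branch is a regexp, matched with the ~ operator, which is the portable counterpart of a /regexp/ case label.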


    [...]


    One point you may want to consider is the trim() function; the
    two substitutions can be combined in one

    sub (/^[[:space:]]+(.*)[[:space:]]+$/, "&", str)

    (but test that in your Awk versions before using it; "&" is an
    old feature but off the top of my head I'm not sure whether the
    subexpression with parenthesis /...(...).../ is generally
    supported in other Awks).

    You can write that, but it's not a capture group that can be
    backreferenced from the replacement and if it was "&" wouldn't refer to
    the string that matched ".*" anyway, it'd refer to the string that
    matched the whole regexp.

    Using my Awk it does what is advertised. I merely didn't find it clearly
    documented whether the '&' is generally guaranteed to refer to the
    grouping.


    You could use a capture group in GNU awk for gensub():

    I deliberately abstained from gensub() here since the OP avoids GNU
    Awk specifics.


    str = gensub (/^[[:space:]]+(.*)[[:space:]]+$/, "\\1", 1, str)

    and in most awks you could do:

    gsub (/^[[:space:]]+|[[:space:]]+$/, "", str)

    Yes, this is a sensible alternative.


    but there are some out there that will fail to do both substitutions for
    that case (tawk and nawk maybe?) and so you need:

    Really? But why? - Alternatives in regexp is certainly an old feature
    (at least since nawk in the 1980's).


    sub(/^[[:space:]]+/, "", str)
    sub(/[[:space:]]+$/, "", str)
    [...]
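
The two-call form can be wrapped in a small function. A minimal sketch, assuming a shell wrapper named trim_demo (hypothetical) and using [ \t] instead of [[:space:]] in case an older awk lacks POSIX character classes:

```shell
# Hypothetical wrapper: print the argument with surrounding blanks removed.
trim_demo() {
    printf '%s\n' "$1" | awk '
    # Strip leading, then trailing, blanks and tabs with two anchored sub() calls.
    function trim(s) {
        sub(/^[ \t]+/, "", s)
        sub(/[ \t]+$/, "", s)
        return s
    }
    { print "<" trim($0) ">" }'
}

trim_demo '   foo bar  '
```

Each sub() has a single anchored regexp, so this does not depend on how a given awk handles alternation of two anchored branches inside gsub().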

    Janis

    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Janis Papanagnou@janis_papanagnou+ng@hotmail.com to comp.lang.awk on Tue Nov 7 14:37:40 2023
    From Newsgroup: comp.lang.awk

    On 07.11.2023 14:02, Janis Papanagnou wrote:

    One point you may want to consider is the trim() function; the
    two substitutions can be combined in one

    sub (/^[[:space:]]+(.*)[[:space:]]+$/, "&", str)

    (but test that in your Awk versions before using it; "&" is an
    old feature but off the top of my head I'm not sure whether the
    subexpression with parenthesis /...(...).../ is generally
    supported in other Awks).

    You can write that, but it's not a capture group that can be
    backreferenced from the replacement and if it was "&" wouldn't refer to
    the string that matched ".*" anyway, it'd refer to the string that
    matched the whole regexp.

    Using my Awk it does what is advertised. I merely didn't find it clearly
    documented whether the '&' is generally guaranteed to refer to the
    grouping.

    I stand corrected, I had a wrong test case! - "&" does _not_ work
    on grouping, and as you said (and as it is specified), it generally
    defines the whole match. (It wouldn't make sense otherwise.) - Thanks!
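
A quick way to see this is to wrap the "&" in visible brackets. A minimal sketch (the function name amp_demo is made up, and plain spaces are used instead of [[:space:]] to keep it portable):

```shell
# Hypothetical demo: show what "&" expands to when the regexp has a group.
amp_demo() {
    printf '%s\n' '  foo  ' | awk '{
        s = $0
        # "&" in the replacement stands for the whole matched text,
        # not for the parenthesized part.
        sub(/^ +(.*) +$/, "[&]", s)
        print s
    }'
}

amp_demo
```

The surrounding blanks stay inside the brackets, so with a bare "&" as the replacement the string is simply replaced by itself and nothing gets trimmed.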

    Janis

  • From Kaz Kylheku@864-117-4973@kylheku.com to comp.lang.awk on Tue Nov 7 17:24:38 2023
    From Newsgroup: comp.lang.awk

    On 2023-11-06, Mike Sanders <porkchop@invalid.foo> wrote:
    Ping Janis: Question... Are you interested in rewriting this
    as a Gawk only implementation? Would be great for switch/case
    statements IMO. If so, I'll add your version to the file at
    my website.

    The cppawk preprocessor supports a case macro which
    compiles to the switch statement for Gawk, or to portable
    Awk code.

    The macro is documented in its own man page:

    https://www.kylheku.com/cgit/cppawk/tree/cppawk-case.1

    case is safer than switch because it doesn't have implicit
    "fallthrough".

    Each case must end with one of: cbreak, cfall or cret: break,
    fallthrough or return.
    --
    TXR Programming Language: http://nongnu.org/txr
    Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
    Mastodon: @Kazinator@mstdn.ca
    NOTE: If you use Google Groups, I don't see you, unless you're whitelisted.
  • From porkchop@porkchop@invalid.foo (Mike Sanders) to comp.lang.awk on Tue Nov 7 18:46:27 2023
    From Newsgroup: comp.lang.awk

    Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

    [*] https://www.gnu.org/software/gawk/manual/gawk.html#Switch-Statement

    Will read/study (as always thanks Janis!)
    --
    :wq
    Mike Sanders

  • From porkchop@porkchop@invalid.foo (Mike Sanders) to comp.lang.awk on Tue Nov 7 18:50:51 2023
    From Newsgroup: comp.lang.awk

    Kaz Kylheku <864-117-4973@kylheku.com> wrote:

    The cppawk preprocessor supports a case macro which
    compiles to the switch statement for Gawk, or to portable
    Awk code.

    The macro is documented in its own man page:

    https://www.kylheku.com/cgit/cppawk/tree/cppawk-case.1

    case is safer than switch because it doesn't have implicit
    "fallthrough".

    Each case must end with one of: cbreak, cfall or cret: break,
    fallthrough or return.

    Hey-hey Kaz.

    That's really nifty in fact. I might try my hand at a cppawk
    project just to familiarize myself with its workings.

    Thanks & mucho appreciate the heads-up kind sir, must study
    more about it all.
    --
    :wq
    Mike Sanders

  • From Ed Morton@mortonspam@gmail.com to comp.lang.awk on Wed Nov 8 04:22:28 2023
    From Newsgroup: comp.lang.awk

    On 11/7/2023 7:02 AM, Janis Papanagnou wrote:
    On 07.11.2023 12:48, Ed Morton wrote:
    <snip>
    and in most awks you could do:

    gsub (/^[[:space:]]+|[[:space:]]+$/, "", str)

    Yes, this is a sensible alternative.


    but there are some out there that will fail to do both substitutions for
    that case (tawk and nawk maybe?) and so you need:

    Really? But why? - Alternatives in regexp is certainly an old feature
    (at least since nawk in the 1980's).

    In my opinion it's just a bug. It was demonstrated to me when I posted
    an answer on Stack Overflow several years ago that I can't find right
    now. I know it's not gawk and I'm about 99% sure it's neither BSD awk
    nor /usr/xpg[46]/bin/awk so the only non-oawk awks I can imagine would
    have this problem are nawk, tawk (I'm about 80% sure I remember tawk is
    one that DOES have the problem), and/or busybox awk, none of which I
    have access to, so if anyone has and could test them by running:

    $ echo ' foo ' | awk '{gsub(/^ +| +$/,""); print "<" $0 ">"}'
    <foo>

    and let us know which don't produce that output, that'd be great.

    Ed.

  • From Ed Morton@mortonspam@gmail.com to comp.lang.awk on Wed Nov 8 05:41:01 2023
    From Newsgroup: comp.lang.awk

    On 11/8/2023 4:22 AM, Ed Morton wrote:
    On 11/7/2023 7:02 AM, Janis Papanagnou wrote:
    On 07.11.2023 12:48, Ed Morton wrote:
    <snip>
    and in most awks you could do:

        gsub (/^[[:space:]]+|[[:space:]]+$/, "", str)

    Yes, this is a sensible alternative.


    but there are some out there that will fail to do both substitutions for
    that case (tawk and nawk maybe?) and so you need:

    Really? But why? - Alternatives in regexp is certainly an old feature
    (at least since nawk in the 1980's).

    In my opinion it's just a bug. It was demonstrated to me when I posted
    an answer on Stack Overflow several years ago that I can't find right
    now. I know it's not gawk and I'm about 99% sure it's neither BSD awk
    nor /usr/xpg[46]/bin/awk so the only non-oawk awks I can imagine would
    have this problem are nawk, tawk (I'm about 80% sure I remember tawk is
    one that DOES have the problem), and/or busybox awk, none of which I
    have access to, so if anyone has and could test them by running:

    $ echo ' foo ' | awk '{gsub(/^ +| +$/,""); print "<" $0 ">"}'
    <foo>

    and let us know which don't produce that output, that'd be great.

        Ed.


    I found a different answer I had posted (also years ago) that mentions
    this bug and there I say it's tawk and mawk1 where it occurs. Still
    can't find the original where I was told about it (and it was in a
    discussion in comments that's probably been removed by now) but that
    should be good enough for others to reproduce it.

    The specific case was removing quotes from around a field in a CSV:

    $ printf '"foo"' | awk '{gsub(/^"+|"+$/,""); print "<" $0 ">"}'
    <foo>

    but I doubt if that detail matters.
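
For anyone who wants to check several awks in one go, the one-liner can be wrapped in a small probe. This is just a sketch: the function name probe_gsub and its "ok"/"affected" wording are made up here, and the awk command names passed to it are whatever happens to be installed locally.

```shell
# Hypothetical helper: report whether a given awk trims both ends with one gsub().
# Usage: probe_gsub [awk-command]   (defaults to "awk")
probe_gsub() {
    awk_cmd=${1:-awk}
    result=$(echo ' foo ' | "$awk_cmd" '{ gsub(/^ +| +$/, ""); print "<" $0 ">" }')
    if [ "$result" = "<foo>" ]; then
        echo "ok"
    else
        echo "affected: got $result"
    fi
}

probe_gsub    # tests whatever "awk" resolves to
```

Running it as, for example, probe_gsub mawk or probe_gsub busybox-awk (if such commands exist on the system) would show which implementations need the two separate sub() calls.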

    Regards,

    Ed.