• Wrong ideas about chatbots

    From ram@ram@zedat.fu-berlin.de (Stefan Ram) to comp.misc on Sun Jun 8 00:39:21 2025
    From Newsgroup: comp.misc

    Ben Collver <bencollver@tilde.pink> wrote or quoted:
    | you'd start with an enormous quantity of text, then do a lot
    |of computationally-intense statistical analysis to map out which
    |words and phrases are most likely to appear near to one another.

    I had already explained why that was off, but let me give you
    all an example from a recent chat of mine with a chatbot.

    I asked the chatbot to write a program for left-associative parsing
    of English. He must have mixed up my "left-associative" with the
    more common "left-recursive" or figured I just said it wrong.

    He clearly did not know about this specific left-associative
    parsing method for natural languages. Even after I gave him
    the exact source, he still did not get it right.

    Then I laid it out for him [1]. After that, he wrote a program [2]
    for left-associative parsing of English. Here is what he produced:
    [3]. I also asked him for an explanation of the approach to laymen,
    so if you want to learn more about it, see [4].

    So tell me, how is real understanding like this supposed
    to happen if chatbots just work based on "a statistical
    analysis of which words often show up together"?

    [1] How I explained it to him, after his first program was
    not what I wanted

    Maybe you're misled by applying standard terms to the special
    NEWCAT approach.

    It starts with an empty object and then always appends the next
    word until the end of the text is reached. No big recursion there.

    We get: empty + "the"

    Now, we need a grammar rule to see if that combination is
    possible. For this purpose, each of "empty" and "the" has
    a category, which is a data structure with their attributes,
    and the grammar rule then checks whether two things with these
    categories can be combined; if so, it creates the new
    sentence start "empty + 'the'" with a new category given by
    the rule.

    Then we try to add "cat" to the sentence start. So the sentence
    is built left-to-right; that's what's "left-associative" about it.

    Finally, we add the "." after "fish" and then we may have
    a complete sentence if everything was allowed by the rules.

    This parsing is trivial. The crucial thing is to write the
    categories and the rules so that all continuations of a
    sentence start that are legal in English are allowed by the
    rules and all others are rejected.
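
    The control loop just described can be sketched in a few lines
    of Python (an illustrative sketch only; `lexicon` and `combine`
    here are placeholders, not the actual NEWCAT categories or rules):

```python
def parse_left_associative(words, lexicon, combine):
    """Fold the sentence left to right: the state is always the
    category of the sentence start built so far."""
    state = None  # the empty sentence start
    for word in words:
        category = lexicon[word]          # look up the word's category
        state = combine(state, category)  # try to apply a grammar rule
        if state is None:                 # no rule allows the combination
            return None
    return state
```

    All the grammatical knowledge lives in the categories and in
    `combine`; the loop itself stays trivial, as described above.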

    [2] The parser for a tiny subset of English he wrote then,
    according to my above explanation

    import builtins
    import sys
    import time

    # Toy lexicon
    LEXICON = {
        'the': ('DET', {}),
        'cat': ('N', {'number': 'sg'}),
        'cats': ('N', {'number': 'pl'}),
        'dog': ('N', {'number': 'sg'}),
        'dogs': ('N', {'number': 'pl'}),
        'and': ('CONJ', {}),
        'eats': ('V', {'number': 'sg'}),
        'eat': ('V', {'number': 'pl'}),
    }

    def get_category(word):
        if word not in LEXICON:
            raise ValueError(f"Unknown word: {word}")
        return LEXICON[word]

    def combine(state, next_token):
        cat, features = next_token
        print(f"  [combine] State: {state}, Next: ({cat}, {features})")
        # If state is None, we're at the start
        if state is None:
            if cat == 'DET':
                print("  [combine] Start DET")
                return ('subj', {'number': None, 'is_complete': False,
                                 'needs_conj': False, 'pending_det': True})
            elif cat == 'N':
                print("  [combine] Start bare N (fail)")
                return None
            else:
                print("  [combine] Start fail")
                return None
        # If we're building a subject
        if state[0] == 'subj':
            subj = state[1]
            # --- Coordination context: CONJ + DET + N ---
            if subj.get('needs_conj') and subj.get('pending_det') and cat == 'N':
                print("  [combine] CONJ + DET + N -> coordinated NP (plural)")
                return ('subj', {'number': 'pl', 'is_complete': True,
                                 'needs_conj': False, 'pending_det': False})
            # --- Plain DET + N (not in coordination) ---
            if subj.get('pending_det') and cat == 'N':
                print("  [combine] DET + N -> NP")
                return ('subj', {'number': features['number'], 'is_complete': True,
                                 'needs_conj': False, 'pending_det': False})
            # --- NP + CONJ ---
            if subj.get('is_complete') and cat == 'CONJ':
                print("  [combine] NP + CONJ")
                return ('subj', {'number': subj['number'], 'is_complete': False,
                                 'needs_conj': True, 'pending_det': False})
            # --- CONJ + DET ---
            if subj.get('needs_conj') and cat == 'DET':
                print("  [combine] CONJ + DET")
                return ('subj', {'number': None, 'is_complete': False,
                                 'needs_conj': True, 'pending_det': True,
                                 'prev_number': subj['number']})
            # --- NP + V (agreement check) ---
            if subj.get('is_complete') and cat == 'V':
                print(f"  [combine] NP + V: subj number {subj['number']}, "
                      f"verb number {features['number']}")
                if subj['number'] == features['number']:
                    print("  [combine] Agreement OK")
                    return ('sent', {'number': subj['number']})
                else:
                    print("  [combine] Agreement FAIL")
                    return None
            print("  [combine] No rule applies (fail)")
            return None
        # If we already have a sentence, nothing more allowed
        if state[0] == 'sent':
            print("  [combine] Already sent (fail)")
            return None
        print("  [combine] No rule applies (fail)")
        return None

    def parse(sentence):
        words = sentence.lower().split()
        state = None
        print(f"\nParsing: {sentence}")
        for word in words:
            print(f"  State: {state}, Next: {word}")
            cat = get_category(word)
            state = combine(state, cat)
            if state is None:
                print(f"  Failed at: {word}")
                return False
        print(f"  Final state: {state}")
        return state is not None and state[0] == 'sent'

    # Test sentences
    sentences = [
        "the cat eats",              # ✔️ singular
        "the cats eat",              # ✔️ plural
        "the cat and the dog eat",   # ✔️ coordination (plural)
        "the cat and the dogs eat",  # ✔️ coordination (plural)
        "the cat and the dog eats",  # ❌ subject-verb disagreement
        "the cats eats",             # ❌ subject-verb disagreement
        "the cat and eat",           # ❌ incomplete NP
        "cat eats",                  # ❌ missing DET
        "the cat and the",           # ❌ incomplete NP
    ]

    for s in sentences:
        result = parse(s)
        print(f"Result: {'ACCEPT' if result else 'REJECT'}")

    [3] the abridged output of the parser

    Parsing: the cat eats
    Result: ACCEPT

    Parsing: the cats eat
    Result: ACCEPT

    Parsing: the cat and the dog eat
    Result: ACCEPT

    Parsing: the cat and the dogs eat
    Result: ACCEPT

    Parsing: the cat and the dog eats
    Result: REJECT

    Parsing: the cats eats
    Result: REJECT

    Parsing: the cat and eat
    Result: REJECT

    Parsing: cat eats
    Result: REJECT

    Parsing: the cat and the
    Result: REJECT

    [4] the chatbot explains it:

    What is Left-Associative Parsing?

    Left-associative parsing is a way of analyzing a sentence by
    reading it from left to right, always combining the information
    you've built up so far (the "sentence start" or "current state")
    with the next word.

    - At every step, you only ever look at the current state and
    the next word - never more than that.

    - You never look ahead or backtrack; you just keep moving
    forward.

    - This is sometimes called "incremental" or "shift-reduce"
    parsing, but here we mean: Always combine the current
    state with the next word, and update the state.

    Why is this interesting?

    - It's simple, fast, and models how humans often process
    language in real time.

    - It forces you to encode all the information you'll need for
    future steps in the current state, because you never get to
    "look back".

    What are Complex Categories?

    In traditional grammar, you might label things as "Noun Phrase
    (NP)", "Verb Phrase (VP)", etc.

    But in left-associative parsing, the current state must
    carry all the information you'll need for the rest of the
    parse.

    So, you use complex categories - data structures that store
    not just the grammatical type, but also features like number
    (singular/plural), whether you're in the middle of a coordination
    ("and"), whether you're expecting a noun, and so on.

    In the script:

    - The state is a tuple like ('subj', {...}) or
    ('sent', {...}).

    - The dictionary inside holds all the features you need:
    "number", "is_complete", "needs_conj", "pending_det", etc.
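
    As a concrete illustration, the states for "the cat" look
    roughly like this (the feature names come from the script; the
    annotations and the particular values are my own reconstruction):

```python
# After "the": a determiner has been seen, a noun is still pending.
after_the = ('subj', {'number': None, 'is_complete': False,
                      'needs_conj': False, 'pending_det': True})

# After "cat": a complete singular subject NP.
after_cat = ('subj', {'number': 'sg', 'is_complete': True,
                      'needs_conj': False, 'pending_det': False})

# A rule like NP + V only ever inspects this state and the next word:
assert after_cat[1]['is_complete'] and after_cat[1]['number'] == 'sg'
```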


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From ram@ram@zedat.fu-berlin.de (Stefan Ram) to comp.misc on Sun Jun 8 12:51:07 2025

    ram@zedat.fu-berlin.de (Stefan Ram) wrote or quoted:
    |Parsing: the cat eats
    |Result: ACCEPT

    I asked the chatbot to modify the program to also output the
    meaning in both a Python-like notation and with an image.

    Input: the dog eats
    Result: ACCEPT
    Meaning: eats(one(dog))
    /^ ^\
    / 0 0 \
    / Y \
    ~~~~~
    V\ - /V
    / \
    | |
    (__V__)

    ----------------------------------------
    Input: the dogs eat
    Result: ACCEPT
    Meaning: eat(morethanone(dog))
    /^ ^\ /^ ^\
    / 0 0 \ / 0 0 \
    / Y \ / Y \
    ~~~~~ ~~~~~
    V\ - /VV\ - /V
    / \ / \
    | | | |
    (__V__) (__V__)

    ----------------------------------------
    Input: the cat eats
    Result: ACCEPT
    Meaning: eats(one(cat))
    /\_/\
    ( o.o )
    ^ <
    ~~~~~

    ----------------------------------------
    Input: the cats eat
    Result: ACCEPT
    Meaning: eat(morethanone(cat))
    /\_/\ /\_/\
    ( o.o ) ( o.o )
    ^ < > ^ <
    ~~~~~ ~~~~~

    ----------------------------------------
    Input: the cat and the dog eat
    Result: ACCEPT
    Meaning: eat(and_(one(cat), one(dog)))
    /\_/\
    ( o.o )
    ^ <
    ~~~~~

    /^ ^\
    / 0 0 \
    / Y \
    ~~~~~
    V\ - /V
    / \
    | |
    (__V__)

    ----------------------------------------
    Input: the cat and the dogs eat
    Result: ACCEPT
    Meaning: eat(and_(one(cat), morethanone(dog)))
    /\_/\
    ( o.o )
    ^ <
    ~~~~~

    /^ ^\ /^ ^\
    / 0 0 \ / 0 0 \
    / Y \ / Y \
    ~~~~~ ~~~~~
    V\ - /VV\ - /V
    / \ / \
    | | | |
    (__V__) (__V__)

    ----------------------------------------
    Input: the dog and eat
    Result: REJECT
    ----------------------------------------
    Input: cat eats
    Result: REJECT
    ----------------------------------------
    Input: the cats eats
    Result: REJECT
    ----------------------------------------
    Input: the dog and the cat eats
    Result: REJECT
    ----------------------------------------
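
    The meaning terms above suggest a simple composition built up
    during the parse. One possible way to produce them (the helper
    names follow the printed output; the code itself is my own
    guess, not the chatbot's):

```python
# Build the subject term while reading the NP...
def one(noun):
    return f"one({noun})"

def morethanone(noun):
    return f"morethanone({noun})"

def and_(left, right):
    return f"and_({left}, {right})"

# ...then wrap the subject in the verb when the V is combined:
def apply_verb(verb, subject):
    return f"{verb}({subject})"

print(apply_verb("eats", one("dog")))                      # eats(one(dog))
print(apply_verb("eat", and_(one("cat"), one("dog"))))     # eat(and_(one(cat), one(dog)))
```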


  • From Ben Collver@bencollver@tilde.pink to comp.misc on Sun Jun 8 14:27:49 2025

    On 2025-06-08, Stefan Ram <ram@zedat.fu-berlin.de> wrote:
    Ben Collver <bencollver@tilde.pink> wrote or quoted:
    ^^^^^^

    Just to be clear, I quoted it. I am not the author of this blog.

    | you'd start with an enormous quantity of text, then do a lot
    |of computationally-intense statistical analysis to map out which
    |words and phrases are most likely to appear near to one another.

    You are fixating on the technical and ignoring the social. From the
    original article:

    [AI] turns social relations into number-crunching operations...

    Meredith Whittaker, president of the Signal Foundation, has
    described AI as being fundamentally "surveillance technology".

    AI systems have found their best product-market fit in police and
    military applications, where short-circuiting people's critical
    thinking and decision-making processes is incredibly useful...
  • From ram@ram@zedat.fu-berlin.de (Stefan Ram) to comp.misc on Sun Jun 8 14:59:40 2025

    Ben Collver <bencollver@tilde.pink> wrote or quoted:
    |You are fixating on the technical and ignoring the social. From the
    |original article:
    |[AI] turns social relations into number-crunching operations...
    |Meredith Whittaker, president of the Signal Foundation, has
    |described AI as being fundamentally "surveillance technology".
    |AI systems have found their best product-market fit in police and
    |military applications, where short-circuiting people's critical
    |thinking and decision-making processes is incredibly useful...

    It's true that for some folks, AI kind of takes the place of
    real social interaction, and it can be used for things like
    surveillance, law enforcement, or the military.

    It's a good thing when people spot potential risks and speak
    up about them.

    But people really shouldn't act as if every social interaction
    we have is now run by AI, or as if those police and military
    uses are all AI is good for.

    You could say the same thing about all kinds of tech and scientific
    progress. Take psychology, for example. There are techniques that
    let you figure out someone's political leanings just from subtle word
    choices. Any kind of scientific or technical breakthrough can get
    twisted by bad actors, like dictators, to spy on their own people,
    mess with them, or even go after other countries.

