• POINT OF VIEW OF AN ALGORITHM (Re: Algorithm introduced in Hogwild!SGD (Niu et al., 2011)) (Re: parallel random-access machine)

    From Mild Shock@janburse@fastmail.fm to sci.physics.relativity,sci.math,comp.lang.prolog on Mon Dec 1 23:12:14 2025
    From Newsgroup: comp.lang.prolog

    Hi,

    I am not saying anything. That's the definition of PRAM.
    What's wrong with you, are you a 5 year old moron?
    I am only citing a theoretical computer science model:

    - Concurrent read concurrent write (CRCW)—multiple
    processors can read and write. A CRCW PRAM is sometimes
    called a concurrent random-access machine. https://en.wikipedia.org/wiki/Parallel_RAM

    Technically, with multi-channel memory nowadays, it
    doesn't need locks at the hardware level, only a tiny
    bit of serialization, which could even happen outside the CPU.

    So if you drop some barrier requirements, you could
    really have the chaos of a PRAM, for worse or
    for better. I think you need to accept that,

    even if it's too big to fit in your tiny squirrel brain.

    Bye

    P.S.: "effectively CREW, since only one write per address at
    a time", so it will just block the other cores? Short answer:
    yes, if two cores try to write the same address, one of
    them is forced to stall (block) until the other completes.
    In real hardware, the effect can mimic CRCW behavior over
    a short time window, even though it’s not truly simultaneous.

    This blocking usually happens in the cache-coherence
    system, not at DRAM. Modern CPUs use MESI/MOESI, so it
    happens over a small interval [t₁, t₂] dictated by cache
    coherence.

    From the POINT OF VIEW OF AN ALGORITHM, it’s “CRCW enough.”
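The "CRCW enough" point can be sketched in Java (the code base the thread keeps referring to). In this illustrative sketch (the class and method names `CrcwDemo` and `race` are mine, not from the thread), several threads write the same plain int cell with no lock; the cache-coherence protocol serializes the conflicting writes, so the cell always ends up holding exactly one writer's value, never a mixture of bits:

```java
import java.util.concurrent.CountDownLatch;

// Arbitrary-CRCW sketch: several threads write the same plain int cell
// with no lock. The cache-coherence protocol (MESI/MOESI) serializes
// the conflicting writes, so the cell always ends up holding exactly
// one writer's value, never a mixture of bits.
public class CrcwDemo {
    static final int[] cell = new int[1];   // shared memory, no lock

    public static int race(int writers, int rounds) {
        CountDownLatch start = new CountDownLatch(1);
        Thread[] ts = new Thread[writers];
        for (int i = 0; i < writers; i++) {
            final int id = i + 1;           // each writer stores its own id
            ts[i] = new Thread(() -> {
                try {
                    start.await();          // fire all writers at once
                } catch (InterruptedException e) {
                    return;
                }
                for (int r = 0; r < rounds; r++) {
                    cell[0] = id;           // unsynchronized concurrent write
                }
            });
            ts[i].start();
        }
        start.countDown();
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return cell[0];                     // one serialized winner
    }

    public static void main(String[] args) {
        System.out.println("last write wins: " + race(4, 100_000));
    }
}
```

This is the "arbitrary" CRCW write rule from the PRAM literature: some one writer wins per address. Read-modify-write is a different story: a plain `cell[0]++` from many threads loses updates, which is exactly the relaxation that Hogwild!-style SGD (from the subject line) deliberately exploits.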


    Bosephis Otlesnov schrieb:
    > Mild Shock wrote:
    >> What are you, a 5 year old moron?
    >>
    >> There are millions of algorithm that use volatile variables. Just look
    >> at the Java code base.
    >>
    >> But I was not refering to multi-threading, I was refering to PRAM for
    >> matrix operations.
    >
    > i thought you said you wanna read and write parallel to RAM, aka PRAM, let
    > me see.. zum zum zum, yeah, you said that. Take a lock at timing
    > requirements for a read/write cycle, deadlines etc, shared memory or not,
    > fucking idiot.


    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Mild Shock@janburse@fastmail.fm to sci.physics.relativity,sci.math,comp.lang.prolog on Mon Dec 1 23:37:23 2025
    From Newsgroup: comp.lang.prolog

    Hi,

    Come on, squirrel brain: that we practically have
    PRAM on multi-core CPUs is old hat. ARM kept
    up with MESI/MOESI in 2011:

    https://developer.arm.com/-/media/Arm%20Developer%20Community/PDF/CacheCoherencyWhitepaper_6June2011.pdf

    What are you, squirrel brain, some Russian developer
    controlling a drone from within EMACS? Meanwhile
    ARM and Intel and Snapdragon etc. have developed

    many more marvels than only this simple PRAM.
    The excitement on the side of AMD is quite big,
    now that they got into the boat with OpenAI:

    OpenAI co-founder on new deal with AMD https://www.youtube.com/watch?v=WuXCNpbO9hI

    Bye

    P.S.: Because of contention, you should of course
    use volatile variables only with care. They might
    not scale well to 1000 cores.

    There are also algorithms around to relieve the
    pressure when there is a large number of cores.
    Doug Lea has already put a few utilities in

    java.util.concurrent.* for certain problems with a large
    number of cores, kind of easter eggs in java.util.concurrent.*.
    But I am not sure whether Doug Lea is involved in
    additions for AI accelerators. He is, however, on the
    Program Committee of:

    Parallel programming for emerging hardware, including
    AI accelerators, processor-in-memory, programmable logic,
    non-volatile memory technologies, and quantum computers https://ppopp26.sigplan.org/track/PPoPP-2026-papers

    It could be that data flow compilers, the kind of thing
    sketched by OpenXLA, already work well enough.
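One concrete example of those easter eggs is java.util.concurrent.atomic.LongAdder (a Doug Lea utility, in the JDK since Java 8): it stripes a counter across several internal cells, so heavily contended increments do not all fight over one cache line, and only the read folds the cells back together. A minimal sketch (the class and method names `AdderDemo` and `countInParallel` are illustrative):

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.IntStream;

// Contention-relief sketch: LongAdder spreads increments over several
// internal cells (striping) and sums them on read. Under heavy
// multi-core contention this scales better than a single AtomicLong,
// whose CAS loop forces every core through one cache line.
public class AdderDemo {
    public static long countInParallel(int threads, int perThread) {
        LongAdder total = new LongAdder();
        IntStream.range(0, threads).parallel().forEach(t -> {
            for (int i = 0; i < perThread; i++) {
                total.increment();   // cheap, mostly uncontended cell update
            }
        });
        return total.sum();          // folds the striped cells together
    }

    public static void main(String[] args) {
        System.out.println(countInParallel(8, 100_000)); // prints 800000
    }
}
```

Under light contention a plain AtomicLong is fine; LongAdder pays off when many cores increment at once, at the cost of sum() being only weakly consistent while writers are still running.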

    Mild Shock schrieb:
    > [full text quoted above, snipped]



  • From Mild Shock@janburse@fastmail.fm to sci.physics.relativity,sci.math,comp.lang.prolog on Mon Dec 1 23:53:21 2025
    From Newsgroup: comp.lang.prolog

    Hi,

    Looking at how they phrase it:

    "symposium focuses on improving the programming
    productivity and performance engineering of all
    concurrent and parallel systems—multicore, multi-
    threaded, heterogeneous, clustered, and distributed
    systems, grids, accelerators such as ASICs, GPUs,
    FPGAs, data centers, clouds, large scale machines,
    and quantum computers. PPoPP is also interested in
    new and emerging parallel workloads and applications,
    such as artificial intelligence and large-scale
    scientific/enterprise workloads." https://ppopp26.sigplan.org/track/PPoPP-2026-papers

    It could also be that academia was overrun by the AI boom
    and is lost in the middle of nowhere. That the techno lords
    have created realities turning academia into savages.

    No wonder there is a call for automated AI researchers
    and automated AI engineers, by the AI industry itself.
    And that might be the outcome of the current Manhattan

    Project, also known as the Genesis Mission: so that the AI
    can be programmed by AI, AI which is more knowledgeable
    than tiny academics. We are maybe heading towards a

    first Ultraintelligence, which will then shape subsequent
    Ultraintelligences. As described by I. J. Good:

    "Let an ultraintelligent machine be defined as a machine
    that can far surpass all the intellectual activities of
    any man however clever. Since the design of machines is
    one of these intellectual activities, an ultraintelligent
    machine could design even better machines; there would
    then unquestionably be an 'intelligence explosion,' and
    the intelligence of man would be left far behind...
    Thus the first ultraintelligent machine is the last
    invention that man need ever make, provided that the
    machine is docile enough to tell us how to keep it under
    control. It is curious that this point is made so
    seldom outside of science fiction. It is sometimes
    worthwhile to take science fiction seriously." https://exhibits.stanford.edu/feigenbaum/catalog/gz727rg3869

    Bye

    Mild Shock schrieb:
    > [full text quoted above, snipped]



