• GlusterFS with replica 3

    From ^Bart@none@none.it to alt.os.linux on Sat Jul 5 08:41:23 2025
    From Newsgroup: alt.os.linux

    Hello everyone,

    I have a cluster running the latest GlusterFS on Debian 12 with three
    nodes, but when I try a simple du -sh of /var/www the node goes into
    the "N" state and doesn't come back to "Y", so I have to restart the
    glusterd daemon manually.

    If I don't run backups, rsync, du, etc., the cluster works well. I
    also used vmstat to watch RAM, CPU and disk: each node has 4 vCPUs at
    about 70% and 8 GB of RAM; free RAM is roughly 300 MB, used is about
    1.5 GB, and buffer/cache about 5 GB.

    I read on the internet that AWS starts from 16 GB of RAM for
    GlusterFS, while other documents say to use 12 GB; do you have any
    experience with this?

    ^Bart
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Lawrence D'Oliveiro@ldo@nz.invalid to alt.os.linux on Sat Jul 5 07:18:35 2025
    From Newsgroup: alt.os.linux

    On Sat, 5 Jul 2025 08:41:23 +0200, ^Bart wrote:

    I have a cluster running the latest GlusterFS on Debian 12 with three
    nodes, but when I try a simple du -sh of /var/www the node goes into
    the "N" state and doesn't come back to "Y", so I have to restart the
    glusterd daemon manually.

    Any messages in the logs? journalctl? dmesg?
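    A quick sweep of the usual places might look like this (a sketch; the
    glusterd.log path is the Debian default, and the grep patterns are only
    rough filters):

    ```shell
    # Recent messages from the glusterd unit:
    journalctl -u glusterd --since "1 hour ago" --no-pager | tail -n 50

    # Kernel-side trouble (OOM kills, I/O errors) shows up in dmesg:
    dmesg -T | grep -Ei 'out of memory|i/o error|blocked' | tail -n 20

    # Gluster's own management log; " E " marks error-level entries:
    grep ' E ' /var/log/glusterfs/glusterd.log | tail -n 20
    ```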
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Daniel70@daniel47@eternal-september.org to alt.os.linux on Sat Jul 5 20:08:15 2025
    From Newsgroup: alt.os.linux

    On 5/07/2025 4:41 pm, ^Bart wrote:
    Hello everyone,

    I have a cluster running the latest GlusterFS on Debian 12 with three
    nodes, but when I try a simple du -sh of /var/www the node goes into
    the "N" state and doesn't come back to "Y", so I have to restart the
    glusterd daemon manually.

    If I don't run backups, rsync, du, etc., the cluster works well. I
    also used vmstat to watch RAM, CPU and disk: each node has 4 vCPUs at
    about 70% and 8 GB of RAM; free RAM is roughly 300 MB, used is about
    1.5 GB, and buffer/cache about 5 GB.

    Free Ram 300MB approx
    Used Ram 1.5GB
    Buffer/cache 5GB approx

    Total 8GB approx
    Available Ram 8GB approx

    .... so are you all full up??

    I read on the internet that AWS starts from 16 GB of RAM for
    GlusterFS, while other documents say to use 12 GB; do you have any
    experience with this?

    ^Bart

    If you clear your Buffer/cache, might things run better??
    --
    Daniel70
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From J.O. Aho@user@example.net to alt.os.linux on Sat Jul 5 17:25:43 2025
    From Newsgroup: alt.os.linux

    On 05/07/2025 08.41, ^Bart wrote:

    I have a cluster running the latest GlusterFS on Debian 12 with three
    nodes, but when I try a simple du -sh of /var/www the node goes into
    the "N" state and doesn't come back to "Y", so I have to restart the
    glusterd daemon manually.

    Have you seen anything in the logs? Maybe check
    /var/log/glusterfs/glusterd.log. It can be a lock that hasn't been
    released; sadly, the only fix then is to restart glusterd.
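    Until the root cause is found, a crude workaround could be a watchdog
    along these lines (a sketch: it assumes the stuck peer shows up as
    "Disconnected" in the `gluster peer status` output, so adjust the
    pattern to whatever your version actually prints):

    ```shell
    #!/bin/sh
    # Restart glusterd only when a peer is reported down (e.g. run from cron).
    if gluster peer status | grep -q 'Disconnected'; then
        echo "peer reported down, restarting glusterd" >&2
        systemctl restart glusterd
    fi
    ```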


    If I don't run backups, rsync, du, etc., the cluster works well. I
    also used vmstat to watch RAM, CPU and disk: each node has 4 vCPUs at
    about 70% and 8 GB of RAM; free RAM is roughly 300 MB, used is about
    1.5 GB, and buffer/cache about 5 GB.

    It's been quite a few years since I used GlusterFS, but back then at
    work we had quite large DELL servers (64 GB RAM, 2 CPUs with 8 cores
    each) with SAN-based storage as nodes, and the system degraded under
    heavy read/write load. In the end those were replaced by standard NFS
    servers, which gave more stability, with replication from one SAN to
    another; granted, not a fully HA solution.

    I think I should have pushed harder for Lustre; the guys at CERN were
    really helpful when I did some testing with just a simple setup, which
    needed a few more nodes. Sadly, the requirements changed along the
    way, so we ended up having to provide the file system to a
    closed-source operating system with poor file system support.


    I read on the internet that AWS starts from 16 GB of RAM for
    GlusterFS, while other documents say to use 12 GB; do you have any
    experience with this?
    gluster.org does state, for basic nodes: 2 CPUs, 4 GB of RAM each, and
    a 1 Gigabit network.
    --
    //Aho
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From vallor@vallor@cultnix.org to alt.os.linux on Sun Jul 6 02:09:52 2025
    From Newsgroup: alt.os.linux

    On Sat, 5 Jul 2025 20:08:15 +1000, Daniel70
    <daniel47@eternal-september.org> wrote in <104atij$1e1pj$1@dont-email.me>:

    On 5/07/2025 4:41 pm, ^Bart wrote:
    Hello everyone,

    I have a cluster running the latest GlusterFS on Debian 12 with three
    nodes, but when I try a simple du -sh of /var/www the node goes into
    the "N" state and doesn't come back to "Y", so I have to restart the
    glusterd daemon manually.

    If I don't run backups, rsync, du, etc., the cluster works well. I
    also used vmstat to watch RAM, CPU and disk: each node has 4 vCPUs at
    about 70% and 8 GB of RAM; free RAM is roughly 300 MB, used is about
    1.5 GB, and buffer/cache about 5 GB.

    Free Ram 300MB approx
    Used Ram 1.5GB
    Buffer/cache 5GB approx

    Total 8GB approx
    Available Ram 8GB approx

    .... so are you all full up??

    I read on the internet that AWS starts from 16 GB of RAM for
    GlusterFS, while other documents say to use 12 GB; do you have any
    experience with this?

    ^Bart

    If you clear your Buffer/cache, might things run better??

    Hi Daniel,

    The way Linux works, free memory gets used as Buffer/cache.
    As more memory is allocated, it pulls it from the B/C. It's
    basically part of the "free" memory, but being "borrowed"
    by the OS for better performance.
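    You can see the distinction directly in /proc/meminfo: MemAvailable
    includes the reclaimable cache, so it, and not MemFree, is the number
    that tells you whether you are actually running out. For example:

    ```shell
    # MemFree is what nobody is touching at all; MemAvailable adds back
    # the cache the kernel can reclaim on demand.
    awk '/^(MemTotal|MemFree|MemAvailable):/ {printf "%-14s %6.1f GiB\n", $1, $2/1048576}' /proc/meminfo
    ```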
    --
    -v System76 Thelio Mega v1.1 x86_64 NVIDIA RTX 3090Ti 24G
    OS: Linux 6.15.4 D: Mint 22.1 DE: Xfce 4.18 Mem: 258G
    "Ever notice how fast Windows runs? Neither have I."
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From Paul@nospam@needed.invalid to alt.os.linux on Sun Jul 6 04:12:45 2025
    From Newsgroup: alt.os.linux

    On Sat, 7/5/2025 10:09 PM, vallor wrote:
    On Sat, 5 Jul 2025 20:08:15 +1000, Daniel70
    <daniel47@eternal-september.org> wrote in <104atij$1e1pj$1@dont-email.me>:

    On 5/07/2025 4:41 pm, ^Bart wrote:
    Hello everyone,

    I have a cluster running the latest GlusterFS on Debian 12 with three
    nodes, but when I try a simple du -sh of /var/www the node goes into
    the "N" state and doesn't come back to "Y", so I have to restart the
    glusterd daemon manually.

    If I don't run backups, rsync, du, etc., the cluster works well. I
    also used vmstat to watch RAM, CPU and disk: each node has 4 vCPUs at
    about 70% and 8 GB of RAM; free RAM is roughly 300 MB, used is about
    1.5 GB, and buffer/cache about 5 GB.

    Free Ram 300MB approx
    Used Ram 1.5GB
    Buffer/cache 5GB approx

    Total 8GB approx
    Available Ram 8GB approx

    .... so are you all full up??

    I read on the internet that AWS starts from 16 GB of RAM for
    GlusterFS, while other documents say to use 12 GB; do you have any
    experience with this?

    ^Bart

    If you clear your Buffer/cache, might things run better??

    Hi Daniel,

    The way Linux works, free memory gets used as Buffer/cache.
    As more memory is allocated, it pulls it from the B/C. It's
    basically part of the "free" memory, but being "borrowed"
    by the OS for better performance.


    "If you clear your Buffer/cache, might things run better"

    Some people will tell you to test that for yourself as a Linux user,
    and, to your surprise, it is just as they predicted: dropping the
    caches makes no difference.

    On some of the older Linux media, I used to write stuff
    as a liner note. This is from the paper wrapper on the
    Knoppix 5.3.1 DVD I made a while back.

    echo 1 > /proc/sys/vm/drop_caches

    1=PageCache
    2=Dentries,Inodes
    3=Both

    If you run "top" in one terminal session, then issue
    the command in another terminal session, you will see
    some numbers change in the "top" display, but the performance
    of the machine does not change.
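    The same before-and-after can be seen with free(1) rather than top; a
    sketch (needs root, harmless but pointless on a healthy system):

    ```shell
    free -h                            # note the buff/cache column
    sync                               # flush dirty pages to disk first
    echo 1 > /proc/sys/vm/drop_caches  # 1=PageCache, 2=dentries/inodes, 3=both
    free -h                            # buff/cache shrinks; speed doesn't change
    ```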

    *******

    The OP's problem is at a different scale than what a home user can
    reproduce. Perhaps someone who works in a Linux-based environment has
    seen such a setup and can comment. For us home users, it would be
    pretty difficult to put together a convincing test setup.

    Imagine if the owner of Archive.org came online and
    asked a question about "a problem with his setup".
    Not many users here, have an Archive.org setup in
    their basement :-}

    Paul
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From ^Bart@none@none.it to alt.os.linux on Mon Jul 7 22:52:37 2025
    From Newsgroup: alt.os.linux

    Have you seen anything in the logs? Maybe check
    /var/log/glusterfs/glusterd.log

    There's nothing wrong in normal operation, but when the system starts
    the backup of some databases, more or less six of them, the node
    changes from Y to N, and only with the most important DB (1.9 GB); it
    works fine with the other DBs of up to 1.4 GB. In the gluster logs I
    can read something like "the node is disconnected", because it can't
    reach the other peers.

    It can be a lock that hasn't been released; sadly, the only fix is to
    restart glusterd.

    It's very sad that all I can do to fix the "N" is restart the daemon
    :\ but I could try to add more RAM, another 2 GB, going from 8 GB to
    10 GB.

    It's been quite a few years since I used GlusterFS, but back then at
    work we had quite large DELL servers (64 GB RAM, 2 CPUs with 8 cores
    each) with SAN-based storage as nodes, and the system degraded under
    heavy

    I think GlusterFS needs more than 8 GB of RAM to avoid spikes when the
    system runs backups; OK, even without the backup job there isn't a lot
    of free memory (300-400 MB), but there are no down nodes!

    read/write load. In the end those were replaced by standard NFS
    servers, which gave more stability, with replication from one SAN to
    another; granted, not a fully HA solution.

    I'm looking at CephFS, but on the internet I read that it needs more
    RAM than what I use now, so... as I wrote above, I think for now I
    could try to upgrade the RAM and run tests on GlusterFS, because
    changing a production cluster is not exactly easy. But I also know
    there are no future plans for Gluster, and I heard it will be
    discontinued, so... CephFS would be the only alternative.

    gluster.org does state, for basic nodes: 2 CPUs, 4 GB of RAM each, and
    a 1 Gigabit network.

    I read that, but in a real production environment I think the CPU and
    RAM quantities are a little bit different...

    ^Bart
    --- Synchronet 3.21a-Linux NewsLink 1.2
  • From J.O. Aho@user@example.net to alt.os.linux on Tue Jul 8 09:10:54 2025
    From Newsgroup: alt.os.linux

    On 07/07/2025 22.52, ^Bart wrote:
    Have you seen anything in the logs? Maybe check
    /var/log/glusterfs/glusterd.log

    There's nothing wrong in normal operation, but when the system starts
    the backup of some databases, more or less six of them, the node
    changes from Y to N, and only with the most important DB (1.9 GB); it
    works fine with the other DBs of up to 1.4 GB. In the gluster logs I
    can read something like "the node is disconnected", because it can't
    reach the other peers.

    Could it be that you are reaching the maximum transfer rate of your
    network, which could mean there isn't enough bandwidth left for both
    the file transfer and the health checks between nodes?

    Another alternative is that the nodes can't sustain the rate of the
    incoming traffic for very long: at 100% disk utilization, network
    traffic will also suffer. (I have seen this; when the disk is slow,
    everything else seems to slow to a crawl and network connections fail,
    as everything gets queued up and the queued packets time out.)
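    If you want to test that theory during the backup window, sysstat can
    flag it; a sketch (assumes iostat -dx, where %util is the last column
    of each device line):

    ```shell
    # Flag any device sitting above 90% utilization during the sample.
    iostat -dx 1 5 | awk '$NF+0 > 90 {print "saturated:", $1, $NF "%"}'

    # The network side of the same picture:
    sar -n DEV 1 5
    ```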


    It can be a lock that hasn't been released; sadly, the only fix is to
    restart glusterd.

    It's very sad that all I can do to fix the "N" is restart the daemon
    :\ but I could try to add more RAM, another 2 GB, going from 8 GB to
    10 GB.

    If the RAM is used as a cache before things are written down to disk,
    this could help for a while, until the extra 2 GB is used up as well.
    If you are lucky, that is more than needed, and nothing ill will
    happen until you have more data to back up.
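    How much RAM the kernel will let fill up with unwritten data before it
    throttles writers is a tunable, so it is worth checking before
    assuming the extra 2 GB will absorb the backup burst (a sketch; the
    values are percentages of reclaimable memory):

    ```shell
    cat /proc/sys/vm/dirty_background_ratio  # background writeback starts here
    cat /proc/sys/vm/dirty_ratio             # writers block above this
    ```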


    It's been quite a few years since I used GlusterFS, but back then at
    work we had quite large DELL servers (64 GB RAM, 2 CPUs with 8 cores
    each) with SAN-based storage as nodes, and the system degraded under
    heavy

    I think GlusterFS needs more than 8 GB of RAM to avoid spikes when the
    system runs backups; OK, even without the backup job there isn't a lot
    of free memory (300-400 MB), but there are no down nodes!

    read/write load. In the end those were replaced by standard NFS
    servers, which gave more stability, with replication from one SAN to
    another; granted, not a fully HA solution.

    I'm looking at CephFS, but on the internet I read that it needs more
    RAM than what I use now, so... as I wrote above, I think for now I
    could try to upgrade the RAM and run tests on GlusterFS, because
    changing a production cluster is not exactly easy. But I also know
    there are no future plans for Gluster, and I heard it will be
    discontinued, so... CephFS would be the only alternative.

    Yeah, it's a load of work; I think we had something like 48 hours of
    downtime when we switched from Gluster to NFS, and the customers
    weren't all that happy.

    gluster.org does state, for basic nodes: 2 CPUs, 4 GB of RAM each, and
    a 1 Gigabit network.

    I read that, but in a real production environment I think the CPU and
    RAM quantities are a little bit different...

    It all depends on what you are doing; a small amount of CPU/RAM works
    fine in lab environments, as you usually don't have 300+ clients
    trying to write.
    --
    //Aho

    --- Synchronet 3.21a-Linux NewsLink 1.2