
🏗️ DAARION Infrastructure Stack — Decentralized Network

Version: 1.0.0
Date: 2026-01-10
Status: Implementation in progress


🎯 Concept

A decentralized network of our own, geographically distributed datacenters and nodes:

  • No dependency on a single cloud provider
  • Hybrid infrastructure (bare-metal + VM + K8s)
  • Multi-DC architecture with Consul for service discovery

📦 Technology Stack

┌─────────────────────────────────────────────────────────────────┐
│                      INFRASTRUCTURE LAYER                        │
├─────────────────────────────────────────────────────────────────┤
│  Terraform          │ Infrastructure as Code                    │
│  (networks, VPC,    │ - Networks, VPC, firewall rules           │
│   LB, DNS, storage) │ - Load Balancers, DNS records            │
│                     │ - Storage provisioning                    │
├─────────────────────────────────────────────────────────────────┤
│                      CONFIGURATION LAYER                         │
├─────────────────────────────────────────────────────────────────┤
│  Ansible            │ Configuration Management                  │
│  (OS bootstrap,     │ - SSH keys, users, packages              │
│   hardening, k3s)   │ - Security hardening                     │
│                     │ - K3s/K8s cluster bootstrap              │
├─────────────────────────────────────────────────────────────────┤
│                      SECRETS LAYER                               │
├─────────────────────────────────────────────────────────────────┤
│  HashiCorp Vault    │ Centralized Secrets Management           │
│  + External Secrets │ - Database credentials                   │
│    Operator         │ - API keys, certificates                 │
│                     │ - Dynamic secrets rotation               │
├─────────────────────────────────────────────────────────────────┤
│                      ORCHESTRATION LAYER                         │
├─────────────────────────────────────────────────────────────────┤
│  K3s / Kubernetes   │ Container Orchestration                  │
│  + CoreDNS          │ - Lightweight K8s (k3s for edge)         │
│                     │ - Service discovery via CoreDNS          │
├─────────────────────────────────────────────────────────────────┤
│                      SERVICE DISCOVERY (Multi-DC)                │
├─────────────────────────────────────────────────────────────────┤
│  Consul             │ Multi-DC Service Discovery               │
│  (for hybrid/       │ - Cross-datacenter discovery             │
│   multi-DC)         │ - Health checking                        │
│                     │ - Service mesh (optional)                │
├─────────────────────────────────────────────────────────────────┤
│                      OBSERVABILITY LAYER                         │
├─────────────────────────────────────────────────────────────────┤
│  Prometheus         │ Metrics collection & alerting            │
│  Grafana            │ Dashboards & visualization               │
│  Loki               │ Log aggregation                          │
│  Tempo              │ Distributed tracing                      │
└─────────────────────────────────────────────────────────────────┘

🌍 Current Network

Node    Location      Type                  Role             Status
NODE1   Hetzner DE    Dedicated             Master, Gateway  Active
NODE2   Local (Ivan)  MacBook M4            Dev, Testing     Active
NODE3   Remote DC     Threadripper+RTX3090  AI/ML, GPU       Active
NODE4+  TBD           Various               Compute          🔜 Planned

📁 Repository Structure

infrastructure/
├── terraform/
│   ├── modules/
│   │   ├── network/           # VPC, subnets, firewall
│   │   ├── compute/           # VMs, bare-metal provisioning
│   │   ├── dns/               # DNS records
│   │   ├── storage/           # Volumes, NFS, S3-compatible
│   │   └── load-balancer/     # HAProxy, Traefik configs
│   ├── environments/
│   │   ├── production/
│   │   ├── staging/
│   │   └── development/
│   └── main.tf
│
├── ansible/
│   ├── inventory/
│   │   ├── production.yml
│   │   ├── staging.yml
│   │   └── group_vars/
│   │       ├── all.yml
│   │       ├── masters.yml
│   │       ├── workers.yml
│   │       └── gpu_nodes.yml
│   ├── playbooks/
│   │   ├── bootstrap.yml      # OS setup, SSH, packages
│   │   ├── hardening.yml      # Security hardening
│   │   ├── k3s-install.yml    # K3s cluster setup
│   │   ├── vault-setup.yml    # Vault installation
│   │   ├── observability.yml  # Prometheus/Grafana/Loki
│   │   └── consul-setup.yml   # Consul for multi-DC
│   ├── roles/
│   │   ├── common/
│   │   ├── security/
│   │   ├── docker/
│   │   ├── k3s/
│   │   ├── vault/
│   │   ├── consul/
│   │   └── observability/
│   └── ansible.cfg
│
├── kubernetes/
│   ├── base/
│   │   ├── namespaces/
│   │   ├── rbac/
│   │   └── network-policies/
│   ├── apps/
│   │   ├── daarion-core/
│   │   ├── postgres/
│   │   ├── redis/
│   │   └── monitoring/
│   ├── external-secrets/
│   │   └── vault-backend.yml
│   └── kustomization.yaml
│
├── vault/
│   ├── policies/
│   ├── secrets-engines/
│   └── auth-methods/
│
├── consul/
│   ├── config/
│   └── services/
│
└── observability/
    ├── prometheus/
    ├── grafana/
    ├── loki/
    └── tempo/

🚀 Phase 1: Base Infrastructure

We start by installing the base stack on NODE1 and NODE3.

1.1 Ansible Inventory

# ansible/inventory/production.yml
all:
  vars:
    ansible_python_interpreter: /usr/bin/python3
    timezone: "UTC"
    
  children:
    masters:
      hosts:
        node1:
          ansible_host: 144.76.224.179
          ansible_user: root
          node_role: master
          datacenter: hetzner-de
          
    workers:
      hosts:
        node3:
          ansible_host: 80.77.35.151
          ansible_port: 33147
          ansible_user: zevs
          ansible_become: yes
          ansible_become_pass: "{{ vault_node3_password }}"
          node_role: worker
          datacenter: remote-dc
          gpu: true
          gpu_type: "rtx3090"
          
    gpu_nodes:
      hosts:
        node3:
          
    local_dev:
      hosts:
        node2:
          ansible_host: 192.168.1.244
          ansible_user: apple
          node_role: development
          datacenter: local

1.2 Bootstrap Playbook

# ansible/playbooks/bootstrap.yml
---
- name: Bootstrap all nodes
  hosts: all
  become: yes
  
  vars:
    common_packages:
      - curl
      - wget
      - git
      - htop
      - vim
      - jq
      - unzip
      - ca-certificates
      - gnupg
      - lsb-release
      
  tasks:
    - name: Set timezone
      timezone:
        name: "{{ timezone }}"
        
    - name: Update apt cache
      apt:
        update_cache: yes
        cache_valid_time: 3600
      when: ansible_os_family == "Debian"
      
    - name: Install common packages
      apt:
        name: "{{ common_packages }}"
        state: present
      when: ansible_os_family == "Debian"
        
    - name: Create admin group
      group:
        name: daarion-admin
        state: present
        
    - name: Setup SSH authorized keys
      authorized_key:
        user: "{{ ansible_user }}"
        key: "{{ lookup('file', '~/.ssh/daarion_network.pub') }}"
        state: present
        
    - name: Disable password authentication
      lineinfile:
        path: /etc/ssh/sshd_config
        regexp: '^#?PasswordAuthentication'
        line: 'PasswordAuthentication no'
      notify: restart sshd
        
    - name: Set hostname
      hostname:
        name: "{{ inventory_hostname }}"
        
    - name: Update /etc/hosts
      lineinfile:
        path: /etc/hosts
        line: "{{ hostvars[item].ansible_host }} {{ item }}"
        state: present
      loop: "{{ groups['all'] }}"
      when: hostvars[item].ansible_host is defined
      
  handlers:
    - name: restart sshd
      service:
        name: sshd
        state: restarted

1.3 Security Hardening Playbook

# ansible/playbooks/hardening.yml
---
- name: Security Hardening
  hosts: all
  become: yes
  
  vars:
    security_packages:
      - fail2ban
      - ufw
      - auditd
      - rkhunter
      - unattended-upgrades
      
    allowed_ssh_port: "{{ ansible_port | default(22) }}"
    
  tasks:
    - name: Install security packages
      apt:
        name: "{{ security_packages }}"
        state: present
        
    - name: Install Trivy
      shell: |
        curl -sfL https://raw.githubusercontent.com/aquasecurity/trivy/main/contrib/install.sh | sh -s -- -b /usr/local/bin
      args:
        creates: /usr/local/bin/trivy
        
    # UFW Configuration
    - name: UFW - Default deny incoming
      ufw:
        direction: incoming
        policy: deny
        
    - name: UFW - Default deny outgoing
      ufw:
        direction: outgoing
        policy: deny
        
    - name: UFW - Allow SSH
      ufw:
        rule: allow
        port: "{{ allowed_ssh_port }}"
        proto: tcp
        
    - name: UFW - Allow necessary outgoing
      ufw:
        rule: allow
        direction: out
        port: "{{ item.port }}"
        proto: "{{ item.proto }}"
      loop:
        - { port: 53, proto: udp }    # DNS
        - { port: 53, proto: tcp }    # DNS over TCP
        - { port: 80, proto: tcp }    # HTTP
        - { port: 443, proto: tcp }   # HTTPS
        - { port: 123, proto: udp }   # NTP
        - { port: 6443, proto: tcp }  # K3s API (workers reach the master)
        - { port: 8472, proto: udp }  # Flannel VXLAN (K3s pod network)
        
    - name: UFW - Allow K3s ports (masters)
      ufw:
        rule: allow
        port: "{{ item }}"
        proto: tcp
      loop:
        - 6443   # K3s API
        - 10250  # Kubelet
      when: "'masters' in group_names"

    - name: UFW - Allow Flannel VXLAN (all K3s nodes)
      ufw:
        rule: allow
        port: 8472
        proto: udp
      
    - name: UFW - Enable
      ufw:
        state: enabled
        
    # Fail2ban
    - name: Configure fail2ban
      template:
        src: templates/jail.local.j2
        dest: /etc/fail2ban/jail.local
      notify: restart fail2ban
      
    # Kernel hardening
    - name: Kernel hardening sysctl
      sysctl:
        name: "{{ item.name }}"
        value: "{{ item.value }}"
        state: present
        reload: yes
      loop:
        - { name: 'net.ipv4.ip_forward', value: '1' }  # Required for K8s
        - { name: 'net.ipv4.conf.all.accept_redirects', value: '0' }
        - { name: 'net.ipv4.conf.default.accept_redirects', value: '0' }
        - { name: 'net.ipv4.tcp_syncookies', value: '1' }
        - { name: 'kernel.randomize_va_space', value: '2' }
        
    # Security check script
    - name: Create scripts directory
      file:
        path: /opt/scripts
        state: directory
        mode: '0755'
        
    - name: Deploy security check script
      copy:
        src: files/security-check.sh
        dest: /opt/scripts/security-check.sh
        mode: '0755'
        
    - name: Setup security cron
      cron:
        name: "Hourly security check"
        minute: "0"
        job: "/opt/scripts/security-check.sh"
        
  handlers:
    - name: restart fail2ban
      service:
        name: fail2ban
        state: restarted
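The hardening play templates `templates/jail.local.j2`, which is not shown above. A minimal sketch reusing the `allowed_ssh_port` variable — the ban thresholds are illustrative, not the committed template:

```jinja
# ansible/templates/jail.local.j2 (sketch, illustrative thresholds)
[DEFAULT]
bantime  = 1h
findtime = 10m
maxretry = 5

[sshd]
enabled = true
port    = {{ allowed_ssh_port }}
```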

1.4 K3s Installation Playbook

# ansible/playbooks/k3s-install.yml
---
- name: Install K3s on Masters
  hosts: masters
  become: yes
  
  vars:
    k3s_version: "v1.29.0+k3s1"
    
  tasks:
    - name: Download K3s installer
      get_url:
        url: https://get.k3s.io
        dest: /tmp/k3s-install.sh
        mode: '0755'
        
    # K3s generates its own node token; it is collected below and reused by the workers.
    - name: Install K3s server
      shell: |
        INSTALL_K3S_VERSION={{ k3s_version }} \
        sh /tmp/k3s-install.sh server \
          --disable traefik \
          --disable servicelb \
          --write-kubeconfig-mode 644 \
          --tls-san {{ ansible_host }} \
          --node-label "datacenter={{ datacenter }}" \
          --node-label "node-role={{ node_role }}"
      args:
        creates: /etc/rancher/k3s/k3s.yaml
        
    - name: Wait for K3s to be ready
      wait_for:
        port: 6443
        delay: 10
        timeout: 300
        
    - name: Get K3s token
      slurp:
        src: /var/lib/rancher/k3s/server/node-token
      register: k3s_token_file
      
    - name: Save K3s token
      set_fact:
        k3s_join_token: "{{ k3s_token_file.content | b64decode | trim }}"
        
    - name: Fetch kubeconfig
      fetch:
        src: /etc/rancher/k3s/k3s.yaml
        dest: "{{ playbook_dir }}/../kubeconfig/{{ inventory_hostname }}.yaml"
        flat: yes

- name: Install K3s on Workers
  hosts: workers
  become: yes
  
  vars:
    k3s_version: "v1.29.0+k3s1"
    k3s_master: "{{ hostvars[groups['masters'][0]].ansible_host }}"
    
  tasks:
    - name: Download K3s installer
      get_url:
        url: https://get.k3s.io
        dest: /tmp/k3s-install.sh
        mode: '0755'
        
    - name: Install K3s agent
      shell: |
        INSTALL_K3S_VERSION={{ k3s_version }} \
        K3S_URL=https://{{ k3s_master }}:6443 \
        K3S_TOKEN={{ hostvars[groups['masters'][0]].k3s_join_token }} \
        sh /tmp/k3s-install.sh agent \
          --node-label "datacenter={{ datacenter }}" \
          --node-label "node-role={{ node_role }}" \
          {% if gpu is defined and gpu %}
          --node-label "gpu=true" \
          --node-label "gpu-type={{ gpu_type }}"
          {% endif %}
      args:
        creates: /etc/rancher/k3s/k3s.yaml
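The `gpu=true` / `gpu-type` labels applied above let workloads target NODE3. A hypothetical pod spec using them — the pod name and image are placeholders, and actual GPU access additionally requires the NVIDIA container runtime/device plugin on the node:

```yaml
# hypothetical GPU-targeted pod using the node labels set above
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  nodeSelector:
    gpu: "true"
    gpu-type: "rtx3090"
  containers:
    - name: cuda
      image: nvidia/cuda:12.3.1-base-ubuntu22.04
      command: ["nvidia-smi"]
  restartPolicy: Never
```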

🔐 Phase 2: Vault Setup

2.1 Vault Installation

# ansible/playbooks/vault-setup.yml
---
- name: Install HashiCorp Vault
  hosts: masters
  become: yes
  
  vars:
    vault_version: "1.15.4"
    vault_data_dir: "/opt/vault/data"
    
  tasks:
    - name: Create vault user
      user:
        name: vault
        system: yes
        shell: /bin/false
        
    - name: Create vault directories
      file:
        path: "{{ item }}"
        state: directory
        owner: vault
        group: vault
        mode: '0750'
      loop:
        - /opt/vault
        - /opt/vault/data
        - /opt/vault/config
        - /opt/vault/logs
        
    - name: Download Vault
      get_url:
        url: "https://releases.hashicorp.com/vault/{{ vault_version }}/vault_{{ vault_version }}_linux_amd64.zip"
        dest: /tmp/vault.zip
        
    - name: Extract Vault
      unarchive:
        src: /tmp/vault.zip
        dest: /usr/local/bin
        remote_src: yes
        
    - name: Vault configuration
      template:
        src: templates/vault.hcl.j2
        dest: /opt/vault/config/vault.hcl
        owner: vault
        group: vault
      notify: restart vault
        
    - name: Vault systemd service
      template:
        src: templates/vault.service.j2
        dest: /etc/systemd/system/vault.service
      notify:
        - reload systemd
        - restart vault
        
    - name: Enable and start Vault
      service:
        name: vault
        enabled: yes
        state: started
        
  handlers:
    - name: reload systemd
      systemd:
        daemon_reload: yes
        
    - name: restart vault
      service:
        name: vault
        state: restarted

2.2 Vault Configuration

# ansible/templates/vault.hcl.j2
ui = true

storage "file" {
  path = "/opt/vault/data"  # single-node only; use integrated storage (raft) for HA
}

listener "tcp" {
  address     = "0.0.0.0:8200"
  tls_disable = "true"  # Enable TLS in production!
}

api_addr = "http://{{ ansible_host }}:8200"
cluster_addr = "https://{{ ansible_host }}:8201"
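The Vault playbook also templates `templates/vault.service.j2`, which is not shown. A minimal systemd unit sketch — the hardening options here are assumptions:

```ini
# ansible/templates/vault.service.j2 (sketch, assumed options)
[Unit]
Description=HashiCorp Vault
After=network-online.target
Wants=network-online.target

[Service]
User=vault
Group=vault
ExecStart=/usr/local/bin/vault server -config=/opt/vault/config/vault.hcl
ExecReload=/bin/kill -HUP $MAINPID
AmbientCapabilities=CAP_IPC_LOCK
Restart=on-failure

[Install]
WantedBy=multi-user.target
```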

2.3 External Secrets Operator

# kubernetes/external-secrets/vault-backend.yml
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: vault-backend
spec:
  provider:
    vault:
      server: "http://node1:8200"
      path: "secret"
      version: "v2"
      auth:
        kubernetes:
          mountPath: "kubernetes"
          role: "external-secrets"
          serviceAccountRef:
            name: "external-secrets"
            namespace: "external-secrets"

---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: postgres-credentials
  namespace: daarion
spec:
  refreshInterval: "1h"
  secretStoreRef:
    name: vault-backend
    kind: ClusterSecretStore
  target:
    name: postgres-credentials
    creationPolicy: Owner
  data:
    - secretKey: username
      remoteRef:
        key: postgres        # relative to the store's path ("secret"); ESO adds data/ for KV v2
        property: username
    - secretKey: password
      remoteRef:
        key: postgres
        property: password
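For the ClusterSecretStore above to read `secret/data/postgres`, the `external-secrets` Kubernetes auth role needs a Vault policy (the `vault/policies/` directory in the repository structure). A minimal sketch — the policy name and scope are assumptions:

```hcl
# vault/policies/external-secrets.hcl (sketch, assumed scope)
path "secret/data/*" {
  capabilities = ["read"]
}

path "secret/metadata/*" {
  capabilities = ["read", "list"]
}
```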

🔍 Phase 3: Consul (Multi-DC)

3.1 Consul Installation

# ansible/playbooks/consul-setup.yml
---
- name: Install Consul
  hosts: all
  become: yes
  
  vars:
    consul_version: "1.17.1"
    consul_datacenter: "{{ datacenter }}"
    consul_is_server: "{{ 'masters' in group_names }}"
    
  tasks:
    - name: Create consul user
      user:
        name: consul
        system: yes
        shell: /bin/false
        
    - name: Create consul directories
      file:
        path: "{{ item }}"
        state: directory
        owner: consul
        group: consul
      loop:
        - /opt/consul
        - /opt/consul/data
        - /opt/consul/config
        
    - name: Download Consul
      get_url:
        url: "https://releases.hashicorp.com/consul/{{ consul_version }}/consul_{{ consul_version }}_linux_amd64.zip"
        dest: /tmp/consul.zip
        
    - name: Extract Consul
      unarchive:
        src: /tmp/consul.zip
        dest: /usr/local/bin
        remote_src: yes
        
    - name: Consul configuration
      template:
        src: templates/consul.hcl.j2
        dest: /opt/consul/config/consul.hcl
        owner: consul
        group: consul
      notify: restart consul
        
    - name: Consul systemd service
      template:
        src: templates/consul.service.j2
        dest: /etc/systemd/system/consul.service
      notify:
        - reload systemd
        - restart consul
        
    - name: Enable and start Consul
      service:
        name: consul
        enabled: yes
        state: started
        
  handlers:
    - name: reload systemd
      systemd:
        daemon_reload: yes
        
    - name: restart consul
      service:
        name: consul
        state: restarted

3.2 Consul Configuration

# ansible/templates/consul.hcl.j2
datacenter = "{{ consul_datacenter }}"
data_dir = "/opt/consul/data"
log_level = "INFO"
node_name = "{{ inventory_hostname }}"
bind_addr = "{{ ansible_host }}"
client_addr = "0.0.0.0"

{% if consul_is_server %}
server = true
bootstrap_expect = {{ groups['masters'] | length }}
ui_config {
  enabled = true
}
{% endif %}

# Join other servers
retry_join = [
{% for host in groups['masters'] %}
  "{{ hostvars[host].ansible_host }}"{% if not loop.last %},{% endif %}

{% endfor %}
]

# WAN federation for multi-DC
{% if groups['masters'] | length > 1 %}
retry_join_wan = [
{% for host in groups['masters'] %}
  "{{ hostvars[host].ansible_host }}"{% if not loop.last %},{% endif %}

{% endfor %}
]
{% endif %}

# Service mesh
connect {
  enabled = true
}

# DNS
ports {
  dns = 8600
}

# ACL (enable in production)
acl {
  enabled = false
  default_policy = "allow"
}
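Service definitions go into the `consul/services/` directory from the repository structure. A hypothetical registration for the gateway on NODE1 with an HTTP health check — the service name, port, and endpoint are assumptions:

```hcl
# consul/services/daarion-gateway.hcl (sketch, assumed name/port/endpoint)
service {
  name = "daarion-gateway"
  port = 443
  tags = ["edge", "hetzner-de"]

  check {
    http     = "https://localhost:443/healthz"
    interval = "10s"
    timeout  = "2s"
  }
}
```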

📊 Phase 4: Observability Stack

4.1 Prometheus + Grafana + Loki + Tempo

# ansible/playbooks/observability.yml
---
- name: Deploy Observability Stack
  hosts: masters
  become: yes
  
  tasks:
    - name: Create monitoring namespace
      kubernetes.core.k8s:
        state: present
        definition:
          apiVersion: v1
          kind: Namespace
          metadata:
            name: monitoring
            
    - name: Add Prometheus Helm repo
      kubernetes.core.helm_repository:
        name: prometheus-community
        repo_url: https://prometheus-community.github.io/helm-charts
        
    - name: Add Grafana Helm repo
      kubernetes.core.helm_repository:
        name: grafana
        repo_url: https://grafana.github.io/helm-charts
        
    - name: Install kube-prometheus-stack
      kubernetes.core.helm:
        name: prometheus
        chart_ref: prometheus-community/kube-prometheus-stack
        release_namespace: monitoring
        create_namespace: yes
        values:
          prometheus:
            prometheusSpec:
              retention: 30d
              storageSpec:
                volumeClaimTemplate:
                  spec:
                    accessModes: ["ReadWriteOnce"]
                    resources:
                      requests:
                        storage: 50Gi
          grafana:
            adminPassword: "{{ vault_grafana_password }}"
            persistence:
              enabled: true
              size: 10Gi
              
    - name: Install Loki
      kubernetes.core.helm:
        name: loki
        chart_ref: grafana/loki-stack
        release_namespace: monitoring
        values:
          loki:
            persistence:
              enabled: true
              size: 50Gi
          promtail:
            enabled: true
            
    - name: Install Tempo
      kubernetes.core.helm:
        name: tempo
        chart_ref: grafana/tempo
        release_namespace: monitoring
        values:
          tempo:
            retention: 168h  # 7 days
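The checklist's "Alerting rules" item can start with a node-down alert over the same `node-exporter` metrics the dashboards query. A sketch — the threshold, labels, and file location are illustrative:

```yaml
# observability/prometheus/node-alerts.yml (sketch, illustrative threshold)
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: daarion-node-alerts
  namespace: monitoring
  labels:
    release: prometheus   # matched by the kube-prometheus-stack ruleSelector
spec:
  groups:
    - name: nodes
      rules:
        - alert: NodeDown
          expr: up{job="node-exporter"} == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Node {{ $labels.instance }} is unreachable"
```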

4.2 Grafana Dashboards

# kubernetes/apps/monitoring/grafana-dashboards.yml
apiVersion: v1
kind: ConfigMap
metadata:
  name: daarion-dashboards
  namespace: monitoring
  labels:
    grafana_dashboard: "1"
data:
  daarion-network.json: |
    {
      "dashboard": {
        "title": "DAARION Network Overview",
        "panels": [
          {
            "title": "Total Nodes",
            "type": "stat",
            "targets": [{"expr": "count(up{job=\"node-exporter\"})"}]
          },
          {
            "title": "Nodes by Datacenter",
            "type": "piechart",
            "targets": [{"expr": "count by (datacenter) (up{job=\"node-exporter\"})"}]
          },
          {
            "title": "GPU Nodes",
            "type": "stat",
            "targets": [{"expr": "count(up{job=\"node-exporter\", gpu=\"true\"})"}]
          },
          {
            "title": "K3s Cluster Status",
            "type": "stat",
            "targets": [{"expr": "sum(kube_node_status_condition{condition=\"Ready\",status=\"true\"})"}]
          }
        ]
      }
    }
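Loki and Tempo still need to be registered as Grafana data sources; with kube-prometheus-stack this can be done via Helm values. A sketch — the in-cluster service DNS names and ports assume the release names used above and should be checked against the deployed services:

```yaml
# extra Helm values for kube-prometheus-stack (sketch, assumed service names/ports)
grafana:
  additionalDataSources:
    - name: Loki
      type: loki
      url: http://loki.monitoring.svc.cluster.local:3100
    - name: Tempo
      type: tempo
      url: http://tempo.monitoring.svc.cluster.local:3200
```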

🚀 Quick Start

Step 1: Preparation

# Clone the repository
git clone git@github.com:IvanTytar/microdao-daarion.git
cd microdao-daarion/infrastructure

# Create an SSH key for the network
ssh-keygen -t ed25519 -f ~/.ssh/daarion_network -C "daarion-network"

# Install Ansible
pip install ansible ansible-lint

# Install Terraform
brew install terraform  # macOS

Step 2: Configure the inventory

# Copy the example
cp ansible/inventory/example.yml ansible/inventory/production.yml

# Edit it for your nodes
vim ansible/inventory/production.yml

Step 3: Bootstrap the nodes

cd ansible

# Check connectivity
ansible all -i inventory/production.yml -m ping

# Bootstrap
ansible-playbook -i inventory/production.yml playbooks/bootstrap.yml

# Hardening
ansible-playbook -i inventory/production.yml playbooks/hardening.yml

Step 4: K3s cluster

# Install K3s
ansible-playbook -i inventory/production.yml playbooks/k3s-install.yml

# Verify
export KUBECONFIG=kubeconfig/node1.yaml
kubectl get nodes

Step 5: Vault + Consul

# Vault
ansible-playbook -i inventory/production.yml playbooks/vault-setup.yml

# Consul (if multi-DC)
ansible-playbook -i inventory/production.yml playbooks/consul-setup.yml

Step 6: Observability

# Prometheus + Grafana + Loki + Tempo
ansible-playbook -i inventory/production.yml playbooks/observability.yml

📋 Checklist

Phase 1: Foundation

  • NODE1 security hardening
  • NODE3 security hardening
  • PostgreSQL on NODE1 & NODE3
  • Ansible repository structure
  • SSH key distribution
  • Bootstrap playbook tested

Phase 2: K3s Cluster

  • K3s on NODE1 (master)
  • K3s on NODE3 (worker + GPU)
  • CoreDNS configured
  • Network policies

Phase 3: Secrets & Discovery

  • Vault installed
  • External Secrets Operator
  • Consul (if needed for multi-DC)

Phase 4: Observability

  • Prometheus
  • Grafana
  • Loki
  • Tempo
  • Alerting rules

Author: Ivan Tytar & AI Assistant
Last updated: 2026-01-10