
# 🏗️ DAARION Infrastructure Stack — Decentralized Network
**Version:** 1.0.0
**Date:** 2026-01-10
**Status:** Rollout in progress
---
## 🎯 Concept
**A decentralized network of self-hosted datacenters and nodes**, distributed geographically:
- No dependency on a single cloud provider
- Hybrid infrastructure (bare-metal + VM + K8s)
- Multi-DC architecture with Consul for service discovery
---
## 📦 Technology Stack
```
┌───────────────────────────────────────────────────────────────────┐
│                       INFRASTRUCTURE LAYER                        │
├───────────────────────────────────────────────────────────────────┤
│ Terraform            │ Infrastructure as Code                     │
│ (networks, VPC,      │ - Networks, VPC, firewall rules            │
│  LB, DNS, storage)   │ - Load balancers, DNS records              │
│                      │ - Storage provisioning                     │
├───────────────────────────────────────────────────────────────────┤
│                        CONFIGURATION LAYER                        │
├───────────────────────────────────────────────────────────────────┤
│ Ansible              │ Configuration Management                   │
│ (OS bootstrap,       │ - SSH keys, users, packages                │
│  hardening, k3s)     │ - Security hardening                       │
│                      │ - K3s/K8s cluster bootstrap                │
├───────────────────────────────────────────────────────────────────┤
│                           SECRETS LAYER                           │
├───────────────────────────────────────────────────────────────────┤
│ HashiCorp Vault      │ Centralized Secrets Management             │
│ + External Secrets   │ - Database credentials                     │
│   Operator           │ - API keys, certificates                   │
│                      │ - Dynamic secrets rotation                 │
├───────────────────────────────────────────────────────────────────┤
│                        ORCHESTRATION LAYER                        │
├───────────────────────────────────────────────────────────────────┤
│ K3s / Kubernetes     │ Container Orchestration                    │
│ + CoreDNS            │ - Lightweight K8s (k3s for edge)           │
│                      │ - Service discovery via CoreDNS            │
├───────────────────────────────────────────────────────────────────┤
│                   SERVICE DISCOVERY (Multi-DC)                    │
├───────────────────────────────────────────────────────────────────┤
│ Consul               │ Multi-DC Service Discovery                 │
│ (for hybrid/         │ - Cross-datacenter discovery               │
│  multi-DC)           │ - Health checking                          │
│                      │ - Service mesh (optional)                  │
├───────────────────────────────────────────────────────────────────┤
│                        OBSERVABILITY LAYER                        │
├───────────────────────────────────────────────────────────────────┤
│ Prometheus           │ Metrics collection & alerting              │
│ Grafana              │ Dashboards & visualization                 │
│ Loki                 │ Log aggregation                            │
│ Tempo                │ Distributed tracing                        │
└───────────────────────────────────────────────────────────────────┘
```
---
## 🌍 Current Network
| Node | Location | Type | Role | Status |
|------|----------|------|------|--------|
| **NODE1** | Hetzner DE | Dedicated | Master, Gateway | ✅ Active |
| **NODE2** | Local (Ivan) | MacBook M4 | Dev, Testing | ✅ Active |
| **NODE3** | Remote DC | Threadripper+RTX3090 | AI/ML, GPU | ✅ Active |
| **NODE4+** | TBD | Various | Compute | 🔜 Planned |
---
## 📁 Repository Structure
```
infrastructure/
├── terraform/
│   ├── modules/
│   │   ├── network/          # VPC, subnets, firewall
│   │   ├── compute/          # VMs, bare-metal provisioning
│   │   ├── dns/              # DNS records
│   │   ├── storage/          # Volumes, NFS, S3-compatible
│   │   └── load-balancer/    # HAProxy, Traefik configs
│   ├── environments/
│   │   ├── production/
│   │   ├── staging/
│   │   └── development/
│   └── main.tf
├── ansible/
│   ├── inventory/
│   │   ├── production.yml
│   │   ├── staging.yml
│   │   └── group_vars/
│   │       ├── all.yml
│   │       ├── masters.yml
│   │       ├── workers.yml
│   │       └── gpu_nodes.yml
│   ├── playbooks/
│   │   ├── bootstrap.yml        # OS setup, SSH, packages
│   │   ├── hardening.yml        # Security hardening
│   │   ├── k3s-install.yml      # K3s cluster setup
│   │   ├── vault-setup.yml      # Vault installation
│   │   ├── observability.yml    # Prometheus/Grafana/Loki
│   │   └── consul-setup.yml     # Consul for multi-DC
│   ├── roles/
│   │   ├── common/
│   │   ├── security/
│   │   ├── docker/
│   │   ├── k3s/
│   │   ├── vault/
│   │   ├── consul/
│   │   └── observability/
│   └── ansible.cfg
├── kubernetes/
│   ├── base/
│   │   ├── namespaces/
│   │   ├── rbac/
│   │   └── network-policies/
│   ├── apps/
│   │   ├── daarion-core/
│   │   ├── postgres/
│   │   ├── redis/
│   │   └── monitoring/
│   ├── external-secrets/
│   │   └── vault-backend.yml
│   └── kustomization.yaml
├── vault/
│   ├── policies/
│   ├── secrets-engines/
│   └── auth-methods/
├── consul/
│   ├── config/
│   └── services/
└── observability/
    ├── prometheus/
    ├── grafana/
    ├── loki/
    └── tempo/
```
---
## 🚀 Phase 1: Base Infrastructure
We start by installing the base stack on NODE1 and NODE3.
### 1.1 Ansible Inventory
```yaml
# ansible/inventory/production.yml
all:
  vars:
    ansible_python_interpreter: /usr/bin/python3
    timezone: "UTC"
  children:
    masters:
      hosts:
        node1:
          ansible_host: 144.76.224.179
          ansible_user: root
          node_role: master
          datacenter: hetzner-de
    workers:
      hosts:
        node3:
          ansible_host: 80.77.35.151
          ansible_port: 33147
          ansible_user: zevs
          ansible_become: yes
          ansible_become_pass: "{{ vault_node3_password }}"
          node_role: worker
          datacenter: remote-dc
          gpu: true
          gpu_type: "rtx3090"
    gpu_nodes:
      hosts:
        node3:
    local_dev:
      hosts:
        node2:
          ansible_host: 192.168.1.244
          ansible_user: apple
          node_role: development
          datacenter: local
```
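Ansible resolves the nested `children` structure above into flat host groups, merging host vars per host, so `node3` keeps its GPU settings whether it is addressed through `workers` or `gpu_nodes`. The following is a hypothetical sketch of that resolution (not part of the repo; the trimmed `inventory` dict only mirrors the YAML above):

```python
def resolve_groups(inventory):
    """Return {group_name: set of host names}, flattening nested children."""
    groups = {}

    def walk(name, node):
        node = node or {}
        hosts = set(node.get("hosts") or {})
        for child, child_node in (node.get("children") or {}).items():
            hosts |= walk(child, child_node)
        groups[name] = hosts
        return hosts

    walk("all", inventory.get("all", {}))
    return groups

# Trimmed copy of the production inventory above:
inventory = {
    "all": {
        "children": {
            "masters": {"hosts": {"node1": {"node_role": "master"}}},
            "workers": {"hosts": {"node3": {"gpu": True}}},
            "gpu_nodes": {"hosts": {"node3": None}},
            "local_dev": {"hosts": {"node2": {"node_role": "development"}}},
        }
    }
}

groups = resolve_groups(inventory)
print(sorted(groups["all"]))       # ['node1', 'node2', 'node3']
print(sorted(groups["gpu_nodes"])) # ['node3'] — same host, second group
```

This is why `node3:` can appear under `gpu_nodes` with no body: group membership and host variables are independent.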
### 1.2 Bootstrap Playbook
```yaml
# ansible/playbooks/bootstrap.yml
---
- name: Bootstrap all nodes
  hosts: all
  become: yes
  vars:
    common_packages:
      - curl
      - wget
      - git
      - htop
      - vim
      - jq
      - unzip
      - ca-certificates
      - gnupg
      - lsb-release
  tasks:
    - name: Set timezone
      timezone:
        name: "{{ timezone }}"

    - name: Update apt cache
      apt:
        update_cache: yes
        cache_valid_time: 3600
      when: ansible_os_family == "Debian"

    - name: Install common packages
      apt:
        name: "{{ common_packages }}"
        state: present
      when: ansible_os_family == "Debian"

    - name: Create admin group
      group:
        name: daarion-admin
        state: present

    - name: Setup SSH authorized keys
      authorized_key:
        user: "{{ ansible_user }}"
        key: "{{ lookup('file', '~/.ssh/daarion_network.pub') }}"
        state: present

    - name: Disable password authentication
      lineinfile:
        path: /etc/ssh/sshd_config
        regexp: '^#?PasswordAuthentication'
        line: 'PasswordAuthentication no'
      notify: restart sshd

    - name: Set hostname
      hostname:
        name: "{{ inventory_hostname }}"

    - name: Update /etc/hosts
      lineinfile:
        path: /etc/hosts
        line: "{{ hostvars[item].ansible_host }} {{ item }}"
        state: present
      loop: "{{ groups['all'] }}"
      when: hostvars[item].ansible_host is defined

  handlers:
    - name: restart sshd
      service:
        name: sshd
        state: restarted
```
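The "Update /etc/hosts" task loops over every inventory host and emits an `<ip> <name>` line, skipping hosts without a known `ansible_host` (the `when:` guard). A hypothetical stand-in for that logic, using made-up host data in the same shape as the inventory:

```python
def etc_hosts_lines(hostvars, hosts):
    """Build '<ip> <hostname>' lines, skipping hosts without ansible_host."""
    return [
        f"{hostvars[h]['ansible_host']} {h}"
        for h in hosts
        if "ansible_host" in hostvars.get(h, {})
    ]

hostvars = {
    "node1": {"ansible_host": "144.76.224.179"},
    "node3": {"ansible_host": "80.77.35.151"},
    "node9": {},  # no ansible_host -> skipped, like the `when:` condition
}
print(etc_hosts_lines(hostvars, ["node1", "node3", "node9"]))
# ['144.76.224.179 node1', '80.77.35.151 node3']
```

Because `lineinfile` is idempotent per line, rerunning the playbook leaves `/etc/hosts` unchanged once every node is listed.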
### 1.3 Security Hardening Playbook
```yaml
# ansible/playbooks/hardening.yml
---
- name: Security Hardening
  hosts: all
  become: yes
  vars:
    security_packages:
      - fail2ban
      - ufw
      - auditd
      - rkhunter
      - unattended-upgrades
    allowed_ssh_port: "{{ ansible_port | default(22) }}"
  tasks:
    - name: Install security packages
      apt:
        name: "{{ security_packages }}"
        state: present

    - name: Install Trivy
      shell: |
        curl -sfL https://raw.githubusercontent.com/aquasecurity/trivy/main/contrib/install.sh | sh -s -- -b /usr/local/bin
      args:
        creates: /usr/local/bin/trivy

    # UFW Configuration
    - name: UFW - Default deny incoming
      ufw:
        direction: incoming
        policy: deny

    - name: UFW - Default deny outgoing
      ufw:
        direction: outgoing
        policy: deny

    - name: UFW - Allow SSH
      ufw:
        rule: allow
        port: "{{ allowed_ssh_port }}"
        proto: tcp

    - name: UFW - Allow necessary outgoing
      ufw:
        rule: allow
        direction: out
        port: "{{ item.port }}"
        proto: "{{ item.proto }}"
      loop:
        - { port: 53, proto: udp }    # DNS
        - { port: 80, proto: tcp }    # HTTP
        - { port: 443, proto: tcp }   # HTTPS
        - { port: 123, proto: udp }   # NTP

    - name: UFW - Allow K3s ports (masters)
      ufw:
        rule: allow
        port: "{{ item }}"
        proto: tcp
      loop:
        - 6443   # K3s API
        - 10250  # Kubelet
      when: "'masters' in group_names"

    - name: UFW - Enable
      ufw:
        state: enabled

    # Fail2ban
    - name: Configure fail2ban
      template:
        src: templates/jail.local.j2
        dest: /etc/fail2ban/jail.local
      notify: restart fail2ban

    # Kernel hardening
    - name: Kernel hardening sysctl
      sysctl:
        name: "{{ item.name }}"
        value: "{{ item.value }}"
        state: present
        reload: yes
      loop:
        - { name: 'net.ipv4.ip_forward', value: '1' }  # Required for K8s
        - { name: 'net.ipv4.conf.all.accept_redirects', value: '0' }
        - { name: 'net.ipv4.conf.default.accept_redirects', value: '0' }
        - { name: 'net.ipv4.tcp_syncookies', value: '1' }
        - { name: 'kernel.randomize_va_space', value: '2' }

    # Security check script
    - name: Create scripts directory
      file:
        path: /opt/scripts
        state: directory
        mode: '0755'

    - name: Deploy security check script
      copy:
        src: files/security-check.sh
        dest: /opt/scripts/security-check.sh
        mode: '0755'

    - name: Setup security cron
      cron:
        name: "Hourly security check"
        minute: "0"
        job: "/opt/scripts/security-check.sh"

  handlers:
    - name: restart fail2ban
      service:
        name: fail2ban
        state: restarted
```
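The sysctl task writes each dotted key to a file under `/proc/sys` (dots become path separators). A small illustrative sketch of that mapping, with the same hardening values the playbook enforces, can make a quick audit script easier to reason about:

```python
# Illustrative only: the sysctl values the hardening playbook sets.
HARDENING = {
    "net.ipv4.ip_forward": "1",  # required for K8s CNI packet routing
    "net.ipv4.conf.all.accept_redirects": "0",
    "net.ipv4.conf.default.accept_redirects": "0",
    "net.ipv4.tcp_syncookies": "1",
    "kernel.randomize_va_space": "2",  # full ASLR
}

def sysctl_path(key: str) -> str:
    """Map a dotted sysctl key to its /proc/sys file path."""
    return "/proc/sys/" + key.replace(".", "/")

for key, value in HARDENING.items():
    print(f"{sysctl_path(key)} should contain {value}")
```

On a node, reading each of those paths and comparing against the expected value is one way such an hourly check could verify the settings have not drifted.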
### 1.4 K3s Installation Playbook
```yaml
# ansible/playbooks/k3s-install.yml
---
- name: Install K3s on Masters
  hosts: masters
  become: yes
  vars:
    k3s_version: "v1.29.0+k3s1"
  tasks:
    - name: Download K3s installer
      get_url:
        url: https://get.k3s.io
        dest: /tmp/k3s-install.sh
        mode: '0755'

    # k3s_token is expected to be defined elsewhere (e.g. group_vars / Ansible Vault)
    - name: Install K3s server
      shell: |
        INSTALL_K3S_VERSION={{ k3s_version }} \
        K3S_TOKEN={{ k3s_token }} \
        sh /tmp/k3s-install.sh server \
          --disable traefik \
          --disable servicelb \
          --write-kubeconfig-mode 644 \
          --tls-san {{ ansible_host }} \
          --node-label "datacenter={{ datacenter }}" \
          --node-label "node-role={{ node_role }}"
      args:
        creates: /etc/rancher/k3s/k3s.yaml

    - name: Wait for K3s to be ready
      wait_for:
        port: 6443
        delay: 10
        timeout: 300

    - name: Get K3s token
      slurp:
        src: /var/lib/rancher/k3s/server/node-token
      register: k3s_token_file

    - name: Save K3s token
      set_fact:
        k3s_join_token: "{{ k3s_token_file.content | b64decode | trim }}"

    - name: Fetch kubeconfig
      fetch:
        src: /etc/rancher/k3s/k3s.yaml
        dest: "{{ playbook_dir }}/../kubeconfig/{{ inventory_hostname }}.yaml"
        flat: yes

# Second play in the same playbook file (no `---` document separator between plays)
- name: Install K3s on Workers
  hosts: workers
  become: yes
  vars:
    k3s_version: "v1.29.0+k3s1"
    k3s_master: "{{ hostvars[groups['masters'][0]].ansible_host }}"
  tasks:
    - name: Download K3s installer
      get_url:
        url: https://get.k3s.io
        dest: /tmp/k3s-install.sh
        mode: '0755'

    - name: Install K3s agent
      shell: |
        INSTALL_K3S_VERSION={{ k3s_version }} \
        K3S_URL=https://{{ k3s_master }}:6443 \
        K3S_TOKEN={{ hostvars[groups['masters'][0]].k3s_join_token }} \
        sh /tmp/k3s-install.sh agent \
          --node-label "datacenter={{ datacenter }}" \
          --node-label "node-role={{ node_role }}" \
          {% if gpu is defined and gpu %}
          --node-label "gpu=true" \
          --node-label "gpu-type={{ gpu_type }}"
          {% endif %}
      args:
        # Agents do not write k3s.yaml; guard on the agent data dir instead
        creates: /var/lib/rancher/k3s/agent
```
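The "Save K3s token" step relies on `slurp` returning file content base64-encoded, which is then run through `b64decode | trim` before workers use it to join. The Python equivalent of that filter chain (the token value below is made up):

```python
import base64

def decode_slurped(content_b64: str) -> str:
    """Equivalent of Ansible's `content | b64decode | trim` on slurp output."""
    return base64.b64decode(content_b64).decode().strip()

# slurp would return something like this for the node-token file:
slurped = base64.b64encode(b"K10abc::server:deadbeef\n").decode()
print(decode_slurped(slurped))
# K10abc::server:deadbeef
```

The `trim` matters: the node-token file ends with a newline, and a token with a trailing newline embedded in `K3S_TOKEN=` would break the join command.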
---
## 🔐 Phase 2: Vault Setup
### 2.1 Vault Installation
```yaml
# ansible/playbooks/vault-setup.yml
---
- name: Install HashiCorp Vault
  hosts: masters
  become: yes
  vars:
    vault_version: "1.15.4"
    vault_data_dir: "/opt/vault/data"
  tasks:
    - name: Create vault user
      user:
        name: vault
        system: yes
        shell: /bin/false

    - name: Create vault directories
      file:
        path: "{{ item }}"
        state: directory
        owner: vault
        group: vault
        mode: '0750'
      loop:
        - /opt/vault
        - /opt/vault/data
        - /opt/vault/config
        - /opt/vault/logs

    - name: Download Vault
      get_url:
        url: "https://releases.hashicorp.com/vault/{{ vault_version }}/vault_{{ vault_version }}_linux_amd64.zip"
        dest: /tmp/vault.zip

    - name: Extract Vault
      unarchive:
        src: /tmp/vault.zip
        dest: /usr/local/bin
        remote_src: yes

    - name: Vault configuration
      template:
        src: templates/vault.hcl.j2
        dest: /opt/vault/config/vault.hcl
        owner: vault
        group: vault
      notify: restart vault

    - name: Vault systemd service
      template:
        src: templates/vault.service.j2
        dest: /etc/systemd/system/vault.service
      notify:
        - reload systemd
        - restart vault

    - name: Enable and start Vault
      service:
        name: vault
        enabled: yes
        state: started

  handlers:
    - name: reload systemd
      systemd:
        daemon_reload: yes

    - name: restart vault
      service:
        name: vault
        state: restarted
```
### 2.2 Vault Configuration
```hcl
# ansible/templates/vault.hcl.j2
ui = true

storage "file" {
  path = "/opt/vault/data"
}

listener "tcp" {
  address     = "0.0.0.0:8200"
  tls_disable = "true"  # Enable TLS in production!
}

api_addr     = "http://{{ ansible_host }}:8200"
cluster_addr = "https://{{ ansible_host }}:8201"
```
### 2.3 External Secrets Operator
```yaml
# kubernetes/external-secrets/vault-backend.yml
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: vault-backend
spec:
  provider:
    vault:
      server: "http://node1:8200"
      path: "secret"
      version: "v2"
      auth:
        kubernetes:
          mountPath: "kubernetes"
          role: "external-secrets"
          serviceAccountRef:
            name: "external-secrets"
            namespace: "external-secrets"
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: postgres-credentials
  namespace: daarion
spec:
  refreshInterval: "1h"
  secretStoreRef:
    name: vault-backend
    kind: ClusterSecretStore
  target:
    name: postgres-credentials
    creationPolicy: Owner
  data:
    - secretKey: username
      remoteRef:
        key: secret/data/postgres
        property: username
    - secretKey: password
      remoteRef:
        key: secret/data/postgres
        property: password
```
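The `data` segment in `secret/data/postgres` is a KV version 2 quirk: the v2 secrets engine inserts `data` between the mount name and the secret path in its HTTP API, so a secret written to `secret/postgres` is read at `/v1/secret/data/postgres`. A sketch of the path construction (the mount and secret names are just the ones used above):

```python
def kv2_read_path(mount: str, secret_path: str) -> str:
    """REST path for reading a KV v2 secret (GET /v1/<mount>/data/<path>)."""
    return f"/v1/{mount}/data/{secret_path}"

print(kv2_read_path("secret", "postgres"))
# /v1/secret/data/postgres
```

Keeping this in mind avoids the most common External Secrets misconfiguration: pointing `remoteRef.key` at the write path (`secret/postgres`) instead of the v2 read path.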
---
## 🔍 Phase 3: Consul (Multi-DC)
### 3.1 Consul Installation
```yaml
# ansible/playbooks/consul-setup.yml
---
- name: Install Consul
  hosts: all
  become: yes
  vars:
    consul_version: "1.17.1"
    consul_datacenter: "{{ datacenter }}"
    consul_is_server: "{{ 'masters' in group_names }}"
  tasks:
    - name: Create consul user
      user:
        name: consul
        system: yes
        shell: /bin/false

    - name: Create consul directories
      file:
        path: "{{ item }}"
        state: directory
        owner: consul
        group: consul
      loop:
        - /opt/consul
        - /opt/consul/data
        - /opt/consul/config

    - name: Download Consul
      get_url:
        url: "https://releases.hashicorp.com/consul/{{ consul_version }}/consul_{{ consul_version }}_linux_amd64.zip"
        dest: /tmp/consul.zip

    - name: Extract Consul
      unarchive:
        src: /tmp/consul.zip
        dest: /usr/local/bin
        remote_src: yes

    - name: Consul configuration
      template:
        src: templates/consul.hcl.j2
        dest: /opt/consul/config/consul.hcl
        owner: consul
        group: consul
      notify: restart consul

    - name: Consul systemd service
      template:
        src: templates/consul.service.j2
        dest: /etc/systemd/system/consul.service
      notify:
        - reload systemd
        - restart consul

    - name: Enable and start Consul
      service:
        name: consul
        enabled: yes
        state: started

  handlers:
    - name: reload systemd
      systemd:
        daemon_reload: yes

    - name: restart consul
      service:
        name: consul
        state: restarted
```
### 3.2 Consul Configuration
```hcl
# ansible/templates/consul.hcl.j2
datacenter  = "{{ consul_datacenter }}"
data_dir    = "/opt/consul/data"
log_level   = "INFO"
node_name   = "{{ inventory_hostname }}"
bind_addr   = "{{ ansible_host }}"
client_addr = "0.0.0.0"

{% if consul_is_server %}
server           = true
bootstrap_expect = {{ groups['masters'] | length }}

ui_config {
  enabled = true
}
{% endif %}

# Join other servers
retry_join = [
{% for host in groups['masters'] %}
  "{{ hostvars[host].ansible_host }}"{% if not loop.last %},{% endif %}
{% endfor %}
]

# WAN federation for multi-DC
{% if groups['masters'] | length > 1 %}
retry_join_wan = [
{% for host in groups['masters'] %}
  "{{ hostvars[host].ansible_host }}"{% if not loop.last %},{% endif %}
{% endfor %}
]
{% endif %}

# Service mesh
connect {
  enabled = true
}

# DNS
ports {
  dns = 8600
}

# ACL (enable in production)
acl {
  enabled        = false
  default_policy = "allow"
}
```
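With DNS served on port 8600 (the `ports` block above), every registered service gets a predictable name, and adding the datacenter segment targets a specific DC in a federated setup. A sketch of the naming scheme, using the datacenter names from this document's inventory:

```python
def consul_dns_name(service, datacenter="", domain="consul"):
    """Standard Consul DNS name: <service>.service[.<datacenter>].<domain>."""
    dc = f".{datacenter}" if datacenter else ""
    return f"{service}.service{dc}.{domain}"

print(consul_dns_name("postgres", "hetzner-de"))
# postgres.service.hetzner-de.consul
print(consul_dns_name("redis"))
# redis.service.consul  (local datacenter by default)
```

Pointing CoreDNS at `127.0.0.1:8600` for the `consul` zone is one common way to make these names resolvable inside the K3s cluster.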
---
## 📊 Phase 4: Observability Stack
### 4.1 Prometheus + Grafana + Loki + Tempo
```yaml
# ansible/playbooks/observability.yml
---
- name: Deploy Observability Stack
  hosts: masters
  become: yes
  tasks:
    - name: Create monitoring namespace
      kubernetes.core.k8s:
        state: present
        definition:
          apiVersion: v1
          kind: Namespace
          metadata:
            name: monitoring

    - name: Add Prometheus Helm repo
      kubernetes.core.helm_repository:
        name: prometheus-community
        repo_url: https://prometheus-community.github.io/helm-charts

    - name: Add Grafana Helm repo
      kubernetes.core.helm_repository:
        name: grafana
        repo_url: https://grafana.github.io/helm-charts

    - name: Install kube-prometheus-stack
      kubernetes.core.helm:
        name: prometheus
        chart_ref: prometheus-community/kube-prometheus-stack
        release_namespace: monitoring
        create_namespace: yes
        values:
          prometheus:
            prometheusSpec:
              retention: 30d
              storageSpec:
                volumeClaimTemplate:
                  spec:
                    accessModes: ["ReadWriteOnce"]
                    resources:
                      requests:
                        storage: 50Gi
          grafana:
            adminPassword: "{{ vault_grafana_password }}"
            persistence:
              enabled: true
              size: 10Gi

    - name: Install Loki
      kubernetes.core.helm:
        name: loki
        chart_ref: grafana/loki-stack
        release_namespace: monitoring
        values:
          loki:
            persistence:
              enabled: true
              size: 50Gi
          promtail:
            enabled: true

    - name: Install Tempo
      kubernetes.core.helm:
        name: tempo
        chart_ref: grafana/tempo
        release_namespace: monitoring
        values:
          tempo:
            retention: 168h  # 7 days
```
### 4.2 Grafana Dashboards
```yaml
# kubernetes/apps/monitoring/grafana-dashboards.yml
apiVersion: v1
kind: ConfigMap
metadata:
  name: daarion-dashboards
  namespace: monitoring
  labels:
    grafana_dashboard: "1"
data:
  daarion-network.json: |
    {
      "dashboard": {
        "title": "DAARION Network Overview",
        "panels": [
          {
            "title": "Total Nodes",
            "type": "stat",
            "targets": [{"expr": "count(up{job=\"node-exporter\"})"}]
          },
          {
            "title": "Nodes by Datacenter",
            "type": "piechart",
            "targets": [{"expr": "count by (datacenter) (up{job=\"node-exporter\"})"}]
          },
          {
            "title": "GPU Nodes",
            "type": "stat",
            "targets": [{"expr": "count(up{job=\"node-exporter\", gpu=\"true\"})"}]
          },
          {
            "title": "K3s Cluster Status",
            "type": "stat",
            "targets": [{"expr": "sum(kube_node_status_condition{condition=\"Ready\",status=\"true\"})"}]
          }
        ]
      }
    }
```
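The "Nodes by Datacenter" panel relies on PromQL's `count by (datacenter) (...)`: group the `up` series by their `datacenter` label and count the series in each group (series are counted whether the target is up or down). A pure-Python illustration over made-up sample series:

```python
def count_by(label, series):
    """Mimic PromQL `count by (<label>) (...)` over (labels, value) pairs."""
    counts = {}
    for labels, _value in series:
        key = labels.get(label, "")
        counts[key] = counts.get(key, 0) + 1
    return counts

# Hypothetical `up{job="node-exporter"}` series for the three active nodes:
up = [
    ({"job": "node-exporter", "datacenter": "hetzner-de"}, 1),
    ({"job": "node-exporter", "datacenter": "remote-dc"}, 1),
    ({"job": "node-exporter", "datacenter": "local"}, 0),  # scrape failing
]
print(count_by("datacenter", up))
# {'hetzner-de': 1, 'remote-dc': 1, 'local': 1}
```

Note the `datacenter` label comes from the K3s `--node-label` flags set during installation; it must be relabeled onto the node-exporter targets for these queries to group as intended.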
---
## 🚀 Quick Start
### Step 1: Preparation
```bash
# Clone the repository
git clone git@github.com:IvanTytar/microdao-daarion.git
cd microdao-daarion/infrastructure

# Create an SSH key for the network
ssh-keygen -t ed25519 -f ~/.ssh/daarion_network -C "daarion-network"

# Install Ansible
pip install ansible ansible-lint

# Install Terraform
brew install terraform  # macOS
```
### Step 2: Configure the inventory
```bash
# Copy the example
cp ansible/inventory/example.yml ansible/inventory/production.yml

# Edit it for your nodes
vim ansible/inventory/production.yml
```
### Step 3: Bootstrap the nodes
```bash
cd ansible
# Check connectivity
ansible all -i inventory/production.yml -m ping

# Bootstrap
ansible-playbook -i inventory/production.yml playbooks/bootstrap.yml

# Hardening
ansible-playbook -i inventory/production.yml playbooks/hardening.yml
```
### Step 4: K3s cluster
```bash
# Install K3s
ansible-playbook -i inventory/production.yml playbooks/k3s-install.yml

# Verify
export KUBECONFIG=kubeconfig/node1.yaml
kubectl get nodes
```
### Step 5: Vault + Consul
```bash
# Vault
ansible-playbook -i inventory/production.yml playbooks/vault-setup.yml
# Consul (if multi-DC)
ansible-playbook -i inventory/production.yml playbooks/consul-setup.yml
```
### Step 6: Observability
```bash
# Prometheus + Grafana + Loki + Tempo
ansible-playbook -i inventory/production.yml playbooks/observability.yml
```
---
## 📋 Checklist
### Phase 1: Foundation
- [x] NODE1 security hardening
- [x] NODE3 security hardening
- [x] PostgreSQL on NODE1 & NODE3
- [ ] Ansible repository structure
- [ ] SSH key distribution
- [ ] Bootstrap playbook tested
### Phase 2: K3s Cluster
- [ ] K3s on NODE1 (master)
- [ ] K3s on NODE3 (worker + GPU)
- [ ] CoreDNS configured
- [ ] Network policies
### Phase 3: Secrets & Discovery
- [ ] Vault installed
- [ ] External Secrets Operator
- [ ] Consul (if needed for multi-DC)
### Phase 4: Observability
- [ ] Prometheus
- [ ] Grafana
- [ ] Loki
- [ ] Tempo
- [ ] Alerting rules
---
**Author:** Ivan Tytar & AI Assistant
**Last updated:** 2026-01-10