🏗️ DAARION Infrastructure Stack — Decentralized Network
Version: 1.0.0
Date: 2026-01-10
Status: Rollout in progress
🎯 Concept
A decentralized network of self-owned datacenters and nodes, distributed geographically:
- No dependency on a single cloud provider
- Hybrid infrastructure (bare-metal + VM + K8s)
- Multi-DC architecture with Consul for service discovery
📦 Technology Stack
┌─────────────────────────────────────────────────────────────────┐
│ INFRASTRUCTURE LAYER │
├─────────────────────────────────────────────────────────────────┤
│ Terraform │ Infrastructure as Code │
│ (networks, VPC, │ - Networks, VPC, firewall rules │
│ LB, DNS, storage) │ - Load Balancers, DNS records │
│ │ - Storage provisioning │
├─────────────────────────────────────────────────────────────────┤
│ CONFIGURATION LAYER │
├─────────────────────────────────────────────────────────────────┤
│ Ansible │ Configuration Management │
│ (OS bootstrap, │ - SSH keys, users, packages │
│ hardening, k3s) │ - Security hardening │
│ │ - K3s/K8s cluster bootstrap │
├─────────────────────────────────────────────────────────────────┤
│ SECRETS LAYER │
├─────────────────────────────────────────────────────────────────┤
│ HashiCorp Vault │ Centralized Secrets Management │
│ + External Secrets │ - Database credentials │
│ Operator │ - API keys, certificates │
│ │ - Dynamic secrets rotation │
├─────────────────────────────────────────────────────────────────┤
│ ORCHESTRATION LAYER │
├─────────────────────────────────────────────────────────────────┤
│ K3s / Kubernetes │ Container Orchestration │
│ + CoreDNS │ - Lightweight K8s (k3s for edge) │
│ │ - Service discovery via CoreDNS │
├─────────────────────────────────────────────────────────────────┤
│ SERVICE DISCOVERY (Multi-DC) │
├─────────────────────────────────────────────────────────────────┤
│ Consul │ Multi-DC Service Discovery │
│ (for hybrid/ │ - Cross-datacenter discovery │
│ multi-DC) │ - Health checking │
│ │ - Service mesh (optional) │
├─────────────────────────────────────────────────────────────────┤
│ OBSERVABILITY LAYER │
├─────────────────────────────────────────────────────────────────┤
│ Prometheus │ Metrics collection & alerting │
│ Grafana │ Dashboards & visualization │
│ Loki │ Log aggregation │
│ Tempo │ Distributed tracing │
└─────────────────────────────────────────────────────────────────┘
🌍 Current Network
| Node | Location | Type | Role | Status |
|---|---|---|---|---|
| NODE1 | Hetzner DE | Dedicated | Master, Gateway | ✅ Active |
| NODE2 | Local (Ivan) | MacBook M4 | Dev, Testing | ✅ Active |
| NODE3 | Remote DC | Threadripper+RTX3090 | AI/ML, GPU | ✅ Active |
| NODE4+ | TBD | Various | Compute | 🔜 Planned |
📁 Repository Structure
infrastructure/
├── terraform/
│ ├── modules/
│ │ ├── network/ # VPC, subnets, firewall
│ │ ├── compute/ # VMs, bare-metal provisioning
│ │ ├── dns/ # DNS records
│ │ ├── storage/ # Volumes, NFS, S3-compatible
│ │ └── load-balancer/ # HAProxy, Traefik configs
│ ├── environments/
│ │ ├── production/
│ │ ├── staging/
│ │ └── development/
│ └── main.tf
│
├── ansible/
│ ├── inventory/
│ │ ├── production.yml
│ │ ├── staging.yml
│ │ └── group_vars/
│ │ ├── all.yml
│ │ ├── masters.yml
│ │ ├── workers.yml
│ │ └── gpu_nodes.yml
│ ├── playbooks/
│ │ ├── bootstrap.yml # OS setup, SSH, packages
│ │ ├── hardening.yml # Security hardening
│ │ ├── k3s-install.yml # K3s cluster setup
│ │ ├── vault-setup.yml # Vault installation
│ │ ├── observability.yml # Prometheus/Grafana/Loki
│ │ └── consul-setup.yml # Consul for multi-DC
│ ├── roles/
│ │ ├── common/
│ │ ├── security/
│ │ ├── docker/
│ │ ├── k3s/
│ │ ├── vault/
│ │ ├── consul/
│ │ └── observability/
│ └── ansible.cfg
│
├── kubernetes/
│ ├── base/
│ │ ├── namespaces/
│ │ ├── rbac/
│ │ └── network-policies/
│ ├── apps/
│ │ ├── daarion-core/
│ │ ├── postgres/
│ │ ├── redis/
│ │ └── monitoring/
│ ├── external-secrets/
│ │ └── vault-backend.yml
│ └── kustomization.yaml
│
├── vault/
│ ├── policies/
│ ├── secrets-engines/
│ └── auth-methods/
│
├── consul/
│ ├── config/
│ └── services/
│
└── observability/
├── prometheus/
├── grafana/
├── loki/
└── tempo/
🚀 Phase 1: Base Infrastructure
We start by installing the base stack on NODE1 and NODE3.
1.1 Ansible Inventory
# ansible/inventory/production.yml
all:
vars:
ansible_python_interpreter: /usr/bin/python3
timezone: "UTC"
children:
masters:
hosts:
node1:
ansible_host: 144.76.224.179
ansible_user: root
node_role: master
datacenter: hetzner-de
workers:
hosts:
node3:
ansible_host: 80.77.35.151
ansible_port: 33147
ansible_user: zevs
ansible_become: yes
ansible_become_pass: "{{ vault_node3_password }}"
node_role: worker
datacenter: remote-dc
gpu: true
gpu_type: "rtx3090"
gpu_nodes:
hosts:
node3:
local_dev:
hosts:
node2:
ansible_host: 192.168.1.244
ansible_user: apple
node_role: development
datacenter: local
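The inventory references `vault_node3_password` (and the observability playbook later uses `vault_grafana_password`); these should live encrypted, not in plaintext. A minimal sketch with Ansible Vault — the example value and key placement are illustrative:

```shell
# Encrypt a single value; paste the printed !vault block into
# ansible/inventory/group_vars/all.yml under the same key
ansible-vault encrypt_string 'node3-sudo-password' --name vault_node3_password

# Playbook runs then need the vault password
ansible-playbook -i inventory/production.yml playbooks/bootstrap.yml --ask-vault-pass
```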
1.2 Bootstrap Playbook
# ansible/playbooks/bootstrap.yml
---
- name: Bootstrap all nodes
hosts: all
become: yes
vars:
common_packages:
- curl
- wget
- git
- htop
- vim
- jq
- unzip
- ca-certificates
- gnupg
- lsb-release
tasks:
- name: Set timezone
timezone:
name: "{{ timezone }}"
- name: Update apt cache
apt:
update_cache: yes
cache_valid_time: 3600
when: ansible_os_family == "Debian"
- name: Install common packages
apt:
name: "{{ common_packages }}"
state: present
when: ansible_os_family == "Debian"
- name: Create admin group
group:
name: daarion-admin
state: present
- name: Setup SSH authorized keys
authorized_key:
user: "{{ ansible_user }}"
key: "{{ lookup('file', '~/.ssh/daarion_network.pub') }}"
state: present
- name: Disable password authentication
lineinfile:
path: /etc/ssh/sshd_config
regexp: '^#?PasswordAuthentication'
line: 'PasswordAuthentication no'
notify: restart sshd
- name: Set hostname
hostname:
name: "{{ inventory_hostname }}"
- name: Update /etc/hosts
lineinfile:
path: /etc/hosts
line: "{{ hostvars[item].ansible_host }} {{ item }}"
state: present
loop: "{{ groups['all'] }}"
when: hostvars[item].ansible_host is defined
handlers:
- name: restart sshd
service:
name: sshd
state: restarted
1.3 Security Hardening Playbook
# ansible/playbooks/hardening.yml
---
- name: Security Hardening
hosts: all
become: yes
vars:
security_packages:
- fail2ban
- ufw
- auditd
- rkhunter
- unattended-upgrades
allowed_ssh_port: "{{ ansible_port | default(22) }}"
tasks:
- name: Install security packages
apt:
name: "{{ security_packages }}"
state: present
- name: Install Trivy
shell: |
curl -sfL https://raw.githubusercontent.com/aquasecurity/trivy/main/contrib/install.sh | sh -s -- -b /usr/local/bin
args:
creates: /usr/local/bin/trivy
# UFW Configuration
- name: UFW - Default deny incoming
ufw:
direction: incoming
policy: deny
- name: UFW - Default deny outgoing
ufw:
direction: outgoing
policy: deny
- name: UFW - Allow SSH
ufw:
rule: allow
port: "{{ allowed_ssh_port }}"
proto: tcp
    - name: UFW - Allow necessary outgoing
      ufw:
        rule: allow
        direction: out
        port: "{{ item.port }}"
        proto: "{{ item.proto }}"
      loop:
        - { port: 53, proto: udp }    # DNS
        - { port: 80, proto: tcp }    # HTTP (apt)
        - { port: 443, proto: tcp }   # HTTPS
        - { port: 123, proto: udp }   # NTP
        - { port: 6443, proto: tcp }  # K3s API (worker -> master)
        - { port: 8472, proto: udp }  # Flannel VXLAN (node <-> node)
    - name: UFW - Allow K3s API (masters)
      ufw:
        rule: allow
        port: "6443"
        proto: tcp
      when: "'masters' in group_names"
    - name: UFW - Allow K3s node ports (all nodes)
      ufw:
        rule: allow
        port: "{{ item.port }}"
        proto: "{{ item.proto }}"
      loop:
        - { port: "10250", proto: tcp }  # Kubelet
        - { port: "8472", proto: udp }   # Flannel VXLAN
- name: UFW - Enable
ufw:
state: enabled
# Fail2ban
- name: Configure fail2ban
template:
src: templates/jail.local.j2
dest: /etc/fail2ban/jail.local
notify: restart fail2ban
# Kernel hardening
- name: Kernel hardening sysctl
sysctl:
name: "{{ item.name }}"
value: "{{ item.value }}"
state: present
reload: yes
loop:
- { name: 'net.ipv4.ip_forward', value: '1' } # Required for K8s
- { name: 'net.ipv4.conf.all.accept_redirects', value: '0' }
- { name: 'net.ipv4.conf.default.accept_redirects', value: '0' }
- { name: 'net.ipv4.tcp_syncookies', value: '1' }
- { name: 'kernel.randomize_va_space', value: '2' }
# Security check script
- name: Create scripts directory
file:
path: /opt/scripts
state: directory
mode: '0755'
- name: Deploy security check script
copy:
src: files/security-check.sh
dest: /opt/scripts/security-check.sh
mode: '0755'
- name: Setup security cron
cron:
name: "Hourly security check"
minute: "0"
job: "/opt/scripts/security-check.sh"
handlers:
- name: restart fail2ban
service:
name: fail2ban
state: restarted
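The `files/security-check.sh` deployed above is not shown in this document; the following is a minimal sketch of what such a script might check. The log paths and checks are assumptions — adapt them to your own baseline:

```shell
#!/usr/bin/env bash
# security-check.sh — hourly security snapshot (sketch)
set -u

security_check() {
  echo "=== Security check: $(hostname 2>/dev/null || echo unknown) @ $(date -u +%FT%TZ) ==="

  echo "--- Failed SSH logins (last 50) ---"
  grep -i "failed password" /var/log/auth.log 2>/dev/null | tail -n 50 || true

  echo "--- fail2ban status ---"
  command -v fail2ban-client >/dev/null && fail2ban-client status || echo "fail2ban not available"

  echo "--- Listening sockets ---"
  ss -tulnp 2>/dev/null || true

  echo "--- UFW status ---"
  command -v ufw >/dev/null && ufw status || echo "ufw not available"
}

security_check
```

The output lands in the cron mail/log; pipe it to your alerting channel of choice if you want more than a local audit trail.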
1.4 K3s Installation Playbook
# ansible/playbooks/k3s-install.yml
---
- name: Install K3s on Masters
hosts: masters
become: yes
vars:
k3s_version: "v1.29.0+k3s1"
tasks:
- name: Download K3s installer
get_url:
url: https://get.k3s.io
dest: /tmp/k3s-install.sh
mode: '0755'
    - name: Install K3s server
      # No K3S_TOKEN is passed: k3s generates one, which is slurped below for the workers
      shell: |
        INSTALL_K3S_VERSION={{ k3s_version }} \
        sh /tmp/k3s-install.sh server \
        --disable traefik \
        --disable servicelb \
        --write-kubeconfig-mode 644 \
        --tls-san {{ ansible_host }} \
        --node-label "datacenter={{ datacenter }}" \
        --node-label "node-role={{ node_role }}"
      args:
        creates: /etc/rancher/k3s/k3s.yaml
- name: Wait for K3s to be ready
wait_for:
port: 6443
delay: 10
timeout: 300
- name: Get K3s token
slurp:
src: /var/lib/rancher/k3s/server/node-token
register: k3s_token_file
- name: Save K3s token
set_fact:
k3s_join_token: "{{ k3s_token_file.content | b64decode | trim }}"
- name: Fetch kubeconfig
fetch:
src: /etc/rancher/k3s/k3s.yaml
dest: "{{ playbook_dir }}/../kubeconfig/{{ inventory_hostname }}.yaml"
flat: yes
- name: Install K3s on Workers
hosts: workers
become: yes
vars:
k3s_version: "v1.29.0+k3s1"
k3s_master: "{{ hostvars[groups['masters'][0]].ansible_host }}"
tasks:
- name: Download K3s installer
get_url:
url: https://get.k3s.io
dest: /tmp/k3s-install.sh
mode: '0755'
- name: Install K3s agent
shell: |
INSTALL_K3S_VERSION={{ k3s_version }} \
K3S_URL=https://{{ k3s_master }}:6443 \
K3S_TOKEN={{ hostvars[groups['masters'][0]].k3s_join_token }} \
sh /tmp/k3s-install.sh agent \
--node-label "datacenter={{ datacenter }}" \
--node-label "node-role={{ node_role }}" \
{% if gpu is defined and gpu %}
--node-label "gpu=true" \
--node-label "gpu-type={{ gpu_type }}"
{% endif %}
args:
creates: /etc/rancher/k3s/k3s.yaml
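Once both plays have run, the node labels applied during install can be checked via the fetched kubeconfig. Note that k3s writes `k3s.yaml` with a `127.0.0.1` server address, so point it at the master first (the IP below is NODE1's from the inventory; `sed -i` as written assumes GNU sed):

```shell
export KUBECONFIG=kubeconfig/node1.yaml
# k3s.yaml targets 127.0.0.1; rewrite it to the master address (covered by --tls-san)
sed -i 's/127.0.0.1/144.76.224.179/' "$KUBECONFIG"
kubectl get nodes --show-labels
# The GPU worker should carry the labels set by the agent install
kubectl get nodes -l gpu=true,gpu-type=rtx3090
```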
🔐 Phase 2: Vault Setup
2.1 Vault Installation
# ansible/playbooks/vault-setup.yml
---
- name: Install HashiCorp Vault
hosts: masters
become: yes
vars:
vault_version: "1.15.4"
vault_data_dir: "/opt/vault/data"
tasks:
- name: Create vault user
user:
name: vault
system: yes
shell: /bin/false
- name: Create vault directories
file:
path: "{{ item }}"
state: directory
owner: vault
group: vault
mode: '0750'
loop:
- /opt/vault
- /opt/vault/data
- /opt/vault/config
- /opt/vault/logs
- name: Download Vault
get_url:
url: "https://releases.hashicorp.com/vault/{{ vault_version }}/vault_{{ vault_version }}_linux_amd64.zip"
dest: /tmp/vault.zip
- name: Extract Vault
unarchive:
src: /tmp/vault.zip
dest: /usr/local/bin
remote_src: yes
- name: Vault configuration
template:
src: templates/vault.hcl.j2
dest: /opt/vault/config/vault.hcl
owner: vault
group: vault
notify: restart vault
- name: Vault systemd service
template:
src: templates/vault.service.j2
dest: /etc/systemd/system/vault.service
notify:
- reload systemd
- restart vault
- name: Enable and start Vault
service:
name: vault
enabled: yes
state: started
handlers:
- name: reload systemd
systemd:
daemon_reload: yes
- name: restart vault
service:
name: vault
state: restarted
2.2 Vault Configuration
# ansible/templates/vault.hcl.j2
ui = true
storage "file" {
path = "/opt/vault/data"
}
listener "tcp" {
address = "0.0.0.0:8200"
tls_disable = "true" # Enable TLS in production!
}
api_addr = "http://{{ ansible_host }}:8200"
cluster_addr = "https://{{ ansible_host }}:8201"
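The playbook installs and starts Vault, but a freshly installed Vault is sealed and uninitialized; initialization is a one-time manual step. A sketch — the share/threshold counts are illustrative, and the unseal keys and root token must be stored offline:

```shell
export VAULT_ADDR=http://node1:8200
# One-time: prints 5 unseal key shares and the initial root token
vault operator init -key-shares=5 -key-threshold=3
# Repeat with any 3 distinct key shares
vault operator unseal
vault status   # shows "Sealed: false" once the threshold is met
```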
2.3 External Secrets Operator
# kubernetes/external-secrets/vault-backend.yml
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
name: vault-backend
spec:
provider:
vault:
server: "http://node1:8200"
path: "secret"
version: "v2"
auth:
kubernetes:
mountPath: "kubernetes"
role: "external-secrets"
serviceAccountRef:
name: "external-secrets"
namespace: "external-secrets"
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: postgres-credentials
namespace: daarion
spec:
refreshInterval: "1h"
secretStoreRef:
name: vault-backend
kind: ClusterSecretStore
target:
name: postgres-credentials
creationPolicy: Owner
  data:
    - secretKey: username
      remoteRef:
        key: postgres   # relative to the "secret" mount; ESO adds the KV v2 "data/" prefix
        property: username
    - secretKey: password
      remoteRef:
        key: postgres
        property: password
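The ExternalSecret can only sync once the corresponding KV v2 entry exists in Vault. A sketch for seeding it (assumes a token with write access and that the `secret/` KV v2 mount is enabled; the credential values are placeholders):

```shell
export VAULT_ADDR=http://node1:8200
# Enable KV v2 at secret/ if it is not already mounted
vault secrets enable -path=secret kv-v2
# Write the credentials the ExternalSecret reads
vault kv put secret/postgres username=daarion password='change-me'
vault kv get secret/postgres
```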
🔍 Phase 3: Consul (Multi-DC)
3.1 Consul Installation
# ansible/playbooks/consul-setup.yml
---
- name: Install Consul
hosts: all
become: yes
vars:
consul_version: "1.17.1"
consul_datacenter: "{{ datacenter }}"
consul_is_server: "{{ 'masters' in group_names }}"
tasks:
- name: Create consul user
user:
name: consul
system: yes
shell: /bin/false
- name: Create consul directories
file:
path: "{{ item }}"
state: directory
owner: consul
group: consul
loop:
- /opt/consul
- /opt/consul/data
- /opt/consul/config
- name: Download Consul
get_url:
url: "https://releases.hashicorp.com/consul/{{ consul_version }}/consul_{{ consul_version }}_linux_amd64.zip"
dest: /tmp/consul.zip
- name: Extract Consul
unarchive:
src: /tmp/consul.zip
dest: /usr/local/bin
remote_src: yes
- name: Consul configuration
template:
src: templates/consul.hcl.j2
dest: /opt/consul/config/consul.hcl
owner: consul
group: consul
notify: restart consul
- name: Consul systemd service
template:
src: templates/consul.service.j2
dest: /etc/systemd/system/consul.service
notify:
- reload systemd
- restart consul
- name: Enable and start Consul
service:
name: consul
enabled: yes
state: started
handlers:
- name: reload systemd
systemd:
daemon_reload: yes
- name: restart consul
service:
name: consul
state: restarted
3.2 Consul Configuration
# ansible/templates/consul.hcl.j2
datacenter = "{{ consul_datacenter }}"
data_dir = "/opt/consul/data"
log_level = "INFO"
node_name = "{{ inventory_hostname }}"
bind_addr = "{{ ansible_host }}"
client_addr = "0.0.0.0"
{% if consul_is_server %}
server = true
bootstrap_expect = {{ groups['masters'] | length }}
ui_config {
enabled = true
}
{% endif %}
# Join other servers
retry_join = [
{% for host in groups['masters'] %}
"{{ hostvars[host].ansible_host }}"{% if not loop.last %},{% endif %}
{% endfor %}
]
# WAN federation for multi-DC
{% if groups['masters'] | length > 1 %}
retry_join_wan = [
{% for host in groups['masters'] %}
"{{ hostvars[host].ansible_host }}"{% if not loop.last %},{% endif %}
{% endfor %}
]
{% endif %}
# Service mesh
connect {
enabled = true
}
# DNS
ports {
dns = 8600
}
# ACL (enable in production)
acl {
enabled = false
default_policy = "allow"
}
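With Consul running on every node, membership and DNS are quick to verify (port 8600 matches the DNS stanza above):

```shell
# Agents in this datacenter
consul members
# Server federation across datacenters (WAN pool)
consul members -wan
# Resolve the built-in consul service through Consul DNS
dig @127.0.0.1 -p 8600 consul.service.consul
```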
📊 Phase 4: Observability Stack
4.1 Prometheus + Grafana + Loki + Tempo
# ansible/playbooks/observability.yml
---
- name: Deploy Observability Stack
hosts: masters
become: yes
tasks:
- name: Create monitoring namespace
kubernetes.core.k8s:
state: present
definition:
apiVersion: v1
kind: Namespace
metadata:
name: monitoring
- name: Add Prometheus Helm repo
kubernetes.core.helm_repository:
name: prometheus-community
repo_url: https://prometheus-community.github.io/helm-charts
- name: Add Grafana Helm repo
kubernetes.core.helm_repository:
name: grafana
repo_url: https://grafana.github.io/helm-charts
- name: Install kube-prometheus-stack
kubernetes.core.helm:
name: prometheus
chart_ref: prometheus-community/kube-prometheus-stack
release_namespace: monitoring
create_namespace: yes
values:
prometheus:
prometheusSpec:
retention: 30d
storageSpec:
volumeClaimTemplate:
spec:
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 50Gi
grafana:
adminPassword: "{{ vault_grafana_password }}"
persistence:
enabled: true
size: 10Gi
- name: Install Loki
kubernetes.core.helm:
name: loki
chart_ref: grafana/loki-stack
release_namespace: monitoring
values:
loki:
persistence:
enabled: true
size: 50Gi
promtail:
enabled: true
- name: Install Tempo
kubernetes.core.helm:
name: tempo
chart_ref: grafana/tempo
release_namespace: monitoring
values:
tempo:
retention: 168h # 7 days
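A quick smoke test once the playbook completes. The Grafana service name below follows kube-prometheus-stack's naming convention for a release called `prometheus`:

```shell
kubectl -n monitoring get pods
# Reach Grafana locally; log in as admin with vault_grafana_password
kubectl -n monitoring port-forward svc/prometheus-grafana 3000:80
# then browse http://localhost:3000
```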
4.2 Grafana Dashboards
# kubernetes/apps/monitoring/grafana-dashboards.yml
apiVersion: v1
kind: ConfigMap
metadata:
name: daarion-dashboards
namespace: monitoring
labels:
grafana_dashboard: "1"
data:
daarion-network.json: |
{
"dashboard": {
"title": "DAARION Network Overview",
"panels": [
{
"title": "Total Nodes",
"type": "stat",
"targets": [{"expr": "count(up{job=\"node-exporter\"})"}]
},
{
"title": "Nodes by Datacenter",
"type": "piechart",
"targets": [{"expr": "count by (datacenter) (up{job=\"node-exporter\"})"}]
},
{
"title": "GPU Nodes",
"type": "stat",
"targets": [{"expr": "count(up{job=\"node-exporter\", gpu=\"true\"})"}]
},
{
"title": "K3s Cluster Status",
"type": "stat",
"targets": [{"expr": "sum(kube_node_status_condition{condition=\"Ready\",status=\"true\"})"}]
}
]
}
}
🚀 Quick Start
Step 1: Preparation
# Clone the repository
git clone git@github.com:IvanTytar/microdao-daarion.git
cd microdao-daarion/infrastructure
# Create the network SSH key
ssh-keygen -t ed25519 -f ~/.ssh/daarion_network -C "daarion-network"
# Install Ansible
pip install ansible ansible-lint
# Install Terraform
brew install terraform # macOS
Step 2: Configure the inventory
# Copy the example
cp ansible/inventory/example.yml ansible/inventory/production.yml
# Edit it for your nodes
vim ansible/inventory/production.yml
Step 3: Bootstrap the nodes
cd ansible
# Verify connectivity
ansible all -i inventory/production.yml -m ping
# Bootstrap
ansible-playbook -i inventory/production.yml playbooks/bootstrap.yml
# Hardening
ansible-playbook -i inventory/production.yml playbooks/hardening.yml
Step 4: K3s cluster
# Install K3s
ansible-playbook -i inventory/production.yml playbooks/k3s-install.yml
# Verify
export KUBECONFIG=kubeconfig/node1.yaml
kubectl get nodes
Step 5: Vault + Consul
# Vault
ansible-playbook -i inventory/production.yml playbooks/vault-setup.yml
# Consul (if multi-DC)
ansible-playbook -i inventory/production.yml playbooks/consul-setup.yml
Step 6: Observability
# Prometheus + Grafana + Loki + Tempo
ansible-playbook -i inventory/production.yml playbooks/observability.yml
📋 Checklist
Phase 1: Foundation
- NODE1 security hardening
- NODE3 security hardening
- PostgreSQL on NODE1 & NODE3
- Ansible repository structure
- SSH key distribution
- Bootstrap playbook tested
Phase 2: K3s Cluster
- K3s on NODE1 (master)
- K3s on NODE3 (worker + GPU)
- CoreDNS configured
- Network policies
Phase 3: Secrets & Discovery
- Vault installed
- External Secrets Operator
- Consul (if needed for multi-DC)
Phase 4: Observability
- Prometheus
- Grafana
- Loki
- Tempo
- Alerting rules
Author: Ivan Tytar & AI Assistant
Last updated: 2026-01-10