# 🏗️ DAARION Infrastructure Stack — Decentralized Network

**Version:** 1.0.0
**Date:** 2026-01-10
**Status:** Implementation in progress

---

## 🎯 Concept

**A decentralized network of our own datacenters and nodes**, distributed geographically:

- No dependency on a single cloud provider
- Hybrid infrastructure (bare metal + VMs + K8s)
- Multi-DC architecture with Consul for service discovery

---

## 📦 Technology Stack

```
┌─────────────────────────────────────────────────────────────────┐
│                      INFRASTRUCTURE LAYER                       │
├─────────────────────────────────────────────────────────────────┤
│  Terraform           │  Infrastructure as Code                  │
│  (networks, VPC,     │  - Networks, VPC, firewall rules         │
│  LB, DNS, storage)   │  - Load Balancers, DNS records           │
│                      │  - Storage provisioning                  │
├─────────────────────────────────────────────────────────────────┤
│                       CONFIGURATION LAYER                       │
├─────────────────────────────────────────────────────────────────┤
│  Ansible             │  Configuration Management                │
│  (OS bootstrap,      │  - SSH keys, users, packages             │
│  hardening, k3s)     │  - Security hardening                    │
│                      │  - K3s/K8s cluster bootstrap             │
├─────────────────────────────────────────────────────────────────┤
│                          SECRETS LAYER                          │
├─────────────────────────────────────────────────────────────────┤
│  HashiCorp Vault     │  Centralized Secrets Management          │
│  + External Secrets  │  - Database credentials                  │
│  Operator            │  - API keys, certificates                │
│                      │  - Dynamic secrets rotation              │
├─────────────────────────────────────────────────────────────────┤
│                       ORCHESTRATION LAYER                       │
├─────────────────────────────────────────────────────────────────┤
│  K3s / Kubernetes    │  Container Orchestration                 │
│  + CoreDNS           │  - Lightweight K8s (k3s for edge)        │
│                      │  - Service discovery via CoreDNS         │
├─────────────────────────────────────────────────────────────────┤
│                  SERVICE DISCOVERY (Multi-DC)                   │
├─────────────────────────────────────────────────────────────────┤
│  Consul              │  Multi-DC Service Discovery              │
│  (for hybrid/        │  - Cross-datacenter discovery            │
│  multi-DC)           │  - Health checking                       │
│                      │  - Service mesh (optional)               │
├─────────────────────────────────────────────────────────────────┤
│                       OBSERVABILITY LAYER                       │
├─────────────────────────────────────────────────────────────────┤
│  Prometheus          │  Metrics collection & alerting           │
│  Grafana             │  Dashboards & visualization              │
│  Loki                │  Log aggregation                         │
│  Tempo               │  Distributed tracing                     │
└─────────────────────────────────────────────────────────────────┘
```

---

## 🌍 Current Network

| Node | Location | Type | Role | Status |
|------|----------|------|------|--------|
| **NODE1** | Hetzner DE | Dedicated | Master, Gateway | ✅ Active |
| **NODE2** | Local (Ivan) | MacBook M4 | Dev, Testing | ✅ Active |
| **NODE3** | Remote DC | Threadripper + RTX 3090 | AI/ML, GPU | ✅ Active |
| **NODE4+** | TBD | Various | Compute | 🔜 Planned |

---

## 📁 Repository Structure

```
infrastructure/
├── terraform/
│   ├── modules/
│   │   ├── network/           # VPC, subnets, firewall
│   │   ├── compute/           # VMs, bare-metal provisioning
│   │   ├── dns/               # DNS records
│   │   ├── storage/           # Volumes, NFS, S3-compatible
│   │   └── load-balancer/     # HAProxy, Traefik configs
│   ├── environments/
│   │   ├── production/
│   │   ├── staging/
│   │   └── development/
│   └── main.tf
│
├── ansible/
│   ├── inventory/
│   │   ├── production.yml
│   │   ├── staging.yml
│   │   └── group_vars/
│   │       ├── all.yml
│   │       ├── masters.yml
│   │       ├── workers.yml
│   │       └── gpu_nodes.yml
│   ├── playbooks/
│   │   ├── bootstrap.yml      # OS setup, SSH, packages
│   │   ├── hardening.yml      # Security hardening
│   │   ├── k3s-install.yml    # K3s cluster setup
│   │   ├── vault-setup.yml    # Vault installation
│   │   ├── observability.yml  # Prometheus/Grafana/Loki
│   │   └── consul-setup.yml   # Consul for multi-DC
│   ├── roles/
│   │   ├── common/
│   │   ├── security/
│   │   ├── docker/
│   │   ├── k3s/
│   │   ├── vault/
│   │   ├── consul/
│   │   └── observability/
│   └── ansible.cfg
│
├── kubernetes/
│   ├── base/
│   │   ├── namespaces/
│   │   ├── rbac/
│   │   └── network-policies/
│   ├── apps/
│   │   ├── daarion-core/
│   │   ├── postgres/
│   │   ├── redis/
│   │   └── monitoring/
│   ├── external-secrets/
│   │   └── vault-backend.yml
│   └── kustomization.yaml
│
├── vault/
│   ├── policies/
│   ├── secrets-engines/
│   └── auth-methods/
│
├── consul/
│   ├── config/
│   └── services/
│
└── observability/
    ├── prometheus/
    ├── grafana/
    ├── loki/
    └── tempo/
```

---

## 🚀 Phase 1: Base Infrastructure

We start by installing the base stack on NODE1 and NODE3.

### 1.1 Ansible Inventory

```yaml
# ansible/inventory/production.yml
all:
  vars:
    ansible_python_interpreter: /usr/bin/python3
    timezone: "UTC"

  children:
    masters:
      hosts:
        node1:
          ansible_host: 144.76.224.179
          ansible_user: root
          node_role: master
          datacenter: hetzner-de

    workers:
      hosts:
        node3:
          ansible_host: 80.77.35.151
          ansible_port: 33147
          ansible_user: zevs
          ansible_become: yes
          ansible_become_pass: "{{ vault_node3_password }}"
          node_role: worker
          datacenter: remote-dc
          gpu: true
          gpu_type: "rtx3090"

    gpu_nodes:
      hosts:
        node3:

    local_dev:
      hosts:
        node2:
          ansible_host: 192.168.1.244
          ansible_user: apple
          node_role: development
          datacenter: local
```
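
The repository tree also lists `group_vars` files that this document does not show. A minimal sketch of what `ansible/inventory/group_vars/all.yml` could contain is below; it is an assumption, not the project's actual file, and it follows the inventory's own `vault_*` naming convention for secrets kept in an Ansible Vault-encrypted file:

```yaml
# ansible/inventory/group_vars/all.yml — hypothetical sketch
ansible_python_interpreter: /usr/bin/python3
timezone: "UTC"

# Secrets referenced by the inventory (vault_node3_password, ...) should live
# in an ansible-vault encrypted file rather than in plain text, e.g.:
#   ansible-vault create inventory/group_vars/vault.yml
```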

### 1.2 Bootstrap Playbook

```yaml
# ansible/playbooks/bootstrap.yml
---
- name: Bootstrap all nodes
  hosts: all
  become: yes

  vars:
    common_packages:
      - curl
      - wget
      - git
      - htop
      - vim
      - jq
      - unzip
      - ca-certificates
      - gnupg
      - lsb-release

  tasks:
    - name: Set timezone
      timezone:
        name: "{{ timezone }}"

    - name: Update apt cache
      apt:
        update_cache: yes
        cache_valid_time: 3600
      when: ansible_os_family == "Debian"

    - name: Install common packages
      apt:
        name: "{{ common_packages }}"
        state: present
      when: ansible_os_family == "Debian"

    - name: Create admin group
      group:
        name: daarion-admin
        state: present

    - name: Setup SSH authorized keys
      authorized_key:
        user: "{{ ansible_user }}"
        key: "{{ lookup('file', '~/.ssh/daarion_network.pub') }}"
        state: present

    - name: Disable password authentication
      lineinfile:
        path: /etc/ssh/sshd_config
        regexp: '^#?PasswordAuthentication'
        line: 'PasswordAuthentication no'
      notify: restart sshd

    - name: Set hostname
      hostname:
        name: "{{ inventory_hostname }}"

    - name: Update /etc/hosts
      lineinfile:
        path: /etc/hosts
        line: "{{ hostvars[item].ansible_host }} {{ item }}"
        state: present
      loop: "{{ groups['all'] }}"
      when: hostvars[item].ansible_host is defined

  handlers:
    - name: restart sshd
      service:
        name: sshd
        state: restarted
```

### 1.3 Security Hardening Playbook

```yaml
# ansible/playbooks/hardening.yml
---
- name: Security Hardening
  hosts: all
  become: yes

  vars:
    security_packages:
      - fail2ban
      - ufw
      - auditd
      - rkhunter
      - unattended-upgrades

    allowed_ssh_port: "{{ ansible_port | default(22) }}"

  tasks:
    - name: Install security packages
      apt:
        name: "{{ security_packages }}"
        state: present

    - name: Install Trivy
      shell: |
        curl -sfL https://raw.githubusercontent.com/aquasecurity/trivy/main/contrib/install.sh | sh -s -- -b /usr/local/bin
      args:
        creates: /usr/local/bin/trivy

    # UFW Configuration
    - name: UFW - Default deny incoming
      ufw:
        direction: incoming
        policy: deny

    - name: UFW - Default deny outgoing
      ufw:
        direction: outgoing
        policy: deny

    - name: UFW - Allow SSH
      ufw:
        rule: allow
        port: "{{ allowed_ssh_port }}"
        proto: tcp

    - name: UFW - Allow necessary outgoing
      ufw:
        rule: allow
        direction: out
        port: "{{ item.port }}"
        proto: "{{ item.proto }}"
      loop:
        - { port: 53, proto: udp }    # DNS
        - { port: 80, proto: tcp }    # HTTP
        - { port: 443, proto: tcp }   # HTTPS
        - { port: 123, proto: udp }   # NTP
        - { port: 6443, proto: tcp }  # K3s API (agents must reach the server)

    # NOTE: with the default flannel CNI, 8472/udp must also be open between nodes.
    - name: UFW - Allow K3s ports (masters)
      ufw:
        rule: allow
        port: "{{ item }}"
        proto: tcp
      loop:
        - 6443   # K3s API
        - 10250  # Kubelet
      when: "'masters' in group_names"

    - name: UFW - Enable
      ufw:
        state: enabled

    # Fail2ban
    - name: Configure fail2ban
      template:
        src: templates/jail.local.j2
        dest: /etc/fail2ban/jail.local
      notify: restart fail2ban

    # Kernel hardening
    - name: Kernel hardening sysctl
      sysctl:
        name: "{{ item.name }}"
        value: "{{ item.value }}"
        state: present
        reload: yes
      loop:
        - { name: 'net.ipv4.ip_forward', value: '1' }  # Required for K8s
        - { name: 'net.ipv4.conf.all.accept_redirects', value: '0' }
        - { name: 'net.ipv4.conf.default.accept_redirects', value: '0' }
        - { name: 'net.ipv4.tcp_syncookies', value: '1' }
        - { name: 'kernel.randomize_va_space', value: '2' }

    # Security check script
    - name: Create scripts directory
      file:
        path: /opt/scripts
        state: directory
        mode: '0755'

    - name: Deploy security check script
      copy:
        src: files/security-check.sh
        dest: /opt/scripts/security-check.sh
        mode: '0755'

    - name: Setup security cron
      cron:
        name: "Hourly security check"
        minute: "0"
        job: "/opt/scripts/security-check.sh"

  handlers:
    - name: restart fail2ban
      service:
        name: fail2ban
        state: restarted
```
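
The play above copies `files/security-check.sh`, which is not included in this document. A possible sketch of such a script is below — every command choice in it is an assumption; it only reports to stdout (so cron can mail or log the output) and changes nothing:

```shell
#!/usr/bin/env bash
# files/security-check.sh — hypothetical sketch of the hourly report script.

# Prefix every line with a UTC timestamp.
report() { printf '%s %s\n' "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" "$*"; }

report "security check started on $(hostname)"

# Firewall state (silently skipped if ufw is absent)
if command -v ufw >/dev/null 2>&1; then
  report "ufw: $(ufw status 2>/dev/null | head -n 1)"
fi

# fail2ban jail summary
if command -v fail2ban-client >/dev/null 2>&1; then
  report "fail2ban: $(fail2ban-client status 2>/dev/null | tr -d '\t')"
fi

# Filesystem CVE scan with Trivy, high severity only
if command -v trivy >/dev/null 2>&1; then
  trivy rootfs --severity HIGH,CRITICAL --quiet / 2>/dev/null
fi

report "security check finished"
```

Because cron mails stdout to root by default, no extra log shipping is needed until Loki/promtail (Phase 4) picks up the journal.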

### 1.4 K3s Installation Playbook

```yaml
# ansible/playbooks/k3s-install.yml
# Note: a playbook file is a single YAML document containing a list of plays,
# so the two plays below follow each other without a `---` separator.
---
- name: Install K3s on Masters
  hosts: masters
  become: yes

  vars:
    k3s_version: "v1.29.0+k3s1"
    # The cluster join token must be supplied externally, e.g. from
    # Ansible Vault (the variable name here is an assumption):
    k3s_token: "{{ vault_k3s_token }}"

  tasks:
    - name: Download K3s installer
      get_url:
        url: https://get.k3s.io
        dest: /tmp/k3s-install.sh
        mode: '0755'

    - name: Install K3s server
      shell: |
        INSTALL_K3S_VERSION={{ k3s_version }} \
        K3S_TOKEN={{ k3s_token }} \
        sh /tmp/k3s-install.sh server \
          --disable traefik \
          --disable servicelb \
          --write-kubeconfig-mode 644 \
          --tls-san {{ ansible_host }} \
          --node-label "datacenter={{ datacenter }}" \
          --node-label "node-role={{ node_role }}"
      args:
        creates: /etc/rancher/k3s/k3s.yaml

    - name: Wait for K3s to be ready
      wait_for:
        port: 6443
        delay: 10
        timeout: 300

    - name: Get K3s token
      slurp:
        src: /var/lib/rancher/k3s/server/node-token
      register: k3s_token_file

    - name: Save K3s token
      set_fact:
        k3s_join_token: "{{ k3s_token_file.content | b64decode | trim }}"

    - name: Fetch kubeconfig
      fetch:
        src: /etc/rancher/k3s/k3s.yaml
        dest: "{{ playbook_dir }}/../kubeconfig/{{ inventory_hostname }}.yaml"
        flat: yes

- name: Install K3s on Workers
  hosts: workers
  become: yes

  vars:
    k3s_version: "v1.29.0+k3s1"
    k3s_master: "{{ hostvars[groups['masters'][0]].ansible_host }}"

  tasks:
    - name: Download K3s installer
      get_url:
        url: https://get.k3s.io
        dest: /tmp/k3s-install.sh
        mode: '0755'

    - name: Install K3s agent
      # The GPU labels are appended inline so that no dangling `\` is left
      # behind on non-GPU nodes.
      shell: |
        INSTALL_K3S_VERSION={{ k3s_version }} \
        K3S_URL=https://{{ k3s_master }}:6443 \
        K3S_TOKEN={{ hostvars[groups['masters'][0]].k3s_join_token }} \
        sh /tmp/k3s-install.sh agent \
          --node-label "datacenter={{ datacenter }}" \
          --node-label "node-role={{ node_role }}"{% if gpu | default(false) %} \
          --node-label "gpu=true" \
          --node-label "gpu-type={{ gpu_type }}"{% endif %}
      args:
        # The agent does not write k3s.yaml; its systemd unit is a better marker.
        creates: /etc/systemd/system/k3s-agent.service
```
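
The `gpu` labels applied above can then pin workloads to NODE3 via a `nodeSelector`. A minimal illustration (the pod name and image are placeholders, not from this project; actual GPU access additionally requires the NVIDIA container runtime and device plugin on the node):

```yaml
# Hypothetical example: schedule a workload onto a GPU-labelled node.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  nodeSelector:
    gpu: "true"
    gpu-type: "rtx3090"
  containers:
    - name: cuda
      image: nvidia/cuda:12.3.1-base-ubuntu22.04
      command: ["nvidia-smi"]
```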

---

## 🔐 Phase 2: Vault Setup

### 2.1 Vault Installation

```yaml
# ansible/playbooks/vault-setup.yml
---
- name: Install HashiCorp Vault
  hosts: masters
  become: yes

  vars:
    vault_version: "1.15.4"
    vault_data_dir: "/opt/vault/data"

  tasks:
    - name: Create vault user
      user:
        name: vault
        system: yes
        shell: /bin/false

    - name: Create vault directories
      file:
        path: "{{ item }}"
        state: directory
        owner: vault
        group: vault
        mode: '0750'
      loop:
        - /opt/vault
        - /opt/vault/data
        - /opt/vault/config
        - /opt/vault/logs

    - name: Download Vault
      get_url:
        url: "https://releases.hashicorp.com/vault/{{ vault_version }}/vault_{{ vault_version }}_linux_amd64.zip"
        dest: /tmp/vault.zip

    - name: Extract Vault
      unarchive:
        src: /tmp/vault.zip
        dest: /usr/local/bin
        remote_src: yes

    - name: Vault configuration
      template:
        src: templates/vault.hcl.j2
        dest: /opt/vault/config/vault.hcl
        owner: vault
        group: vault
      notify: restart vault

    - name: Vault systemd service
      template:
        src: templates/vault.service.j2
        dest: /etc/systemd/system/vault.service
      notify:
        - reload systemd
        - restart vault

    - name: Enable and start Vault
      service:
        name: vault
        enabled: yes
        state: started

  handlers:
    - name: reload systemd
      systemd:
        daemon_reload: yes

    - name: restart vault
      service:
        name: vault
        state: restarted
```
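
The playbook templates a systemd unit that this document does not include. A minimal sketch of what `templates/vault.service.j2` could look like (an assumption, not the project's actual template):

```ini
# ansible/templates/vault.service.j2 — hypothetical sketch
[Unit]
Description=HashiCorp Vault
After=network-online.target
Wants=network-online.target

[Service]
User=vault
Group=vault
ExecStart=/usr/local/bin/vault server -config=/opt/vault/config/vault.hcl
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
# Allow Vault to mlock memory so secrets stay out of swap
AmbientCapabilities=CAP_IPC_LOCK
LimitMEMLOCK=infinity

[Install]
WantedBy=multi-user.target
```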

### 2.2 Vault Configuration

```hcl
# ansible/templates/vault.hcl.j2
ui = true

storage "file" {
  path = "/opt/vault/data"
}

listener "tcp" {
  address     = "0.0.0.0:8200"
  tls_disable = "true"  # Enable TLS in production!
}

api_addr     = "http://{{ ansible_host }}:8200"
cluster_addr = "https://{{ ansible_host }}:8201"
```
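
After the service starts, Vault still has to be initialized and unsealed once, and the KV engine that §2.3 reads from has to be enabled. A sketch of that one-time sequence (the secret values are placeholders; the address is plain HTTP only because the config above disables TLS):

```shell
export VAULT_ADDR=http://node1:8200

# One-time initialization: store the unseal keys and root token securely
vault operator init -key-shares=5 -key-threshold=3

# Unseal — repeat with 3 different unseal keys
vault operator unseal

# Enable KV v2 at the path the ClusterSecretStore expects, and seed a secret
vault login
vault secrets enable -path=secret kv-v2
vault kv put secret/postgres username='<user>' password='<password>'
```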

### 2.3 External Secrets Operator

```yaml
# kubernetes/external-secrets/vault-backend.yml
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: vault-backend
spec:
  provider:
    vault:
      server: "http://node1:8200"
      path: "secret"
      version: "v2"
      auth:
        kubernetes:
          mountPath: "kubernetes"
          role: "external-secrets"
          serviceAccountRef:
            name: "external-secrets"
            namespace: "external-secrets"

---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: postgres-credentials
  namespace: daarion
spec:
  refreshInterval: "1h"
  secretStoreRef:
    name: vault-backend
    kind: ClusterSecretStore
  target:
    name: postgres-credentials
    creationPolicy: Owner
  data:
    - secretKey: username
      remoteRef:
        key: secret/data/postgres
        property: username
    - secretKey: password
      remoteRef:
        key: secret/data/postgres
        property: password
```
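
The operator materializes a plain Kubernetes `Secret` named `postgres-credentials`, which workloads consume like any native secret. A hypothetical consumer (the deployment name and image are illustrative placeholders):

```yaml
# Hypothetical example: consuming the synced Secret in a workload.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: daarion-api
  namespace: daarion
spec:
  replicas: 1
  selector:
    matchLabels: { app: daarion-api }
  template:
    metadata:
      labels: { app: daarion-api }
    spec:
      containers:
        - name: api
          image: ghcr.io/example/daarion-api:latest  # placeholder image
          env:
            - name: POSTGRES_USER
              valueFrom:
                secretKeyRef: { name: postgres-credentials, key: username }
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef: { name: postgres-credentials, key: password }
```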

---

## 🔍 Phase 3: Consul (Multi-DC)

### 3.1 Consul Installation

```yaml
# ansible/playbooks/consul-setup.yml
---
- name: Install Consul
  hosts: all
  become: yes

  vars:
    consul_version: "1.17.1"
    consul_datacenter: "{{ datacenter }}"
    consul_is_server: "{{ 'masters' in group_names }}"

  tasks:
    - name: Create consul user
      user:
        name: consul
        system: yes
        shell: /bin/false

    - name: Create consul directories
      file:
        path: "{{ item }}"
        state: directory
        owner: consul
        group: consul
      loop:
        - /opt/consul
        - /opt/consul/data
        - /opt/consul/config

    - name: Download Consul
      get_url:
        url: "https://releases.hashicorp.com/consul/{{ consul_version }}/consul_{{ consul_version }}_linux_amd64.zip"
        dest: /tmp/consul.zip

    - name: Extract Consul
      unarchive:
        src: /tmp/consul.zip
        dest: /usr/local/bin
        remote_src: yes

    - name: Consul configuration
      template:
        src: templates/consul.hcl.j2
        dest: /opt/consul/config/consul.hcl
        owner: consul
        group: consul
      notify: restart consul

    - name: Consul systemd service
      template:
        src: templates/consul.service.j2
        dest: /etc/systemd/system/consul.service
      notify:
        - reload systemd
        - restart consul

    - name: Enable and start Consul
      service:
        name: consul
        enabled: yes
        state: started

  handlers:
    - name: reload systemd
      systemd:
        daemon_reload: yes

    - name: restart consul
      service:
        name: consul
        state: restarted
```

### 3.2 Consul Configuration

```hcl
# ansible/templates/consul.hcl.j2
datacenter  = "{{ consul_datacenter }}"
data_dir    = "/opt/consul/data"
log_level   = "INFO"
node_name   = "{{ inventory_hostname }}"
bind_addr   = "{{ ansible_host }}"
client_addr = "0.0.0.0"

{% if consul_is_server %}
server           = true
bootstrap_expect = {{ groups['masters'] | length }}
ui_config {
  enabled = true
}
{% endif %}

# Join other servers
retry_join = [
{% for host in groups['masters'] %}
  "{{ hostvars[host].ansible_host }}"{% if not loop.last %},{% endif %}

{% endfor %}
]

# WAN federation for multi-DC
{% if groups['masters'] | length > 1 %}
retry_join_wan = [
{% for host in groups['masters'] %}
  "{{ hostvars[host].ansible_host }}"{% if not loop.last %},{% endif %}

{% endfor %}
]
{% endif %}

# Service mesh
connect {
  enabled = true
}

# DNS
ports {
  dns = 8600
}

# ACL (enable in production)
acl {
  enabled        = false
  default_policy = "allow"
}
```
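
The repository tree reserves `consul/services/` for service registrations, none of which are shown here. A minimal sketch of one such agent service definition (the service name, port, and check values are illustrative assumptions):

```hcl
# consul/services/postgres.hcl — hypothetical sketch
service {
  name = "postgres"
  port = 5432
  tags = ["primary"]

  # TCP health check run by the local Consul agent
  check {
    id       = "postgres-tcp"
    tcp      = "localhost:5432"
    interval = "10s"
    timeout  = "2s"
  }
}
```

Dropping such files into the agent's config directory (or loading them with `consul services register`) makes the service resolvable as `postgres.service.consul` via the DNS port configured above.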

---

## 📊 Phase 4: Observability Stack

### 4.1 Prometheus + Grafana + Loki + Tempo

```yaml
# ansible/playbooks/observability.yml
---
- name: Deploy Observability Stack
  hosts: masters
  become: yes
  # The kubernetes.core modules below require the `kubernetes` Python package
  # and the helm binary on the master, with kubeconfig access to the cluster.

  tasks:
    - name: Create monitoring namespace
      kubernetes.core.k8s:
        state: present
        definition:
          apiVersion: v1
          kind: Namespace
          metadata:
            name: monitoring

    - name: Add Prometheus Helm repo
      kubernetes.core.helm_repository:
        name: prometheus-community
        repo_url: https://prometheus-community.github.io/helm-charts

    - name: Add Grafana Helm repo
      kubernetes.core.helm_repository:
        name: grafana
        repo_url: https://grafana.github.io/helm-charts

    - name: Install kube-prometheus-stack
      kubernetes.core.helm:
        name: prometheus
        chart_ref: prometheus-community/kube-prometheus-stack
        release_namespace: monitoring
        create_namespace: yes
        values:
          prometheus:
            prometheusSpec:
              retention: 30d
              storageSpec:
                volumeClaimTemplate:
                  spec:
                    accessModes: ["ReadWriteOnce"]
                    resources:
                      requests:
                        storage: 50Gi
          grafana:
            adminPassword: "{{ vault_grafana_password }}"
            persistence:
              enabled: true
              size: 10Gi

    - name: Install Loki
      kubernetes.core.helm:
        name: loki
        chart_ref: grafana/loki-stack
        release_namespace: monitoring
        values:
          loki:
            persistence:
              enabled: true
              size: 50Gi
          promtail:
            enabled: true

    - name: Install Tempo
      kubernetes.core.helm:
        name: tempo
        chart_ref: grafana/tempo
        release_namespace: monitoring
        values:
          tempo:
            retention: 168h  # 7 days
```
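
The Phase 4 checklist leaves alerting rules open; with kube-prometheus-stack they can be shipped as `PrometheusRule` resources. A minimal node-down alert as a sketch (the file path and label values are assumptions; the `release` label must match the Helm release name for the operator to discover the rule):

```yaml
# kubernetes/apps/monitoring/node-alerts.yml — hypothetical sketch
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: daarion-node-alerts
  namespace: monitoring
  labels:
    release: prometheus  # matches the Helm release installed above
spec:
  groups:
    - name: nodes
      rules:
        - alert: NodeDown
          expr: up{job="node-exporter"} == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Node {{ $labels.instance }} is unreachable"
```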

### 4.2 Grafana Dashboards

```yaml
# kubernetes/apps/monitoring/grafana-dashboards.yml
apiVersion: v1
kind: ConfigMap
metadata:
  name: daarion-dashboards
  namespace: monitoring
  labels:
    grafana_dashboard: "1"
data:
  daarion-network.json: |
    {
      "dashboard": {
        "title": "DAARION Network Overview",
        "panels": [
          {
            "title": "Total Nodes",
            "type": "stat",
            "targets": [{"expr": "count(up{job=\"node-exporter\"})"}]
          },
          {
            "title": "Nodes by Datacenter",
            "type": "piechart",
            "targets": [{"expr": "count by (datacenter) (up{job=\"node-exporter\"})"}]
          },
          {
            "title": "GPU Nodes",
            "type": "stat",
            "targets": [{"expr": "count(up{job=\"node-exporter\", gpu=\"true\"})"}]
          },
          {
            "title": "K3s Cluster Status",
            "type": "stat",
            "targets": [{"expr": "sum(kube_node_status_condition{condition=\"Ready\",status=\"true\"})"}]
          }
        ]
      }
    }
```

---

## 🚀 Quick Start

### Step 1: Preparation

```bash
# Clone the repository
git clone git@github.com:IvanTytar/microdao-daarion.git
cd microdao-daarion/infrastructure

# Create an SSH key for the network
ssh-keygen -t ed25519 -f ~/.ssh/daarion_network -C "daarion-network"

# Install Ansible
pip install ansible ansible-lint

# Install Terraform
brew install terraform  # macOS
```

### Step 2: Configure the inventory

```bash
# Copy the example
cp ansible/inventory/example.yml ansible/inventory/production.yml

# Edit it for your nodes
vim ansible/inventory/production.yml
```

### Step 3: Bootstrap the nodes

```bash
cd ansible

# Check connectivity
ansible all -i inventory/production.yml -m ping

# Bootstrap
ansible-playbook -i inventory/production.yml playbooks/bootstrap.yml

# Hardening
ansible-playbook -i inventory/production.yml playbooks/hardening.yml
```

### Step 4: K3s cluster

```bash
# Install K3s
ansible-playbook -i inventory/production.yml playbooks/k3s-install.yml

# Verify
export KUBECONFIG=kubeconfig/node1.yaml
kubectl get nodes
```

### Step 5: Vault + Consul

```bash
# Vault
ansible-playbook -i inventory/production.yml playbooks/vault-setup.yml

# Consul (if multi-DC)
ansible-playbook -i inventory/production.yml playbooks/consul-setup.yml
```

### Step 6: Observability

```bash
# Prometheus + Grafana + Loki + Tempo
ansible-playbook -i inventory/production.yml playbooks/observability.yml
```

---

## 📋 Checklist

### Phase 1: Foundation
- [x] NODE1 security hardening
- [x] NODE3 security hardening
- [x] PostgreSQL on NODE1 & NODE3
- [ ] Ansible repository structure
- [ ] SSH key distribution
- [ ] Bootstrap playbook tested

### Phase 2: K3s Cluster
- [ ] K3s on NODE1 (master)
- [ ] K3s on NODE3 (worker + GPU)
- [ ] CoreDNS configured
- [ ] Network policies

### Phase 3: Secrets & Discovery
- [ ] Vault installed
- [ ] External Secrets Operator
- [ ] Consul (if needed for multi-DC)

### Phase 4: Observability
- [ ] Prometheus
- [ ] Grafana
- [ ] Loki
- [ ] Tempo
- [ ] Alerting rules

---

**Author:** Ivan Tytar & AI Assistant
**Last updated:** 2026-01-10