🏗️ Add DAARION Infrastructure Stack
- Terraform + Ansible + K3s + Vault + Consul + Observability
- Decentralized network architecture (own datacenters)
- Complete Ansible playbooks:
  - bootstrap.yml: OS setup, packages, SSH
  - hardening.yml: Security (UFW, fail2ban, auditd, Trivy)
  - k3s-install.yml: Lightweight Kubernetes cluster
- Production inventory with NODE1, NODE3
- Group variables for all nodes
- Security check cron script
- Multi-DC ready with Consul support

DAARION-INFRASTRUCTURE-STACK.md (new file, +993 lines)

# 🏗️ DAARION Infrastructure Stack — Decentralized Network

**Version:** 1.0.0
**Date:** 2026-01-10
**Status:** Rollout in progress

---

## 🎯 Concept

**A decentralized network of self-owned datacenters and nodes, geographically distributed:**

- No dependency on a single cloud provider
- Hybrid infrastructure (bare-metal + VMs + K8s)
- Multi-DC architecture with Consul for service discovery

---

## 📦 Technology Stack

```
┌─────────────────────────────────────────────────────────────────┐
│                     INFRASTRUCTURE LAYER                        │
├─────────────────────────────────────────────────────────────────┤
│   Terraform          │  Infrastructure as Code                  │
│   (networks, VPC,    │  - Networks, VPC, firewall rules         │
│    LB, DNS, storage) │  - Load Balancers, DNS records           │
│                      │  - Storage provisioning                  │
├─────────────────────────────────────────────────────────────────┤
│                     CONFIGURATION LAYER                         │
├─────────────────────────────────────────────────────────────────┤
│   Ansible            │  Configuration Management                │
│   (OS bootstrap,     │  - SSH keys, users, packages             │
│    hardening, k3s)   │  - Security hardening                    │
│                      │  - K3s/K8s cluster bootstrap             │
├─────────────────────────────────────────────────────────────────┤
│                     SECRETS LAYER                               │
├─────────────────────────────────────────────────────────────────┤
│   HashiCorp Vault    │  Centralized Secrets Management          │
│   + External Secrets │  - Database credentials                  │
│     Operator         │  - API keys, certificates                │
│                      │  - Dynamic secrets rotation              │
├─────────────────────────────────────────────────────────────────┤
│                     ORCHESTRATION LAYER                         │
├─────────────────────────────────────────────────────────────────┤
│   K3s / Kubernetes   │  Container Orchestration                 │
│   + CoreDNS          │  - Lightweight K8s (k3s for edge)        │
│                      │  - Service discovery via CoreDNS         │
├─────────────────────────────────────────────────────────────────┤
│                     SERVICE DISCOVERY (Multi-DC)                │
├─────────────────────────────────────────────────────────────────┤
│   Consul             │  Multi-DC Service Discovery              │
│   (for hybrid/       │  - Cross-datacenter discovery            │
│    multi-DC)         │  - Health checking                       │
│                      │  - Service mesh (optional)               │
├─────────────────────────────────────────────────────────────────┤
│                     OBSERVABILITY LAYER                         │
├─────────────────────────────────────────────────────────────────┤
│   Prometheus         │  Metrics collection & alerting           │
│   Grafana            │  Dashboards & visualization              │
│   Loki               │  Log aggregation                         │
│   Tempo              │  Distributed tracing                     │
└─────────────────────────────────────────────────────────────────┘
```

---

## 🌍 Current Network

| Node | Location | Type | Role | Status |
|------|----------|------|------|--------|
| **NODE1** | Hetzner DE | Dedicated | Master, Gateway | ✅ Active |
| **NODE2** | Local (Ivan) | MacBook M4 | Dev, Testing | ✅ Active |
| **NODE3** | Remote DC | Threadripper+RTX3090 | AI/ML, GPU | ✅ Active |
| **NODE4+** | TBD | Various | Compute | 🔜 Planned |

---

## 📁 Repository Structure

```
infrastructure/
├── terraform/
│   ├── modules/
│   │   ├── network/          # VPC, subnets, firewall
│   │   ├── compute/          # VMs, bare-metal provisioning
│   │   ├── dns/              # DNS records
│   │   ├── storage/          # Volumes, NFS, S3-compatible
│   │   └── load-balancer/    # HAProxy, Traefik configs
│   ├── environments/
│   │   ├── production/
│   │   ├── staging/
│   │   └── development/
│   └── main.tf
│
├── ansible/
│   ├── inventory/
│   │   ├── production.yml
│   │   ├── staging.yml
│   │   └── group_vars/
│   │       ├── all.yml
│   │       ├── masters.yml
│   │       ├── workers.yml
│   │       └── gpu_nodes.yml
│   ├── playbooks/
│   │   ├── bootstrap.yml         # OS setup, SSH, packages
│   │   ├── hardening.yml         # Security hardening
│   │   ├── k3s-install.yml       # K3s cluster setup
│   │   ├── vault-setup.yml       # Vault installation
│   │   ├── observability.yml     # Prometheus/Grafana/Loki
│   │   └── consul-setup.yml      # Consul for multi-DC
│   ├── roles/
│   │   ├── common/
│   │   ├── security/
│   │   ├── docker/
│   │   ├── k3s/
│   │   ├── vault/
│   │   ├── consul/
│   │   └── observability/
│   └── ansible.cfg
│
├── kubernetes/
│   ├── base/
│   │   ├── namespaces/
│   │   ├── rbac/
│   │   └── network-policies/
│   ├── apps/
│   │   ├── daarion-core/
│   │   ├── postgres/
│   │   ├── redis/
│   │   └── monitoring/
│   ├── external-secrets/
│   │   └── vault-backend.yml
│   └── kustomization.yaml
│
├── vault/
│   ├── policies/
│   ├── secrets-engines/
│   └── auth-methods/
│
├── consul/
│   ├── config/
│   └── services/
│
└── observability/
    ├── prometheus/
    ├── grafana/
    ├── loki/
    └── tempo/
```
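
The `terraform/` tree above is scaffolding in this commit; no `.tf` files are included. A minimal sketch of how `environments/production/main.tf` could compose the modules (module names follow the tree, but every input variable below is a hypothetical assumption, not part of the commit):

```hcl
# terraform/environments/production/main.tf - illustrative sketch only.
# All variable names below are assumptions; the modules are empty
# scaffolding in this commit and define no inputs yet.

module "network" {
  source = "../../modules/network"

  datacenter        = "hetzner-de"
  allowed_ssh_ports = [22, 33147]
}

module "dns" {
  source = "../../modules/dns"

  # A records for the current nodes (IPs from the Ansible inventory)
  records = {
    "node1.daarion.network" = "144.76.224.179"
    "node3.daarion.network" = "80.77.35.151"
  }
}
```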

---

## 🚀 Phase 1: Base Infrastructure

We start by installing the base stack on NODE1 and NODE3.

### 1.1 Ansible Inventory

```yaml
# ansible/inventory/production.yml
all:
  vars:
    ansible_python_interpreter: /usr/bin/python3
    timezone: "UTC"

  children:
    masters:
      hosts:
        node1:
          ansible_host: 144.76.224.179
          ansible_user: root
          node_role: master
          datacenter: hetzner-de

    workers:
      hosts:
        node3:
          ansible_host: 80.77.35.151
          ansible_port: 33147
          ansible_user: zevs
          ansible_become: yes
          ansible_become_pass: "{{ vault_node3_password }}"
          node_role: worker
          datacenter: remote-dc
          gpu: true
          gpu_type: "rtx3090"

    gpu_nodes:
      hosts:
        node3:

    local_dev:
      hosts:
        node2:
          ansible_host: 192.168.1.244
          ansible_user: apple
          node_role: development
          datacenter: local
```

### 1.2 Bootstrap Playbook

```yaml
# ansible/playbooks/bootstrap.yml
---
- name: Bootstrap all nodes
  hosts: all
  become: yes

  vars:
    common_packages:
      - curl
      - wget
      - git
      - htop
      - vim
      - jq
      - unzip
      - ca-certificates
      - gnupg
      - lsb-release

  tasks:
    - name: Set timezone
      timezone:
        name: "{{ timezone }}"

    - name: Update apt cache
      apt:
        update_cache: yes
        cache_valid_time: 3600
      when: ansible_os_family == "Debian"

    - name: Install common packages
      apt:
        name: "{{ common_packages }}"
        state: present
      when: ansible_os_family == "Debian"

    - name: Create admin group
      group:
        name: daarion-admin
        state: present

    - name: Setup SSH authorized keys
      authorized_key:
        user: "{{ ansible_user }}"
        key: "{{ lookup('file', '~/.ssh/daarion_network.pub') }}"
        state: present

    - name: Disable password authentication
      lineinfile:
        path: /etc/ssh/sshd_config
        regexp: '^#?PasswordAuthentication'
        line: 'PasswordAuthentication no'
      notify: restart sshd

    - name: Set hostname
      hostname:
        name: "{{ inventory_hostname }}"

    - name: Update /etc/hosts
      lineinfile:
        path: /etc/hosts
        line: "{{ hostvars[item].ansible_host }} {{ item }}"
        state: present
      loop: "{{ groups['all'] }}"
      when: hostvars[item].ansible_host is defined

  handlers:
    - name: restart sshd
      service:
        name: sshd
        state: restarted
```

### 1.3 Security Hardening Playbook

```yaml
# ansible/playbooks/hardening.yml
---
- name: Security Hardening
  hosts: all
  become: yes

  vars:
    security_packages:
      - fail2ban
      - ufw
      - auditd
      - rkhunter
      - unattended-upgrades

    allowed_ssh_port: "{{ ansible_port | default(22) }}"

  tasks:
    - name: Install security packages
      apt:
        name: "{{ security_packages }}"
        state: present

    - name: Install Trivy
      shell: |
        curl -sfL https://raw.githubusercontent.com/aquasecurity/trivy/main/contrib/install.sh | sh -s -- -b /usr/local/bin
      args:
        creates: /usr/local/bin/trivy

    # UFW Configuration
    - name: UFW - Default deny incoming
      ufw:
        direction: incoming
        policy: deny

    - name: UFW - Default deny outgoing
      ufw:
        direction: outgoing
        policy: deny

    - name: UFW - Allow SSH
      ufw:
        rule: allow
        port: "{{ allowed_ssh_port }}"
        proto: tcp

    - name: UFW - Allow necessary outgoing
      ufw:
        rule: allow
        direction: out
        port: "{{ item.port }}"
        proto: "{{ item.proto }}"
      loop:
        - { port: 53, proto: udp }    # DNS
        - { port: 80, proto: tcp }    # HTTP
        - { port: 443, proto: tcp }   # HTTPS
        - { port: 123, proto: udp }   # NTP

    - name: UFW - Allow K3s ports (masters)
      ufw:
        rule: allow
        port: "{{ item }}"
        proto: tcp
      loop:
        - 6443    # K3s API
        - 10250   # Kubelet
      when: "'masters' in group_names"

    - name: UFW - Enable
      ufw:
        state: enabled

    # Fail2ban
    - name: Configure fail2ban
      template:
        src: templates/jail.local.j2
        dest: /etc/fail2ban/jail.local
      notify: restart fail2ban

    # Kernel hardening
    - name: Kernel hardening sysctl
      sysctl:
        name: "{{ item.name }}"
        value: "{{ item.value }}"
        state: present
        reload: yes
      loop:
        - { name: 'net.ipv4.ip_forward', value: '1' }   # Required for K8s
        - { name: 'net.ipv4.conf.all.accept_redirects', value: '0' }
        - { name: 'net.ipv4.conf.default.accept_redirects', value: '0' }
        - { name: 'net.ipv4.tcp_syncookies', value: '1' }
        - { name: 'kernel.randomize_va_space', value: '2' }

    # Security check script
    - name: Create scripts directory
      file:
        path: /opt/scripts
        state: directory
        mode: '0755'

    - name: Deploy security check script
      copy:
        src: files/security-check.sh
        dest: /opt/scripts/security-check.sh
        mode: '0755'

    - name: Setup security cron
      cron:
        name: "Hourly security check"
        minute: "0"
        job: "/opt/scripts/security-check.sh"

  handlers:
    - name: restart fail2ban
      service:
        name: fail2ban
        state: restarted
```
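
The playbook above templates `templates/jail.local.j2`, which is not included in this commit. A plausible minimal version, reusing the playbook's `allowed_ssh_port` variable (ban and retry values are assumptions):

```ini
# ansible/playbooks/templates/jail.local.j2 - illustrative sketch
[DEFAULT]
bantime  = 1h
findtime = 10m
maxretry = 5

[sshd]
enabled = true
port    = {{ allowed_ssh_port }}
```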
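
`files/security-check.sh`, deployed by the playbook and run hourly from cron, is also not included in the commit. A defensive sketch that logs basic signals and degrades gracefully when a tool is missing (the log path and the exact set of checks are assumptions):

```shell
#!/usr/bin/env bash
# files/security-check.sh - minimal sketch; the real script is not in the commit.
# Appends basic security signals to a log; every check is guarded so the
# script still completes on hosts missing a given tool.
set -u
LOG="${SECURITY_CHECK_LOG:-/tmp/security-check.log}"

{
  echo "=== security check: $(date -u +%Y-%m-%dT%H:%M:%SZ) ==="

  # Firewall status (if ufw is present)
  if command -v ufw >/dev/null 2>&1; then
    echo "--- ufw status ---"
    ufw status 2>/dev/null || echo "ufw: not readable (need root)"
  fi

  # Banned IPs (if fail2ban is present)
  if command -v fail2ban-client >/dev/null 2>&1; then
    echo "--- fail2ban ---"
    fail2ban-client status sshd 2>/dev/null || echo "fail2ban: sshd jail not active"
  fi

  # Listening TCP sockets
  echo "--- listening sockets ---"
  ss -tln 2>/dev/null || netstat -tln 2>/dev/null || echo "no ss/netstat available"

  # Filesystems at or above 90% usage (header row always printed)
  echo "--- disk usage ---"
  df -h | awk 'NR==1 || int($5) >= 90'
} >> "$LOG" 2>&1

echo "security check written to $LOG"
```

Paired with the cron entry above, this yields an hourly append-only audit trail per node.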

### 1.4 K3s Installation Playbook

```yaml
# ansible/playbooks/k3s-install.yml
---
- name: Install K3s on Masters
  hosts: masters
  become: yes

  vars:
    k3s_version: "v1.29.0+k3s1"

  tasks:
    - name: Download K3s installer
      get_url:
        url: https://get.k3s.io
        dest: /tmp/k3s-install.sh
        mode: '0755'

    # The server generates its own join token; it is read back below
    # and reused by the worker play.
    - name: Install K3s server
      shell: |
        INSTALL_K3S_VERSION={{ k3s_version }} \
        sh /tmp/k3s-install.sh server \
          --disable traefik \
          --disable servicelb \
          --write-kubeconfig-mode 644 \
          --tls-san {{ ansible_host }} \
          --node-label "datacenter={{ datacenter }}" \
          --node-label "node-role={{ node_role }}"
      args:
        creates: /etc/rancher/k3s/k3s.yaml

    - name: Wait for K3s to be ready
      wait_for:
        port: 6443
        delay: 10
        timeout: 300

    - name: Get K3s token
      slurp:
        src: /var/lib/rancher/k3s/server/node-token
      register: k3s_token_file

    - name: Save K3s token
      set_fact:
        k3s_join_token: "{{ k3s_token_file.content | b64decode | trim }}"

    - name: Fetch kubeconfig
      fetch:
        src: /etc/rancher/k3s/k3s.yaml
        dest: "{{ playbook_dir }}/../kubeconfig/{{ inventory_hostname }}.yaml"
        flat: yes

- name: Install K3s on Workers
  hosts: workers
  become: yes

  vars:
    k3s_version: "v1.29.0+k3s1"
    k3s_master: "{{ hostvars[groups['masters'][0]].ansible_host }}"

  tasks:
    - name: Download K3s installer
      get_url:
        url: https://get.k3s.io
        dest: /tmp/k3s-install.sh
        mode: '0755'

    - name: Install K3s agent
      shell: |
        INSTALL_K3S_VERSION={{ k3s_version }} \
        K3S_URL=https://{{ k3s_master }}:6443 \
        K3S_TOKEN={{ hostvars[groups['masters'][0]].k3s_join_token }} \
        sh /tmp/k3s-install.sh agent \
          --node-label "datacenter={{ datacenter }}" \
          --node-label "node-role={{ node_role }}" \
        {% if gpu is defined and gpu %}
          --node-label "gpu=true" \
          --node-label "gpu-type={{ gpu_type }}"
        {% endif %}
      args:
        creates: /etc/rancher/k3s/k3s.yaml
```

---

## 🔐 Phase 2: Vault Setup

### 2.1 Vault Installation

```yaml
# ansible/playbooks/vault-setup.yml
---
- name: Install HashiCorp Vault
  hosts: masters
  become: yes

  vars:
    vault_version: "1.15.4"
    vault_data_dir: "/opt/vault/data"

  tasks:
    - name: Create vault user
      user:
        name: vault
        system: yes
        shell: /bin/false

    - name: Create vault directories
      file:
        path: "{{ item }}"
        state: directory
        owner: vault
        group: vault
        mode: '0750'
      loop:
        - /opt/vault
        - /opt/vault/data
        - /opt/vault/config
        - /opt/vault/logs

    - name: Download Vault
      get_url:
        url: "https://releases.hashicorp.com/vault/{{ vault_version }}/vault_{{ vault_version }}_linux_amd64.zip"
        dest: /tmp/vault.zip

    - name: Extract Vault
      unarchive:
        src: /tmp/vault.zip
        dest: /usr/local/bin
        remote_src: yes

    - name: Vault configuration
      template:
        src: templates/vault.hcl.j2
        dest: /opt/vault/config/vault.hcl
        owner: vault
        group: vault
      notify: restart vault

    - name: Vault systemd service
      template:
        src: templates/vault.service.j2
        dest: /etc/systemd/system/vault.service
      notify:
        - reload systemd
        - restart vault

    - name: Enable and start Vault
      service:
        name: vault
        enabled: yes
        state: started

  handlers:
    - name: reload systemd
      systemd:
        daemon_reload: yes

    - name: restart vault
      service:
        name: vault
        state: restarted
```
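
`templates/vault.service.j2` is referenced above but not included in the commit. A minimal unit along the lines of HashiCorp's documented example (a sketch to verify against your Vault version, not the commit's actual template):

```ini
# ansible/templates/vault.service.j2 - illustrative sketch
[Unit]
Description=HashiCorp Vault
Requires=network-online.target
After=network-online.target

[Service]
User=vault
Group=vault
ExecStart=/usr/local/bin/vault server -config=/opt/vault/config/vault.hcl
ExecReload=/bin/kill -HUP $MAINPID
CapabilityBoundingSet=CAP_IPC_LOCK
AmbientCapabilities=CAP_IPC_LOCK
Restart=on-failure

[Install]
WantedBy=multi-user.target
```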

### 2.2 Vault Configuration

```hcl
# ansible/templates/vault.hcl.j2
ui = true

storage "file" {
  path = "/opt/vault/data"
}

listener "tcp" {
  address     = "0.0.0.0:8200"
  tls_disable = "true"  # Enable TLS in production!
}

api_addr     = "http://{{ ansible_host }}:8200"
cluster_addr = "https://{{ ansible_host }}:8201"
```

### 2.3 External Secrets Operator

```yaml
# kubernetes/external-secrets/vault-backend.yml
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: vault-backend
spec:
  provider:
    vault:
      server: "http://node1:8200"
      path: "secret"
      version: "v2"
      auth:
        kubernetes:
          mountPath: "kubernetes"
          role: "external-secrets"
          serviceAccountRef:
            name: "external-secrets"
            namespace: "external-secrets"

---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: postgres-credentials
  namespace: daarion
spec:
  refreshInterval: "1h"
  secretStoreRef:
    name: vault-backend
    kind: ClusterSecretStore
  target:
    name: postgres-credentials
    creationPolicy: Owner
  data:
    - secretKey: username
      remoteRef:
        key: secret/data/postgres
        property: username
    - secretKey: password
      remoteRef:
        key: secret/data/postgres
        property: password
```
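
For the `external-secrets` Vault role above to work, Vault itself needs a read policy and a Kubernetes auth role bound to the operator's service account; neither ships in this commit. A sketch of the policy (paths assume the KV v2 mount named `secret`):

```hcl
# vault/policies/external-secrets.hcl - illustrative sketch
path "secret/data/*" {
  capabilities = ["read"]
}

path "secret/metadata/*" {
  capabilities = ["read", "list"]
}
```

The matching auth role would bind this policy to the `external-secrets` service account in the `external-secrets` namespace via the `kubernetes` auth mount.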

---

## 🔍 Phase 3: Consul (Multi-DC)

### 3.1 Consul Installation

```yaml
# ansible/playbooks/consul-setup.yml
---
- name: Install Consul
  hosts: all
  become: yes

  vars:
    consul_version: "1.17.1"
    consul_datacenter: "{{ datacenter }}"
    consul_is_server: "{{ 'masters' in group_names }}"

  tasks:
    - name: Create consul user
      user:
        name: consul
        system: yes
        shell: /bin/false

    - name: Create consul directories
      file:
        path: "{{ item }}"
        state: directory
        owner: consul
        group: consul
      loop:
        - /opt/consul
        - /opt/consul/data
        - /opt/consul/config

    - name: Download Consul
      get_url:
        url: "https://releases.hashicorp.com/consul/{{ consul_version }}/consul_{{ consul_version }}_linux_amd64.zip"
        dest: /tmp/consul.zip

    - name: Extract Consul
      unarchive:
        src: /tmp/consul.zip
        dest: /usr/local/bin
        remote_src: yes

    - name: Consul configuration
      template:
        src: templates/consul.hcl.j2
        dest: /opt/consul/config/consul.hcl
        owner: consul
        group: consul
      notify: restart consul

    - name: Consul systemd service
      template:
        src: templates/consul.service.j2
        dest: /etc/systemd/system/consul.service
      notify:
        - reload systemd
        - restart consul

    - name: Enable and start Consul
      service:
        name: consul
        enabled: yes
        state: started

  handlers:
    - name: reload systemd
      systemd:
        daemon_reload: yes

    - name: restart consul
      service:
        name: consul
        state: restarted
```
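
As with Vault, `templates/consul.service.j2` is referenced but not included. A minimal sketch:

```ini
# ansible/templates/consul.service.j2 - illustrative sketch
[Unit]
Description=HashiCorp Consul
Requires=network-online.target
After=network-online.target

[Service]
User=consul
Group=consul
ExecStart=/usr/local/bin/consul agent -config-dir=/opt/consul/config
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure

[Install]
WantedBy=multi-user.target
```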

### 3.2 Consul Configuration

```hcl
# ansible/templates/consul.hcl.j2
datacenter  = "{{ consul_datacenter }}"
data_dir    = "/opt/consul/data"
log_level   = "INFO"
node_name   = "{{ inventory_hostname }}"
bind_addr   = "{{ ansible_host }}"
client_addr = "0.0.0.0"

{% if consul_is_server %}
server           = true
bootstrap_expect = {{ groups['masters'] | length }}
ui_config {
  enabled = true
}
{% endif %}

# Join other servers
retry_join = [
{% for host in groups['masters'] %}
  "{{ hostvars[host].ansible_host }}"{% if not loop.last %},{% endif %}

{% endfor %}
]

# WAN federation for multi-DC
{% if groups['masters'] | length > 1 %}
retry_join_wan = [
{% for host in groups['masters'] %}
  "{{ hostvars[host].ansible_host }}"{% if not loop.last %},{% endif %}

{% endfor %}
]
{% endif %}

# Service mesh
connect {
  enabled = true
}

# DNS
ports {
  dns = 8600
}

# ACL (enable in production)
acl {
  enabled        = false
  default_policy = "allow"
}
```
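
The repository tree reserves `consul/services/` for service definitions, but none ship in this commit. An illustrative registration with an HTTP health check, loadable from the agent's `-config-dir` (the service name, port, and endpoint are assumptions):

```hcl
# consul/services/daarion-core.hcl - illustrative sketch
service {
  name = "daarion-core"
  port = 8080

  check {
    http     = "http://localhost:8080/health"
    interval = "10s"
    timeout  = "2s"
  }
}
```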

---

## 📊 Phase 4: Observability Stack

### 4.1 Prometheus + Grafana + Loki + Tempo

```yaml
# ansible/playbooks/observability.yml
---
- name: Deploy Observability Stack
  hosts: masters
  become: yes

  tasks:
    - name: Create monitoring namespace
      kubernetes.core.k8s:
        state: present
        definition:
          apiVersion: v1
          kind: Namespace
          metadata:
            name: monitoring

    - name: Add Prometheus Helm repo
      kubernetes.core.helm_repository:
        name: prometheus-community
        repo_url: https://prometheus-community.github.io/helm-charts

    - name: Add Grafana Helm repo
      kubernetes.core.helm_repository:
        name: grafana
        repo_url: https://grafana.github.io/helm-charts

    - name: Install kube-prometheus-stack
      kubernetes.core.helm:
        name: prometheus
        chart_ref: prometheus-community/kube-prometheus-stack
        release_namespace: monitoring
        create_namespace: yes
        values:
          prometheus:
            prometheusSpec:
              retention: 30d
              storageSpec:
                volumeClaimTemplate:
                  spec:
                    accessModes: ["ReadWriteOnce"]
                    resources:
                      requests:
                        storage: 50Gi
          grafana:
            adminPassword: "{{ vault_grafana_password }}"
            persistence:
              enabled: true
              size: 10Gi

    - name: Install Loki
      kubernetes.core.helm:
        name: loki
        chart_ref: grafana/loki-stack
        release_namespace: monitoring
        values:
          loki:
            persistence:
              enabled: true
              size: 50Gi
          promtail:
            enabled: true

    - name: Install Tempo
      kubernetes.core.helm:
        name: tempo
        chart_ref: grafana/tempo
        release_namespace: monitoring
        values:
          tempo:
            retention: 168h  # 7 days
```
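
The checklist at the end of this document leaves alerting rules open. With kube-prometheus-stack installed as above, rules can be added declaratively as `PrometheusRule` resources; an illustrative node-down alert (the rule name, threshold, and the `release: prometheus` selector label are assumptions to verify against your chart values):

```yaml
# observability/prometheus/node-alerts.yml - illustrative sketch
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: daarion-node-alerts
  namespace: monitoring
  labels:
    release: prometheus   # must match the chart's ruleSelector
spec:
  groups:
    - name: nodes
      rules:
        - alert: NodeDown
          expr: up{job="node-exporter"} == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Node {{ $labels.instance }} is unreachable"
```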

### 4.2 Grafana Dashboards

```yaml
# kubernetes/apps/monitoring/grafana-dashboards.yml
apiVersion: v1
kind: ConfigMap
metadata:
  name: daarion-dashboards
  namespace: monitoring
  labels:
    grafana_dashboard: "1"
data:
  daarion-network.json: |
    {
      "dashboard": {
        "title": "DAARION Network Overview",
        "panels": [
          {
            "title": "Total Nodes",
            "type": "stat",
            "targets": [{"expr": "count(up{job=\"node-exporter\"})"}]
          },
          {
            "title": "Nodes by Datacenter",
            "type": "piechart",
            "targets": [{"expr": "count by (datacenter) (up{job=\"node-exporter\"})"}]
          },
          {
            "title": "GPU Nodes",
            "type": "stat",
            "targets": [{"expr": "count(up{job=\"node-exporter\", gpu=\"true\"})"}]
          },
          {
            "title": "K3s Cluster Status",
            "type": "stat",
            "targets": [{"expr": "sum(kube_node_status_condition{condition=\"Ready\",status=\"true\"})"}]
          }
        ]
      }
    }
```

---

## 🚀 Quick Start

### Step 1: Preparation

```bash
# Clone the repository
git clone git@github.com:IvanTytar/microdao-daarion.git
cd microdao-daarion/infrastructure

# Create an SSH key for the network
ssh-keygen -t ed25519 -f ~/.ssh/daarion_network -C "daarion-network"

# Install Ansible
pip install ansible ansible-lint

# Install Terraform
brew install terraform  # macOS
```

### Step 2: Configure the inventory

```bash
# Copy the example
cp ansible/inventory/example.yml ansible/inventory/production.yml

# Edit it for your nodes
vim ansible/inventory/production.yml
```

### Step 3: Bootstrap the nodes

```bash
cd ansible

# Check connectivity
ansible all -i inventory/production.yml -m ping

# Bootstrap
ansible-playbook -i inventory/production.yml playbooks/bootstrap.yml

# Hardening
ansible-playbook -i inventory/production.yml playbooks/hardening.yml
```

### Step 4: K3s cluster

```bash
# Install K3s
ansible-playbook -i inventory/production.yml playbooks/k3s-install.yml

# Verify
export KUBECONFIG=kubeconfig/node1.yaml
kubectl get nodes
```

### Step 5: Vault + Consul

```bash
# Vault
ansible-playbook -i inventory/production.yml playbooks/vault-setup.yml

# Consul (if multi-DC)
ansible-playbook -i inventory/production.yml playbooks/consul-setup.yml
```

### Step 6: Observability

```bash
# Prometheus + Grafana + Loki + Tempo
ansible-playbook -i inventory/production.yml playbooks/observability.yml
```

---

## 📋 Checklist

### Phase 1: Foundation
- [x] NODE1 security hardening
- [x] NODE3 security hardening
- [x] PostgreSQL on NODE1 & NODE3
- [ ] Ansible repository structure
- [ ] SSH key distribution
- [ ] Bootstrap playbook tested

### Phase 2: K3s Cluster
- [ ] K3s on NODE1 (master)
- [ ] K3s on NODE3 (worker + GPU)
- [ ] CoreDNS configured
- [ ] Network policies

### Phase 3: Secrets & Discovery
- [ ] Vault installed
- [ ] External Secrets Operator
- [ ] Consul (if needed for multi-DC)

### Phase 4: Observability
- [ ] Prometheus
- [ ] Grafana
- [ ] Loki
- [ ] Tempo
- [ ] Alerting rules

---

**Author:** Ivan Tytar & AI Assistant
**Last updated:** 2026-01-10

@@ -1,633 +0,0 @@

# 🌐 150-Node Network Rollout Plan — DAARION Network

**Version:** 1.0.0
**Date:** 2026-01-10
**Status:** Planning

---

## 📋 Contents

1. [Network architecture](#network-architecture)
2. [Centralized management](#centralized-management)
3. [Deployment automation](#deployment-automation)
4. [Network security](#network-security)
5. [Monitoring and alerts](#monitoring-and-alerts)
6. [Roadmap](#roadmap)

---

## 🏗️ Network Architecture

### Node hierarchy

```
                 ┌─────────────────┐
                 │   MASTER NODE   │
                 │     (NODE1)     │
                 │     Hetzner     │
                 └────────┬────────┘
                          │
         ┌────────────────┼────────────────┐
         │                │                │
  ┌──────▼──────┐  ┌──────▼──────┐  ┌──────▼──────┐
  │  REGION EU  │  │  REGION US  │  │ REGION ASIA │
  │ Controller  │  │ Controller  │  │ Controller  │
  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘
         │                │                │
   ┌─────┼─────┐    ┌─────┼─────┐    ┌─────┼─────┐
   │     │     │    │     │     │    │     │     │
  50    50    50   25    25    25   25    25    25
 nodes nodes nodes nodes nodes nodes nodes nodes nodes
```

### Node types

| Type | Count | Role | Resources |
|------|-------|------|-----------|
| **Master** | 1 | Central management, GitOps | 8 CPU, 32GB RAM |
| **Region Controller** | 3-5 | Regional management | 4 CPU, 16GB RAM |
| **Compute Node** | ~140 | Compute, AI workloads | 2-8 CPU, 8-64GB RAM |
| **GPU Node** | ~5 | AI/ML inference | GPU + 32GB+ RAM |

---

## 🎛️ Centralized Management

### Tools

| Tool | Purpose | Alternative |
|------|---------|-------------|
| **Ansible** | Configuration Management | Salt, Puppet |
| **Terraform** | Infrastructure as Code | Pulumi |
| **Kubernetes** | Container Orchestration | Docker Swarm |
| **Consul** | Service Discovery | etcd |
| **Vault** | Secrets Management | AWS Secrets Manager |
| **Prometheus** | Metrics | InfluxDB |
| **Grafana** | Dashboards | - |
| **Loki** | Logs | ELK Stack |

### Ansible Inventory Structure

```yaml
# inventory/production.yml
all:
  children:
    masters:
      hosts:
        node1-master:
          ansible_host: 144.76.224.179
          ansible_user: root

    region_controllers:
      hosts:
        node3-eu:
          ansible_host: 80.77.35.151
          ansible_port: 33147
          ansible_user: zevs
          ansible_become_pass: "{{ vault_node3_password }}"

    compute_nodes:
      children:
        eu_nodes:
          hosts:
            node-eu-[001:050]:
              ansible_host: "{{ inventory_hostname }}.daarion.network"
        us_nodes:
          hosts:
            node-us-[001:050]:
              ansible_host: "{{ inventory_hostname }}.daarion.network"
        asia_nodes:
          hosts:
            node-asia-[001:050]:
              ansible_host: "{{ inventory_hostname }}.daarion.network"

    gpu_nodes:
      hosts:
        gpu-[01:05]:
          ansible_host: "{{ inventory_hostname }}.daarion.network"
```

### Ansible Playbook: Security Setup

```yaml
# playbooks/security-setup.yml
---
- name: Security Setup for All Nodes
  hosts: all
  become: yes

  vars:
    security_packages:
      - fail2ban
      - auditd
      - rkhunter
      - chkrootkit
      - ufw

  tasks:
    - name: Update apt cache
      apt:
        update_cache: yes
        cache_valid_time: 3600

    - name: Install security packages
      apt:
        name: "{{ security_packages }}"
        state: present

    - name: Install Trivy
      shell: |
        curl -sfL https://raw.githubusercontent.com/aquasecurity/trivy/main/contrib/install.sh | sh -s -- -b /usr/local/bin
      args:
        creates: /usr/local/bin/trivy

    - name: Configure fail2ban
      template:
        src: templates/jail.local.j2
        dest: /etc/fail2ban/jail.local
      notify: restart fail2ban

    - name: Configure UFW defaults
      ufw:
        direction: "{{ item.direction }}"
        policy: "{{ item.policy }}"
      loop:
        - { direction: incoming, policy: deny }
        - { direction: outgoing, policy: deny }

    - name: Allow SSH
      ufw:
        rule: allow
        port: "{{ ansible_port | default(22) }}"
        proto: tcp

    - name: Allow necessary outgoing
      ufw:
        rule: allow
        direction: out
        port: "{{ item.port }}"
        proto: "{{ item.proto | default('tcp') }}"
      loop:
        - { port: 53, proto: udp }
        - { port: 80 }
        - { port: 443 }
        - { port: 123, proto: udp }

    - name: Block internal networks
      ufw:
        rule: deny
        direction: out
        to_ip: "{{ item }}"
      loop:
        - 10.0.0.0/8
        - 172.16.0.0/12

    - name: Enable UFW
      ufw:
        state: enabled

    - name: Copy security check script
      copy:
        src: files/security-check.sh
        dest: /opt/scripts/security-check.sh
        mode: '0755'

    - name: Setup security cron
      cron:
        name: "Security check"
        minute: "0"
        job: "/opt/scripts/security-check.sh"

  handlers:
    - name: restart fail2ban
      service:
        name: fail2ban
        state: restarted
```
|
||||
|
||||
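The `item.proto | default('tcp')` pattern in the outgoing-rules loop falls back to TCP whenever an item omits the protocol. The same defaulting can be sketched in plain shell to see which rules the loop renders; `allow_out` below is illustrative only, not part of any playbook.

```shell
# Sketch of the rules the "Allow necessary outgoing" loop produces.
# ${2:-tcp} plays the role of Jinja's "item.proto | default('tcp')".
allow_out() {
  port="$1"
  proto="${2:-tcp}"
  echo "ufw allow out ${port}/${proto}"
}
allow_out 53 udp   # -> ufw allow out 53/udp
allow_out 80       # -> ufw allow out 80/tcp (default kicks in)
allow_out 443
allow_out 123 udp
```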
### Ansible Playbook: PostgreSQL Deployment

```yaml
# playbooks/postgresql-deploy.yml
---
- name: Deploy PostgreSQL to Nodes
  hosts: database_nodes
  become: yes

  vars:
    postgres_image: "postgres@sha256:23e88eb049fd5d54894d70100df61d38a49ed97909263f79d4ff4c30a5d5fca2"
    postgres_user: "daarion"
    postgres_password: "{{ vault_postgres_password }}"
    postgres_db: "daarion_main"

  tasks:
    - name: Pull PostgreSQL image
      docker_image:
        name: "{{ postgres_image }}"
        source: pull

    - name: Scan image with Trivy
      command: trivy image --severity HIGH,CRITICAL --exit-code 1 {{ postgres_image }}
      register: trivy_result
      failed_when: trivy_result.rc != 0

    - name: Create PostgreSQL volume
      docker_volume:
        name: "postgres_data_{{ inventory_hostname }}"

    - name: Run PostgreSQL container
      docker_container:
        name: dagi-postgres
        image: "{{ postgres_image }}"
        state: started
        restart_policy: "no"
        security_opts:
          - no-new-privileges:true
        read_only: yes
        tmpfs:
          - /tmp:noexec,nosuid,nodev,size=100m
          - /var/run/postgresql:noexec,nosuid,nodev,size=10m
        volumes:
          - "postgres_data_{{ inventory_hostname }}:/var/lib/postgresql/data"
        env:
          POSTGRES_USER: "{{ postgres_user }}"
          POSTGRES_PASSWORD: "{{ postgres_password }}"
          POSTGRES_DB: "{{ postgres_db }}"
        cpus: 2
        memory: 2g
        ports:
          - "5432:5432"

    - name: Wait for PostgreSQL to be ready
      wait_for:
        host: localhost
        port: 5432
        delay: 5
        timeout: 60
```
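The Trivy task gates deployment purely on the scanner's exit code: `--exit-code 1` makes `trivy` return 1 when HIGH or CRITICAL findings exist and 0 when the image is clean. A minimal sketch of that gating logic, with `scan` as a stand-in for the real `trivy image` call:

```shell
# scan is a stand-in for: trivy image --severity HIGH,CRITICAL --exit-code 1 "$IMAGE"
scan() { return "$1"; }

gate() {
  if scan "$1"; then
    echo "image clean: deploy"
  else
    echo "vulnerabilities found: abort"
  fi
}
gate 0   # -> image clean: deploy
gate 1   # -> vulnerabilities found: abort
```

In the playbook this same decision is expressed as `failed_when: trivy_result.rc != 0`, which stops the play before the container is ever started.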

---

## 🚀 Deployment Automation

### GitOps Workflow

```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   GitHub    │────▶│   ArgoCD    │────▶│ Kubernetes  │
│  (configs)  │     │  (GitOps)   │     │  (runtime)  │
└─────────────┘     └─────────────┘     └─────────────┘
       │                   │                   │
       │                   │                   │
       ▼                   ▼                   ▼
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Terraform  │────▶│   Ansible   │────▶│    Nodes    │
│   (infra)   │     │  (config)   │     │    (150)    │
└─────────────┘     └─────────────┘     └─────────────┘
```

### Terraform: Node Provisioning

```hcl
# terraform/main.tf
terraform {
  required_providers {
    hcloud = {
      source = "hetznercloud/hcloud"
    }
  }
}

variable "hcloud_token" {
  sensitive = true
}

variable "node_count" {
  default = 50
}

provider "hcloud" {
  token = var.hcloud_token
}

resource "hcloud_ssh_key" "default" {
  name       = "daarion-network"
  public_key = file("~/.ssh/daarion_network.pub")
}

resource "hcloud_server" "compute_nodes" {
  count       = var.node_count
  name        = "node-eu-${format("%03d", count.index + 1)}"
  server_type = "cx31" # 2 vCPU, 8GB RAM
  image       = "ubuntu-24.04"
  location    = "nbg1"
  ssh_keys    = [hcloud_ssh_key.default.id]

  labels = {
    role    = "compute"
    region  = "eu"
    managed = "terraform"
  }

  user_data = <<-EOF
    #cloud-config
    packages:
      - docker.io
      - fail2ban
      - ufw
    runcmd:
      - systemctl enable docker
      - systemctl start docker
      - ufw default deny incoming
      - ufw default deny outgoing
      - ufw allow 22/tcp
      - ufw allow out 53/udp
      - ufw allow out 443/tcp
      - ufw --force enable
  EOF
}

output "node_ips" {
  value = hcloud_server.compute_nodes[*].ipv4_address
}
```
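The `format("%03d", count.index + 1)` call zero-pads node numbers, so names sort correctly and stay three digits wide up to the 150-node target. Scripts that need to address nodes by name can reproduce the exact scheme with `printf`:

```shell
# Reproduce terraform's: name = "node-eu-${format("%03d", count.index + 1)}"
node_name() { printf 'node-eu-%03d\n' "$1"; }
node_name 1     # -> node-eu-001
node_name 50    # -> node-eu-050
node_name 150   # -> node-eu-150
```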

### Deployment Script

```bash
#!/bin/bash
# scripts/deploy-network.sh

set -e

NODES_COUNT=${1:-10}
REGION=${2:-eu}

echo "🚀 Deploying $NODES_COUNT nodes in $REGION region..."

# 1. Provision infrastructure
echo "[1/5] Provisioning infrastructure..."
cd terraform
terraform init
terraform apply -var="node_count=$NODES_COUNT" -auto-approve
cd ..

# 2. Wait for nodes to be ready
echo "[2/5] Waiting for nodes..."
sleep 60

# 3. Update Ansible inventory (output must be read from the terraform dir)
echo "[3/5] Updating inventory..."
terraform -chdir=terraform output -json node_ips | jq -r '.[]' > "inventory/hosts_${REGION}.txt"

# 4. Run security setup
echo "[4/5] Running security setup..."
ansible-playbook -i inventory/production.yml playbooks/security-setup.yml --limit "${REGION}_nodes"

# 5. Deploy services
echo "[5/5] Deploying services..."
ansible-playbook -i inventory/production.yml playbooks/services-deploy.yml --limit "${REGION}_nodes"

echo "✅ Deployment complete!"
```
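Shell expansion around the `--limit` argument is easy to get wrong: `"$REGION_nodes"` makes the shell look up a variable literally named `REGION_nodes`, which is unset, so the limit would silently be empty. The braces in `"${REGION}_nodes"` delimit the variable name:

```shell
REGION=eu
# Without braces the shell reads a variable named REGION_nodes (unset -> empty):
echo "without braces: '$REGION_nodes'"    # -> without braces: ''
# With braces the region is interpolated as intended:
echo "with braces: '${REGION}_nodes'"     # -> with braces: 'eu_nodes'
```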

---

## 🔒 Network Security

### Zero Trust Architecture

```
┌─────────────────────────────────────────────────────────┐
│                    ZERO TRUST LAYER                     │
├─────────────────────────────────────────────────────────┤
│  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐     │
│  │  mTLS   │  │  RBAC   │  │ Network │  │ Secrets │     │
│  │         │  │         │  │ Policy  │  │  Vault  │     │
│  └─────────┘  └─────────┘  └─────────┘  └─────────┘     │
├─────────────────────────────────────────────────────────┤
│                  SERVICE MESH (Istio)                   │
├─────────────────────────────────────────────────────────┤
│  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐     │
│  │ Node 1  │  │ Node 2  │  │ Node 3  │  │ Node N  │     │
│  └─────────┘  └─────────┘  └─────────┘  └─────────┘     │
└─────────────────────────────────────────────────────────┘
```

### Security Policies

```yaml
# k8s/network-policy.yml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress

---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-postgres
spec:
  podSelector:
    matchLabels:
      app: postgres
  ingress:
    - from:
        - podSelector:
            matchLabels:
              access: postgres
      ports:
        - protocol: TCP
          port: 5432
```
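With `default-deny-all` in place, a pod reaches PostgreSQL only if it carries the `access: postgres` label that the second policy selects on. A minimal, hypothetical client pod (the name and image are illustrative; only the label matters here):

```yaml
# Hypothetical client pod: the access: postgres label is what grants ingress
# to the database under the allow-postgres policy above.
apiVersion: v1
kind: Pod
metadata:
  name: db-client
  labels:
    access: postgres
spec:
  containers:
    - name: psql
      image: postgres:16
      command: ["sleep", "infinity"]
```

Note that `default-deny-all` also denies this client's egress, so a matching egress rule (or a broader egress policy for labeled clients) is still required for traffic to actually flow.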

### Vault Integration

```hcl
# vault/postgres-policy.hcl
path "database/creds/daarion-db" {
  capabilities = ["read"]
}

path "secret/data/postgres/*" {
  capabilities = ["read"]
}
```

```bash
# Fetch credentials
vault read database/creds/daarion-db
```
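The same dynamic credentials can be pulled from inside a play instead of shelling out to the `vault` CLI. This is a sketch only: it assumes the `community.hashi_vault` collection is installed and that a Vault token is available in the environment; the exact lookup term syntax should be checked against the collection's documentation.

```yaml
# Sketch: fetch short-lived PostgreSQL credentials in a play
# (assumes the community.hashi_vault collection and VAULT_TOKEN in the env).
- name: Fetch dynamic PostgreSQL credentials
  set_fact:
    db_creds: "{{ lookup('community.hashi_vault.hashi_vault',
                         'secret=database/creds/daarion-db url=http://node1:8200') }}"
  no_log: true
```

`no_log: true` keeps the leased username and password out of the play output.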

---

## 📊 Monitoring and Alerts

### Prometheus Federation

```yaml
# prometheus/federation.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'federate'
    scrape_interval: 30s
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="node"}'
        - '{job="docker"}'
        - '{job="postgres"}'
    static_configs:
      - targets:
          - 'node-eu-001:9090'
          - 'node-eu-002:9090'
          # ... all nodes
```

### Grafana Dashboard

```json
{
  "dashboard": {
    "title": "DAARION Network Overview",
    "panels": [
      {
        "title": "Total Nodes",
        "type": "stat",
        "targets": [
          { "expr": "count(up{job=\"node\"})" }
        ]
      },
      {
        "title": "Healthy Nodes",
        "type": "stat",
        "targets": [
          { "expr": "count(up{job=\"node\"} == 1)" }
        ]
      },
      {
        "title": "Security Alerts",
        "type": "stat",
        "targets": [
          { "expr": "sum(security_alerts_total)" }
        ]
      }
    ]
  }
}
```

### Alert Rules

```yaml
# prometheus/alerts.yml
groups:
  - name: network
    rules:
      - alert: NodeDown
        expr: up{job="node"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Node {{ $labels.instance }} is down"

      - alert: HighCPU
        # node_cpu_seconds_total is a counter, so comparing it to 20 directly
        # is meaningless; rate() over the idle mode gives the idle fraction.
        # Alert when average idle CPU drops below 20%.
        expr: avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100 < 20
        for: 10m
        labels:
          severity: warning

      - alert: SuspiciousProcess
        expr: security_suspicious_process > 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Suspicious process on {{ $labels.instance }}"

      - alert: PostgresDown
        expr: pg_up == 0
        for: 1m
        labels:
          severity: critical
```

---

## 📅 Roadmap

### Phase 1: Foundation (Weeks 1-2)
- [x] NODE1 rebuild + security
- [x] NODE3 setup + security
- [x] PostgreSQL on NODE1 and NODE3
- [ ] Ansible repository setup
- [ ] Terraform configs
- [ ] CI/CD pipeline

### Phase 2: Regional Controllers (Weeks 3-4)
- [ ] Deploy 3 region controllers
- [ ] Consul cluster setup
- [ ] Vault setup
- [ ] Prometheus federation

### Phase 3: First 50 Nodes (Weeks 5-8)
- [ ] EU region: 50 nodes
- [ ] Automated deployment testing
- [ ] Security audit
- [ ] Performance testing

### Phase 4: Scale to 150 (Weeks 9-12)
- [ ] US region: 50 nodes
- [ ] Asia region: 50 nodes
- [ ] Global monitoring
- [ ] Disaster recovery testing

### Phase 5: Production (Weeks 13+)
- [ ] Full production workloads
- [ ] 24/7 monitoring
- [ ] Automated incident response
- [ ] Continuous security audits

---

## 💰 Estimated Costs

| Resource | Per Node | 50 Nodes | 150 Nodes |
|----------|----------|----------|-----------|
| Hetzner CX31 | €10/mo | €500/mo | €1,500/mo |
| Storage (100GB) | €5/mo | €250/mo | €750/mo |
| Bandwidth | ~€5/mo | €250/mo | €750/mo |
| **Total** | **€20/mo** | **€1,000/mo** | **€3,000/mo** |

---

## 📚 Additional Resources

- [Ansible Best Practices](https://docs.ansible.com/ansible/latest/user_guide/playbooks_best_practices.html)
- [Terraform Hetzner Provider](https://registry.terraform.io/providers/hetznercloud/hcloud/latest/docs)
- [Kubernetes Network Policies](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
- [HashiCorp Vault](https://www.vaultproject.io/docs)
- [Prometheus Federation](https://prometheus.io/docs/prometheus/latest/federation/)

---

**Author:** Ivan Tytar & AI Assistant
**Last updated:** 2026-01-10

---

**infrastructure/ansible/.vault_pass.example** (new file, 1 line):

```
# Create .vault_pass file with your vault password
```

**infrastructure/ansible/ansible.cfg** (new file, 31 lines):

```ini
# DAARION Network - Ansible Configuration
[defaults]
inventory = inventory/production.yml
remote_user = root
host_key_checking = False
retry_files_enabled = False
gathering = smart
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts
fact_caching_timeout = 86400

# Parallelism
forks = 20

# Output
stdout_callback = yaml
callback_whitelist = profile_tasks

# Vault
vault_password_file = .vault_pass

[ssh_connection]
pipelining = True
control_path = /tmp/ansible-%%h-%%p-%%r
ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no

[privilege_escalation]
become = True
become_method = sudo
become_user = root
become_ask_pass = False
```

**infrastructure/ansible/inventory/group_vars/all.yml** (new file, 93 lines):

```yaml
# DAARION Network - Global Variables
# These variables apply to all hosts

# =============================================================================
# SECURITY
# =============================================================================
security_packages:
  - fail2ban
  - ufw
  - auditd
  - rkhunter
  - unattended-upgrades
  - ca-certificates

# Firewall - allowed ports (in addition to SSH)
firewall_allowed_tcp_ports:
  - 6443   # K3s API
  - 10250  # Kubelet
  - 8200   # Vault
  - 8500   # Consul HTTP
  - 8600   # Consul DNS
  - 9090   # Prometheus
  - 3000   # Grafana
  - 5432   # PostgreSQL

firewall_allowed_outgoing:
  - { port: 53, proto: udp }    # DNS
  - { port: 80, proto: tcp }    # HTTP
  - { port: 443, proto: tcp }   # HTTPS
  - { port: 123, proto: udp }   # NTP

# Blocked networks (internal/private)
firewall_blocked_networks:
  - 10.0.0.0/8
  - 172.16.0.0/12

# =============================================================================
# DOCKER
# =============================================================================
docker_users:
  - "{{ ansible_user }}"

docker_daemon_options:
  storage-driver: "overlay2"
  log-driver: "json-file"
  log-opts:
    max-size: "100m"
    max-file: "3"

# =============================================================================
# K3S / KUBERNETES
# =============================================================================
k3s_version: "v1.29.0+k3s1"
k3s_disable:
  - traefik
  - servicelb

# =============================================================================
# VAULT
# =============================================================================
vault_version: "1.15.4"
vault_addr: "http://node1:8200"
vault_data_dir: "/opt/vault/data"

# =============================================================================
# CONSUL
# =============================================================================
consul_version: "1.17.1"
consul_data_dir: "/opt/consul/data"
consul_enable_connect: true

# =============================================================================
# OBSERVABILITY
# =============================================================================
prometheus_retention: "30d"
prometheus_storage_size: "50Gi"
loki_retention: "168h"   # 7 days
tempo_retention: "168h"  # 7 days

# =============================================================================
# POSTGRESQL
# =============================================================================
postgres_image: "postgres@sha256:23e88eb049fd5d54894d70100df61d38a49ed97909263f79d4ff4c30a5d5fca2"
postgres_user: "daarion"
postgres_db: "daarion_main"

# =============================================================================
# PATHS
# =============================================================================
scripts_dir: "/opt/scripts"
config_dir: "/opt/config"
logs_dir: "/var/log/daarion"
backup_dir: "/opt/backups"
```

**infrastructure/ansible/inventory/production.yml** (new file, 65 lines):

```yaml
# DAARION Network - Production Inventory
# Version: 1.0.0
# Updated: 2026-01-10

all:
  vars:
    ansible_python_interpreter: /usr/bin/python3
    timezone: "UTC"

    # K3s configuration
    k3s_version: "v1.29.0+k3s1"
    k3s_token: "{{ vault_k3s_token }}"

    # Network
    daarion_network_cidr: "10.42.0.0/16"
    daarion_service_cidr: "10.43.0.0/16"

  children:
    # Master nodes - control plane
    masters:
      hosts:
        node1:
          ansible_host: 144.76.224.179
          ansible_user: root
          ansible_ssh_pass: "{{ vault_node1_password }}"
          node_role: master
          datacenter: hetzner-de
          location: "Nuremberg, Germany"

    # Worker nodes - compute
    workers:
      hosts:
        node3:
          ansible_host: 80.77.35.151
          ansible_port: 33147
          ansible_user: zevs
          ansible_become: yes
          ansible_become_pass: "{{ vault_node3_password }}"
          node_role: worker
          datacenter: remote-dc
          location: "Remote Datacenter"
          gpu: true
          gpu_type: "rtx3090"
          gpu_memory: "24GB"

    # GPU nodes (subset of workers)
    gpu_nodes:
      hosts:
        node3:

    # Database nodes
    database_nodes:
      hosts:
        node1:
        node3:

    # Local development
    local_dev:
      hosts:
        node2:
          ansible_host: localhost
          ansible_connection: local
          node_role: development
          datacenter: local
          location: "MacBook Pro M4"
```

**infrastructure/ansible/kubeconfig/.gitignore** (new file, 2 lines):

```
*.yaml
!.gitkeep
```

**infrastructure/ansible/kubeconfig/.gitkeep** (new file, empty)

**infrastructure/ansible/playbooks/bootstrap.yml** (new file, 143 lines):

```yaml
# DAARION Network - Bootstrap Playbook
# Initial setup for all nodes: packages, SSH, hostname, etc.
---
- name: Bootstrap all nodes
  hosts: all
  become: yes

  vars:
    common_packages:
      - curl
      - wget
      - git
      - htop
      - vim
      - jq
      - unzip
      - ca-certificates
      - gnupg
      - lsb-release
      - net-tools
      - dnsutils
      - bc

  tasks:
    # =========================================================================
    # BASIC SETUP
    # =========================================================================
    - name: Set timezone
      timezone:
        name: "{{ timezone }}"

    - name: Set hostname
      hostname:
        name: "{{ inventory_hostname }}"

    - name: Update /etc/hosts with all nodes
      lineinfile:
        path: /etc/hosts
        line: "{{ hostvars[item].ansible_host }} {{ item }}"
        state: present
      loop: "{{ groups['all'] }}"
      when:
        - hostvars[item].ansible_host is defined
        - hostvars[item].ansible_host != 'localhost'

    # =========================================================================
    # PACKAGES
    # =========================================================================
    - name: Update apt cache
      apt:
        update_cache: yes
        cache_valid_time: 3600
      when: ansible_os_family == "Debian"

    - name: Upgrade all packages
      apt:
        upgrade: safe
      when: ansible_os_family == "Debian"

    - name: Install common packages
      apt:
        name: "{{ common_packages }}"
        state: present
      when: ansible_os_family == "Debian"

    # =========================================================================
    # USERS & SSH
    # =========================================================================
    - name: Create admin group
      group:
        name: daarion-admin
        state: present

    - name: Create directories
      file:
        path: "{{ item }}"
        state: directory
        mode: '0755'
      loop:
        - "{{ scripts_dir }}"
        - "{{ config_dir }}"
        - "{{ logs_dir }}"
        - "{{ backup_dir }}"

    # =========================================================================
    # SSH HARDENING
    # =========================================================================
    - name: Disable root password login via SSH (workers only)
      lineinfile:
        path: /etc/ssh/sshd_config
        regexp: '^#?PermitRootLogin'
        line: 'PermitRootLogin prohibit-password'
      notify: restart sshd
      when: "'workers' in group_names"

    - name: Set SSH MaxAuthTries
      lineinfile:
        path: /etc/ssh/sshd_config
        regexp: '^#?MaxAuthTries'
        line: 'MaxAuthTries 3'
      notify: restart sshd

    - name: Set SSH ClientAliveInterval
      lineinfile:
        path: /etc/ssh/sshd_config
        regexp: '^#?ClientAliveInterval'
        line: 'ClientAliveInterval 300'
      notify: restart sshd

    # =========================================================================
    # KERNEL PARAMETERS
    # =========================================================================
    - name: Set kernel parameters for containers
      sysctl:
        name: "{{ item.name }}"
        value: "{{ item.value }}"
        state: present
        reload: yes
      loop:
        - { name: 'net.ipv4.ip_forward', value: '1' }
        - { name: 'net.bridge.bridge-nf-call-iptables', value: '1' }
        - { name: 'net.bridge.bridge-nf-call-ip6tables', value: '1' }
        - { name: 'fs.inotify.max_user_watches', value: '524288' }
        - { name: 'fs.inotify.max_user_instances', value: '512' }
      ignore_errors: yes  # Some params may not exist on all systems

    # =========================================================================
    # VERIFICATION
    # =========================================================================
    - name: Verify setup
      debug:
        msg: |
          Node: {{ inventory_hostname }}
          Host: {{ ansible_host }}
          Datacenter: {{ datacenter | default('unknown') }}
          Role: {{ node_role | default('unknown') }}
          GPU: {{ gpu | default(false) }}

  handlers:
    - name: restart sshd
      service:
        name: sshd
        state: restarted
```

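The `Update /etc/hosts` task renders one `<ip> <hostname>` line per inventory host that has a non-localhost `ansible_host`. The same rendering in plain shell, using the two addresses from `inventory/production.yml`:

```shell
# Render the lines the lineinfile loop adds to /etc/hosts
# (addresses taken from inventory/production.yml; node2 is skipped
# because its ansible_host is localhost).
while read -r host ip; do
  printf '%s %s\n' "$ip" "$host"
done <<'EOF'
node1 144.76.224.179
node3 80.77.35.151
EOF
```

This prints `144.76.224.179 node1` and `80.77.35.151 node3`, giving every node a stable name for the others without relying on external DNS.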
288
infrastructure/ansible/playbooks/hardening.yml
Normal file
288
infrastructure/ansible/playbooks/hardening.yml
Normal file
@@ -0,0 +1,288 @@
|
||||
# DAARION Network - Security Hardening Playbook
|
||||
# Comprehensive security setup for all nodes
|
||||
---
|
||||
- name: Security Hardening
|
||||
hosts: all
|
||||
become: yes
|
||||
|
||||
vars:
|
||||
allowed_ssh_port: "{{ ansible_port | default(22) }}"
|
||||
|
||||
tasks:
|
||||
# =========================================================================
|
||||
# SECURITY PACKAGES
|
||||
# =========================================================================
|
||||
- name: Install security packages
|
||||
apt:
|
||||
name: "{{ security_packages }}"
|
||||
state: present
|
||||
when: ansible_os_family == "Debian"
|
||||
|
||||
- name: Install Trivy (vulnerability scanner)
|
||||
shell: |
|
||||
curl -sfL https://raw.githubusercontent.com/aquasecurity/trivy/main/contrib/install.sh | sh -s -- -b /usr/local/bin
|
||||
args:
|
||||
creates: /usr/local/bin/trivy
|
||||
|
||||
# =========================================================================
|
||||
# UFW FIREWALL
|
||||
# =========================================================================
|
||||
- name: UFW - Reset to defaults
|
||||
ufw:
|
||||
state: reset
|
||||
|
||||
- name: UFW - Default deny incoming
|
||||
ufw:
|
||||
direction: incoming
|
||||
policy: deny
|
||||
|
||||
- name: UFW - Default deny outgoing
|
||||
ufw:
|
||||
direction: outgoing
|
||||
policy: deny
|
||||
|
||||
- name: UFW - Allow SSH
|
||||
ufw:
|
||||
rule: allow
|
||||
port: "{{ allowed_ssh_port }}"
|
||||
proto: tcp
|
||||
|
||||
- name: UFW - Allow necessary TCP ports
|
||||
ufw:
|
||||
rule: allow
|
||||
port: "{{ item }}"
|
||||
proto: tcp
|
||||
loop: "{{ firewall_allowed_tcp_ports }}"
|
||||
when: firewall_allowed_tcp_ports is defined
|
||||
|
||||
- name: UFW - Allow necessary outgoing
|
||||
ufw:
|
||||
rule: allow
|
||||
direction: out
|
||||
port: "{{ item.port }}"
|
||||
proto: "{{ item.proto }}"
|
||||
loop: "{{ firewall_allowed_outgoing }}"
|
||||
|
||||
- name: UFW - Block internal networks
|
||||
ufw:
|
||||
rule: deny
|
||||
direction: out
|
||||
to_ip: "{{ item }}"
|
||||
loop: "{{ firewall_blocked_networks }}"
|
||||
when: firewall_blocked_networks is defined
|
||||
|
||||
- name: UFW - Enable
|
||||
ufw:
|
||||
state: enabled
|
||||
|
||||
# =========================================================================
|
||||
# FAIL2BAN
|
||||
# =========================================================================
|
||||
- name: Configure fail2ban
|
||||
copy:
|
||||
dest: /etc/fail2ban/jail.local
|
||||
content: |
|
||||
[DEFAULT]
|
||||
bantime = 3600
|
||||
findtime = 600
|
||||
maxretry = 3
|
||||
|
||||
[sshd]
|
||||
enabled = true
|
||||
port = {{ allowed_ssh_port }}
|
||||
filter = sshd
|
||||
logpath = /var/log/auth.log
|
||||
maxretry = 3
|
||||
bantime = 86400
|
||||
notify: restart fail2ban
|
||||
|
||||
- name: Enable fail2ban
|
||||
service:
|
||||
name: fail2ban
|
||||
enabled: yes
|
||||
state: started
|
||||
|
||||
# =========================================================================
|
||||
# AUDITD
|
||||
# =========================================================================
|
||||
- name: Configure auditd rules
|
||||
copy:
|
||||
dest: /etc/audit/rules.d/daarion.rules
|
||||
content: |
|
||||
# Monitor file changes in critical directories
|
||||
-w /etc/passwd -p wa -k passwd_changes
|
||||
-w /etc/shadow -p wa -k shadow_changes
|
||||
-w /etc/ssh/sshd_config -p wa -k sshd_config
|
||||
|
||||
# Monitor Docker
|
||||
-w /var/lib/docker -p wa -k docker
|
||||
-w /etc/docker -p wa -k docker_config
|
||||
|
||||
# Monitor cron
|
||||
-w /etc/crontab -p wa -k cron
|
||||
-w /etc/cron.d -p wa -k cron
|
||||
|
||||
# Monitor tmp (malware indicator)
|
||||
-w /tmp -p x -k tmp_exec
|
||||
-w /var/tmp -p x -k var_tmp_exec
|
||||
notify: restart auditd
|
||||
|
||||
- name: Enable auditd
|
||||
service:
|
||||
name: auditd
|
||||
enabled: yes
|
||||
state: started
|
||||
|
||||
# =========================================================================
|
||||
# KERNEL HARDENING
|
||||
# =========================================================================
|
||||
- name: Kernel security parameters
|
||||
sysctl:
|
||||
name: "{{ item.name }}"
|
||||
value: "{{ item.value }}"
|
||||
state: present
|
||||
reload: yes
|
||||
loop:
|
||||
- { name: 'net.ipv4.conf.all.accept_redirects', value: '0' }
|
||||
- { name: 'net.ipv4.conf.default.accept_redirects', value: '0' }
|
||||
- { name: 'net.ipv4.conf.all.send_redirects', value: '0' }
|
||||
- { name: 'net.ipv4.conf.default.send_redirects', value: '0' }
|
||||
- { name: 'net.ipv4.tcp_syncookies', value: '1' }
|
||||
- { name: 'net.ipv4.icmp_echo_ignore_broadcasts', value: '1' }
|
||||
- { name: 'kernel.randomize_va_space', value: '2' }
|
||||
- { name: 'kernel.kptr_restrict', value: '2' }
|
||||
- { name: 'kernel.dmesg_restrict', value: '1' }
|
||||
|
||||
# =========================================================================
|
||||
# SECURITY CHECK SCRIPT
|
||||
# =========================================================================
|
||||
- name: Deploy security check script
|
||||
copy:
|
||||
dest: "{{ scripts_dir }}/security-check.sh"
|
||||
mode: '0755'
|
||||
content: |
|
||||
#!/bin/bash
|
||||
# DAARION Security Check Script
|
||||
# Runs hourly via cron
|
||||
|
||||
LOG="{{ logs_dir }}/security-$(date +%Y%m%d).log"
|
||||
ALERT_FILE="/tmp/security_alert"
|
||||
|
||||
log() {
|
||||
echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" >> "$LOG"
|
||||
}
|
||||
|
||||
log "=== Security Check Started ==="
|
||||
|
||||
# Check for suspicious processes
|
||||
SUSPICIOUS=$(ps aux | grep -E "(xmrig|kdevtmp|kinsing|perfctl|httpd.*tmp|mysql.*tmp)" | grep -v grep)
|
||||
if [ -n "$SUSPICIOUS" ]; then
|
||||
log "CRITICAL: Suspicious process detected!"
|
||||
log "$SUSPICIOUS"
|
||||
pkill -9 -f "xmrig|kdevtmp|kinsing|perfctl"
|
||||
touch "$ALERT_FILE"
|
||||
fi
|
||||
|
||||
# Check for executables in /tmp
|
||||
TMP_EXEC=$(find /tmp /var/tmp /dev/shm -type f -executable 2>/dev/null)
|
||||
if [ -n "$TMP_EXEC" ]; then
|
||||
log "WARNING: Executable files in tmp directories!"
|
||||
log "$TMP_EXEC"
|
||||
rm -f $TMP_EXEC 2>/dev/null
|
||||
fi
|
||||
|
||||
# Check CPU usage (potential mining)
|
||||
LOAD=$(cat /proc/loadavg | cut -d' ' -f1)
|
||||
CPU_COUNT=$(nproc)
|
||||
THRESHOLD=$(echo "$CPU_COUNT * 2" | bc)
|
||||
if (( $(echo "$LOAD > $THRESHOLD" | bc -l) )); then
|
||||
log "WARNING: High CPU load: $LOAD (threshold: $THRESHOLD)"
|
||||
fi
|
||||
|
||||
# Check for unauthorized SSH keys
|
||||
for user_home in /root /home/*; do
|
||||
if [ -f "$user_home/.ssh/authorized_keys" ]; then
|
||||
KEY_COUNT=$(wc -l < "$user_home/.ssh/authorized_keys")
|
||||
              log "INFO: $user_home has $KEY_COUNT SSH keys"
            fi
          done

          # Check failed SSH attempts (filter by today's syslog date stamp)
          FAILED_SSH=$(grep "Failed password" /var/log/auth.log 2>/dev/null | grep -c "$(date '+%b %e')")
          log "INFO: Failed SSH attempts today: $FAILED_SSH"

          # Check Docker containers
          if command -v docker &> /dev/null; then
            CONTAINER_COUNT=$(docker ps -q | wc -l)
            log "INFO: Running Docker containers: $CONTAINER_COUNT"

            # Check for containers running as root
            # (raw blocks stop Ansible's Jinja2 from eating docker's format string;
            #  CUSER avoids clobbering the shell's $USER)
            docker ps -q | while read -r cid; do
              CUSER=$(docker inspect --format '{% raw %}{{.Config.User}}{% endraw %}' "$cid")
              NAME=$(docker inspect --format '{% raw %}{{.Name}}{% endraw %}' "$cid")
              if [ -z "$CUSER" ] || [ "$CUSER" = "root" ] || [ "$CUSER" = "0" ]; then
                log "WARNING: Container $NAME running as root"
              fi
            done
          fi

          log "=== Security Check Completed ==="

    - name: Setup security cron
      cron:
        name: "Hourly security check"
        minute: "0"
        job: "{{ scripts_dir }}/security-check.sh"

    - name: Setup daily rkhunter scan
      cron:
        name: "Daily rkhunter scan"
        hour: "3"
        minute: "0"
        job: "rkhunter --update && rkhunter --check --skip-keypress > {{ logs_dir }}/rkhunter.log 2>&1"

    # =========================================================================
    # AUTO UPDATES
    # =========================================================================
    - name: Configure unattended-upgrades
      copy:
        dest: /etc/apt/apt.conf.d/50unattended-upgrades
        content: |
          Unattended-Upgrade::Allowed-Origins {
              "${distro_id}:${distro_codename}";
              "${distro_id}:${distro_codename}-security";
              "${distro_id}ESMApps:${distro_codename}-apps-security";
              "${distro_id}ESM:${distro_codename}-infra-security";
          };
          Unattended-Upgrade::AutoFixInterruptedDpkg "true";
          Unattended-Upgrade::Remove-Unused-Dependencies "true";
          Unattended-Upgrade::Automatic-Reboot "false";
      when: ansible_os_family == "Debian"

    # =========================================================================
    # VERIFICATION
    # =========================================================================
    - name: Verify security setup
      shell: |
        echo "=== Security Status ==="
        echo "UFW: $(ufw status | head -1)"
        echo "Fail2ban: $(systemctl is-active fail2ban)"
        echo "Auditd: $(systemctl is-active auditd)"
        echo "Trivy: $(trivy --version 2>/dev/null | head -1 || echo 'not installed')"
      register: security_status
      changed_when: false

    - name: Show security status
      debug:
        var: security_status.stdout_lines

  handlers:
    - name: restart fail2ban
      service:
        name: fail2ban
        state: restarted

    - name: restart auditd
      service:
        name: auditd
        state: restarted
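The hourly security check's failed-login count is just a grep pipeline over `auth.log`. A minimal standalone sketch of that step, using made-up sample log lines (the path `/tmp/auth.sample` and the entries are illustrative, not from a real host):

```shell
# Sample auth.log entries (made up for illustration)
cat > /tmp/auth.sample <<'EOF'
Jan 10 03:12:01 node1 sshd[912]: Failed password for root from 203.0.113.7 port 52114 ssh2
Jan 10 03:12:05 node1 sshd[915]: Failed password for invalid user admin from 203.0.113.7 port 52116 ssh2
Jan 10 03:13:44 node1 sshd[944]: Accepted publickey for deploy from 198.51.100.4 port 40022 ssh2
EOF

# Same idea as the playbook's check: count failed password attempts
FAILED_SSH=$(grep -c "Failed password" /tmp/auth.sample)
echo "Failed SSH attempts: $FAILED_SSH"   # → Failed SSH attempts: 2
```

On a real node the script reads `/var/log/auth.log` instead; `grep -c` collapses the `grep | wc -l` pair into one process.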

---

### 📄 `infrastructure/ansible/playbooks/k3s-install.yml` (new file, 183 lines)

```yaml
# DAARION Network - K3s Installation Playbook
# Lightweight Kubernetes cluster setup
---
# =============================================================================
# INSTALL K3S SERVER (MASTERS)
# =============================================================================
- name: Install K3s Server on Masters
  hosts: masters
  become: yes

  tasks:
    - name: Check if K3s is already installed
      stat:
        path: /etc/rancher/k3s/k3s.yaml
      register: k3s_installed

    - name: Download K3s installer
      get_url:
        url: https://get.k3s.io
        dest: /tmp/k3s-install.sh
        mode: '0755'
      when: not k3s_installed.stat.exists

    - name: Install K3s server
      shell: |
        INSTALL_K3S_VERSION={{ k3s_version }} \
        sh /tmp/k3s-install.sh server \
          --disable traefik \
          --disable servicelb \
          --write-kubeconfig-mode 644 \
          --tls-san {{ ansible_host }} \
          --tls-san {{ inventory_hostname }} \
          --node-label "datacenter={{ datacenter }}" \
          --node-label "node-role={{ node_role }}" \
          --cluster-cidr {{ daarion_network_cidr | default('10.42.0.0/16') }} \
          --service-cidr {{ daarion_service_cidr | default('10.43.0.0/16') }}
      args:
        creates: /etc/rancher/k3s/k3s.yaml
      register: k3s_install

    - name: Wait for K3s to be ready
      wait_for:
        port: 6443
        delay: 10
        timeout: 300

    - name: Wait for node to be ready
      shell: |
        export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
        kubectl wait --for=condition=Ready node/{{ inventory_hostname }} --timeout=300s
      register: node_ready
      retries: 10
      delay: 10
      until: node_ready.rc == 0

    - name: Get K3s token
      slurp:
        src: /var/lib/rancher/k3s/server/node-token
      register: k3s_token_file

    - name: Save K3s token as fact
      set_fact:
        k3s_join_token: "{{ k3s_token_file.content | b64decode | trim }}"

    - name: Fetch kubeconfig
      fetch:
        src: /etc/rancher/k3s/k3s.yaml
        dest: "{{ playbook_dir }}/../kubeconfig/{{ inventory_hostname }}.yaml"
        flat: yes

    - name: Update kubeconfig with external IP
      delegate_to: localhost
      become: no
      replace:
        path: "{{ playbook_dir }}/../kubeconfig/{{ inventory_hostname }}.yaml"
        regexp: '127\.0\.0\.1'
        replace: "{{ ansible_host }}"

    - name: Show K3s status
      shell: |
        export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
        kubectl get nodes -o wide
      register: k3s_status
      changed_when: false

    - name: Display K3s status
      debug:
        var: k3s_status.stdout_lines

# =============================================================================
# INSTALL K3S AGENT (WORKERS)
# =============================================================================
- name: Install K3s Agent on Workers
  hosts: workers
  become: yes

  vars:
    k3s_master_host: "{{ hostvars[groups['masters'][0]].ansible_host }}"
    k3s_master_token: "{{ hostvars[groups['masters'][0]].k3s_join_token }}"

  tasks:
    - name: Check if K3s agent is already installed
      stat:
        path: /var/lib/rancher/k3s/agent
      register: k3s_agent_installed

    - name: Download K3s installer
      get_url:
        url: https://get.k3s.io
        dest: /tmp/k3s-install.sh
        mode: '0755'
      when: not k3s_agent_installed.stat.exists

    - name: Build node labels
      set_fact:
        node_labels: >-
          --node-label datacenter={{ datacenter }}
          --node-label node-role={{ node_role }}
          {% if gpu is defined and gpu %}
          --node-label gpu=true
          --node-label gpu-type={{ gpu_type | default('unknown') }}
          --node-label gpu-memory={{ gpu_memory | default('unknown') }}
          {% endif %}

    - name: Install K3s agent
      shell: |
        INSTALL_K3S_VERSION={{ k3s_version }} \
        K3S_URL=https://{{ k3s_master_host }}:6443 \
        K3S_TOKEN={{ k3s_master_token }} \
        sh /tmp/k3s-install.sh agent \
          {{ node_labels }}
      args:
        creates: /var/lib/rancher/k3s/agent
      register: k3s_agent_install

    - name: Wait for agent to connect
      pause:
        seconds: 30
      when: k3s_agent_install.changed

# =============================================================================
# VERIFY CLUSTER
# =============================================================================
- name: Verify K3s Cluster
  hosts: masters
  become: yes

  tasks:
    - name: Get cluster nodes
      shell: |
        export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
        kubectl get nodes -o wide
      register: cluster_nodes
      changed_when: false

    - name: Display cluster nodes
      debug:
        var: cluster_nodes.stdout_lines

    - name: Get cluster info
      shell: |
        export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
        kubectl cluster-info
      register: cluster_info
      changed_when: false

    - name: Display cluster info
      debug:
        var: cluster_info.stdout_lines

    - name: Create daarion namespace
      shell: |
        export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
        kubectl create namespace daarion --dry-run=client -o yaml | kubectl apply -f -
      changed_when: false

    - name: Label GPU nodes
      shell: |
        export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
        kubectl label nodes {{ item }} nvidia.com/gpu=true --overwrite
      loop: "{{ groups['gpu_nodes'] | default([]) }}"
      when: groups['gpu_nodes'] is defined
      ignore_errors: yes
```
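On the master, the join token is read with `slurp`, which returns base64-encoded content, and cleaned up with `b64decode | trim` before workers consume it as `K3S_TOKEN`. The equivalent shell round-trip, with a made-up token value and `/tmp/node-token` standing in for `/var/lib/rancher/k3s/server/node-token`:

```shell
# Simulate the node-token file (the value here is made up)
printf 'K107c2f::server:3f9a\n' > /tmp/node-token

# slurp returns base64; "b64decode | trim" is equivalent to:
TOKEN_B64=$(base64 < /tmp/node-token)
K3S_TOKEN=$(printf '%s' "$TOKEN_B64" | base64 -d | tr -d '[:space:]')
echo "$K3S_TOKEN"   # → K107c2f::server:3f9a
```

The trim matters: the token file ends with a newline, and a stray newline inside `K3S_TOKEN` would break the agent's join request.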