🌐 Add 150-node network deployment plan

- Created NETWORK-150-NODES-PLAN.md with complete architecture
- Ansible playbooks for automated security and deployment
- Terraform configs for Hetzner infrastructure
- Zero Trust security architecture
- Prometheus federation for monitoring
- Estimated costs and roadmap
- PostgreSQL deployed on NODE1 and NODE3

Commit 02cfd90b6f (parent 1231647f94) by Apple, 2026-01-10 05:11:45 -08:00

New file: `NETWORK-150-NODES-PLAN.md` (633 lines)

# 🌐 150-Node Network Deployment Plan: DAARION Network
**Version:** 1.0.0

**Date:** 2026-01-10

**Status:** Planning

---
## 📋 Contents
1. [Network Architecture](#network-architecture)
2. [Centralized Management](#centralized-management)
3. [Deployment Automation](#deployment-automation)
4. [Network Security](#network-security)
5. [Monitoring and Alerts](#monitoring-and-alerts)
6. [Roadmap](#roadmap)
7. [Estimated Costs](#estimated-costs)
8. [Additional Resources](#additional-resources)
---
## 🏗️ Network Architecture
### Node Hierarchy
```
                         ┌─────────────────┐
                         │   MASTER NODE   │
                         │     (NODE1)     │
                         │     Hetzner     │
                         └────────┬────────┘
        ┌─────────────────────────┼─────────────────────────┐
        │                         │                         │
 ┌──────▼──────┐           ┌──────▼──────┐           ┌──────▼──────┐
 │  REGION EU  │           │  REGION US  │           │ REGION ASIA │
 │ Controller  │           │ Controller  │           │ Controller  │
 └──────┬──────┘           └──────┬──────┘           └──────┬──────┘
        │                         │                         │
    ~50 nodes                 ~50 nodes                 ~50 nodes
```
### Node Types
| Type | Count | Role | Resources |
|------|-------|------|-----------|
| **Master** | 1 | Central management, GitOps | 8 CPU, 32GB RAM |
| **Region Controller** | 3-5 | Regional management | 4 CPU, 16GB RAM |
| **Compute Node** | ~140 | Compute, AI workloads | 2-8 CPU, 8-64GB RAM |
| **GPU Node** | ~5 | AI/ML inference | GPU + 32GB+ RAM |
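
This is a quick arithmetic check on the table. The controller count is a range, and the compute and GPU counts are approximate, so the node types sum to 149-151 machines, which brackets the 150-node target. The sketch below performs that sum. The dictionary keys are illustrative names, not identifiers used elsewhere in this plan:

```python
# Approximate (min, max) node counts from the "Node Types" table.
NODE_TYPES = {
    "master": (1, 1),
    "region_controller": (3, 5),
    "compute": (140, 140),  # "~140" in the table
    "gpu": (5, 5),          # "~5" in the table
}


def total_range(node_types: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Return the (minimum, maximum) total node count across all node types."""
    low = sum(lo for lo, _ in node_types.values())
    high = sum(hi for _, hi in node_types.values())
    return low, high


if __name__ == "__main__":
    low, high = total_range(NODE_TYPES)
    print(f"Total nodes: {low}-{high}")
```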
---
## 🎛️ Centralized Management
### Tools
| Tool | Purpose | Alternative |
|------------|-------------|--------------|
| **Ansible** | Configuration Management | Salt, Puppet |
| **Terraform** | Infrastructure as Code | Pulumi |
| **Kubernetes** | Container Orchestration | Docker Swarm |
| **Consul** | Service Discovery | etcd |
| **Vault** | Secrets Management | AWS Secrets Manager |
| **Prometheus** | Metrics | InfluxDB |
| **Grafana** | Dashboards | - |
| **Loki** | Logs | ELK Stack |
### Ansible Inventory Structure
```yaml
# inventory/production.yml
all:
  children:
    masters:
      hosts:
        node1-master:
          ansible_host: 144.76.224.179
          ansible_user: root
    region_controllers:
      hosts:
        node3-eu:
          ansible_host: 80.77.35.151
          ansible_port: 33147
          ansible_user: zevs
          ansible_become_pass: "{{ vault_node3_password }}"
    # Hosts running PostgreSQL (NODE1 and NODE3); targeted by playbooks/postgresql-deploy.yml
    database_nodes:
      hosts:
        node1-master:
        node3-eu:
    compute_nodes:
      children:
        eu_nodes:
          hosts:
            node-eu-[001:050]:
              ansible_host: "{{ inventory_hostname }}.daarion.network"
        us_nodes:
          hosts:
            node-us-[001:050]:
              ansible_host: "{{ inventory_hostname }}.daarion.network"
        asia_nodes:
          hosts:
            node-asia-[001:050]:
              ansible_host: "{{ inventory_hostname }}.daarion.network"
    gpu_nodes:
      hosts:
        gpu-[01:05]:
          ansible_host: "{{ inventory_hostname }}.daarion.network"
```
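
Ansible expands host patterns such as `node-eu-[001:050]` into individual hosts. When the start value has a leading zero, the zero padding is preserved. The sketch below approximates that expansion for the numeric ranges used in this inventory, then counts hosts per group. It is a simplified stand-in, not Ansible's own code: only one numeric range per pattern is handled, with no strides or alphabetic ranges.

```python
import re

# Matches one numeric range such as "[001:050]" inside a host pattern.
RANGE_RE = re.compile(r"\[(\d+):(\d+)\]")


def expand_host_pattern(pattern: str) -> list[str]:
    """Expand the first numeric [start:end] range (inclusive) in a host pattern."""
    match = RANGE_RE.search(pattern)
    if match is None:
        return [pattern]
    start, end = match.group(1), match.group(2)
    # A zero-padded start ("001") keeps its width; otherwise no padding.
    width = len(start) if start.startswith("0") and len(start) > 1 else 0
    prefix, suffix = pattern[: match.start()], pattern[match.end():]
    return [
        f"{prefix}{str(n).zfill(width)}{suffix}"
        for n in range(int(start), int(end) + 1)
    ]


# Host patterns per group, copied from inventory/production.yml.
GROUPS = {
    "masters": ["node1-master"],
    "region_controllers": ["node3-eu"],
    "eu_nodes": ["node-eu-[001:050]"],
    "us_nodes": ["node-us-[001:050]"],
    "asia_nodes": ["node-asia-[001:050]"],
    "gpu_nodes": ["gpu-[01:05]"],
}

if __name__ == "__main__":
    counts = {
        group: sum(len(expand_host_pattern(p)) for p in patterns)
        for group, patterns in GROUPS.items()
    }
    print(counts)
    print("total hosts:", sum(counts.values()))
```

With these patterns, each regional compute group contains 50 hosts and the GPU group contains 5. Including the master and the controller, the inventory holds 157 hosts in total.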
### Ansible Playbook: Security Setup
```yaml
# playbooks/security-setup.yml
---
- name: Security Setup for All Nodes
  hosts: all
  become: yes
  vars:
    security_packages:
      - fail2ban
      - auditd
      - rkhunter
      - chkrootkit
      - ufw
  tasks:
    - name: Update apt cache
      apt:
        update_cache: yes
        cache_valid_time: 3600

    - name: Install security packages
      apt:
        name: "{{ security_packages }}"
        state: present

    - name: Install Trivy
      shell: |
        curl -sfL https://raw.githubusercontent.com/aquasecurity/trivy/main/contrib/install.sh | sh -s -- -b /usr/local/bin
      args:
        creates: /usr/local/bin/trivy

    - name: Configure fail2ban
      template:
        src: templates/jail.local.j2
        dest: /etc/fail2ban/jail.local
      notify: restart fail2ban

    - name: Configure UFW defaults
      ufw:
        direction: "{{ item.direction }}"
        policy: "{{ item.policy }}"
      loop:
        - { direction: incoming, policy: deny }
        - { direction: outgoing, policy: deny }

    - name: Allow SSH
      ufw:
        rule: allow
        port: "{{ ansible_port | default(22) }}"
        proto: tcp

    - name: Allow necessary outgoing
      ufw:
        rule: allow
        direction: out
        port: "{{ item.port }}"  # item is a dict, so use its port key
        proto: "{{ item.proto | default('tcp') }}"
      loop:
        - { port: 53, proto: udp }
        - { port: 80 }
        - { port: 443 }
        - { port: 123, proto: udp }

    # UFW is first-match: insert the denies at position 1 so they are evaluated
    # before the allow-out rules above (and any added earlier by cloud-init).
    - name: Block internal networks
      ufw:
        rule: deny
        direction: out
        to_ip: "{{ item }}"
        insert: 1
      loop:
        - 10.0.0.0/8
        - 172.16.0.0/12

    - name: Enable UFW
      ufw:
        state: enabled

    - name: Ensure scripts directory exists
      file:
        path: /opt/scripts
        state: directory
        mode: '0755'

    - name: Copy security check script
      copy:
        src: files/security-check.sh
        dest: /opt/scripts/security-check.sh
        mode: '0755'

    - name: Setup security cron (hourly)
      cron:
        name: "Security check"
        minute: "0"
        job: "/opt/scripts/security-check.sh"

  handlers:
    - name: restart fail2ban
      service:
        name: fail2ban
        state: restarted
```
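
UFW checks rules in order, and the first matching rule wins. The position of a `deny` rule relative to broader `allow` rules therefore matters. The sketch below is a simplified model of first-match evaluation for outgoing rules; it ignores protocols, IPv6 and UFW's built-in chains. It shows that an `allow out 443` placed ahead of `deny out to 10.0.0.0/8` still lets HTTPS traffic reach a private address:

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class OutRule:
    """Simplified outgoing UFW rule: an action plus optional destination network and port."""
    action: str                   # "allow" or "deny"
    to_net: Optional[str] = None  # None means any destination
    port: Optional[int] = None    # None means any port

    def matches(self, dst_ip: str, dst_port: int) -> bool:
        if self.to_net is not None and ip_address(dst_ip) not in ip_network(self.to_net):
            return False
        return self.port is None or self.port == dst_port


def evaluate(rules: list[OutRule], dst_ip: str, dst_port: int, default: str = "deny") -> str:
    """Return the action of the first matching rule, or the default policy."""
    for rule in rules:
        if rule.matches(dst_ip, dst_port):
            return rule.action
    return default


allow_first = [OutRule("allow", port=443), OutRule("deny", to_net="10.0.0.0/8")]
deny_first = [OutRule("deny", to_net="10.0.0.0/8"), OutRule("allow", port=443)]

if __name__ == "__main__":
    print(evaluate(allow_first, "10.0.0.5", 443))  # the broad allow matches first
    print(evaluate(deny_first, "10.0.0.5", 443))
    print(evaluate(deny_first, "1.1.1.1", 443))
```

When the rules are managed with Ansible, the `insert` option of the `ufw` module controls where a rule lands in that ordered list.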
### Ansible Playbook: PostgreSQL Deployment
```yaml
# playbooks/postgresql-deploy.yml
---
- name: Deploy PostgreSQL to Nodes
  hosts: database_nodes
  become: yes
  vars:
    postgres_image: "postgres@sha256:23e88eb049fd5d54894d70100df61d38a49ed97909263f79d4ff4c30a5d5fca2"
    postgres_user: "daarion"
    postgres_password: "{{ vault_postgres_password }}"
    postgres_db: "daarion_main"
  tasks:
    - name: Pull PostgreSQL image
      docker_image:
        name: "{{ postgres_image }}"
        source: pull

    - name: Scan image with Trivy (fail on HIGH/CRITICAL findings)
      command: trivy image --severity HIGH,CRITICAL --exit-code 1 {{ postgres_image }}
      register: trivy_result
      failed_when: trivy_result.rc != 0
      changed_when: false

    - name: Create PostgreSQL volume
      docker_volume:
        name: "postgres_data_{{ inventory_hostname }}"

    - name: Run PostgreSQL container
      docker_container:
        name: dagi-postgres
        image: "{{ postgres_image }}"
        state: started
        # Come back after crashes and host reboots
        restart_policy: unless-stopped
        security_opts:
          - no-new-privileges:true
        read_only: yes
        tmpfs:
          - /tmp:noexec,nosuid,nodev,size=100m
          - /var/run/postgresql:noexec,nosuid,nodev,size=10m
        volumes:
          - "postgres_data_{{ inventory_hostname }}:/var/lib/postgresql/data"
        env:
          POSTGRES_USER: "{{ postgres_user }}"
          POSTGRES_PASSWORD: "{{ postgres_password }}"
          POSTGRES_DB: "{{ postgres_db }}"
        cpus: 2
        memory: 2g
        # Docker manages its own iptables rules for published ports, and UFW does
        # not filter them; restrict access via the bind address or DOCKER-USER rules.
        ports:
          - "5432:5432"

    - name: Wait for PostgreSQL to accept TCP connections
      wait_for:
        host: localhost
        port: 5432
        delay: 5
        timeout: 60
```
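
The final `wait_for` task waits only until something accepts TCP connections on port 5432. That shows the container is listening, but the database may not have finished initializing. For a database-level check, PostgreSQL's `pg_isready` utility can be run inside the container with `docker exec`. Below is a rough Python equivalent of the TCP polling loop. It is a sketch of the idea, not Ansible's implementation. Usage: `wait_for_port("localhost", 5432, delay=5, timeout=60)`.

```python
import socket
import time


def wait_for_port(host: str, port: int, delay: float = 0.0, timeout: float = 60.0,
                  interval: float = 1.0) -> bool:
    """Sleep `delay` seconds, then poll host:port until a TCP connect succeeds.

    Returns True as soon as the port accepts a connection, False once
    `timeout` seconds of polling have passed without success.
    """
    time.sleep(delay)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:  # refused, unreachable or timed out: try again
            time.sleep(interval)
    return False
```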
---
## 🚀 Deployment Automation
### GitOps Workflow
```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   GitHub    │────▶│   ArgoCD    │────▶│ Kubernetes  │
│  (configs)  │     │  (GitOps)   │     │  (runtime)  │
└─────────────┘     └─────────────┘     └─────────────┘
       │                   │                   │
       │                   │                   │
       ▼                   ▼                   ▼
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Terraform  │────▶│   Ansible   │────▶│    Nodes    │
│   (infra)   │     │  (config)   │     │    (150)    │
└─────────────┘     └─────────────┘     └─────────────┘
```
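
The GitOps flow in the diagram rests on a reconcile loop. The desired state lives in Git, and a controller (ArgoCD in this plan) repeatedly compares it with what is actually running, then applies the difference. The toy sketch below shows only the diff step. It is not ArgoCD's implementation, and the resource names and specs in the example are hypothetical:

```python
def plan_sync(desired: dict[str, dict], actual: dict[str, dict]) -> dict[str, list[str]]:
    """Compare desired state (from Git) with actual state and list the needed actions."""
    create = sorted(name for name in desired if name not in actual)
    # Resources that disappeared from Git; removing them is called "pruning".
    delete = sorted(name for name in actual if name not in desired)
    update = sorted(
        name for name in desired if name in actual and desired[name] != actual[name]
    )
    return {"create": create, "update": update, "delete": delete}


if __name__ == "__main__":
    # Hypothetical resources, for illustration only.
    desired = {"postgres": {"image": "postgres:16"}, "api": {"replicas": 3}}
    actual = {"postgres": {"image": "postgres:15"}, "old-worker": {"replicas": 1}}
    print(plan_sync(desired, actual))
```

In ArgoCD's automated sync, pruning resources that were removed from Git is opt-in, so the `delete` list above corresponds to a setting rather than default behaviour.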
### Terraform: Node Provisioning
```hcl
# terraform/main.tf
terraform {
  required_providers {
    hcloud = {
      source = "hetznercloud/hcloud"
    }
  }
}

variable "hcloud_token" {
  type      = string
  sensitive = true
}

variable "node_count" {
  type    = number
  default = 50
}

provider "hcloud" {
  token = var.hcloud_token
}

resource "hcloud_ssh_key" "default" {
  name       = "daarion-network"
  public_key = file("~/.ssh/daarion_network.pub")
}

resource "hcloud_server" "compute_nodes" {
  count       = var.node_count
  name        = "node-eu-${format("%03d", count.index + 1)}"
  server_type = "cx31" # 2 vCPU, 8GB RAM
  image       = "ubuntu-24.04"
  location    = "nbg1"
  ssh_keys    = [hcloud_ssh_key.default.id]

  labels = {
    role    = "compute"
    region  = "eu"
    managed = "terraform"
  }

  # Outbound rules match playbooks/security-setup.yml (DNS, HTTP, HTTPS, NTP),
  # so apt and later Ansible runs still work once UFW is enabled here.
  user_data = <<-EOF
    #cloud-config
    packages:
      - docker.io
      - fail2ban
      - ufw
    runcmd:
      - systemctl enable docker
      - systemctl start docker
      - ufw default deny incoming
      - ufw default deny outgoing
      - ufw allow 22/tcp
      - ufw allow out 53/udp
      - ufw allow out 80/tcp
      - ufw allow out 443/tcp
      - ufw allow out 123/udp
      - ufw --force enable
  EOF
}

output "node_ips" {
  value = hcloud_server.compute_nodes[*].ipv4_address
}
```
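
Terraform names its servers with `format("%03d", count.index + 1)`. This is meant to match the `node-eu-[001:050]` pattern in the Ansible inventory, where each host is reached as `<name>.daarion.network`. The sketch below reproduces both naming rules in Python so they can be compared. Note that the FQDNs resolve only if DNS records exist for them, and the Terraform config above does not create any. The `region` parameter is a hypothetical generalization, since the Terraform config hardcodes `eu`.

```python
def terraform_names(node_count: int, region: str = "eu") -> list[str]:
    """Mirror name = "node-eu-${format("%03d", count.index + 1)}" for count.index = 0..n-1."""
    return [f"node-{region}-{index + 1:03d}" for index in range(node_count)]


def inventory_fqdn(hostname: str, domain: str = "daarion.network") -> str:
    """Mirror ansible_host: "{{ inventory_hostname }}.daarion.network"."""
    return f"{hostname}.{domain}"


if __name__ == "__main__":
    names = terraform_names(50)
    print(names[0], names[-1])
    print(inventory_fqdn(names[0]))
```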
### Deployment Script
```bash
#!/bin/bash
# scripts/deploy-network.sh
# Usage: scripts/deploy-network.sh [NODES_COUNT] [REGION]
set -euo pipefail

NODES_COUNT=${1:-10}
REGION=${2:-eu}
# Inventory group for the region (eu_nodes, us_nodes, asia_nodes).
# Braces are required: "$REGION_nodes" would expand a variable named REGION_nodes.
GROUP="${REGION}_nodes"

echo "🚀 Deploying $NODES_COUNT nodes in $REGION region..."

# 1. Provision infrastructure (terraform/main.tf currently creates EU nodes only)
echo "[1/5] Provisioning infrastructure..."
terraform -chdir=terraform init
terraform -chdir=terraform apply -var="node_count=$NODES_COUNT" -auto-approve

# 2. Give the new servers time to boot and finish cloud-init
echo "[2/5] Waiting for nodes..."
sleep 60

# 3. Export node IPs (run against terraform/, where the state lives)
echo "[3/5] Updating inventory..."
terraform -chdir=terraform output -json node_ips | jq -r '.[]' > "inventory/hosts_${REGION}.txt"

# 4. Run security setup
echo "[4/5] Running security setup..."
ansible-playbook -i inventory/production.yml playbooks/security-setup.yml --limit "$GROUP"

# 5. Deploy services
echo "[5/5] Deploying services..."
ansible-playbook -i inventory/production.yml playbooks/services-deploy.yml --limit "$GROUP"

echo "✅ Deployment complete!"
```
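
Step 3 of the script relies on `terraform output -json node_ips` printing a JSON array of addresses, which `jq -r '.[]'` turns into one IP per line. The sketch below performs the same transformation in Python, in case the inventory step later moves into a script. It also validates each address. The sample input uses the documentation range 203.0.113.0/24, not real nodes: `to_hosts_file(ips_from_terraform_output('["203.0.113.10", "203.0.113.11"]'))`.

```python
import ipaddress
import json


def ips_from_terraform_output(raw_json: str) -> list[str]:
    """Parse `terraform output -json node_ips` (a JSON array) into validated IP strings."""
    values = json.loads(raw_json)
    if not isinstance(values, list):
        raise ValueError("expected a JSON array of IP addresses")
    # ip_address() raises ValueError for anything that is not a valid IP address.
    return [str(ipaddress.ip_address(value)) for value in values]


def to_hosts_file(ips: list[str]) -> str:
    """Equivalent of `jq -r '.[]'`: one address per line, each newline-terminated."""
    return "".join(f"{ip}\n" for ip in ips)
```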
---
## 🔒 Network Security
### Zero Trust Architecture
```
┌─────────────────────────────────────────────────────────┐
│                    ZERO TRUST LAYER                     │
├─────────────────────────────────────────────────────────┤
│  ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐  │
│  │  mTLS   │   │  RBAC   │   │ Network │   │ Secrets │  │
│  │         │   │         │   │ Policy  │   │  Vault  │  │
│  └─────────┘   └─────────┘   └─────────┘   └─────────┘  │
├─────────────────────────────────────────────────────────┤
│                  SERVICE MESH (Istio)                   │
├─────────────────────────────────────────────────────────┤
│  ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐  │
│  │ Node 1  │   │ Node 2  │   │ Node 3  │   │ Node N  │  │
│  └─────────┘   └─────────┘   └─────────┘   └─────────┘  │
└─────────────────────────────────────────────────────────┘
```
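
In the mTLS layer, both sides of a connection present certificates, and each side verifies the other against a trusted CA. In the planned setup, Istio would handle this for workloads in the mesh. As a standalone illustration using Python's standard `ssl` module, the sketch below builds a server-side context that rejects clients without a valid certificate. The CA, certificate and key paths are hypothetical placeholders:

```python
import ssl
from typing import Optional


def make_mtls_server_context(ca_file: Optional[str] = None,
                             cert_file: Optional[str] = None,
                             key_file: Optional[str] = None) -> ssl.SSLContext:
    """Build a TLS server context that requires and verifies client certificates.

    ca_file: CA bundle used to verify client certificates (hypothetical path).
    cert_file/key_file: this server's own certificate and key (hypothetical paths).
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # On a server socket, CERT_REQUIRED makes the handshake fail without a valid client cert.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

In real use, all three paths would be supplied, and the context would wrap the listening socket with `ctx.wrap_socket(sock, server_side=True)`.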
### Security Policies
```yaml
# k8s/network-policy.yml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-postgres
spec:
  podSelector:
    matchLabels:
      app: postgres
  ingress:
    - from:
        - podSelector:
            matchLabels:
              access: postgres
      ports:
        - protocol: TCP
          port: 5432
```
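
Kubernetes NetworkPolicies are additive allow-lists. A pod that no policy selects accepts all traffic. Once at least one policy selects a pod for a direction, only connections allowed by some selecting policy get through. The simplified model below evaluates the two policies above. It covers ingress with `podSelector`/`matchLabels` only, with no namespaces, `ipBlock` or named ports. It also assumes every policy applies to `Ingress`, which holds for both policies above:

```python
from dataclasses import dataclass, field


@dataclass
class IngressRule:
    from_labels: dict[str, str]  # podSelector.matchLabels of allowed peers ({} = all pods)
    ports: list[int]             # allowed TCP ports


@dataclass
class Policy:
    name: str
    pod_labels: dict[str, str]   # spec.podSelector.matchLabels ({} selects every pod)
    ingress: list[IngressRule] = field(default_factory=list)


def selects(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """matchLabels semantics: every key/value in the selector must be on the pod."""
    return all(labels.get(key) == value for key, value in selector.items())


def ingress_allowed(policies: list[Policy], dst: dict[str, str],
                    src: dict[str, str], port: int) -> bool:
    selecting = [p for p in policies if selects(p.pod_labels, dst)]
    if not selecting:
        return True  # not isolated: no policy selects the destination pod
    # Allowed if any rule of any selecting policy matches (policies are a union).
    return any(
        selects(rule.from_labels, src) and port in rule.ports
        for policy in selecting
        for rule in policy.ingress
    )


POLICIES = [
    Policy("default-deny-all", pod_labels={}),
    Policy("allow-postgres", pod_labels={"app": "postgres"},
           ingress=[IngressRule(from_labels={"access": "postgres"}, ports=[5432])]),
]
```

`default-deny-all` also selects every pod for `Egress`. Client pods labelled `access: postgres` would therefore also need an egress policy that allows TCP 5432 (and DNS) before they can reach PostgreSQL.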
### Vault Integration
```hcl
# vault/postgres-policy.hcl
path "database/creds/daarion-db" {
  capabilities = ["read"]
}

path "secret/data/postgres/*" {
  capabilities = ["read"]
}
```
```bash
# Fetch dynamic database credentials (a generated username/password with a lease)
vault read database/creds/daarion-db
```
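
The command `vault read database/creds/daarion-db` calls Vault's HTTP API: `GET /v1/database/creds/daarion-db`, with the token in the `X-Vault-Token` header. The response contains a generated `username`/`password` pair and a lease that expires after `lease_duration` seconds. The standard-library sketch below makes the same call. It assumes the database secrets engine is mounted at `database/` and the role is `daarion-db`, as in the policy above. Usage: `fetch_db_credentials(vault_addr, token)`, where the Vault address and token are placeholders supplied by the caller.

```python
import json
import urllib.request
from dataclasses import dataclass


@dataclass
class DbCredentials:
    username: str
    password: str
    lease_id: str
    lease_duration: int  # seconds until Vault revokes the generated user


def build_creds_request(vault_addr: str, token: str, role: str = "daarion-db") -> urllib.request.Request:
    """GET /v1/database/creds/<role>, authenticated with the X-Vault-Token header."""
    url = f"{vault_addr.rstrip('/')}/v1/database/creds/{role}"
    return urllib.request.Request(url, headers={"X-Vault-Token": token}, method="GET")


def parse_creds_response(body: str) -> DbCredentials:
    """Extract the generated credentials and lease info from Vault's JSON response."""
    payload = json.loads(body)
    return DbCredentials(
        username=payload["data"]["username"],
        password=payload["data"]["password"],
        lease_id=payload["lease_id"],
        lease_duration=int(payload["lease_duration"]),
    )


def fetch_db_credentials(vault_addr: str, token: str, role: str = "daarion-db") -> DbCredentials:
    request = build_creds_request(vault_addr, token, role)
    with urllib.request.urlopen(request, timeout=10) as resp:
        return parse_creds_response(resp.read().decode("utf-8"))
```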
---
## 📊 Monitoring and Alerts
### Prometheus Federation
```yaml
# prometheus/federation.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'federate'
    scrape_interval: 30s
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="node"}'
        - '{job="docker"}'
        - '{job="postgres"}'
    static_configs:
      - targets:
          - 'node-eu-001:9090'
          - 'node-eu-002:9090'
          # ... all nodes
```
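
Listing every node under `static_configs` by hand does not scale well to 150 targets. Prometheus also supports `file_sd_configs`, which reads targets from a JSON or YAML file. That file is a list of `{"targets": [...], "labels": {...}}` groups, and Prometheus picks up changes to it without a restart. The sketch below generates such a file from the inventory naming scheme. The `:9090` port comes from the config above; the `region` label and the output path are assumptions. The `federate` job would then point at the file through `file_sd_configs` with a `files` list, replacing `static_configs`.

```python
import json
from pathlib import Path


def federation_targets(region: str, count: int, port: int = 9090) -> list[str]:
    """Targets following the inventory naming scheme, e.g. node-eu-001:9090."""
    return [f"node-{region}-{n:03d}:{port}" for n in range(1, count + 1)]


def file_sd_document(groups: dict[str, list[str]]) -> list[dict]:
    """Build the file_sd structure: one target group per region, labelled with the region."""
    return [{"targets": targets, "labels": {"region": region}} for region, targets in groups.items()]


def write_file_sd(path: Path, groups: dict[str, list[str]]) -> None:
    # Write to a temporary file, then rename, so Prometheus never reads a half-written file.
    tmp = path.parent / (path.name + ".tmp")
    tmp.write_text(json.dumps(file_sd_document(groups), indent=2))
    tmp.replace(path)


if __name__ == "__main__":
    groups = {region: federation_targets(region, 50) for region in ("eu", "us", "asia")}
    print(sum(len(t) for t in groups.values()), "targets")
```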
### Grafana Dashboard
```json
{
  "dashboard": {
    "title": "DAARION Network Overview",
    "panels": [
      {
        "title": "Total Nodes",
        "type": "stat",
        "targets": [
          {
            "expr": "count(up{job=\"node\"})"
          }
        ]
      },
      {
        "title": "Healthy Nodes",
        "type": "stat",
        "targets": [
          {
            "expr": "count(up{job=\"node\"} == 1)"
          }
        ]
      },
      {
        "title": "Security Alerts",
        "type": "stat",
        "targets": [
          {
            "expr": "sum(security_alerts_total)"
          }
        ]
      }
    ]
  }
}
```
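
The dashboard panels are plain PromQL instant queries. The same numbers can be fetched from Prometheus's HTTP API endpoint `/api/v1/query`, for example for a command-line health check. The response is a vector of samples, each shaped `[timestamp, "value"]`. A standard-library sketch follows, with the Prometheus base URL supplied by the caller:

```python
import json
import urllib.parse
import urllib.request


def query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for Prometheus's /api/v1/query endpoint."""
    return f"{base_url.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def scalar_from_vector(body: str) -> float:
    """Return the single sample value of a vector result; an empty vector counts as 0."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    result = payload["data"]["result"]
    if not result:
        return 0.0  # e.g. count() over zero matching series returns no samples
    # Each sample is [unix_timestamp, "value-as-string"].
    return float(result[0]["value"][1])


def instant_query(base_url: str, promql: str) -> float:
    with urllib.request.urlopen(query_url(base_url, promql), timeout=10) as resp:
        return scalar_from_vector(resp.read().decode("utf-8"))
```

For example, `instant_query(base_url, 'count(up{job="node"} == 1)')` returns the "Healthy Nodes" figure. Note that `count()` returns no sample at all when nothing matches, which this sketch maps to 0.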
### Alert Rules
```yaml
# prometheus/alerts.yml
groups:
  - name: network
    rules:
      - alert: NodeDown
        expr: up{job="node"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Node {{ $labels.instance }} is down"

      # node_cpu_seconds_total is a cumulative counter, so alert on its rate:
      # percent of time idle over 5m, averaged across CPUs, below 20%.
      - alert: HighCPU
        expr: avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100 < 20
        for: 10m
        labels:
          severity: warning

      - alert: SuspiciousProcess
        expr: security_suspicious_process > 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Suspicious process on {{ $labels.instance }}"

      - alert: PostgresDown
        expr: pg_up == 0
        for: 1m
        labels:
          severity: critical
```
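
`node_cpu_seconds_total` is a per-CPU counter of seconds spent in each mode since boot, so its raw value only grows. On a running server it is usually far above 20, and comparing it directly with 20 is almost never true. The idle fraction instead comes from the counter's increase over a window divided by the window length, which is what `rate()` computes. The sketch below works this out by hand for two readings taken 300 s apart. The readings are made-up illustration values, and counter resets and Prometheus's extrapolation are ignored.

```python
def idle_percent(idle_start: list[float], idle_end: list[float], window_s: float) -> float:
    """Average idle percentage across CPUs from two readings of the idle counter.

    Roughly what avg(rate(node_cpu_seconds_total{mode="idle"}[window])) * 100 expresses.
    """
    if len(idle_start) != len(idle_end) or not idle_start:
        raise ValueError("need one start and one end reading per CPU")
    per_cpu = [(end - start) / window_s for start, end in zip(idle_start, idle_end)]
    return 100 * sum(per_cpu) / len(per_cpu)


if __name__ == "__main__":
    # Two CPUs, 300 s apart; each gained 30 s of idle time, i.e. 10% idle (90% busy).
    pct = idle_percent([12000.0, 11800.0], [12030.0, 11830.0], 300)
    print(f"{pct:.1f}% idle, below 20%: {pct < 20}")
```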
---
## 📅 Roadmap
### Phase 1: Foundation (Weeks 1-2)
- [x] NODE1 rebuild + security
- [x] NODE3 setup + security
- [x] PostgreSQL on NODE1 and NODE3
- [ ] Ansible repository setup
- [ ] Terraform configs
- [ ] CI/CD pipeline
### Phase 2: Regional Controllers (Weeks 3-4)
- [ ] Deploy 3 region controllers
- [ ] Consul cluster setup
- [ ] Vault setup
- [ ] Prometheus federation
### Phase 3: First 50 Nodes (Weeks 5-8)
- [ ] EU region: 50 nodes
- [ ] Automated deployment testing
- [ ] Security audit
- [ ] Performance testing
### Phase 4: Scale to 150 (Weeks 9-12)
- [ ] US region: 50 nodes
- [ ] Asia region: 50 nodes
- [ ] Global monitoring
- [ ] Disaster recovery testing
### Phase 5: Production (Week 13+)
- [ ] Full production workloads
- [ ] 24/7 monitoring
- [ ] Automated incident response
- [ ] Continuous security audits
---
## 💰 Estimated Costs
| Resource | Per Node | 50 Nodes | 150 Nodes |
|----------|----------|----------|-----------|
| Hetzner CX31 | €10/mo | €500/mo | €1,500/mo |
| Storage (100GB) | €5/mo | €250/mo | €750/mo |
| Bandwidth | ~€5/mo | €250/mo | €750/mo |
| **Total** | **€20/mo** | **€1,000/mo** | **€3,000/mo** |
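
The totals come from multiplying the per-node monthly cost by the node count. The table prices every node as a CX31 with the same storage and bandwidth, whereas the node-type table gives the master, the region controllers and the GPU nodes larger resources. The real total would therefore likely be higher. The sketch below reproduces the table's calculation, with prices copied from the table (bandwidth is an estimate there as well):

```python
# Monthly per-node costs in EUR, copied from the "Estimated Costs" table.
PER_NODE_EUR = {
    "hetzner_cx31": 10,
    "storage_100gb": 5,
    "bandwidth_estimate": 5,
}


def monthly_cost(node_count: int, per_node: dict[str, int] = PER_NODE_EUR) -> int:
    """Total monthly cost in EUR, assuming every node has the same per-node cost."""
    return node_count * sum(per_node.values())


if __name__ == "__main__":
    for count in (1, 50, 150):
        print(f"{count:>3} nodes: €{monthly_cost(count):,}/mo")
```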
---
## 📚 Additional Resources
- [Ansible Best Practices](https://docs.ansible.com/ansible/latest/user_guide/playbooks_best_practices.html)
- [Terraform Hetzner Provider](https://registry.terraform.io/providers/hetznercloud/hcloud/latest/docs)
- [Kubernetes Network Policies](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
- [HashiCorp Vault](https://www.vaultproject.io/docs)
- [Prometheus Federation](https://prometheus.io/docs/prometheus/latest/federation/)
---
**Author:** Ivan Tytar & AI Assistant

**Last updated:** 2026-01-10