🌐 Add 150 nodes network deployment plan
- Created NETWORK-150-NODES-PLAN.md with complete architecture
- Ansible playbooks for automated security and deployment
- Terraform configs for Hetzner infrastructure
- Zero Trust security architecture
- Prometheus federation for monitoring
- Estimated costs and roadmap
- PostgreSQL deployed on NODE1 and NODE3
633
NETWORK-150-NODES-PLAN.md
Normal file
# 🌐 150-Node Network Deployment Plan: DAARION Network

**Version:** 1.0.0
**Date:** 2026-01-10
**Status:** Planning

---

## 📋 Contents

1. [Network architecture](#network-architecture)
2. [Centralized management](#centralized-management)
3. [Deployment automation](#deployment-automation)
4. [Network security](#network-security)
5. [Monitoring and alerts](#monitoring-and-alerts)
6. [Roadmap](#roadmap)

---

## 🏗️ Network architecture

### Node hierarchy

```
                    ┌─────────────────┐
                    │   MASTER NODE   │
                    │     (NODE1)     │
                    │     Hetzner     │
                    └────────┬────────┘
                             │
            ┌────────────────┼────────────────┐
            │                │                │
     ┌──────▼──────┐  ┌──────▼──────┐  ┌──────▼──────┐
     │  REGION EU  │  │  REGION US  │  │ REGION ASIA │
     │ Controller  │  │ Controller  │  │ Controller  │
     └──────┬──────┘  └──────┬──────┘  └──────┬──────┘
            │                │                │
    ┌───────┼───────┐   ┌────┼────┐    ┌──────┼──────┐
    │       │       │   │    │    │    │      │      │
    50      50      50  25   25   25   25     25     25
    (numbers are nodes per group)
```

### Node types

| Type | Count | Role | Resources |
|------|-------|------|-----------|
| **Master** | 1 | Central management, GitOps | 8 CPU, 32 GB RAM |
| **Region Controller** | 3-5 | Regional management | 4 CPU, 16 GB RAM |
| **Compute Node** | ~140 | Compute, AI workloads | 2-8 CPU, 8-64 GB RAM |
| **GPU Node** | ~5 | AI/ML inference | GPU + 32 GB+ RAM |

---

## 🎛️ Centralized management

### Tools

| Tool | Purpose | Alternative |
|------|---------|-------------|
| **Ansible** | Configuration Management | Salt, Puppet |
| **Terraform** | Infrastructure as Code | Pulumi |
| **Kubernetes** | Container Orchestration | Docker Swarm |
| **Consul** | Service Discovery | etcd |
| **Vault** | Secrets Management | AWS Secrets Manager |
| **Prometheus** | Metrics | InfluxDB |
| **Grafana** | Dashboards | - |
| **Loki** | Logs | ELK Stack |

### Ansible Inventory Structure

```yaml
# inventory/production.yml
all:
  children:
    masters:
      hosts:
        node1-master:
          ansible_host: 144.76.224.179
          ansible_user: root

    region_controllers:
      hosts:
        node3-eu:
          ansible_host: 80.77.35.151
          ansible_port: 33147
          ansible_user: zevs
          ansible_become_pass: "{{ vault_node3_password }}"

    compute_nodes:
      children:
        eu_nodes:
          hosts:
            node-eu-[001:050]:
              ansible_host: "{{ inventory_hostname }}.daarion.network"
        us_nodes:
          hosts:
            node-us-[001:050]:
              ansible_host: "{{ inventory_hostname }}.daarion.network"
        asia_nodes:
          hosts:
            node-asia-[001:050]:
              ansible_host: "{{ inventory_hostname }}.daarion.network"

    gpu_nodes:
      hosts:
        gpu-[01:05]:
          ansible_host: "{{ inventory_hostname }}.daarion.network"
```

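Range patterns such as `node-eu-[001:050]` in the inventory are inclusive and keep their zero padding. The following Python sketch mimics that expansion so the resulting host names and FQDNs can be checked before a run. It is an illustration only, not Ansible's own parser: it handles a single numeric range per pattern, and the helper names `expand_host_pattern` and `fqdn` are hypothetical.

```python
import re

# Matches one numeric range such as [001:050] (assumption: one range per pattern).
RANGE_RE = re.compile(r"\[(\d+):(\d+)\]")


def expand_host_pattern(pattern: str) -> list[str]:
    """Expand 'node-eu-[001:050]' into node-eu-001 ... node-eu-050 (inclusive)."""
    match = RANGE_RE.search(pattern)
    if match is None:
        return [pattern]
    start, end = match.group(1), match.group(2)
    width = len(start)  # leading zeros of the start value define the padding
    prefix, suffix = pattern[:match.start()], pattern[match.end():]
    return [
        f"{prefix}{str(number).zfill(width)}{suffix}"
        for number in range(int(start), int(end) + 1)
    ]


def fqdn(host: str, domain: str = "daarion.network") -> str:
    """Mirror the inventory's ansible_host template: <inventory_hostname>.<domain>."""
    return f"{host}.{domain}"


if __name__ == "__main__":
    eu_hosts = expand_host_pattern("node-eu-[001:050]")
    print(len(eu_hosts), eu_hosts[0], fqdn(eu_hosts[-1]))
```

Expanding all three regional patterns plus `gpu-[01:05]` this way is a quick check that the inventory defines the number of hosts the plan expects.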
### Ansible Playbook: Security Setup

```yaml
# playbooks/security-setup.yml
---
- name: Security Setup for All Nodes
  hosts: all
  become: yes

  vars:
    security_packages:
      - fail2ban
      - auditd
      - rkhunter
      - chkrootkit
      - ufw

  tasks:
    - name: Update apt cache
      apt:
        update_cache: yes
        cache_valid_time: 3600

    - name: Install security packages
      apt:
        name: "{{ security_packages }}"
        state: present

    - name: Install Trivy
      shell: |
        curl -sfL https://raw.githubusercontent.com/aquasecurity/trivy/main/contrib/install.sh | sh -s -- -b /usr/local/bin
      args:
        creates: /usr/local/bin/trivy

    - name: Configure fail2ban
      template:
        src: templates/jail.local.j2
        dest: /etc/fail2ban/jail.local
      notify: restart fail2ban

    - name: Configure UFW defaults
      ufw:
        direction: "{{ item.direction }}"
        policy: "{{ item.policy }}"
      loop:
        - { direction: incoming, policy: deny }
        - { direction: outgoing, policy: deny }

    - name: Allow SSH
      ufw:
        rule: allow
        port: "{{ ansible_port | default(22) }}"
        proto: tcp

    # Runs before the outgoing allow rules: ufw applies the first matching
    # rule, so these denies must come ahead of the port-based allows.
    - name: Block internal networks
      ufw:
        rule: deny
        direction: out
        to_ip: "{{ item }}"
      loop:
        - 10.0.0.0/8
        - 172.16.0.0/12

    - name: Allow necessary outgoing
      ufw:
        rule: allow
        direction: out
        port: "{{ item.port }}"  # item is a dict, so the port must be read from item.port
        proto: "{{ item.proto | default('tcp') }}"
      loop:
        - { port: 53, proto: udp }
        - { port: 80 }
        - { port: 443 }
        - { port: 123, proto: udp }

    - name: Enable UFW
      ufw:
        state: enabled

    # The copy module does not create missing parent directories.
    - name: Create scripts directory
      file:
        path: /opt/scripts
        state: directory
        mode: '0755'

    - name: Copy security check script
      copy:
        src: files/security-check.sh
        dest: /opt/scripts/security-check.sh
        mode: '0755'

    # minute "0" with all other fields at their defaults runs the job hourly
    - name: Setup security cron
      cron:
        name: "Security check"
        minute: "0"
        job: "/opt/scripts/security-check.sh"

  handlers:
    - name: restart fail2ban
      service:
        name: fail2ban
        state: restarted
```

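Because ufw applies the first rule that matches a packet, the relative order of the allow and deny tasks in the playbook decides whether traffic to private ranges on an allowed port gets through. The sketch below is a simplified first-match model in Python, not ufw's implementation; the `Rule` type and `decide` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Rule:
    action: str                  # "allow" or "deny"
    network: str = "0.0.0.0/0"   # destination network the rule applies to
    port: Optional[int] = None   # None matches any port
    proto: Optional[str] = None  # None matches any protocol


def decide(rules: list[Rule], dest: str, port: int, proto: str,
           default: str = "deny") -> str:
    """Return the action of the first matching rule, else the default policy."""
    for rule in rules:
        if ip_address(dest) not in ip_network(rule.network):
            continue
        if rule.port is not None and rule.port != port:
            continue
        if rule.proto is not None and rule.proto != proto:
            continue
        return rule.action
    return default


ALLOW_HTTPS = Rule("allow", port=443, proto="tcp")
DENY_PRIVATE = Rule("deny", network="10.0.0.0/8")

if __name__ == "__main__":
    # Allow listed first: HTTPS to a private address slips through.
    print(decide([ALLOW_HTTPS, DENY_PRIVATE], "10.1.2.3", 443, "tcp"))
    # Deny listed first: the private range is blocked regardless of port.
    print(decide([DENY_PRIVATE, ALLOW_HTTPS], "10.1.2.3", 443, "tcp"))
```

With the default outgoing policy set to deny, the explicit deny rules only have an effect when they are evaluated ahead of the port-based allows.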
### Ansible Playbook: PostgreSQL Deployment

```yaml
# playbooks/postgresql-deploy.yml
---
- name: Deploy PostgreSQL to Nodes
  # Note: the database_nodes group is not defined in inventory/production.yml
  # and must be added there before this playbook can target any host.
  hosts: database_nodes
  become: yes

  vars:
    postgres_image: "postgres@sha256:23e88eb049fd5d54894d70100df61d38a49ed97909263f79d4ff4c30a5d5fca2"
    postgres_user: "daarion"
    postgres_password: "{{ vault_postgres_password }}"
    postgres_db: "daarion_main"

  tasks:
    - name: Pull PostgreSQL image
      docker_image:
        name: "{{ postgres_image }}"
        source: pull

    # A non-zero exit code (HIGH or CRITICAL findings) stops the deployment
    - name: Scan image with Trivy
      command: trivy image --severity HIGH,CRITICAL --exit-code 1 {{ postgres_image }}
      register: trivy_result
      failed_when: trivy_result.rc != 0

    - name: Create PostgreSQL volume
      docker_volume:
        name: "postgres_data_{{ inventory_hostname }}"

    - name: Run PostgreSQL container
      docker_container:
        name: dagi-postgres
        image: "{{ postgres_image }}"
        state: started
        restart_policy: "no"
        security_opts:
          - no-new-privileges:true
        # Read-only root filesystem; writable paths are limited to the
        # tmpfs mounts and the data volume below.
        read_only: yes
        tmpfs:
          - /tmp:noexec,nosuid,nodev,size=100m
          - /var/run/postgresql:noexec,nosuid,nodev,size=10m
        volumes:
          - "postgres_data_{{ inventory_hostname }}:/var/lib/postgresql/data"
        env:
          POSTGRES_USER: "{{ postgres_user }}"
          POSTGRES_PASSWORD: "{{ postgres_password }}"
          POSTGRES_DB: "{{ postgres_db }}"
        cpus: 2
        memory: 2g
        ports:
          - "5432:5432"

    - name: Wait for PostgreSQL to be ready
      wait_for:
        host: localhost
        port: 5432
        delay: 5
        timeout: 60
```

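The final `wait_for` task only checks that TCP port 5432 accepts connections within the timeout. A rough Python equivalent of that check, assuming nothing beyond the standard library, looks like the sketch below. The function name `wait_for_port` is hypothetical, and an open port does not by itself prove that PostgreSQL is ready to serve queries.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0,
                  delay: float = 0.0, interval: float = 1.0) -> bool:
    """Poll host:port until a TCP connection succeeds or the timeout expires.

    delay mirrors wait_for's initial pause; interval is the pause between tries.
    """
    time.sleep(delay)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError when the port is closed or unreachable
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


# Example call with the values used by the playbook task:
#   wait_for_port("localhost", 5432, timeout=60, delay=5)
```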
---

## 🚀 Deployment automation

### GitOps Workflow

```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   GitHub    │────▶│   ArgoCD    │────▶│ Kubernetes  │
│  (configs)  │     │  (GitOps)   │     │  (runtime)  │
└─────────────┘     └─────────────┘     └─────────────┘
       │                   │                   │
       │                   │                   │
       ▼                   ▼                   ▼
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Terraform  │────▶│   Ansible   │────▶│    Nodes    │
│   (infra)   │     │  (config)   │     │    (150)    │
└─────────────┘     └─────────────┘     └─────────────┘
```

### Terraform: Node Provisioning

```hcl
# terraform/main.tf
terraform {
  required_providers {
    hcloud = {
      source = "hetznercloud/hcloud"
    }
  }
}

variable "hcloud_token" {
  sensitive = true
}

variable "node_count" {
  default = 50
}

provider "hcloud" {
  token = var.hcloud_token
}

resource "hcloud_ssh_key" "default" {
  name       = "daarion-network"
  public_key = file("~/.ssh/daarion_network.pub")
}

resource "hcloud_server" "compute_nodes" {
  count       = var.node_count
  name        = "node-eu-${format("%03d", count.index + 1)}"
  server_type = "cx31" # 2 vCPU, 8GB RAM
  image       = "ubuntu-24.04"
  location    = "nbg1"
  ssh_keys    = [hcloud_ssh_key.default.id]

  labels = {
    role    = "compute"
    region  = "eu"
    managed = "terraform"
  }

  user_data = <<-EOF
    #cloud-config
    packages:
      - docker.io
      - fail2ban
      - ufw
    runcmd:
      - systemctl enable docker
      - systemctl start docker
      - ufw default deny incoming
      - ufw default deny outgoing
      - ufw allow 22/tcp
      - ufw allow out 53/udp
      - ufw allow out 443/tcp
      - ufw --force enable
  EOF
}

output "node_ips" {
  value = hcloud_server.compute_nodes[*].ipv4_address
}
```

### Deployment Script

```bash
#!/bin/bash
# scripts/deploy-network.sh

# Exit on errors, unset variables, and failures inside pipelines
set -euo pipefail

NODES_COUNT=${1:-10}
REGION=${2:-eu}

echo "🚀 Deploying $NODES_COUNT nodes in $REGION region..."

# 1. Provision infrastructure
echo "[1/5] Provisioning infrastructure..."
cd terraform
terraform init
terraform apply -var="node_count=$NODES_COUNT" -auto-approve
cd ..

# 2. Wait for nodes to be ready
echo "[2/5] Waiting for nodes..."
sleep 60

# 3. Update Ansible inventory
# -chdir is required here: the Terraform state lives in ./terraform,
# but the script has already changed back to the repository root.
echo "[3/5] Updating inventory..."
terraform -chdir=terraform output -json node_ips | jq -r '.[]' > "inventory/hosts_${REGION}.txt"

# 4. Run security setup
# Braces are required: "$REGION_nodes" would expand the unset variable REGION_nodes.
echo "[4/5] Running security setup..."
ansible-playbook -i inventory/production.yml playbooks/security-setup.yml --limit "${REGION}_nodes"

# 5. Deploy services
echo "[5/5] Deploying services..."
ansible-playbook -i inventory/production.yml playbooks/services-deploy.yml --limit "${REGION}_nodes"

echo "✅ Deployment complete!"
```

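Step 3 of the script writes the raw IP list from `terraform output -json node_ips` to a text file. As a sketch of how that JSON list could instead be turned into an INI-style Ansible inventory group, consider the Python below. The function name `inventory_from_ips` is hypothetical, and the host naming assumes the same `node-<region>-NNN` scheme that the Terraform config uses.

```python
import json


def inventory_from_ips(ips_json: str, region: str) -> str:
    """Build an INI inventory group from the JSON list of node IPs.

    ips_json is the text printed by `terraform output -json node_ips`,
    for example '["203.0.113.10", "203.0.113.11"]'.
    """
    ips = json.loads(ips_json)
    lines = [f"[{region}_nodes]"]
    for index, ip in enumerate(ips, start=1):
        # Same naming as the Terraform resource: node-eu-001, node-eu-002, ...
        lines.append(f"node-{region}-{index:03d} ansible_host={ip}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Documentation-only example addresses (TEST-NET-3), not real nodes.
    print(inventory_from_ips('["203.0.113.10", "203.0.113.11"]', "eu"), end="")
```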
---

## 🔒 Network security

### Zero Trust Architecture

```
┌─────────────────────────────────────────────────────────┐
│                    ZERO TRUST LAYER                     │
├─────────────────────────────────────────────────────────┤
│  ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐  │
│  │  mTLS   │   │  RBAC   │   │ Network │   │ Secrets │  │
│  │         │   │         │   │ Policy  │   │  Vault  │  │
│  └─────────┘   └─────────┘   └─────────┘   └─────────┘  │
├─────────────────────────────────────────────────────────┤
│                  SERVICE MESH (Istio)                   │
├─────────────────────────────────────────────────────────┤
│  ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌─────────┐  │
│  │ Node 1  │   │ Node 2  │   │ Node 3  │   │ Node N  │  │
│  └─────────┘   └─────────┘   └─────────┘   └─────────┘  │
└─────────────────────────────────────────────────────────┘
```

### Security Policies

```yaml
# k8s/network-policy.yml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress

---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-postgres
spec:
  podSelector:
    matchLabels:
      app: postgres
  ingress:
    - from:
        - podSelector:
            matchLabels:
              access: postgres
      ports:
        - protocol: TCP
          port: 5432
```

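The two policies combine additively: `default-deny-all` isolates every pod in the namespace, and `allow-postgres` then opens port 5432 on `app: postgres` pods to pods labelled `access: postgres`. The Python sketch below models that ingress decision under simplifying assumptions (single namespace, TCP only, label selectors only). The dictionary layout and function names are hypothetical and do not follow the Kubernetes API schema.

```python
def selector_matches(match_labels: dict, pod_labels: dict) -> bool:
    """An empty selector matches every pod; otherwise all labels must match."""
    return all(pod_labels.get(key) == value for key, value in match_labels.items())


def ingress_allowed(policies: list, target_labels: dict,
                    source_labels: dict, port: int) -> bool:
    """Decide whether source may reach target on port under the given policies."""
    selecting = [p for p in policies if selector_matches(p["pod_selector"], target_labels)]
    if not selecting:
        return True  # pods selected by no policy are not isolated
    for policy in selecting:
        for rule in policy.get("ingress", []):
            source_ok = any(selector_matches(sel, source_labels) for sel in rule["from"])
            if source_ok and port in rule["ports"]:
                return True  # allowed traffic is the union of all matching rules
    return False


POLICIES = [
    {"name": "default-deny-all", "pod_selector": {}, "ingress": []},
    {
        "name": "allow-postgres",
        "pod_selector": {"app": "postgres"},
        "ingress": [{"from": [{"access": "postgres"}], "ports": [5432]}],
    },
]

if __name__ == "__main__":
    print(ingress_allowed(POLICIES, {"app": "postgres"}, {"access": "postgres"}, 5432))
    print(ingress_allowed(POLICIES, {"app": "postgres"}, {"app": "web"}, 5432))
```

In this model, a client pod gains database access only by carrying the `access: postgres` label; every other ingress path stays closed by the default-deny policy.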
### Vault Integration

```hcl
# vault/postgres-policy.hcl
path "database/creds/daarion-db" {
  capabilities = ["read"]
}

path "secret/data/postgres/*" {
  capabilities = ["read"]
}
```

```bash
# Retrieve credentials
vault read database/creds/daarion-db
```

---

## 📊 Monitoring and alerts

### Prometheus Federation

```yaml
# prometheus/federation.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'federate'
    scrape_interval: 30s
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="node"}'
        - '{job="docker"}'
        - '{job="postgres"}'
    static_configs:
      - targets:
          - 'node-eu-001:9090'
          - 'node-eu-002:9090'
          # ... all nodes
```

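Each `match[]` entry in the federation job becomes a repeated query parameter on the `/federate` endpoint of the scraped Prometheus server. The Python sketch below builds that URL for one target so it can be tried with a browser or `curl` while debugging. The function name `federate_url` is hypothetical, and plain HTTP is an assumption based on the config not setting a `scheme`.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# The selectors listed under params.'match[]' in the federation job.
MATCHERS = ['{job="node"}', '{job="docker"}', '{job="postgres"}']


def federate_url(target: str, matchers: list[str], scheme: str = "http") -> str:
    """Build the /federate URL that a federation scrape of `target` requests."""
    # A list of pairs keeps one match[] parameter per selector.
    query = urlencode([("match[]", matcher) for matcher in matchers])
    return f"{scheme}://{target}/federate?{query}"


if __name__ == "__main__":
    url = federate_url("node-eu-001:9090", MATCHERS)
    print(url)
    # Round trip: decoding the query string returns the original selectors.
    print(parse_qs(urlsplit(url).query)["match[]"] == MATCHERS)
```

Requesting the printed URL from a node that runs Prometheus should return the selected series in the text exposition format, which makes it easy to confirm that the selectors match anything at all.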
### Grafana Dashboard

```json
{
  "dashboard": {
    "title": "DAARION Network Overview",
    "panels": [
      {
        "title": "Total Nodes",
        "type": "stat",
        "targets": [
          {
            "expr": "count(up{job=\"node\"})"
          }
        ]
      },
      {
        "title": "Healthy Nodes",
        "type": "stat",
        "targets": [
          {
            "expr": "count(up{job=\"node\"} == 1)"
          }
        ]
      },
      {
        "title": "Security Alerts",
        "type": "stat",
        "targets": [
          {
            "expr": "sum(security_alerts_total)"
          }
        ]
      }
    ]
  }
}
```

### Alert Rules

```yaml
# prometheus/alerts.yml
groups:
  - name: network
    rules:
      - alert: NodeDown
        expr: up{job="node"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Node {{ $labels.instance }} is down"

      # node_cpu_seconds_total is a cumulative counter, so the idle share is
      # taken from its rate; the alert fires when less than 20% is idle.
      - alert: HighCPU
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 10m
        labels:
          severity: warning

      - alert: SuspiciousProcess
        expr: security_suspicious_process > 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Suspicious process on {{ $labels.instance }}"

      - alert: PostgresDown
        expr: pg_up == 0
        for: 1m
        labels:
          severity: critical
```

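For the HighCPU rule it helps to remember that `node_cpu_seconds_total` is a cumulative counter of seconds spent per CPU mode, so utilization has to be derived from how fast the idle counter grows over a time window. The Python sketch below works through that arithmetic for two samples of the idle counter. The function name `cpu_usage_percent` and the sample values are illustrative assumptions.

```python
def cpu_usage_percent(idle_start: float, idle_end: float,
                      window_seconds: float, cores: int = 1) -> float:
    """Estimate CPU utilization from two samples of the idle-seconds counter.

    idle_start and idle_end are idle seconds summed over all cores at the
    beginning and end of the window.
    """
    # Share of the available CPU time (window * cores) that was spent idle.
    idle_fraction = (idle_end - idle_start) / (window_seconds * cores)
    # Clamp to [0, 1] to absorb small sampling jitter.
    idle_fraction = min(max(idle_fraction, 0.0), 1.0)
    return 100.0 * (1.0 - idle_fraction)


if __name__ == "__main__":
    # 60 idle seconds gained over a 300-second window on one core:
    # 20% idle, so roughly 80% busy, right at the alert threshold.
    print(cpu_usage_percent(1000.0, 1060.0, 300.0))
```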
---

## 📅 Roadmap

### Phase 1: Foundation (Weeks 1-2)
- [x] NODE1 rebuild + security
- [x] NODE3 setup + security
- [x] PostgreSQL on NODE1 and NODE3
- [ ] Ansible repository setup
- [ ] Terraform configs
- [ ] CI/CD pipeline

### Phase 2: Regional Controllers (Weeks 3-4)
- [ ] Deploy 3 region controllers
- [ ] Consul cluster setup
- [ ] Vault setup
- [ ] Prometheus federation

### Phase 3: First 50 Nodes (Weeks 5-8)
- [ ] EU region: 50 nodes
- [ ] Automated deployment testing
- [ ] Security audit
- [ ] Performance testing

### Phase 4: Scale to 150 (Weeks 9-12)
- [ ] US region: 50 nodes
- [ ] Asia region: 50 nodes
- [ ] Global monitoring
- [ ] Disaster recovery testing

### Phase 5: Production (Week 13+)
- [ ] Full production workloads
- [ ] 24/7 monitoring
- [ ] Automated incident response
- [ ] Continuous security audits

---

## 💰 Estimated Costs

| Resource | Per Node | 50 Nodes | 150 Nodes |
|----------|----------|----------|-----------|
| Hetzner CX31 | €10/mo | €500/mo | €1,500/mo |
| Storage (100GB) | €5/mo | €250/mo | €750/mo |
| Bandwidth | ~€5/mo | €250/mo | €750/mo |
| **Total** | **€20/mo** | **€1,000/mo** | **€3,000/mo** |

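
The totals in the cost table are a straight multiplication of the per-node figures by the node count. A small Python sketch of that arithmetic, using only the per-node prices from the table (the bandwidth figure is approximate there), makes it easy to re-run the estimate for other fleet sizes. The names `MONTHLY_EUR_PER_NODE` and `monthly_cost` are hypothetical.

```python
# Per-node monthly prices in EUR, copied from the cost table.
MONTHLY_EUR_PER_NODE = {
    "Hetzner CX31": 10,
    "Storage (100GB)": 5,
    "Bandwidth": 5,  # listed as approximate (~€5/mo) in the table
}


def monthly_cost(node_count: int) -> int:
    """Total monthly cost in EUR for node_count identical nodes."""
    return node_count * sum(MONTHLY_EUR_PER_NODE.values())


if __name__ == "__main__":
    for count in (1, 50, 150):
        print(f"{count} nodes: €{monthly_cost(count):,}/mo")
```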
---

## 📚 Additional resources

- [Ansible Best Practices](https://docs.ansible.com/ansible/latest/user_guide/playbooks_best_practices.html)
- [Terraform Hetzner Provider](https://registry.terraform.io/providers/hetznercloud/hcloud/latest/docs)
- [Kubernetes Network Policies](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
- [HashiCorp Vault](https://www.vaultproject.io/docs)
- [Prometheus Federation](https://prometheus.io/docs/prometheus/latest/federation/)

---

**Author:** Ivan Tytar & AI Assistant
**Last updated:** 2026-01-10