Compare commits

...

5 Commits

| SHA1 | Message | Date |
|------------|--------------------------------|----------------------------|
| eb6b3108e0 | improving things | 2025-10-28 15:36:03 -03:00 |
| 868fdce461 | improve wording | 2025-10-25 22:53:28 -03:00 |
| 5d436bb632 | adding loki and fixing version | 2025-10-24 22:41:40 -03:00 |
| 7d88137084 | adding monitoring | 2025-10-21 15:53:48 -03:00 |
| 703775e224 | updating beszel | 2025-10-21 15:53:42 -03:00 |
16 changed files with 553 additions and 53 deletions

View File

@@ -2,15 +2,15 @@
**A *forever-work-in-progress* self-hosted server setup**
Based on a multi-node k3s cluster running on VMs and bare metal hardware.
Runs on a multi-node k3s cluster deployed across VMs and bare-metal hosts.
The overall application configs are stored in a NFS share inside of a SSD that was purposed specifically for this. For that I'm using `nfs-subdir-external-provisioner` as a dynamic storage provisioner with specified paths on each PVC. Some other data is stored on a NAS server with a NFS share as well.
Application configuration is stored on an NFS share located on a dedicated SSD. This uses `nfs-subdir-external-provisioner` as a dynamic storage provisioner with PVC-specific paths. Additional data is stored on a NAS exported via NFS.
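In practice each application pins its data to a fixed subdirectory on the share through a PVC annotation, presumably wired up via the provisioner's `pathPattern` parameter. A minimal sketch with placeholder names, following the same pattern the manifests later in this diff use:

```yaml
# Minimal sketch of a PVC backed by nfs-subdir-external-provisioner with a
# fixed subdirectory on the NFS share. "example-config" and "example-data"
# are placeholder names, not values from this repo.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-config
  namespace: default
  annotations:
    nfs.io/storage-path: "example-data"   # subdirectory created on the share
spec:
  storageClassName: "nfs-client"
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```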
The cluster is running on `k3s` with `nginx` as the ingress controller. For load balancing I'm using `MetalLB` in layer 2 mode. I'm also using `cert-manager` for local CA and certificates (as Vaultwarden requires it).
The cluster runs `k3s` with `nginx` as the ingress controller. `MetalLB` is used in layer 2 mode for load balancing. `cert-manager` provides a local CA and issues certificates (required by Vaultwarden).
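Layer 2 mode in MetalLB comes down to an `IPAddressPool` plus an `L2Advertisement`; the repo's actual values live in `metallb-system/address-pool.yaml`, applied in SETUP.md. A minimal sketch with a placeholder address range:

```yaml
# Minimal sketch of a MetalLB layer 2 setup; the pool name and address range
# are placeholders, not the values from metallb-system/address-pool.yaml.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.240-192.168.1.250   # placeholder LAN range
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - default-pool
```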
For more information on setup, check out [SETUP.md](SETUP.md).
For setup details, see [SETUP.md](SETUP.md).
Also, the repository name is a reference to my local TLD which is `.haven` :)
The repository name references my local TLD, `.haven` ;)
## Namespaces
- default
@@ -27,26 +27,36 @@ Also, the repository name is a reference to my local TLD which is `.haven` :)
- AdGuardHome-2 (2nd instance)
- AdGuard-Sync
- infra
- Haven Notify (my own internal service)
- [Haven Notify](https://git.ivanch.me/ivanch/server-scripts/src/branch/main/haven-notify)
- Beszel
- Beszel Agent (running as DaemonSet)
- Code Config (vscode for internal config editing)
- Beszel Agent (running as a DaemonSet)
- Code Config (VS Code for internal config editing)
- WireGuard Easy
- dev
- Gitea Runner (x64)
- Gitea Runner (arm64)
- monitoring
- Grafana
- Prometheus
- Node Exporter
- Kube State Metrics
- Loki
- Alloy
#### Miscellaneous namespaces
- lab (A playground/sandbox namespace)
- nfs-pod (for testing and accessing NFS mounts through NFS)
- lab (a playground/sandbox namespace)
- nfs-pod (for testing and accessing NFS mounts)
- metallb-system
- MetalLB components
- cert-manager
- Cert-Manager components
- cert-manager components
## Todo:
- Move archivebox data to its own PVC on NAS
- Move uptimekuma to `infra` namespace
- Add links to each application docs
- Add links to server scripts
## Todo
- Move ArchiveBox data to its own PVC on the NAS
- Move Uptime Kuma to the infra namespace
- Add links to each application's documentation
- Add links to server scripts
- Move Alloy to the monitoring namespace
- Install Loki, Grafana, and Prometheus via Helm charts
- Configure Loki and Prometheus to use PVCs

View File

@@ -50,7 +50,7 @@ kubectl apply -f metallb-system/address-pool.yaml
## Install cert-manager
```bash
kubectl create namespace cert-manager
kubectl create ns cert-manager
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.15.1/cert-manager.yaml
```
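
The local CA mentioned in the README typically follows cert-manager's usual bootstrap pattern: a self-signed issuer, a CA `Certificate`, and a CA-backed `ClusterIssuer`. A minimal sketch with placeholder names (the repo's actual issuer manifests may differ):

```yaml
# Minimal sketch of a cert-manager local CA; all names here are placeholders.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: selfsigned
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: haven-root-ca
  namespace: cert-manager
spec:
  isCA: true
  commonName: haven-root-ca
  secretName: haven-root-ca          # the generated CA keypair lands here
  privateKey:
    algorithm: ECDSA
    size: 256
  issuerRef:
    name: selfsigned
    kind: ClusterIssuer
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: haven-ca
spec:
  ca:
    secretName: haven-root-ca        # issues certificates signed by the local CA
```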

View File

@@ -67,7 +67,7 @@ spec:
- port: 7575
targetPort: homarr-port
---
# 3) PersistentVolumeClaim (for /config)
# 3) PersistentVolumeClaim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
@@ -83,7 +83,7 @@ spec:
requests:
storage: 1Gi
---
# 4) Ingress (Traefik)
# 4) Ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:

View File

@@ -44,7 +44,7 @@ spec:
- port: 80
targetPort: 80
---
# 3) PersistentVolumeClaim (local storage via k3s local-path)
# 3) PersistentVolumeClaim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
@@ -60,7 +60,7 @@ spec:
requests:
storage: 1Gi
---
# 4) Ingress (Traefik)
# 4) Ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:

View File

@@ -49,7 +49,7 @@ spec:
- port: 8080
targetPort: searxng-port
---
# 3) PersistentVolumeClaim (for /config)
# 3) PersistentVolumeClaim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
@@ -65,7 +65,7 @@ spec:
requests:
storage: 1Gi
---
# 4) Ingress (Traefik)
# 4) Ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:

View File

@@ -63,7 +63,7 @@ spec:
- port: 3001
targetPort: uptimekuma-port
---
# 3) PersistentVolumeClaim (for /config)
# 3) PersistentVolumeClaim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
@@ -79,7 +79,7 @@ spec:
requests:
storage: 1Gi
---
# 4) Ingress (Traefik)
# 4) Ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:

View File

@@ -75,7 +75,7 @@ spec:
requests:
storage: 1Gi
---
# 4) Ingress (Traefik)
# 4) Ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
@@ -102,7 +102,7 @@ spec:
port:
number: 80
---
# 4) Ingress (Traefik)
# 4) Ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:

View File

@@ -4,18 +4,3 @@ kubectl create secret generic adguardhome-password \
--from-literal=password='your_adguardhome_password' \
--from-literal=username='your_adguardhome_username' -n dns
```
## Add AdGuardHome to CoreDNS configmap fallback:
1. Edit the CoreDNS configmap:
```bash
kubectl edit configmap coredns -n kube-system
```
2. Replace the `forward` line with the following:
```
forward . <ADGUARDHOME_IP> <ADGUARDHOME_IP_2>
```
This will use AdGuardHome as the primary DNS server and a secondary one as a fallback, instead of using the default Kubernetes CoreDNS server.
You may also use `/etc/resolv.conf` to forward to the node's own DNS resolver, but it depends on whether it's well configured or not. *Since it's Linux, we never know.*
Ideally, since DNS is required for fetching the container image, you would have AdGuardHome as first and then a public DNS server as second (fallback).

View File

@@ -22,7 +22,7 @@ spec:
secretKeyRef:
name: beszel-key
key: SECRET-KEY
image: henrygd/beszel-agent:0.12.10
image: henrygd/beszel-agent:0.14.1
imagePullPolicy: Always
name: beszel-agent
ports:

View File

@@ -15,15 +15,19 @@ spec:
labels:
app: beszel
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/arch
operator: In
values:
- amd64
containers:
- name: beszel
image: ghcr.io/henrygd/beszel/beszel:0.12.10
image: ghcr.io/henrygd/beszel/beszel:0.14.1
imagePullPolicy: Always
env:
- name: PUID
value: "1000"
- name: PGID
value: "1000"
ports:
- containerPort: 8090
name: beszel-port
@@ -49,7 +53,7 @@ spec:
- port: 80
targetPort: beszel-port
---
# 3) PersistentVolumeClaim (for /config)
# 3) PersistentVolumeClaim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
@@ -65,7 +69,7 @@ spec:
requests:
storage: 1Gi
---
# 4) Ingress (Traefik)
# 4) Ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:

View File

@@ -66,7 +66,7 @@ spec:
- port: 8443
targetPort: code-port
---
# 3) PersistentVolumeClaim (for /config)
# 3) PersistentVolumeClaim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
@@ -82,7 +82,7 @@ spec:
requests:
storage: 5Gi
---
# 4) Ingress (Traefik)
# 4) Ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:

monitoring/grafana.yaml (new file, 105 lines)
View File

@@ -0,0 +1,105 @@
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: grafana
name: grafana
namespace: monitoring
spec:
selector:
matchLabels:
app: grafana
template:
metadata:
labels:
app: grafana
spec:
securityContext:
fsGroup: 472
supplementalGroups:
- 0
containers:
- name: grafana
image: grafana/grafana:latest
imagePullPolicy: Always
ports:
- containerPort: 3000
name: http-grafana
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /robots.txt
port: 3000
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 30
successThreshold: 1
timeoutSeconds: 2
livenessProbe:
failureThreshold: 3
initialDelaySeconds: 30
periodSeconds: 10
successThreshold: 1
tcpSocket:
port: 3000
timeoutSeconds: 1
resources:
requests:
cpu: 250m
memory: 750Mi
volumeMounts:
- mountPath: /var/lib/grafana
name: grafana-pv
volumes:
- name: grafana-pv
persistentVolumeClaim:
claimName: grafana-pvc
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: grafana-pvc
namespace: monitoring
annotations:
nfs.io/storage-path: "grafana-data"
spec:
storageClassName: "nfs-client"
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
---
apiVersion: v1
kind: Service
metadata:
namespace: monitoring
name: grafana
spec:
ports:
- port: 3000
protocol: TCP
targetPort: http-grafana
selector:
app: grafana
type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
namespace: monitoring
name: grafana
spec:
ingressClassName: nginx
rules:
- host: grafana.haven
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: grafana
port:
number: 3000
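
Grafana's data sources can be added through the UI (its state lives on the PVC above), but provisioned as code they would look roughly like this. A sketch only; the Loki tenant ID `haven` is a placeholder, needed because the Loki config below sets `auth_enabled: true`, which makes Loki expect an `X-Scope-OrgID` header:

```yaml
# Hypothetical /etc/grafana/provisioning/datasources/datasources.yaml;
# not part of this diff, just a sketch of how Grafana would reach the
# Prometheus and Loki services added here.
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus.monitoring.svc.cluster.local:9090
    isDefault: true
  - name: Loki
    type: loki
    access: proxy
    url: http://loki.monitoring.svc.cluster.local:3100
    jsonData:
      httpHeaderName1: X-Scope-OrgID   # required since auth_enabled is true
    secureJsonData:
      httpHeaderValue1: haven          # placeholder tenant ID
```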

View File

@@ -0,0 +1,109 @@
apiVersion: v1
kind: ServiceAccount
metadata:
name: kube-state-metrics
namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: kube-state-metrics
rules:
- apiGroups: [""]
resources:
- nodes
- pods
- services
- endpoints
- namespaces
- replicationcontrollers
verbs: ["list", "watch"]
- apiGroups: ["extensions", "apps"]
resources:
- daemonsets
- deployments
- replicasets
- statefulsets
verbs: ["list", "watch"]
- apiGroups: ["batch"]
resources:
- cronjobs
- jobs
verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
resources:
- horizontalpodautoscalers
verbs: ["list", "watch"]
- apiGroups: ["policy"]
resources:
- poddisruptionbudgets
verbs: ["list", "watch"]
- apiGroups: ["storage.k8s.io"]
resources:
- storageclasses
- volumeattachments
verbs: ["list", "watch"]
- apiGroups: ["apps"]
resources:
- replicasets
verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: kube-state-metrics
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: kube-state-metrics
subjects:
- kind: ServiceAccount
name: kube-state-metrics
namespace: monitoring
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: kube-state-metrics
namespace: monitoring
labels:
app: kube-state-metrics
spec:
replicas: 1
selector:
matchLabels:
app: kube-state-metrics
template:
metadata:
labels:
app: kube-state-metrics
spec:
serviceAccountName: kube-state-metrics
containers:
- name: kube-state-metrics
image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.13.0
ports:
- name: http-metrics
containerPort: 8080
resources:
requests:
cpu: 50m
memory: 64Mi
limits:
cpu: 200m
memory: 256Mi
---
apiVersion: v1
kind: Service
metadata:
name: kube-state-metrics
namespace: monitoring
labels:
app: kube-state-metrics
spec:
ports:
- name: http-metrics
port: 8080
targetPort: http-metrics
selector:
app: kube-state-metrics

monitoring/loki.yaml (new file, 101 lines)
View File

@@ -0,0 +1,101 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: loki
namespace: monitoring
spec:
replicas: 1
selector:
matchLabels:
app: loki
template:
metadata:
labels:
app: loki
spec:
containers:
- name: loki
image: grafana/loki:3
args: ["-config.file=/etc/loki/config/config.yaml"]
ports:
- containerPort: 3100
volumeMounts:
- name: config
mountPath: /etc/loki/config
- name: loki-storage
mountPath: /tmp/loki
volumes:
- name: config
configMap:
name: loki-config
- name: loki-storage
emptyDir:
medium: Memory
---
apiVersion: v1
kind: ConfigMap
metadata:
name: loki-config
namespace: monitoring
data:
config.yaml: |
auth_enabled: true
server:
http_listen_port: 3100
common:
ring:
instance_addr: 127.0.0.1
kvstore:
store: inmemory
replication_factor: 1
path_prefix: /tmp/loki
querier:
multi_tenant_queries_enabled: true
schema_config:
configs:
- from: "2024-01-01"
store: tsdb
object_store: filesystem
schema: v13
index:
prefix: index_
period: 24h
storage_config:
tsdb_shipper:
active_index_directory: /tmp/loki/index
cache_location: /tmp/loki/cache
filesystem:
directory: /tmp/loki/chunks
limits_config:
allow_structured_metadata: true
retention_period: 0
ingester:
lifecycler:
ring:
kvstore:
store: inmemory
replication_factor: 1
chunk_idle_period: 1m
max_chunk_age: 5m
chunk_target_size: 1536000
compactor:
retention_enabled: false
---
apiVersion: v1
kind: Service
metadata:
name: loki
namespace: monitoring
spec:
ports:
- port: 3100
targetPort: 3100
name: http
selector:
app: loki

View File

@@ -0,0 +1,56 @@
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: node-exporter
namespace: monitoring
labels:
app: node-exporter
spec:
selector:
matchLabels:
app: node-exporter
template:
metadata:
labels:
app: node-exporter
spec:
hostNetwork: true
containers:
- name: node-exporter
image: prom/node-exporter:latest
imagePullPolicy: Always
args:
- "--path.rootfs=/host"
ports:
- containerPort: 9100
hostPort: 9100
name: metrics
protocol: TCP
resources:
requests:
memory: "50Mi"
cpu: "100m"
limits:
memory: "100Mi"
cpu: "200m"
volumeMounts:
- name: host
mountPath: /host
readOnly: true
volumes:
- name: host
hostPath:
path: /
---
apiVersion: v1
kind: Service
metadata:
name: node-exporter
namespace: monitoring
spec:
selector:
app: node-exporter
ports:
- name: metrics
port: 9100
targetPort: metrics

monitoring/prometheus.yaml (new file, 130 lines)
View File

@@ -0,0 +1,130 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: prometheus
namespace: monitoring
labels:
app: prometheus
spec:
replicas: 1
selector:
matchLabels:
app: prometheus
template:
metadata:
labels:
app: prometheus
spec:
serviceAccountName: prometheus
containers:
- name: prometheus
image: prom/prometheus:latest
args:
- "--config.file=/etc/prometheus/prometheus.yml"
- "--storage.tsdb.path=/prometheus"
- "--storage.tsdb.retention.time=1d"
- "--web.enable-lifecycle"
ports:
- containerPort: 9090
name: web
volumeMounts:
- name: prometheus-config-volume
mountPath: /etc/prometheus
- name: prometheus-storage
mountPath: /prometheus
resources:
requests:
memory: "500Mi"
cpu: "200m"
limits:
memory: "1Gi"
cpu: "500m"
volumes:
- name: prometheus-config-volume
persistentVolumeClaim:
claimName: prometheus-pvc
- name: prometheus-storage
emptyDir:
medium: Memory
sizeLimit: 256Mi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: prometheus-pvc
namespace: monitoring
annotations:
nfs.io/storage-path: "prometheus-config"
spec:
storageClassName: "nfs-client"
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
---
# Service URL - http://prometheus.monitoring.svc.cluster.local:9090
apiVersion: v1
kind: Service
metadata:
name: prometheus
namespace: monitoring
labels:
app: prometheus
spec:
ports:
- name: web
port: 9090
targetPort: web
selector:
app: prometheus
type: ClusterIP
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: prometheus
namespace: monitoring
labels:
app: prometheus
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: prometheus
namespace: monitoring
labels:
app: prometheus
rules:
- apiGroups: [""]
resources:
- nodes
- nodes/proxy
- services
- endpoints
- pods
verbs: ["get", "list", "watch"]
- apiGroups: ["extensions"]
resources:
- ingresses
verbs: ["get", "list", "watch"]
- apiGroups: ["networking.k8s.io"]
resources:
- ingresses
verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: prometheus
namespace: monitoring
labels:
app: prometheus
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: prometheus
subjects:
- kind: ServiceAccount
name: prometheus
namespace: monitoring
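
The Prometheus configuration itself is not in the repo; it sits on the `prometheus-pvc` volume mounted at `/etc/prometheus`. A minimal sketch of what a matching `prometheus.yml` could contain, wired to the Node Exporter and Kube State Metrics manifests above (a sketch under assumptions, not the actual file):

```yaml
# Hypothetical prometheus.yml stored on the NFS-backed prometheus-pvc;
# the real file is not part of this diff.
global:
  scrape_interval: 30s
scrape_configs:
  # Node Exporter: discovered through its Service endpoints (the RBAC above
  # grants list/watch on endpoints).
  - job_name: node-exporter
    kubernetes_sd_configs:
      - role: endpoints
    relabel_configs:
      - source_labels: [__meta_kubernetes_endpoints_name]
        regex: node-exporter
        action: keep
  # Kube State Metrics: scraped directly via its ClusterIP Service.
  - job_name: kube-state-metrics
    static_configs:
      - targets: ["kube-state-metrics.monitoring.svc.cluster.local:8080"]
```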