OpenIM Go Backend Deadlock Socket K8s Consistent Hashing

OpenIM Self-Hosting Pitfalls (1): Stuck Online/Offline Process

We deployed OpenIM version 3.5.1 in our Kubernetes environment for our company’s IM scenarios. During development and operation, we encountered some issues. Here, I’ll document the details of the problems and the resolution process.

Problem Trigger Scenario

We wrote a stress testing program to simulate typical user scenarios:

1. Establish connection (Online)
2. Send message
3. Receive reply
4. Disconnect (Offline)

Test settings: 100 concurrent user accounts, with each account continuously repeating the above process to simulate high-frequency online/offline scenarios. ...
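The stress loop described in this excerpt can be sketched roughly as follows. This is a minimal illustration, not the author's actual test program: the `Session` interface, `fakeSession`, and `RunStress` are hypothetical names introduced here, and the in-memory fake stands in for a real OpenIM connection so the sketch runs without a server.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Session is a hypothetical abstraction over one IM connection.
// A real test would back it with an SDK or a raw WebSocket client.
type Session interface {
	Connect(userID string) error
	Send(msg string) error
	Receive() (string, error)
	Close() error
}

// fakeSession is an in-memory stand-in so the sketch runs without a server.
type fakeSession struct{ last string }

func (s *fakeSession) Connect(userID string) error { return nil }
func (s *fakeSession) Send(msg string) error       { s.last = msg; return nil }
func (s *fakeSession) Receive() (string, error)    { return "echo:" + s.last, nil }
func (s *fakeSession) Close() error                { return nil }

// RunStress starts one goroutine per user. Each goroutine repeats the
// online -> send -> receive -> offline cycle `rounds` times.
// It returns the number of cycles that completed without error.
func RunStress(users, rounds int, newSession func() Session) int64 {
	var completed int64
	var wg sync.WaitGroup
	for u := 0; u < users; u++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			userID := fmt.Sprintf("user-%d", id)
			for r := 0; r < rounds; r++ {
				s := newSession()
				if err := s.Connect(userID); err != nil {
					continue // could not go online; try the next round
				}
				if err := s.Send("ping"); err == nil {
					if _, err := s.Receive(); err == nil {
						atomic.AddInt64(&completed, 1)
					}
				}
				s.Close() // go offline even if send or receive failed
			}
		}(u)
	}
	wg.Wait()
	return completed
}

func main() {
	// 100 concurrent accounts as in the excerpt; 5 rounds is an arbitrary choice.
	fmt.Println(RunStress(100, 5, func() Session { return &fakeSession{} }))
}
```

Opening a fresh session on every round is what makes this a high-frequency online/offline test; reusing one connection per user would exercise only the messaging path.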

February 12, 2026 · 6 min · 1142 words · Simon Sun

OpenIM Self-Hosting Pitfalls (2): Socket Leak

We deployed OpenIM version 3.5.1 in our Kubernetes environment for our company’s IM scenarios. During development and operation, we encountered some issues. Here, I’ll document the details of the problems and the resolution process.

Problem Trigger Scenario

300 users online simultaneously, sending one-on-one messages irregularly.

Problem Phenomenon

From the monitoring metrics, we can see:

- A large number of goroutines in openimserver-openim-push and openimserver-openim-msggateway.
- A large number of socket connections in openimserver-openim-push and openimserver-openim-msggateway pods. ...

February 12, 2026 · 2 min · 268 words · Simon Sun

OpenIM Self-Hosting Pitfalls (3): Scaling Errors

We deployed OpenIM version 3.5.1 in our Kubernetes environment for our company’s IM scenarios. During development and operation, we encountered some issues. Here, I’ll document the details of the problems and the resolution process.

Problem Description

Simply put: Users are clearly online but cannot receive real-time messages. They can only receive push notifications, and messages only appear in the chat interface after a delay.

This problem only occurs after scaling out. Everything works fine with a single node. The root cause is an inconsistency in OpenIM’s internal load balancing mechanism, causing messages to be sent to the wrong service node. ...
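To illustrate how inconsistent load balancing can send a message to the wrong node, here is a minimal consistent-hashing sketch. It is not OpenIM's implementation: `Ring`, `NewRing`, `Locate`, `Mismatches`, and the node names `gw-0` and `gw-1` are all assumptions made for this example. It shows that two components holding different member lists can disagree about which node owns a given user.

```go
package main

import (
	"fmt"
	"hash/crc32"
	"sort"
)

// Ring is a simple consistent-hash ring with virtual nodes.
type Ring struct {
	points []uint32          // sorted hash positions on the ring
	owner  map[uint32]string // hash position -> node name
}

// NewRing builds a ring from a member list. Nodes are sorted first so that
// two callers with the same members always build an identical ring.
func NewRing(nodes []string, replicas int) *Ring {
	sorted := append([]string(nil), nodes...)
	sort.Strings(sorted)
	r := &Ring{owner: make(map[uint32]string)}
	for _, n := range sorted {
		for i := 0; i < replicas; i++ {
			h := crc32.ChecksumIEEE([]byte(fmt.Sprintf("%s#%d", n, i)))
			if _, taken := r.owner[h]; taken {
				continue // rare hash collision: keep the first owner
			}
			r.owner[h] = n
			r.points = append(r.points, h)
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// Locate returns the node that owns key: the first ring position at or
// after the key's hash, wrapping around to the start of the ring.
func (r *Ring) Locate(key string) string {
	if len(r.points) == 0 {
		return ""
	}
	h := crc32.ChecksumIEEE([]byte(key))
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0
	}
	return r.owner[r.points[i]]
}

// Mismatches counts how many of the first `keys` synthetic user IDs
// are routed to different nodes by the two rings.
func Mismatches(a, b *Ring, keys int) int {
	n := 0
	for i := 0; i < keys; i++ {
		k := fmt.Sprintf("user-%d", i)
		if a.Locate(k) != b.Locate(k) {
			n++
		}
	}
	return n
}

func main() {
	// Hypothetical situation after a scale-out: the sender still sees one
	// gateway, while users are actually spread across two.
	senderView := NewRing([]string{"gw-0"}, 100)
	actualView := NewRing([]string{"gw-0", "gw-1"}, 100)
	fmt.Println("misrouted users:", Mismatches(senderView, actualView, 1000))
}
```

With a single node both views are trivially identical, which matches the observation that the problem only appears after scaling out. Every user counted as misrouted here is one whose message would be delivered to a node that does not hold their connection.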

February 12, 2026 · 4 min · 683 words · Simon Sun