Demystifying Evals for AI Agents
Source: Anthropic Engineering, "Demystifying evals for AI agents". Good evaluations help teams ship AI agents more confidently. Without them, it's easy to get stuck in reactive loops, catching issues only in production, where fixing one failure creates others. Evals make problems and behavioral changes visible before they affect users, and their value compounds over the lifecycle of an agent. ...
OpenIM Self-Hosting Pitfalls (1): Stuck Online/Offline Process
We deployed OpenIM version 3.5.1 in our Kubernetes environment for our company’s IM scenarios. During development and operation, we encountered some issues. Here, I’ll document the details of the problems and the resolution process.

Problem Trigger Scenario

We wrote a stress-testing program to simulate a typical user scenario:

1. Establish a connection (online)
2. Send a message
3. Receive a reply
4. Disconnect (offline)

Test settings: 100 concurrent user accounts, each continuously repeating the above process to simulate high-frequency online/offline switching. ...
OpenIM Self-Hosting Pitfalls (2): Socket Leak
We deployed OpenIM version 3.5.1 in our Kubernetes environment for our company’s IM scenarios. During development and operation, we encountered some issues. Here, I’ll document the details of the problems and the resolution process.

Problem Trigger Scenario

300 users online simultaneously, sending one-on-one messages irregularly.

Problem Phenomenon

From the monitoring metrics, we can see:

- A large number of goroutines in openimserver-openim-push and openimserver-openim-msggateway.
- A large number of socket connections in the openimserver-openim-push and openimserver-openim-msggateway pods. ...
OpenIM Self-Hosting Pitfalls (3): Scaling Errors
We deployed OpenIM version 3.5.1 in our Kubernetes environment for our company’s IM scenarios. During development and operation, we encountered some issues. Here, I’ll document the details of the problems and the resolution process.

Problem Description

Simply put: users are clearly online but cannot receive real-time messages. They receive only push notifications, and the messages appear in the chat interface only after a delay. The problem occurs only after scaling out; everything works fine with a single node.

The root cause is an inconsistency in OpenIM’s internal load-balancing mechanism, which causes messages to be sent to the wrong service node. ...
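Why an inconsistency between routing decisions shows up only after scaling out can be illustrated with a small sketch. This is a hypothetical model, not OpenIM’s actual code: `pickFNV` and `pickCRC` are two made-up node-selection strategies, and `misrouted` counts users for whom the node chosen at connect time differs from the node chosen at delivery time. With one node, any strategy returns the same node, so nothing is misrouted. With several nodes, two disagreeing strategies send part of the traffic to a node where the user is not connected.

```go
package main

import (
	"fmt"
	"hash/crc32"
	"hash/fnv"
)

// pickFNV and pickCRC are two hypothetical node-selection strategies.
// Each is a valid load balancer on its own; the bug appears only when the
// component that records a user's connection and the component that
// delivers messages to that user use different ones.
func pickFNV(userID string, nodes []string) string {
	h := fnv.New32a()
	h.Write([]byte(userID))
	return nodes[h.Sum32()%uint32(len(nodes))]
}

func pickCRC(userID string, nodes []string) string {
	return nodes[crc32.ChecksumIEEE([]byte(userID))%uint32(len(nodes))]
}

// misrouted counts users whose connection node (chosen by connect) differs
// from the node the delivery path would target (chosen by deliver).
func misrouted(users int, nodes []string, connect, deliver func(string, []string) string) int {
	n := 0
	for i := 0; i < users; i++ {
		id := fmt.Sprintf("user-%d", i)
		if connect(id, nodes) != deliver(id, nodes) {
			n++
		}
	}
	return n
}

func main() {
	one := []string{"gw-0"}
	three := []string{"gw-0", "gw-1", "gw-2"}
	// Single node: every strategy picks gw-0, so no user is misrouted.
	fmt.Println("1 node, mixed strategies:", misrouted(1000, one, pickFNV, pickCRC))
	// Scaled out with mismatched strategies: some users are routed to the wrong node.
	fmt.Println("3 nodes, mixed strategies:", misrouted(1000, three, pickFNV, pickCRC))
	// Scaled out with one shared strategy: connect and deliver agree again.
	fmt.Println("3 nodes, shared strategy:", misrouted(1000, three, pickFNV, pickFNV))
}
```

The same symptom can also arise when both sides use the same hash but see the node list in a different order or at a different moment during a rollout, so a fix generally needs both a single shared selection function and a single, consistently ordered view of the node list.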