Debugging WS issues?

marinakai · September 9, 2025, 10:00pm

I’m working on an OGS client. I’ve got it authenticating to the WS API with a JWT, and I presume the authentication is working because the server sends me back an active-bots message and a bunch of active_game messages.

To make a move, I’m sending a WS message like ["game/move",{"game_id":78858441,"move":"df"},5] (where 5 is an autogenerated ID, “df” is a valid move for the game with that ID, and it is my turn in that game). I immediately get back [5, null, null], which I take to mean “we received that message successfully”. However, the game doesn’t actually get updated with that move when I refresh my app or the website. (This is pointed at production, not beta.) I’m logging all received messages and errors, and I’m not getting anything back other than the barebones acknowledgment.
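For reference, the request/ack round trip described above can be sketched like this. It's a minimal sketch, assuming the ack is a 3-element array `[request_id, result, error]` (that's how `[5, null, null]` reads, but I haven't confirmed it against the server code); `encodeRequest` and `parseAck` are hypothetical helper names.

```typescript
// Hypothetical helpers for the realtime frame shapes seen in this thread.
// Outgoing (observed): [command, payload, requestId]
// Ack (assumed):       [requestId, result, error]

type OutgoingFrame = [string, unknown, number];

interface Ack {
  requestId: number;
  result: unknown;
  error: unknown;
}

let nextRequestId = 0;

// Serialize a command with an auto-incrementing request ID.
function encodeRequest(command: string, payload: unknown): { id: number; text: string } {
  const id = nextRequestId++;
  const frame: OutgoingFrame = [command, payload, id];
  return { id, text: JSON.stringify(frame) };
}

// Return an Ack if the frame looks like [number, result, error], otherwise null
// (e.g. for server-pushed events such as active_game).
function parseAck(text: string): Ack | null {
  const parsed: unknown = JSON.parse(text);
  if (Array.isArray(parsed) && parsed.length === 3 && typeof parsed[0] === "number") {
    return { requestId: parsed[0], result: parsed[1], error: parsed[2] };
  }
  return null;
}
```

Keep in mind that an ack like `[5, null, null]` only says the request arrived without a reported error; as this thread shows, it doesn't mean the move was applied.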

When I make a move on online-go.com and sniff the traffic, I see move messages like ["game/move",{"game_id":79025745,"move":"oo","blur":16498,"clock":{"main_time":296929425.00000006,"timed_out":false}},2], and I receive the same [2, null, null] in response. The ClientToServer type declaration in the Goban repo indicates that blur and clock are both optional. (I’ve added blur to my payload with no change; my codebase isn’t properly handling the game clock yet, so I’d rather not fake time data.) So I know the API isn’t expecting, e.g., a player ID, which some older examples include.
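Based on the two frames above, the move payload can be typed with `blur` and `clock` optional. This is a hypothetical local type written from the observed traffic, not Goban's actual `ClientToServer` declaration, so check the repo for the real field types.

```typescript
// Hypothetical payload types inferred from the frames in this post.
interface GameMoveClock {
  main_time: number;
  timed_out: boolean;
}

interface GameMovePayload {
  game_id: number;
  move: string; // SGF-style coordinate pair, e.g. "df"
  blur?: number; // optional; the web client sent 16498 in the sniffed frame
  clock?: GameMoveClock; // optional; omitted here rather than faked
}

// Build a payload, adding optional fields only when they are provided.
function buildMovePayload(gameId: number, move: string, blur?: number): GameMovePayload {
  const payload: GameMovePayload = { game_id: gameId, move };
  if (blur !== undefined) {
    payload.blur = blur;
  }
  return payload;
}
```

Leaving `clock` out entirely (rather than sending a placeholder) matches what this thread later confirms works.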

I’m curious if anyone knows the specific issue I’m facing offhand, but more than that: what strategies are people using to debug the RT API when you get zero feedback whether you’ve done something correctly other than seeing if it actually affects the gamestate or not? I feel like my dev velocity is slower than I’d like, and I don’t know if there’s something I’m missing or if that’s just the state of the API.

GreenAsJade · September 9, 2025, 11:18pm

Regarding clock, there’s a comment that says:

    /** Returns true if we believe the client is being responsible and will send
     *  a timed_out message if they've timed out. If we believe this, we give them
     *  a grace period for sending that message, otherwise we use server time to
     *  time them out immediately.
     */

… so it seems you don’t have to send that (as you correctly thought).

Also, the codebase is full of cases where it will reply with an error if there is one.

I haven’t yet found a path where it can respond without an error but do nothing.

benjito · September 9, 2025, 11:29pm

Hi! I’m working on similar (WeiqiHub/tree/automatch/lib/game_client/ogs).

I felt the lack of error feedback to be a problem as well. I was having a lot of trouble getting automatch to work, and ultimately it was because I was accidentally sending an empty string instead of the real jwt. Hopefully the maintainers consider adding clear error messages, as the lack thereof creates dev friction.

I don’t have much advice here, besides triple-checking things, and doing exactly as the official impl does. I found my issue by printing the jwt.

I send dummy values here and it’s working; I didn’t try removing them. (source)

EDIT: okay, I tried again with the values omitted, and it works fine too

If you’re open to sharing the source, I’d be happy to take a look.

Also, here’s my login/WS setup code; I assume you’re doing something similar: OGSGameClient.login()

GreenAsJade · September 9, 2025, 11:34pm

I wonder if the “throws” don’t make it into an error packet back?

It appears as if they should.

biab.

benjito · September 9, 2025, 11:35pm

Is this the case? When I was sending an empty jwt string to “authenticate”, I got nothing.

EDIT: just saw your response. great if the error infra is (almost) there already!

marinakai · September 10, 2025, 12:58am

It turns out I was using an expired (but previously valid) JWT, my code path for requesting a refresh token isn’t fully up and running yet and I was caching too aggressively.

Definitely agree that there should be an error threaded through if the user is doing a WS authentication with an invalid token, especially with it half-behaving like a valid token by returning valid (public) data as if login was successful!
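Until the server reports this, both failure modes in this thread (benjito's empty string and this expired token) can be caught client-side before sending `authenticate`. A minimal sketch, assuming the `user_jwt` from `/config` is a standard three-part JWT and that any `exp` claim is in seconds since the epoch. I haven't verified that OGS sets `exp`, so a token without one is treated as "can't tell" rather than rejected. The helper names are hypothetical, and nothing here verifies the signature.

```typescript
// Hypothetical pre-flight check before sending ["authenticate", {jwt}, id].
const B64_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Decode base64url into a byte string (one char per byte). Fine for ASCII JSON
// such as the `exp` claim; non-ASCII claim values would need a real UTF-8 decode.
function base64UrlDecode(input: string): string {
  let bits = 0;
  let value = 0;
  let out = "";
  for (const ch of input.replace(/-/g, "+").replace(/_/g, "/")) {
    if (ch === "=") break;
    const idx = B64_ALPHABET.indexOf(ch);
    if (idx < 0) throw new Error(`invalid base64url character: ${ch}`);
    value = (value << 6) | idx;
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      out += String.fromCharCode((value >> bits) & 0xff);
      value &= (1 << bits) - 1; // keep only the bits not yet emitted
    }
  }
  return out;
}

// Return the numeric `exp` claim, or null if the payload has none.
function jwtExpiry(jwt: string): number | null {
  const parts = jwt.split(".");
  if (parts.length !== 3) throw new Error("not a three-part JWT");
  const claims = JSON.parse(base64UrlDecode(parts[1]));
  return typeof claims.exp === "number" ? claims.exp : null;
}

// False for empty or malformed tokens, or tokens expiring within `skewSec`.
function isJwtUsable(jwt: string, nowSec: number, skewSec = 30): boolean {
  if (jwt.trim() === "") return false;
  let exp: number | null;
  try {
    exp = jwtExpiry(jwt);
  } catch {
    return false;
  }
  // No `exp` claim: the token alone can't tell us, so don't block on it.
  return exp === null || nowSec + skewSec < exp;
}
```

Call it with `Math.floor(Date.now() / 1000)` before authenticating, and treat `false` as "refresh first" instead of trusting whatever data the socket sends back.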

GreenAsJade · September 10, 2025, 2:12am

Yeah - probably it doesn’t even get to the code I was looking at if auth is broken.

We’ll see about improving the error response there, though you know … it’s in the queue.

marinakai · September 12, 2025, 3:13am

Hrm, I spoke too soon, this isn’t working again.

@benjito still figuring out my plan for properly open-sourcing things, but here’s the relevant code: OGS API client code · GitHub

As far as I can tell, my path is:

Make an OAuth auth request (always password, not handling refresh tokens yet)
Use the received Bearer token to call /config
Open a WS connection, then send a message of the form ["authenticate", {"jwt": <user_jwt received from /config>}, <id>], receive empty acknowledgement (same as prod OGS, except I’m not yet sending client metadata, which the docs say is optional; I’ve seen examples include the player’s ID, but prod OGS does not)
Make a valid move of the form ["game/move", { game_id: <game ID>, move: <SGF Point string> }, <id>], receive empty acknowledgement (same as prod OGS except for the blur/clock options we’ve confirmed are optional).

Result is as before, silently fails. I’ve confirmed my JWT is valid (/config is giving me the same exact JWT string the prod client is using in my browser), and other than the missing optional params I can’t see any difference between my authenticate and game/move calls and the one the official client is sending over the wire. I’m especially confused since this was working at one point a few days ago!

marinakai · September 12, 2025, 3:19am

Okay, leaving up for posterity in case it’s helpful to anyone else, but the problem was me game/connect-ing to multiple games that were not the one I was trying to make moves on — when I shrunk it down to only connecting to the active game, sending a move for that game works.

If someone with more context on the system can confirm that’s expected, I’m happy to PR in a docs change. ClientToServer just says “Once connected, the client will receive game updates relevant to the game” (hence my thinking “sure, give me real-time updates on all games in the user’s active game list” and connecting to all of them); nothing there suggests that connecting is stateful in a way that affects, e.g., your ability to send moves.

GreenAsJade · September 12, 2025, 3:44am

Just intuitively, it makes sense that the backend would expect one socket per game (in fact, one socket per browser tab). (By “intuitively” I mean this is what I think I’ve seen in the code, but I haven’t checked in detail yet.)

benjito · September 12, 2025, 5:16am

Hmm, I don’t think multiple game connections are usually a problem. The Watch page and home page both do this, though admittedly you can’t send moves through those pages.

This would be unexpected behavior IMO.

I wrote a quick demo in the official client with two separate games sharing a socket, and I’m able to submit moves: GitHub - benjaminpjones/online-go.com at two-games-one-socket (the source code doesn’t make it totally obvious it’s the same socket, but I confirmed in the network tab)

Are you sure you’re sending the right move ID when there are multiple?

Screen Recording 2025-09-12 at 12.45.09 AM

GreenAsJade · September 12, 2025, 5:20am

Yeah I had a look, intuition was wrong: the endpoint literally just submits the received move to the target game

@benjito can you provoke an error response?

The handler is more or less

    const g = game_singleton(args.game_id);
    g.move(
        args.player_id,
        args.move, ...
    )
        .catch((err) => {
            ...
            socket.send(`game/${args.game_id}/error`, `${err.toString()} ${last_move}`);
        });

There are heaps of places inside g.move that throw a helpful error message.

I’m wondering if it just doesn’t make it back?
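If those errors are sent, they should arrive as a server-pushed event named `game/<game_id>/error`, going by the `socket.send` call in the handler above. Here's a sketch of a client-side classifier that separates acks from events and pulls that error event out explicitly, so it can't get lost among `active_game` traffic. It assumes pushed events arrive as `[event_name, data]` and acks as `[request_id, result, error]`; that's consistent with the client logs in this thread but not confirmed from the server's socket layer. `classifyFrame` is a hypothetical name.

```typescript
// Hypothetical classifier for incoming realtime frames.
type Incoming =
  | { kind: "ack"; requestId: number; result: unknown; error: unknown }
  | { kind: "game_error"; gameId: number; message: unknown }
  | { kind: "event"; name: string; data: unknown }
  | { kind: "unknown"; raw: unknown };

// Matches the event name built by `game/${args.game_id}/error` above.
const GAME_ERROR_RE = /^game\/(\d+)\/error$/;

function classifyFrame(text: string): Incoming {
  const raw: unknown = JSON.parse(text);
  if (!Array.isArray(raw)) return { kind: "unknown", raw };
  const [head, second, third] = raw;
  if (typeof head === "number") {
    // Assumed ack shape: [requestId, result, error]
    return { kind: "ack", requestId: head, result: second, error: third };
  }
  if (typeof head === "string") {
    const m = GAME_ERROR_RE.exec(head);
    if (m) return { kind: "game_error", gameId: Number(m[1]), message: second };
    return { kind: "event", name: head, data: second };
  }
  return { kind: "unknown", raw };
}
```

Logging every `game_error` (and every `unknown`) loudly is a cheap way to find out whether these errors are actually reaching the client.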

marinakai · September 12, 2025, 5:30am

Yeah, I’m definitely using the correct game ID. That code path also hasn’t been touched at all between the times this has worked and the times it hasn’t.

marinakai · September 12, 2025, 5:35am

Tomorrow I’ll poke and see if I can rip out all my UI etc code and just get a minimal working example that sends the set of messages I was sending. That might either be a useful failing testcase, or shake out for me what I’m doing wrong.

benjito · September 12, 2025, 4:06pm

One thing that helps me is printing all traffic going in and out of my app. On web the Network tab makes this easy, but for other apps just update the send and handler methods:

export function sendMessage(command: string, data: unknown) {
    const message = JSON.stringify([command, data, ++requestId])
    console.log(`sending: ${message}`)
    ...
}

And looks like you’re already logging the handleCommand side.

Logging the inputs and outputs might help reduce to a reproducible test case as well.
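Along those lines, here's a sketch that logs both directions in one place and redacts the JWT so the output is safe to paste into a thread like this one. `MinimalSocket` and `LoggingSender` are hypothetical names (a browser `WebSocket` satisfies the `send(data: string)` shape); the redaction assumes the token sits in a `"jwt"` string field, as in the `authenticate` frame.

```typescript
// The only part of a WebSocket this sketch needs (hypothetical interface).
interface MinimalSocket {
  send(data: string): void;
}

// Replace the value of any "jwt" string field before it hits the log.
function redact(text: string): string {
  return text.replace(/"jwt"\s*:\s*"[^"]*"/g, '"jwt":"<redacted JWT>"');
}

class LoggingSender {
  private requestId = 0;
  private socket: MinimalSocket;
  private log: (line: string) => void;

  constructor(socket: MinimalSocket, log: (line: string) => void = console.log) {
    this.socket = socket;
    this.log = log;
  }

  // Send [command, data, id], logging exactly what goes over the wire (redacted).
  send(command: string, data: unknown): number {
    const id = this.requestId++;
    const message = JSON.stringify([command, data, id]);
    this.log(`sending: ${redact(message)}`);
    this.socket.send(message);
    return id;
  }

  // Call from the socket's message handler with the raw frame text.
  received(text: string): void {
    this.log(`received: ${redact(text)}`);
  }
}
```

Logging the raw text (rather than a parsed summary like "received message active_game") makes it easier to diff your frames against the official client's frames in the Network tab.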

marinakai · September 12, 2025, 5:09pm

Yeah, I’m typically debugging this using the network tab, since IMO that makes it easier to inspect both the REST requests and the WS messages than parsing messy logs. This is a mobile app that’s running as a “native” view via an app wrapper, so even when I’m running it on-device on iOS I still have access to the browser devtools and inspector.

marinakai · September 12, 2025, 6:34pm

Hmm, okay, so this is interesting.

test.js · GitHub is about as minimal a JS WS setup as I could come up with. Pop in a valid JWT from inspecting the /config call in the network tab of a production copy of online-go.com (since my auth flow returns the same JWT from /config as prod, I figured this was a good way to isolate the problem without dealing with OAuth), plus the game ID and move coordinates of a valid move, and it will:

Connect to WS
Send an authenticate message once the socket is open (there’s basic queueing functionality)
Log out all of the messages it receives
After 5 seconds, send a hardcoded move.
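The queueing part of a script like that can be sketched as follows: messages sent before the socket opens are held and flushed in order once it's open. `QueueableSocket` and `QueuedSender` are hypothetical; with a real `WebSocket` you'd implement `isOpen()` as `readyState === WebSocket.OPEN` and call `flush()` from the `open` handler.

```typescript
// Hypothetical socket abstraction so the queue logic can be tested without a network.
interface QueueableSocket {
  isOpen(): boolean;
  send(data: string): void;
}

class QueuedSender {
  private queue: string[] = [];
  private socket: QueueableSocket;

  constructor(socket: QueueableSocket) {
    this.socket = socket;
  }

  // Always enqueue, then try to flush, so a message sent right after `open`
  // can never jump ahead of older queued messages.
  send(message: string): void {
    this.queue.push(message);
    this.flush();
  }

  // Call from the socket's open handler; sends queued messages in FIFO order.
  flush(): void {
    while (this.queue.length > 0 && this.socket.isOpen()) {
      this.socket.send(this.queue.shift() as string);
    }
  }

  get pending(): number {
    return this.queue.length;
  }
}
```

Preserving order matters here: `authenticate` has to go out before any `game/connect` or `game/move` queued behind it.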

I’ve found this is… inconsistent. Sometimes a valid move will fail once but succeed the second time I run the script, sometimes it will never work.

I did notice that, when moves actually ‘take’, I get another “active_game” message for the updated game after sending the move.
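That follow-up `active_game` message is a rough success signal, which is more than the ack gives you. Here's a sketch of a watcher that flags moves with no matching `active_game` within a timeout. It assumes the `active_game` payload carries the game ID in an `id` field, which I haven't checked, and it only relies on the observation above that successful moves are followed by one; `MoveWatcher` is a hypothetical name.

```typescript
interface PendingMove {
  gameId: number;
  move: string;
  sentAt: number; // ms timestamp when the move was sent
}

class MoveWatcher {
  private pending: PendingMove[] = [];
  private timeoutMs: number;

  constructor(timeoutMs = 5000) {
    this.timeoutMs = timeoutMs;
  }

  // Record a move right after sending game/move.
  moveSent(gameId: number, move: string, now: number): void {
    this.pending.push({ gameId, move, sentAt: now });
  }

  // Call for each incoming active_game event.
  // ASSUMPTION: data.id is the game ID; adjust to what your logs show.
  activeGame(data: { id?: unknown }): void {
    this.pending = this.pending.filter((p) => p.gameId !== data.id);
  }

  // Moves older than the timeout with no active_game seen for their game.
  overdue(now: number): PendingMove[] {
    return this.pending.filter((p) => now - p.sentAt >= this.timeoutMs);
  }
}
```

Polling `overdue(Date.now())` on a timer turns "silently fails" into an explicit log line naming the move that never took.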

Example output:

connection still connecting, queueing message
open Event {
  type: 'open',
  defaultPrevented: false,
  cancelable: false,
  timeStamp: 424.999125
}
attempting to flush message queue
sending queued message ["authenticate",{"jwt":"<redacted JWT>"},0]
received message active-bots
received message [ 0, { id: 1845696, username: 'lazerwalker' }, null ]
received message active_game
received message active_game
received message active_game
received message active_game
sending move 78858441 dd
attempting to flush message queue
sending message ["game/move",{"game_id":78858441,"move":"dd"},1]
received message [ 1, null, null ]
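In that output, the `authenticate` ack (request 0) carries the logged-in user, `{ id: 1845696, username: 'lazerwalker' }`. A hedged sketch of checking for that before doing anything else: I don't know what the ack looks like for a bad or expired token, so this only treats a result with a numeric `id` and string `username` as the positive signal and returns `null` for everything else. `authUserFromAck` is a hypothetical name.

```typescript
interface AuthUser {
  id: number;
  username: string;
}

// Given the parsed ack array for the authenticate request, return the user if
// the result slot looks like { id: number, username: string }, else null.
function authUserFromAck(ack: unknown, authRequestId: number): AuthUser | null {
  if (!Array.isArray(ack) || ack[0] !== authRequestId) return null;
  const result = ack[1];
  if (
    typeof result === "object" &&
    result !== null &&
    typeof (result as { id?: unknown }).id === "number" &&
    typeof (result as { username?: unknown }).username === "string"
  ) {
    return result as AuthUser;
  }
  return null;
}
```

A `null` here is a good reason to stop and inspect the token before blaming `game/move`.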

@GreenAsJade, it might be interesting to coordinate a time when you’re around to look at logs, if you’re willing; this feels like as clear a minimal reproduction case as I could hope for.

wolfeystudios · September 12, 2025, 7:31pm

Have you tried connecting to the corresponding game with your client (game/connect with the game ID and bool of whether you want to join the chat) before sending the move?

marinakai · September 12, 2025, 8:02pm

Okay, huh, thanks. Per earlier in this thread, I had been trying to connect to the game (and suspected the issue was related to whether I was connecting to just the relevant game or to all my active games), but the issue (or at least an issue) was that I was sending a malformed “game/connect” message ({ id: id } instead of { game_id: id }), which failed silently; the only sign was not immediately getting back a bunch of gamestate messages in response.

Things are currently working, both in my minimal test script and my actual codebase. Assuming they continue to work consistently (fingers crossed!), it sounds to me like a few issues are:

game/move having an undocumented dependency on game/connect
game/connect not sending an error to the client when given a malformed request (in addition to similar error-threading issues with game/move)
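Until the server reports these, both can be guarded against on the client. Here's a sketch that builds the `game/connect` payload with `game_id` (not `id`) and refuses to build a `game/move` for a game that hasn't been connected on this socket. It assumes the chat flag wolfeystudios mentioned is a boolean named `chat`; check `ClientToServer` in the Goban repo for the exact field. Whether the connect requirement is intended is still an open question in this thread, so treat the guard as defensive rather than documented behavior. `GameSession` is a hypothetical name.

```typescript
type Frame = [string, unknown, number];

class GameSession {
  private connected: number[] = [];
  private nextId = 0;

  // Build a game/connect frame. ASSUMPTION: the chat flag field is named `chat`.
  // The game is recorded as connected as soon as the frame is built, since a
  // malformed connect gets no error back anyway.
  connect(gameId: number, chat = false): Frame {
    if (this.connected.indexOf(gameId) === -1) {
      this.connected.push(gameId);
    }
    return ["game/connect", { game_id: gameId, chat }, this.nextId++];
  }

  // Build a game/move frame, refusing if this socket never connected to the game.
  move(gameId: number, move: string): Frame {
    if (this.connected.indexOf(gameId) === -1) {
      throw new Error(`game/move for ${gameId} sent before game/connect`);
    }
    return ["game/move", { game_id: gameId, move }, this.nextId++];
  }
}
```

Because the payload keys are typed in one place, the `{ id }` vs `{ game_id }` mistake becomes much harder to make.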

benjito · September 12, 2025, 8:09pm

Awesome it’s working!

I feel like this is a bug more than a lack of documentation. Or at least, it’s confusing that it would work some of the time.

+1, it feels like properly working error messages will be the best form of documentation here, regardless of what the right relationship between connect and move is.