You may have noticed that, since December, WatchBot has been having connectivity problems. The most visible effect is that many games are split into several separate parts even though the players never adjourned them.
The problem
The effect seems to be related to some network problem - from my Linode, even a bare telnet connection to FICS freezes after some time, as if it had been dropped by some firewall or router on the way. Linode support claims they are in no way filtering or tampering with the traffic and - as they have always been very professional - I tend to believe them. Unfortunately, I have not been able to get any comment from the FICS staff, but the problem most likely lies somewhere in between.
While running the bot from home I do not observe the problem, but my home computer cannot be left on permanently. I tried changing the IP address of my VPS but it did not help at all (which proves that there is no IP-based filtering, but nothing more). I am considering moving my VPS to another datacenter, but this is a bigger task for which I currently have no time.
I am curious whether anybody else observes similar problems (FICS connection freezing, even during play, see below).
Solution attempts
To mitigate the effect, I rewrote parts of WatchBot to operate a little differently. Instead of using a single FICS connection, I spawn a few guest connections and use them to observe the games, reconnecting those which seem to be frozen.
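To illustrate the idea of spotting frozen connections, here is a minimal sketch in Python. It is not the actual WatchBot code: the `GuestConnection` class, the `split_frozen` helper and the 15-second timeout are assumptions made for the example, and it presumes that a healthy connection receives some data regularly, so that prolonged silence can be treated as a freeze.

```python
from dataclasses import dataclass


@dataclass
class GuestConnection:
    """Hypothetical record kept for every guest connection."""
    name: str
    last_data_at: float  # timestamp (seconds) of the last data received

    def is_frozen(self, now: float, timeout: float) -> bool:
        # Assumption: silence longer than `timeout` means the link is dead.
        return (now - self.last_data_at) > timeout


def split_frozen(connections, now, timeout=15.0):
    """Return (alive, frozen) lists; frozen ones are candidates for reconnecting."""
    alive = [c for c in connections if not c.is_frozen(now, timeout)]
    frozen = [c for c in connections if c.is_frozen(now, timeout)]
    return alive, frozen
```

A supervising loop could call `split_frozen` every few seconds, close everything in the second list and open fresh guest connections in their place.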
The first visible effect of the change is that WatchBot no longer observes your games; instead, some Guests appear as observers. WatchBot still handles the usual commands, though, notifies about finished games, etc.
Another consequence is that games should not be split into parts anymore. Instead, after each connection freeze, they should be picked up by another worker. Unfortunately there is a small window (15 seconds at the moment) during which the game is not observed, so some whispers or - even worse - moves can be lost (at worst leading to invalid PGN and unplayable games here and there).
The bad thing is that if FICS locks guest logins - as happens from time to time - WatchBot won't be able to observe anything.
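Handing games over to another worker could look roughly like the sketch below. Again, this is only an illustration under my own assumptions: `reassign_games`, the dictionary layout and the "least loaded worker first" rule are hypothetical and not taken from WatchBot.

```python
def reassign_games(assignments, frozen_worker, live_workers):
    """Move the games of a frozen worker to the least loaded live workers.

    assignments: dict mapping worker name -> set of observed game numbers.
    Returns the list of (game, new_worker) pairs, so the caller knows which
    observe commands have to be re-issued.
    """
    if not live_workers:
        raise ValueError("no live worker left to take over the games")
    orphaned = assignments.pop(frozen_worker, set())
    moved = []
    for game in sorted(orphaned):
        # Pick the live worker which currently observes the fewest games.
        target = min(live_workers,
                     key=lambda w: len(assignments.setdefault(w, set())))
        assignments[target].add(game)
        moved.append((game, target))
    return moved
```

Whatever happens in the game between the freeze and the moment the new worker starts observing is exactly the window in which whispers and moves can be lost.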
Future ideas
While lost comments are unrecoverable, lost moves can be downloaded from FICS and filled in. It will make the already complicated code even more complicated, but I am considering working on it sooner or later. I am also thinking about introducing some redundancy (so that, for example, every game is watched by two different guests).
Moving WatchBot (or, rather, the whole of mekk.waw.pl) to another datacenter is also an option, but as I am not 100% sure whether it would help, I hesitate to invest my time.
Finally, I am considering concatenating the old split games, but I can't promise to do it soon.
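The redundancy idea can be sketched as merging what two guests have seen of the same game; whatever is still missing after the merge is what would have to be downloaded from FICS. The function names and the representation (a dictionary from half-move number, counted from 1, to the move text) are assumptions made for this example only.

```python
def merge_observations(*observations):
    """Combine per-observer move records into one.

    Each observation maps a half-move number (1, 2, 3, ...) to the move text.
    When two observers saw the same half-move, the first record wins.
    """
    merged = {}
    for observed in observations:
        for ply, move in observed.items():
            merged.setdefault(ply, move)
    return merged


def missing_plies(merged):
    """Return half-move numbers absent from the merged record, in order."""
    if not merged:
        return []
    return [ply for ply in range(1, max(merged) + 1) if ply not in merged]
```

For example, if one guest saw half-moves 1, 2 and 5 and the other saw 3, 4 and 5, the merged record is complete and `missing_plies` returns an empty list. With only the first guest, half-moves 3 and 4 would be reported as missing.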
Question
As I made the changes hastily and without the thorough testing I usually undertake, bugs are possible and even likely. Let me know if WatchBot misbehaves.
Also, I would be very grateful for any feedback on the state of FICS connectivity. The test is very simple:
- open telnet (Linux or Windows) and let it connect to freechess.org, port 5000,
- log in as guest when prompted,
- type set gin 1 (not crucial but seems to accelerate the problem),
- leave it running,
- check the state of things after an hour.
If, after an hour, you find the FICS logout ASCII art, then your connection works properly. But if the session is not logged out, no new messages appear and nothing happens when you try entering new commands, you are observing the same problem which I face. In both cases let me know how it went and which IP you used (send me an email or a direct FICS message if you don't want to publish your IP).
I would be particularly grateful if somebody could run this test from a Linode VPS, or another solid VPS (SliceHost for example).
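If you prefer a script to a manual telnet session, the test could be automated more or less as below. Treat it as an untested sketch built on assumptions: that the server accepts "guest" followed by an empty line as the guest login, that the `date` command produces some output, and that waiting a little over an hour is enough. The names `classify` and `run_check` are made up for this example.

```python
import socket
import time


def classify(server_closed, got_reply):
    """Turn the two observations into the verdict described in the post."""
    if server_closed:
        return "ok: server logged the session out"
    if got_reply:
        return "ok: connection still responsive"
    return "frozen: no logout and no reply to new commands"


def run_check(host="freechess.org", port=5000,
              wait_seconds=3900.0, probe_timeout=30.0):
    """Log in as guest, set gin 1, wait, then probe the connection."""
    with socket.create_connection((host, port), timeout=probe_timeout) as sock:
        # Assumed login sequence: the name "guest", then Enter to confirm.
        sock.sendall(b"guest\n\n")
        sock.sendall(b"set gin 1\n")
        deadline = time.monotonic() + wait_seconds
        while time.monotonic() < deadline:
            sock.settimeout(max(0.1, deadline - time.monotonic()))
            try:
                if sock.recv(4096) == b"":
                    # Empty read: the server closed the connection.
                    return classify(server_closed=True, got_reply=False)
            except socket.timeout:
                break  # silence until the deadline, go on to the probe
        # Probe: does the server still react to a command?
        sock.settimeout(probe_timeout)
        try:
            sock.sendall(b"date\n")
            data = sock.recv(4096)
        except OSError:  # includes socket.timeout
            return classify(server_closed=False, got_reply=False)
        return classify(server_closed=(data == b""), got_reply=(data != b""))
```

Calling `print(run_check())` runs the whole test and prints the verdict; a result starting with "frozen" corresponds to the problem described above.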