by Husker » Sun Nov 23, 2025 1:29 pm
Robots.txt is your friend (a good friend), but it is not the be-all and end-all. Numerous bots are very poorly written; they crawl hard and do not honor, or only partly honor, the robots.txt file. Also, be sure to log in as a robot yourself, to make sure that the data on the pages gets stripped down properly, and that assets you do not want downloaded over and over again are not presented to the non-interactive bots. The phpBB data (style sheets, iirc) will need a bit of tweaking. Also, dig into the problem bots. Most will show what they are obtaining, and what external clients they support. Often, some of the worst bots are doing NO GOOD at all for your forum. Simply ban them at the web server, which causes the entire site to go dark (to that naughty bot). Bots are great things; they DO attract external people, and being indexed is being alive on the web.
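To make "dig into the problem bots" a bit more concrete, here is a rough sketch of tallying hits and bytes per user agent from an access log. It assumes the web server writes the common "combined" log format; the function name and the sample agents are made up for illustration, so adjust the pattern to whatever your server actually logs.

```python
import re
from collections import Counter

# Assumption: "combined" access log format, i.e.
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "[^"]*" "(?P<agent>[^"]*)"'
)

def summarize_agents(lines):
    """Return (hits, bytes_sent) Counters keyed by user-agent string.

    Hypothetical helper: lines that do not match the pattern are skipped.
    """
    hits = Counter()
    bytes_sent = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue
        agent = m.group("agent")
        hits[agent] += 1
        size = m.group("size")
        if size.isdigit():  # the size field is "-" when no body was sent
            bytes_sent[agent] += int(size)
    return hits, bytes_sent

if __name__ == "__main__":
    # Made-up sample lines; in practice, pass an open log file instead.
    sample = [
        '203.0.113.5 - - [23/Nov/2025:13:29:00 +0000] "GET /viewtopic.php?t=1 HTTP/1.1" 200 5120 "-" "ExampleBot/1.0"',
        '203.0.113.5 - - [23/Nov/2025:13:29:01 +0000] "GET /viewtopic.php?t=2 HTTP/1.1" 200 4096 "-" "ExampleBot/1.0"',
        '198.51.100.7 - - [23/Nov/2025:13:29:02 +0000] "GET /index.php HTTP/1.1" 304 - "-" "Mozilla/5.0"',
    ]
    hits, sent = summarize_agents(sample)
    for agent, count in hits.most_common(10):
        print(count, sent[agent], agent)
```

Sorting by hits (or by bytes) usually makes the heavy crawlers stand out quickly, and from there you can decide which ones earn their keep and which get banned at the web server.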
Bots may not be the only issue, but they certainly can raise the level of activity to where other issues do appear. Almost all stability issues will boil down to 4 things (actually all the same issue). 1. Lack of bandwidth (for peaks). 2. Lack of memory (for peaks). 3. Lack of disk throughput, or possibly lack of disk in total (during peaks). 4. Lack of CPU power. All 4 are the same thing: resource starvation. Almost all server setup and runtime issues can be solved by more of each of those 4 things. But in reality, resources are limited, and an intricate dance must be done to juggle the available resources. Things like larger caching on the DB will vastly reduce #3 (the disk throughput), but now there is significantly more stress on #2 (the memory). Using a 'thinner', quicker back-end web server as host may help #4 (lighter per-session process overhead, allowing more simultaneous sessions with the fixed amount of CPU), but this can expose other limits, like #1, #2, etc.
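The cache-versus-memory trade can be put into rough numbers. This is only a back-of-envelope sketch: the function name and every figure in it are assumptions for illustration, not measurements from any real server.

```python
def max_workers(total_ram_mb, db_cache_mb, os_reserve_mb, per_worker_mb):
    """Estimate how many web worker processes fit in the RAM left over
    after the DB cache and an OS reserve are set aside.

    Hypothetical helper; all inputs are in megabytes.
    """
    free_mb = total_ram_mb - db_cache_mb - os_reserve_mb
    if free_mb <= 0 or per_worker_mb <= 0:
        return 0
    return free_mb // per_worker_mb

if __name__ == "__main__":
    # Assumed example: 8 GB box, 1 GB kept for the OS, 64 MB per worker.
    for cache_mb in (1024, 2048, 4096):
        print(cache_mb, max_workers(8192, cache_mb, 1024, 64))
```

With these assumed figures, growing the DB cache from 2 GB to 4 GB drops the worker ceiling from 80 to 48. The disk gets relief, but the room for simultaneous sessions shrinks, which is exactly the juggling act.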
When a board is first set up, it may be tuned well, with (at the time) seemingly unlimited #1 to #4 resources. But the board grows. The board gets half a million posts, and tens or hundreds of thousands of users. It starts getting lots of users online at once, first 10 at a time, then 100, then 1000, etc. As it grows, what used to be seemingly unlimited amounts of resources are now woefully lacking during these high-traffic times. If the server is not upgraded, it WILL suffer brownouts, or even what amounts to a DDoS by 'real' users accessing the site in a manner that simply overwhelms the resources. Then you add bots into the picture, and everyone points their fingers at the bots.
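To see how quickly the same board outgrows its pipe, here is a crude estimate of peak bandwidth at each of those user counts. The request rate and response size are assumed values picked only for illustration, and the function name is hypothetical; plug in figures from your own logs.

```python
def peak_mbps(users_online, requests_per_user_per_min, avg_response_kb):
    """Rough peak outbound bandwidth in megabits per second.

    Hypothetical helper: assumes every online user requests pages at the
    same steady rate and every response is the same size (in kilobytes).
    """
    kb_per_sec = users_online * requests_per_user_per_min * avg_response_kb / 60
    return kb_per_sec * 8 / 1000  # kilobytes/s -> megabits/s (decimal units)

if __name__ == "__main__":
    # Assumed: 6 requests per user per minute, 150 KB per response.
    for users in (10, 100, 1000):
        print(users, round(peak_mbps(users, 6, 150), 1), "Mbit/s")
```

Under these assumptions, 10 users need about 1.2 Mbit/s and 1000 users need about 120 Mbit/s, before a single bot shows up. The same scaling applies to memory, disk, and CPU.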
This is the reality of running a public server. It is going to require occasional auditing by someone who understands the current setup, can work with the current access patterns, and can make choices which satisfy the current access patterns, while also 'projecting' what near term growth patterns will be, and can make the overall changes to the server to meet those requirements.
Everyone wants their site to grow. When it does, many may not be prepared to handle it.
H.
Husker: The 6'4" hobbit.