Forum clone

Having problems with the forum software, or suggestions for improvements

Postby dobrichev » Tue Jun 10, 2025 5:26 pm

After the constant crashes of this forum in recent weeks, I took some time to archive its content.

A read-only clone of the forum is temporarily available here.

The known non-cloned items include
  • Users with no posts.
  • User details (avatars, signatures, etc).
  • Poll topics.
  • View/download counters.
  • Date of the latest edit of each post.
  • Data for already broken posts/topics, such as the topic at the bottom of the General forum.

For now, I have no plans for automated content synchronization, nor for placing the clone on a permanent site.
dobrichev
2016 Supporter
 
Posts: 1888
Joined: 24 May 2010

Re: Forum clone

Postby champagne » Tue Jun 10, 2025 5:48 pm

Hi mladen,
no skill on my side to do that, so thanks on behalf of the community
Best
champagne
2017 Supporter
 
Posts: 7915
Joined: 02 August 2007
Location: France Brittany

Re: Forum clone

Postby blue » Tue Jun 10, 2025 7:31 pm

Very comforting.
Many thanks.
blue
 
Posts: 1104
Joined: 11 March 2013

Re: Forum clone

Postby m_b_metcalf » Wed Jun 11, 2025 3:31 pm

Super!
m_b_metcalf
2017 Supporter
 
Posts: 13693
Joined: 15 May 2006
Location: Berlin

Re: Forum clone

Postby coloin » Mon Jun 30, 2025 10:15 pm

Yes, very good that you have done this, especially as the hosting appears very inconsistent these days.
Perhaps you can outline how you did it!
How much storage space would it need to be hosted permanently?
coloin
 
Posts: 2676
Joined: 05 May 2005
Location: Devon

Re: Forum clone

Postby dobrichev » Wed Jul 02, 2025 3:05 am

Hi coloin,

The required storage is small, maybe below 1 gigabyte.

About 3 years ago I attempted to restore the forum from the latest available backup from 2019.
The backup contained database, attached files, and some other unrelated stuff.
I found the source code for the same version of phpBB. Now I doubt whether it was the right version, but at least it was close.
I installed MariaDB and the Nginx web server.
From the settings record in the database I found that two "mods" had been additionally installed. One is SEO (Search Engine Optimization), which changes navigation links from numerical identifiers to some text closer to the topic. The other is something related to payments, which I ignored.
Finally, I found that my source code doesn't show the latest topic in the forum list. I ignored this too, after unsuccessful attempts to find the code for this patch.
SEO patching was painful: I found some source code which couldn't be installed automatically (likely not for the same version of phpBB), so I did the patching manually, finding the right place for hundreds of lines of extra code in dozens of files. I also had to manually write the redirection configuration for the web server.
An additional change was fixing the PHP code to work around some unsupported pattern-matching functions.

At this stage I had a semi-working clone of the forum on my home machine, with data up to 2019.

Several months later I attempted to read the forum and convert the topics and posts to database records in the form expected by the forum software. I wrote transformation code and compared my results for old posts with the existing database records. After seeing the number of differences, I gave up.

Recently, after the forum began its regular crashes, I made another attempt, this time less pedantic about data quality. The whole scan took about 1 day in a single thread. Many unrecoverable errors were found in my processing, and after fixing them I did a second scan. The data was stored in an intermediate format in separate tables.
Finally, I manually performed a series of operations to reformat and merge data with the original database.

Why does this forum crash?

Initially I thought the site had been attacked by hackers and/or the administrators had significantly reduced the hardware.

Hours after I posted here the only public link to my forum clone, it was bitten by bots.
In the first days these were Google (about one visit per hour from a single IP address) and Microsoft (about one visit per 5 minutes from a few similar IP addresses).
After a week or so, Microsoft lost interest, but Amazon bots from Singapore started crawling more aggressively. At the moment I have about 120 visits per hour from various IP addresses, and the number is growing.

If the same or similar bots cause crashes of the forum, a fix from administrators is needed. For example, Amazon bots clearly identify themselves as user agent "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot) Chrome/119.0.6045.214".
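A minimal sketch of how such per-hour visit counts could be obtained from a web server access log. It assumes the log uses Nginx's default "combined" format; the KNOWN_BOTS table, the function names, and the sample lines are hypothetical and only illustrate the idea.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout (Nginx "combined" format):
# addr - user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[(?P<time>[^\]]+)\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"$'
)

# Hypothetical table: substring of the user agent -> label used in the report.
KNOWN_BOTS = {"Amazonbot": "Amazon", "Googlebot": "Google", "bingbot": "Microsoft"}


def classify(agent):
    """Return a label for a user-agent string, or "other" if no token matches."""
    lowered = agent.lower()
    for token, label in KNOWN_BOTS.items():
        if token.lower() in lowered:
            return label
    return "other"


def visits_per_hour(lines):
    """Count requests per (hour, bot label); lines that do not parse are skipped."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue
        stamp = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
        counts[(stamp.strftime("%Y-%m-%d %H:00"), classify(match["agent"]))] += 1
    return counts


AMAZON_UA = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
    "Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot) "
    "Chrome/119.0.6045.214"
)

# Made-up sample lines using documentation-only IP addresses.
SAMPLE = [
    f'203.0.113.5 - - [02/Jul/2025:03:05:10 +0000] "GET /index.php HTTP/1.1" 200 5120 "-" "{AMAZON_UA}"',
    f'203.0.113.9 - - [02/Jul/2025:03:40:02 +0000] "GET /viewtopic.php?t=1 HTTP/1.1" 200 9001 "-" "{AMAZON_UA}"',
    '198.51.100.7 - - [02/Jul/2025:03:59:59 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '198.51.100.8 - - [02/Jul/2025:04:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 Firefox/128.0"',
    "this line is not in the expected format",
]

for (hour, family), n in sorted(visits_per_hour(SAMPLE).items()):
    print(hour, family, n)
```

Matching on a substring of the user agent only catches bots that identify themselves honestly, as Amazonbot does; anything that hides its identity ends up under "other".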
dobrichev
2016 Supporter
 
Posts: 1888
Joined: 24 May 2010

Postby 1to9only » Sat Jul 05, 2025 9:33 am

dobrichev wrote:Hours after I posted here the only public link to my forum clone, it was bitten by bots.

The 1st 50-60 users in a fresh phpbb install are mostly bots - deleting these may stop them!

A robots.txt in (I think) the htdocs folder may also block these bots.

To block all web crawlers from all content, robots.txt content:

User-agent: *
Disallow: /
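One way to check what a well-behaved crawler would conclude from these two lines is Python's standard urllib.robotparser module. This is only a sketch: the helper name may_fetch and the example paths are hypothetical, and it says nothing about bots that ignore robots.txt.

```python
from urllib.robotparser import RobotFileParser

BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def may_fetch(robots_txt, agent, url):
    """Return True if the given robots.txt text permits `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical paths, for illustration only.
for agent in ("Amazonbot", "Googlebot", "bingbot"):
    for url in ("/index.php", "/viewtopic.php?t=1"):
        print(agent, url, may_fetch(BLOCK_ALL, agent, url))
```

Every combination should come out as not allowed, because the rule applies to all user agents and "/" is a prefix of every path.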
1to9only
 
Posts: 4213
Joined: 04 April 2018

Postby dobrichev » Sat Jul 05, 2025 12:38 pm

Hi 1to9only,

robots.txt must be accessible and located in the root directory (and in the root of any subdomain, which isn't the case here).

In fact my clone has the same robots.txt file as this forum.

Now I realise that its effect is ruined by the SEO mod, which rewrites most of the navigation links so that they no longer match the disallowed patterns.

I am changing the content of robots.txt according to your proposal. Let's see the effect. Nevertheless, the problem of this forum remains.
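The mismatch between path-based rules and rewritten links can be illustrated with Python's standard urllib.robotparser. This is a sketch under loud assumptions: the forum's real robots.txt and the SEO mod's real URL scheme are not quoted in this thread, so PATH_RULES, REWRITTEN_URL, and the helper may_fetch are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical path-based rules (NOT the forum's actual robots.txt).
PATH_RULES = """\
User-agent: *
Disallow: /viewtopic.php
Disallow: /memberlist.php
"""

# The blanket rule proposed earlier in the thread.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

ORIGINAL_URL = "/viewtopic.php?t=123"     # numeric-identifier style link
REWRITTEN_URL = "/forum-clone-t123.html"  # hypothetical SEO-style link to the same topic


def may_fetch(robots_txt, agent, url):
    """Return True if the given robots.txt text permits `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


for name, rules in (("path rules", PATH_RULES), ("block all", BLOCK_ALL)):
    for url in (ORIGINAL_URL, REWRITTEN_URL):
        print(name, url, may_fetch(rules, "Amazonbot", url))
```

With the path-based rules only the original-style link is refused, while the rewritten link for the same topic is allowed; the blanket rule refuses both, since it does not depend on how the links are spelled.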
dobrichev
2016 Supporter
 
Posts: 1888
Joined: 24 May 2010

Re: Forum clone

Postby dobrichev » Fri Jul 11, 2025 8:06 am

Filtering by robots.txt works.
About a day after the change, most of the robots disappeared.
dobrichev
2016 Supporter
 
Posts: 1888
Joined: 24 May 2010

Re: Forum clone

Postby Husker » Sun Nov 23, 2025 1:29 pm

Robots.txt is your friend (a good friend), but not the be-all and end-all. Numerous bots are very poorly written, crawl hard, and do not honor or fully honor the robots.txt file. Also, be sure to log in as a robot yourself, to make sure that the data on the pages gets stripped down properly, and that assets you do not want downloaded over and over again are not presented to the non-interactive bots. The phpBB data (style sheets, IIRC) will need a bit of tweaking. Also, dig into the problem bots. Most will show what they are obtaining, and what external clients they support. Often, some of the worst bots are doing NO GOOD at all for your forum. Simply ban them at the web server, causing the entire site to go dark (to that naughty bot). Bots are great things, they DO attract external people; being indexed is being alive on the web.

Bots may not be the only issue, but they certainly can raise the level of activity to where other issues do appear. Almost all stability issues boil down to 4 things (actually all the same issue): 1. lack of bandwidth for peaks; 2. lack of memory (for peaks); 3. lack of disk throughput (possibly lack of disk in total) during peaks; 4. lack of CPU power. All 4 are the same thing: starved resources. Almost all server setup and runtime issues can be solved by more of each of those 4 things. But in reality, resources are limited, and an intricate dance must be done to juggle the available resources. Things like larger caching on the DB will vastly reduce #3 (the disk throughput), but then there is significantly more stress on #2 (the memory). Using a 'thinner', quicker back-end web server as host may help #4 (lighter session process overhead, allowing more simultaneous sessions with the fixed amount of CPU), but this can expose other limits, like #1, #2, etc.

When a board is first set up, it may be tuned well, with (at the time) seemingly unlimited #1 to #4 resources. But the board grows. The board gets half a million posts and tens or hundreds of thousands of users. It starts getting lots of users online at once, first 10 at a time, then 100, then 1000, etc. As it grows, what used to be seemingly unlimited amounts of resources are now woefully lacking during these high-traffic times. If the server is not upgraded, it WILL suffer brownouts, or even a DDOS by 'real' users accessing the site in a manner that simply overwhelms the resources. Then you add bots into the picture, and everyone points their fingers at the bots.

This is the reality of running a public server. It is going to require occasional auditing by someone who understands the current setup, can work with the current access patterns, and can make choices which satisfy the current access patterns, while also 'projecting' what near term growth patterns will be, and can make the overall changes to the server to meet those requirements.

Everyone wants their site to grow. When it does, many may not be prepared to handle it.

H.
Husker: The 6'4" hobbit.
Husker
 
Posts: 2
Joined: 07 October 2025
Location: Locked in semi rural US, dead center (draw an X on the map, and I live where the X crosses).

Re: Forum clone

Postby coloin » Sat Mar 21, 2026 11:54 pm

Maybe time to update the clone ? TIA
coloin
 
Posts: 2676
Joined: 05 May 2005
Location: Devon

Re: Forum clone

Postby dobrichev » Wed Apr 08, 2026 1:17 pm

This unmaintained forum is dead.

Updating the clone is inefficient and I am not motivated to repeat the exercise.

Hosting a forum with a commercial hosting company seems to cost around $50 - $100 per year, unless there are other hidden costs.

I would help by providing existing data and knowledge if someone were to clone the forum with the prospect of us all starting to post only there.
dobrichev
2016 Supporter
 
Posts: 1888
Joined: 24 May 2010

