
Unfortunately no one can be told what FluxBB is - you have to see it for yourself.


#1 2010-05-20 12:28:24

Pierre
Member
From: Germany/Bonn
Registered: 2010-05-20
Posts: 50
Website

Improving performance for large topics

Hi,

If a topic has a lot of posts (say 20 thousand or more), FluxBB needs quite a long time to display the last pages. The reason is the use of MySQL's LIMIT with an offset, which is extremely inefficient. For example, if I have a thread with 40k posts, FluxBB will do something like "SELECT * ... LIMIT 40000, 25". Even if you only want to display the last 25 posts, MySQL still has to step over the 40,000 preceding rows.

The solution is quite simple: just index your posts manually. For my forum I added a new column called count to each post entry which represents its position within the thread: the first post has 1, the second 2, the third 3, and so on. (Don't forget to renumber the posts if you delete one from within a thread.) You will probably want to create a MySQL index on this column.

Using this data structure you can replace the LIMIT usage with something like "SELECT * ... WHERE count BETWEEN 39976 AND 40000" to get the last 25 posts. This results in a huge performance improvement and less load/RAM usage. With this method it doesn't really matter how many posts a thread has.
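Here is a rough sketch of the counter-column idea in Python with SQLite (all table and column names are invented for illustration; FluxBB's actual schema differs):

```python
import sqlite3

# Illustrative sketch only: table/column names here are made up and do
# not match FluxBB's real schema.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, topic_id INTEGER, counter INTEGER, text TEXT)")
# A composite index lets the engine jump straight to the wanted range.
db.execute("CREATE INDEX idx_topic_counter ON posts (topic_id, counter)")

# Number the posts 1..N within the topic as they are inserted.
for n in range(1, 101):
    db.execute("INSERT INTO posts (topic_id, counter, text) VALUES (1, ?, ?)",
               (n, f"post {n}"))

# Page 4 with 25 posts per page covers positions 76..100; no offset scan.
page, per_page = 4, 25
lo = (page - 1) * per_page + 1   # 76
hi = page * per_page             # 100
rows = db.execute(
    "SELECT counter, text FROM posts "
    "WHERE topic_id = ? AND counter BETWEEN ? AND ?",
    (1, lo, hi)).fetchall()
print(rows[0], rows[-1])  # (76, 'post 76') (100, 'post 100')
```

The BETWEEN lookup can use the (topic_id, counter) index, so the cost no longer depends on how deep into the thread the requested page is.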

If you want, I can point you to an example thread. I also have the source code (but it's a self-written board, not FluxBB, so it might not be that helpful).

Btw: this method can be used in other places where LIMIT with an offset is used. (But afaik you can then only display in ascending or descending order.)

Greetings,

Pierre

Offline

#2 2010-05-20 14:48:04

Smartys
Former Developer
Registered: 2008-04-27
Posts: 3,139
Website

Re: Improving performance for large topics

There are much better ways to improve performance without needing to maintain an index (which carries a performance penalty of its own and isn't as easy to do in a safe way as you seem to think it is).

In this case, you can split out the query in viewtopic.php into two queries: one that gets post IDs and one that joins those post IDs. That way, MySQL handles 40k rows that only consist of integers: once it filters, you then perform the joins.
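A minimal sketch of this two-query approach, using Python with SQLite and made-up table names (not FluxBB's actual viewtopic.php query):

```python
import sqlite3

# Hypothetical minimal schema, for illustration only.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE posts (id INTEGER PRIMARY KEY, topic_id INTEGER, poster_id INTEGER, text TEXT);
""")
db.execute("INSERT INTO users VALUES (1, 'alice'), (2, 'bob')")
for n in range(1, 101):
    db.execute("INSERT INTO posts (topic_id, poster_id, text) VALUES (1, ?, ?)",
               (1 + n % 2, f"post {n}"))

# Query 1: paginate over the narrow posts table only, so the offset scan
# touches nothing but integer IDs.
ids = [r[0] for r in db.execute(
    "SELECT id FROM posts WHERE topic_id = ? ORDER BY id LIMIT 25 OFFSET 75",
    (1,))]

# Query 2: join only the 25 selected IDs against the wider tables.
placeholders = ",".join("?" * len(ids))
rows = db.execute(
    "SELECT p.id, u.name, p.text FROM posts AS p "
    "JOIN users AS u ON u.id = p.poster_id "
    f"WHERE p.id IN ({placeholders}) ORDER BY p.id",
    ids).fetchall()
print(len(rows), rows[0][0])  # 25 76
```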

Last edited by Smartys (2010-05-20 14:48:39)

Offline

#3 2010-05-20 15:49:06

Franz
Lead developer
From: Germany
Registered: 2008-05-13
Posts: 6,684
Website

Re: Improving performance for large topics

But that would only be a performance improvement on big topics and on pages other than the first, right, Smartys?


fluxbb.de | develoPHP

"As code is more often read than written it's really important to write clean code."

Offline

#4 2010-05-20 16:42:24

Smartys
Former Developer
Registered: 2008-04-27
Posts: 3,139
Website

Re: Improving performance for large topics

The performance boost is most noticeable on large topics, yes. The idea originally came from a topic on the phpBB forums about how to cope with scaling (they had a similar suggestion). It's not something I would recommend unless the forum is experiencing issues with large topics.

Offline

#5 2010-05-20 20:50:27

Jérémie
Member
From: France
Registered: 2008-04-30
Posts: 629
Website

Re: Improving performance for large topics

Has anyone done some testing about what's "large" in this particular issue?

Offline

#6 2010-05-20 22:00:47

Pierre
Member
From: Germany/Bonn
Registered: 2010-05-20
Posts: 50
Website

Re: Improving performance for large topics

Smartys wrote:

There are much better ways to improve performance without needing to maintain an index (which carries a performance penalty of its own and isn't as easy to do in a safe way as you seem to think it is).

There is only a very small overhead when you add a post (you simply add 1 to the last used count value). If you delete a post you have to more or less renumber all the following ones... but that is still very fast. All in all, posts are far more often read than deleted.

There should be no performance impact on reading posts using this "index". In fact, all topics with more than one page should benefit from it.

The implementation is really easy and robust. I have been using it for several years now.

Smartys wrote:

In this case, you can split out the query in viewtopic.php into two queries: one that gets post IDs and one that joins those post IDs. That way, MySQL handles 40k rows that only consist of integers: once it filters, you then perform the joins.

I am not sure I got that. Using two queries should add quite a lot of overhead. How do you get those post IDs then, if not using "LIMIT a,b"? And for the join you probably need to create a temporary table. But most likely I just misunderstood your idea. :-)


Btw: here is an example. This topic only has 17k posts... I didn't find a bigger one for now, but the effect should still be noticeable. I have set up a test FluxBB here: https://test-forum.archlinux.de/viewtopic.php?id=4310 Switch through the pages of that thread, especially the last ones or somewhere in the middle. Now do the same here: https://forum.archlinux.de/?id=20;page= … hread=4310 (it's not completely fair as this is not FluxBB... but at least you should notice that the load time is always the same, no matter whether it's page 1 or 600.)

Greetings,

Pierre

Offline

#7 2010-05-20 22:06:22

Franz
Lead developer
From: Germany
Registered: 2008-05-13
Posts: 6,684
Website

Re: Improving performance for large topics

Pierre wrote:
Smartys wrote:

In this case, you can split out the query in viewtopic.php into two queries: one that gets post IDs and one that joins those post IDs. That way, MySQL handles 40k rows that only consist of integers: once it filters, you then perform the joins.

I am not sure I got that. Using two queries should add quite a lot of overhead. How do you get those post IDs then, if not using "LIMIT a,b"? And for the join you probably need to create a temporary table. But most likely I just misunderstood your idea. :-)

Yes, you do use LIMIT, but only on one column (the ID column), which makes loading thousands of rows less painful, as only one column is fetched. Then these selected IDs are used to filter out rows in the big join query instead of the LIMIT clause.

And word association threads - seriously... wink


fluxbb.de | develoPHP

"As code is more often read than written it's really important to write clean code."

Offline

#8 2010-05-21 03:18:58

Smartys
Former Developer
Registered: 2008-04-27
Posts: 3,139
Website

Re: Improving performance for large topics

There is only a very small overhead when you add a post (you simply add 1 to the last used count value). If you delete a post you have to more or less renumber all the following ones... but that is still very fast. All in all, posts are far more often read than deleted.

Unfortunately, you fail to take concurrency into account.

Let's take a simple example:
- Person A creates a topic.
- Person B and Person C reply to that topic at the same instant.

How is the next index calculated? See http://www.iknowkungfoo.com/blog/index. … ECT-MAX-id for why the idea you're probably thinking about is bad. If you come up with another idea, let me know as well: I'd be interested to see whether it addresses the concurrency issues.

I am not sure I got that. Using two queries should add quite a lot of overhead.

Not at all. Two well-architected queries can easily be faster than one poorly written query.

How do you get those post IDs then, if not using "LIMIT a,b"?

As Franz said, you DO use LIMIT. The slowness has less to do with looking at thousands of rows and more to do with the number of joins done on those rows. The query in question joins several tables: by removing those joins and focusing only on post IDs, you reduce the overhead dramatically.

Offline

#9 2010-05-21 05:29:43

Pierre
Member
From: Germany/Bonn
Registered: 2010-05-20
Posts: 50
Website

Re: Improving performance for large topics

To solve the concurrency issue you have to put the read of the old counter value and the write of the new one into a single transaction. You can either use DB-based transactions like InnoDB provides, lock the table during that operation, or try to use subqueries. (I cannot remember whether MySQL supports the latter in INSERT statements.)

I went with table locking here. This is the code for inserting a post:

    $this->DB->execute('LOCK TABLES posts WRITE');

    $stm = $this->DB->prepare
        ('
        SELECT
            COUNT(*)
        FROM
            posts
        WHERE
            threadid = ?'
        );
    $stm->bindInteger($this->thread);
    $counter = $stm->getColumn();
    $stm->close();

    $stm = $this->DB->prepare
        ('
        INSERT INTO
            posts
        SET
            threadid = ?,
            userid = ?,
            username = ?,
            text = ?,
            dat = ?,
            counter = ?'
        );
    $stm->bindInteger($this->thread);
    $stm->bindInteger($userid);
    $stm->bindString($username);
    $stm->bindString($this->text);
    $stm->bindInteger($this->Input->getTime());
    $stm->bindInteger($counter);

    $stm->execute();
    $stm->close();

    $this->DB->execute('UNLOCK TABLES');

I still wonder why I didn't use "INSERT INTO ... SET counter = (SELECT COUNT(*)...)...".

That COUNT(*) operation is extremely fast if you have an index on that column, so this query shouldn't block for long. You would need a lot of posts per second before this becomes a real problem.
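For illustration, here is how that single-statement form could look, sketched in Python with SQLite and a made-up schema (whether a same-table subquery like this is allowed varies between engines):

```python
import sqlite3

# Invented minimal schema, just to demonstrate the idea.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, topic_id INTEGER, counter INTEGER, text TEXT)")

def add_post(topic_id, text):
    # Computing the next counter and inserting the row happen in one
    # atomic statement, so no explicit LOCK TABLES is needed.
    db.execute(
        "INSERT INTO posts (topic_id, counter, text) "
        "SELECT ?, COALESCE(MAX(counter), 0) + 1, ? FROM posts WHERE topic_id = ?",
        (topic_id, text, topic_id))
    db.commit()

add_post(1, "first")
add_post(1, "second")
add_post(2, "other topic")
print(db.execute("SELECT counter FROM posts WHERE topic_id = 1 ORDER BY counter").fetchall())
# [(1,), (2,)]
```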


As for your two-queries approach: now I get it. :-) I wonder whether the same could be achieved within one query, or whether the optimizer could be hinted to see it too.

Offline

#10 2010-05-21 05:52:10

Smartys
Former Developer
Registered: 2008-04-27
Posts: 3,139
Website

Re: Improving performance for large topics

- A full table lock is inefficient and, while technically correct, will limit concurrency.
- Transactions suffer from similar problems (if you run with anything less than full serializable isolation, you run into situations where the data can become inconsistent again)
- Subqueries won't work over all DBMSes (at least in MySQL, you can't reference the table you're updating in a subquery)

Also, COUNT(*) is not fast (although the speed is database-engine dependent). I wouldn't call it slow either, but even with an index on the WHERE clause column you likely need a scan of some kind to get a count.

Last edited by Smartys (2010-05-21 05:54:06)

Offline

#11 2010-05-21 11:39:58

sirena
Member
From: AU
Registered: 2008-05-10
Posts: 172

Re: Improving performance for large topics

I wonder how vBulletin seems to manage this so well.

An epic example:

http://www.militaryphotos.net/forums/sh … 9/page2219

Offline

#12 2010-05-21 11:53:54

Reines
Administrator
From: Scotland
Registered: 2008-05-11
Posts: 3,197
Website

Re: Improving performance for large topics

vBulletin does what Smartys suggested: fetching just the post IDs using LIMIT X,Y, then doing a second query to fetch the actual content.

It might be worth performing some tests to see how the different approaches compare. I imagine splitting it into two queries like that will benefit topics with huge numbers of posts, but be worse for topics with only a few.

Anyway, I'm going to move this into the feature requests board, since it is an enhancement rather than a bug. It is definitely something worth considering though.

Offline

#13 2010-05-21 19:45:46

Pierre
Member
From: Germany/Bonn
Registered: 2010-05-20
Posts: 50
Website

Re: Improving performance for large topics

This is getting to be an interesting discussion. In general we have a trade-off between read and write performance (as with every index you use). The difficulty is figuring out whether the extra work during writes is worth the increased performance you get when reading.

The "split query" approach also seems interesting, and it's easier to implement. The downside is probably a little overhead for threads with only a "few" posts. But it's probably the thing we should try first.

The implementation details would be interesting though. Fetching all the IDs from the SQL server and then pushing them back to do the join might not be the best approach. Maybe you could create a temporary table with the IDs you get from the LIMIT call on the post IDs and then join that with the posts table.

Offline

#14 2010-05-21 19:56:39

Reines
Administrator
From: Scotland
Registered: 2008-05-11
Posts: 3,197
Website

Re: Improving performance for large topics

Getting into temporary tables/views is a bit risky, since this needs to be as portable as possible. I'm not sure SQLite2 even supports temporary tables.

Offline

#15 2010-05-21 20:04:11

Pierre
Member
From: Germany/Bonn
Registered: 2010-05-20
Posts: 50
Website

Re: Improving performance for large topics

Hmm, that's the downside of supporting multiple DB backends, then. :-) I guess SQLite is way too slow for the big boards that would run into this problem anyway.

Offline

#16 2010-05-21 21:29:39

Paul
Developer
From: Wales, UK
Registered: 2008-04-27
Posts: 1,653

Re: Improving performance for large topics

Would it be possible to use more than one method? The default for normal topics, and another for large topics, based on the value of num_replies in the topics table.


The only thing worse than finding a bug is knowing I created it in the first place.

Offline

#17 2010-05-21 21:34:51

Reines
Administrator
From: Scotland
Registered: 2008-05-11
Posts: 3,197
Website

Re: Improving performance for large topics

In theory yeah, if we can decide at what point it becomes worth using the other method.

Offline

#18 2010-05-21 23:21:07

Jérémie
Member
From: France
Registered: 2008-04-30
Posts: 629
Website

Re: Improving performance for large topics

Make it an option. Even one in the config file, if you don't want to handle the UI issue and don't want to overwhelm the admin. But it would need some guidelines as to when the admin needs to make the switch.

Offline

#19 2010-05-22 04:38:46

Smartys
Former Developer
Registered: 2008-04-27
Posts: 3,139
Website

Re: Improving performance for large topics

Pierre: You seem to still be missing something. The database server is sorting, say, 40000 posts and only returning the relevant 20 or so post IDs. There's no need for a temporary table or anything like that. This is a solution I've been using on PunBB-Hosting for years: it works.

Offline

#20 2010-05-22 05:00:49

artoodetoo
Member
From: Far-Far-Away
Registered: 2008-05-11
Posts: 223

Re: Improving performance for large topics

Smartys wrote:

In this case, you can split out the query in viewtopic.php into two queries: one that gets post IDs and one that joins those post IDs. That way, MySQL handles 40k rows that only consist of integers: once it filters, you then perform the joins.

In fact you can use a single query with a subquery, where the subquery returns only the ID of the first row on the page and the outer query returns the N joined rows via a ">=" condition, like:

SELECT ...
FROM posts AS p
JOIN ...
JOIN ...
JOIN ...
JOIN (SELECT p2.id FROM posts AS p2 WHERE p2.topic_id = :T ORDER BY p2.id LIMIT :M, 1) AS sub
WHERE (p.topic_id = :T) AND (p.id >= sub.id)
ORDER BY p.id
LIMIT :N

Here :T is the topic ID, :M is the offset of the first post on the page, and :N is the number of posts per page.

For best effect you should define the key (`topic_id`,`id`) on posts instead of just (`topic_id`).
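For illustration, here is the same pattern sketched in Python with SQLite (minimal made-up schema; the extra JOINs from the real query are omitted):

```python
import sqlite3

# Invented minimal schema for the sketch.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, topic_id INTEGER, text TEXT)")
for n in range(1, 101):
    db.execute("INSERT INTO posts (topic_id, text) VALUES (1, ?)", (f"post {n}",))

# Page 4 with 25 posts per page: the subquery finds the ID of the first
# post on the page (offset 75); the outer query reads forward from it.
rows = db.execute("""
    SELECT p.id, p.text
    FROM posts AS p
    JOIN (SELECT p2.id FROM posts AS p2 WHERE p2.topic_id = ?
          ORDER BY p2.id LIMIT 1 OFFSET 75) AS sub
    WHERE p.topic_id = ? AND p.id >= sub.id
    ORDER BY p.id
    LIMIT 25
""", (1, 1)).fetchall()
print(rows[0][0], rows[-1][0])  # 76 100
```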

Last edited by artoodetoo (2010-05-22 05:01:57)


I'm not a fan of FluxBB way anymore.

Offline

#21 2010-05-22 05:08:13

Smartys
Former Developer
Registered: 2008-04-27
Posts: 3,139
Website

Re: Improving performance for large topics

artoodetoo: That really doesn't help as much. Think about it: for the first post on the first page, that query will have the same performance as the current query (and in fact, will be slightly worse since it has to handle the subquery as well).

Offline

#22 2010-05-22 05:11:10

artoodetoo
Member
From: Far-Far-Away
Registered: 2008-05-11
Posts: 223

Re: Improving performance for large topics

MySQL's EXPLAIN disagrees with you )


I'm not a fan of FluxBB way anymore.

Offline

#23 2010-05-22 05:13:52

Smartys
Former Developer
Registered: 2008-04-27
Posts: 3,139
Website

Re: Improving performance for large topics

Give me some real example queries. I'll run them on large topics on PunBB-Hosting and give you some benchmarks.

Offline

#24 2010-05-22 05:14:00

artoodetoo
Member
From: Far-Far-Away
Registered: 2008-05-11
Posts: 223

Re: Improving performance for large topics

First page is unique case of course.


I'm not a fan of FluxBB way anymore.

Offline

#25 2010-05-22 05:21:53

Smartys
Former Developer
Registered: 2008-04-27
Posts: 3,139
Website

Re: Improving performance for large topics

Correct. But the performance is best on the last page and degrades from then on. The suggestion I made works consistently well throughout (and I believe at no point does your solution perform better than mine).

Last edited by Smartys (2010-05-22 05:22:14)

Offline

Board footer

Powered by FluxBB