Ticket #495 (fixed bug)
Search can be tricked into examining and joining everything
- Created: 2011-09-19 15:04:13
- Reported by: toofishes
- Assigned to: daris
- Milestone: 1.4.8
- Component: search
- Priority: high
From our slow query log:
# Time: 110919 7:23:07
# User@Host: fluxbb[fluxbb] @ localhost []
# Query_time: 199.690448 Lock_time: 0.000093 Rows_sent: 16618253 Rows_examined: 33888557
SET timestamp=1316409787;
SELECT m.post_id, p.topic_id, t.last_post AS sort_by FROM search_words AS w INNER JOIN search_matches AS m ON m.word_id = w.id INNER JOIN posts AS p ON p.id=m.post_id INNER JOIN topics AS t ON t.id=p.topic_id LEFT JOIN forum_perms AS fp ON (fp.forum_id=t.forum_id AND fp.group_id=4) WHERE w.word LIKE '%%%' AND (fp.read_forum IS NULL OR fp.read_forum=1);
I was able to nearly reproduce this query with something such as the following URL: /search.php?action=search&keywords=********+not+foobar&show_as=topics&search=Submit
The result in the slow query log:
# Time: 110919 16:56:35
# User@Host: fluxbb[fluxbb] @ localhost []
# Query_time: 160.382064 Lock_time: 0.000097 Rows_sent: 16618333 Rows_examined: 33888904
SET timestamp=1316444195;
SELECT m.post_id, p.topic_id, t.last_post AS sort_by FROM search_words AS w INNER JOIN search_matches AS m ON m.word_id = w.id INNER JOIN posts AS p ON p.id=m.post_id INNER JOIN topics AS t ON t.id=p.topic_id LEFT JOIN forum_perms AS fp ON (fp.forum_id=t.forum_id AND fp.group_id=1) WHERE w.word LIKE '%%%%%%%%' AND (fp.read_forum IS NULL OR fp.read_forum=1);
This is pretty absurd. First thing to note is it really tanks MySQL performance as it creates 33 million rows here to do the search, and then proceeds to send back 16 million of them to FluxBB. The process never even gets them because PHP times out under our current settings.
We have ~122,500 topics and ~966,000 posts.
History
Franz 2011-12-01 20:10:34

- Owner set to Franz.
Interesting catch, I will try to come up with an intelligent solution to this problem.
daris 2011-12-10 15:21:30

- Uploaded patch search.patch. (view)
toofishes, can you provide more examples when it might occurs?
The attached patch should exclude words that have only asterisks (***)
daris 2011-12-10 15:53:14

- Uploaded patch search2.patch. (view)
Ignore above, this patch should be better, beacuse it ignores all * and % when checking whether current word is valid.
So for example this:
**d***d***+not+foobar
will give results for
not+foobar
Franz 2011-12-12 22:33:16

- Owner changed from Franz to daris.
Your turn, then.
Would searches for things like "for*" still return posts with "forum" and "forums", though?
daris 2011-12-13 09:52:13

Well, you provided a good example I have to move the added code by the patch some lines below because "for*" after stripping * gives a "for" which is a badword (see lang/English/badwords.txt)
The word is valid whether is at least 3 characters long after stripping * and %