Fork me on GitHub
Subscribe 4

Ticket #495 (fixed bug)

Search can be tricked into examining and joining everything

  • Created: 2011-09-19 15:04:13
  • Reported by: toofishes
  • Assigned to: daris
  • Milestone: 1.4.8
  • Component: search
  • Priority: high

From our slow query log:

# Time: 110919  7:23:07
# User@Host: fluxbb[fluxbb] @ localhost []
# Query_time: 199.690448  Lock_time: 0.000093 Rows_sent: 16618253  Rows_examined: 33888557
SET timestamp=1316409787;
SELECT m.post_id, p.topic_id, t.last_post AS sort_by FROM search_words AS w INNER JOIN search_matches AS m ON m.word_id = w.id INNER JOIN posts AS p ON p.id=m.post_id INNER JOIN topics AS t ON t.id=p.topic_id LEFT JOIN forum_perms AS fp ON (fp.forum_id=t.forum_id AND fp.group_id=4) WHERE w.word LIKE '%%%' AND (fp.read_forum IS NULL OR fp.read_forum=1);

I was able to nearly reproduce this query with something such as the following URL: /search.php?action=search&keywords=********+not+foobar&show_as=topics&search=Submit

The result in the slow query log:

# Time: 110919 16:56:35
# User@Host: fluxbb[fluxbb] @ localhost []
# Query_time: 160.382064  Lock_time: 0.000097 Rows_sent: 16618333  Rows_examined: 33888904
SET timestamp=1316444195;
SELECT m.post_id, p.topic_id, t.last_post AS sort_by FROM search_words AS w INNER JOIN search_matches AS m ON m.word_id = w.id INNER JOIN posts AS p ON p.id=m.post_id INNER JOIN topics AS t ON t.id=p.topic_id LEFT JOIN forum_perms AS fp ON (fp.forum_id=t.forum_id AND fp.group_id=1) WHERE w.word LIKE '%%%%%%%%' AND (fp.read_forum IS NULL OR fp.read_forum=1);

This is pretty absurd. First thing to note is it really tanks MySQL performance as it creates 33 million rows here to do the search, and then proceeds to send back 16 million of them to FluxBB. The process never even gets them because PHP times out under our current settings.

We have ~122,500 topics and ~966,000 posts.

History

Reines 2011-10-13 12:32:57

  • Milestone set to 1.4.8.

Franz 2011-12-01 20:10:34

  • Owner set to Franz.

Interesting catch, I will try to come up with an intelligent solution to this problem.

daris 2011-12-10 15:21:30

  • Uploaded patch search.patch. (view)

toofishes, can you provide more examples when it might occurs?

The attached patch should exclude words that have only asterisks (***)

daris 2011-12-10 15:53:14

  • Uploaded patch search2.patch. (view)

Ignore above, this patch should be better, beacuse it ignores all * and % when checking whether current word is valid.

So for example this:
**d***d***+not+foobar
will give results for
not+foobar

Franz 2011-12-12 22:33:16

  • Owner changed from Franz to daris.

Your turn, then.

Would searches for things like "for*" still return posts with "forum" and "forums", though?

daris 2011-12-13 09:52:13

Well, you provided a good example smile I have to move the added code by the patch some lines below because "for*" after stripping * gives a "for" which is a badword (see lang/English/badwords.txt) smile

The word is valid whether is at least 3 characters long after stripping * and %

daris 2011-12-13 09:56:13

Commit 3db6042 to fluxbb fluxbb-1.4

#495 Do not allow to search words that after stripping * and % are shorter than PUN_SEARCH_MIN_WORD (which is 3 by default)

Franz 2011-12-13 10:32:51

Smart solution smile

daris 2011-12-13 10:34:35

  • Status changed from open to fixed.

Thanks smile