As many of you know, posting can be very slow on Cemetech, while everything else completes in a very reasonable amount of time, helping us have fast page loads. Wherein lies the culprit? For a long time, I have suspected something with the search database getting updated, and comicIDIOT and I discussed a good way to check. I logged the time for the add_search_words() call to complete for a few recent posts:


Code:
Add search: 13.91 = 1319884255.98 to 1319884269.89<br />
Add search: 1.62 = 1319884424.22 to 1319884425.84<br />
Add search: 4.18 = 1319884488.98 to 1319884493.16<br />
Add search: 2.18 = 1319885458.05 to 1319885460.23<br />
Add search: 2.05 = 1319885958.26 to 1319885960.31<br />
Add search: 6.42 = 1319887679.05 to 1319887685.47<br />
Add search: 11.57 = 1319891898.01 to 1319891909.59<br />


As you can see, unacceptably high numbers. I'm debating a bunch of possible solutions, including marking some of the words with frequencies around 5000-10000 and higher as common, non-indexable words (which would more than halve the size of the word to post relation table) or turning off some portions of the search. For now, I'll delve deeper into that call and see what I find. Any suggestions?
I can't make any useful suggestions, this is out of my knowledge bank, but I would like to say that my average posting waiting time lately has been around 30 seconds per post.
Perhaps you could set it up to: use another database and/or index in the background, that way the times won't actually bother anything. How do you currently save words to search for? Could there maybe be a problem with topic renaming, post deleting, etc?
Slow queries can come from a number of places. It may be worth checking/repairing/optimising the tables to see if that helps for starters, as I have had issues where extremely slow queries were caused by corrupt indices.

I have found that the MySQL query optimiser seems to go a bit crazy on large tables (picking fast indices on medium-sized tables on my developer machine and picking slow indices or no indices at all on the very large tables on the production site). It might be worth checking what the queries are doing (prefix the query with EXPLAIN to see relevant information) to verify that the optimiser is doing a good job and applying a USE INDEX hint if it isn't (or adding indices if they are missing). I turned a >6 second query to <0.2 seconds one merely by adding USE INDEX (PRIMARY) to a SELECT query and significantly reduced the load on the database in the process, but you will need to be able to identify the bottlenecks first.

You may be able to access a slow queries log from your host if there is a history of problems. If not, periodically querying SHOW PROCESSLIST will show which queries are currently being processed (along with a time); you may be able to identify particularly problematic queries this way.
There is a slow posting time? I have no noticed... it is been rather fast for me.
Qazz: It's been a problem for years.
Catherine: There are two tables. One maps words to numbers, giving each unique word a unique ID number. Some words are marked as "common" and are therefore not indexed. The second table contains indexed word-to-post mappings. Each word can appear in many posts, and each post can match many words. The second table is two columns, both numbers.
Benryves: Some excellent suggestions; I'll get right on them. Interestingly, I did check/analyze/repair/optimize, just for starters, and I then pulled the new posting time log. Can you guess where I performed those steps? I don't understand the difference, though, or rather if this was the fix, what exactly it did.


Code:
Add search: 13.91 = 1319884255.98 to 1319884269.89<br />
Add search: 1.62 = 1319884424.22 to 1319884425.84<br />
Add search: 4.18 = 1319884488.98 to 1319884493.16<br />
Add search: 2.18 = 1319885458.05 to 1319885460.23<br />
Add search: 2.05 = 1319885958.26 to 1319885960.31<br />
Add search: 6.42 = 1319887679.05 to 1319887685.47<br />
Add search: 11.57 = 1319891898.01 to 1319891909.59<br />
Add search: 35.6903560162 = 1319892170.44 to 1319892206.13<br />
Add search: 18.3354690075 = 1319892801.24 to 1319892819.58<br />
Add search: 1.07836389542 = 1319892927.31 to 1319892928.39<br />
Add search: 4.06772184372 = 1319893139.52 to 1319893143.59<br />
Add search: 21.9721899033 = 1319893450.72 to 1319893472.69<br />
Add search: 4.72761797905 = 1319893576.73 to 1319893581.45<br />
Add search: 2.96879386902 = 1319895844.53 to 1319895847.5<br />
Add search: 4.97087001801 = 1319895842.62 to 1319895847.59<br />
Add search: 26.8631720543 = 1319895945.97 to 1319895972.83<br />
Add search: 0.0943388938904 = 1319896060.34 to 1319896060.44<br />
Add search: 0.494699954987 = 1319896099.03 to 1319896099.53<br />
Add search: 0.23473405838 = 1319896218.08 to 1319896218.31<br />
Add search: 0.12420797348 = 1319896591.84 to 1319896591.97<br />


Edit: Bah, I spoke too soon. This very post:


Code:

Add search: 6.05574798584 = 1319897038.57 to 1319897044.62<br />
oh, it has been a problem for years? ah, I thought that it was a recent thing, then. However, my posts usually take 2 or 5 seconds to go through, never 30 D:
qazz42 wrote:
oh, it has been a problem for years? ah, I thought that it was a recent thing, then. However, my posts usually take 2 or 5 seconds to go through, never 30 D:
The worst time that I had today was a 70-second post. :/ Anyway, I have added more debugging logging to the add_search_words() function. It looks like it first merges synonyms and "stopwords", then figures out which words it will be inserting, removing the common words, does the insertion, then checks to see what new words should be marked as common (based on the criteria that they appear in at least 40% of the posts, which I think is way too high).
*bump* I collected a few hundred posts' worth of statistics over the past two days or so; the following graph summarizes the numbers. The x-axis is the time for a given post to complete the add_search_words() function. The y-axis indicate the completion time of the four difference phases that make up each add_search_words() call. These are:

(1) Phase 1: Open stopword/synonym files, clean title and body of post using those words. Collect words to be inserted into database in arrays.
(2) Phase 2: Perform one SQL query to fetch out IDs of all words already in database, then one SQL query per additional words not yet with an ID number in the database to insert them.
(3) Phase 3: Insert (this_post_id, word_id) into the word-to-post match table for all words in the post marked as indexable.
(4) Phase 4: Remove "common words" from the database, where a common word appears in 40% or more of posts AND is also in this current post.



I find it interesting to note that Phases 2, 3, and 4 all take a long time if the complete process is long; none of them seems to take up much more than the others, which tells me that any of the queries being performed for these long posts is itself slow. Any thoughts?
Does that graph take edits into account?
Well, you could always give up on the crappy built in searching and replace it with a custom google search (or whatever other engine floats your boat). Then it's indexed asynchronously and gives better results anyway.
comicIDIOT: It does not.
Kllrnohj: Is there a way to properly wrap it in the template?
KermMartian wrote:
Kllrnohj: Is there a way to properly wrap it in the template?


I think this still works: http://www.google.com/cse/
Kllrnohj wrote:
KermMartian wrote:
Kllrnohj: Is there a way to properly wrap it in the template?


I think this still works: http://www.google.com/cse/
But... ads and stuff. Sad Plus with Google cutting every fringe project, I wouldn't be surprised if this went away. I did do some customization to see how it would look, though...
KermMartian wrote:
But... ads and stuff. Sad


?

Quote:
Plus with Google cutting every fringe project, I wouldn't be surprised if this went away.


I wouldn't really call search a fringe project :p But even if it is cut there are alternatives that are basically the same thing. I think Bing has one, others probably do as well.
*bump*

Could we re-touch on this topic? The past few days have been extremely slow, both for viewing and actually posting. Are you still keeping track of how long it takes to post, and what sections are taking the longest? If so, and if it is still the search thing, then something needs to be done about it. I personally like having Search be part of Cemetech and not having it rely on Google or Bing, but if it causes reading and posting times to go above 1 minute, which seems to be the case from me and a few others, then it can't be helped, imo.
There aren't that many outliers currently on the search database time logging, which makes me worry we might be under a low-scale accidental or intentional DDoS from people who might not want Cemetech to be swift and successful.


Code:
Add search: 16.6273469925;, 0.0607118606567, 1.05928015709, 11.0689220428, 4.43825793266
Add search: 1.56841897964;, 0.0163629055023, 0.173053979874, 0.466546058655, 0.912291049957
Add search: 1.5907459259;, 0.0166640281677, 0.179395914078, 0.7262840271, 0.668272018433
Add search: 2.3115439415;, 0.0162379741669, 0.565196037292, 0.463271141052, 1.2666759491
Add search: 5.79047989845;, 0.0182249546051, 3.07967591286, 1.41603207588, 1.27638602257
Add search: 23.9397928715;, 0.0222969055176, 1.40285015106, 11.8665139675, 10.6478819847
Add search: 0.146900892258;, 0.0224261283875, 0.00196385383606, 0.00279998779297, 0.119535207748
Add search: 2.70527291298;, 0.0322170257568, 0.207705020905, 0.380931854248, 2.08429908752
Add search: 2.87383294106;, 0.016560792923, 0.653027057648, 1.87202906609, 0.332067966461
Add search: 4.51063394547;, 0.075432062149, 2.85807585716, 0.799530982971, 0.777473211288
Add search: 32.149353981;, 0.0159940719604, 5.64302897453, 18.7406380177, 7.74956488609
Add search: 2.10935711861;, 0.0169479846954, 0.794954061508, 0.669199943542, 0.628098964691
Add search: 4.45927906036;, 0.0163149833679, 0.674462080002, 2.27482199669, 1.49355697632
Add search: 3.60428619385;, 0.0476849079132, 0.41090798378, 1.69240999222, 1.45315408707
Add search: 5.15256905556;, 0.0234439373016, 1.29425907135, 1.56875896454, 2.26594805717
Add search: 4.96481990814;, 0.0239100456238, 1.26920199394, 0.462732076645, 3.20872879028
Add search: 0.336380958557;, 0.0522658824921, 0.0490651130676, 0.103234052658, 0.131638050079
Add search: 4.02230286598;, 0.0304400920868, 1.37377977371, 2.37216520309, 0.245725870132
Add search: 25.3548309803;, 0.0541450977325, 3.47763490677, 17.0106911659, 4.8121368885
Add search: 11.5803258419;, 0.046355009079, 2.0904750824, 5.83701491356, 3.60634207726
Add search: 8.57902216911;, 0.164416074753, 3.07942891121, 1.73133897781, 3.60366511345
Add search: 1.85993909836;, 0.071033000946, 0.424069166183, 0.648632049561, 0.715969800949
Add search: 2.02557206154;, 0.138990163803, 0.772533893585, 0.806777954102, 0.307106018066
Add search: 4.48441505432;, 0.0375709533691, 0.165943145752, 0.342214822769, 3.93852305412
Add search: 5.70149302483;, 0.0959420204163, 0.240934848785, 0.658316135406, 4.70615792274
Add search: 4.70806193352;, 0.0203428268433, 1.22002100945, 0.403457164764, 3.06407189369
Add search: 1.54163599014;, 0.0802478790283, 0.156568050385, 0.242254018784, 1.06242108345
Is there any information to get that info from your host?
comicIDIOT wrote:
Is there any information to get that info from your host?
There is, but there's a bit more sleuthing that I can do first on my end. Are we continuing to see these problems?
  
Register to Join the Conversation
Have your own thoughts to add to this or any other topic? Want to ask a question, offer a suggestion, share your own programs and projects, upload a file to the file archives, get help with calculator and computer programming, or simply chat with like-minded coders and tech and calculator enthusiasts via the site-wide AJAX SAX widget? Registration for a free Cemetech account only takes a minute.

» Go to Registration page
Page 1 of 1
» All times are UTC - 5 Hours
 
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum

 

Advertisement