Community Help: Lussumo.com Connectivity Issues [RESOLVED]

Today you may have begun to notice database errors when attempting to load any of my websites. Particularly lussumo.com/community and markosullivan.ca/blog have been showing intermittent errors.
These errors have come at a particularly inopportune time (is there ever a good time?) because I am extremely busy with a new contract, development of the Garden framework, Vanilla 2, and I also manage to have a life in there somewhere (sometimes
.
When I began to notice the slow page-loading times on my server and then the errors that followed, I contacted my hosting company to find out what was going wrong. I am hosted at rackspace.com, and they are well known for their fanatical support. True to form, they got back to me quickly with a diagnosis of the problem:
Good Afternoon, I have made some adjustments to the my.cnf configuration file in /etc skip-bdb query_cache_size=64M query_cache_limit=12M interactive_timeout=300 wait_timeout=300 tmp_table_size=128M max_heap_table_size=128M in order to decrease the high amount of disk I/O occuring on this server. This should help with the query building by allocating more memory to this resource. I have also disabled persistent MySQL connections from PHP: mysql.allow_persistent = Off It appears you are reaching your maximum connections limit for MySQL. The above adjustments are conservative due to the low amount of physical memory you have on this server. When your server runs out of physical memory, it resorts to using disk space (SWAP memory). This swapping can and will cause your server to become unresponsive. You may also consider increasing the amount of physical memory on this server with a RAM upgrade. If you are interested in proceeding, I can send this ticket to a BDC who can assist you with this upgrade and update you on pricing for this component. Besides processes in "sleep" status, indicating the use of persistent MySQL connections, it appears most of the connections are due to table locking occuring: +-----+---------+-----------+-----------+---------+------+-------------------------------+------------------------------------------------------------------------------------------------------+ | Id | User | Host | db | Command | Time | State | Info | +-----+---------+-----------+-----------+---------+------+-------------------------------+------------------------------------------------------------------------------------------------------+ | 573 | xxxx | localhost | community | Query | 9 | Locked | SELECT t.DiscussionID AS DiscussionID, t.FirstCommentID AS FirstCommentID, t.AuthUserID AS AuthUs | | 574 | xxxx | localhost | community | Query | 10 | Locked | SELECT t.DiscussionID AS DiscussionID, t.FirstCommentID AS FirstCommentID, t.AuthUserID AS AuthUs | | 583 | xxxx | localhost | community | Query | 10 | Locked | SELECT t.DiscussionID AS DiscussionID, t.FirstCommentID AS FirstCommentID, t.AuthUserID AS AuthUs | | 584 | xxxx | localhost | community | Query | 9 | Locked | SELECT t.DiscussionID AS DiscussionID, t.FirstCommentID AS FirstCommentID, t.AuthUserID AS AuthUs | | 591 | xxxx | localhost | community | Query | 10 | Locked | SELECT t.DiscussionID AS DiscussionID, t.FirstCommentID AS FirstCommentID, t.AuthUserID AS AuthUs | | 593 | xxxx | localhost | community | Query | 10 | Locked | SELECT t.DiscussionID AS DiscussionID, t.FirstCommentID AS FirstCommentID, t.AuthUserID AS AuthUs | | 728 | xxxx | localhost | community | Query | 5 | Locked | SELECT t.DiscussionID AS DiscussionID, t.FirstCommentID AS FirstCommentID, t.AuthUserID AS AuthUs | | 729 | xxxx | localhost | community | Query | 4 | Locked | select a.AddOnID as AddOnID, a.AddOnTypeID as AddOnTypeID, a.ApplicationID as ApplicationID, a.Au | | 733 | xxxx | localhost | community | Query | 3 | Locked | SELECT t.DiscussionID AS DiscussionID, t.FirstCommentID AS FirstCommentID, t.AuthUserID AS AuthUs | | 734 | xxxx | localhost | community | Query | 3 | Locked | SELECT t.DiscussionID AS DiscussionID, t.FirstCommentID AS FirstCommentID, t.AuthUserID AS AuthUs | | 730 | xxxx | localhost | community | Query | 3 | Locked | SELECT t.DiscussionID AS DiscussionID, t.FirstCommentID AS FirstCommentID, t.AuthUserID AS AuthUs | | 735 | xxxx | localhost | community | Query | 2 | Locked | SELECT t.DiscussionID AS DiscussionID, t.FirstCommentID AS FirstCommentID, t.AuthUserID AS AuthUs | | 736 | xxxx | localhost | community | Query | 2 | Locked | SELECT t.DiscussionID AS DiscussionID, t.FirstCommentID AS FirstCommentID, t.AuthUserID AS AuthUs | | 737 | xxxx | localhost | community | Query | 2 | Locked | SELECT t.DiscussionID AS DiscussionID, t.FirstCommentID AS FirstCommentID, t.AuthUserID AS AuthUs | | 738 | xxxx | localhost | community | Query | 0 | Locked | SELECT t.DiscussionID AS DiscussionID, t.FirstCommentID AS FirstCommentID, t.AuthUserID AS AuthUs | | 739 | xxxx | localhost | community | Query | 0 | Locked | SELECT t.DiscussionID AS DiscussionID, t.FirstCommentID AS FirstCommentID, t.AuthUserID AS AuthUs | | 740 | xxxx | localhost | community | Query | 0 | Locked | SELECT t.DiscussionID AS DiscussionID, t.FirstCommentID AS FirstCommentID, t.AuthUserID AS AuthUs | +-----+---------+-----------+-----------+---------+------+-------------------------------+------------------------------------------------------------------------------------------------------+ as these queries are locking the table, subsequent queries are having to wait and thus stacking up taking available connections. You may find that changing this table type to Innodb may help with this table locking issue. You may need to discuss with your developers if this change would have an inverse affect to your applications. As well, I have enabled slow query logging in: /var/lib/mysqllogs/slow-log which will log queries taking over 5 seconds to complete. This information will help your developers to optimize any SQL queries and/or apply indexing where appropriate. I have also put in the option in Apache: MaxRequestsPerChild 1000 which will help to reduce the memory footprint of this service. While it appears that the above changes helped with the non availabilty of MySQL, the server is still highly loaded.
Now, I always knew that the Vanilla 1 queries were hairy and could cause problems. I didn’t think it was going to happen any time soon, and I was hoping to get Vanilla 2 in place before this became an issue (Vanilla 2′s queries are much simpler and faster) – but it looks like that is not going to happen. Regardless, it would seem that my traffic has slowly and steadily been increasing at lussumo.com over the years. In December we peaked at 2.5 million page views for that month at lussumo.com alone, and we’ve maintained that amount of traffic almost every day since.
Obviously I could throw more RAM at the server as the Rackspace support person suggested – this seems to be a common answer to problems of this sort (we currently only have 1G of ram on the server), but I don’t know if that is the answer I should be looking for – especially considering that I’m already paying a lot of money for the server.
So, I am hoping that all of those who use Vanilla can step up to the plate and offer your expertise on how to resolve this issue. I am opening the doors and accepting any and all advice, questions, ideas on how to fix the problem.
Here is what I have tried so far:
* I reviewed the slow queries that mysql logged and found that 99% of them were Vanilla’s “comments page” and “discussions page” queries. I’ve uploaded a sample of the slow query log so you can see what queries are causing problems.
* I downloaded a copy of the Lussumo Community database to my local dev machine so I could get a good look at the tables, indexes, etc.
* I found that none of the indexes that are included with the current release of Vanilla 1 were applied on the tables (other than primary keys). This is probably due to the fact that I’ve just added columns as development has continued and never had a problem before now.
* I added the indexes that are shipped with the current release of Vanilla 1 to the community database. I found that this had little-to-no effect on the speed of the page-load (it might have even made the queries slower).
* I’ve created a script that converts all of the tables in the community db to innodb tables (as suggested by the rackspace tech). I’ve done some googling that has detailed both good and bad results of this type of change. It could start to throw fatal errors when data is being inserted (rather than while it’s being selected, as it is now). I have not yet run this script as I want to hear back from the community first.
* I’ve taken the community forums offline and enabled wp-cache on this blog so that everyone can have access to this blog post and be fully aware of the issue.
Help!
So, I am reaching out to you for help. No question is a dumb one. Any idea is welcome. Please share your expertise and help us to get this convoy back on the road…
Update
It turns out that I had forgotten to apply all of the indexes & optimizations to this database through the years that we’ve been online. The growth of our community, combined with poor indexing caused a couple of the tables to begin to lock. The LUM_User and LUM_UserDiscussionWatch tables in particular were locking. These tables are updated frequently with login information and discussion tracking information respectively. Because the tables were MyISAM type, all records would be locked when an update was applied to just a single row – this meant that all 9000+ user records would get locked whenever anyone’s “DateLastActive” field was updated, and all 90,000+ records in the LUM_UserDiscussionWatch table would get locked whenever anyone even looked at a single discussion (and the record of their view of that discussion was recorded).
To fix both of these issues, I changed their table types to InnoDB so that only the affected row should become locked when updates are applied.
I also analyzed the Discussions & Comments queries, which are (obviously) the most actively run queries in the application. The comments query was extremely slow. After running EXPLAIN on the query, I found that it was indexed incorrectly. For some reason the LUM_Comment table was using both the CommentID and the DiscussionID columns as it’s primary key. I removed the DiscussionID as a primary key and added it as a simple index. This allows the query to not scan the entire LUM_Comment table when performing the join to LUM_Discussion. I also found that the LUM_UserBlock table had no indexes at all, so I added those and was able to further reduce the query time. Here is a list of the changes that I made to the database for anyone who might be interested:
ALTER TABLE `community`.`LUM_Comment` DROP PRIMARY KEY, ADD PRIMARY KEY USING BTREE(`CommentID`), ADD INDEX `comment_discussion`(`DiscussionID`); ALTER TABLE LUM_UserBlock ADD INDEX (BlockingUserID); ALTER TABLE LUM_UserBlock ADD INDEX (BlockedUserID); ALTER TABLE LUM_User ENGINE=InnoDB; ALTER TABLE LUM_UserDiscussionWatch ENGINE=InnoDB;
Thanks to Damien (Dinoboff) and Dave (Wallphone) for jumping in and offering some assistance.

Can you show the slow query log (you can set the time for what qualify as a slow query)?
ps: Have your thought to move to Amazon EC2; a basic server (1.7Go of ram) cost 72$/month and they can easy be scaled up.
You could try the SimpleCache extension, while it doesn’t change anything about the queries, it can reduce the number of queries sent to the DB.
I haven’t played with it much to know if it would help in this situation.
@Damien – I’ve uploaded a sample of the slow query log already (it’s linked in the post above). I haven’t considered moving servers because the server is really not the problem that I want to solve.
@WallPhone – I haven’t tried it. I’ll take a look at that next.
Which table is creating the lock. The User table with the visit count? The UserDiscussionWatch? I don’t think it could be the discussion or comment table. I don’t think there is much edit/minutes.
You could try to convert them to Innodb. If the UserDiscussionWatch is the problem, it might be possible to partitioned, using MyISAM Merge table for the display of the bookmarked discussion or search result?
Forgot my last comment. It is not like many of these slow queries are affected by locking problems.
I’m thinking about setting up memcached and writing a plugin for Vanilla 1 to use it. Should be fairly straightforward….
That would be good for anonymous visitors. For logged-in ones, you will need to break down some of the query in multi query, which won’t be as easy.
About this query:
SELECT count(m.CommentID) AS Count
FROM LUM_Comment m
inner join LUM_Discussion t ON m.DiscussionID = t.DiscussionID
WHERE
(m.Deleted = ‘x’ or m.Deleted = x )
and (m.WhisperUserID = ‘x’
or m.WhisperUserID is null
or m.WhisperUserID = x
or m.WhisperUserID = ‘x’
or m.AuthUserID = ‘x’ )
AND m.DiscussionID = ‘xxxx’;
count: 319 (2nd in the slow-query-log)
I am guessing it is used to regenerate the count discussion of a row on the discussion table. Could not it be smarter and just increase the count or decreasing it.
For the worst one:
SELECT coalesce(ab.Blocked, x) AS AuthBlocked,
coalesce(cb.Blocked, x) AS CommentBlocked,
m.CommentID AS CommentID,
m.DiscussionID AS DiscussionID,
m.Body AS Body,
m.FormatType AS FormatType,
m.DateCreated AS DateCreated,
m.DateEdited AS DateEdited,
m.DateDeleted AS DateDeleted,
m.Deleted AS Deleted,
m.AuthUserID AS AuthUserID,
m.EditUserID AS EditUserID,
m.DeleteUserID AS DeleteUserID,
m.WhisperUserID AS WhisperUserID,
m.RemoteIp AS RemoteIp,
a.Name AS AuthUsername,
a.Icon AS AuthIcon,
r.Name AS AuthRole,
r.RoleID AS AuthRoleID,
r.Description AS AuthRoleDesc,
r.Icon AS AuthRoleIcon,
r.PERMISSION_HTML_ALLOWED AS AuthCanPostHtml,
e.Name AS EditUsername,
d.Name AS DeleteUsername,
t.WhisperUserID AS DiscussionWhisperUserID,
w.Name AS WhisperUsername
FROM LUM_Comment m
inner join LUM_User a ON m.AuthUserID = a.UserID
left join LUM_Role r ON a.RoleID = r.RoleID
left join LUM_User e ON m.EditUserID = e.UserID
left join LUM_User d ON m.DeleteUserID = d.UserID
inner join LUM_Discussion t ON m.DiscussionID = t.DiscussionID
left join LUM_User w ON m.WhisperUserID = w.UserID
left join LUM_CategoryRoleBlock crb ON t.CategoryID = crb.CategoryID and crb.RoleID = x
left join LUM_UserBlock ab ON m.AuthUserID = ab.BlockedUserID and ab.BlockingUserID = x
left join LUM_CommentBlock cb ON m.CommentID = cb.BlockedCommentID and cb.BlockingUserID = x
WHERE (crb.Blocked = ‘x’ or crb.Blocked = x or crb.Blocked is null )
and (m.Deleted = ‘x’ or m.Deleted = x )
and (m.WhisperUserID = ‘x’ or m.WhisperUserID is null or m.WhisperUserID = x or m.WhisperUserID = ‘x’ or m.AuthUserID = ‘x’ )
AND m.DiscussionID = ‘x’
ORDER BY m.DateCreated asc
LIMIT x, x;
count: 665
I am guessing block user and block comment feature are making things worst.
instead of:
CREATE TABLE `LUM_CommentBlock` (
`BlockingUserID` int(11) NOT NULL default ’0′,
`BlockedCommentID` int(11) NOT NULL default ’0′,
`Blocked` enum(’1′,’0′) NOT NULL default ’1′,
KEY `comment_block_user` (`BlockingUserID`),
KEY `comment_block_comment` (`BlockedCommentID`)
);
It should be:
CREATE TABLE `LUM_CommentBlock` (
`BlockingUserID` int(11) NOT NULL default ’0′,
`BlockedCommentID` int(11) NOT NULL default ’0′,
`Blocked` enum(’1′,’0′) NOT NULL default ’1′,
PRIMARY KEY (`BlockingUserID `,`BlockedCommentID `)
);
The LUM_CategoryRoleBlock was also missing the primary key. I added it in revision 790.
For other key update we will need to EXPLAIN.
^^^ For other keys we will need to use EXPLAIN.
@Damien – re: multiquery – do you have experience with that? At the very least I could implement memcached on all queries that are not user-specific. I bet that would help immensely.
“I am guessing it is used to regenerate the count discussion of a row on the discussion table. Could not it be smarter and just increase the count or decreasing it.”
Yes, definitely. I think the real problem comes with whispers (where a discussion has a different count if there are whispers for the selecting user) – that’s why I didn’t use a stored value.
“I am guessing block user and block comment feature are making things worst.”
Could be. I’ve tried editing the queries and removing various joins to see what effect it would have on speed. It really had no effect in my tests – but that was just me running the query once – so I’m not sure the effects it would have when the server is getting nailed.
Re: the commentblock table keys – I’ll rebuild the table with the appropriate primary keys as you suggested.
Re: LUM_CategoryRoleBlock – I’ll add the primary key to that table.
Re: explain – do you want me to send you explain results for particular queries? Let me know and I’ll send them along. I can also send you a copy of the community db with whispers & other sensitive user information obscured so you can do your own tests…
We might want to move this discussion to email so we can respond to each other faster.
Thanks for your help!
I asked for more help from the rackspace technicians, and I got another great response:
The following queries have caused lock waits per the slow query log:
UPDATE LUM_UserDiscussionWatch SET CountComments=’128′, LastViewed=’2009-02-17 14:28:13′ WHERE UserID = ’9810′ AND DiscussionID = ’5448′;
UPDATE LUM_User SET DateLastActive=’2009-02-17 11:18:32′, VerificationKey=’55158xxxxxxxxxxxxxxfd331293′, CountVisit=CountVisit + 1 WHERE UserID = ’2469′;
UPDATE LUM_User SET DateLastActive=’2009-02-17 12:53:21′, VerificationKey=’ddf1b1xxxxxxxxxxxxxad45b6′, CountVisit=CountVisit + 1 WHERE UserID = ’9778′;
UPDATE LUM_User SET DateLastActive=’2009-02-17 15:08:21′, VerificationKey=’ff66xxxxxxxxxxxxxxxdfdc95f79′, CountVisit=CountVisit + 1 WHERE UserID = ’9810′;
These queries are creating a write lock on the LUM_UserDiscussionWatch and LUM_User tables. Converting these to InnoDB may improve performance.
The remaining queries printed to the slow query log are SELECTs utilizing JOINs. These queries are likely running slowly because the tables referenced in the JOIN are not indexed properly. A way of determining which tables require indexes is to use the EXPLAIN command against each SELECT statement that is running slowly. Please see the attached text file for an example of usage and the resulting output. If you would like additional assistance optimizing your queries then I would recommend speaking with your account manager about engaging the services of our DBA team.
Using explain, I was able to optimize the slowest query Damien listed above with:
ALTER TABLE `community`.`LUM_Comment` DROP PRIMARY KEY,
ADD PRIMARY KEY USING BTREE(`CommentID`),
ADD INDEX `comment_discussion`(`DiscussionID`);
ALTER TABLE LUM_UserBlock ADD INDEX (BlockingUserID);
ALTER TABLE LUM_UserBlock ADD INDEX (BlockedUserID);
It is *much* faster, now.
I’ve updated the blog post above with the latest info about changes & fixes I’ve applied.
About the useblock and commentblock table, I think it should be like that:
# MySql will be able t use one index or the other
# but not both on the same join;
# you need the two column in the index
ALTER TABLE `lum_commentblock` DROP INDEX `comment_block_user`;
ALTER TABLE `lum_commentblock` DROP INDEX `comment_block_comment`;
ALTER TABLE `lum_commentblock` ADD PRIMARY KEY ( `BlockedCommentID` , `BlockingUserID` )
ALTER TABLE `lum_userblock` ADD PRIMARY KEY ( `BlockedUserID` , `BlockingUserID` )
Interesting, User 9810 is another one without a role history: http://lussumo.com/community/account/9810/
I pulled the number from Mark’s comment above.
2469 is Dinoboff
9778 appears to be a legitimate user, but a seemingly random name.
@Damien – I’ve fixed up the indexes on those two tables as you suggested. Thanks!
@Matthew – that’s weird about the lack of role history. I have a theory that they somehow got in on a registration page other than the community forum one. I think there might be one laying around in the addons app or the bug forum app. I’m not overly concerned about it since all of those apps will be replaced (hopefully very) soon.
I ran into a problem similar to this not to long ago and had to quickly learn of indexes!
Indexes, EXPLAIN and ANALYZE will become your friends.
Fortunately for me, I have DBAs at my expense!
To be honest, I’ve worked with indexes a lot in the past, and I was totally surprised by the fact that I had simply forgotten to apply indexes to the community database this entire time. It’s just not something that I thought to do – and a relatively massive mistake.
*slaps forehead*
So, speaking of indexes, are you adding them to Garden where needed?
Of course!
It’s not that I forgot to add indexes to Vanilla as an application. It’s that I forgot to add them to the community forum installation – which has been around since Vanilla 0.1.
I hope V2 won’t have such problems.
@Trevor – this is the reason v2 was started in the first place. Check out this blog post about garden from Dec 2007: http://markosullivan.ca/blog/?p=14
Quote: “What changes should be made for version 2?
(a) It needs to perform faster – this means that some of the features that slow it down need to come out. It needs to allow caching in some way. It needs to not rely on object oriented programming (unless it is PHP5 specific).”
Silly question of the day, but which of the aforementioned should/can I apply to ensure optimum performance? (not a MySQL DB guy)
@TomTester – my problems arose from the fact that I had not been applying the indexes that are included with the release version(s) of Vanilla. If you make sure that your installation has the indexes of the release version, you’re good to go.
I suppose Garden have a debugging mode that display the query executed by the current request like Vanilla has (or had). You could add a link to each query to perform EXPLAIN on it.
@Damien – Yes, Garden has a debugging plugin. It was the first plugin I wrote
It doesn’t have the link to explain, but that’s a pretty cool addition!
I dont know much about mySQL. From what I have seen of Vanilla, which is in my view the best looking forum out there, it “deserves” to run off a 2GB ram server. Its an honour thing!
I am surprised it is running off 1GB ram. I would urge you to upgrade to 2GB ram.
Also, I use http://www.fasthosts.co.uk who are UK based. They might not be fanatical about support, but they offer a 2GB dual core dedicated for £100!
I cant wait for Vanilla2, which I am hoping is a few days away. I will definetly donate to the cause once I get it setup. Just so you know, the ability to integrate into an existing site is gold dust!
Saeed