Overall, everything works fine. I've got about 8k users; most of them use POP3, but a large portion use IMAP and webmail.
Users are able to send and receive email, authenticating against a Win2k3 Active Directory, with connections secured by SSL.
Besides some other problems (which I have posted in this forum), Dovecot hangs at times. It may be a coincidence, but it happens every day at about 9:00/9:30am, 11:20am, 1:50pm and 2:30pm (as far as I have monitored or been told), which are the times when users arrive at work and return from their lunch break.
When Dovecot hangs, I restart its service and all is good again. I used to think my AD was being overwhelmed with authentication requests (regular Windows logins plus Dovecot), but I installed a new AD server running Win2k3 Enterprise Edition 64-bit on new, powerful hardware, and the problem persists.
My mail server runs on identical hardware with RHEL 5.4 64-bit, Dovecot 1.0.7 and Postfix 2.3.3.
Since you are using Active Directory, please use Kerberos authentication instead of LDAP authentication. There are a lot of problems with LDAP authentication, and this is one of them.
A common cause of Dovecot hanging is the clock being synced to a different time source; configuring ntpd to sync with the AD server should help. I have seen Dovecot hang when ntpd synced to an external source and the time changed by more than 5 seconds: Dovecot will stop by itself.
Well, I didn't have a time synchronization problem, but thanks for the advice.
I moved my setup to a new server, and so far so good: still no problems with Dovecot hanging.
I made some interesting changes:
Dovecot now uses passdb pam instead of passdb ldap;
The system uses neither Samba nor winbind, and nsswitch is unchanged;
Dovecot no longer hangs (although this is the end-of-year period, when people here usually don't check their email much and lots of people are on vacation, so the load is a lot lower).
For a week I had a perfect server with no flaws or slow processing.
Yesterday (Monday, the first working day of the year) Dovecot hung again, but this time only in the morning.
The problem seems strange: it looks like Dovecot has been having problems with IMAPS but not with POP3S. Well, I'm not really sure about that. Anyway, does Dovecot have a limit on concurrent users or something like that?
What would cause it to hang for about 10 minutes? A user whose /home directory on the file system is "broken" logging in, perhaps?
Since you are now using Kerberos, we can cross it out. If you are sure it's not due to time synchronization, as Terry pointed out, we can cross that out too. How about the log file: does it contain anything suspicious?
Not really. I keep checking the log, but there are A LOT of entries, and as far as I've looked, nothing unusual.
I discovered something when the problem happened again this afternoon: Dovecot doesn't hang completely. Outlook (and Thunderbird) users can still connect, but webmail is slow, super slow! If I wait longer, Outlook eventually can't connect anymore either. So there is a period where the clients work but are slow, and after that, Dovecot hangs.
In maillog I get this from time to time (with both rip=127.0.0.1 and rip=network-ip-address)
Otherwise, the log file seems to be OK. So far I haven't seen anything unusual right before the hangs, but I'll take a closer look. Which log analyser could I use to make this easier?
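Short of a full log analyser, a minimal sketch like the one below can count login lines per minute, so that bursts around the reported hang times (9:30am, 2:30pm and so on) stand out from the noise. It assumes standard syslog lines (month, day and HH:MM:SS as the first three fields) and that Dovecot's login lines contain the text `imap-login: Login`; both the pattern and the /var/log/maillog path are assumptions to check against your own log.

```shell
#!/bin/sh
# Sketch: count matching Dovecot log lines per minute in a syslog-format
# maillog read from stdin. Assumes each line starts "Mon DD HH:MM:SS" and that
# logins contain "imap-login: Login"; pass another pattern as $1 if needed.
logins_per_minute() {
  pattern="${1:-imap-login: Login}"
  awk -v pat="$pattern" '
    index($0, pat) {
      # Group by "Mon DD HH:MM" (whitespace-split fields 1-3, seconds dropped).
      key = $1 " " $2 " " substr($3, 1, 5)
      if (!(key in count)) order[n++] = key   # remember first-seen order
      count[key]++
    }
    END { for (i = 0; i < n; i++) print order[i], count[order[i]] }'
}

# Example: busiest minutes in the current log (path assumed)
#   logins_per_minute < /var/log/maillog | sort -k4,4n | tail
```

Passing `'pop3-login: Login'` as the argument gives the same view for POP3, which would help confirm whether only IMAPS logins pile up before a hang.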
It is not about the number of file descriptors. And yes, you should change something in Dovecot: try setting login_process_size to 64. See http://wiki.dovecot.org/LoginProcess for more details.
My login_process_size is already set to 64, but login_max_processes_count was at 128.
Then I issued the following command every second:
ps -ef | grep imap-login | grep dovecot | wc -l
The results were between 149 and 152, and at that point (2:24pm) my webmail was already starting to get slow and Dovecot was about to hang.
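The per-second check above can be scripted so it also flags when the count gets close to the limit. This is a minimal sketch: LIMIT defaults to 128 (the login_max_processes_count value mentioned in this thread), the 90% warning threshold is an arbitrary choice, and the `[i]map-login` pattern is just a grep idiom that keeps grep from counting its own command line, standing in for the second `grep dovecot` in the original pipeline.

```shell
#!/bin/sh
# Sketch: sample the number of imap-login processes once per second and flag
# when it gets close to login_max_processes_count. LIMIT defaults to 128 (the
# value reported in this thread); override it, e.g. LIMIT=256 sh watch-logins.sh watch
LIMIT=${LIMIT:-128}

# Count imap-login lines in `ps -ef`-style text read from stdin. The "[i]"
# bracket keeps grep from matching its own command line; "|| true" keeps a
# zero count from returning a failure status.
count_imap_logins() {
  grep -c '[i]map-login' || true
}

# Print WARN once the count reaches 90% of LIMIT, otherwise OK.
classify() {
  if [ "$1" -ge $((LIMIT * 9 / 10)) ]; then
    echo WARN
  else
    echo OK
  fi
}

# The loop only runs when the script is started with the argument "watch".
if [ "${1:-}" = "watch" ]; then
  while true; do
    n=$(ps -ef | count_imap_logins)
    echo "$(date '+%H:%M:%S') imap-login=$n limit=$LIMIT $(classify "$n")"
    sleep 1
  done
fi
```

Saved under a hypothetical name such as watch-logins.sh and run with its output appended to a file, it leaves a timestamped record that can be lined up against the times when webmail started to slow down.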
I changed login_max_processes_count to 256 and restarted Dovecot. Ten minutes later, the number of processes is increasing slowly. All is fine (as usual after I restart Dovecot).
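Because the limit lives in the config file, a second sketch reads login_max_processes_count from dovecot.conf-style text and prints how many login processes are still free. The path /etc/dovecot.conf is an assumption, and the parser is deliberately naive: it only understands `name = value` lines, strips `#` comments, ignores section nesting and reports the last assignment it sees. As I understand Dovecot 1.x (worth confirming on the LoginProcess wiki page linked above), with SSL each login process stays alive for the whole session as an SSL proxy. That would explain why long-lived IMAPS sessions push the count up during the morning and after-lunch logins.

```shell
#!/bin/sh
# Sketch: look up a dovecot.conf setting and report login-process headroom.
# Naive parser: handles "name = value" lines only, strips "#" comments,
# ignores { } section nesting and returns the last assignment it finds.
conf_value() {
  awk -F= -v key="$1" '
    { sub(/#.*/, "") }                     # drop comments (re-splits fields)
    NF >= 2 {
      name = $1; gsub(/[ \t]/, "", name)   # trim the setting name
      if (name == key) {
        val = $2; gsub(/[ \t]/, "", val)   # trim the value
        found = val
      }
    }
    END { if (found != "") print found }'
}

# Print "limit used free" for a limit ($1) and a current process count ($2).
headroom() {
  echo "$1 $2 $(($1 - $2))"
}

# Example (config path assumed):
#   limit=$(conf_value login_max_processes_count < /etc/dovecot.conf)
#   used=$(ps -ef | grep -c '[i]map-login')
#   headroom "$limit" "$used"
```

Reading the value from the file rather than hard-coding it means the check stays correct after changes like the one from 128 to 256.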
I hope this fixes the problem. The next "hang" would be expected on Monday at about 9:40am.