Message boards : Questions and problems : boinc.exe starting problem
    
| Author | Message | 
|---|---|
| Joined: 4 Jul 09 Posts: 12 | I've got 4 computers that recently developed a problem that I can't figure out. It's also difficult to describe, so please bear with me. They all seem to be having a similar problem, but I'll focus on one for now. It's a server running Windows Server 2003 and client version 6.6.36, running as a service. It was working fine, but as of about a week ago, when I start BOINC Manager it says "Communicating with BOINC client. Please wait..." and never actually connects. I look at the processes that are running and I can see boinc.exe running under the user name boinc_master, and it is using 100% of the CPU time. If I stop and restart the BOINC service, the same thing happens: the boinc.exe process starts and uses 100% of the CPU time, and I can't connect with BOINC Manager. Any ideas? Keep in mind that this is happening on 4 computers and all started having the problem around the same time. |
| Joined: 5 Oct 06 Posts: 5150 | It sounds similar to an experience I documented in "BOINC cc using excess CPU as service", and which I've seen a couple of times since. It seems only to affect servers, and only to affect BOINC itself if installed as a service. In that thread, I showed that the same installation of BOINC, when started by BOINC Manager as a user program, behaved absolutely properly, but when started as a service (from the service control panel, services.msc) hit that excess CPU problem, and everything follows from that. My experience was with v5.10.45 - can't go above that on a domain controller - but you make it sound like the same issue is still in v6.6.36. The only thing that's cured it for me is to reboot the computer - not always easy to schedule on a server, but it has to be done sometime to install security patches. In fact, I see some correlation with automatic updates being downloaded and ready to install: that might explain why all four of your machines developed the same behaviour at around the same time. 'Update Tuesday' was last week. |
| Joined: 4 Jul 09 Posts: 12 | I thought that the Microsoft Updates might have had something to do with it, but I've been able to reboot one of the servers and it still has the same problem. Also, as of last night, the problem showed up on a Vista machine. It was a little different, though, in that boinc.exe was actually starting and then crashing after about 5 seconds. |
| KSMarksPsych Joined: 30 Oct 05 Posts: 1239 | "I thought that the Microsoft Updates might have had something to do with it but I've been able to reboot one of the servers and it still has the same problem. Also, as of last night, the problem showed up on a Vista machine. It was a little different in that boinc.exe was actually starting and then crashing after about 5 seconds though." Check stderr.txt (I think that's what the file's called: it's been a while since I had BOINC on a Win machine) for a stack dump. Is this one also on 6.6.36? Kathryn :o) |
| Jord Joined: 29 Aug 05 Posts: 15705 | Hi Cleaner, can you test with BOINC 6.10.15, the latest beta release candidate, to see if the problem continues? If possible, can you update to this BOINC on a machine that hasn't gotten its Windows Updates yet (or hasn't been rebooted yet) and see if it is unaffected? I have warned the developers already that there seems to be (a) Windows Update(s) again that break(s) BOINC. We had a similar report in this thread a couple of days ago. He managed to fix it with the old release candidate. |
| Joined: 4 Jul 09 Posts: 12 | I looked at stderrdae.txt on one of the servers and the only information in there that seems relevant is: `*** Dump of thread ID 7800 (state: Waiting): ***` `- Information - Status: Wait Reason: UserRequest, , Kernel Time: 937500.000000, User Time: 781250.000000, Wait Time: 4852726.000000` `- Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x004350ED read attempt to address 0x00001CF0` This particular server is actually running 6.6.38 and has an interesting twist that might shed some light on this problem. I have my clients set to only do network communication at night. BOINC appeared to run normally when I started it this morning, until I told it to do any network activity. Then boinc.exe immediately crashed. The Vista machine that developed the problem last night was running 6.6.38, but I installed the beta client 6.10.15 and it seemed to fix the problem. At least it was still running when I checked it this morning. Both of these machines have had the latest round of Windows Updates installed. I can try to install the 6.10.15 client on one of the servers that hasn't been rebooted and hasn't had the Windows Updates installed yet, and report back what happens. |
| Joined: 5 Oct 06 Posts: 5150 | Hi Cleaner, I've just been round an update cycle on three machines with Win XP, BOINC installed as a service: 1) Set BOINC service start to 'disabled' (but leave it running); 2) Install Windows updates - all offered; 3) Allow Windows Update to restart computer; 4) Install BOINC upgrade while idle, v6.10.13 --> v6.10.15, settings unchanged; 5) Open BOINC Manager to re-start daemon. All worked OK, no error messages - but v6.10.15 is seriously messed up for multi-threaded tasks; the cure is worse than the disease. Will prepare logs, screenshots and write-up for boinc_alpha. |
| Joined: 4 Jul 09 Posts: 12 | Ok, I just installed client 6.10.15 on one of the servers. This server was having the original problem I described: boinc.exe running at 100% CPU time, BOINC Manager not able to communicate with it, and no tasks ever launched. Right after installing the update, I started BOINC Manager and it connected to boinc.exe long enough for me to see some messages. It had the message about the version change (6.6.36 -> 6.10.15) and ran CPU benchmarks. Then, as soon as it had finished the CPU benchmarks, BOINC Manager lost communication with boinc.exe, even though boinc.exe is still running. |
| Jord Joined: 29 Aug 05 Posts: 15705 | Ok, is boinc.exe still running (check in Task Manager)? If it isn't, can you please open stdoutdae.txt in the BOINC Data directory, scroll to the bottom, and check which project application or applications started before the crash? If boinc.exe is still running, try stopping it. Then start BOINC Manager and quickly go to Advanced view->Tasks tab->click "Show active tasks". Hopefully you have enough time for that. (If you don't, let me know and I'll tell you what to add where in the registry.) Do you have much work cached on that machine? Are there any messages in stderrdae.txt or stderrgui.txt? |
| Joined: 4 Jul 09 Posts: 12 | I'll try to get to this as soon as I can but work is getting in the way. :) |
| Joined: 4 Jul 09 Posts: 12 | Ok, the boinc.exe process is still running. BOINC Manager won't connect to it, so I can't see what tasks are running. I don't believe any tasks are running because there are no processes running under the boinc_project username. If I kill the boinc.exe process and close BOINC Manager, then restart BOINC Manager and try to look at the tasks really fast, I still can't see anything before it says "Communicating with BOINC client. Please wait...". The last lines in the stdoutdae.txt file are: `20-Oct-2009 12:53:34 [---] General prefs: using your defaults` `20-Oct-2009 12:53:34 [---] Preferences limit memory usage when active to 1023.73MB` `20-Oct-2009 12:53:34 [---] Preferences limit memory usage when idle to 1535.60MB` `20-Oct-2009 12:53:34 [---] Preferences limit disk usage to 2.00GB` `20-Oct-2009 12:53:35 [---] Suspending network activity - time of day` Both stderrdae.txt and stderrgui.txt are empty; they are zero-length files. I'm not sure how to check how much work is cached on this machine, but the SLOTS folder is completely empty. |
| Jord Joined: 29 Aug 05 Posts: 15705 | "I'm not sure how to check how much work is cached on this machine but the SLOTS folder is completely empty." The slots are only used when work is actually being done. In all cases, you should be able to see how much work you have per project in the `BOINC\projects\project_url\` directories in the BOINC Data directory. But ok, let's test something. Open regedit (Start->type regedit, click OK). Navigate to `HKEY_CURRENT_USER\Software\Space Sciences Laboratory, U.C. Berkeley\BOINC Manager\Tasks`. Change ActiveTasksOnly from 0 to 1. Exit regedit. Start BOINC Manager. What this does is start BOINC Manager showing only active tasks in the Tasks tab. Active tasks are all those running, suspended, or waiting to run; anything other than ready to start, at least. Showing tasks this way can work around the problems BOINC Manager has with displaying large caches. There's no need to restart the computer or anything. |
| Joined: 4 Jul 09 Posts: 12 | I changed the registry entry as you described and then started BOINC Manager. Still no luck; I had the same result. Boinc.exe is running but BOINC Manager just says "Communicating with BOINC client..." As for projects, I have 8 projects. The folder with the most files is World Community Grid, with only 257 files in it. The project with the 2nd most files is Rosetta, with 76 files. All 8 of the project folders total about 315MB of data. |
| Jord Joined: 29 Aug 05 Posts: 15705 | All right, can you check in Windows Firewall (or whichever other firewall you use on that system) that you still allow boinc.exe and boincmgr.exe through? If you can set port numbers, boinc.exe needs internet access on TCP ports 80 and 443, while to be able to communicate with each other, boinc.exe and boincmgr.exe need access to TCP port 31416. |
| Joined: 4 Jul 09 Posts: 12 | I am not running any firewall software on this server. It almost seems like boinc.exe is getting hung up trying to do something right before it kicks off the tasks. I can see in the Security log that the boinc_project user is getting logged in, so it's not that account. Any other ideas? I'm kind of at a loss here. |
| Jord Joined: 29 Aug 05 Posts: 15705 | I know you can disable the firewall in Windows, but as far as I know that doesn't totally disable it. Some portions of it will keep on running, I think especially on the Server versions. But OK, I have just posted this advice to someone with a similar problem. Could you follow that as well and then report back? Then at least we'll know it's not lingering remnants or wrong accounts. |
| Joined: 22 Oct 09 Posts: 1 | Thank you Ageless... I have a Vista laptop. I woke this morning to find Windows had updated and now BOINC would not connect. After reading that it may be an issue with 6.6.38, I ran your 6.6.41 Windows 32-bit version. It WORKS... Thank you. Bob |
| Joined: 4 Jul 09 Posts: 12 | Ok, well, here's an update on this problem. Since it's servers that are having the problems, I hadn't been able to reboot them. I was looking at the BOINC logs, and all of the affected servers appear to have developed the problem sometime on October 13th, the same day that Microsoft released all their updates. I had kind of discounted the Windows Updates as the cause because 3 of the servers having the problem hadn't even had the updates installed yet. They were simply showing as updates that needed to be installed. Last night I got the opportunity to install these updates and restart the servers. It appears to have fixed the problem. I guess maybe the fact that Windows was waiting for the updates to be installed was enough to stop BOINC from working properly? Anyway, yes, I installed the updates on the 3 servers that didn't have them installed yet, including the one that I upgraded to BOINC 6.10.15, restarted them, and they are all happily chugging away now. The only one that doesn't fit is the 4th server, which I had already installed the updates on and restarted but which was still having an issue with BOINC. I restarted it again last night and now it is working too. Who knows...??? Thanks for the help Ageless. I have no idea what could have caused this other than the Windows Updates simply waiting to install. It must be related because of the date all of this started. |
| Joined: 5 Oct 06 Posts: 5150 | "Windows Updates simply waiting to install." Yes, that's exactly the situation I've been observing - sporadically, not every month. But then, I don't update servers every month, so the problem may only manifest itself once a critical mass has been reached. |
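
For anyone landing here with the same "Communicating with BOINC client" hang: the firewall question in the thread comes down to whether anything is accepting connections on TCP port 31416, the port named above for boinc.exe and boincmgr.exe to talk to each other. Below is a minimal Python sketch of that check, assuming the client runs on the local machine. The function name `port_is_open` and the 2-second timeout are illustrative choices, not part of BOINC.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection performs the full TCP handshake, so success means
        # something is listening and nothing blocked the attempt on the way.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 31416 is the port given in the thread for boinc.exe <-> boincmgr.exe traffic.
    reachable = port_is_open("127.0.0.1", 31416)
    state = "accepting connections" if reachable else "not reachable"
    print(f"127.0.0.1:31416 is {state}")
```

A successful connection only shows that the port is reachable. A client stuck at 100% CPU may still accept the connection and never answer, so this helps rule a firewall in or out but does not by itself explain a hang like the one described in this thread.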
        Copyright © 2025  University of California.
        
Permission is granted to copy, distribute and/or modify this document
        under the terms of the GNU Free Documentation License,
        Version 1.2 or any later version published by the Free Software Foundation.