Message boards : BOINC client : (temporarily) Solving the LHC/BOINC crashing problem.
Author | Message |
---|---|
![]() Send message Joined: 29 Aug 05 Posts: 15585 ![]() |
At this moment LHC's scheduler is running again, so the below is no longer necessary.
I will start with a warning. By following the below you will rid yourself of any LHC work that you still have uploading, ready to start, downloading or ready to report. It will be lost.
------------------------------------------------
If you do not want to lose any work from LHC, force BOINC to use the Network Activity Suspended option. This will let BOINC run, but no projects will upload, download or report. To do so, exit BOINC, navigate to your BOINC directory, edit client_state.xml, scroll to the bottom of the file and where it says <user_network_request></user_network_request> change the value in between the tags to 3. Save client_state.xml and restart BOINC.
------------------------------------------------
So only use the below if you truly want to and don't mind the lost work. Otherwise wait for LHC to come back online, while keeping your network activity suspended.
------------------------------------------------
Exit BOINC.
Navigate to your BOINC directory.
Open client_state.xml with a text editor.
Use the Find option (usually under the F3 key).
Type in lhc and click Find.
Move your cursor to the left side of the screen and make sure it sits before the <project> tag:
<project> <master_url>http://lhcathome.cern.ch/lhcathome/</master_url>
Now hold down Shift. Scroll down until you see the next <project> tag. With Shift still held, click just after the last </result> tag you see before that next <project> tag. You have now selected all of LHC.
Hit Delete. Everything you selected will now be deleted.
Save client_state.xml through the File->Save menu. (!! Don't use the Save As... option. !!)
Still in your BOINC directory, rename client_state_prev.xml to client_state_prev.xml.backup. (If you want to keep LHC off this machine for the time being, also delete or rename account_lhcathome.cern.ch_lhcathome.xml.)
Restart BOINC.
Set LHC to No New Tasks and/or Suspend.
Why the backup of client_state_prev.xml? Because it still has the LHC entries in it. If LHC comes back, you may be able to edit client_state.xml again and add the information from the backup file. I don't think it will work, but it never hurts to try. At that time you just copy the LHC material from the backup file back into client_state.xml and delete the new client_state_prev.xml file. Do NOT change anything else in client_state.xml !! And if the above doesn't help you, or you still have questions, or you want me to do it for you, just ask for help. Please! |
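For anyone who would rather not do the Shift-select by hand, here is a minimal Python sketch of the same cut. It is not an official BOINC tool, and the function name `remove_project_block` is made up for this example. It assumes the layout described in the post above: everything belonging to LHC sits between its `<project>` tag and the next `<project>` tag. It works on the file contents as plain text, leaves any `<active_task>` entries alone, and returns the text unchanged if it cannot find a following `<project>` tag, since then the end of the section cannot be located safely.

```python
LHC_URL = "http://lhcathome.cern.ch/lhcathome/"  # master URL quoted in the post


def remove_project_block(state_text: str, master_url: str = LHC_URL) -> str:
    """Return state_text without the section belonging to master_url.

    Hypothetical helper: cuts from the <project> tag that precedes the
    matching <master_url> up to (not including) the next <project> tag.
    """
    url_pos = state_text.find(f"<master_url>{master_url}</master_url>")
    if url_pos == -1:
        return state_text  # project not present, nothing to cut

    # The <project> tag that opens this project's section.
    start = state_text.rfind("<project>", 0, url_pos)
    # The <project> tag that opens the next project's section.
    end = state_text.find("<project>", url_pos)
    if start == -1 or end == -1:
        return state_text  # boundaries unclear, leave the text alone

    return state_text[:start] + state_text[end:]
```

As with the manual steps, only run something like this with BOINC fully exited, and write the result back to client_state.xml only after keeping a copy of the original.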
Send message Joined: 31 Mar 08 Posts: 59 ![]() |
The above solution will hack out LHC WU's for transfer. To disable network connectivity while WU's are pending transfer, find the below string at the bottom of client_state.xml (open with Notepad): <user_network_request>1</user_network_request> Replace the numeral one (or whatever value is there) in the above string with a numeral three. Save the file. Then restart BOINC. Do this, and the aforementioned procedure by Ageless, only after BOINC has been terminated. Ensure that no BOINC client is running either (for me Rosetta would load and begin execution and then BOINC manager would crash trying to upload an LHC WU). FWIW, use Ageless's suggested "hack" if and ONLY if you have WU's for other projects that MUST be uploaded for credit (or if you have no other WU's to crunch). Thanks Ageless. Hope these band-aids help somebody. |
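The same edit can be sketched in a few lines of Python, for those who prefer not to open the file in an editor. This is only an illustration: the function name `suspend_network` is hypothetical, and the meaning of the value 3 (network activity suspended) is taken from the posts above rather than checked against the client source.

```python
import re


def suspend_network(state_text: str) -> str:
    """Set whatever sits between the <user_network_request> tags to 3.

    Hypothetical helper operating on the text of client_state.xml.
    """
    pattern = r"(<user_network_request>)[^<]*(</user_network_request>)"
    # \g<1> and \g<2> keep the tags; only the value in between is replaced.
    new_text, count = re.subn(pattern, r"\g<1>3\g<2>", state_text)
    if count == 0:
        raise ValueError("no <user_network_request> element found")
    return new_text
```

Read client_state.xml into a string, pass it through the function and write it back, all while BOINC is not running.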
![]() Send message Joined: 29 Aug 05 Posts: 147 |
The above solution will hack out LHC WU's for transfer. To avoid getting work from LHC you could suspend the project. You still have to elide all of the information from LHC for a bit, but having the account file present will auto attach you to the project. ![]() BOINC WIKI |
Send message Joined: 31 Mar 08 Posts: 59 ![]() |
Right, what you said. However, the aforementioned "band-aids" address the specific situation of BOINC manager crashing at this time for users who have completed LHC work units stuck in the transfer queue due to a crashed LHC server. To fix the BOINC manager crash, network connectivity must be disabled, thereby preventing BOINC access to the LHC server. To disable network connectivity one has to hack the client_state.xml file as indicated. Doing so in the GUI is futile, in that the BOINC manager crashes on start (as soon as upload to LHC of completed LHC WU's that are stuck in the transfer queue is initiated). In my case a download was pending for another BOINC client, a different BOINC client began execution, LHC attempted to upload a completed result, and then BOINC manager crashed. The temp fix for this problem is to disable network communication. This however shuts the door to communication for ALL projects. To resolve THAT issue, hacking out the completed LHC WU's allows users to upload completed WU's for projects OTHER than LHC that are approaching deadline (or if the LHC WU's awaiting upload are past deadline), OR helps those who have no other WU's to crunch in the meantime. If one is a dedicated LHC client, then they're sort of stuck for the time being. |
![]() Send message Joined: 29 Aug 05 Posts: 15585 ![]() |
Doing so in the GUI is futile, in that the BOINC manager crashes on start Actually, BOINC Manager works fine. It's the core client (boinc.exe) that crashes. But that's peanuts. ;-) |
Send message Joined: 5 Oct 06 Posts: 5142 ![]() |
I removed the sections in client_state.xml that Ageless suggests (<project>...</project> and <active_task>...</active_task>), so there were no references to LHC in the file. BOINC started just fine: I have other projects on the box, so I wanted to keep networking enabled. However, I hadn't read JM7's comment about the 'account_' file, so BOINC tried to reconnect to LHC with a 'project initialisation' request. I suspended the LHC project, which had reappeared in the projects list, but BOINC kept sending the initialisation requests (BUG? v5.10.13, as usual) and eventually crashed. I closed it down, removed the (largely empty) <project>...</project> which had reappeared in client_state.xml, and parked the account_ file in a handy folder out of the way. This time, when I restarted BOINC, all seemed to work normally. I found two files in the BOINC folder, "master_lhcathome.cern.ch_lhcathome.xml" and "sched_reply_lhcathome.cern.ch_lhcathome.xml", both datestamped at the time BOINC was trying to re-initialise the project. Both of them appear to be copies of a recent front page of the LHC website. I would expect that for 'master_', but 'sched_reply_'?????? |
Send message Joined: 6 Jun 06 Posts: 12 ![]() |
I've put lhcathome.cern.ch into my /etc/hosts with a dummy address. After re-enabling networking, my other projects are uploading and downloading fine. The entry looks like this: 172.20.1.1 lhcathome.cern.ch That address should be OK for most people, but if you feel you need to change it for any reason, please be careful and use RFC3330 as a guide. Unfortunately, this means I can't go to the project home page from my machine to check the status. Perhaps it should be a principle that hostnames for WU transfers should be different from those for home pages (even if they resolve to the same IP address). |
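A small sketch of the hosts-file trick, written against the file's text rather than the file itself so it can be tried without root. The hostname and the 172.20.1.1 dummy address are the ones from the post above; the function names are hypothetical and nothing here is BOINC-specific.

```python
def _fields(line: str) -> list[str]:
    """Split a hosts line into address and names, ignoring comments."""
    return line.split("#", 1)[0].split()


def add_hosts_entry(hosts_text: str,
                    hostname: str = "lhcathome.cern.ch",
                    address: str = "172.20.1.1") -> str:
    """Append 'address hostname' unless the hostname is already mapped."""
    for line in hosts_text.splitlines():
        if hostname in _fields(line)[1:]:
            return hosts_text  # already present, leave the text alone
    if hosts_text and not hosts_text.endswith("\n"):
        hosts_text += "\n"
    return hosts_text + f"{address} {hostname}\n"


def remove_hosts_entry(hosts_text: str,
                       hostname: str = "lhcathome.cern.ch",
                       address: str = "172.20.1.1") -> str:
    """Drop only lines that are exactly the dummy entry added above."""
    kept = [line for line in hosts_text.splitlines(keepends=True)
            if _fields(line) != [address, hostname]]
    return "".join(kept)
```

Applying the result to /etc/hosts needs administrator rights. The removal helper is deliberately narrow: it only drops a line that consists of exactly that address and hostname, so other mappings on the machine are left untouched.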
Send message Joined: 30 Aug 05 Posts: 65 |
I've put lhcathome.cern.ch into my /etc/hosts with a dummy address. After re-enabling networking, my other projects are uploading and downloading fine. This worked a treat for me. LHC work tried to upload and immediately failed with no response, and a day's worth of MalariaControl.Net was able to upload fine. Thanks for the suggestion, it sure got me out of a bind! Live long and BOINC! Paul. ![]() |
Send message Joined: 6 Jun 06 Posts: 12 ![]() |
I've just come back to this, because my LHC WUs from Monday were still not being uploaded. I wasn't overly concerned, because the LHC@HOME Web site is still down for maintenance. However, tracing with Wireshark I found that even though I removed the dummy entry from /etc/hosts on Tuesday, the BOINC client was still trying to connect to the dummy host. I run nscd, but the same happened when I stopped it. When I restarted the client, my WUs uploaded fine. It seems the client caches host addresses (indefinitely?) This is a serious problem. I vaguely recall a year or two ago being unable to upload WUs for some project, I think after a host address change, until I restarted the client. |
Send message Joined: 6 Jun 06 Posts: 12 ![]() |
I'm running the current Ubuntu package, 5.10.8. Is this a known problem? I don't see anything like it at http://boinc.berkeley.edu/trac/query. |
![]() Send message Joined: 29 Aug 05 Posts: 147 |
I've just come back to this, because my LHC WUs from Monday were still not being uploaded. I wasn't overly concerned, because the LHC@HOME Web site is still down for maintenance. The problem is that the library used caches the IP addresses in one mode and does even worse things in other modes. I know that this was being worked on, but I don't know the current status. ![]() BOINC WIKI |
Copyright © 2025 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.