Message boards : BOINC client : can't download new work units
Author | Message |
---|---|
Joined: 17 Apr 08 Posts: 22 |
I'm not sure this is the right place for this question; if it isn't, I'd be much obliged if someone could redirect me. I currently run six projects. With the exception of lhc@home, my client was happily downloading and managing work units until about 48 hours ago. Now all of my projects are drying up, and when the client contacts a project, it asks for zero work units. Since this is happening for five of the six projects, I suspect the BOINC client. I have double-checked the disk space parameters and, as near as I can tell, there should be no holdup there. I first noticed this when climateprediction@home completed a run but wouldn't download another model. Then, one by one, einstein@home, milkyway@home and finally setiathome all followed suit. orbit@home and lhc@home never have any work anyway, but at least orbit@home asks for work units. What should I be looking at? Cheers, Bob Graham |
Joined: 5 Oct 06 Posts: 5142 |
"What should I be looking at?" One thing would be the time statistics for the computer. On any project website, towards the bottom of your computer details, you should see a block like: "% of time BOINC client is running"; "While BOINC running, % of time work is allowed"; "Average CPU efficiency"; "Task duration correction factor". What figures do you see there? |
Joined: 29 Aug 05 Posts: 147 |
What are the <long_term_debt> values for each project? What is the connect every X value (<work_buf_min_queue>)? [edit] How many CPUs does the computer have? What is the remaining CPU time for each task? Which project does each task belong to? BOINC WIKI |
Joined: 19 Jan 07 Posts: 1179 |
"I first noticed this when climateprediction@home completed a run but wouldn't download another model." That is a feature. Suppose you run CPDN along with project X. Project X has short workunits with deadlines that are long enough. At some point, your computer gets into "deadline risk" with CPDN. If it keeps getting work from X, it may delay the CPDN workunit too much and miss the deadline. So BOINC stops getting work from X. However, once the CPDN workunit is done, BOINC stops getting work from CPDN too. Why? Because if it got another model, it would get into the same trouble *again*, and you would end up crunching a lot more CPDN than X (not following your resource shares). So BOINC stops getting work from CPDN until it has computed enough of X to compensate. This is remembered by keeping "debts" for each project (John McLeod would explain the exact mechanism much better than me, if you're interested). So, maybe some of your current work is in deadline trouble (risk of not meeting the deadline if it got more work), or maybe Orbit has a high debt (which makes the rest have a negative debt). |
Joined: 13 May 06 Posts: 2 |
"I first noticed this when climateprediction@home completed a run but wouldn't download another model." I have the same problem while running seti, einstein and rosetta. It has nothing to do with the particular projects, because it also happened when I had only seti and einstein. Clicking reset doesn't do anything. I detach the project that has stopped downloading and then reattach. Downloading starts immediately and everything is fine again until the next time. |
Joined: 29 Aug 05 Posts: 147 |
"I first noticed this when climateprediction@home completed a run but wouldn't download another model." Before you do the detach / attach, what are the values for the long_term_debts for all of the projects? BOINC WIKI |
Joined: 29 Aug 05 Posts: 15585 |
To check Long Term Debt, use BOINCDV. Unzip it in your BOINC directory (or BOINC Data directory if you use BOINC 6), run it, then copy the contents to here. |
Joined: 2 Sep 05 Posts: 103 |
The problem may be due to the way that LTD is accumulated for projects which are starved of work on the client and have no work available on the server. A project starts accumulating LTD as soon as its communication deferral timeout expires and continues to do so until you send a scheduler request asking for more work and the request times out or receives a no work scheduler reply. There are 2 conditions under which the effect is amplified:
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer |
Joined: 29 Aug 05 Posts: 147 |
"The problem may be due to the way that LTD is accumulated for projects which are starved of work on the client and have no work available on the server." The basic problem is that it cannot be known whether the project is starved of work or not. If the queue is shorter than what the user has asked for, more work is downloaded, even if the project from which it is downloaded must be one of those with negative LTD. This is true for the most recent clients. I would be really upset if the following happened: S@H does not have work because of a Tuesday outage. CPDN downloads a task and instantly goes into EDF for a year. The LTD does not change because S@H was out of work during the last contact. This is NOT reflecting my resource allocation at all. BOINC WIKI |
Joined: 16 Apr 06 Posts: 386 |
... But if, for example, the BOINC client polled the other project weekly (in the same way WCG does), and each time it is out of work, then I think it would work quite well. There are a number of projects now which have been out of work for months or even over a year, and they're building up an incredible positive balance (which to my mind is just as bad as your example). |
Joined: 19 Jan 07 Posts: 1179 |
"But if, for example, the Boinc client polled the other project weekly (in the same way WCG does), and each time it is out of work, then I think it would work quite well." BOINC provides no way to "poll" a project like that. The only way for the client to know whether a project has work is to request work. |
Joined: 29 Aug 05 Posts: 147 |
"But if, for example, the Boinc client polled the other project weekly (in the same way WCG does), and each time it is out of work, then I think it would work quite well." And if you get work, that is just added to the list of tasks that need to be done, whether there is time for it or not. Another point: what if it is once a week, and you hit the S@H weekly outage every single time? BOINC WIKI |
Joined: 16 Apr 06 Posts: 386 |
... The WCG project polls back every week, I don't know what mechanism they use ... (there are no workunits from WCG on my computer, and it is set to 'no more work'). |
Joined: 29 Aug 05 Posts: 147 |
... The poll is possible, but there is no method of discovering whether there is currently work or has been work in the last week. This is also set up by the projects, and if the project does not indicate a need to check in on occasion, the BOINC client will not. Some projects have extremely overloaded databases, and do not want clients to phone in unless they are asking for more work or reporting completed work (preferably both at the same time). BOINC WIKI |
Joined: 29 Aug 05 Posts: 15585 |
"The poll is possible, but there is no method of discovering if there is currently work or has been work in the last week." Another idea, then: how about a maximum for the LTD in positive and negative numbers? Is that doable? |
Joined: 29 Aug 05 Posts: 147 |
"The poll is possible, but there is no method of discovering if there is currently work or has been work in the last week." Not unless they are too large to be a meaningful cap. Think of running CPDN for a couple of years in EDF. BOINC WIKI |
Joined: 29 Aug 05 Posts: 15585 |
OK, when the maximum is reached, reset the LTD to near zero (say 1,000 either way). You just need to define the maximum, which isn't going to be easy. Perhaps going back to the polling part, when a project has no work, the message usually comes back as "no work from project". Isn't it possible to add these up and when it comes to a given amount, that the LTD counting stops as if the project is suspended or on NNT? Complications would be the amount of time you're deferred, but if those can be set at a standard 1 hour and you'd take a day as maximum, then after 24 times of "no work from project" the counting of LTD is frozen, until you get work again and the counter is reset. If a project is completely off line and doesn't even have a scheduler, what happens to the LTD then? |
Joined: 29 Aug 05 Posts: 147 |
"OK, when the maximum is reached, reset the LTD to near zero (say 1,000 either way). You just need to define the maximum, which isn't going to be easy." You only get the "no work from project" message if you actually ask for work. In the situation we are talking about, asking for work is not what we want to do. Resetting LTD violates the long term resource shares you set. There is another group of people that would really like to have the LTD calculated in all cases, even if the project is not contactable for long periods of time. The most recent client will keep your queue full of work from projects below the cutoff if there is no work available from projects above the cutoff. BOINC WIKI |
Joined: 29 Aug 05 Posts: 304 |
"Resetting LTD violates the long term resource shares you set. There is another group of people that would really like to have the LTD calculated in all cases, even if the project is not contactable for long periods of time." For example, how will LHC, BURP, or AIS ever come close to getting its correct share if the LTD is reset? All of those projects have intermittent work supplies and/or very restrictive limits on tasks in progress. I would love to add RALPH and SETI beta to that list. In my opinion any test project should only give work when they have something to test. BOINC WIKI BOINCing since 2002/12/8 |
Joined: 29 Aug 05 Posts: 15585 |
Back to polling then. I had this idea last night but didn't write it out. When BOINC polls a project that has work, it gets a null signal, so the scheduler takes over and asks for work. When BOINC polls a project that doesn't have work, it gets a signal that there is no work (a one (1) is sent), and there is no scheduler action after that. Would that by itself not increase LTD? Or would there still need to be an additional catch-all, so that LTD is frozen when signal 1 is received? I'm just brainstorming here. |
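
For readers trying to follow the debt discussion, here is a rough Python sketch of the idea as the posters describe it: each project is "owed" CPU time in proportion to its resource share, and a project that gets less than its share builds up positive debt while the others go negative. It also models the proposal made in the thread of freezing debt accumulation after 24 consecutive "no work" replies. This is a toy model under stated assumptions, not the BOINC client's actual algorithm; the names `Project`, `update_debts`, `record_scheduler_reply` and `NO_WORK_LIMIT` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass

# Hypothetical threshold taken from the proposal in the thread:
# 24 consecutive hourly "no work" replies, i.e. roughly one day.
NO_WORK_LIMIT = 24


@dataclass
class Project:
    name: str
    resource_share: float
    long_term_debt: float = 0.0  # CPU seconds owed to this project (toy units)
    no_work_replies: int = 0     # consecutive "no work" scheduler replies

    @property
    def frozen(self) -> bool:
        # Assumption: a frozen project neither gains nor loses debt.
        return self.no_work_replies >= NO_WORK_LIMIT


def update_debts(projects, cpu_seconds_used):
    """Apply one accounting period to all non-frozen projects.

    cpu_seconds_used maps a project name to the CPU seconds it received
    during the period. Each non-frozen project is credited with its
    share of the total CPU time and debited with what it actually used.
    """
    active = [p for p in projects if not p.frozen]
    total_share = sum(p.resource_share for p in active)
    if total_share == 0:
        return
    total_cpu = sum(cpu_seconds_used.get(p.name, 0.0) for p in active)
    for p in active:
        entitled = total_cpu * p.resource_share / total_share
        p.long_term_debt += entitled - cpu_seconds_used.get(p.name, 0.0)


def record_scheduler_reply(project, got_work):
    """Reset the counter when work arrives, otherwise count the miss."""
    project.no_work_replies = 0 if got_work else project.no_work_replies + 1
```

With two projects of equal share, an hour spent entirely on CPDN leaves CPDN at -1800 and the other project at +1800 in this model, which mirrors the remark above that one project's high debt pushes the rest negative. Once a project is frozen, further periods leave both debts unchanged until it receives work again.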
Copyright © 2025 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.