#include "data.h"
#include "epdglue.h"
#if (CPUS > 1)
/* modified 11/04/15 */
/*
 *******************************************************************************
 *                                                                             *
 *   Split() is the driver for the threaded parallel search in Crafty.  The   *
 *   basic idea is that whenever we notice that one (or more) threads are in  *
 *   their idle loop, we drop into Split(), from Search(), and begin a new    *
 *   parallel search from this node.  This is simply a problem of             *
 *   establishing a new split point, and then letting each thread join this   *
 *   split point and copy whatever data they need.                            *
 *                                                                             *
 *   This is generation II of Split().  The primary differences address two   *
 *   specific performance-robbing issues.  (1) Excessive waiting for a split  *
 *   to be done, and (2) excessive waiting on specific locks.  Generation II  *
 *   addresses both of these to significantly improve performance.            *
 *                                                                             *
 *   The main difference between Gen I and Gen II is the effort required to   *
 *   split the tree and which thread(s) expend this effort.  In generation I, *
 *   the parent thread was responsible for allocating a split block for each  *
 *   child thread, and then copying the necessary data from the parent split  *
 *   block to these child split blocks.  When all of this was completed, the  *
 *   child processes were released to start the parallel search after being   *
 *   held while the split / copy operations were done.  In the generation II  *
 *   Split() we now simply allocate a new split block for THIS thread, flag   *
 *   the parent split block as joinable, and then go directly to ThreadWait() *
 *   which will drop us back in to the search immediately.  The idle threads  *
 *   are continually looping on Join() which will jump them right into this   *
 *   split block, letting them do ALL the work of allocating a split block,   *
 *   filling it in, and then copying the data to their local split block.     *
 *   This distributes the split overhead among all the threads that split,    *
 *   rather than this thread having to do all the work while the other        *
 *   threads sit idle.                                                        *
 *                                                                             *
 *   Generation II is also much more lightweight, in that it copies only the  *
 *   bare minimum from parent to child.  Generation I safely copied far too   *
 *   much since this code was being changed regularly, but that is no longer  *
 *   necessary overhead.                                                      *
 *                                                                             *
 *   Generation II has a zero contention split algorithm.  In the past, when  *
 *   a thread became idle, it posted a global split request and any thread    *
 *   that was at an appropriate point would try to split.  But while it was   *
 *   doing the splitting, other threads that were also willing to split would *
 *   "get in line" because Crafty used a global lock to prevent two threads   *
 *   from attempting to split at the same instant in time.  They got in line, *
 *   and waited for the original splitter to release the lock, but now they   *
 *   have no idle threads to split with.  A waste of time.  Now we allow ANY  *
 *   thread to attempt to split at the current ply.  When we do what might be *
 *   called a "gratuitous split" the only restriction is that if we have      *
 *   existing "gratuitous split points" (split blocks that are joinable but   *
 *   have not yet been joined), then we limit the number of such splits (per  *
 *   thread) to avoid excessive overhead.                                     *
 *                                                                             *
 *   Generation II takes another idea from DTS, the idea of "late-join".  The *
 *   idea is fairly simple.  If, when a thread becomes idle, there are        *
 *   already other split points being searched in parallel, then we will try  *
 *   to join one of them rather than waiting for someone to ask us to help.   *
 *   We use some simple criteria:  (1) The split point must be joinable,      *
 *   which simply means that no processor has exited the split point yet      *
 *   (which would let us know there is no more work here and a join would be  *
 *   futile);  (2) We compute an "interest" value which is a simple formula   *
 *   based on depth at the split point, and the number of moves already       *
 *   searched.  It seems less risky to choose a split point with max depth    *
 *   AND minimum moves already searched so that there is plenty to do.  This  *
 *   was quite simple to add after the rest of the Generation II rewrite.     *
 *   In fact, this is now THE way threads join a split point, period, which   *
 *   further simplifies the code and improves efficiency.  IE, a thread can   *
 *   split when idle threads are noticed, or if it is far enough away from    *
 *   the tips to make the cost negligible.  At that point any idle thread(s)  *
 *   can join immediately, those that become idle later can join when they    *
 *   are ready.                                                               *
 *                                                                             *
 *   There are a number of settable options via the command-line or .craftyrc *
 *   initialization file.  Here's a concise explanation for each option and   *
 *   an occasional suggestion for testing/tuning.                             *
 *                                                                             *
 *   smp_affinity (command = smpaffinity=<n>) is used to enable or disable    *
 *     processor affinity.  -1 disables affinity and lets threads run on any  *
 *     available core.  If you use an integer <n> then thread zero will bind  *
 *     itself to cpu <n> and each additional thread will bind to the next     *
 *     higher cpu number.  This is useful if you try to run two copies of     *
 *     crafty on the same machine, now you can cause one to bind to the       *
 *     first <n> cores, and the second to the last <n> cores.  For the first  *
 *     instance of Crafty, you would use smpaffinity=0, and for the second    *
 *     smpaffinity=8, assuming you are running 8 threads per copy on a 16     *
 *     cpu machine.  If you get this wrong, you can have more than one        *
 *     thread on the same cpu which will significantly impact performance.    *
 *                                                                             *
 *   smp_max_threads (command = smpmt=n) sets the total number of allowable   *
 *     threads for the search.  The default is one (1) as Crafty does not     *
 *     assume it should use all available resources.  For optimal             *
 *     performance this should be set to the number of physical cores your    *
 *     machine has, which does NOT include hyperthreaded cores.               *
 *                                                                             *
 *   smp_split_group (command = smpgroup=n) sets the maximum number of        *
 *     threads at any single split point, with the exception of split points  *
 *     fairly close to the root where ALL threads are allowed to split        *
 *     together, ignoring this limit.  Note that this is ignored in the       *
 *     first 1/2 of the tree (the nodes closer to the root).  There it is     *
 *     actually good to split and get all active threads involved.            *
 *                                                                             *
 *   smp_min_split_depth (command = smpmin=n) avoids splitting when the       *
 *     remaining depth < n.  This is used to balance splitting overhead cost  *
 *     against the speed gain the parallel search produces.  The default is   *
 *     currently 5 (which could change with future generations of Intel       *
 *     hardware) but values between 4 and 8 will work.  Larger values allow   *
 *     somewhat fewer splits, which reduces overhead, but it also increases   *
 *     the percentage of the time where a thread is waiting on work.          *
 *                                                                             *
 *   smp_split_at_root (command = smproot=0 or 1) enables (1) or disables (0) *
 *     splitting the tree at the root.  This defaults to 1 which produces     *
 *     the best performance by a significant margin.  But it can be disabled  *
 *     if you are playing with code changes.                                  *
 *                                                                             *
 *   smp_gratuitous_depth (command = smpgd=<n>) controls "gratuitous splits"  *
 *     which are splits that are done without any idle threads.  This sets a  *
 *     depth limit (remaining depth) that must be present before such a       *
 *     split can be done.  Making this number larger will reduce the number   *
 *     of these splits.  Making it too small will increase overhead slightly  *
 *     and increase split block usage significantly.                          *
 *                                                                             *
 *   smp_gratuitous_limit (command = smpgl=<n>) limits the number of these    *
 *     splits that a thread can do.  Once a thread has this number of         *
 *     unjoined split points, it will not be allowed to split any more until  *
 *     one or more threads join at least one of the existing split points.    *
 *     In the smp search statistics, where you see output that looks like     *
 *     this:                                                                  *
 *                                                                             *
 *       splits=m(n) ...                                                      *
 *                                                                             *
 *     m is the total splits done, n is the number of "wasted splits" which   *
 *     are basically gratuitous splits where no thread joined before this     *
 *     split point was completed and deallocated.                             *
 *                                                                             *
 *   The best way to tune all of these parameters is to use the "autotune"    *
 *   command (see autotune.c and help autotune) which will automatically run  *
 *   tests and optimize the parameters.  More details are in the autotune.c   *
 *   source file.                                                             *
 *                                                                             *
 *   A few basic "rules of the road" for anyone interested in changing or     *
 *   adding to any of this code.                                              *
 *                                                                             *
 *   1.  If, at any time, you want to modify your private split block, no     *
 *       lock is required.                                                    *
 *                                                                             *
 *   2.  If, at any time, you want to modify another split block, such as     *
 *       the parent split block shared move list, you must acquire the lock   *
 *       in the split block first, IE tree->parent->lock to lock the parent   *
 *       split block during NextMove() and such.                              *
 *                                                                             *
 *   3.  If you want to modify any SMP-related data that spans multiple       *
 *       split blocks, such as telling sibling processes to stop, etc, then   *
 *       you must acquire the global "lock_smp" lock first.  This prevents    *
 *       two different threads from walking the split block chain from        *
 *       different directions and acquiring the split block locks in          *
 *       different orders, which could cause a catastrophic deadlock.  This   *
 *       is an infrequent event so the overhead is not significant.           *
 *                                                                             *
 *   4.  If you want to do any sort of I/O operation, you must acquire the    *
 *       "lock_io" lock first.  Since threads share descriptors, there are    *
 *       lots of potential race conditions, from the simple tty-output        *
 *       getting interlaced from different threads, to outright data          *
 *       corruption in the book or log files.                                 *
 *                                                                             *
 *   Some of the bugs caused by failing to acquire the correct lock will      *
 *   only occur infrequently, and they are extremely difficult to find.       *
 *   Some only show up in a public game where everyone is watching, to cause  *
 *   maximum embarrassment and cause the program to do something extremely    *
 *   stupid.                                                                  *
 *                                                                             *
 *******************************************************************************
 */
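/*
   Illustrative sketch (not taken from the Crafty source):  the options
   described above can be placed in the .craftyrc file or given on the
   command line, one per line.  The option names come from the comment
   above; the smpgd/smpgl numbers below are made-up examples, not
   recommended defaults:

     smpmt=8
     smpmin=5
     smproot=1
     smpgd=8
     smpgl=2
     smpaffinity=-1
 */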
int Split(TREE * RESTRICT tree) {
  TREE *child;
  int tid, tstart, tend;

/*
 ************************************************************
 *                                                          *
 *  Here we prepare to split the tree.  All we really do   *
 *  in the Generation II threading is grab a split block   *
 *  for this thread, then flag the parent as "joinable"    *
 *  and then jump right to ThreadWait() to resume where we *
 *  left off, with the expectation (but not a requirement) *
 *  that other threads will join us to help.               *
 *                                                          *
 *  Idle threads are sitting in ThreadWait() repeatedly    *
 *  calling Join() to find them a split point, which we    *
 *  are fixing to provide.  They will then join as quickly *
 *  as they can, and other threads that become idle later  *
 *  can also join without any further splitting needed.    *
 *                                                          *
 *  If we are unable to allocate a split block, we simply  *
 *  abort this attempted split and return to the search    *
 *  since other threads will also split quickly.           *
 *                                                          *
 ************************************************************
 */
  tstart = ReadClock();
  tree->nprocs = 0;
  for (tid = 0; tid < (int) smp_max_threads; tid++) // Pierre-Marie Baty -- added type cast
    tree->siblings[tid] = 0;
  child = GetBlock(tree, tree->thread_id);
  if (!child)
    return 0;
  CopyFromParent(child);
  thread[tree->thread_id].tree = child;
  tree->joined = 0;
  tree->joinable = 1;
  parallel_splits++;
  smp_split = 0;
  tend = ReadClock();
  thread[tree->thread_id].idle += tend - tstart;
/*
 ************************************************************
 *                                                          *
 *  We have successfully created a split point, which      *
 *  means we are done.  The instant we set the "joinable"  *
 *  flag, idle threads may begin to join in at this split  *
 *  point to help.  Since this thread may finish before    *
 *  any or all of the other parallel threads, this thread  *
 *  is sent to ThreadWait() which will immediately send it *
 *  to SearchMoveList() like the other threads;  however,  *
 *  going to ThreadWait() allows this thread to join       *
 *  others if it runs out of work to do.  We do pass       *
 *  ThreadWait() the address of the parent split block, so *
 *  that if this thread becomes idle, and this thread      *
 *  block shows no threads are still busy, then this       *
 *  thread can return to here and then back up into the    *
 *  previous ply as it should.  Note that no other thread  *
 *  can back up to the previous ply since their recursive  *
 *  call stacks are not set for that, while this call      *
 *  stack will bring us back to this point where we        *
 *  return to the normal search, which we just completed.  *
 *                                                          *
 ************************************************************
 */
  ThreadWait(tree->thread_id, tree);
  if (!tree->joined)
    parallel_splits_wasted++;
  return 1;
}

/* modified 12/16/15 */
/*
 *******************************************************************************
 *                                                                             *
 *   Join() is called just when we enter the usual spin-loop waiting for      *
 *   work.  We take a quick look at all active split blocks to see if any     *
 *   look "joinable".  If so, we compute an "interest" value, which will be   *
 *   defined below.  We then join the most interesting split point directly. *
 *   This split point might have been created specifically for this thread   *
 *   to join, or it might be one that was already active when this thread    *
 *   became idle, which allows us to join that existing split point and not  *
 *   request a new split operation, saving time.                             *
 *                                                                             *
 *******************************************************************************
 */
int Join(int64_t tid) {
  TREE *tree, *join_block, *child;
  int interest, best_interest, current, pass = 0;

/*
 ************************************************************
 *                                                          *
 *  First we pass over ALL split blocks, looking for those *
 *  flagged as "joinable" (which means they are actually   *
 *  active split points, no processor at that split point  *
 *  has run out of work (there is no point in joining a    *
 *  split point with no remaining work), and no fail high  *
 *  has been found which would raise the "stop" flag).     *
 *  This is "racy" because we do not acquire any locks,    *
 *  which means that the busy threads continue working,    *
 *  and there is a small probability that the split point  *
 *  will disappear while we are in this loop.  To resolve  *
 *  the potential race, after we find the most attractive  *
 *  split point, we acquire the lock for that split block  *
 *  and test again, but this time if the block is          *
 *  joinable, we can safely join under control of the      *
 *  lock, which is not held for very long at all.  If the  *
 *  block is not joinable once we acquire the lock, we     *
 *  abort joining since it is futile.  Note that if this   *
 *  happens, we will try to find an existing split point   *
 *  we can join three times before we exit, setting split  *
 *  to 1 to ask other threads to produce more candidate    *
 *  split points.                                          *
 *                                                          *
 ************************************************************
 */
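/*
   Worked example (not taken from the Crafty source) of the "interest"
   formula used below, which is

     interest = depth * 2 - (moves already searched)

   A joinable split point at depth 20 with 1 move searched scores 39; one
   at depth 8 with 4 moves searched scores 12.  The idle thread joins the
   39, since deeper split points with fewer completed moves should have
   the most work remaining.  (The numbers are made-up examples.)
 */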
  for (pass = 0; pass < 3; pass++) {
    best_interest = -999999;
    join_block = 0;
    for (current = 0; current <= (int) smp_max_threads * 64; current++) { // Pierre-Marie Baty -- added type cast
      tree = block[current];
      if (tree->joinable && (tree->ply <= tree->depth / 2 ||
              tree->nprocs < (int) smp_split_group)) { // Pierre-Marie Baty -- added type cast
        interest = tree->depth * 2 - tree->searched[0];
        if (interest > best_interest) {
          best_interest = interest;
          join_block = tree;
        }
      }
    }
/*
 ************************************************************
 *                                                          *
 *  Now we acquire the lock for this split block, and then *
 *  check to see if the block is still flagged as          *
 *  joinable.  If so, we set things up, and then we get    *
 *  pretty tricky as we then release the lock, and then    *
 *  copy the data from the parent to our split block.      *
 *  There is a chance that while we are copying this data, *
 *  the split point gets completed by other threads, which *
 *  would leave an apparent race condition exposed where   *
 *  we start copying data here, the split point is         *
 *  completed, the parent block is released and then       *
 *  reacquired, and we continue as if nothing has happened *
 *  here, getting data copied from two different           *
 *  positions.                                             *
 *                                                          *
 *  Fortunately, we linked this new split block to the old *
 *  (original parent).  If that split block is released,   *
 *  we will discover this because that operation will also *
 *  set our "stop" flag which will prevent us from using   *
 *  this data and breaking things.  We allow threads to    *
 *  copy this data without any lock protection to          *
 *  eliminate a serialization (each node would copy the    *
 *  data serially, rather than all at once) with the only  *
 *  consequence being the overhead of copying and then     *
 *  throwing the data away, which can happen on occasion   *
 *  even if we used a lock for protection, since once we   *
 *  release the lock it still takes time to get into the   *
 *  search and we could STILL find that this split block   *
 *  has already been completed, once again.  Less          *
 *  contention and serial computing improves performance.  *
 *                                                          *
 ************************************************************
 */
    if (join_block) {
      Lock(join_block->lock);
      if (join_block->joinable) {
        child = GetBlock(join_block, (int) tid); // Pierre-Marie Baty -- added type cast
        Unlock(join_block->lock);
        if (child) {
          CopyFromParent(child);
          thread[tid].tree = child;
          parallel_joins++;
          return 1;
        }
      } else {
        Unlock(join_block->lock);
        break;
      }
    }
  }
/*
 ************************************************************
 *                                                          *
 *  We did not acquire a split point to join, so we set    *
 *  smp_split to 1 to ask busy threads to create joinable  *
 *  split points.                                          *
 *                                                          *
 ************************************************************
 */
  smp_split = 1;
  return 0;
}

/* modified 11/04/15 */
/*
 *******************************************************************************
 *                                                                             *
 *   ThreadInit() is called after a process is created.  Its main task is to  *
 *   initialize the process local memory so that it will fault in and be      *
 *   allocated on the local node rather than the node where the original      *
 *   (first) process was running.  All threads will hang here via a custom    *
 *   WaitForAllThreadsInitialized() procedure so that all the local thread    *
 *   blocks are usable before the search actually begins.                     *
 *                                                                             *
 *******************************************************************************
 */
void *STDCALL ThreadInit(void *t) {
  int tid = (int64_t) t;

  ThreadAffinity(tid);
#  if !defined(UNIX)
  ThreadMalloc((uint64_t) tid);
#  endif
  thread[tid].blocks = 0xffffffffffffffffull;
  Lock(lock_smp);
  initialized_threads++;
  Unlock(lock_smp);
  WaitForAllThreadsInitialized();
  ThreadWait(tid, (TREE *) 0);
  Lock(lock_smp);
  smp_threads--;
  Unlock(lock_smp);
  return 0;
}
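
/*
   Background sketch (not taken from the Crafty source):  the "fault in
   ... on the local node" comment above relies on first-touch NUMA
   placement, where a page is physically allocated on the node of the
   thread that first writes it.  In generic terms:

     char *p = malloc(size);   // reserves virtual address space only
     memset(p, 0, size);       // first write faults pages in on THIS node

   Having each thread perform that first write on its own thread-local
   data is what keeps later accesses node-local.
 */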

/* modified 01/08/15 */
/*
 *******************************************************************************
 *                                                                             *
 *   ThreadAffinity() is called to "pin" a thread to a specific processor.    *
 *   It is a "noop" (no-operation) if Crafty was not compiled with            *
 *   -DAFFINITY, or if smp_affinity is negative (smpaffinity=-1 disables      *
 *   affinity).  It simply sets the affinity for the current thread to the    *
 *   requested CPU and returns.  NOTE:  If hyperthreading is enabled, there   *
 *   is no guarantee that this will work as expected and pin one thread per   *
 *   physical core.  It depends on how the O/S numbers the SMT cores.         *
 *                                                                             *
 *******************************************************************************
 */
void ThreadAffinity(int cpu) {
#  if defined(AFFINITY)
  cpu_set_t cpuset;
  pthread_t current_thread = pthread_self();

  if (smp_affinity >= 0) {
    CPU_ZERO(&cpuset);
    CPU_SET(cpu + smp_affinity, &cpuset);
    pthread_setaffinity_np(current_thread, sizeof(cpu_set_t), &cpuset);
  }
#  endif
}
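
/*
   Worked example (not taken from the Crafty source):  CPU_SET(cpu +
   smp_affinity, ...) above means thread 0 binds to cpu smp_affinity,
   thread 1 to smp_affinity + 1, and so on.  So on a 16-cpu box, one
   8-thread instance started with smpaffinity=0 uses cpus 0-7, while a
   second started with smpaffinity=8 uses cpus 8-15, and the two never
   share a cpu.
 */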

/* modified 11/04/15 */
/*
 *******************************************************************************
 *                                                                             *
 *   ThreadSplit() is used to determine if we should split at the current     *
 *   ply.  There are some basic constraints on when splits can be done, such  *
 *   as the depth remaining in the search (don't split too near the tips),    *
 *   and have we searched at least one move to get a score or bound (YBW      *
 *   condition).                                                              *
 *                                                                             *
 *   If those conditions are satisfied, AND either a thread has requested a   *
 *   split OR we are far enough away from the tips of the tree to justify a   *
 *   "gratuitous split" then we return "success."  A "gratuitous split" is a  *
 *   split done without any idle threads.  Since splits are not free, we      *
 *   only do this well away from tips to limit overhead.  We do this so that  *
 *   when a thread becomes idle, it will find these split points immediately  *
 *   and not have to wait for a split after the fact.                         *
 *                                                                             *
 *******************************************************************************
 */
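/*
   Illustrative note (not taken from the Crafty source):  the YBW ("young
   brothers wait") condition mentioned above means the first move at a
   node is searched serially before any split, since it usually
   establishes the bound.  In the code below this surfaces at the root as
   the test alpha == o_alpha (no root move has improved alpha yet, so
   don't split) and in the "done" parameter, the count of moves already
   completed at this ply.
 */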
int ThreadSplit(TREE *RESTRICT tree, int ply, int depth, int alpha, int o_alpha, // Pierre-Marie Baty -- missing keyword
    int done) {
  TREE *used;
  int64_t tblocks;
  int temp, unused = 0;

/*
 ************************************************************
 *                                                          *
 *  First, we see if we meet the basic criteria to create  *
 *  a split point, that being that we must not be too far  *
 *  from the root (smp_min_split_depth).                   *
 *                                                          *
 ************************************************************
 */
  if (depth < (int) smp_min_split_depth) // Pierre-Marie Baty -- added type cast
    return 0;
/*
 ************************************************************
 *                                                          *
 *  If smp_split is NOT set, we are checking to see if it  *
 *  is acceptable to do a gratuitous split here.           *
 *                                                          *
 *  (1) if we are too far from the root we do not do       *
 *      gratuitous splits to avoid the overhead.           *
 *                                                          *
 *  (2) if we have searched more than one move at this     *
 *      ply, we don't do any further tests to see if a     *
 *      gratuitous split is acceptable, since we have      *
 *      previously done this test at this ply and decided  *
 *      one should not be done.  That condition has        *
 *      likely not changed.                                *
 *                                                          *
 *  (3) if we have pre-existing gratuitous split points    *
 *      for this thread, we make certain we don't create   *
 *      more than the gratuitous split limit as excessive  *
 *      splits just add to the overhead with no benefit.   *
 *                                                          *
 ************************************************************
 */
  if (!smp_split) {
    if (depth < (int) smp_gratuitous_depth || done > 1) // Pierre-Marie Baty -- added type cast
      return 0;
    tblocks = ~thread[tree->thread_id].blocks;
    while (tblocks) {
      temp = LSB(tblocks);
      used = block[temp + tree->thread_id * 64 + 1];
      if (used->joinable && !used->joined)
        unused++;
      Clear(temp, tblocks);
    }
    if (unused > (int) smp_gratuitous_limit) // Pierre-Marie Baty -- added type cast
      return 0;
  }
/*
 ************************************************************
 *                                                          *
 *  If smp_split IS set, we are checking to see if it is   *
 *  acceptable to do a split because there are idle        *
 *  threads that need work to do.                          *
 *                                                          *
 *  The only reason this would be false is if we have a    *
 *  pre-existing split point that is joinable but has not  *
 *  been joined.  If one exists, there is no need to       *
 *  split again as there is already an accessible split    *
 *  point.  Otherwise, if we are at the root and we are    *
 *  either not allowed to split at the root, or we have    *
 *  additional root moves that have to be searched one at  *
 *  a time using all available threads, we also can not    *
 *  split here.                                            *
 *                                                          *
 ************************************************************
 */
  else {
    if (ply == 1 && (!smp_split_at_root || !NextRootMoveParallel() ||
            alpha == o_alpha))
      return 0;
    tblocks = ~thread[tree->thread_id].blocks;
    while (tblocks) {
      temp = LSB(tblocks);
      used = block[temp + tree->thread_id * 64 + 1];
      if (used->joinable && !used->joined)
        unused++;
      Clear(temp, tblocks);
    }
    if (unused > (int) smp_gratuitous_limit) // Pierre-Marie Baty -- added type cast
      return 0;
  }
  return 1;
}
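
/*
   Worked example (not taken from the Crafty source) of the bitmap walk
   above:  thread[tid].blocks holds a 1 bit for every FREE split block
   this thread owns, so ~blocks has a 1 bit for every block in use.  If
   thread 1 has allocated its blocks 0 and 3, then

     tblocks = ~blocks = ...01001b
     LSB() -> 0 -> examine block[0 + 1 * 64 + 1] = block[65]
     Clear(0, tblocks) -> ...01000b
     LSB() -> 3 -> examine block[3 + 1 * 64 + 1] = block[68]

   and each visited block that is joinable but unjoined counts against
   smp_gratuitous_limit.
 */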

/* modified 11/04/15 */
/*
 *******************************************************************************
 *                                                                             *
 *   ThreadStop() is called from SearchMoveList() when it detects a beta      *
 *   cutoff (fail high) at a node that is being searched in parallel.  We     *
 *   need to stop all threads here, and since this threading algorithm is     *
 *   recursive, it may be necessary to stop other threads that are helping    *
 *   search this branch further down into the tree.  This function simply    *
 *   sets appropriate tree->stop variables to 1, which will stop those       *
 *   particular threads instantly and return them to the idle loop in        *
 *   ThreadWait().                                                            *
 *                                                                             *
 *******************************************************************************
 */
void ThreadStop(TREE * RESTRICT tree) {
  int proc;

  Lock(tree->lock);
  tree->stop = 1;
  tree->joinable = 0;
  for (proc = 0; proc < (int) smp_max_threads; proc++) // Pierre-Marie Baty -- added type cast
    if (tree->siblings[proc])
      ThreadStop(tree->siblings[proc]);
  Unlock(tree->lock);
}

/* modified 11/04/15 */
/*
 *******************************************************************************
 *                                                                             *
 *   ThreadTrace() is a debugging tool that simply walks the split block      *
 *   tree and displays interesting data to help debug the parallel search    *
 *   whenever changes break things.                                           *
 *                                                                             *
 *******************************************************************************
 */
void ThreadTrace(TREE * RESTRICT tree, int depth, int brief) {
  int proc, i;

  Lock(tree->lock);
  Lock(lock_io);
  if (!brief) {
    for (i = 0; i < 4 * depth; i++)
      Print(4095, " ");
    depth++;
    Print(4095, "block[%d] thread=%d ply=%d nprocs=%d ",
        FindBlockID(tree), tree->thread_id, tree->ply, tree->nprocs);
    Print(4095, "joined=%d joinable=%d stop=%d nodes=%d", tree->joined,
        tree->joinable, tree->stop, tree->nodes_searched);
    Print(4095, " parent=%d\n", FindBlockID(tree->parent));
  } else {
    if (tree->nprocs > 1) {
      for (i = 0; i < 4 * depth; i++)
        Print(4095, " ");
      depth++;
      Print(4095, "(ply %d)", tree->ply);
    }
  }
  if (tree->nprocs) {
    if (!brief) {
      for (i = 0; i < 4 * depth; i++)
        Print(4095, " ");
      Print(4095, " parent=%d sibling threads=", FindBlockID(tree->parent));
      for (proc = 0; proc < (int) smp_max_threads; proc++) // Pierre-Marie Baty -- added type cast
        if (tree->siblings[proc])
          Print(4095, " %d(%d)", proc, FindBlockID(tree->siblings[proc]));
      Print(4095, "\n");
    } else {
      if (tree->nprocs > 1) {
        Print(4095, " helping= ");
        for (proc = 0; proc < (int) smp_max_threads; proc++) // Pierre-Marie Baty -- added type cast
          if (tree->siblings[proc]) {
            if (proc == tree->thread_id)
              Print(4095, "[");
            Print(4095, "%d", proc);
            if (proc == tree->thread_id)
              Print(4095, "]");
            Print(4095, " ");
          }
        Print(4095, "\n");
      }
    }
  }
  Unlock(lock_io);
  for (proc = 0; proc < (int) smp_max_threads; proc++) // Pierre-Marie Baty -- added type cast
    if (tree->siblings[proc])
      ThreadTrace(tree->siblings[proc], depth, brief);
  Unlock(tree->lock);
}

/* modified 11/04/15 */
/*
 *******************************************************************************
 *                                                                             *
 *   ThreadWait() is the idle loop for the N threads that are created at the  *
 *   beginning when Crafty searches.  Threads are "parked" here waiting on a  *
 *   pointer to something they should search (a parameter block built in the  *
 *   function Split() in this case).  When this pointer becomes non-zero,     *
 *   each thread "parked" here will immediately call SearchMoveList() and     *
 *   begin the parallel search as directed.                                   *
 *                                                                             *
 *   Upon entry, all threads except for the "master" will arrive here with a  *
 *   value of zero (0) in the waiting parameter below.  This indicates that   *
 *   they will search and then be done.  The "master" will arrive here with   *
 *   a pointer to the parent split block in "waiting" which says I will sit   *
 *   here waiting for work OR until the waiting split block has no threads    *
 *   working on it, at which point I should return, which will let me         *
 *   "unsplit" here and clean things up.  The call to here in Split() passes  *
 *   this block address while threads that are helping get here with a zero.  *
 *                                                                             *
 *******************************************************************************
 */
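/*
   Summary sketch (not taken from the Crafty source) of the spin loop
   below:  a thread keeps spinning while ALL of these hold, and proceeds
   as soon as any one fails:

     !thread[tid].tree             -- no work assigned yet
     !waiting || waiting->nprocs   -- no split point we own has emptied
     !Join(tid)                    -- no existing split point to join
     !thread[tid].terminate        -- no shutdown request
 */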
int ThreadWait(int tid, TREE * RESTRICT waiting) {
  int value, tstart, tend;

/*
 ************************************************************
 *                                                          *
 *  When we reach this point, one of three possible        *
 *  conditions is true:  (1) we already have work to do,   *
 *  as we are the "master thread" and we have already      *
 *  split the tree, we are coming here to join in;  (2)    *
 *  we are the master, and we are waiting on our split     *
 *  point to complete, so we come here to join and help    *
 *  currently active threads;  (3) we have no work to do,  *
 *  so we will spin until Join() locates a split point we  *
 *  can join to help out.                                  *
 *                                                          *
 *  Note that when we get here, the parent already has a   *
 *  split block and does not need to call Join(), it       *
 *  simply falls through the while spin loop below         *
 *  because its "tree" pointer is already non-zero.        *
 *                                                          *
 ************************************************************
 */
  while (1) {
    tstart = ReadClock();
    while (!thread[tid].tree && (!waiting || waiting->nprocs) && !Join(tid) &&
        !thread[tid].terminate)
      Pause();
    tend = ReadClock();
    if (!thread[tid].tree)
      thread[tid].tree = waiting;
    thread[tid].idle += tend - tstart;
    if (thread[tid].tree == waiting || thread[tid].terminate)
      return 0;
/*
 ************************************************************
 *                                                          *
 *  Once we get here, we have a good split point, so we    *
 *  are ready to participate in a parallel search.  Once   *
 *  we return from SearchMoveList() we copy our results    *
 *  back to the parent via CopyToParent() before we look   *
 *  for a new split point.  If we are a parent, we will    *
 *  slip out of the spin loop at the top and return to     *
 *  the normal serial search to finish up here.            *
 *                                                          *
 *  When we return from SearchMoveList(), we need to       *
 *  decrement the "nprocs" value since there is now one    *
 *  less thread working at this split point.               *
 *                                                          *
 *  Note:  CopyToParent() marks the current split block    *
 *  as unused once the copy is completed, so we don't      *
 *  have to do anything about that here.                   *
 *                                                          *
 ************************************************************
 */
    value =
        SearchMoveList(thread[tid].tree, thread[tid].tree->ply,
        thread[tid].tree->depth, thread[tid].tree->wtm,
        thread[tid].tree->alpha, thread[tid].tree->beta,
        thread[tid].tree->searched, thread[tid].tree->in_check, 0, parallel);
    tstart = ReadClock();
    Lock(thread[tid].tree->parent->lock);
    thread[tid].tree->parent->joinable = 0;
    CopyToParent((TREE *) thread[tid].tree->parent, thread[tid].tree, value);
    thread[tid].tree->parent->nprocs--;
    thread[tid].tree->parent->siblings[tid] = 0;
    Unlock(thread[tid].tree->parent->lock);
    thread[tid].tree = 0;
    tend = ReadClock();
    thread[tid].idle += tend - tstart;
  }
}
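
/*
   Illustrative timeline (not taken from the Crafty source) of one
   parallel split, combining Split(), Join() and ThreadWait().  The
   thread numbers are made-up:

     thread 0 (master)                    thread 1 (idle)
     Split(): GetBlock(), joinable=1      spins, calling Join()
     ThreadWait(): searches child block   Join(): picks this split point
     CopyToParent(), nprocs--             GetBlock() + CopyFromParent()
     spin: waiting->nprocs == 0           SearchMoveList() on its child
     return 0, back up to previous ply    CopyToParent(), back to spinning
 */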

/* modified 11/04/15 */
/*
 *******************************************************************************
 *                                                                             *
 *   CopyFromParent() is used to copy data from a parent thread to a child    *
 *   thread.  This only copies the appropriate parts of the TREE structure   *
 *   to avoid burning memory bandwidth by copying everything.                 *
 *                                                                             *
 *******************************************************************************
 */
void CopyFromParent(TREE * RESTRICT child) {
  TREE *parent = child->parent;
  int i, ply;

/*
 ************************************************************
 *                                                          *
 *  We have allocated a split block.  Now we copy the      *
 *  tree search state from the parent block to the child   *
 *  in preparation for starting the parallel search.       *
 *                                                          *
 ************************************************************
 */
  ply = parent->ply;
  child->ply = ply;
  child->position = parent->position;
  for (i = 0; i <= rep_index + parent->ply; i++)
    child->rep_list[i] = parent->rep_list[i];
  for (i = ply - 1; i < MAXPLY; i++)
    child->killers[i] = parent->killers[i];
  for (i = ply - 1; i <= ply; i++) {
    child->curmv[i] = parent->curmv[i];
    child->pv[i] = parent->pv[i];
  }
  child->in_check = parent->in_check;
  child->last[ply] = child->move_list;
  child->status[ply] = parent->status[ply];
  child->status[1] = parent->status[1];
  child->save_hash_key[ply] = parent->save_hash_key[ply];
  child->save_pawn_hash_key[ply] = parent->save_pawn_hash_key[ply];
  child->nodes_searched = 0;
  child->fail_highs = 0;
  child->fail_high_first_move = 0;
  child->evaluations = 0;
  child->egtb_probes = 0;
  child->egtb_hits = 0;
  child->extensions_done = 0;
  child->qchecks_done = 0;
  child->moves_fpruned = 0;
  child->moves_mpruned = 0;
  for (i = 0; i < 16; i++) {
    child->LMR_done[i] = 0;
    child->null_done[i] = 0;
  }
  child->alpha = parent->alpha;
  child->beta = parent->beta;
  child->value = parent->value;
  child->wtm = parent->wtm;
  child->depth = parent->depth;
  child->searched = parent->searched;
  strcpy(child->root_move_text, parent->root_move_text);
  strcpy(child->remaining_moves_text, parent->remaining_moves_text);
}

/* modified 11/04/15 */
/*
 *******************************************************************************
 *                                                                             *
 *   CopyToParent() is used to copy data from a child thread to a parent      *
 *   thread.  This only copies the appropriate parts of the TREE structure   *
 *   to avoid burning memory bandwidth by copying everything.                 *
 *                                                                             *
 *******************************************************************************
 */
void CopyToParent(TREE * RESTRICT parent, TREE * RESTRICT child, int value) {
  int i, ply = parent->ply, which;

/*
 ************************************************************
 *                                                          *
 *  The only concern here is to make sure that the info    *
 *  is only copied to the parent if our score is > than    *
 *  the parent value, and that we were not stopped for     *
 *  any reason, which could produce a partial score that   *
 *  is worthless and dangerous to use.                     *
 *                                                          *
 *  One important special case.  If we get here with the   *
 *  thread->stop flag set, and ply is 1, then we need to   *
 *  clear the "this move has been searched" flag in the    *
 *  ply 1 move list since we did not complete the search.  *
 *  If we fail to do this, then a move being searched in   *
 *  parallel at the root will be "lost" for this           *
 *  iteration and won't be searched again until the next   *
 *  iteration.                                             *
 *                                                          *
 *  In any case, we add our statistical counters to the    *
 *  parent's totals no matter whether we finished or not,  *
 *  since the total nodes searched and such should         *
 *  consider everything searched, not just the "useful     *
 *  stuff."                                                *
 *                                                          *
 *  After we finish copying everything, we mark this       *
 *  split block as free in the split block bitmap.         *
 *                                                          *
 ************************************************************
 */
| 384 | if (child->nodes_searched && !child->stop && value > parent->value && |
826 | if (child->nodes_searched && !child->stop && value > parent->value && |
| 385 | !abort_search) { |
827 | !abort_search) { |
| 386 | parent->pv[ply] = child->pv[ply]; |
828 | parent->pv[ply] = child->pv[ply]; |
| 387 | parent->value = value; |
829 | parent->value = value; |
| 388 | parent->cutmove = child->curmv[ply]; |
830 | parent->cutmove = child->curmv[ply]; |
| 389 | } |
831 | } |
| - | 832 | if (child->stop && ply == 1) |
|
| - | 833 | for (which = 0; which < n_root_moves; which++) |
|
| - | 834 | if (root_moves[which].move == child->curmv[ply]) { |
|
| - | 835 | root_moves[which].status &= 7; |
|
| - | 836 | break; |
|
| - | 837 | } |
|
| 390 | parent->nodes_searched += child->nodes_searched; |
838 | parent->nodes_searched += child->nodes_searched; |
| 391 | parent->fail_highs += child->fail_highs; |
839 | parent->fail_highs += child->fail_highs; |
| 392 | parent->fail_high_first_move += child->fail_high_first_move; |
840 | parent->fail_high_first_move += child->fail_high_first_move; |
| 393 | parent->evaluations += child->evaluations; |
841 | parent->evaluations += child->evaluations; |
| 394 | parent->egtb_probes += child->egtb_probes; |
842 | parent->egtb_probes += child->egtb_probes; |
| 395 | parent-> |
843 | parent->egtb_hits += child->egtb_hits; |
| 396 | parent->extensions_done += child->extensions_done; |
844 | parent->extensions_done += child->extensions_done; |
| 397 | parent->qchecks_done += child->qchecks_done; |
845 | parent->qchecks_done += child->qchecks_done; |
| 398 | parent->reductions_done += child->reductions_done; |
- | |
| 399 | parent->moves_fpruned += child->moves_fpruned; |
846 | parent->moves_fpruned += child->moves_fpruned; |
| - | 847 | parent->moves_mpruned += child->moves_mpruned; |
|
| 400 |
|
848 | for (i = 1; i < 16; i++) { |
| - | 849 | parent->LMR_done[i] += child->LMR_done[i]; |
|
| - | 850 | parent->null_done[i] += child->null_done[i]; |
|
| - | 851 | } |
|
| - | 852 | which = FindBlockID(child) - 64 * child->thread_id - 1; |
|
| - | 853 | Set(which, thread[child->thread_id].blocks); |
|
| 401 | } |
854 | } |
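
/*
 An illustrative aside (not Crafty code): the "status &= 7" above keeps only
 the low three status bits and clears the higher ones, which include the
 "this move has been searched" flag the comment describes.  Assuming, purely
 for illustration, that bit 3 (value 8) is that flag:

     unsigned char status = 8 | 2;   // hypothetical: searched + one other flag
     status &= 7;                    // drop bit 3, keep the low three bits
     // status == 2: the root move is again visible to the move picker

 Clearing with "&= 7" instead of "= 0" preserves the per-move flags that live
 in the low bits while re-exposing the move for this iteration.
*/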

/* modified 11/04/15 */
/*
 *******************************************************************************
 *                                                                             *
 *   GetBlock() is used to allocate a split block and fill in only the SMP-   *
 *   critical information.  The child thread will copy the rest of the split  *
 *   block information as needed.                                             *
 *                                                                             *
 *   When we arrive here, the parent split block must be locked, since we are *
 *   going to change data in that block as well as copy data from that block  *
 *   to the current split block.  The only exception is the original split    *
 *   operation, which is done "out of sight" of other threads, which means    *
 *   no locks are needed until after the "joinable" flag is set; that         *
 *   exposes this split point to other threads instantly.                     *
 *                                                                             *
 *******************************************************************************
 */
TREE *GetBlock(TREE * /*RESTRICT*/ parent, int tid) { // Pierre-Marie Baty -- keyword not present in declaration
  TREE *child;
  static int warnings = 0;
  int i, unused;

/*
 ************************************************************
 *                                                          *
 *  One NUMA-related trick is that we only allocate a       *
 *  split block in the thread's local memory.  Each thread  *
 *  has a group of split blocks that were first touched by  *
 *  the correct CPU, so that the split blocks page-faulted  *
 *  into memory local to that specific processor.  If we    *
 *  can't find an optimally-placed block, we return zero,   *
 *  which prevents this thread from joining the split       *
 *  point.  This is extremely unlikely, as it would mean    *
 *  the current thread already has 64 active split blocks   *
 *  where it is waiting on other threads to complete the    *
 *  last bit of work at each.                               *
 *                                                          *
 *  Here we use a simple 64-bit bitmap per thread that      *
 *  indicates which blocks are free (1) and which are in    *
 *  use (0).  We use LSB() to find the rightmost one-bit,   *
 *  and that is the local block number.  We convert it to   *
 *  a global block number with the simple computation:      *
 *                                                          *
 *      global = local + 64 * tid + 1                       *
 *                                                          *
 *  Each thread has exactly 64 split blocks, and block 0    *
 *  is the "master block" that never gets allocated or      *
 *  freed.  Once we find a free block for the current       *
 *  thread, we zero that bit so the block won't be used     *
 *  again until it is released.                             *
 *                                                          *
 ************************************************************
 */
  if (thread[tid].blocks) {
    unused = LSB(thread[tid].blocks);
    Clear(unused, thread[tid].blocks);
    Set(unused, thread[tid].max_blocks);
  } else {
    if (++warnings < 6)
      Print(2048, "WARNING.  local SMP block cannot be allocated, thread %d\n",
          tid);
    return 0;
  }
  child = block[unused + tid * 64 + 1];
/*
 ************************************************************
 *                                                          *
 *  We found a split block.  Now we fill in only the        *
 *  critical information that can't be delayed due to race  *
 *  conditions.  When we get here, the parent split block   *
 *  must be locked; that lets us safely update the number   *
 *  of processors working here, etc., without any ugly      *
 *  race conditions that would corrupt this critical data.  *
 *                                                          *
 ************************************************************
 */
  for (i = 0; i < (int) smp_max_threads; i++) // Pierre-Marie Baty -- added type cast
    child->siblings[i] = 0;
  child->nprocs = 0;
  child->stop = 0;
  child->joinable = 0;
  child->joined = 0;
  child->parent = parent;
  child->thread_id = tid;
  parent->nprocs++;
  parent->siblings[tid] = child;
  parent->joined = 1;
  return child;
}
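
/*
 An illustrative sketch (not Crafty code) of the split-block bitmap round
 trip shared by GetBlock() and CopyToParent().  LSB() is assumed to behave
 like a count-trailing-zeros operation; Clear() and Set() reset and set one
 bit:

     uint64_t blocks = ~0ull;              // all 64 local blocks free
     int tid = 3;                          // hypothetical thread id
     int local = __builtin_ctzll(blocks);  // LSB(): rightmost 1-bit, 0 here
     blocks &= ~(1ull << local);           // Clear(): mark the block in use
     int global = local + 64 * tid + 1;    // index into the global block[]
     ...                                   // search runs using block[global]
     blocks |= 1ull << local;              // Set() in CopyToParent(): free it
     int back = global - 64 * tid - 1;     // FindBlockID() inverse, as above

 As far as this code shows, thread[tid].blocks is only touched by thread tid
 (allocated in GetBlock(), released in CopyToParent() for its own block), so
 the bitmap itself needs no lock.
*/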

/*
 *******************************************************************************
 *                                                                             *
 *   WaitForAllThreadsInitialized() waits until all smp_max_threads are       *
 *   initialized.  We have to initialize each thread and malloc() its split   *
 *   blocks before we start the actual parallel search.  Otherwise we will    *
 *   see invalid memory accesses and crash instantly.                         *
 *                                                                             *
 *******************************************************************************
 */
void WaitForAllThreadsInitialized(void) {
  while (initialized_threads < (int) smp_max_threads); /* Do nothing */ // Pierre-Marie Baty -- added type cast
}
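
/*
 A portability footnote, sketched under assumptions (not Crafty code): the
 spin above reads a plain int that other threads write, so it depends on the
 compiler actually re-loading initialized_threads on every pass.  A C11
 rendering of the same startup barrier with an atomic counter would look
 like this, with hypothetical names:

     #include <stdatomic.h>

     static atomic_int threads_ready;

     // at the end of each thread's initialization:
     atomic_fetch_add(&threads_ready, 1);

     // the barrier itself:
     while (atomic_load(&threads_ready) < nthreads)
       ;                             // spin until every thread has arrived
*/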

#  if !defined (UNIX)
/* modified 01/17/09 */
/*
 *******************************************************************************
 *                                                                             *
 *   ThreadMalloc() is called from the ThreadInit() function.  It malloc's    *
 *   the split blocks in the local memory for the processor associated with   *
 *   the specific thread that is calling this code.                           *
 *                                                                             *
 *******************************************************************************
 */
extern void *WinMalloc(size_t, int);
void ThreadMalloc(int64_t tid) {
  int i;

  for (i = (int) tid * 64 + 1; i < (int) tid * 64 + 65; i++) { // Pierre-Marie Baty -- added type casts
    if (block[i] == NULL)
      block[i] =
          (TREE *) ((~(size_t) 127) & (127 + (size_t) WinMalloc(sizeof(TREE) +
                  127, (int) tid))); // Pierre-Marie Baty -- added type cast
    block[i]->parent = NULL;
    LockInit(block[i]->lock);
  }
}
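
/*
 A minimal sketch (not Crafty code) of the alignment arithmetic above.  The
 expression (~(size_t) 127) & (127 + p) rounds p up to the next multiple of
 128, presumably so each TREE begins on a cache-line boundary; over-allocating
 by 127 bytes guarantees the rounded-up pointer stays inside the allocation:

     char *raw = malloc(sizeof(TREE) + 127);              // 127 spare bytes
     TREE *aligned =
         (TREE *) (((size_t) raw + 127) & ~(size_t) 127); // round up to 128

 ((size_t) raw + 127) & ~(size_t) 127 computes the same value as the
 (~(size_t) 127) & (127 + (size_t) raw) form used in ThreadMalloc().
*/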
#  endif

/* modified 01/17/09 */
/*
 *******************************************************************************
 *                                                                             *
 *   ThreadStop() is called from SearchParallel() when it detects a beta      *
 *   cutoff (fail high) at a node that is being searched in parallel.  We     *
 *   need to stop all threads here, and since this threading algorithm is     *
 *   recursive, it may be necessary to stop other threads that are helping    *
 *   search this branch further down in the tree.  This function simply sets  *
 *   tree->stop to 1, which will stop that particular thread instantly and    *
 *   return it to the idle loop in ThreadWait().                              *
 *                                                                             *
 *******************************************************************************
 */
void ThreadStop(TREE * RESTRICT tree) {
  int proc;

  Lock(tree->lock);
  tree->stop = 1;
  for (proc = 0; proc < smp_max_threads; proc++)
    if (tree->siblings[proc])
      ThreadStop(tree->siblings[proc]);
  Unlock(tree->lock);
}

/* modified 04/23/14 */
/*
 *******************************************************************************
 *                                                                             *
 *   ThreadWait() is the idle loop for the N threads that are created at the  *
 *   beginning when Crafty searches.  Threads are "parked" here waiting on a  *
 *   pointer to something they should search (a parameter block built in the  *
 *   function Thread() in this case).  When this pointer becomes non-zero,    *
 *   each thread "parked" here will immediately call SearchParallel() and     *
 *   begin the parallel search as directed.                                   *
 *                                                                             *
 *******************************************************************************
 */
int ThreadWait(int64_t tid, TREE * RESTRICT waiting) {
  int value, tstart, tend;

/*
 ************************************************************
 *                                                          *
 *  The N initial threads enter here and are kept penned    *
 *  here forever.  However, once a thread leaves here, it   *
 *  may well re-enter ThreadWait() from the top while it    *
 *  waits for a parallel search to complete.  While it      *
 *  waits here, it can also join in to help other busy      *
 *  threads search their subtrees as well.                  *
 *                                                          *
 ************************************************************
 */
  while (1) {
    tstart = ReadClock();
    Lock(lock_split);
    Lock(lock_smp);
    smp_idle++;
    if (!waiting && smp_split == 0)
      smp_split = 1;
    if (!thread[tid].tree)
      thread[tid].idle = 1;
    Unlock(lock_smp);
    Unlock(lock_split);
/*
 ************************************************************
 *                                                          *
 *  We can exit if our thread[i] pointer is non-zero, or    *
 *  if we are waiting on others to finish a block that we   *
 *  have to return through.  When the busy count on such a  *
 *  block hits zero, we return immediately, which unwinds   *
 *  the search as it should be.                             *
 *                                                          *
 ************************************************************
 */
    while (!thread[tid].tree && (!waiting || waiting->nprocs))
      Pause();
    thread[tid].idle = 0;
    tend = ReadClock();
    Lock(lock_smp);
    idle_time += tend - tstart;
    if (!thread[tid].tree)
      thread[tid].tree = waiting;
/*
 ************************************************************
 *                                                          *
 *  We either have work to do, or the threads we were       *
 *  waiting on have finished their work.                    *
 *                                                          *
 ************************************************************
 */
    smp_idle--;
    Unlock(lock_smp);
/*
 ************************************************************
 *                                                          *
 *  If we are waiting on a block and the busy count is now  *
 *  zero, we simply return to finish up the bookkeeping at  *
 *  that point.                                             *
 *                                                          *
 ************************************************************
 */
    if (thread[tid].tree == waiting || thread[tid].tree == (TREE *) -1)
      return 0;
/*
 ************************************************************
 *                                                          *
 *  Else our thread[i] pointer is non-zero, meaning we      *
 *  have been assigned something to search.  Hi-ho, hi-ho,  *
 *  it's off to work we go...                               *
 *                                                          *
 ************************************************************
 */
    value =
        SearchParallel(thread[tid].tree, thread[tid].tree->alpha,
        thread[tid].tree->beta, thread[tid].tree->value,
        thread[tid].tree->side, thread[tid].tree->depth,
        thread[tid].tree->ply);
    Lock(thread[tid].tree->parent->lock);
    CopyToParent((TREE *) thread[tid].tree->parent, thread[tid].tree, value);
    thread[tid].tree->parent->nprocs--;
    thread[tid].tree->parent->siblings[tid] = 0;
    Unlock(thread[tid].tree->parent->lock);
    thread[tid].tree = 0;
  }
}
#endif