Rev 33 | Rev 108
#include "data.h"
#include "epdglue.h"
#if (CPUS > 1)
/* modified 11/04/15 */
/*
 *******************************************************************************
 *                                                                             *
 *   Split() is the driver for the threaded parallel search in Crafty.  The    *
 *   basic idea is that whenever we notice that one (or more) threads are in   *
 *   their idle loop, we drop into Split(), from Search(), and begin a new     *
 *   parallel search from this node.  This is simply a problem of establishing *
 *   a new split point, and then letting each thread join this split point and *
 *   copy whatever data they need.                                             *
 *                                                                             *
 *   This is generation II of Split().  The primary differences address two    *
 *   specific performance-robbing issues.  (1) Excessive waiting for a split   *
 *   to be done, and (2) excessive waiting on specific locks.  Generation II   *
 *   addresses both of these to significantly improve performance.             *
 *                                                                             *
 *   The main difference between Gen I and Gen II is the effort required to    *
 *   split the tree and which thread(s) expend this effort.  In generation I,  *
 *   the parent thread was responsible for allocating a split block for each   *
 *   child thread, and then copying the necessary data from the parent split   *
 *   block to these child split blocks.  When all of this was completed, the   *
 *   child processes were released to start the parallel search after being    *
 *   held while the split / copy operations were done.  In the generation II   *
 *   Split() we now simply allocate a new split block for THIS thread, flag    *
 *   the parent split block as joinable, and then go directly to ThreadWait()  *
 *   which will drop us back into the search immediately.  The idle threads    *
 *   are continually looping on Join() which will jump them right into this    *
 *   split block letting them do ALL the work of allocating a split block,     *
 *   filling it in, and then copying the data to their local split block.      *
 *   This distributes the split overhead among all the threads that split,     *
 *   rather than this thread having to do all the work while the other threads *
 *   sit idle.                                                                 *
 *                                                                             *
 *   Generation II is also much more lightweight, in that it copies only the   *
 *   bare minimum from parent to child.  Generation I safely copied far too    *
 *   much since this code was being changed regularly, but that is no longer   *
 *   necessary overhead.                                                       *
 *                                                                             *
 *   Generation II has a zero contention split algorithm.  In the past, when a *
 *   thread became idle, it posted a global split request and any thread that  *
 *   was at an appropriate point would try to split.  But while it was doing   *
 *   the splitting, other threads that were also willing to split would "get   *
 *   in line" because Crafty used a global lock to prevent two threads from    *
 *   attempting to split at the same instant in time.  They got in line, and   *
 *   waited for the original splitter to release the lock, but now they have   *
 *   no idle threads to split with.  A waste of time.  Now we allow ANY thread *
 *   to attempt to split at the current ply.  When we do what might be called  *
 *   a "gratuitous split" the only restriction is that if we have existing     *
 *   "gratuitous split points" (split blocks that are joinable but have not    *
 *   yet been joined), then we limit the number of such splits (per thread) to *
 *   avoid excessive overhead.                                                 *
 *                                                                             *
 *   Generation II takes another idea from DTS, the idea of "late-join".  The  *
 *   idea is fairly simple.  If, when a thread becomes idle, there are already *
 *   other split points being searched in parallel, then we will try to join   *
 *   one of them rather than waiting for someone to ask us to help.  We use    *
 *   some simple criteria:  (1) The split point must be joinable, which simply *
 *   means that no processor has exited the split point yet (which would let   *
 *   us know there is no more work here and a join would be futile);  (2) We   *
 *   compute an "interest" value which is a simple formula based on depth at   *
 *   the split point, and the number of moves already searched.  It seems less *
 *   risky to choose a split point with max depth AND minimum moves already    *
 *   searched so that there is plenty to do.  This was quite simple to add     *
 *   after the rest of the Generation II rewrite.  In fact, this is now THE    *
 *   way threads join a split point, period, which further simplifies the code *
 *   and improves efficiency.  IE, a thread can split when idle threads are    *
 *   noticed, or if it is far enough away from the tips to make the cost       *
 *   negligible.  At that point any idle thread(s) can join immediately, those *
 *   that become idle later can join when they are ready.                      *
 *                                                                             *
 *   There are a number of settable options via the command-line or .craftyrc  *
 *   initialization file.  Here's a concise explanation for each option and an *
 *   occasional suggestion for testing/tuning.                                 *
 *                                                                             *
 *   smp_affinity (command = smpaffinity=<n>) is used to enable or disable     *
 *      processor affinity.  -1 disables affinity and lets threads run on any  *
 *      available core.  If you use an integer <n> then thread zero will bind  *
 *      itself to cpu <n> and each additional thread will bind to the next     *
 *      higher cpu number.  This is useful if you try to run two copies of     *
 *      crafty on the same machine; now you can cause one to bind to the first *
 *      <n> cores, and the second to the last <n> cores.  For the first        *
 *      instance of Crafty, you would use smpaffinity=0, and for the second    *
 *      smpaffinity=8, assuming you are running 8 threads per copy on a 16 cpu *
 *      machine.  If you get this wrong, you can have more than one thread on  *
 *      the same cpu which will significantly impact performance.              *
 *                                                                             *
 *   smp_max_threads (command = smpmt=n) sets the total number of allowable    *
 *      threads for the search.  The default is one (1) as Crafty does not     *
 *      assume it should use all available resources.  For optimal performance *
 *      this should be set to the number of physical cores your machine has,   *
 *      which does NOT include hyperthreaded cores.                            *
 *                                                                             *
 *   smp_split_group (command = smpgroup=n) sets the maximum number of threads *
 *      at any single split point, with the exception of split points fairly   *
 *      close to the root where ALL threads are allowed to split together,     *
 *      ignoring this limit.  Note that this is ignored in the first 1/2 of    *
 *      the tree (the nodes closer to the root).  There it is actually good to *
 *      split and get all active threads involved.                             *
 *                                                                             *
 *   smp_min_split_depth (command = smpmin=n) avoids splitting when remaining  *
 *      depth < n.  This is used to balance splitting overhead cost against    *
 *      the speed gain the parallel search produces.  The default is currently *
 *      5 (which could change with future generations of Intel hardware) but   *
 *      values between 4 and 8 will work.  Larger values allow somewhat fewer  *
 *      splits, which reduces overhead, but it also increases the percentage   *
 *      of the time where a thread is waiting on work.                         *
 *                                                                             *
 *   smp_split_at_root (command = smproot=0 or 1) enables (1) or disables (0)  *
 *      splitting the tree at the root.  This defaults to 1 which produces the *
 *      best performance by a significant margin.  But it can be disabled if   *
 *      you are playing with code changes.                                     *
 *                                                                             *
 *   smp_gratuitous_depth (command = smpgd=<n>) controls "gratuitous splits"   *
 *      which are splits that are done without any idle threads.  This sets a  *
 *      depth limit (remaining depth) that must be present before such a split *
 *      can be done.  Making this number larger will reduce the number of      *
 *      these splits.  Making it too small will increase overhead slightly and *
 *      increase split block usage significantly.                              *
 *                                                                             *
 *   smp_gratuitous_limit (command = smpgl=<n>) limits the number of these     *
 *      splits that a thread can do.  Once a thread has this number of         *
 *      unjoined split points, it will not be allowed to split any more until  *
 *      one or more threads join at least one of the existing split points.    *
 *      In the smp search statistics, where you see output that looks like     *
 *      this:                                                                  *
 *                                                                             *
 *        splits=m(n) ...                                                      *
 *                                                                             *
 *      m is the total splits done, n is the number of "wasted splits" which   *
 *      are basically gratuitous splits where no thread joined before this     *
 *      split point was completed and deallocated.                             *
 *                                                                             *
 *   The best way to tune all of these parameters is to use the "autotune"     *
 *   command (see autotune.c and help autotune) which will automatically run   *
 *   tests and optimize the parameters.  More details are in the autotune.c    *
 *   source file.                                                              *
 *                                                                             *
 *   A few basic "rules of the road" for anyone interested in changing or      *
 *   adding to any of this code.                                               *
 *                                                                             *
 *   1.  If, at any time, you want to modify your private split block, no lock *
 *       is required.                                                          *
 *                                                                             *
 *   2.  If, at any time, you want to modify another split block, such as the  *
 *       parent split block shared move list, you must acquire the lock in the *
 *       split block first.  IE tree->parent->lock to lock the parent split    *
 *       block during NextMove() and such.                                     *
 *                                                                             *
 *   3.  If you want to modify any SMP-related data that spans multiple split  *
 *       blocks, such as telling sibling processes to stop, etc, then you must *
 *       acquire the global "lock_smp" lock first.  This prevents a deadlock   *
 *       caused by two different threads walking the split block chain from    *
 *       different directions, and acquiring the split block locks in          *
 *       different orders, which could cause a catastrophic deadlock to occur. *
 *       This is an infrequent event so the overhead is not significant.       *
 *                                                                             *
 *   4.  If you want to do any sort of I/O operation, you must acquire the     *
 *       "lock_io" lock first.  Since threads share descriptors, there are     *
 *       lots of potential race conditions, from the simple tty-output getting *
 *       interlaced from different threads, to outright data corruption in the *
 *       book or log files.                                                    *
 *                                                                             *
 *   Some of the bugs caused by failing to acquire the correct lock will only  *
 *   occur infrequently, and they are extremely difficult to find.  Some only  *
 *   show up in a public game where everyone is watching, to cause maximum     *
 *   embarrassment and cause the program to do something extremely stupid.     *
 *                                                                             *
 *******************************************************************************
 */
int Split(TREE * RESTRICT tree) {
  TREE *child;
  int tid, tstart, tend;

/*
 ************************************************************
 *                                                          *
 *  Here we prepare to split the tree.  All we really do in *
 *  the Generation II threading is grab a split block for   *
 *  this thread, then flag the parent as "joinable" and     *
 *  then jump right to ThreadWait() to resume where we left *
 *  off, with the expectation (but not a requirement) that  *
 *  other threads will join us to help.                     *
 *                                                          *
 *  Idle threads are sitting in ThreadWait() repeatedly     *
 *  calling Join() to find them a split point, which we are *
 *  fixing to provide.  They will then join as quickly as   *
 *  they can, and other threads that become idle later can  *
 *  also join without any further splitting needed.         *
 *                                                          *
 *  If we are unable to allocate a split block, we simply   *
 *  abort this attempted split and return to the search     *
 *  since other threads will also split quickly.            *
 *                                                          *
 ************************************************************
 */
  tstart = ReadClock();
  tree->nprocs = 0;
  for (tid = 0; tid < (int) smp_max_threads; tid++) // Pierre-Marie Baty -- added type cast
    tree->siblings[tid] = 0;
  child = GetBlock(tree, tree->thread_id);
  if (!child)
    return 0;
  CopyFromParent(child);
  thread[tree->thread_id].tree = child;
  tree->joined = 0;
  tree->joinable = 1;
  parallel_splits++;
  smp_split = 0;
  tend = ReadClock();
  thread[tree->thread_id].idle += tend - tstart;
/*
 ************************************************************
 *                                                          *
 *  We have successfully created a split point, which means *
 *  we are done.  The instant we set the "joinable" flag,   *
 *  idle threads may begin to join in at this split point   *
 *  to help.  Since this thread may finish before any or    *
 *  all of the other parallel threads, this thread is sent  *
 *  to ThreadWait() which will immediately send it to       *
 *  SearchMoveList() like the other threads; however, going *
 *  to ThreadWait() allows this thread to join others if it *
 *  runs out of work to do.  We do pass ThreadWait() the    *
 *  address of the parent split block, so that if this      *
 *  thread becomes idle, and this split block shows no      *
 *  threads are still busy, then this thread can return to  *
 *  here and then back up into the previous ply as it       *
 *  should.  Note that no other thread can back up to the   *
 *  previous ply since their recursive call stacks are not  *
 *  set for that, while this call stack will bring us back  *
 *  to this point where we return to the normal search,     *
 *  which we just completed.                                *
 *                                                          *
 ************************************************************
 */
  ThreadWait(tree->thread_id, tree);
  if (!tree->joined)
    parallel_splits_wasted++;
  return 1;
}

/* modified 12/16/15 */
/*
 *******************************************************************************
 *                                                                             *
 *   Join() is called just when we enter the usual spin-loop waiting for work. *
 *   We take a quick look at all active split blocks to see if any look        *
 *   "joinable".  If so, we compute an "interest" value, which will be defined *
 *   below.  We then join the most interesting split point directly.  This     *
 *   split point might have been created specifically for this thread to join, *
 *   or it might be one that was already active when this thread became idle,  *
 *   which allows us to join that existing split point and not request a new   *
 *   split operation, saving time.                                             *
 *                                                                             *
 *******************************************************************************
 */
int Join(int64_t tid) {
  TREE *tree, *join_block, *child;
  int interest, best_interest, current, pass = 0;

/*
 ************************************************************
 *                                                          *
 *  First we pass over ALL split blocks, looking for those  *
 *  flagged as "joinable" (which means they are actually    *
 *  active split points and that no processor at that split *
 *  point has run out of work (there is no point in joining *
 *  a split point with no remaining work) and no fail high  *
 *  has been found which would raise the "stop" flag).      *
 *  This is "racy" because we do not acquire any locks,     *
 *  which means that the busy threads continue working, and *
 *  there is a small probability that the split point will  *
 *  disappear while we are in this loop.  To resolve the    *
 *  potential race, after we find the most attractive split *
 *  point, we acquire the lock for that split block and     *
 *  test again, but this time if the block is joinable, we  *
 *  can safely join under control of the lock, which is not *
 *  held for very long at all.  If the block is not         *
 *  joinable once we acquire the lock, we abort joining     *
 *  since it is futile.  Note that if this happens, we will *
 *  try to find an existing split point we can join three   *
 *  times before we exit, setting split to 1 to ask other   *
 *  threads to produce more candidate split points.         *
 *                                                          *
 ************************************************************
 */
  for (pass = 0; pass < 3; pass++) {
    best_interest = -999999;
    join_block = 0;
    for (current = 0; current <= (int) smp_max_threads * 64; current++) { // Pierre-Marie Baty -- added type cast
      tree = block[current];
      if (tree->joinable && (tree->ply <= tree->depth / 2 ||
              tree->nprocs < (int) smp_split_group)) { // Pierre-Marie Baty -- added type cast
        interest = tree->depth * 2 - tree->searched[0];
        if (interest > best_interest) {
          best_interest = interest;
          join_block = tree;
        }
      }
    }
/*
 ************************************************************
 *                                                          *
 *  Now we acquire the lock for this split block, and then  *
 *  check to see if the block is still flagged as joinable. *
 *  If so, we set things up, and then we get pretty tricky  *
 *  as we then release the lock, and then copy the data     *
 *  from the parent to our split block.  There is a chance  *
 *  that while we are copying this data, the split point    *
 *  gets completed by other threads.  Which would leave an  *
 *  apparent race condition exposed where we start copying  *
 *  data here, the split point is completed, the parent     *
 *  block is released and then reacquired and we continue   *
 *  if nothing has happened here, getting data copied from  *
 *  two different positions.                                *
 *                                                          *
 *  Fortunately, we linked this new split block to the old  *
 *  (original parent).  If that split block is released, we *
 *  will discover this because that operation will also set *
 *  our "stop" flag which will prevent us from using this   *
 *  data and breaking things.  We allow threads to copy     *
 *  this data without any lock protection to eliminate a    *
 *  serialization (each node would copy the data serially,  *
 *  rather than all at once) with the only consequence to   *
 *  this being the overhead of copying and then throwing    *
 *  the data away, which can happen on occasion even if we  *
 *  used a lock for protection, since once we release the   *
 *  lock it still takes time to get into the search and we  *
 *  could STILL find that this split block has already been *
 *  completed, once again.  Less contention and serial      *
 *  computing improves performance.                         *
 *                                                          *
 ************************************************************
 */
    if (join_block) {
      Lock(join_block->lock);
      if (join_block->joinable) {
        child = GetBlock(join_block, (int) tid); // Pierre-Marie Baty -- added type cast
        Unlock(join_block->lock);
        if (child) {
          CopyFromParent(child);
          thread[tid].tree = child;
          parallel_joins++;
          return 1;
        }
      } else {
        Unlock(join_block->lock);
        break;
      }
    }
  }
/*
 ************************************************************
 *                                                          *
 *  We did not acquire a split point to join, so we set     *
 *  smp_split to 1 to ask busy threads to create joinable   *
 *  split points.                                           *
 *                                                          *
 ************************************************************
 */
  smp_split = 1;
  return 0;
}
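The "most attractive split point" scoring above can be isolated into a small standalone sketch. Everything here (the `CandidateBlock` struct and `pick_best()`) is a hypothetical illustration, not Crafty's actual types; it only mirrors the `interest = depth * 2 - moves_searched` heuristic and the joinability test from the loop above.

```c
#include <stddef.h>

/* Illustrative stand-in for a Crafty split block (hypothetical type). */
typedef struct {
    int joinable;   /* split point still accepting helpers? */
    int ply;        /* ply of the split point */
    int depth;      /* remaining search depth */
    int nprocs;     /* threads already attached */
    int searched;   /* moves already searched at the split point */
} CandidateBlock;

/* Return the index of the most attractive joinable block, or -1 if none.
   A block qualifies if it is joinable AND (it is in the upper half of its
   subtree, or it has fewer helpers than the split-group limit). */
int pick_best(const CandidateBlock *blk, int n, int split_group) {
    int best_interest = -999999, best = -1;

    for (int i = 0; i < n; i++) {
        if (blk[i].joinable && (blk[i].ply <= blk[i].depth / 2 ||
                blk[i].nprocs < split_group)) {
            int interest = blk[i].depth * 2 - blk[i].searched;

            if (interest > best_interest) {
                best_interest = interest;
                best = i;
            }
        }
    }
    return best;
}
```

Deeper, less-searched split points score higher, which matches the intent of joining where the most work remains.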

/* modified 11/04/15 */
/*
 *******************************************************************************
 *                                                                             *
 *   ThreadInit() is called after a process is created.  Its main task is to   *
 *   initialize the process local memory so that it will fault in and be       *
 *   allocated on the local node rather than the node where the original       *
 *   (first) process was running.  All threads will hang here via a custom     *
 *   WaitForAllThreadsInitialized() procedure so that all the local thread     *
 *   blocks are usable before the search actually begins.                      *
 *                                                                             *
 *******************************************************************************
 */
void *STDCALL ThreadInit(void *t) {
  int tid = (int64_t) t;

  ThreadAffinity(tid);
#  if !defined(UNIX)
  ThreadMalloc((uint64_t) tid);
#  endif
  thread[tid].blocks = 0xffffffffffffffffull;
  Lock(lock_smp);
  initialized_threads++;
  Unlock(lock_smp);
  WaitForAllThreadsInitialized();
  ThreadWait(tid, (TREE *) 0);
  Lock(lock_smp);
  smp_threads--;
  Unlock(lock_smp);
  return 0;
}

/* modified 01/08/15 */
/*
 *******************************************************************************
 *                                                                             *
 *   ThreadAffinity() is called to "pin" a thread to a specific processor.  It *
 *   is a "noop" (no-operation) if Crafty was not compiled with -DAFFINITY, or *
 *   if smp_affinity is negative (smpaffinity=-1 disables affinity).  It       *
 *   simply sets the affinity for the current thread to the requested CPU and  *
 *   returns.  NOTE:  If hyperthreading is enabled, there is no guarantee that *
 *   this will work as expected and pin one thread per physical core.  It      *
 *   depends on how the O/S numbers the SMT cores.                             *
 *                                                                             *
 *******************************************************************************
 */
void ThreadAffinity(int cpu) {
#  if defined(AFFINITY)
  cpu_set_t cpuset;
  pthread_t current_thread = pthread_self();

  if (smp_affinity >= 0) {
    CPU_ZERO(&cpuset);
    CPU_SET(cpu + smp_affinity, &cpuset);
    pthread_setaffinity_np(current_thread, sizeof(cpu_set_t), &cpuset);
  }
#  endif
}
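As a minimal sketch of the `cpu_set_t` manipulation ThreadAffinity() relies on: the Linux-only `CPU_ZERO`/`CPU_SET` macros build a one-CPU mask before `pthread_setaffinity_np()` is called. `affinity_mask_ok()` below is an illustrative helper, not a Crafty function, and the `__linux__` guard (with a trivial fallback elsewhere) is an assumption so the sketch compiles on other platforms.

```c
#if defined(__linux__)
#  ifndef _GNU_SOURCE
#    define _GNU_SOURCE
#  endif
#  include <sched.h>

/* Build a single-CPU affinity mask the way ThreadAffinity() does
   (it uses cpu + smp_affinity as the target) and verify that exactly
   that one CPU is set in the mask. */
int affinity_mask_ok(int cpu) {
  cpu_set_t cpuset;

  CPU_ZERO(&cpuset);
  CPU_SET(cpu, &cpuset);
  return CPU_ISSET(cpu, &cpuset) && CPU_COUNT(&cpuset) == 1;
}
#else
/* Off Linux this sketch is a no-op, like ThreadAffinity() without -DAFFINITY. */
int affinity_mask_ok(int cpu) {
  (void) cpu;
  return 1;
}
#endif
```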

/* modified 11/04/15 */
/*
 *******************************************************************************
 *                                                                             *
 *   ThreadSplit() is used to determine if we should split at the current ply. *
 *   There are some basic constraints on when splits can be done, such as the  *
 *   depth remaining in the search (don't split too near the tips), and have   *
 *   we searched at least one move to get a score or bound (YBW condition).    *
 *                                                                             *
 *   If those conditions are satisfied, AND either a thread has requested a    *
 *   split OR we are far enough away from the tips of the tree to justify a    *
 *   "gratuitous split" then we return "success."  A "gratuitous split" is a   *
 *   split done without any idle threads.  Since splits are not free, we only  *
 *   do this well away from tips to limit overhead.  We do this so that when a *
 *   thread becomes idle, it will find these split points immediately and not  *
 *   have to wait for a split after the fact.                                  *
 *                                                                             *
 *******************************************************************************
 */
int ThreadSplit(TREE *RESTRICT tree, int ply, int depth, int alpha, int o_alpha, // Pierre-Marie Baty -- missing keyword
    int done) {
  TREE *used;
  int64_t tblocks;
  int temp, unused = 0;

/*
 ************************************************************
 *                                                          *
 *  First, we see if we meet the basic criteria to create a *
 *  split point, that being that we must not be too far     *
 *  from the root (smp_min_split_depth).                    *
 *                                                          *
 ************************************************************
 */
  if (depth < (int) smp_min_split_depth) // Pierre-Marie Baty -- added type cast
    return 0;
/*
 ************************************************************
 *                                                          *
 *  If smp_split is NOT set, we are checking to see if it   *
 *  is acceptable to do a gratuitous split here.            *
 *                                                          *
 *  (1) if we are too far from the root we do not do        *
 *      gratuitous splits to avoid the overhead.            *
 *                                                          *
 *  (2) if we have searched more than one move at this ply, *
 *      we don't do any further tests to see if a           *
 *      gratuitous split is acceptable, since we have       *
 *      previously done this test at this ply and decided   *
 *      one should not be done.  That condition has likely  *
 *      not changed.                                        *
 *                                                          *
 *  (3) if we have pre-existing gratuitous split points for *
 *      this thread, we make certain we don't create more   *
 *      than the gratuitous split limit as excessive splits *
 *      just add to the overhead with no benefit.           *
 *                                                          *
 ************************************************************
 */
  if (!smp_split) {
    if (depth < (int) smp_gratuitous_depth || done > 1) // Pierre-Marie Baty -- added type cast
      return 0;
    tblocks = ~thread[tree->thread_id].blocks;
    while (tblocks) {
      temp = LSB(tblocks);
      used = block[temp + tree->thread_id * 64 + 1];
      if (used->joinable && !used->joined)
        unused++;
      Clear(temp, tblocks);
    }
    if (unused > (int) smp_gratuitous_limit) // Pierre-Marie Baty -- added type cast
      return 0;
  }
/*
 ************************************************************
 *                                                          *
 *  If smp_split IS set, we are checking to see if it is    *
 *  acceptable to do a split because there are idle threads *
 *  that need work to do.                                   *
 *                                                          *
 *  The only reason this would be false is if we have a     *
 *  pre-existing split point that is joinable but has not   *
 *  been joined.  If one exists, there is no need to split  *
 *  again as there is already an accessible split point.    *
 *  Otherwise, if we are at the root and we are either not  *
 *  allowed to split at the root, or we have additional     *
 *  root moves that have to be searched one at a time using *
 *  all available threads, we also can not split here.      *
 *                                                          *
 ************************************************************
 */
  else {
    if (ply == 1 && (!smp_split_at_root || !NextRootMoveParallel() ||
            alpha == o_alpha))
      return 0;
    tblocks = ~thread[tree->thread_id].blocks;
    while (tblocks) {
      temp = LSB(tblocks);
      used = block[temp + tree->thread_id * 64 + 1];
      if (used->joinable && !used->joined)
        unused++;
      Clear(temp, tblocks);
    }
    if (unused > (int) smp_gratuitous_limit) // Pierre-Marie Baty -- added type cast
      return 0;
  }
  return 1;
}
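Both branches of ThreadSplit() count unused-but-joinable split blocks with the same `LSB()`/`Clear()` bit-scan loop over the `blocks` occupancy mask. A standalone approximation of that scan pattern, assuming a GCC/Clang compiler (`__builtin_ctzll` stands in for Crafty's `LSB()` macro, and `x &= x - 1` for `Clear()`; `count_candidates` is an illustrative name, not a Crafty function):

```c
#include <stdint.h>

/* Walk every set bit of 'tblocks' (each set bit marks a candidate
   split-block slot) from least-significant bit upward, and count the
   slots the caller's predicate accepts.  This mirrors the
   LSB()/Clear() loop in ThreadSplit(). */
int count_candidates(uint64_t tblocks, int (*accept)(int slot)) {
  int unused = 0;

  while (tblocks) {
    int slot = __builtin_ctzll(tblocks);   /* LSB(tblocks) */

    if (accept(slot))
      unused++;
    tblocks &= tblocks - 1;                /* Clear(slot, tblocks) */
  }
  return unused;
}
```

Clearing the lowest set bit each iteration guarantees the loop terminates after exactly popcount(tblocks) steps, which is why the scan is cheap even with a 64-bit mask.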

/* modified 11/04/15 */
/*
 *******************************************************************************
 *                                                                             *
 *   ThreadStop() is called from SearchMoveList() when it detects a beta       *
 *   cutoff (fail high) at a node that is being searched in parallel.  We need *
 *   to stop all threads here, and since this threading algorithm is recursive *
 *   it may be necessary to stop other threads that are helping search this    *
 *   branch further down into the tree.  This function simply sets appropriate *
 *   tree->stop variables to 1, which will stop those particular threads       *
 *   instantly and return them to the idle loop in ThreadWait().               *
 *                                                                             *
 *******************************************************************************
 */
void ThreadStop(TREE * RESTRICT tree) {
  int proc;

  Lock(tree->lock);
  tree->stop = 1;
  tree->joinable = 0;
  for (proc = 0; proc < (int) smp_max_threads; proc++) // Pierre-Marie Baty -- added type cast
    if (tree->siblings[proc])
      ThreadStop(tree->siblings[proc]);
  Unlock(tree->lock);
}
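The recursion above fans the stop signal out through the sibling pointers of every helper below the failing split point. A miniature of that propagation, with a hypothetical `Node` standing in for Crafty's TREE split block (no locks here, purely illustrative; `stop_all` is not a Crafty function):

```c
#include <stddef.h>

#define MAX_PROCS 4

/* Tiny stand-in for a split block: a stop flag, a joinable flag, and
   pointers to the helper threads' blocks, as in TREE->siblings[]. */
typedef struct Node {
  int stop;
  int joinable;
  struct Node *siblings[MAX_PROCS];
} Node;

/* Mark this block stopped and un-joinable, then recurse into every
   sibling block below it, exactly as ThreadStop() does. */
void stop_all(Node *n) {
  n->stop = 1;
  n->joinable = 0;
  for (int proc = 0; proc < MAX_PROCS; proc++)
    if (n->siblings[proc])
      stop_all(n->siblings[proc]);
}
```

In the real code each level is protected by `Lock(tree->lock)` so that a helper cannot attach to a split point while it is being torn down.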

/* modified 11/04/15 */
/*
 *******************************************************************************
 *                                                                             *
 *   ThreadTrace() is a debugging tool that simply walks the split block tree  *
 *   and displays interesting data to help debug the parallel search whenever  *
 *   changes break things.                                                     *
 *                                                                             *
 *******************************************************************************
 */
void ThreadTrace(TREE * RESTRICT tree, int depth, int brief) {
  int proc, i;

  Lock(tree->lock);
  Lock(lock_io);
  if (!brief) {
    for (i = 0; i < 4 * depth; i++)
      Print(4095, " ");
    depth++;
    Print(4095, "block[%d] thread=%d ply=%d nprocs=%d ",
        FindBlockID(tree), tree->thread_id, tree->ply, tree->nprocs);
    Print(4095, "joined=%d joinable=%d stop=%d nodes=%d", tree->joined,
        tree->joinable, tree->stop, tree->nodes_searched);
    Print(4095, " parent=%d\n", FindBlockID(tree->parent));
  } else {
    if (tree->nprocs > 1) {
      for (i = 0; i < 4 * depth; i++)
        Print(4095, " ");
      depth++;
      Print(4095, "(ply %d)", tree->ply);
    }
  }
  if (tree->nprocs) {
    if (!brief) {
      for (i = 0; i < 4 * depth; i++)
        Print(4095, " ");
      Print(4095, " parent=%d sibling threads=",
          FindBlockID(tree->parent));
      for (proc = 0; proc < (int) smp_max_threads; proc++) // Pierre-Marie Baty -- added type cast
        if (tree->siblings[proc])
          Print(4095, " %d(%d)", proc, FindBlockID(tree->siblings[proc]));
      Print(4095, "\n");
    } else {
      if (tree->nprocs > 1) {
        Print(4095, " helping= ");
        for (proc = 0; proc < (int) smp_max_threads; proc++) // Pierre-Marie Baty -- added type cast
          if (tree->siblings[proc]) {
            if (proc == tree->thread_id)
              Print(4095, "[");
            Print(4095, "%d", proc);
            if (proc == tree->thread_id)
              Print(4095, "]");
            Print(4095, " ");
          }
        Print(4095, "\n");
      }
    }
  }
  Unlock(lock_io);
  for (proc = 0; proc < (int) smp_max_threads; proc++) // Pierre-Marie Baty -- added type cast
    if (tree->siblings[proc])
      ThreadTrace(tree->siblings[proc], depth, brief);
  Unlock(tree->lock);
}

/* modified 11/04/15 */
/*
 *******************************************************************************
 *                                                                             *
 *   ThreadWait() is the idle loop for the N threads that are created at the   *
 *   beginning when Crafty searches.  Threads are "parked" here waiting on a   *
 *   pointer to something they should search (a parameter block built in the   *
 *   function Split() in this case).  When this pointer becomes non-zero, each *
 *   thread "parked" here will immediately call SearchMoveList() and begin the *
 *   parallel search as directed.                                              *
 *                                                                             *
 *   Upon entry, all threads except for the "master" will arrive here with a   *
 *   value of zero (0) in the waiting parameter below.  This indicates that    *
 *   they will search and then be done.  The "master" will arrive here with a  *
 *   pointer to the parent split block in "waiting" which says I will sit here *
 *   waiting for work OR until the waiting split block has no threads working  *
 *   on it, at which point I should return, which will let me "unsplit" here   *
 *   and clean things up.  The call to here in Split() passes this block       *
 *   address while threads that are helping get here with a zero.              *
 *                                                                             *
 *******************************************************************************
 */
int ThreadWait(int tid, TREE * RESTRICT waiting) {
  int value, tstart, tend;

/*
 ************************************************************
 *                                                          *
 *  When we reach this point, one of three possible         *
 *  conditions is true (1) we already have work to do, as   *
 *  we are the "master thread" and we have already split    *
 *  the tree, we are coming here to join in;  (2) we are    *
 *  the master, and we are waiting on our split point to    *
 *  complete, so we come here to join and help currently    *
 *  active threads;  (3) we have no work to do, so we will  *
 *  spin until Join() locates a split point we can join to  *
 *  help out.                                               *
 *                                                          *
 *  Note that when we get here, the parent already has a    *
 *  split block and does not need to call Join(), it simply *
 *  falls through the while spin loop below because its     *
 *  "tree" pointer is already non-zero.                     *
 *                                                          *
 ************************************************************
 */
  while (1) {
    tstart = ReadClock();
    while (!thread[tid].tree && (!waiting || waiting->nprocs) && !Join(tid) &&
        !thread[tid].terminate)
      Pause();
    tend = ReadClock();
    if (!thread[tid].tree)
      thread[tid].tree = waiting;
    thread[tid].idle += tend - tstart;
    if (thread[tid].tree == waiting || thread[tid].terminate)
      return 0;
/*
 ************************************************************
 *                                                          *
 *  Once we get here, we have a good split point, so we are *
 *  ready to participate in a parallel search.  Once we     *
 *  return from SearchMoveList() we copy our results back   *
 *  to the parent via CopyToParent() before we look for a   *
 *  new split point.  If we are a parent, we will slip out  *
 *  of the spin loop at the top and return to the normal    *
 *  serial search to finish up here.                        *
 *                                                          *
 *  When we return from SearchMoveList(), we need to        *
 *  decrement the "nprocs" value since there is now one     *
 *  less thread working at this split point.                *
 *                                                          *
 *  Note:  CopyToParent() marks the current split block as  *
 *  unused once the copy is completed, so we don't have to  *
 *  do anything about that here.                            *
 *                                                          *
 ************************************************************
 */
    value =
        SearchMoveList(thread[tid].tree, thread[tid].tree->ply,
        thread[tid].tree->depth, thread[tid].tree->wtm,
        thread[tid].tree->alpha, thread[tid].tree->beta,
        thread[tid].tree->searched, thread[tid].tree->in_check, 0, parallel);
    tstart = ReadClock();
    Lock(thread[tid].tree->parent->lock);
    thread[tid].tree->parent->joinable = 0;
    CopyToParent((TREE *) thread[tid].tree->parent, thread[tid].tree, value);
    thread[tid].tree->parent->nprocs--;
    thread[tid].tree->parent->siblings[tid] = 0;
    Unlock(thread[tid].tree->parent->lock);
    thread[tid].tree = 0;
    tend = ReadClock();
    thread[tid].idle += tend - tstart;
  }
}
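The inner spin condition in ThreadWait() packs four tests into one `while`. Pulled out as a pure predicate it is easier to reason about (`keep_spinning` is an illustrative name, not a Crafty function; the real code tests `thread[tid]` fields and calls `Join(tid)` directly inside the condition):

```c
/* Returns 1 while an idle thread should keep spinning in ThreadWait():
   it has no tree to search, the split point it is waiting on (if any)
   still has workers attached, Join() found nothing to join, and no
   termination has been requested.  This mirrors
   !thread[tid].tree && (!waiting || waiting->nprocs) && !Join(tid) &&
   !thread[tid].terminate. */
int keep_spinning(int have_tree, int have_waiting_block, int waiting_nprocs,
    int joined, int terminate) {
  return !have_tree && (!have_waiting_block || waiting_nprocs) && !joined &&
      !terminate;
}
```

Note that a master whose split point has drained (`have_waiting_block` true, `waiting_nprocs` zero) falls out of the spin even though it has no tree, which is exactly how it gets back to "unsplit" and resume the serial search.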

/* modified 11/04/15 */
/*
 *******************************************************************************
 *                                                                             *
 *   CopyFromParent() is used to copy data from a parent thread to a child     *
 *   thread.  This only copies the appropriate parts of the TREE structure to  *
 *   avoid burning memory bandwidth by copying everything.                     *
 *                                                                             *
 *******************************************************************************
 */
| 294 | 
 | 732 | void CopyFromParent(TREE * RESTRICT child) { | 
| 295 | 
 | 733 | TREE *parent = child->parent; | 
| 296 | 
 | 734 | int i, ply; | 
| 297 | max_split_blocks = Max(max_split_blocks, max_); | - | |
| - | 735 | ||
| 298 | /* | 736 | /* | 
| 299 |  ************************************************************ | 737 |  ************************************************************ | 
| 300 |  *                                                          * | 738 |  *                                                          * | 
| 301 |  *  We have allocated a split block.  Now we copy the tree  * | 739 |  *  We have allocated a split block.  Now we copy the tree  * | 
| 302 |  *  search state from the parent block to the child in      * | 740 |  *  search state from the parent block to the child in      * | 
| 303 |  *  preparation for starting the parallel search.           * | 741 |  *  preparation for starting the parallel search.           * | 
| 304 |  *                                                          * | 742 |  *                                                          * | 
| 305 |  ************************************************************ | 743 |  ************************************************************ | 
| 306 |  */ | 744 |  */ | 
| 307 | 
 | 745 | ply = parent->ply; | 
| 308 | child->used = 1; | - | |
| 309 | child->stop = 0; | - | |
| 310 | for (i = 0; i < smp_max_threads; i++) | - | |
| 311 | child->siblings[i] = 0; | - | |
| 312 | child->nprocs = 0; | - | |
| 313 | child->ply = ply; | 746 | child->ply = ply; | 
| 314 | child->position = parent->position; | 747 | child->position = parent->position; | 
| 315 | child->pv[ply - 1] = parent->pv[ply - 1]; | - | |
| 316 | child->pv[ply] = parent->pv[ply]; | - | |
| 317 | child->next_status[ply] = parent->next_status[ply]; | - | |
| 318 | child->save_hash_key[ply] = parent->save_hash_key[ply]; | - | |
| 319 | child->save_pawn_hash_key[ply] = parent->save_pawn_hash_key[ply]; | - | |
| 320 | child->rep_index = parent->rep_index; | - | |
| 321 | for (i = 0; i <= | 748 | for (i = 0; i <= rep_index + parent->ply; i++) | 
| 322 | child->rep_list[i] = parent->rep_list[i]; | 749 | child->rep_list[i] = parent->rep_list[i]; | 
| 323 | 
 | 750 | for (i = ply - 1; i < MAXPLY; i++) | 
| 324 | child-> | 751 | child->killers[i] = parent->killers[i]; | 
| 325 | for (i = 1; i <= ply | 752 | for (i = ply - 1; i <= ply; i++) { | 
| 326 | child->status[i] = parent->status[i]; | - | |
| 327 | child->curmv[i] = parent->curmv[i]; | 753 | child->curmv[i] = parent->curmv[i]; | 
| 328 | child->inchk[i] = parent->inchk[i]; | - | |
| 329 | child-> | 754 | child->pv[i] = parent->pv[i]; | 
| 330 |   } | 755 |   } | 
| 331 |  | 756 | child->in_check = parent->in_check; | 
| - | 757 | child->last[ply] = child->move_list; | |
| - | 758 | child->status[ply] = parent->status[ply]; | |
| 332 |  | 759 | child->status[1] = parent->status[1]; | 
| - | 760 | child->save_hash_key[ply] = parent->save_hash_key[ply]; | |
| - | 761 | child->save_pawn_hash_key[ply] = parent->save_pawn_hash_key[ply]; | |
| 333 | child->nodes_searched = 0; | 762 | child->nodes_searched = 0; | 
| 334 | child->fail_highs = 0; | 763 | child->fail_highs = 0; | 
| 335 | child->fail_high_first_move = 0; | 764 | child->fail_high_first_move = 0; | 
| 336 | child->evaluations = 0; | 765 | child->evaluations = 0; | 
| 337 | child->egtb_probes = 0; | 766 | child->egtb_probes = 0; | 
| 338 | child-> | 767 | child->egtb_hits = 0; | 
| 339 | child->extensions_done = 0; | 768 | child->extensions_done = 0; | 
| 340 | child->qchecks_done = 0; | 769 | child->qchecks_done = 0; | 
| 341 | child->reductions_done = 0; | - | |
| 342 | child->moves_fpruned = 0; | 770 | child->moves_fpruned = 0; | 
| - | 771 | child->moves_mpruned = 0; | |
| - | 772 | for (i = 0; i < 16; i++) { | |
| - | 773 | child->LMR_done[i] = 0; | |
| - | 774 | child->null_done[i] = 0; | |
| - | 775 |   } | |
| 343 | child->alpha = parent->alpha; | 776 | child->alpha = parent->alpha; | 
| 344 | child->beta = parent->beta; | 777 | child->beta = parent->beta; | 
| 345 | child->value = parent->value; | 778 | child->value = parent->value; | 
| 346 | child-> | 779 | child->wtm = parent->wtm; | 
| 347 | child->depth = parent->depth; | 780 | child->depth = parent->depth; | 
| 348 | parent->siblings[thread] = child; | - | |
| 349 | child->thread_id = thread; | - | |
| 350 | child-> | 781 | child->searched = parent->searched; | 
| 351 | parent->nprocs++; | - | |
| 352 |  | 782 | strcpy(child->root_move_text, parent->root_move_text); | 
| 353 |  | 783 | strcpy(child->remaining_moves_text, parent->remaining_moves_text); | 
| 354 | } | 784 | } | 
| 355 | 785 | ||
| 356 | /* modified 04/ | 786 | /* modified 11/04/15 */ | 
| 357 | /* | 787 | /* | 
| 358 |  ******************************************************************************* | 788 |  ******************************************************************************* | 
| 359 |  *                                                                             * | 789 |  *                                                                             * | 
| 360 |  *   CopyToParent() is used to copy data from a child thread to a parent       * | 790 |  *   CopyToParent() is used to copy data from a child thread to a parent       * | 
| 361 |  *   thread.  This only copies the appropriate parts of the TREE structure to  * | 791 |  *   thread.  This only copies the appropriate parts of the TREE structure to  * | 
| 362 |  *   avoid burning memory bandwidth by copying everything.                     * | 792 |  *   avoid burning memory bandwidth by copying everything.                     * | 
| 363 |  *                                                                             * | 793 |  *                                                                             * | 
| 364 |  ******************************************************************************* | 794 |  ******************************************************************************* | 
| 365 |  */ | 795 |  */ | 
| 366 | void CopyToParent(TREE * RESTRICT parent, TREE * RESTRICT child, int value) { | 796 | void CopyToParent(TREE * RESTRICT parent, TREE * RESTRICT child, int value) { | 
| 367 | int ply = parent->ply; | 797 | int i, ply = parent->ply, which; | 
| 368 | 798 | ||
| 369 | /* | 799 | /* | 
| 370 |  ************************************************************ | 800 |  ************************************************************ | 
| 371 |  *                                                          * | 801 |  *                                                          * | 
| 372 |  *   | 802 |  *  The only concern here is to make sure that the info is  * | 
| 373 |  *  copied to the parent if our score is > than the  | 803 |  *  only copied to the parent if our score is > than the    * | 
| 374 |  *  value, and that we were not stopped for any | 804 |  *  parent value, and that we were not stopped for any      * | 
| 375 |  *  which could produce a partial score that is  | 805 |  *  reason which could produce a partial score that is      * | 
| - | 806 |  *  worthless and dangerous to use.                         * | |
| - | 807 |  *                                                          * | |
| - | 808 |  *  One important special case.  If we get here with the    * | |
| - | 809 |  *  thread->stop flag set, and ply is 1, then we need to    * | |
| - | 810 |  *  clear the "this move has been searched" flag in the ply * | |
| - | 811 |  *  1 move list since we did not complete the search.  If   * | |
| - | 812 |  *  we fail to do this, then a move being searched in       * | |
| - | 813 |  *  parallel at the root will be "lost" for this iteration  * | |
| - | 814 |  *  and won't be searched again until the next iteration.   * | |
| 376 |  *                                                          * | 815 |  *                                                          * | 
| 377 |  *  In any case, we add our statistical counters to the     * | 816 |  *  In any case, we add our statistical counters to the     * | 
| 378 |  *  parent's totals no matter whether we finished or not    * | 817 |  *  parent's totals no matter whether we finished or not    * | 
| 379 |  *  since the total nodes searched and such should consider * | 818 |  *  since the total nodes searched and such should consider * | 
| 380 |  *  everything searched, not just the "useful stuff."       * | 819 |  *  everything searched, not just the "useful stuff."       * | 
| - | 820 |  *                                                          * | |
| - | 821 |  *  After we finish copying everything, we mark this split  * | |
| - | 822 |  *  block as free in the split block bitmap.                * | |
| 381 |  *                                                          * | 823 |  *                                                          * | 
| 382 |  ************************************************************ | 824 |  ************************************************************ | 
| 383 |  */ | 825 |  */ | 
| 384 | if (child->nodes_searched && !child->stop && value > parent->value && | 826 | if (child->nodes_searched && !child->stop && value > parent->value && | 
| 385 | !abort_search) { | 827 | !abort_search) { | 
| 386 | parent->pv[ply] = child->pv[ply]; | 828 | parent->pv[ply] = child->pv[ply]; | 
| 387 | parent->value = value; | 829 | parent->value = value; | 
| 388 | parent->cutmove = child->curmv[ply]; | 830 | parent->cutmove = child->curmv[ply]; | 
| 389 |   } | 831 |   } | 
| - | 832 | if (child->stop && ply == 1) | |
| - | 833 | for (which = 0; which < n_root_moves; which++) | |
| - | 834 | if (root_moves[which].move == child->curmv[ply]) { | |
| - | 835 | root_moves[which].status &= 7; | |
| - | 836 | break; | |
| - | 837 |       } | |
| 390 | parent->nodes_searched += child->nodes_searched; | 838 | parent->nodes_searched += child->nodes_searched; | 
| 391 | parent->fail_highs += child->fail_highs; | 839 | parent->fail_highs += child->fail_highs; | 
| 392 | parent->fail_high_first_move += child->fail_high_first_move; | 840 | parent->fail_high_first_move += child->fail_high_first_move; | 
| 393 | parent->evaluations += child->evaluations; | 841 | parent->evaluations += child->evaluations; | 
| 394 | parent->egtb_probes += child->egtb_probes; | 842 | parent->egtb_probes += child->egtb_probes; | 
| 395 | parent-> | 843 | parent->egtb_hits += child->egtb_hits; | 
| 396 | parent->extensions_done += child->extensions_done; | 844 | parent->extensions_done += child->extensions_done; | 
| 397 | parent->qchecks_done += child->qchecks_done; | 845 | parent->qchecks_done += child->qchecks_done; | 
| 398 | parent->reductions_done += child->reductions_done; | - | |
| 399 | parent->moves_fpruned += child->moves_fpruned; | 846 | parent->moves_fpruned += child->moves_fpruned; | 
| - | 847 | parent->moves_mpruned += child->moves_mpruned; | |
| 400 |  | 848 | for (i = 1; i < 16; i++) { | 
| - | 849 | parent->LMR_done[i] += child->LMR_done[i]; | |
| - | 850 | parent->null_done[i] += child->null_done[i]; | |
| - | 851 |   } | |
| - | 852 | which = FindBlockID(child) - 64 * child->thread_id - 1; | |
| - | 853 | Set(which, thread[child->thread_id].blocks); | |
| 401 | } | 854 | } | 
| 402 | 855 | ||
| - | 856 | /* modified 11/04/15 */ | |
| 403 | /* | 857 | /* | 
| 404 |  ******************************************************************************* | 858 |  ******************************************************************************* | 
| 405 |  *                                                                             * | 859 |  *                                                                             * | 
| 406 |  *    | 860 |  *   GetBlock() is used to allocate a split block and fill in only SMP-        * | 
| 407 |  *    | 861 |  *   critical information.  The child process will copy the rest of the split  * | 
| 408 |  *    | 862 |  *   block information as needed.                                              * | 
| 409 |  *    | 863 |  *                                                                             * | 
| - | 864 |  *   When we arrive here, the parent split block must be locked since we are   * | |
| - | 865 |  *   going to change data in that block as well as copy data from that block   * | |
| - | 866 |  *   to the current split block.  The only exception is when this is the original * | 
| - | 867 |  *   split operation, since this is done "out of sight" of other threads which * | |
| - | 868 |  *   means no locks are needed until after the "joinable" flag is set, which   * | |
| - | 869 |  *   exposes this split point to other threads instantly.                      * | |
| 410 |  *                                                                             * | 870 |  *                                                                             * | 
| 411 |  ******************************************************************************* | 871 |  ******************************************************************************* | 
| 412 |  */ | 872 |  */ | 
| - | 873 | TREE *GetBlock(TREE * /*RESTRICT*/ parent, int tid) { // Pierre-Marie Baty -- keyword not present in declaration | |
| - | 874 | TREE *child; | |
| - | 875 | static int warnings = 0; | |
| - | 876 | int i, unused; | |
| - | 877 | /* | |
| - | 878 |  ************************************************************ | |
| - | 879 |  *                                                          * | |
| - | 880 |  *  One NUMA-related trick is that we only allocate a split * | |
| - | 881 |  *  block in the thread's local memory.  Each thread has a  * | |
| - | 882 |  *  group of split blocks that were first touched by the    * | |
| - | 883 |  *  correct CPU so that the split blocks page-faulted into  * | |
| - | 884 |  *  local memory for that specific processor.  If we can't  * | |
| - | 885 |  *  find an optimally-placed block, we return a zero which  * | |
| - | 886 |  *  will prevent this thread from joining the split point.  * | |
| - | 887 |  *  This is highly unlikely as it would mean the current    * | |
| - | 888 |  *  thread has 64 active split blocks where it is waiting   * | |
| - | 889 |  *  on other threads to complete the last bit of work at    * | |
| - | 890 |  *  each.  This is extremely unlikely.                      * | |
| - | 891 |  *                                                          * | |
| - | 892 |  *  Here we use a simple 64 bit bit-map per thread that     * | |
| - | 893 |  *  indicates which blocks are free (1) and which blocks    * | |
| - | 894 |  *  used (0).  We simply use LSB() to find the rightmost    * | |
| - | 895 |  *  one-bit set and that is the local block number.  We     * | |
| - | 896 |  *  convert that to a global block number by doing the      * | |
| - | 897 |  *  simple computation:                                     * | |
| - | 898 |  *                                                          * | |
| - | 899 |  *     global = local + 64 * tid + 1                        * | |
| - | 900 |  *                                                          * | |
| - | 901 |  *  Each thread has exactly 64 split blocks, and block 0    * | |
| - | 902 |  *  is the "master block" that never gets allocated or      * | |
| - | 903 |  *  freed.  Once we find a free block for the current       * | |
| - | 904 |  *  thread, we zero that bit so that the block won't be     * | |
| - | 905 |  *  used again until it is released.                        * | |
| - | 906 |  *                                                          * | |
| - | 907 |  ************************************************************ | |
| - | 908 |  */ | |
| 413 |  | 909 | if (thread[tid].blocks) { | 
| - | 910 | unused = LSB(thread[tid].blocks); | |
| - | 911 | Clear(unused, thread[tid].blocks); | |
| - | 912 | Set(unused, thread[tid].max_blocks); | |
| - | 913 | } else { | |
| - | 914 | if (++warnings < 6) | |
| - | 915 | Print(2048, | |
| - | 916 | "WARNING. local SMP block cannot be allocated, thread %d\n", tid); | |
| - | 917 | return 0; | |
| - | 918 |   } | |
| - | 919 | child = block[unused + tid * 64 + 1]; | |
| - | 920 | /* | |
| - | 921 |  ************************************************************ | |
| - | 922 |  *                                                          * | |
| - | 923 |  *  Found a split block.  Now we need to fill in only the   * | |
| - | 924 |  *  critical information that can't be delayed due to race  * | |
| - | 925 |  *  conditions.  When we get here, the parent split block   * | |
| - | 926 |  *  must be locked, that lets us safely update the number   * | |
| - | 927 |  *  of processors working here, etc, without any ugly race  * | |
| - | 928 |  *  conditions that would corrupt this critical data.       * | |
| - | 929 |  *                                                          * | |
| - | 930 |  ************************************************************ | |
| - | 931 |  */ | |
| 414 |  | 932 | for (i = 0; i < (int) smp_max_threads; i++) // Pierre-Marie Baty -- added type cast | 
| - | 933 | child->siblings[i] = 0; | |
| - | 934 | child->nprocs = 0; | |
| - | 935 | child->stop = 0; | |
| - | 936 | child->joinable = 0; | |
| - | 937 | child->joined = 0; | |
| - | 938 | child->parent = parent; | |
| - | 939 | child->thread_id = tid; | |
| - | 940 | parent->nprocs++; | |
| - | 941 | parent->siblings[tid] = child; | |
| - | 942 | parent->joined = 1; | |
| - | 943 | return child; | |
| 415 | } | 944 | } | 
| 416 | 945 | ||
| 417 | /* modified 06/07/09 */ | - | |
| 418 | /* | 946 | /* | 
| 419 |  ******************************************************************************* | 947 |  ******************************************************************************* | 
| 420 |  *                                                                             * | 948 |  *                                                                             * | 
| 421 |  *    | 949 |  *   WaitForAllThreadsInitialized() waits until all smp_max_threads are        * | 
| 422 |  *    | 950 |  *   initialized.  We have to initialize each thread and malloc() its split    * | 
| 423 |  *    | 951 |  *   blocks before we start the actual parallel search.  Otherwise we will see * | 
| 424 |  *   (first) process was running.  All threads will hang here via a custom     * | - | |
| 425 |  *   WaitForAllThreadsInitialized() procedure so that all the local thread     * | - | |
| 426 |  *    | 952 |  *   invalid memory accesses and crash instantly.                              * | 
| 427 |  *                                                                             * | 953 |  *                                                                             * | 
| 428 |  ******************************************************************************* | 954 |  ******************************************************************************* | 
| 429 |  */ | 955 |  */ | 
| 430 | void | 956 | void WaitForAllThreadsInitialized(void) { | 
| 431 | int i, j; | - | |
| 432 | #if defined(AFFINITY) | - | |
| 433 | int64_t k; | - | |
| 434 |   cpu_set_t cpuset; | - | |
| 435 | pthread_t current_thread = pthread_self(); | - | |
| 436 | - | ||
| 437 | k = (int64_t) tid; | - | |
| 438 | CPU_ZERO(&cpuset); | - | |
| 439 | CPU_SET(k, &cpuset); | - | |
| 440 | pthread_setaffinity_np(current_thread, sizeof(cpu_set_t), &cpuset); | - | |
| 441 | #endif | - | |
| 442 | #if !defined(UNIX) | - | |
| 443 | ThreadMalloc((uint64_t) tid); | - | |
| 444 | #endif | - | |
| 445 | for (i = 0; i < MAX_BLOCKS_PER_CPU; i++) { | - | |
| 446 |  | 957 | while (initialized_threads < (int) smp_max_threads); /* Do nothing */ // Pierre-Marie Baty -- added type cast | 
| 447 | sizeof(TREE)); | - | |
| 448 | block[(uint64_t) tid * MAX_BLOCKS_PER_CPU + i + 1]->used = 0; | - | |
| 449 | block[(uint64_t) tid * MAX_BLOCKS_PER_CPU + i + 1]->parent = NULL; | - | |
| 450 | LockInit(block[(uint64_t) tid * MAX_BLOCKS_PER_CPU + i + 1]->lock); | - | |
| 451 | for (j = 0; j < 64; j++) | - | |
| 452 | block[(uint64_t) tid * MAX_BLOCKS_PER_CPU + i + 1]->cache_n[j] = ~0ull; | - | |
| 453 |   } | - | |
| 454 | Lock(lock_smp); | - | |
| 455 |   initialized_threads++; | - | |
| 456 | Unlock(lock_smp); | - | |
| 457 | WaitForAllThreadsInitialized(); | - | |
| 458 | ThreadWait((uint64_t) tid, (TREE *) 0); | - | |
| 459 | Lock(lock_smp); | - | |
| 460 |   smp_threads--; | - | |
| 461 | Unlock(lock_smp); | - | |
| 462 | return 0; | - | |
| 463 | } | 958 | } | 
| 464 | 959 | ||
| 465 |  | 960 | #  if !defined (UNIX) | 
| 466 | /* modified 01/17/09 */ | 961 | /* modified 01/17/09 */ | 
| 467 | /* | 962 | /* | 
| 468 |  ******************************************************************************* | 963 |  ******************************************************************************* | 
| 469 |  *                                                                             * | 964 |  *                                                                             * | 
| 470 |  *   ThreadMalloc() is called from the ThreadInit() function.  It malloc's the * | 965 |  *   ThreadMalloc() is called from the ThreadInit() function.  It malloc's the * | 
| Line 473... | Line 968... | ||
| 473 |  *                                                                             * | 968 |  *                                                                             * | 
| 474 |  ******************************************************************************* | 969 |  ******************************************************************************* | 
| 475 |  */ | 970 |  */ | 
| 476 | extern void *WinMalloc(size_t, int); | 971 | extern void *WinMalloc(size_t, int); | 
| 477 | void ThreadMalloc(int64_t tid) { | 972 | void ThreadMalloc(int64_t tid) { | 
| 478 | int i | 973 | int i; | 
| 479 | 974 | ||
| 480 | for (i = | 975 | for (i = (int) tid * 64 + 1; i < (int) tid * 64 + 65; i++) { // Pierre-Marie Baty -- added type casts | 
| 481 | if (block[i] == NULL) | 976 | if (block[i] == NULL) | 
| 482 | block[i] = | 977 | block[i] = | 
| 483 | (TREE *) ((~(size_t) 127) & (127 + (size_t) WinMalloc(sizeof(TREE) + | 978 | (TREE *) ((~(size_t) 127) & (127 + (size_t) WinMalloc(sizeof(TREE) + | 
| 484 | 127, (int) tid))); // Pierre-Marie Baty -- added type cast | 979 | 127, (int) tid))); // Pierre-Marie Baty -- added type cast | 
| 485 | block[i]->used = 0; | - | |
| 486 | block[i]->parent = NULL; | 980 | block[i]->parent = NULL; | 
| 487 | LockInit(block[i]->lock); | 981 | LockInit(block[i]->lock); | 
| 488 |   } | 982 |   } | 
| 489 | } | 983 | } | 
| - | 984 | #  endif | |
| 490 | #endif | 985 | #endif | 
| 491 | - | ||
| 492 | /* modified 01/17/09 */ | - | |
| 493 | /* | - | |
| 494 |  ******************************************************************************* | - | |
| 495 |  *                                                                             * | - | |
| 496 |  *   ThreadStop() is called from SearchParallel() when it detects a beta       * | - | |
| 497 |  *   cutoff (fail high) at a node that is being searched in parallel.  We need * | - | |
| 498 |  *   to stop all threads here, and since this threading algorithm is recursive * | - | |
| 499 |  *   it may be necessary to stop other threads that are helping search this    * | - | |
| 500 |  *   branch further down into the tree.  This function simply sets tree->stop  * | - | |
| 501 |  *   to 1, which will stop that particular thread instantly and return it to   * | - | |
| 502 |  *   the idle loop in ThreadWait().                                            * | - | |
| 503 |  *                                                                             * | - | |
| 504 |  ******************************************************************************* | - | |
| 505 |  */ | - | |
| 506 | void ThreadStop(TREE * RESTRICT tree) { | - | |
| 507 | int proc; | - | |
| 508 | - | ||
| 509 | Lock(tree->lock); | - | |
| 510 | tree->stop = 1; | - | |
| 511 | for (proc = 0; proc < smp_max_threads; proc++) | - | |
| 512 | if (tree->siblings[proc]) | - | |
| 513 | ThreadStop(tree->siblings[proc]); | - | |
| 514 | Unlock(tree->lock); | - | |
| 515 | } | - | |
| 516 | - | ||
| 517 | /* modified 04/23/14 */ | - | |
| 518 | /* | - | |
| 519 |  ******************************************************************************* | - | |
| 520 |  *                                                                             * | - | |
| 521 |  *   ThreadWait() is the idle loop for the N threads that are created at the   * | - | |
| 522 |  *   beginning when Crafty searches.  Threads are "parked" here waiting on a   * | - | |
| 523 |  *   pointer to something they should search (a parameter block built in the   * | - | |
| 524 |  *   function Thread() in this case.  When this pointer becomes non-zero, each * | - | |
| 525 |  *   thread "parked" here will immediately call SearchParallel() and begin the * | - | |
| 526 |  *   parallel search as directed.                                              * | - | |
| 527 |  *                                                                             * | - | |
| 528 |  ******************************************************************************* | - | |
| 529 |  */ | - | |
| 530 | int ThreadWait(int64_t tid, TREE * RESTRICT waiting) { | - | |
| 531 | int value, tstart, tend; | - | |
| 532 | - | ||
| 533 | /* | - | |
| 534 |  ************************************************************ | - | |
| 535 |  *                                                          * | - | |
| 536 |  *  The N initial threads enter here and are kept penned    * | - | |
| 537 |  *  here forever.  However, once a thread leaves here it    * | - | |
| 538 |  *  may well re-enter ThreadWait() from the top while it    * | - | |
| 539 |  *  waits for a parallel search to complete.  While it      * | - | |
| 540 |  *  waits here it can also join in to help other busy       * | - | |
| 541 |  *  threads search their subtrees as well.                  * | - | |
| 542 |  *                                                          * | - | |
| 543 |  ************************************************************ | - | |
| 544 |  */ | - | |
| 545 | while (1) { | - | |
| 546 | tstart = ReadClock(); | - | |
| 547 | Lock(lock_split); | - | |
| 548 | Lock(lock_smp); | - | |
| 549 |     smp_idle++; | - | |
| 550 | if (!waiting && smp_split == 0) | - | |
| 551 | smp_split = 1; | - | |
| 552 | if (!thread[tid].tree) | - | |
| 553 | thread[tid].idle = 1; | - | |
| 554 | Unlock(lock_smp); | - | |
| 555 | Unlock(lock_split); | - | |
| 556 | /* | - | |
| 557 |  ************************************************************ | - | |
| 558 |  *                                                          * | - | |
| 559 |  *  We can exit if our thread[i] is non-zero, or if we are  * | - | |
| 560 |  *  waiting on others to finish a block that *we* have to   * | - | |
| 561 |  *  return through.  When the busy count on such a block    * | - | |
| 562 |  *  hits zero, we return immediately which unwinds the      * | - | |
| 563 |  *  search as it should be.                                 * | - | |
| 564 |  *                                                          * | - | |
| 565 |  ************************************************************ | - | |
| 566 |  */ | - | |
| 567 | while (!thread[tid].tree && (!waiting || waiting->nprocs)) | - | |
| 568 | Pause(); | - | |
| 569 | thread[tid].idle = 0; | - | |
| 570 | tend = ReadClock(); | - | |
| 571 | Lock(lock_smp); | - | |
| 572 | idle_time += tend - tstart; | - | |
| 573 | if (!thread[tid].tree) | - | |
| 574 | thread[tid].tree = waiting; | - | |
| 575 | /* | - | |
| 576 |  ************************************************************ | - | |
| 577 |  *                                                          * | - | |
| 578 |  *  We either have work to do, or threads we were waiting   * | - | |
| 579 |  *  on have finished their work.                            * | - | |
| 580 |  *                                                          * | - | |
| 581 |  ************************************************************ | - | |
| 582 |  */ | - | |
| 583 |     smp_idle--; | - | |
| 584 | Unlock(lock_smp); | - | |
| 585 | /* | - | |
| 586 |  ************************************************************ | - | |
| 587 |  *                                                          * | - | |
| 588 |  *  If we are waiting on a block and the busy count is now  * | - | |
| 589 |  *  zero, we simply return to finish up the bookkeeping at  * | - | |
| 590 |  *  that point.                                             * | - | |
| 591 |  *                                                          * | - | |
| 592 |  ************************************************************ | - | |
| 593 |  */ | - | |
| 594 | if (thread[tid].tree == waiting || thread[tid].tree == (TREE *) - 1) | - | |
| 595 | return 0; | - | |
| 596 | /* | - | |
| 597 |  ************************************************************ | - | |
| 598 |  *                                                          * | - | |
| 599 |  *  Else our thread[i] pointer is non-zero, meaning we have * | - | |
| 600 |  *  been assigned something to search.  Hi-ho, hi-ho, it's  * | - | |
| 601 |  *  off to work we go...                                    * | - | |
| 602 |  *                                                          * | - | |
| 603 |  ************************************************************ | - | |
| 604 |  */ | - | |
| 605 |     value = | - | |
| 606 | SearchParallel(thread[tid].tree, thread[tid].tree->alpha, | - | |
| 607 | thread[tid].tree->beta, thread[tid].tree->value, | - | |
| 608 | thread[tid].tree->side, thread[tid].tree->depth, | - | |
| 609 | thread[tid].tree->ply); | - | |
| 610 | Lock(thread[tid].tree->parent->lock); | - | |
| 611 | CopyToParent((TREE *) thread[tid].tree->parent, thread[tid].tree, value); | - | |
| 612 | thread[tid].tree->parent->nprocs--; | - | |
| 613 | thread[tid].tree->parent->siblings[tid] = 0; | - | |
| 614 | Unlock(thread[tid].tree->parent->lock); | - | |
| 615 | thread[tid].tree = 0; | - | |
| 616 |   } | - | |
| 617 | } | - | |