Attachment 'gpusync_rtss13_litmus.patch'


   1 From 04f79cb08aad1373e16b7d5442f14d7276e27ec9 Mon Sep 17 00:00:00 2001
   2 From: Glenn Elliott <gelliott@cs.unc.edu>
   3 Date: Sun, 19 May 2013 23:21:29 -0400
   4 Subject: [PATCH] squash commit of GPUSync on Linux 3.0
   5 
   6 Squashed commit of the following:
   7 
   8 commit 44326648c2ea81b9a32619644fe9c665ed0d9e0b
   9 Author: Glenn Elliott <gelliott@cs.unc.edu>
  10 Date:   Mon May 14 16:51:05 2012 -0400
  11 
  12     Final GPUSync implementation.
  13 
  14 commit af6eeb156c7da47ff5df03a3da04432c8ac4460c
  15 Author: Glenn Elliott <gelliott@cs.unc.edu>
  16 Date:   Fri Apr 27 19:52:34 2012 -0400
  17 
  18     fix minor bugs. there is still a bug in GEDF PAI.
  19 
  20 commit 52056e94a94517e250f7f4e36e7470a4b002404e
  21 Author: Glenn Elliott <gelliott@cs.unc.edu>
  22 Date:   Thu Apr 26 15:20:22 2012 -0400
  23 
  24     No-op useless unlock calls.
  25 
  26 commit 040301747953ae9a2017def70e6004affe1e9aeb
  27 Author: Glenn Elliott <gelliott@cs.unc.edu>
  28 Date:   Wed Apr 25 19:30:48 2012 -0400
  29 
  30     Fix/test C-EDF for GPU RTSS12.
  31 
  32 commit 58f04ff13ac7128609ee468eb317c71817474a84
  33 Author: Glenn Elliott <gelliott@cs.unc.edu>
  34 Date:   Wed Apr 25 17:31:08 2012 -0400
  35 
  36     Port rtss12 features to C-EDF (untested)
  37 
  38 commit 4aabc4a7f68aae11c79b39fa65a9c54d3f3451e7
  39 Author: Glenn Elliott <gelliott@cs.unc.edu>
  40 Date:   Mon Apr 23 19:22:58 2012 -0400
  41 
  42     Match replica before spinlocks in ikglp_unlock()
  43 
  44 commit 3025aea8d0ed6ee4ab68281e5cbcc76ec4dab1e2
  45 Author: Glenn Elliott <gelliott@cs.unc.edu>
  46 Date:   Mon Apr 23 19:18:19 2012 -0400
  47 
  48     Fix line-endings. :P
  49 
  50 commit 6f436b63bc551bbd9f9ddcf4d8a960d7d847948e
  51 Author: Glenn Elliott <gelliott@cs.unc.edu>
  52 Date:   Mon Apr 23 19:11:32 2012 -0400
  53 
  54     Tested/Fixed IKGLP heurs. OS WORK FINALLY DONE!
  55 
  56 commit fa43d7a6bb9b0e748f23529424ac5eebd849d9d7
  57 Author: Glenn Elliott <gelliott@cs.unc.edu>
  58 Date:   Mon Apr 23 12:12:58 2012 -0400
  59 
  60     Donees cannot be amongst the top-m requests.
  61 
  62 commit 372db158e2a5c7e2b455262c0959eb13da4433b9
  63 Author: Glenn Elliott <gelliott@cs.unc.edu>
  64 Date:   Fri Apr 20 22:48:32 2012 -0400
  65 
  66     Donor dequeue heuristic for IKGLP.  Untested.
  67 
  68 commit 273e902c50ef94966815a92c2af5ab8c5b2d77ce
  69 Author: Glenn Elliott <gelliott@cs.unc.edu>
  70 Date:   Fri Apr 20 21:20:21 2012 -0400
  71 
  72     Untested donee selection heuristic for IKGLP.
  73 
  74 commit c6d04216a123f8e0b50eb78bbb1eaf646a1ca4e0
  75 Author: Glenn Elliott <gelliott@cs.unc.edu>
  76 Date:   Wed Apr 18 23:18:32 2012 -0400
  77 
  78     Added hooks for IKGLP affinity and a little logic.
  79 
   80     Simple IKGLP is already done.  It does:
   81     1) auto GPU de/registration.
   82     2) distribution amongst simultaneous users across queues
  83     3) calls default IKGLP routines when appropriate.
  84 
  85     Remaining work:
  86     1) FQ advisement.
  87     2) Donor stealing advisement.
  88     3) Donee selection advisement.
  89 
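The commit above describes a hook-based design: affinity "advisement" callbacks plus a fallback to the stock IKGLP routines. Below is a minimal standalone illustration of that shape; every name in it is hypothetical and not the API added by this patch.

    /* Minimal standalone sketch (hypothetical names, not the LITMUS^RT API):
     * an affinity observer supplies "advisement" callbacks; when none is
     * attached, the lock falls back to its default routines. */
    #include <stddef.h>
    #include <stdio.h>

    struct ikglp_sem;                        /* opaque stand-in for lock state */

    struct ikglp_affinity_ops {
        int (*advise_enqueue)(struct ikglp_sem *sem);  /* pick an FQ          */
        int (*advise_steal)(struct ikglp_sem *sem);    /* donor stealing      */
        int (*advise_donee)(struct ikglp_sem *sem);    /* donee selection     */
    };

    static int default_pick_queue(struct ikglp_sem *sem)
    {
        (void)sem;
        return 0;   /* default IKGLP behavior, no GPU affinity considered */
    }

    static int pick_queue(struct ikglp_sem *sem, struct ikglp_affinity_ops *aff)
    {
        if (aff && aff->advise_enqueue)
            return aff->advise_enqueue(sem);   /* affinity-aware decision */
        return default_pick_queue(sem);        /* call default IKGLP routine */
    }

    int main(void)
    {
        printf("queue = %d\n", pick_queue(NULL, NULL));
        return 0;
    }
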
  90 commit 149ef3b424a49e6b928c5e23fea83380ed95ea38
  91 Author: Glenn Elliott <gelliott@cs.unc.edu>
  92 Date:   Wed Apr 18 21:33:21 2012 -0400
  93 
  94     Zap line-endings
  95 
  96 commit f916cdb8e6a9ee2c917fddb7351e6bb39f6c953e
  97 Author: Glenn Elliott <gelliott@cs.unc.edu>
  98 Date:   Wed Apr 18 21:30:36 2012 -0400
  99 
 100     Added support for simult-users in kfmlp
 101 
 102 commit 6ab36ca992441f7353840c70fc91d99a500a940e
 103 Author: Glenn Elliott <gelliott@cs.unc.edu>
 104 Date:   Wed Apr 18 16:24:56 2012 -0400
 105 
 106     Fixed and tested aff-aware KFMLP. (finally!)
 107 
 108 commit 440aa2083245b81583980e3f4177f3b4cc805556
 109 Author: Glenn Elliott <gelliott@cs.unc.edu>
 110 Date:   Mon Apr 16 20:18:07 2012 -0400
 111 
 112     make gpu registration a little more robust
 113 
 114 commit 8675824ed85d6e83a24e77dabaf3a5c02c91ef6f
 115 Author: Glenn Elliott <gelliott@cs.unc.edu>
 116 Date:   Mon Apr 16 20:09:15 2012 -0400
 117 
 118     Implement GPU-affinity-aware kfmlp (untested)
 119 
 120 commit 0b865246946a97dc03a81ccf55bf84acce923c4b
 121 Author: Glenn Elliott <gelliott@cs.unc.edu>
 122 Date:   Sun Apr 15 19:29:09 2012 -0400
 123 
 124     Infrastructure for affinity-aware k-exclusion
 125 
 126 commit bb4922c968aa1a30fddd6ad9d0f750706c7b3b29
 127 Author: Glenn Elliott <gelliott@cs.unc.edu>
 128 Date:   Sun Apr 15 18:09:59 2012 -0400
 129 
 130     PAI::change_prio(): check work before locking
 131 
 132 commit f4aef3b7d845324eb79a226d87f232dcd8867f3b
 133 Author: Glenn Elliott <gelliott@cs.unc.edu>
 134 Date:   Sun Apr 15 18:06:04 2012 -0400
 135 
 136     Update PAI to support multiGPUs (todo: klitirqd)
 137 
 138 commit 786d383a58108ad3437a38d0e2583859cb94a4ee
 139 Author: Glenn Elliott <gelliott@cs.unc.edu>
 140 Date:   Sun Apr 15 15:05:02 2012 -0400
 141 
 142     remove fifo/rm header files left over
 143 
 144 commit 3f53a88be223f484db011f0f42e843aa57be8fca
 145 Author: Glenn Elliott <gelliott@cs.unc.edu>
 146 Date:   Sun Apr 15 15:04:15 2012 -0400
 147 
 148     add kfmlp as separate file
 149 
 150 commit b3ae67412531cbc583d5697d2366fc58d6dd07e7
 151 Merge: c0667dc adeff95
 152 Author: Glenn Elliott <gelliott@cs.unc.edu>
 153 Date:   Sun Apr 15 15:03:33 2012 -0400
 154 
 155     Merge branch 'wip-gpu-interrupts' into wip-gpu-rtss12
 156 
 157     Conflicts:
 158     	include/litmus/fdso.h
 159     	include/litmus/rt_param.h
 160     	include/litmus/sched_plugin.h
 161     	include/litmus/unistd_32.h
 162     	include/litmus/unistd_64.h
 163     	litmus/Makefile
 164     	litmus/edf_common.c
 165     	litmus/litmus.c
 166     	litmus/locking.c
 167     	litmus/sched_gsn_edf.c
 168     	litmus/sched_plugin.c
 169 
 170 commit c0667dc4894e913048cf8904f0ce9a79b481b556
 171 Author: Glenn Elliott <gelliott@cs.unc.edu>
 172 Date:   Fri Apr 13 16:18:03 2012 -0400
 173 
 174     Move RSM and IKGLP imp. to own .c files
 175 
  176     Also reformatted the code to be slightly more
  177     compliant with standard coding practice.
 178 
 179 commit 8eb55f8fa1a2c3854f0f77b9b8663178c0129f6c
 180 Author: Glenn Elliott <gelliott@cs.unc.edu>
 181 Date:   Wed Apr 11 15:57:59 2012 -0400
 182 
 183     Added support for Dynamic Group Locks (DGLs)
 184 
 185     Added support for Dynamic Group Locks.  Locks
 186     are FIFO ordered (no timestamps), so a big DGL
 187     lock is needed to enqueue for resources atomically.
 188 
  189     Unfortunately, this requires nested inheritance to use
 190     coarse-grain locking.  Coarse-grain locking is used
 191     when DGLs are enabled.  Fine-grain locking is used
 192     when DGLs are disabled.
 193 
  194     TODO: Clean up IKGLP implementation.  There is
 195     a lot of needless debug/TRACE work.
 196 
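A minimal standalone sketch of the "big DGL lock" idea described above: one coarse-grain lock serializes the enqueue step so a request joins every resource's FIFO atomically. A pthread mutex and plain arrays stand in for the kernel primitives; blocking and wake-up are omitted.

    #include <pthread.h>
    #include <stdio.h>

    #define MAX_WAITERS 8

    struct resource {
        int fifo[MAX_WAITERS];   /* FIFO of waiting task ids */
        int len;
    };

    static pthread_mutex_t dgl_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Enqueue task `tid` on every resource in `req[]` without interleaving
     * with other DGL requests; the actual suspend/wake-up logic is omitted. */
    static void dgl_enqueue(int tid, struct resource **req, int nr)
    {
        int i;
        pthread_mutex_lock(&dgl_lock);     /* coarse-grain: covers all queues */
        for (i = 0; i < nr; i++)
            if (req[i]->len < MAX_WAITERS)
                req[i]->fifo[req[i]->len++] = tid;
        pthread_mutex_unlock(&dgl_lock);
        /* ...the task then suspends until it heads every requested queue. */
    }

    int main(void)
    {
        struct resource a = { .len = 0 }, b = { .len = 0 };
        struct resource *req[2] = { &a, &b };
        dgl_enqueue(42, req, 2);
        printf("a.len=%d b.len=%d\n", a.len, b.len);
        return 0;
    }
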
 197 commit 0c80d0acbbc2103a744f2b2b76cb66ddeb28ebbf
 198 Author: Glenn Elliott <gelliott@cs.unc.edu>
 199 Date:   Mon Apr 9 00:18:24 2012 -0400
 200 
 201     Fix IKGLP bugs discovered in test.
 202 
 203     Apply fixes to the IKGLP.  Also, break binheap.h
 204     into binheap.h/.c
 205 
 206 commit f5c9f29c1d17131870ec113cc357b40d2f087bc2
 207 Author: Glenn Elliott <gelliott@cs.unc.edu>
 208 Date:   Wed Apr 4 23:11:47 2012 -0400
 209 
 210     Cleanup use of binheap_entry().
 211 
  212     The entry is already cast by the macro.
 213 
 214 commit 0ccecdaf12334b2241ee5185b04eda4f91f95fe2
 215 Author: Glenn Elliott <gelliott@cs.unc.edu>
 216 Date:   Wed Apr 4 23:05:47 2012 -0400
 217 
 218     Untested implementation of IKGLP.
 219 
 220     I don't like coding so much w/o testing, but it's
 221     sort of hard to do without both lock() and unlock().
 222 
 223 commit d2f4875d7a183cc3c95c27c193af2c0cd1d1c555
 224 Author: Glenn Elliott <gelliott@cs.unc.edu>
 225 Date:   Sat Mar 31 19:56:20 2012 -0400
 226 
 227     Infrastructure of IKGLP. lock/unlock are stubs
 228 
 229 commit 62f2907f445b08f958acf1cc1a0c29736d4ba206
 230 Author: Glenn Elliott <gelliott@cs.unc.edu>
 231 Date:   Fri Mar 30 16:43:52 2012 -0400
 232 
 233     Nested inheritance with fine-grained locking.
 234 
  235     A minor hack to lockdep was required to allow
 236     the inheritance propagation locking logic to
 237     work.
 238 
 239 commit d0961e328a2a4c026c884c768b798cb882922708
 240 Merge: fb0c271 4be8ef6
 241 Author: Glenn Elliott <gelliott@cs.unc.edu>
 242 Date:   Fri Mar 23 11:16:13 2012 -0400
 243 
 244     Merge branch 'wip-binary-heap' into wip-nested-locks
 245 
 246 commit fb0c271c1e8a4d4eac440d3e47d35f19235e07ac
 247 Author: Glenn Elliott <gelliott@cs.unc.edu>
 248 Date:   Fri Mar 23 11:16:09 2012 -0400
 249 
 250     blah
 251 
 252 commit 8973214f010cf55fbf18cb88471d6c99ed6ff575
 253 Author: Glenn Elliott <gelliott@cs.unc.edu>
 254 Date:   Thu Mar 22 14:45:39 2012 -0400
 255 
 256     Introduction of basic nesting foundations.
 257 
 258 commit 4be8ef609123d4b4d281976f6bf5e65024e66b0b
 259 Author: Glenn Elliott <gelliott@cs.unc.edu>
 260 Date:   Wed Mar 21 19:25:02 2012 -0400
 261 
 262     Make C-EDF work with simplified binheap_delete
 263 
 264 commit bf57086c9aa497c016efc208a0ceb66f262ab18b
 265 Author: Glenn Elliott <gelliott@cs.unc.edu>
 266 Date:   Wed Mar 21 19:24:13 2012 -0400
 267 
  268     Make GSN-EDF work with simplified binheap_delete()
 269 
 270 commit 33f5fe82661086d27467821aaf418364774e360a
 271 Author: Glenn Elliott <gelliott@cs.unc.edu>
 272 Date:   Wed Mar 21 19:21:46 2012 -0400
 273 
 274     Simplify binheap_delete and add binheap_decrease
 275 
 276 commit ee525fe7ba4edf4da2d293629ffdff2caa9ad02b
 277 Author: Glenn Elliott <gelliott@cs.unc.edu>
 278 Date:   Wed Mar 21 16:39:26 2012 -0400
 279 
 280     C-EDF: Use binary heap instead of binomial heap.
 281 
 282     Use binary heap for ordering priority of CPUs.
 283 
 284 commit bdce67bc2babc2e5b3b2440964e9cf819ac814dc
 285 Author: Glenn Elliott <gelliott@cs.unc.edu>
 286 Date:   Wed Mar 21 16:26:27 2012 -0400
 287 
 288     GSN-EDF: Use binary heap instead of binomial heap.
 289 
 290     Use binary heap to track CPU priorities.
 291 
 292 commit 5b73afc4eb1b0303cb92eb29a2ecc59c1db69537
 293 Author: Glenn Elliott <gelliott@cs.unc.edu>
 294 Date:   Wed Mar 21 14:59:52 2012 -0400
 295 
 296     Binary heap implementation
 297 
 298     Motivation: Linux's prio_heap.h is of fixed size. Litmus's binomial
 299     heap may be overkill (and perhaps not general enough) for some applications.
 300 
 301     Implemented in the style of linked lists.
 302 
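A standalone sketch of what "implemented in the style of linked lists" can look like: the heap node is embedded in the payload struct and recovered with a container_of-style macro. This is illustrative only; the binheap.h/.c actually added by this patch may differ.

    #include <stddef.h>
    #include <stdio.h>

    struct binheap_node {
        struct binheap_node *parent;
        struct binheap_node *left;
        struct binheap_node *right;
    };

    /* ordering predicate passed to the heap operations */
    typedef int (*binheap_order_t)(const struct binheap_node *a,
                                   const struct binheap_node *b);

    /* recover the payload from an embedded node (container_of style) */
    #define binheap_entry(node, type, member) \
        ((type *)((char *)(node) - offsetof(type, member)))

    struct cpu_entry {
        int cpu;
        long long deadline;
        struct binheap_node hn;   /* embedded heap node */
    };

    static int cpu_lower_prio(const struct binheap_node *a,
                              const struct binheap_node *b)
    {
        const struct cpu_entry *ea = binheap_entry(a, struct cpu_entry, hn);
        const struct cpu_entry *eb = binheap_entry(b, struct cpu_entry, hn);
        return ea->deadline > eb->deadline;  /* later deadline, lower priority */
    }

    int main(void)
    {
        struct cpu_entry e0 = { .cpu = 0, .deadline = 100 };
        struct cpu_entry e1 = { .cpu = 1, .deadline = 200 };
        printf("%d\n", cpu_lower_prio(&e1.hn, &e0.hn));  /* prints 1 */
        return 0;
    }
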
 303 commit adeff95dcdcf88789e983f20b0657f29286de8d7
 304 Author: Glenn Elliott <gelliott@cs.unc.edu>
 305 Date:   Mon Mar 5 14:59:07 2012 -0500
 306 
 307     Remove option for threading of all softirqs.
 308 
 309 commit 3e41d4826b0aa175c3f194548fa6ab20cd1cc32d
 310 Author: Glenn Elliott <gelliott@cs.unc.edu>
 311 Date:   Sun Mar 4 21:20:45 2012 -0500
 312 
 313     Clean up PAI.
 314 
 315 commit 12d312072e3f4caa6e4e500d5a23c85402494cd1
 316 Merge: 6a00f20 3d1c6d4
 317 Author: Glenn Elliott <gelliott@cs.unc.edu>
 318 Date:   Sun Mar 4 20:52:29 2012 -0500
 319 
 320     Merge branch 'wip-pai' into wip-gpu-interrupts
 321 
 322     Conflicts:
 323     	include/litmus/affinity.h
 324     	kernel/sched.c
 325     	kernel/softirq.c
 326     	litmus/Kconfig
 327     	litmus/affinity.c
 328     	litmus/litmus.c
 329     	litmus/preempt.c
 330     	litmus/sched_cedf.c
 331     	litmus/sched_gsn_edf.c
 332 
 333 commit 3d1c6d44d3f133909d1c594351c2b7c779b1d7d4
 334 Author: Glenn Elliott <gelliott@cs.unc.edu>
 335 Date:   Sun Mar 4 16:09:04 2012 -0500
 336 
 337     Some cleanup of PAI
 338 
 339 commit 6a00f206debf8a5c8899055726ad127dbeeed098
 340 Author: Jonathan Herman <hermanjl@cs.unc.edu>
 341 Date:   Thu Feb 16 19:13:16 2012 -0500
 342 
 343     Typo in macro
 344 
 345 commit 83b11ea1c6ad113519c488853cf06e626c95a64d
 346 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
 347 Date:   Tue Jan 24 09:36:12 2012 +0100
 348 
 349     Feather-Trace: keep track of interrupt-related interference.
 350 
 351     Increment a processor-local counter whenever an interrupt is handled.
 352     This allows Feather-Trace to include a (truncated) counter and a flag
 353     to report interference from interrupts. This could be used to filter
 354     samples that were disturbed by interrupts.
 355 
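A standalone illustration of the idea above, with a plain array standing in for the kernel's per-CPU variable: bump a processor-local counter on every interrupt and flag any sample taken while the counter changed.

    #include <stdio.h>

    #define NR_CPUS 4

    /* One counter per processor; the kernel uses a per-CPU variable. */
    static unsigned int irq_fired_count[NR_CPUS];

    /* Called from the interrupt entry path of CPU `cpu`. */
    static void count_interrupt(int cpu)
    {
        irq_fired_count[cpu]++;
    }

    /* A measurement is tainted if any interrupt fired on `cpu` since `before`;
     * even a truncated counter plus this flag is enough to filter samples. */
    static int disturbed_by_irq(int cpu, unsigned int before)
    {
        return irq_fired_count[cpu] != before;
    }

    int main(void)
    {
        unsigned int before = irq_fired_count[0];
        count_interrupt(0);
        printf("disturbed: %d\n", disturbed_by_irq(0, before));  /* prints 1 */
        return 0;
    }
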
 356 commit f5264e2cb8213dad425cb2d2db564edbc443a51a
 357 Author: Glenn Elliott <gelliott@cs.unc.edu>
 358 Date:   Fri Jan 20 11:09:15 2012 -0500
 359 
 360     Fix bugs in tracing and PAI handling
 361 
 362 commit 1a582a2c5e361e01a4c64f185bb1a23c3f70701a
 363 Author: Glenn Elliott <gelliott@cs.unc.edu>
 364 Date:   Sat Jan 14 16:56:47 2012 -0500
 365 
 366     Port PAI interrupts to GSN-EDF, C-RM/RM-SRT/FIFO.
 367 
 368 commit 53a6dbb9f5337e77fce9c2672488c1c5e0621beb
 369 Author: Glenn Elliott <gelliott@cs.unc.edu>
 370 Date:   Sat Jan 14 14:20:07 2012 -0500
 371 
 372     Completed PAI for C-EDF.
 373 
 374 commit 5d7dcfa10ea0dd283773a301e3ce610a7797d582
 375 Author: Glenn Elliott <gelliott@cs.unc.edu>
 376 Date:   Wed Jan 11 14:37:13 2012 -0500
 377 
 378     PAI implementation, C-RM, C-FIFO.
 379 
 380 commit 5bd89a34d89f252619d83fef3c9325e24311389e
 381 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
 382 Date:   Thu Jul 28 01:15:58 2011 -0400
 383 
 384     Litmus core: simplify np-section protocol
 385 
  386     Use a 32-bit word for all non-preemptive section flags.
 387     Set the "please yield soon" flag atomically when
 388     accessing it on remotely-scheduled tasks.
 389 
 390 commit 81b8eb2ae452c241df9b3a1fb2116fa4d5adcb75
 391 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
 392 Date:   Tue Jul 26 22:03:18 2011 -0400
 393 
 394     C-EDF: rename lock -> cluster_lock
 395 
 396     The macro lock conflicts with locking protocols...
 397 
 398 commit 71083a7604e93e44536edde032706348f3a752ca
 399 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
 400 Date:   Mon Jul 25 15:31:55 2011 -0400
 401 
 402     locking: use correct timestamp
 403 
 404 commit e079932a0a1aab6adbc42fedefc6caa2d9a8af2b
 405 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
 406 Date:   Sat Jul 23 23:40:10 2011 -0400
 407 
 408     Feather-trace: let userspace add overhead events
 409 
 410     This is useful for measuring locking-related overheads
 411     that are partially recorded in userspace.
 412 
 413 commit 12982f31a233250c7a62b17fb4bd13594cb78777
 414 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
 415 Date:   Sat Jul 23 23:38:57 2011 -0400
 416 
  417     ftdev: let buffer-specific code handle writes from userspace
 418 
 419     This allows us to splice in information into logs from events
 420     that were recorded in userspace.
 421 
 422 commit 49e5b0c0d7c09bef5b9bfecaaac3f0ea2cf24e43
 423 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
 424 Date:   Sat Jul 23 23:06:20 2011 -0400
 425 
 426     ftdev: remove event activation hack
 427 
  428     Instead of doing the hackish 'write commands to device' thing,
  429     let's just use a real ioctl() interface.
 430 
 431 commit 1dead199b4ae68ab98eacec4a661fd5ecb5a2704
 432 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
 433 Date:   Sat Feb 5 23:15:09 2011 -0500
 434 
 435     Feather-Trace: keep track of release latency
 436 
 437 commit 4490f9ecf94e28458069a02e8cfcf4f385390499
 438 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
 439 Date:   Sat Feb 5 22:57:57 2011 -0500
 440 
 441     Feather-Trace: trace locking-related suspensions
 442 
 443 commit b739b4033c0f55f9194be2793db9e6ace06047db
 444 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
 445 Date:   Sat Feb 5 20:11:30 2011 -0500
 446 
 447     Feather-Trace: start with the largest permissible range
 448 
 449     MAX_ORDER is 11, but this is about number of records, not number of pages.
 450 
 451 commit fd6d753fc4e01f91427176ebfcced2c3d3f36c32
 452 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
 453 Date:   Tue Feb 8 17:33:44 2011 -0500
 454 
 455     bugfix: add processors in order of increasing indices to clusters
 456 
 457     Pfair expects to look at processors in order of increasing index.
 458     Without this patch, Pfair could deadlock in certain situations.
 459 
 460 commit 2fec12d43b366b7257c602af784b172466d8d4c5
 461 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
 462 Date:   Thu Nov 24 13:59:33 2011 -0500
 463 
 464     Pfair: improve robustness of suspensions
 465 
 466     This patch fixes two crash or hang bugs related to suspensions
 467     in Pfair.
 468 
 469     1) When a job was not present at the end of its last subtask, then
 470        its linked_on field was not cleared. This confused the scheduler
 471        when it later resumed. Fix: clear the field.
 472 
 473     2) Just testing for linked_on == NO_CPU is insufficient in the wake_up path
 474        to determine whether a task should be added to the ready queue. If
 475        the task remained linked and then was "preempted" at a later
 476        quantum boundary, then it already is in the ready queue and nothing
 477        is required. Fix: encode need to requeue in task_rt(t)->flags.
 478 
 479 commit d1d6e4c300d858c47b834be145f30973bc2921bf
 480 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
 481 Date:   Thu Nov 24 13:42:59 2011 -0500
 482 
 483     Add option to turn off preemption state tracing
 484 
 485     Preemption state tracing is only useful when debugging preemption-
 486     and IPI-related races. Since it creates a lot of clutter in the logs,
 487     this patch turns it off unless explicitly requested.
 488 
 489 commit a7a7f71529d9a6aae02ab3cb64451e036ce9d028
 490 Author: Glenn Elliott <gelliott@cs.unc.edu>
 491 Date:   Wed Nov 2 11:33:44 2011 -0400
 492 
 493     Add unlikely() to rel master check (match pfair).
 494 
 495 commit 89174d049ea77b127fb3f8b3bbd8bc2996d0a535
 496 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
 497 Date:   Sat Feb 12 16:40:43 2011 -0500
 498 
 499     bugfix: release master CPU must signal task was picked
 500 
 501 commit ec77ede8baa013138fe03ff45dd57f7bac50e5d4
 502 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
 503 Date:   Tue Feb 8 12:41:10 2011 -0500
 504 
 505     Pfair: various fixes concerning release timers
 506 
 507 commit 0720416e5b1bcb825619ba4b212d9056017ffd62
 508 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
 509 Date:   Sat Feb 5 21:50:36 2011 -0500
 510 
 511     Pfair: add support for true sporadic releases
 512 
 513     This patch also converts Pfair to implement early releasing such that
 514     no timer wheel is required anymore. This removes the need for a
 515     maximum period restriction.
 516 
 517 commit 399455c0e529bb07760f17e8fe0fddc342b67bc2
 518 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
 519 Date:   Sat Feb 5 22:49:52 2011 -0500
 520 
 521     Pfair: add release master support.
 522 
 523     Merged in release master support for Pfair.  Some merge
 524     conflicts had to be resolved.
 525 
 526 commit b4c52e27caa701a16e120b43a0e70ca6529a58a4
 527 Author: Glenn Elliott <gelliott@cs.unc.edu>
 528 Date:   Wed Jun 22 01:30:25 2011 -0400
 529 
 530     C-EDF: Make migration affinity work with Release Master
 531 
 532     Needed to update C-EDF to handle release master.  Also
 533     updated get_nearest_available_cpu() to take NO_CPU instead
 534     of -1 to indicate that there is no release master.  While
 535     NO_CPU is 0xffffffff (-1 in two's complement), we still
 536     translate this value to -1 in case NO_CPU changes.
 537 
 538     Signed-off-by: Andrea Bastoni <bastoni@cs.unc.edu>
 539 
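The translation described above is small enough to show directly. A standalone sketch follows; NO_CPU is redefined locally here as a stand-in for the LITMUS^RT constant.

    #include <stdio.h>

    #define NO_CPU 0xffffffffU   /* stand-in for the LITMUS^RT constant */

    /* NO_CPU already equals -1 in two's complement, but translate it
     * explicitly so nothing breaks if NO_CPU ever changes. */
    static int release_master_as_int(unsigned int release_master)
    {
        return (release_master == NO_CPU) ? -1 : (int)release_master;
    }

    int main(void)
    {
        printf("%d %d\n", release_master_as_int(NO_CPU),
               release_master_as_int(2));   /* prints -1 2 */
        return 0;
    }
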
 540 commit b751e4e17e667f11404fc2f290416c0df050e964
 541 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
 542 Date:   Thu Feb 10 18:41:38 2011 -0500
 543 
 544     C-EDF: add release master support
 545 
 546     As with GSN-EDF, do not insert release master into CPU heap.
 547 
 548 commit 17e34f413750b26aa493f1f8307f111bc5d487de
 549 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
 550 Date:   Thu Feb 10 20:05:15 2011 -0500
 551 
 552     PSN-EDF: add release master support
 553 
 554     We can give up a processor under partitioning, too.
 555 
 556 commit f5bee93f09b907a302e908c3cc3381ffbe826e2b
 557 Author: Glenn Elliott <gelliott@cs.unc.edu>
 558 Date:   Tue Jun 21 02:00:52 2011 -0400
 559 
 560     COMMENT: Correct comment on precise budget enforcement
 561 
 562     Original comment said that this feature wasn't supported,
 563     though it has been since around October 2010.
 564 
 565 commit 592eaca1409e55407e980f71b2ec604ca3610ba5
 566 Author: Glenn Elliott <gelliott@cs.unc.edu>
 567 Date:   Tue Jun 21 01:29:34 2011 -0400
 568 
 569     Avoid needlessly costly migrations.  CONFIG_SCHED_CPU_AFFINITY
 570 
 571     Given a choice between several available CPUs (unlinked) on which
 572     to schedule a task, let the scheduler select the CPU closest to
 573     where that task was previously scheduled.  Hopefully, this will
 574     reduce cache migration penalties.
 575 
 576     Notes: SCHED_CPU_AFFINITY is dependent upon x86 (only x86 is
 577     supported at this time). Also PFair/PD^2 does not make use of
 578     this feature.
 579 
 580     Signed-off-by: Andrea Bastoni <bastoni@cs.unc.edu>
 581 
 582 commit fb8d6602af1cbc09115544056b872b976c6349c3
 583 Author: Andrea Bastoni <bastoni@cs.unc.edu>
 584 Date:   Wed Aug 24 17:32:21 2011 +0200
 585 
  586     Prevent Linux from sending IPIs and queueing tasks on remote CPUs.
 587 
 588     Whether to send IPIs and enqueue tasks on remote runqueues is
 589     plugin-specific. The recent ttwu_queue() mechanism (by calling
 590     ttwu_queue_remote()) interferes with Litmus plugin decisions.
 591 
 592 commit ea62a6fe914f7463f89422dcb1812eb071cbd495
 593 Author: Andrea Bastoni <bastoni@cs.unc.edu>
 594 Date:   Wed Aug 24 12:06:42 2011 +0200
 595 
 596     Update PULL_TIMERS_VECTOR number
 597 
  598     Since 2.6.39, the "0xee" vector number that we used for pull_timers
  599     low-level management is in use by invalidate_tlb_X interrupts.
 600     Move the pull_timers vector below the max size of invalidate_tlb.
 601 
 602 commit 56c5c609615322bfbda5adff94ce011eb3d28fef
 603 Author: Andrea Bastoni <bastoni@cs.unc.edu>
 604 Date:   Sat Aug 27 16:10:06 2011 +0200
 605 
 606     Fix prototype mismatching and synch syscall numbers
 607 
 608     * Update prototypes for switched_to(), prio_changed(), select_task_rq().
 609     * Fix missing pid field in printk output.
 610     * Synchronize syscall numbers for arm and x86.
 611 
 612 commit 7b1bb388bc879ffcc6c69b567816d5c354afe42b
 613 Merge: 7d75459 02f8c6a
 614 Author: Andrea Bastoni <bastoni@cs.unc.edu>
 615 Date:   Sat Aug 27 15:43:54 2011 +0200
 616 
 617     Merge 'Linux v3.0' into Litmus
 618 
 619     Some notes:
 620     * Litmus^RT scheduling class is the topmost scheduling class
 621       (above stop_sched_class).
 622     * scheduler_ipi() function (e.g., in smp_reschedule_interrupt())
 623       may increase IPI latencies.
 624     * Added path into schedule() to quickly re-evaluate scheduling
 625       decision without becoming preemptive again. This used to be
 626       a standard path before the removal of BKL.
 627 
 628     Conflicts:
 629     	Makefile
 630     	arch/arm/kernel/calls.S
 631     	arch/arm/kernel/smp.c
 632     	arch/x86/include/asm/unistd_32.h
 633     	arch/x86/kernel/smp.c
 634     	arch/x86/kernel/syscall_table_32.S
 635     	include/linux/hrtimer.h
 636     	kernel/printk.c
 637     	kernel/sched.c
 638     	kernel/sched_fair.c
 639 
 640 commit 3d5537c160c1484e8d562b9828baf679cc53f67a
 641 Author: Glenn Elliott <gelliott@cs.unc.edu>
 642 Date:   Thu Jun 2 16:06:05 2011 -0400
 643 
 644     Full patch for klitirqd with Nvidia GPU support.
 645 
 646 commit 7d754596756240fa918b94cd0c3011c77a638987
 647 Author: Christopher Kenna <cjk@cs.unc.edu>
 648 Date:   Sat Apr 16 20:12:00 2011 -0400
 649 
 650     LITMUS Core: Check for valid class in RT-param syscall.
 651 
 652 commit 6d4cc883ec2470500be6c95fd2e7c6944e89c3e8
 653 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
 654 Date:   Sat Feb 12 16:40:43 2011 -0500
 655 
 656     bugfix: release master CPU must signal task was picked
 657 
 658     Otherwise, the release master CPU may try to reschedule in an infinite
 659     loop.
 660 
 661 commit 0f6a8e02773f8c23b5b6a3dbfa044e50c9d7d811
 662 Author: Glenn Elliott <gelliott@cs.unc.edu>
 663 Date:   Thu Mar 31 10:47:01 2011 -0400
 664 
 665     Improve FMLP queue management.
 666 
 667     The next owner of a FMLP-protected resource is dequeued from
 668     the FMLP FIFO queue by unlock() (when the resource is freed by
 669     the previous owner) instead of performing the dequeue by the next
 670     owner immediately after it has been woken up.
 671 
 672     This simplifies the code a little bit and also reduces potential
 673     spinlock contention.
 674 
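A structural sketch of the change described above, with the semaphore's internal spinlock and the wake-up call omitted: unlock() itself pops the next owner from the FIFO instead of leaving the dequeue to the woken task. This is an illustration, not the kernel code.

    #include <stdio.h>

    #define MAX_WAITERS 8

    struct fmlp_sem {
        int owner;                /* -1 if free */
        int fifo[MAX_WAITERS];    /* waiting task ids, FIFO order */
        int len;
    };

    /* Called with the semaphore's internal spinlock held (omitted here). */
    static int fmlp_unlock(struct fmlp_sem *sem)
    {
        int next = -1, i;
        if (sem->len > 0) {
            next = sem->fifo[0];                  /* dequeue the next owner */
            for (i = 1; i < sem->len; i++)
                sem->fifo[i - 1] = sem->fifo[i];
            sem->len--;
        }
        sem->owner = next;                        /* hand the resource over */
        /* wake_up(next) would happen here; the woken task simply runs. */
        return next;
    }

    int main(void)
    {
        struct fmlp_sem s = { .owner = 1, .fifo = { 2, 3 }, .len = 2 };
        printf("next owner: %d\n", fmlp_unlock(&s));   /* prints 2 */
        return 0;
    }
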
 675 commit c05eaa8091d2cadc20363d44a85ee454262f4bc2
 676 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
 677 Date:   Thu Jan 27 20:11:59 2011 -0500
 678 
 679     Pfair: remove sporadic_release flag
 680 
 681     Instead of having an extra flag, Pfair should just infer sporadic
 682     release based on deadlines like other plugins, too.
 683 
 684 commit 71efbc5459ef95ed902a6980eae646197529364e
 685 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
 686 Date:   Fri Jan 7 17:37:01 2011 -0500
 687 
 688     Pfair: support clustered scheduling
 689 
 690     Just like C-EDF is a global scheduler that is split across several
 691     clusters, Pfair can be applied on a per-cluster basis. This patch
 692     changes the Pfair implementation to enable clustering based on the
 693     recently added generic clustering support.
 694 
 695 commit 343d4ead3b12992f494134114cf50e4f37c656c5
 696 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
 697 Date:   Thu Jan 27 16:23:46 2011 -0500
 698 
 699     Litmus core: add generic clustering support
 700 
 701     Inspired by the existing C-EDF code, this generic version will build
 702     clusters of CPUs based on a given cache level.
 703 
 704 commit 4ce37704ec0bedb28b5708d32964fca471e793d0
 705 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
 706 Date:   Wed Jan 26 20:42:49 2011 -0500
 707 
 708     Litmus core: extract userspace interface from C-EDF
 709 
 710     Make the cluster size configuration in C-EDF generic so that it can be
 711     used by other clustered schedulers.
 712 
 713 commit 963fd846e36b48d5338ef2a134d3ee8d208abc07
 714 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
 715 Date:   Sat Jan 29 14:45:49 2011 -0500
 716 
 717     Feather-Trace: rename locking trace points
 718 
 719     Since we don't expect to trace more than one lock type at a time,
 720     having protocol-specific trace points is not required.
 721 
 722 commit 7f0bd4c213ff8dca0eb3bdd887f5c62c8d30fab5
 723 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
 724 Date:   Sat Jan 29 15:50:52 2011 -0500
 725 
  726     fdso: pass userspace config argument to object constructor
 727 
 728     As Glenn pointed out, it is useful for some protocols (e.g.,
 729     k-exclusion protocols) to know the userspace configuration at object
 730     creation time. This patch changes the fdso API to pass the parameter
 731     to the object constructor, which is then in turn passed to the lock
 732     allocater. The return code from the lock allocater is passed to
 733     userspace in return.
 734 
 735     This also fixes some null pointer dereferences in the FDSO code found
 736     by the test suite in liblitmus.
 737 
 738 commit fab768a4cdc49ad7886cac0d0361f8432965a817
 739 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
 740 Date:   Sat Jan 29 13:38:24 2011 -0500
 741 
 742     GSN-EDF: re-implement FMLP support
 743 
 744     This introduces the global FMLP based on the generic locking layer.
 745 
 746 commit e705aa52df711112d434ccc87ee5fb5838c205a2
 747 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
 748 Date:   Fri Jan 28 19:06:11 2011 -0500
 749 
 750     PSN-EDF: re-implement FMLP support
 751 
 752     Implement the partitioned FMLP with priority boosting based on the
 753     generic lock API.
 754 
 755 commit e593c9dbe858c82e284ff85e625837ae3ab32f1c
 756 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
 757 Date:   Fri Jan 28 19:04:08 2011 -0500
 758 
 759     EDF: support priority boosting
 760 
 761     While we are at it, simplify edf_higher_prio() a bit.
 762 
 763 commit fc6482bb7a6a638474565c90159997bd59069297
 764 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
 765 Date:   Fri Jan 28 17:30:14 2011 -0500
 766 
 767     FMLP: remove old implementation
 768 
 769 commit e1b81e70c3af9d19d639bc8bdaa5a8fc13bf17a8
 770 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
 771 Date:   Fri Jan 28 17:04:58 2011 -0500
 772 
 773     SRP: port to new generic locking API
 774 
 775     This re-enables SRP support under PSN-EDF and demonstrates how the new
 776     locking API should be used.
 777 
 778 commit cc602187d4466374bca031039e145aa1b89aca96
 779 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
 780 Date:   Fri Jan 28 16:41:16 2011 -0500
 781 
 782     Litmus core: replace FMLP & SRP system calls with generic syscalls
 783 
 784     This renders the FMLP and SRP unfunctional until they are ported to
 785     the new locking API.
 786 
 787 commit a3db326495d4051bddc657d3b226ad4daa7997c4
 788 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
 789 Date:   Fri Jan 28 13:26:15 2011 -0500
 790 
 791     Litmus core: add generic locking API
 792 
 793     Provide a unified userspace interface for plugin-specific locking
 794     protocols.
 795 
 796 commit 2dea9d5e7727b8474981557cbf925687b8f33865
 797 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
 798 Date:   Fri Jan 28 12:24:58 2011 -0500
 799 
 800     Litmus core: change plugin locking interface to generic 'allocate_lock()'
 801 
 802     As the number of supported locking protocols is expected to rise,
 803     hard-coding things like priority inheritance in the plugin interface
 804     doesn't scale. Instead, use a new generic lock-ops approach. With this
 805     approach, each plugin can define its own protocol implementation (or
 806     use a generic one), and plugins can support multiple protocols without
 807     having to change the plugin interface for each protocol.
 808 
 809 commit fd8ae31c74975c8499983c9831bff2b136b98434
 810 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
 811 Date:   Fri Jan 28 11:54:38 2011 -0500
 812 
 813     fdso: supply object type to constructor and destructor methods
 814 
 815     Passing the object type explicitly will enable generic lock constructors.
 816 
 817 commit a0f243fd1d66c3499f88a690e485e94160ac1a8c
 818 Author: Jonathan Herman <hermanjl@cs.unc.edu>
 819 Date:   Sun Jan 30 15:14:20 2011 -0500
 820 
 821     Fixed is_hrt, is_srt, and is_be macros.
 822 
 823 commit 3cb35a8d90658bd8fb6f9b4f60eb7f97d0643313
 824 Author: Jonathan Herman <hermanjl@cs.unc.edu>
 825 Date:   Sun Jan 30 15:10:49 2011 -0500
 826 
 827     Added task class to feather trace param record.
 828 
 829 commit 904531a6321964579ab0972a8833616e97dbf582
 830 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
 831 Date:   Sat Jan 29 20:31:57 2011 -0500
 832 
 833     bugfix: don't let children stay Litmus real-time tasks
 834 
 835     It has always been LITMUS^RT policy that children of real-time tasks
 836     may not skip the admissions test, etc. This used to be enforced, but
 837     was apparently dropped during some port. This commit re-introduces
 838     this policy.  This fixes a kernel panic that occurred when "real-time
  839     children" exited without proper initialization.
 840 
 841 commit 3d8eb93db513bd9caa982f27fee8156405fac754
 842 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
 843 Date:   Wed Jan 26 20:36:49 2011 -0500
 844 
 845     Litmus core: add copy_and_chomp() helper
 846 
 847     We read in a line from userspace and remove the trailing newline in a
 848     number of places. This function extracts the common code to avoid
 849     future duplication.
 850 
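A userspace stand-in for the helper described above; the kernel version would copy the line from userspace with copy_from_user() before chomping it.

    #include <stdio.h>
    #include <string.h>

    static int copy_and_chomp(char *dst, size_t dst_len,
                              const char *src, size_t src_len)
    {
        if (src_len >= dst_len)
            return -1;                     /* line too long for the buffer */
        memcpy(dst, src, src_len);
        dst[src_len] = '\0';
        if (src_len > 0 && dst[src_len - 1] == '\n')
            dst[src_len - 1] = '\0';       /* chomp the trailing newline */
        return 0;
    }

    int main(void)
    {
        char buf[16];
        if (copy_and_chomp(buf, sizeof(buf), "C-EDF\n", 6) == 0)
            printf("'%s'\n", buf);         /* prints 'C-EDF' */
        return 0;
    }
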
 851 commit 7dbc4a842d3bcfa755ba82cae46171d0098d4c2c
 852 Author: Jonathan Herman <hermanjl@cs.unc.edu>
 853 Date:   Wed Jan 26 17:47:49 2011 -0500
 854 
 855     Added support for tracing arbitrary actions.
 856 
 857 commit d11808b5c6b032de4284281ed2ff77ae697a4ebd
 858 Author: Christopher Kenna <cjk@cs.unc.edu>
 859 Date:   Sun Jan 9 19:33:49 2011 -0500
 860 
 861     Feather-Trace: dynamic memory allocation and clean exit
 862 
 863     This patch changes Feather-Trace to allocate memory for the minor
 864     devices dynamically, which addresses a long-standing FIXME. It also
 865     provides clean module exit and error conditions for Feather-Trace.
 866 
 867 commit 37eb46be881dde4b405d3d8b48e76b4a8d62ae2c
 868 Author: Christopher Kenna <cjk@cs.unc.edu>
 869 Date:   Fri Jan 7 20:46:25 2011 -0500
 870 
 871     Feather-Trace: register devices with sysfs
 872 
 873     This patch implements support for Feather-Trace devices to use the sysfs
 874     file system and, consequently, udev support.
 875 
 876     This allows us to allocate major/minor numbers for Feather-Trace
 877     devices dynamically, which is desirable because our old static
 878     allocations tend to create conflicts on modern distributions and/or
 879     when there are many cores.
 880 
 881 commit 7648363e5636bd865aeac3236eb4675f0687eb4a
 882 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
 883 Date:   Mon Jan 3 07:29:48 2011 -0500
 884 
 885     cleanup C-EDF cluster size configuration
 886 
 887     Refactor the code that determines the C-EDF cluster size.
 888     - Use an enum with symbolic constants instead of magic int values.
 889     - Complain and fail to switch if an unsupported cluster size is requested.
 890     - Default to ALL as suggested by Glenn and Andrea.
 891 
 892 commit 73da50b48b6e7c60add2fcf0b683318b76ecb340
 893 Author: Andrea Bastoni <bastoni@cs.unc.edu>
 894 Date:   Tue Dec 21 18:19:27 2010 -0500
 895 
 896     bugfix: clear scheduled field of the correct CPU upon task_exit in C-EDF
 897 
 898     Do not use the "scheduled_on" field to address the cpus structure
 899     within a cluster. cpus may contain less items than num_online_cpus and
 900     we may cause an out-of-bound access. Instead, use "scheduled_on" to
 901     directly access the per-cpu cpu_entry_t structure.
 902 
 903     Reported-by: Jonathan Herman <hermanjl@cs.unc.edu>
 904 
 905 commit f07bb0a4549916107a7619d0bc4cb5dc09d5744a
 906 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
 907 Date:   Mon Nov 29 09:20:03 2010 -0500
 908 
 909     bugfix: avoid underflow in budget_remaining()
 910 
 911     budget_remaining() reports incorrect values due to the operands being
 912     switched, which leads to an integer underflow.
 913 
 914     Reported-by: Chris Kenna <cjk@cs.unc.edu>
 915 
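A standalone demonstration of the underflow described above: with an unsigned time type, switching the operands wraps around to a huge "remaining budget". The lt_t typedef here is only a stand-in for the LITMUS^RT time type.

    #include <stdio.h>
    #include <stdint.h>

    typedef uint64_t lt_t;   /* stand-in for the LITMUS^RT time type */

    static lt_t budget_remaining_buggy(lt_t exec_cost, lt_t exec_time)
    {
        return exec_time - exec_cost;   /* operands switched: underflows */
    }

    static lt_t budget_remaining_fixed(lt_t exec_cost, lt_t exec_time)
    {
        return exec_time < exec_cost ? exec_cost - exec_time : 0;
    }

    int main(void)
    {
        lt_t cost = 10000, used = 2500;
        printf("buggy: %llu\n",
               (unsigned long long)budget_remaining_buggy(cost, used));
        printf("fixed: %llu\n",
               (unsigned long long)budget_remaining_fixed(cost, used));
        return 0;
    }
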
 916 commit 7b544c16beaa1f6ec70a72d53fe84cae95f70a41
 917 Author: Andrea Bastoni <bastoni@cs.unc.edu>
 918 Date:   Thu Nov 25 13:29:31 2010 +0100
 919 
 920     bugfix: fix out-of-bound array access in cedf_activate_plugin()
 921 
 922     Make sure to check for maximum index value when accessing cedf_domain_t
 923     array in cedf_activate_plugin().
 924 
 925     Reported-by: Jeremy Erickson <jerickso@cs.unc.edu>
 926 
 927 commit 2aad06b056054442964f46752bdb098030cdb866
 928 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
 929 Date:   Mon Nov 22 01:25:19 2010 -0500
 930 
 931     add optional [function@file:line] tag to TRACE() log
 932 
 933     Add information to each trace message that makes it easier to locate
 934     where it came from. It is disabled by default since this adds a lot of
 935     clutter. Example:
 936 
 937       81281 P1 [gsnedf_schedule@litmus/sched_gsn_edf.c:406]: (rtspin/1483:1) blocks:0 out_of_time:0 np:0 sleep:1 preempt:0 state:0 sig:0
 938       81282 P1 [job_completion@litmus/sched_gsn_edf.c:303]: (rtspin/1483:1) job_completion().
 939       81283 P1 [__add_release@litmus/rt_domain.c:344]: (rtspin/1483:2) add_release(), rel=41941764351
 940       81284 P1 [gsnedf_schedule@litmus/sched_gsn_edf.c:453]: (rtspin/1483:2) scheduled_on = NO_CPU
 941 
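The same tagging idea can be sketched in plain C with __func__, __FILE__ and __LINE__; the CONFIG_TRACE_TAG guard below is a made-up stand-in for the kernel configuration option, and printf stands in for the TRACE() backend.

    #include <stdio.h>

    #ifdef CONFIG_TRACE_TAG   /* hypothetical stand-in for the Kconfig option */
    #define TRACE(fmt, ...) \
        printf("[%s@%s:%d]: " fmt, __func__, __FILE__, __LINE__, ##__VA_ARGS__)
    #else
    #define TRACE(fmt, ...) printf(fmt, ##__VA_ARGS__)
    #endif

    int main(void)
    {
        TRACE("(rtspin/%d:%d) blocks:%d\n", 1483, 1, 0);
        return 0;
    }
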
 942 commit 7779685f05219ff6e713ee6591644c080f51a8bf
 943 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
 944 Date:   Mon Nov 22 00:39:45 2010 -0500
 945 
 946     log job number in TRACE_TASK() and TRACE_CUR()
 947 
 948     For some problems it can be helpful to know which job of a task
 949     generated a log message. This patch changes TRACE_TASK to add :<jobno>
 950     to the existing (<comm>/<pid>) tag.
 951 
 952     The result is a trace such as the following, in which the third job of
 953     rtspin/1511 completes and the fourth job is added to the release
 954     queue.
 955 
 956       137615 P0: (rtspin/1511:3) job_completion().
 957       137616 P0: (rtspin/1511:4) add_release(), rel=262013223089
 958       137617 P0: (rtspin/1511:4) scheduled_on = NO_CPU
 959 
 960     The job number for non-real-time tasks is always zero.
 961 
 962 commit d40413efabc0ab388f6ed83f48b28dc253d47238
 963 Author: Andrea Bastoni <bastoni@cs.unc.edu>
 964 Date:   Fri Nov 19 12:52:08 2010 +0100
 965 
 966     Bugfix: synchronize with all other CPUs before switching plugin
 967 
 968     The CPU triggering the plugin switch should wait until all other CPUs
 969     are in a proper state (synch_on_plugin_switch()) before performing the
 970     actual switch.
 971 
 972     Based on the original patch from Jeremy Erickson <jerickso@cs.unc.edu>.
 973 
 974     This should solve (for most practical cases) the C-EDF-related
 975     plugin-switch problem reported on the ML.
 976 
 977 commit 1726017e944d0086f14f867befbf5ebf07adc7dd
 978 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
 979 Date:   Tue Nov 16 11:44:54 2010 -0500
 980 
 981     Improve help message for TRACE() buffer
 982 
 983     It's not being allocated per cpu anymore. Further, provide a hint to
 984     the user where to find the data in userspace.
 985 
 986 commit d922f5eb1c375ab0445240110656c1d793eaad04
 987 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
 988 Date:   Tue Nov 16 11:13:10 2010 -0500
 989 
 990     Make TRACE() buffer size configurable
 991 
 992     Let the user choose an appropriate buffer size (instead of scaling
 993     with NR_CPUS).  The kfifo api requires the buffer to be a power of
 994     two, so enforce this constraint in the configuration.
 995 
 996     This fixes a previously-existing compile-time error for values of
  997     NR_CPUS that are not a power of two.
 998 
 999     Based on a patch by Mac Mollison <mollison@cs.unc.edu>.
1000 
1001 commit 6fbc3b495cccf2e4ab7d4ab674b5c576e9946bed
1002 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
1003 Date:   Thu Nov 11 16:54:20 2010 -0500
1004 
1005     Workaround: do not set rq->skip_clock_update
1006 
1007     Disabling the clock update seems to be causing problems even in normal
1008     Linux, and causes major bugs under LITMUS^RT. As a workaround, just
1009     disable this "optimization" for now.
1010 
 1011     Details: the idle load balancer causes tasks that suspend to be
1012     marked with set_tsk_need_resched(). When such a task resumes, it may
1013     wrongly trigger the setting of skip_clock_update. However, a
1014     corresponding rescheduling event may not happen immediately, such that
1015     the currently-scheduled task is no longer charged for its execution
1016     time.
1017 
1018 commit 5a0df8d4e9a5da47c804d89426f06e08aa44426f
1019 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
1020 Date:   Thu Nov 11 02:54:40 2010 -0500
1021 
1022     Remove LITMUS^RT TRACE_BUG_ON macro
1023 
1024     Linux now has a macro of the same name, which causes namespace
1025     collisions. Since our version is only being used in two places that
1026     haven't triggered in several years, let's just remove it.
1027 
1028 commit 7c1446ddceb89ee1ddbe5d7a90cfd4cb2bc8ad37
1029 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
1030 Date:   Thu Nov 11 02:45:32 2010 -0500
1031 
1032     Avoid warning on 64bit builds
1033 
1034     The specifier %u doesn't match sizeof() if sizeof() returns a 64bit
1035     quantity on x86_64. Always cast it to int to avoid the warning.
1036 
1037 commit 98ac0cd2bbe476d79ebf44139a6259cb8d0dc6be
1038 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
1039 Date:   Thu Nov 11 02:33:43 2010 -0500
1040 
1041     Cleanup TRACE() implementation
1042 
 1043     Since the initial rebase from .24 to .32, the TRACE() implementation
 1044     was a hybrid between our old ringbuffer implementation and the new
 1045     generic kfifo API. This was a) ugly and b) not safe for TRACE()
 1046     invocations during early boot.
1047 
1048     This patch rips out the old parts and replaces the actual buffer with a static kfifo.
1049 
1050     This also increases TRACE() buffer size considerably. As we avoid a
 1051     dynamic allocation, a larger size is less problematic for debug
1052     builds. This helps a bit with holes in the debug log if the
1053     buffer-flushing task is starved.
1054 
1055 commit f599a587e1c7446a76d7d62ed7748f3c4435acd8
1056 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
1057 Date:   Wed Nov 10 12:20:48 2010 -0500
1058 
1059     Hook up LITMUS^RT remote preemption support on ARM
1060 
1061     Call into scheduler state machine in the IPI handler.
1062 
1063 commit 2c142d1028f276c6d5e58c553768ae32ed9bda68
1064 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
1065 Date:   Wed Nov 10 12:25:43 2010 -0500
1066 
1067     Hook up LITMUS^RT remote preemption support on x86
1068 
1069     Call into scheduler state machine in the IPI handler.
1070 
1071 commit fb3df2ec261d8cd6bcb8206d9d985355214d7767
1072 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
1073 Date:   Wed Nov 10 12:10:49 2010 -0500
1074 
1075     Implement proper remote preemption support
1076 
1077     To date, Litmus has just hooked into the smp_send_reschedule() IPI
1078     handler and marked tasks as having to reschedule to implement remote
1079     preemptions. This was never particularly clean, but so far we got away
 1080     with it. However, changes in the underlying Linux, and peculiarities
1081     of the ARM code (interrupts enabled before context switch) break this
1082     naive approach. This patch introduces new state-machine based remote
1083     preemption support. By examining the local state before calling
1084     set_tsk_need_resched(), we avoid confusing the underlying Linux
 1085     scheduler. Further, this patch avoids sending unnecessary IPIs.
1086 
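An illustrative standalone state machine capturing the "avoid unnecessary IPIs" part of the commit above; the state names are hypothetical and do not match litmus/preempt.c. An IPI is sent only when the compare-and-swap actually moves the remote CPU's state to "should schedule".

    #include <stdatomic.h>
    #include <stdio.h>

    enum sched_state { SHOULD_NOT_SCHEDULE, SHOULD_SCHEDULE, PICKED_TASK };

    static _Atomic int cpu_state = SHOULD_NOT_SCHEDULE;

    static void request_remote_preemption(void)
    {
        int expected = SHOULD_NOT_SCHEDULE;
        /* examine the remote state before poking the CPU */
        if (atomic_compare_exchange_strong(&cpu_state, &expected,
                                           SHOULD_SCHEDULE))
            printf("send IPI\n");                  /* CPU must be notified  */
        else
            printf("skip IPI (already pending)\n"); /* preemption is pending */
    }

    int main(void)
    {
        request_remote_preemption();   /* sends the IPI */
        request_remote_preemption();   /* skipped: already SHOULD_SCHEDULE */
        return 0;
    }
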
1087 commit 516b6601bb5f71035e8859735a25dea0da4a0211
1088 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
1089 Date:   Mon Nov 8 20:21:35 2010 -0500
1090 
1091     hook litmus tick function into hrtimer-driven ticks
1092 
1093     Litmus plugins should also be activated if ticks are triggered by
1094     hrtimer.
1095 
1096 commit 34310fd7dbc3ad98d8e7cafa4f872ba71ca00860
1097 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
1098 Date:   Mon Nov 8 15:02:09 2010 -0500
1099 
1100     Split out TRACE() from litmus.h and cleanup some includes
1101 
1102     The TRACE() functionality doesn't need all of litmus.h. Currently,
1103     it's impossible to use TRACE() in sched.h due to a circular
1104     dependency. This patch moves TRACE() and friends to
1105     litmus/sched_debug.h, which can be included in sched.h.
1106 
1107     While at it, also fix some minor include ugliness that was revealed by
1108     this change.
1109 
1110 commit c6182ba4a548baf0d1238d0df54e7d38ed299c3e
1111 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
1112 Date:   Mon Nov 1 19:40:02 2010 -0400
1113 
1114     sched_trace: make buffer size configurable
1115 
1116     Large sched_trace buffers cause boot problems on the ARM box. Allow
1117     the user to specify smaller buffers.
1118 
1119 commit 8e10e1803e695a08f1fb59e90dac4ba0d8744f89
1120 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
1121 Date:   Mon May 31 13:06:50 2010 -0400
1122 
1123     ARM: hookup LITMUS^RT system calls
1124 
 1125     Includes the LITMUS^RT-specific unistd.h extension and modifies the
1126     actual syscall table.
1127 
1128 commit 9907691855fa49ec8ed317fc54a626fcd137c73b
1129 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
1130 Date:   Sun May 30 18:52:30 2010 -0400
1131 
1132     ARM: Include LITMUS^RT KConfig
1133 
 1134     Make the ARM build aware of the LITMUS^RT-specific options.
1135 
1136 commit dd9d29e1f6ec74af4ff7df1bbe4d05829887475f
1137 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
1138 Date:   Mon May 31 15:19:02 2010 -0400
1139 
1140     ARM: provide get_cycles() for RealView PB11{MP,76} and Cortex-A8
1141 
1142     Use the CCNT register to override the default get_cycles() implementation in
1143     arch/arm/asm/timex.h. This is useful for overhead measurements and debugging.
1144 
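A sketch of a CCNT read for the Cortex-A8 (ARMv7) case: it only compiles for ARM targets, assumes user-mode counter access has been enabled, and the ARM11-based RealView boards use a different CP15 encoding, so this is not the exact code the patch adds.

    /* Read the ARMv7 cycle count register (PMCCNTR) via coprocessor 15. */
    static inline unsigned long ccnt_read(void)
    {
        unsigned long cycles;
        asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r" (cycles));
        return cycles;
    }
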
1145 commit b39ae3793ab590efbdb8aab63a598071782d32b8
1146 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
1147 Date:   Mon May 31 15:12:58 2010 -0400
1148 
1149     ARM: allow mach/timex.h to define get_cycles()
1150 
 1151     Some platforms have access to a cycle counter (CCNT) register in the
1152     CP15 coprocessor. This trivial change will allow such platforms to provide
1153     specialized implementations.
1154 
1155 commit 52d5524f64b4f118672f5d80235221fe1c622c18
1156 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
1157 Date:   Fri Oct 22 22:47:09 2010 -0400
1158 
1159     C-EDF: move /proc/litmus/cluster_cache to /proc/litmus/plugins/C-EDF/cluster
1160 
1161     Make use of the new per-plugin proc file infrastructure to avoid
1162     littering the global namespace. While at it, also move all the
1163     relevant bits to sched_cedf.c. In the future, each plugin's parameters
1164     should be handled in the respective plugin file.
1165 
1166 commit e06e8374b5c04aeaddf14e9686842011f80f5664
1167 Author: Christopher Kenna <cjk@cs.unc.edu>
1168 Date:   Fri Oct 22 21:04:34 2010 -0400
1169 
1170     Litmus core: refactor the implementation of /proc
1171 
1172 commit 98f56816fcb5c97e0afd21a6e242bb72d5b7a551
1173 Author: Christopher Kenna <cjk@cs.unc.edu>
1174 Date:   Fri Oct 22 17:26:38 2010 -0400
1175 
1176     Litmus core: per-plugin proc directories
1177 
1178     Change the Litmus proc layout so that loaded plugins are visible in
1179     /proc/litmus/plugins/loaded and add Litmus functions make_plugin_proc_dir()
1180     and remove_plugin_proc_dir() to add per-plugin proc directories.
1181 
1182 commit 3dd41424090a0ca3a660218d06afe6ff4441bad3
1183 Merge: 5c54564 f6f94e2
1184 Author: Andrea Bastoni <bastoni@cs.unc.edu>
1185 Date:   Sat Oct 23 01:01:49 2010 -0400
1186 
1187     Merge commit 'v2.6.36' into wip-merge-2.6.36
1188 
1189     Conflicts:
1190     	Makefile
1191     	arch/x86/include/asm/unistd_32.h
1192     	arch/x86/kernel/syscall_table_32.S
1193     	kernel/sched.c
1194     	kernel/time/tick-sched.c
1195 
1196     Relevant API and functions changes (solved in this commit):
1197     - (API) .enqueue_task() (enqueue_task_litmus),
1198       dequeue_task() (dequeue_task_litmus),
1199       [litmus/sched_litmus.c]
1200     - (API) .select_task_rq() (select_task_rq_litmus)
1201       [litmus/sched_litmus.c]
1202     - (API) sysrq_dump_trace_buffer() and sysrq_handle_kill_rt_tasks()
1203       [litmus/sched_trace.c]
1204     - struct kfifo internal buffer name changed (buffer -> buf)
1205       [litmus/sched_trace.c]
1206     - add_wait_queue_exclusive_locked -> __add_wait_queue_tail_exclusive
1207       [litmus/fmlp.c]
1208     - syscall numbers for both x86_32 and x86_64
1209 
1210 commit 5c5456402d467969b217d7fdd6670f8c8600f5a8
1211 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
1212 Date:   Wed Sep 22 15:52:03 2010 -0400
1213 
1214     Litmus core: allow PRECISE_ENFORCEMENT
1215 
1216     Allow all kinds of budget enforcement settings now that we have the
1217     supporting infrastructure.
1218 
1219 commit 7caae3d71eae4f5307cae98131390e9d10627c01
1220 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
1221 Date:   Mon Oct 18 16:55:37 2010 -0400
1222 
1223     Litmus core: enable precise budget enforcement
1224 
1225     Update the budget enforcement timer after each scheduling decision.
1226 
1227 commit 576b1ad144f81d3fd3bd37d18dab86cd1e8660b0
1228 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
1229 Date:   Mon Oct 18 16:01:10 2010 -0400
1230 
1231     Litmus core: add plugin-independent precise budget enforcement infrastructure
1232 
1233     Simple logic: if a task requires precise enforcement, then program a
1234     hr-timer to fire when the task must be descheduled. When the timer
1235     fires, simply activate the scheduler. When we switch to a different
1236     task, either reprogram the timer or cancel it.
1237 
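A kernel-style sketch of the timer logic described above, assuming kernel context (it will not build as a standalone program); trigger_reschedule() is a hypothetical stand-in for whatever activates the scheduler and is not a LITMUS^RT function.

    #include <linux/kernel.h>
    #include <linux/hrtimer.h>
    #include <linux/ktime.h>

    struct enforcement_timer {
        struct hrtimer timer;
        int armed;
    };

    /* hypothetical: e.g., set need_resched / invoke the plugin scheduler */
    static void trigger_reschedule(void) { }

    static enum hrtimer_restart on_budget_exhausted(struct hrtimer *timer)
    {
        struct enforcement_timer *et =
            container_of(timer, struct enforcement_timer, timer);
        et->armed = 0;
        trigger_reschedule();          /* task must be descheduled now */
        return HRTIMER_NORESTART;
    }

    /* Arm the timer for the remaining budget when a task is scheduled. */
    static void arm_enforcement_timer(struct enforcement_timer *et, u64 budget_ns)
    {
        hrtimer_init(&et->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
        et->timer.function = on_budget_exhausted;
        hrtimer_start(&et->timer, ns_to_ktime(budget_ns), HRTIMER_MODE_REL);
        et->armed = 1;
    }

    /* When switching to a different task, cancel (or re-arm) the timer. */
    static void cancel_enforcement_timer(struct enforcement_timer *et)
    {
        if (et->armed) {
            hrtimer_cancel(&et->timer);
            et->armed = 0;
        }
    }
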
1238 commit 9b718afbc5db5a808804a336c17ba896a9f048a1
1239 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
1240 Date:   Wed Sep 22 17:56:59 2010 -0400
1241 
1242     Litmus core: add macro to test for PRECISE_ENFORCEMENT
1243 
1244     Required for EDF-WM. We should implement precise enforcement
1245     in the core distribution soon anyway (once we know how it
1246     works in EDF-WM).
1247 
1248 commit bf34c69c682443b5bf2f9009b1a0039fd60e654f
1249 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
1250 Date:   Mon Oct 18 15:58:27 2010 -0400
1251 
1252     Litmus core: add budget_remaining() helper
1253 
1254     Quick way to figure out how much budget a LITMUS^RT job has left.
1255 
1256 commit bd6d5f1dd586a27c2082ad4d95ee58913b471f5c
1257 Author: Andrea Bastoni <bastoni@cs.unc.edu>
1258 Date:   Tue Sep 28 11:08:17 2010 -0400
1259 
1260     hrtimer: add init function to properly set hrtimer_start_on_info params
1261 
1262     This helper function is also useful to remind us that if we use
1263     hrtimer_pull outside the scope of triggering remote releases, we need to
1264     take care of properly set the "state" field of hrtimer_start_on_info
1265     structure.
1266 
1267 commit c8f95e3e04ffc1d96b7b615f8be9b7ac941ead15
1268 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
1269 Date:   Wed Sep 22 23:13:03 2010 -0400
1270 
1271     Litmus core: set state to TASK_RUNNING before calling wake_up()
1272 
1273     Having tasks that are !is_running() in shared structures is
1274     very confusing during development and debugging, and can likely
1275     mask bugs and/or create races.
1276 
1277     It seems like a strange choice that Linux changes a task's state
1278     only _after_ activating it. For LITMUS^RT tasks, we change this order.
1279 
1280 commit 8ad8bfcab56a140389df2ed323b56d849e6cf5fb
1281 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
1282 Date:   Wed Sep 22 18:17:37 2010 -0400
1283 
1284     rt_domain_t: disable timer TRACE() spam by default
1285 
1286     These messages are highly useful when debugging races,
1287     but they quickly litter the log when looking for something else.
1288 
1289     We keep them around, but by default they shouldn't show up.
1290 
1291 commit 8cc60b37588e130bed9d418bcfbe4d64c3a91935
1292 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
1293 Date:   Tue Sep 21 22:49:37 2010 -0400
1294 
1295     rt_domain_t: add add_release_on()
1296 
1297     This API addition allows the calling code to override
1298     the release master for a given rt_domain_t object. This
1299     is particularly useful if a job is supposed to migrate
1300     to a particular CPU. This need arises for example in semi-
1301     partitioned schedulers.
1302 
1303 commit 2ed4499a959f8fc30e430b6644ec83ceb7d49ef6
1304 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
1305 Date:   Tue Sep 21 12:16:00 2010 -0400
1306 
1307     PSN-EDF: remove outdated comment
1308 
1309     ...and replace it with a more useful one. We don't directly modify
1310     Linux run queues anymore since (at least) LITMUS^RT 2008.
1311 
1312 commit 136a08dbe8c28e751b01e932420f715edb229f6b
1313 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
1314 Date:   Fri Jul 16 10:30:06 2010 -0400
1315 
1316     Bugfix: avoid link error in Feather-Trace on x86
1317 
 1318     If no events are defined but Feather-Trace support is enabled, then the current
 1319     implementation generates a link error because the __event_table section is
1320     absent.
1321 
1322     > arch/x86/built-in.o: In function `ft_disable_all_events':
1323     > (.text+0x242af): undefined reference to `__start___event_table'
1324 
 1325     As a simple workaround, we force a zero-element array to always be "allocated"
1326     in the __event_table section. This ensures that we end up with a zero-byte
1327     section if no events are enabled, and does not affect the layout of the section
1328     if events are present.
1329 
1330     > bbb@ludwig:~/dev/litmus2010$ nm vmlinux | grep event_table
1331     > ffffffff81950cdc D __event_table_dummy
1332     > ffffffff81950cdc A __start___event_table
1333     > ffffffff81950cdc A __stop___event_table
1334 
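A sketch of the workaround: a zero-element array placed in the __event_table section forces the section, and therefore its __start/__stop symbols, to exist. The declaration actually added by the patch may differ, and the struct layout below is assumed for illustration only.

    /* assumed layout; stands in for Feather-Trace's real event record type */
    struct trace_event {
        unsigned long id;
        unsigned long count;
        long start_addr;
        long end_addr;
    };

    /* Zero bytes in size, so the section layout with real events is
     * unaffected, yet the linker always emits start/stop symbols. */
    static struct trace_event __event_table_dummy[0]
        __attribute__((used, section("__event_table")));
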
1335 commit cbc5d49e4973400737aab50b60dc5d86e71f5420
1336 Author: Andrea Bastoni <bastoni@cs.unc.edu>
1337 Date:   Sat Jun 19 13:45:36 2010 -0400
1338 
1339     Bugfix: avoid conditional compilation dependent error
1340 
1341     If RELEASE_MASTER is not selected the "info" hrtimer_start_on_info
 1342     structure in the release_heap structure is not visible, and trying to access
1343     "info" from reinit_release_heap() causes the following error:
1344 
1345     error: 'struct release_heap' has no member named 'info'
1346 
1347     info should not be referenced if RELEASE_MASTER is not used.
1348 
1349     The problem was first reported by Glenn <gelliott@cs.unc.edu>
1350 
1351 commit d1aa1956eb23202e4d614574f686e53b8785212c
1352 Author: Andrea Bastoni <bastoni@cs.unc.edu>
1353 Date:   Sat Jun 12 20:22:16 2010 -0400
1354 
1355     Bugfix: change __ARCH_HAS_SEND_PULL_TIMERS in CONFIG_ARCH_HAS_SEND_PULL_TIMERS
1356 
1357     Commit "0c527966 Make release master support optional" uses
1358     __ARCH_HAS_SEND_PULL_TIMERS instead of CONFIG_ARCH_HAS_SEND_PULL_TIMERS
 1359     (introduced in commit 0fb33c99) to conditionally compile pull-timer-
 1360     related code in rt_domain.c. This code is disabled and the pull timer's
1361     state is no longer properly reset. Therefore, a pulled timer cannot be
1362     armed anymore.
1363 
1364 commit 9840983a4f30145bcf0b82b6e2bc8518e7212fb5
1365 Author: Andrea Bastoni <bastoni@cs.unc.edu>
1366 Date:   Wed Jun 2 18:27:47 2010 -0400
1367 
1368     Make litmus_sched_class static
1369 
1370     litmus_sched_class wasn't declared static, but it's not used outside
 1371     sched.c, so change its declaration to static.
1372 
1373 commit 753fb14dfb0662e1d38758ffc6876c0ab1c7bd9e
1374 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
1375 Date:   Mon May 31 12:52:35 2010 -0400
1376 
 1377     Make platform-specific Feather-Trace depend on !CONFIG_DEBUG_RODATA
1378 
1379     Feather-Trace rewrites instructions in the kernel's .text segment.
1380     This segment may be write-protected if CONFIG_DEBUG_RODATA is selected.
1381     In this case, fall back to the default flag-based Feather-Trace
1382     implementation. In the future, we could either adopt the ftrace method
1383     of rewriting .text addresses using non-.text mappings or we could
1384     consider replacing Feather-Trace with ftrace altogether.
1385 
1386     For now, this patch avoids unexpected runtime errors.
1387 
1388 commit 62c186fde48926a30f4e61332a805430dc1325cd
1389 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
1390 Date:   Mon May 31 12:15:51 2010 -0400
1391 
1392     Make PFAIR optional to prevent build and runtime failures.
1393 
1394     The PFAIR plugin always implicitly assumed !NO_HZ (the schedule
 1395     is wrong if NO_HZ is enabled) and does not build if hrtimers are absent:
1396 
1397     > litmus/built-in.o: In function `pfair_activate_plugin':
1398     > sched_pfair.c:(.text+0x7f07): undefined reference to `cpu_stagger_offset'
1399     > litmus/built-in.o: In function `init_pfair':
1400     > sched_pfair.c:(.init.text+0x487): undefined reference to `cpu_stagger_offset'
1401 
1402     cpu_stagger_offset() is only available if hrtimers are enabled.
1403 
1404     This patch makes these dependencies explicit.
1405 
1406 commit 4382e90cf851fc1d209a466bab92e256aeb7acf1
1407 Author: Andrea Bastoni <bastoni@cs.unc.edu>
1408 Date:   Mon May 31 00:54:07 2010 -0400
1409 
1410     Make C-EDF depend on x86 and SYSFS
1411 
1412     C-EDF depends on intel_cacheinfo.c (for get_shared_cpu_map()) which is
1413     only available on x86 architectures. Furthermore, get_shared_cpu_map()
1414     is only available if SYSFS filesystem is present.
1415 
1416 commit 8bf9de45b663e4b9ce889eb24929ce773f306339
1417 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
1418 Date:   Sun May 30 19:50:52 2010 -0400
1419 
1420     Make smp_send_pull_timers() optional.
1421 
 1422     There is currently no need to implement this on ARM.
1423     So let's make it optional instead.
1424 
1425 commit cedc8df1cf1ff935af5455a9d565dac05192a47f
1426 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
1427 Date:   Sun May 30 19:46:21 2010 -0400
1428 
1429     Make release master support optional
1430 
1431     Introduces CONFIG_RELEASE_MASTER and makes release
1432     master support dependent on the new symbol. This is
1433     useful because dedicated interrupt handling only applies
1434     to "large" multicore platforms. This allows us to avoid
1435     implementing smp_send_pull_timers() on every platform.
1436 
1437 commit 5b54b24c13b7c5dbaa06eae5e1a0075da354289c
1438 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
1439 Date:   Sun May 30 18:59:30 2010 -0400
1440 
1441     Make compilation of C-EDF optional.
1442 
1443     C-EDF only makes sense on multicore platforms that have shared caches.
1444     Make it possible to disable it on other platforms, in particular,
1445     on those that do not export get_shared_cpu_map().
1446 
1447 commit 152968b15afb74a6adba6d512c5eebf0280c8f00
1448 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
1449 Date:   Sun May 30 18:41:28 2010 -0400
1450 
1451     Make __ARCH_HAS_FEATHER_TRACE a proper CONFIG_ variable.
1452 
1453     The idea of the Feather-Trace default implementation is that LITMUS^RT should
1454     work without a specialized Feather-Trace implementation present. This was
1455     actually broken.
1456 
1457     Changes litmus/feather_trace.h to only include asm/feather_trace.h if actually
1458     promised by the architecture.
1459 
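    As a rough illustration of the include pattern described above (a sketch only,
    not the actual contents of litmus/feather_trace.h; the exact guard symbol is
    an assumption based on the ARCH_HAS_FEATHER_TRACE option added by this patch):

        #ifdef CONFIG_ARCH_HAS_FEATHER_TRACE
        #include <asm/feather_trace.h>      /* architecture-specific events */
        #else
        /* fall back to the generic, flag-based Feather-Trace implementation */
        #endif
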
1460 commit a7205820bae197a89fc746f9f3c07e389d7068ba
1461 Author: Andrea Bastoni <bastoni@cs.unc.edu>
1462 Date:   Fri May 28 15:45:23 2010 -0400
1463 
1464     Bugfix: re-insert missing TS_PLUGIN_TICK_END tracing point
1465 
1466     Insert PLUGIN_TICK_END tracing point in litmus_tick(). It was lost during
1467     the porting of 2008.3 to 2010.1.
1468 
1469 commit de2d5dfa2dce8ec40555b3bb6dfe21627e472c52
1470 Author: Andrea Bastoni <bastoni@cs.unc.edu>
1471 Date:   Thu May 20 16:14:00 2010 -0400
1472 
1473     Add support for one single cluster (all cpus) on C-EDF
1474 
1475     - With the "ALL" cluster size option the behavior of C-EDF is
1476       equivalent to G-EDF (one single cluster)
1477 
1478 commit 6f89d4f31485546674187cf3b4d472f230b263d0
1479 Author: Glenn Elliott <gelliott@koruna.cs.unc.edu>
1480 Date:   Thu May 20 14:33:27 2010 -0400
1481 
1482     Added support for choices in budget policy enforcement.
1483 
1484     NO_ENFORCEMENT - A job may execute beyond its declared execution time.
1485       Jobs notify the kernel that they are complete via liblitmus's
1486       sleep_next_period()
1487     QUANTUM_ENFORCEMENT - The kernel terminates a job if its actual execution
1488       time exceeds the declared execution time.
1489     PRECISE_ENFORCEMENT - Hook declared, but not yet implemented.  Plan to
1490       support this policy through hrtimers.  An error is returned if specified.
1491 
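    For illustration only, a minimal sketch of how such a per-task budget policy
    might be modeled and checked each tick (hypothetical names, not the actual
    LITMUS^RT interface):

        enum budget_policy { NO_ENFORCEMENT, QUANTUM_ENFORCEMENT, PRECISE_ENFORCEMENT };

        struct budget_state {
            enum budget_policy policy;
            unsigned long long exec_cost;   /* declared worst-case execution time */
            unsigned long long exec_time;   /* time consumed by the current job   */
        };

        /* Returns nonzero if the current job has exhausted its budget. */
        static int budget_exhausted(const struct budget_state *b)
        {
            if (b->policy == NO_ENFORCEMENT)
                return 0;   /* job signals completion itself via sleep_next_period() */
            return b->exec_time >= b->exec_cost;
        }
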
1492 commit 521422c4ef2c64731f709030915a7b301709f4b4
1493 Author: Andrea Bastoni <bastoni@cs.unc.edu>
1494 Date:   Sat May 29 23:53:40 2010 -0400
1495 
1496     Update kfifo and spinlock_t in sched_trace.c
1497 
1498     - kfifo needs to be defined and used differently (see include/linux/kfifo.h)
1499     - spinlock -> raw_spinlock
1500     - include slab.h when using kmalloc and friends
1501 
1502     This commit compiles and is the logical end of the merge of Litmus and
1503     2.6.34.
1504 
1505 commit 8e9830a5bdb081fd3f4387db3a3838a687dfdad2
1506 Author: Andrea Bastoni <bastoni@cs.unc.edu>
1507 Date:   Sat May 29 23:50:17 2010 -0400
1508 
1509     Update sched_class and spinlock_t in litmus.c
1510 
1511     - get_rr_interval() changed signature
1512     - load_balance() and move_one_task() are no longer needed
1513     - spinlock_t -> raw_spinlock_t
1514 
1515     This commit does not compile.
1516 
1517 commit a66246f9e973a68fb9955a2fa7663a2e02afbd30
1518 Author: Andrea Bastoni <bastoni@cs.unc.edu>
1519 Date:   Sat May 29 23:45:13 2010 -0400
1520 
1521     Change most LitmusRT spinlock_t to raw_spinlock_t
1522 
1523     Adapt to the new spinlock naming scheme:
1524     (tglx 20091217)
1525     spinlock - the weakest one, which might sleep in RT
1526     raw_spinlock - spinlock which always spins even on RT
1527     arch_spinlock - the hardware level architecture dependent implementation
1528 
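    A minimal sketch of the resulting raw_spinlock usage pattern (generic kernel
    API, not an excerpt from this patch):

        #include <linux/spinlock.h>

        static DEFINE_RAW_SPINLOCK(state_lock);

        static void touch_scheduler_state(void)
        {
            unsigned long flags;

            /* must keep spinning even on PREEMPT_RT, hence raw_spin_* */
            raw_spin_lock_irqsave(&state_lock, flags);
            /* ... manipulate state that may also be touched from IRQ context ... */
            raw_spin_unlock_irqrestore(&state_lock, flags);
        }
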
1529     ----
1530 
1531     Most probably, all the spinlocks changed by this commit will be true
1532     spinning locks (raw_spinlock) in PreemptRT (so hopefully we'll need few
1533     changes when porting Litmus to PreemptRT).
1534 
1535     There are a couple of spinlocks that the kernel still defines as
1536     spinlock_t (therefore no changes reported in this commit) that might cause
1537     us trouble:
1538 
1539     - wait_queue_t lock is defined as spinlock_t; it is used in:
1540       * fmlp.c -- sem->wait.lock
1541       * sync.c -- ts_release.wait.lock
1542 
1543     - rwlock_t used in fifo implementation in sched_trace.c
1544       * this probably needs to be changed to something always spinning in RT
1545         at the expense of increased locking time.
1546 
1547     ----
1548 
1549     This commit also fixes warnings and errors due to the need to include
1550     slab.h when using kmalloc() and friends.
1551 
1552     ----
1553 
1554     This commit does not compile.
1555 
1556 commit 6ffc1fee98c4b995eb3a0285f4f8fb467cb0306e
1557 Merge: e40152e 7c1ff4c
1558 Author: Andrea Bastoni <bastoni@cs.unc.edu>
1559 Date:   Sat May 29 23:35:01 2010 -0400
1560 
1561     Merge branch 'master' into wip-merge-2.6.34
1562 
1563     Simple merge between master and 2.6.34 with conflicts resolved.
1564 
1565     This commit does not compile, the following main problems are still
1566     unresolved:
1567 
1568     - spinlock -> raw_spinlock API changes
1569     - kfifo API changes
1570     - sched_class API changes
1571 
1572     Conflicts:
1573     	Makefile
1574     	arch/x86/include/asm/hw_irq.h
1575     	arch/x86/include/asm/unistd_32.h
1576     	arch/x86/kernel/syscall_table_32.S
1577     	include/linux/hrtimer.h
1578     	kernel/sched.c
1579     	kernel/sched_fair.c
1580 
1581 commit 7c1ff4c544dd650cceff3cd69a04bcba60856678
1582 Author: Andrea Bastoni <bastoni@cs.unc.edu>
1583 Date:   Fri May 28 10:51:01 2010 -0400
1584 
1585     Add C-EDF Plugin
1586 
1587     Improved C-EDF plugin. C-EDF now supports different cluster sizes (based
1588     on L2 and L3 cache sharing) and supports dynamic changes of cluster size
1589     (this requires reloading the plugin).
1590 
1591 commit 425a6b5043bcc2142804107c853f978ac2fe3040
1592 Author: Andrea Bastoni <bastoni@cs.unc.edu>
1593 Date:   Fri May 28 10:49:09 2010 -0400
1594 
1595     Export shared_cpu_map
1596 
1597     The cpumap of CPUs that share the same cache level is not normally
1598     available outside intel_cacheinfo.c. This commit makes it possible to
1599     export this map.
1600 
1601 commit f85625ccf28d1bffd4dac916babb76b910ebef31
1602 Author: Andrea Bastoni <bastoni@cs.unc.edu>
1603 Date:   Tue Apr 27 11:00:19 2010 -0400
1604 
1605     Synchronize plugin switching
1606 
1607     Make sure the plugin is not in use on any CPU while switching.
1608     The CPU performing the switch sends an IPI to all other CPUs forcing
1609     them to synchronize on an atomic variable.
1610 
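    A rough sketch of the synchronization idea (hypothetical names, not the code
    added by this commit): remote CPUs spin in an IPI handler on an atomic flag
    until the switching CPU has finished the swap.

        #include <linux/atomic.h>
        #include <linux/smp.h>

        static atomic_t switch_in_progress = ATOMIC_INIT(0);

        static void wait_for_switch(void *unused)
        {
            /* runs on every other CPU, interrupts off */
            while (atomic_read(&switch_in_progress))
                cpu_relax();
        }

        static void switch_plugin_sketch(void (*do_switch)(void))
        {
            atomic_set(&switch_in_progress, 1);
            /* wait = 0: IPI handlers keep spinning until we clear the flag */
            smp_call_function(wait_for_switch, NULL, 0);
            do_switch();    /* no other CPU can be inside the plugin here */
            atomic_set(&switch_in_progress, 0);
        }
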
1611 commit 8fe2fb8bb1c1cd0194608bc783d0ce7029e8d869
1612 Author: Andrea Bastoni <bastoni@cs.unc.edu>
1613 Date:   Mon Apr 26 13:42:00 2010 -0400
1614 
1615     Measure timer re-arming in the proper location
1616 
1617     hrtimers are properly rearmed during arm_release_timer() and no longer
1618     after rescheduling (with the norqlock mechanism of 2008.3). This commit
1619     accordingly updates the locations where measurements are taken.
1620 
1621 commit 5da9b3e7aab0755f6ca19738d33e218e02b19a41
1622 Author: Andrea Bastoni <bastoni@cs.unc.edu>
1623 Date:   Fri Mar 12 12:23:13 2010 -0500
1624 
1625     Bugfix: PSN-EDF should log job_completion events
1626 
1627     Log task completions in job_completion() for PSN-EDF.
1628     This fixes the problem of missing job-completion events for PSN-EDF.
1629 
1630 commit 7a4affe47db86075eb36519049d047f6facab378
1631 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
1632 Date:   Tue Mar 2 11:51:07 2010 -0500
1633 
1634     Bugfix: PSN-EDF should only requeue tasks that are not scheduled
1635 
1636     Requeuing a task that is already scheduled causes it to be effectively
1637     in the runqueue twice, since scheduled tasks are conceptually the head
1638     of the queue. If a task is still scheduled, then schedule() will do the
1639     right thing and requeue it if necessary.
1640 
1641     This fixes crashes reported by Glenn and Andrea.
1642 
1643 commit 0c1a489cb92c996d50adfb84fee5edd7205e0c1b
1644 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
1645 Date:   Thu Feb 4 19:47:29 2010 -0500
1646 
1647     Used miscdevice API for sched_trace
1648 
1649     This patch changes sched_trace.c to use the miscdevice API
1650     instead of doing all the cdev management ourselves. This removes a
1651     chunk of code, and we get sysfs/udev integration for free.
1652 
1653     On systems with default udev rules, this will result in a /dev/litmus/log
1654     device being created automatically.
1655 
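    The general pattern behind this change (standard kernel miscdevice API; the
    identifiers below are illustrative, not copied from sched_trace.c):

        #include <linux/fs.h>
        #include <linux/init.h>
        #include <linux/miscdevice.h>
        #include <linux/module.h>

        static const struct file_operations log_fops = {
            .owner = THIS_MODULE,
            /* .open, .read, .release, ... */
        };

        static struct miscdevice log_dev = {
            .minor = MISC_DYNAMIC_MINOR,
            .name  = "litmus/log",        /* udev creates /dev/litmus/log */
            .fops  = &log_fops,
        };

        static int __init log_init(void)
        {
            return misc_register(&log_dev);   /* replaces manual cdev management */
        }
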
1656 commit 8815090d72fe0fe8f5f67e3bcc8fbe7a5ad1704d
1657 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
1658 Date:   Thu Feb 25 19:33:22 2010 -0500
1659 
1660     Bugfix: make fdso syscalls 64bit clean
1661 
1662     This fixes a bug found by liblitmus's regression test suite.
1663     Before:
1664     > ** LITMUS^RT test suite.
1665     > ** Running tests for LINUX.
1666     > ** Testing: don't open FMLP semaphores if FMLP is not supported...
1667     > !! TEST FAILURE open_fmlp_sem(fd, 0) -> -16, Success (expected: EBUSY)
1668     >    at tests/fdso.c:21 (test_fmlp_not_active)
1669     > ** Testing: reject invalid object descriptors... ok.
1670     > ** Testing: reject invalid object types...
1671     > !! TEST FAILURE od_open(0, -1, 0) -> -22, Bad file descriptor (expected: EINVAL)
1672     >    at tests/fdso.c:51 (test_invalid_obj_type)
1673     > ** Testing: reject invalid rt_task pointers... ok.
1674     > ** Result: 2 ok, 2 failed.
1675 
1676     After:
1677     > ** LITMUS^RT test suite.
1678     > ** Running tests for LINUX.
1679     > ** Testing: don't open FMLP semaphores if FMLP is not supported... ok.
1680     > ** Testing: reject invalid object descriptors... ok.
1681     > ** Testing: reject invalid object types... ok.
1682     > ** Testing: reject invalid rt_task pointers... ok.
1683     > ** Result: 4 ok, 0 failed.
1684 
1685 commit 8ad3e04b815e44d084b855cfa3dcda260cdf56ae
1686 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
1687 Date:   Fri Feb 19 13:42:35 2010 -0500
1688 
1689     Bugfix: don't inherit od_table across forks
1690 
1691     The od_table is strictly per-thread and should not be inherited across
1692     a fork/clone. This caused memory corruption when a task exited, which
1693     ultimately could lead to oopses in unrelated code.
1694 
1695     Bug and testcase initially reported by Glenn.
1696 
1697 commit 944f051fda9551483399bed556870b0895df1efa
1698 Author: Glenn Elliott <gelliott@cs.unc.edu>
1699 Date:   Fri May 28 10:39:56 2010 -0400
1700 
1701     Bugfix: 1) incorrect FMLP high prio task tracking and 2) race in print statement
1702 
1703     1) The high-priority task tied to an FMLP semaphore under P-EDF scheduling
1704        is tracked incorrectly for tasks acquiring the lock without
1705        contention.  (HP is always set to CPU 0 instead of the proper CPU.)
1706     2) Race in a print statement from P-EDF's pi_block() causes NULL
1707        pointer dereference.
1708 
1709 commit 9039e5f731ca5f9a0c69f8523ccfee044111d2e3
1710 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
1711 Date:   Wed Feb 3 19:56:21 2010 -0500
1712 
1713     Use generic preemption function in GSN- and PSN-EDF.
1714 
1715     This patch updates non-preemptive section support in
1716     GSN- and PSN-EDF.
1717 
1718 commit f3a6cb9af5cdb01f29ad32b01aa56a14f0da144e
1719 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
1720 Date:   Wed Feb 3 19:42:02 2010 -0500
1721 
1722     Introduce generic NP-section aware preemption function
1723 
1724     Dealing with preemptions across CPUs in the presence of non-preemptive
1725     sections can be tricky, and that logic should not be replicated across (event-driven) plugins.
1726 
1727     This patch introduces a generic preemption function that handles
1728     non-preemptive sections (hopefully) correctly.
1729 
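    Conceptually, the check looks something like the sketch below (hypothetical
    helpers, not the function introduced by this patch): preempt immediately if
    the remote task is outside a non-preemptive section, otherwise record a
    delayed preemption for the task to honor when it leaves the section.

        #include <linux/sched.h>
        #include <linux/smp.h>

        /* Hypothetical stand-ins for whatever the plugin uses to track
         * non-preemptive sections and delayed preemptions. */
        static bool in_np_section(struct task_struct *t);
        static void mark_delayed_preemption(struct task_struct *t);

        static void preempt_remote_task(int cpu, struct task_struct *t)
        {
            if (!in_np_section(t))
                smp_send_reschedule(cpu);   /* safe to preempt right away */
            else
                mark_delayed_preemption(t); /* task yields when it exits the NP section */
        }
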
1730 commit fb95c290fe461de794c984bc4130741f04f9142d
1731 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
1732 Date:   Wed Feb 3 19:40:01 2010 -0500
1733 
1734     Re-implement non-preemptive section support.
1735 
1736     Re-introduce NP sections in the configuration and in litmus.h. Remove the old
1737     np_flag from rt_param.
1738 
1739     If CONFIG_NP_SECTION is disabled, then all non-preemptive section checks are
1740     constant expressions which should get removed by dead-code elimination
1741     during optimization.
1742 
1743     Instead of re-implementing sys_exit_np(), we simply repurposed sched_yield()
1744     for calling into the scheduler to trigger delayed preemptions.
1745 
1746 commit b973c95c86e6710c913c01a67013605f68a3c2c3
1747 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
1748 Date:   Wed Feb 3 19:35:20 2010 -0500
1749 
1750     Add virtual LITMUS^RT control device.
1751 
1752     This device only supports mmap()'ing a single page.
1753     This page is shared RW between the kernel and userspace.
1754     It is intended to allow near-zero-overhead communication
1755     between the kernel and userspace. Its first use will be a
1756     proper implementation of user-signaled
1757     non-preemptable section support.
1758 
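    A userspace-side sketch of how such a shared control page could be used (the
    device path and page layout below are assumptions for illustration, not taken
    from this patch):

        #include <fcntl.h>
        #include <stdint.h>
        #include <sys/mman.h>
        #include <unistd.h>

        struct control_page {             /* assumed layout */
            uint32_t np_flag;             /* task is inside a non-preemptive section */
            uint32_t delayed_preemption;  /* kernel asks the task to yield on NP exit */
        };

        static struct control_page *map_control_page(void)
        {
            void *page;
            int fd = open("/dev/litmus/ctrl", O_RDWR);  /* assumed device node */

            if (fd < 0)
                return NULL;
            page = mmap(NULL, getpagesize(), PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, 0);
            if (page == MAP_FAILED) {
                close(fd);
                return NULL;
            }
            return page;
        }
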
1759 commit 5e987d486c0f89d615d134512938fc1198b3ca67
1760 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
1761 Date:   Fri Jan 29 19:25:26 2010 -0500
1762 
1763     Bugfix: clear LITMUS^RT state on fork completely
1764 
1765     When a real-time task forks, its LITMUS^RT-specific fields should be cleared,
1766     because we don't want real-time tasks to spawn new real-time tasks that bypass
1767     the plugin's admission control (if any).
1768 
1769     This was broken in three ways:
1770     1) kernel/fork.c did not erase all of tsk->rt_param, only the first few bytes due to
1771        a wrong size argument to memset().
1772     2) It should have been calling litmus_fork() instead anyway.
1773     3) litmus_fork() was _also_ not clearing all of tsk->rt_param, due to another size
1774        argument bug.
1775 
1776     Interestingly, 1) and 2) can be traced back to the 2007->2008 port,
1777     whereas 3) was added by Mitchell much later on (to dead code, no less).
1778 
1779     I'm really surprised that this never blew up before.
1780 
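    One common way a wrong-size memset() of this kind arises (illustrative only,
    with made-up structure names; not the actual diff):

        #include <string.h>

        struct rt_params { long fields[16]; };
        struct task { struct rt_params rt_param; };

        static void clear_rt_state(struct task *tsk)
        {
            /* buggy: zeroes only sizeof(struct task *) bytes of rt_param */
            memset(&tsk->rt_param, 0, sizeof(tsk));
            /* correct: zeroes the entire structure */
            memset(&tsk->rt_param, 0, sizeof(tsk->rt_param));
        }
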
1781 commit 37b840336a1663a5ce62d663a702d9afefd56d23
1782 Author: Andrea Bastoni <bastoni@cs.unc.edu>
1783 Date:   Mon Feb 1 23:07:54 2010 -0500
1784 
1785     Add Feather-Trace x86_64 architecture dependent code
1786 
1787 commit d1a840d7194fdd09c1bd9977e30fd391ef2a7526
1788 Author: Andrea Bastoni <bastoni@cs.unc.edu>
1789 Date:   Tue Jan 19 19:38:14 2010 -0500
1790 
1791     [ported from 2008.3] Add Feather-Trace x86_32 architecture dependent code
1792 
1793     - [ported from 2008.3] Add x86_32 architecture dependent code.
1794     - Add the infrastructure for x86_32 - x86_64 integration.
1795 
1796 commit 07ae7efcb81f95eb8e870cad21c7ba72573af7e8
1797 Author: Andrea Bastoni <bastoni@cs.unc.edu>
1798 Date:   Thu Dec 17 21:48:38 2009 -0500
1799 
1800     Add support for x86_64 architecture
1801 
1802     - Add syscall on x86_64
1803 
1804     - Refactor __NR_sleep_next_period -> __NR_complete_job
1805       for both x86_32 and x86_64
1806 
1807 commit 5306b9834e9660e370fb8430ff22d4a47b4bbdf5
1808 Author: Andrea Bastoni <bastoni@cs.unc.edu>
1809 Date:   Thu Dec 17 21:47:51 2009 -0500
1810 
1811     Add pull_timers_interrupt() to x86_64
1812 
1813     Add apic interrupt vector for pull_timers() in x86_64 arch.
1814 
1815 commit b30bc467e88c9f1e6335ac7d442d0190bf6f6a2e
1816 Author: Andrea Bastoni <bastoni@cs.unc.edu>
1817 Date:   Thu Jan 28 19:03:17 2010 -0500
1818 
1819     [ported from 2008.3] Add PSN-EDF Plugin
1820 
1821 commit c2f4c165b208062d90f65a1c1a0c815261c6a81e
1822 Author: Andrea Bastoni <bastoni@cs.unc.edu>
1823 Date:   Wed Jan 27 19:57:09 2010 -0500
1824 
1825     [ported from 2008.3] Add PFAIR plugin
1826 
1827 commit cddade083e5ea74cba6f0e4b2fa10c6bbec1336c
1828 Author: Andrea Bastoni <bastoni@cs.unc.edu>
1829 Date:   Sat Jan 16 19:39:40 2010 -0500
1830 
1831     Add optional dynamic assignment of tracing devices major nr
1832 
1833     Setting FT_TASK_TRACE_MAJOR, LOG_MAJOR, FT_TRACE_MAJOR to 0
1834     allows them to be assigned automatically by the kernel
1835 
1836 commit a084c01569bcfe13fd880a0b1e3a9026629a89da
1837 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
1838 Date:   Fri May 28 10:30:29 2010 -0400
1839 
1840     Better explanation of jump-to-CFS optimization removal
1841 
1842     GSN-EDF and friends rely on being called even if there is currently
1843     no runnable real-time task on the runqueue for (at least) two reasons:
1844     1) To initiate migrations. LITMUS^RT pulls tasks for migrations; this requires
1845          plugins to be called even if no task is currently present.
1846     2) To maintain invariants when jobs block.
1847 
1848 commit e68debebdc2983600063cd6b04c6a51c4b7ddcc1
1849 Author: Andrea Bastoni <bastoni@cs.unc.edu>
1850 Date:   Fri May 28 10:25:34 2010 -0400
1851 
1852     Integrate litmus_tick() into task_tick_litmus()
1853 
1854     - remove the call to litmus_tick() from scheduler_tick() just after
1855       having performed the class task_tick() and integrate
1856       litmus_tick() into task_tick_litmus()
1857 
1858     - task_tick_litmus() is the handler for the litmus class task_tick()
1859       method. It is called in non-queued mode from scheduler_tick()
1860 
1861 commit 9ac80419f88f192cdf586da3df585c224ef27773
1862 Author: Bjoern B. Brandenburg <bbb@cs.unc.edu>
1863 Date:   Wed Feb 3 13:59:40 2010 -0500
1864 
1865     Turn off GSN-EDF TRACE() spam by default.
1866 
1867     Having GSN-EDF log so many things each tick is useful
1868     when tracking down race conditions, but it also makes
1869     it really hard to find anything else. Thus, turn it off by
1870     default but leave it in for future debugging fun.
1871 
1872 commit ee09f78d8faa0b988088d93142e6f5f8a6e75394
1873 Author: Andrea Bastoni <bastoni@cs.unc.edu>
1874 Date:   Mon Dec 21 12:23:57 2009 -0500
1875 
1876     Refactor binomial heap names: heap -> bheap
1877 
1878     - Binomial heap "heap" names conflicted with the priority heap
1879       of cgroups in the kernel
1880     - This patch changes binomial heap "heap" names to "bheap"
1881 
1882 commit 0b28a3122d6917784701377e15a863489aee1c6c
1883 Author: Andrea Bastoni <bastoni@cs.unc.edu>
1884 Date:   Thu Dec 17 21:47:19 2009 -0500
1885 
1886     [ported from 2008.3] Add release-master support
1887 
1888 commit c15be843778236e9f2fdbc207ab36ba996b2bb1b
1889 Author: Andrea Bastoni <bastoni@cs.unc.edu>
1890 Date:   Thu Dec 17 21:45:38 2009 -0500
1891 
1892     [ported from 2008.3] Add hrtimer_start_on() API
1893 
1894 commit b085cafc43bc395e255626204169e20a587f28ba
1895 Author: Andrea Bastoni <bastoni@cs.unc.edu>
1896 Date:   Thu Dec 17 21:44:47 2009 -0500
1897 
1898     [ported from 2008.3] Add send_pull_timers() support for x86_32 arch
1899 
1900 commit 50ca05ff9cc85176c3ee18bf1363d3d7c34aa355
1901 Author: Andrea Bastoni <bastoni@cs.unc.edu>
1902 Date:   Thu Dec 17 21:39:14 2009 -0500
1903 
1904     [ported from 2008.3] Add GSN-EDF plugin
1905 
1906     - insert arm_release_timer() in the add_release() path
1907     - arm_release_timer() uses __hrtimer_start_range_ns() instead of
1908       hrtimer_start() to avoid deadlock on rq->lock.
1909 
1910 commit 2a94c7bf9869a13e32de7a1fe94596de7b4789a8
1911 Author: Andrea Bastoni <bastoni@cs.unc.edu>
1912 Date:   Fri May 28 10:03:24 2010 -0400
1913 
1914     [ported from 2008.3] Add LITMUS^RT syscalls to x86_32
1915 
1916 commit 269cf3c49cef2b23605e98ad4a8133357bebaac0
1917 Author: Andrea Bastoni <bastoni@cs.unc.edu>
1918 Date:   Thu Dec 17 21:36:40 2009 -0500
1919 
1920     [ported from 2008.3] Add FMLP support
1921 
1922 commit 5442a8adfce93c1cd556e04bfc0a118adc3b683e
1923 Author: Andrea Bastoni <bastoni@cs.unc.edu>
1924 Date:   Thu Dec 17 21:34:09 2009 -0500
1925 
1926     [ported from 2008.3] Add Stack Resource Policy (SRP) support
1927 
1928 commit fa3c94fc9cd1619fe0dd6081a1a980c09ef3e119
1929 Author: Andrea Bastoni <bastoni@cs.unc.edu>
1930 Date:   Thu Dec 17 21:33:26 2009 -0500
1931 
1932     [ported from 2008.3] Add File Descriptor Attached Shared Objects (FDSO) infrastructure
1933 
1934 commit f5936ecf0cff0b94419b6768efba3e15622beeb6
1935 Author: Andrea Bastoni <bastoni@cs.unc.edu>
1936 Date:   Thu Dec 17 21:32:31 2009 -0500
1937 
1938     [ported from 2008.3] Add common EDF functions
1939 
1940 commit 53696c1fe6a6ada66f2a47c078d62aee40ad8ebe
1941 Author: Andrea Bastoni <bastoni@cs.unc.edu>
1942 Date:   Thu Dec 17 21:31:46 2009 -0500
1943 
1944     [ported from 2008.3] Add rt_domain_t support
1945 
1946     Still to be merged:
1947     - arm_release_timer() with no rq locking
1948 
1949 commit 4e593e7105dec02e62ea7a1812dccb35a0d56d01
1950 Author: Andrea Bastoni <bastoni@cs.unc.edu>
1951 Date:   Thu Dec 17 21:30:47 2009 -0500
1952 
1953     [ported from 2008.3] Add support for quantum alignment
1954 
1955 commit 1d823f50678d7cc3bf72bf89ec0bddc7338e23d5
1956 Author: Andrea Bastoni <bastoni@cs.unc.edu>
1957 Date:   Thu Dec 17 21:30:11 2009 -0500
1958 
1959     [ported from 2008.3] Add synchronous task release API
1960 
1961 commit 59d8d4c53f1e9f6408b87fc22e319e78f664276f
1962 Author: Andrea Bastoni <bastoni@cs.unc.edu>
1963 Date:   Thu Dec 17 21:29:31 2009 -0500
1964 
1965     [ported from 2008.3] Add complete_n() call
1966 
1967 commit 2079f38466395c64ef40ef3429ee52fd92cdbd99
1968 Author: Andrea Bastoni <bastoni@cs.unc.edu>
1969 Date:   Sat Jan 16 19:36:50 2010 -0500
1970 
1971     Move sched_trace ring buffer to kfifo implementation
1972 
1973     Use kfifo [kernel/kfifo.c] to implement the ring buffer used
1974     for sched_trace (TRACE() and TRACE_TASK() macros)
1975 
1976     This patch also includes some reorganization of sched_trace.c code
1977     and some fixes:
1978 
1979     - 1c39c59b3 Fix GFP_KERNEL in rb_alloc_buf with interrupt disabled.
1980     - 193ad2688 Let TRACE() log buffer size and comment converge.
1981     - 6195e2ae8 re-enable capturing of printk() messages in TRACE() logs.
1982 
1983 commit 96979188007a0671d3f067d7edf144742d7433ee
1984 Author: Andrea Bastoni <bastoni@cs.unc.edu>
1985 Date:   Thu Dec 17 21:26:50 2009 -0500
1986 
1987     [ported from 2008.3] Add tracing support and hook up Litmus KConfig for x86
1988 
1989     - fix requesting more than 2^11 pages (MAX_ORDER)
1990       from the system allocator
1991 
1992     Still to be merged:
1993     - feather-trace generic implementation
1994 
1995 commit cf3f4bd8db320f3f487d66bdec924e926f004787
1996 Author: Andrea Bastoni <bastoni@cs.unc.edu>
1997 Date:   Thu Dec 17 21:24:47 2009 -0500
1998 
1999     [ported from 2008.3] Add Feather-Trace device file support
2000 
2001 commit 4b38febbd59fd33542a343991262119eb9860f5e
2002 Author: Andrea Bastoni <bastoni@cs.unc.edu>
2003 Date:   Thu Dec 17 21:23:36 2009 -0500
2004 
2005     [ported from 2008.3] Core LITMUS^RT infrastructure
2006 
2007     Port 2008.3 Core LITMUS^RT infrastructure to Linux 2.6.32
2008 
2009     litmus_sched_class implements 4 new methods:
2010 
2011     - prio_changed:
2012       void
2013 
2014     - switched_to:
2015       void
2016 
2017     - get_rr_interval:
2018       return infinity (i.e., 0)
2019 
2020     - select_task_rq:
2021       return current cpu
2022 ---
2023  Makefile                                    |    4 +-
2024  arch/arm/Kconfig                            |    8 +
2025  arch/arm/include/asm/timex.h                |    2 +
2026  arch/arm/include/asm/unistd.h               |    3 +
2027  arch/arm/kernel/calls.S                     |   12 +
2028  arch/arm/kernel/smp.c                       |    4 +
2029  arch/arm/mach-realview/include/mach/timex.h |   27 +
2030  arch/x86/Kconfig                            |    8 +
2031  arch/x86/include/asm/entry_arch.h           |    1 +
2032  arch/x86/include/asm/feather_trace.h        |   17 +
2033  arch/x86/include/asm/feather_trace_32.h     |   79 +
2034  arch/x86/include/asm/feather_trace_64.h     |   67 +
2035  arch/x86/include/asm/hw_irq.h               |    3 +
2036  arch/x86/include/asm/irq_vectors.h          |    7 +
2037  arch/x86/include/asm/processor.h            |    4 +
2038  arch/x86/include/asm/unistd_32.h            |    6 +-
2039  arch/x86/include/asm/unistd_64.h            |    4 +
2040  arch/x86/kernel/Makefile                    |    2 +
2041  arch/x86/kernel/cpu/intel_cacheinfo.c       |   17 +
2042  arch/x86/kernel/entry_64.S                  |    2 +
2043  arch/x86/kernel/ft_event.c                  |  118 ++
2044  arch/x86/kernel/irq.c                       |    4 +
2045  arch/x86/kernel/irqinit.c                   |    3 +
2046  arch/x86/kernel/smp.c                       |   31 +
2047  arch/x86/kernel/syscall_table_32.S          |   13 +
2048  fs/exec.c                                   |   13 +-
2049  fs/inode.c                                  |    2 +
2050  include/linux/completion.h                  |    2 +
2051  include/linux/fs.h                          |   21 +-
2052  include/linux/hardirq.h                     |    4 +
2053  include/linux/hrtimer.h                     |   32 +
2054  include/linux/interrupt.h                   |   12 +-
2055  include/linux/mutex.h                       |   10 +
2056  include/linux/sched.h                       |   19 +-
2057  include/linux/semaphore.h                   |    9 +
2058  include/linux/smp.h                         |    5 +
2059  include/linux/tick.h                        |    5 +
2060  include/linux/workqueue.h                   |   18 +
2061  include/litmus/affinity.h                   |   80 +
2062  include/litmus/bheap.h                      |   77 +
2063  include/litmus/binheap.h                    |  207 ++
2064  include/litmus/budget.h                     |    8 +
2065  include/litmus/clustered.h                  |   44 +
2066  include/litmus/debug_trace.h                |   37 +
2067  include/litmus/edf_common.h                 |   37 +
2068  include/litmus/fdso.h                       |   83 +
2069  include/litmus/feather_buffer.h             |   94 +
2070  include/litmus/feather_trace.h              |   65 +
2071  include/litmus/fpmath.h                     |  145 ++
2072  include/litmus/ftdev.h                      |   55 +
2073  include/litmus/gpu_affinity.h               |   49 +
2074  include/litmus/ikglp_lock.h                 |  160 ++
2075  include/litmus/jobs.h                       |    9 +
2076  include/litmus/kexclu_affinity.h            |   35 +
2077  include/litmus/kfmlp_lock.h                 |   97 +
2078  include/litmus/litmus.h                     |  282 +++
2079  include/litmus/litmus_proc.h                |   25 +
2080  include/litmus/litmus_softirq.h             |  199 ++
2081  include/litmus/locking.h                    |  160 ++
2082  include/litmus/nvidia_info.h                |   46 +
2083  include/litmus/preempt.h                    |  164 ++
2084  include/litmus/rsm_lock.h                   |   54 +
2085  include/litmus/rt_domain.h                  |  182 ++
2086  include/litmus/rt_param.h                   |  307 +++
2087  include/litmus/sched_plugin.h               |  183 ++
2088  include/litmus/sched_trace.h                |  380 ++++
2089  include/litmus/sched_trace_external.h       |   78 +
2090  include/litmus/srp.h                        |   28 +
2091  include/litmus/trace.h                      |  148 ++
2092  include/litmus/trace_irq.h                  |   21 +
2093  include/litmus/unistd_32.h                  |   24 +
2094  include/litmus/unistd_64.h                  |   40 +
2095  kernel/exit.c                               |    4 +
2096  kernel/fork.c                               |    7 +
2097  kernel/hrtimer.c                            |   95 +
2098  kernel/lockdep.c                            |    7 +-
2099  kernel/mutex.c                              |  125 ++
2100  kernel/printk.c                             |   14 +-
2101  kernel/sched.c                              |  164 +-
2102  kernel/sched_fair.c                         |    3 +
2103  kernel/sched_rt.c                           |    2 +-
2104  kernel/semaphore.c                          |   13 +-
2105  kernel/softirq.c                            |  322 ++-
2106  kernel/time/tick-sched.c                    |   47 +
2107  kernel/workqueue.c                          |   71 +-
2108  litmus/Kconfig                              |  364 ++++
2109  litmus/Makefile                             |   38 +
2110  litmus/affinity.c                           |   42 +
2111  litmus/bheap.c                              |  314 +++
2112  litmus/binheap.c                            |  443 +++++
2113  litmus/budget.c                             |  111 ++
2114  litmus/clustered.c                          |  111 ++
2115  litmus/ctrldev.c                            |  150 ++
2116  litmus/edf_common.c                         |  211 ++
2117  litmus/fdso.c                               |  306 +++
2118  litmus/ft_event.c                           |   43 +
2119  litmus/ftdev.c                              |  439 +++++
2120  litmus/gpu_affinity.c                       |  113 ++
2121  litmus/ikglp_lock.c                         | 2838 +++++++++++++++++++++++++++
2122  litmus/jobs.c                               |   56 +
2123  litmus/kexclu_affinity.c                    |   92 +
2124  litmus/kfmlp_lock.c                         | 1002 ++++++++++
2125  litmus/litmus.c                             |  684 +++++++
2126  litmus/litmus_pai_softirq.c                 |   64 +
2127  litmus/litmus_proc.c                        |  364 ++++
2128  litmus/litmus_softirq.c                     | 1582 +++++++++++++++
2129  litmus/locking.c                            |  524 +++++
2130  litmus/nvidia_info.c                        |  597 ++++++
2131  litmus/preempt.c                            |  138 ++
2132  litmus/rsm_lock.c                           |  796 ++++++++
2133  litmus/rt_domain.c                          |  357 ++++
2134  litmus/sched_cedf.c                         | 1849 +++++++++++++++++
2135  litmus/sched_gsn_edf.c                      | 1862 ++++++++++++++++++
2136  litmus/sched_litmus.c                       |  327 +++
2137  litmus/sched_pfair.c                        | 1067 ++++++++++
2138  litmus/sched_plugin.c                       |  360 ++++
2139  litmus/sched_psn_edf.c                      |  645 ++++++
2140  litmus/sched_task_trace.c                   |  509 +++++
2141  litmus/sched_trace.c                        |  252 +++
2142  litmus/sched_trace_external.c               |   64 +
2143  litmus/srp.c                                |  295 +++
2144  litmus/sync.c                               |  104 +
2145  litmus/trace.c                              |  225 +++
2146  123 files changed, 24316 insertions(+), 97 deletions(-)
2147  create mode 100644 arch/x86/include/asm/feather_trace.h
2148  create mode 100644 arch/x86/include/asm/feather_trace_32.h
2149  create mode 100644 arch/x86/include/asm/feather_trace_64.h
2150  create mode 100644 arch/x86/kernel/ft_event.c
2151  create mode 100644 include/litmus/affinity.h
2152  create mode 100644 include/litmus/bheap.h
2153  create mode 100644 include/litmus/binheap.h
2154  create mode 100644 include/litmus/budget.h
2155  create mode 100644 include/litmus/clustered.h
2156  create mode 100644 include/litmus/debug_trace.h
2157  create mode 100644 include/litmus/edf_common.h
2158  create mode 100644 include/litmus/fdso.h
2159  create mode 100644 include/litmus/feather_buffer.h
2160  create mode 100644 include/litmus/feather_trace.h
2161  create mode 100644 include/litmus/fpmath.h
2162  create mode 100644 include/litmus/ftdev.h
2163  create mode 100644 include/litmus/gpu_affinity.h
2164  create mode 100644 include/litmus/ikglp_lock.h
2165  create mode 100644 include/litmus/jobs.h
2166  create mode 100644 include/litmus/kexclu_affinity.h
2167  create mode 100644 include/litmus/kfmlp_lock.h
2168  create mode 100644 include/litmus/litmus.h
2169  create mode 100644 include/litmus/litmus_proc.h
2170  create mode 100644 include/litmus/litmus_softirq.h
2171  create mode 100644 include/litmus/locking.h
2172  create mode 100644 include/litmus/nvidia_info.h
2173  create mode 100644 include/litmus/preempt.h
2174  create mode 100644 include/litmus/rsm_lock.h
2175  create mode 100644 include/litmus/rt_domain.h
2176  create mode 100644 include/litmus/rt_param.h
2177  create mode 100644 include/litmus/sched_plugin.h
2178  create mode 100644 include/litmus/sched_trace.h
2179  create mode 100644 include/litmus/sched_trace_external.h
2180  create mode 100644 include/litmus/srp.h
2181  create mode 100644 include/litmus/trace.h
2182  create mode 100644 include/litmus/trace_irq.h
2183  create mode 100644 include/litmus/unistd_32.h
2184  create mode 100644 include/litmus/unistd_64.h
2185  create mode 100644 litmus/Kconfig
2186  create mode 100644 litmus/Makefile
2187  create mode 100644 litmus/affinity.c
2188  create mode 100644 litmus/bheap.c
2189  create mode 100644 litmus/binheap.c
2190  create mode 100644 litmus/budget.c
2191  create mode 100644 litmus/clustered.c
2192  create mode 100644 litmus/ctrldev.c
2193  create mode 100644 litmus/edf_common.c
2194  create mode 100644 litmus/fdso.c
2195  create mode 100644 litmus/ft_event.c
2196  create mode 100644 litmus/ftdev.c
2197  create mode 100644 litmus/gpu_affinity.c
2198  create mode 100644 litmus/ikglp_lock.c
2199  create mode 100644 litmus/jobs.c
2200  create mode 100644 litmus/kexclu_affinity.c
2201  create mode 100644 litmus/kfmlp_lock.c
2202  create mode 100644 litmus/litmus.c
2203  create mode 100644 litmus/litmus_pai_softirq.c
2204  create mode 100644 litmus/litmus_proc.c
2205  create mode 100644 litmus/litmus_softirq.c
2206  create mode 100644 litmus/locking.c
2207  create mode 100644 litmus/nvidia_info.c
2208  create mode 100644 litmus/preempt.c
2209  create mode 100644 litmus/rsm_lock.c
2210  create mode 100644 litmus/rt_domain.c
2211  create mode 100644 litmus/sched_cedf.c
2212  create mode 100644 litmus/sched_gsn_edf.c
2213  create mode 100644 litmus/sched_litmus.c
2214  create mode 100644 litmus/sched_pfair.c
2215  create mode 100644 litmus/sched_plugin.c
2216  create mode 100644 litmus/sched_psn_edf.c
2217  create mode 100644 litmus/sched_task_trace.c
2218  create mode 100644 litmus/sched_trace.c
2219  create mode 100644 litmus/sched_trace_external.c
2220  create mode 100644 litmus/srp.c
2221  create mode 100644 litmus/sync.c
2222  create mode 100644 litmus/trace.c
2223 
2224 diff --git a/Makefile b/Makefile
2225 index 6a5bdad..a327725 100644
2226 --- a/Makefile
2227 +++ b/Makefile
2228 @@ -1,7 +1,7 @@
2229  VERSION = 3
2230  PATCHLEVEL = 0
2231  SUBLEVEL = 0
2232 -EXTRAVERSION =
2233 +EXTRAVERSION =-litmus
2234  NAME = Sneaky Weasel
2235  
2236  # *DOCUMENTATION*
2237 @@ -708,7 +708,7 @@ export mod_strip_cmd
2238  
2239  
2240  ifeq ($(KBUILD_EXTMOD),)
2241 -core-y		+= kernel/ mm/ fs/ ipc/ security/ crypto/ block/
2242 +core-y		+= kernel/ mm/ fs/ ipc/ security/ crypto/ block/ litmus/
2243  
2244  vmlinux-dirs	:= $(patsubst %/,%,$(filter %/, $(init-y) $(init-m) \
2245  		     $(core-y) $(core-m) $(drivers-y) $(drivers-m) \
2246 diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
2247 index 9adc278..fb228ea 100644
2248 --- a/arch/arm/Kconfig
2249 +++ b/arch/arm/Kconfig
2250 @@ -2040,3 +2040,11 @@ source "security/Kconfig"
2251  source "crypto/Kconfig"
2252  
2253  source "lib/Kconfig"
2254 +
2255 +config ARCH_HAS_SEND_PULL_TIMERS
2256 +	def_bool n
2257 +
2258 +config ARCH_HAS_FEATHER_TRACE
2259 +	def_bool n
2260 +
2261 +source "litmus/Kconfig"
2262 diff --git a/arch/arm/include/asm/timex.h b/arch/arm/include/asm/timex.h
2263 index 3be8de3..8a102a3 100644
2264 --- a/arch/arm/include/asm/timex.h
2265 +++ b/arch/arm/include/asm/timex.h
2266 @@ -16,9 +16,11 @@
2267  
2268  typedef unsigned long cycles_t;
2269  
2270 +#ifndef get_cycles
2271  static inline cycles_t get_cycles (void)
2272  {
2273  	return 0;
2274  }
2275 +#endif
2276  
2277  #endif
2278 diff --git a/arch/arm/include/asm/unistd.h b/arch/arm/include/asm/unistd.h
2279 index 2c04ed5..0196edf 100644
2280 --- a/arch/arm/include/asm/unistd.h
2281 +++ b/arch/arm/include/asm/unistd.h
2282 @@ -403,6 +403,9 @@
2283  #define __NR_sendmmsg			(__NR_SYSCALL_BASE+374)
2284  #define __NR_setns			(__NR_SYSCALL_BASE+375)
2285  
2286 +#define __NR_LITMUS (__NR_SYSCALL_BASE+376)
2287 +#include <litmus/unistd_32.h>
2288 +
2289  /*
2290   * The following SWIs are ARM private.
2291   */
2292 diff --git a/arch/arm/kernel/calls.S b/arch/arm/kernel/calls.S
2293 index 80f7896..ed2ae93 100644
2294 --- a/arch/arm/kernel/calls.S
2295 +++ b/arch/arm/kernel/calls.S
2296 @@ -385,6 +385,18 @@
2297  		CALL(sys_syncfs)
2298  		CALL(sys_sendmmsg)
2299  /* 375 */	CALL(sys_setns)
2300 +		CALL(sys_set_rt_task_param)
2301 +		CALL(sys_get_rt_task_param)
2302 +		CALL(sys_complete_job)
2303 +		CALL(sys_od_open)
2304 +/* 380 */	CALL(sys_od_close)
2305 +		CALL(sys_litmus_lock)
2306 +		CALL(sys_litmus_unlock)
2307 +		CALL(sys_query_job_no)
2308 +		CALL(sys_wait_for_job_release)
2309 +/* 385 */	CALL(sys_wait_for_ts_release)
2310 +		CALL(sys_release_ts)
2311 +		CALL(sys_null_call)
2312  #ifndef syscalls_counted
2313  .equ syscalls_padding, ((NR_syscalls + 3) & ~3) - NR_syscalls
2314  #define syscalls_counted
2315 diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
2316 index e7f92a4..5a57429 100644
2317 --- a/arch/arm/kernel/smp.c
2318 +++ b/arch/arm/kernel/smp.c
2319 @@ -40,6 +40,8 @@
2320  #include <asm/ptrace.h>
2321  #include <asm/localtimer.h>
2322  
2323 +#include <litmus/preempt.h>
2324 +
2325  /*
2326   * as from 2.5, kernels no longer have an init_tasks structure
2327   * so we need some other way of telling a new secondary core
2328 @@ -572,6 +574,8 @@ asmlinkage void __exception_irq_entry do_IPI(int ipinr, struct pt_regs *regs)
2329  		break;
2330  
2331  	case IPI_RESCHEDULE:
2332 +		/* LITMUS^RT: take action based on scheduler state */
2333 +		sched_state_ipi();
2334  		scheduler_ipi();
2335  		break;
2336  
2337 diff --git a/arch/arm/mach-realview/include/mach/timex.h b/arch/arm/mach-realview/include/mach/timex.h
2338 index 4eeb069..e8bcc40 100644
2339 --- a/arch/arm/mach-realview/include/mach/timex.h
2340 +++ b/arch/arm/mach-realview/include/mach/timex.h
2341 @@ -21,3 +21,30 @@
2342   */
2343  
2344  #define CLOCK_TICK_RATE		(50000000 / 16)
2345 +
2346 +#if defined(CONFIG_MACH_REALVIEW_PB11MP) || defined(CONFIG_MACH_REALVIEW_PB1176)
2347 +
2348 +static inline unsigned long realview_get_arm11_cp15_ccnt(void)
2349 +{
2350 +	unsigned long cycles;
2351 +	/* Read CP15 CCNT register. */
2352 +	asm volatile ("mrc p15, 0, %0, c15, c12, 1" : "=r" (cycles));
2353 +	return cycles;
2354 +}
2355 +
2356 +#define get_cycles realview_get_arm11_cp15_ccnt
2357 +
2358 +#elif defined(CONFIG_MACH_REALVIEW_PBA8)
2359 +
2360 +
2361 +static inline unsigned long realview_get_a8_cp15_ccnt(void)
2362 +{
2363 +	unsigned long cycles;
2364 +	/* Read CP15 CCNT register. */
2365 +	asm volatile ("mrc p15, 0, %0, c9, c13, 0" : "=r" (cycles));
2366 +	return cycles;
2367 +}
2368 +
2369 +#define get_cycles realview_get_a8_cp15_ccnt
2370 +
2371 +#endif
2372 diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
2373 index 37357a5..9f5e143 100644
2374 --- a/arch/x86/Kconfig
2375 +++ b/arch/x86/Kconfig
2376 @@ -2166,3 +2166,11 @@ source "crypto/Kconfig"
2377  source "arch/x86/kvm/Kconfig"
2378  
2379  source "lib/Kconfig"
2380 +
2381 +config ARCH_HAS_FEATHER_TRACE
2382 +	def_bool y
2383 +
2384 +config ARCH_HAS_SEND_PULL_TIMERS
2385 +	def_bool y
2386 +
2387 +source "litmus/Kconfig"
2388 diff --git a/arch/x86/include/asm/entry_arch.h b/arch/x86/include/asm/entry_arch.h
2389 index 1cd6d26..3b0d7ef 100644
2390 --- a/arch/x86/include/asm/entry_arch.h
2391 +++ b/arch/x86/include/asm/entry_arch.h
2392 @@ -13,6 +13,7 @@
2393  BUILD_INTERRUPT(reschedule_interrupt,RESCHEDULE_VECTOR)
2394  BUILD_INTERRUPT(call_function_interrupt,CALL_FUNCTION_VECTOR)
2395  BUILD_INTERRUPT(call_function_single_interrupt,CALL_FUNCTION_SINGLE_VECTOR)
2396 +BUILD_INTERRUPT(pull_timers_interrupt,PULL_TIMERS_VECTOR)
2397  BUILD_INTERRUPT(irq_move_cleanup_interrupt,IRQ_MOVE_CLEANUP_VECTOR)
2398  BUILD_INTERRUPT(reboot_interrupt,REBOOT_VECTOR)
2399  
2400 diff --git a/arch/x86/include/asm/feather_trace.h b/arch/x86/include/asm/feather_trace.h
2401 new file mode 100644
2402 index 0000000..4fd3163
2403 --- /dev/null
2404 +++ b/arch/x86/include/asm/feather_trace.h
2405 @@ -0,0 +1,17 @@
2406 +#ifndef _ARCH_FEATHER_TRACE_H
2407 +#define _ARCH_FEATHER_TRACE_H
2408 +
2409 +#include <asm/msr.h>
2410 +
2411 +static inline unsigned long long ft_timestamp(void)
2412 +{
2413 +	return __native_read_tsc();
2414 +}
2415 +
2416 +#ifdef CONFIG_X86_32
2417 +#include "feather_trace_32.h"
2418 +#else
2419 +#include "feather_trace_64.h"
2420 +#endif
2421 +
2422 +#endif
2423 diff --git a/arch/x86/include/asm/feather_trace_32.h b/arch/x86/include/asm/feather_trace_32.h
2424 new file mode 100644
2425 index 0000000..70202f9
2426 --- /dev/null
2427 +++ b/arch/x86/include/asm/feather_trace_32.h
2428 @@ -0,0 +1,79 @@
2429 +/* Do not directly include this file. Include feather_trace.h instead */
2430 +
2431 +#define feather_callback __attribute__((regparm(0)))
2432 +
2433 +/*
2434 + * make the compiler reload any register that is not saved in
2435 + * a cdecl function call
2436 + */
2437 +#define CLOBBER_LIST "memory", "cc", "eax", "ecx", "edx"
2438 +
2439 +#define ft_event(id, callback)                                  \
2440 +        __asm__ __volatile__(                                   \
2441 +            "1: jmp 2f                                    \n\t" \
2442 +	    " call " #callback "                          \n\t" \
2443 +            ".section __event_table, \"aw\"               \n\t" \
2444 +            ".long " #id  ", 0, 1b, 2f                    \n\t" \
2445 +            ".previous                                    \n\t" \
2446 +            "2:                                           \n\t" \
2447 +        : : : CLOBBER_LIST)
2448 +
2449 +#define ft_event0(id, callback)                                 \
2450 +        __asm__ __volatile__(                                   \
2451 +            "1: jmp 2f                                    \n\t" \
2452 +	    " subl $4, %%esp                              \n\t" \
2453 +            " movl $" #id  ", (%%esp)                     \n\t" \
2454 +	    " call " #callback "                          \n\t" \
2455 +	    " addl $4, %%esp                              \n\t" \
2456 +            ".section __event_table, \"aw\"               \n\t" \
2457 +            ".long " #id  ", 0, 1b, 2f                    \n\t" \
2458 +            ".previous                                    \n\t" \
2459 +            "2:                                           \n\t" \
2460 +        : :  : CLOBBER_LIST)
2461 +
2462 +#define ft_event1(id, callback, param)                          \
2463 +        __asm__ __volatile__(                                   \
2464 +            "1: jmp 2f                                    \n\t" \
2465 +	    " subl $8, %%esp                              \n\t" \
2466 +	    " movl %0, 4(%%esp)                           \n\t" \
2467 +            " movl $" #id  ", (%%esp)                     \n\t" \
2468 +	    " call " #callback "                          \n\t" \
2469 +	    " addl $8, %%esp                              \n\t" \
2470 +            ".section __event_table, \"aw\"               \n\t" \
2471 +            ".long " #id  ", 0, 1b, 2f                    \n\t" \
2472 +            ".previous                                    \n\t" \
2473 +            "2:                                           \n\t" \
2474 +        : : "r" (param)  : CLOBBER_LIST)
2475 +
2476 +#define ft_event2(id, callback, param, param2)                  \
2477 +        __asm__ __volatile__(                                   \
2478 +            "1: jmp 2f                                    \n\t" \
2479 +	    " subl $12, %%esp                             \n\t" \
2480 +	    " movl %1, 8(%%esp)                           \n\t" \
2481 +	    " movl %0, 4(%%esp)                           \n\t" \
2482 +            " movl $" #id  ", (%%esp)                     \n\t" \
2483 +	    " call " #callback "                          \n\t" \
2484 +	    " addl $12, %%esp                             \n\t" \
2485 +            ".section __event_table, \"aw\"               \n\t" \
2486 +            ".long " #id  ", 0, 1b, 2f                    \n\t" \
2487 +            ".previous                                    \n\t" \
2488 +            "2:                                           \n\t" \
2489 +        : : "r" (param), "r" (param2)  : CLOBBER_LIST)
2490 +
2491 +
2492 +#define ft_event3(id, callback, p, p2, p3)                      \
2493 +        __asm__ __volatile__(                                   \
2494 +            "1: jmp 2f                                    \n\t" \
2495 +	    " subl $16, %%esp                             \n\t" \
2496 +	    " movl %2, 12(%%esp)                          \n\t" \
2497 +	    " movl %1, 8(%%esp)                           \n\t" \
2498 +	    " movl %0, 4(%%esp)                           \n\t" \
2499 +            " movl $" #id  ", (%%esp)                     \n\t" \
2500 +	    " call " #callback "                          \n\t" \
2501 +	    " addl $16, %%esp                             \n\t" \
2502 +            ".section __event_table, \"aw\"               \n\t" \
2503 +            ".long " #id  ", 0, 1b, 2f                    \n\t" \
2504 +            ".previous                                    \n\t" \
2505 +            "2:                                           \n\t" \
2506 +        : : "r" (p), "r" (p2), "r" (p3)  : CLOBBER_LIST)
2507 +
2508 diff --git a/arch/x86/include/asm/feather_trace_64.h b/arch/x86/include/asm/feather_trace_64.h
2509 new file mode 100644
2510 index 0000000..54ac2ae
2511 --- /dev/null
2512 +++ b/arch/x86/include/asm/feather_trace_64.h
2513 @@ -0,0 +1,67 @@
2514 +/* Do not directly include this file. Include feather_trace.h instead */
2515 +
2516 +/* regparm is the default on x86_64 */
2517 +#define feather_callback
2518 +
2519 +# define _EVENT_TABLE(id,from,to) \
2520 +            ".section __event_table, \"aw\"\n\t" \
2521 +	    ".balign 8\n\t" \
2522 +            ".quad " #id  ", 0, " #from ", " #to " \n\t" \
2523 +            ".previous \n\t"
2524 +
2525 +/*
2526 + * x86_64 callee only owns rbp, rbx, r12 -> r15
2527 + * the called can freely modify the others
2528 + */
2529 +#define CLOBBER_LIST	"memory", "cc", "rdi", "rsi", "rdx", "rcx", \
2530 +			"r8", "r9", "r10", "r11", "rax"
2531 +
2532 +#define ft_event(id, callback)                                  \
2533 +        __asm__ __volatile__(                                   \
2534 +            "1: jmp 2f                                    \n\t" \
2535 +	    " call " #callback "                          \n\t" \
2536 +            _EVENT_TABLE(id,1b,2f) \
2537 +            "2:                                           \n\t" \
2538 +        : : : CLOBBER_LIST)
2539 +
2540 +#define ft_event0(id, callback)                                 \
2541 +        __asm__ __volatile__(                                   \
2542 +            "1: jmp 2f                                    \n\t" \
2543 +	    " movq $" #id ", %%rdi			  \n\t" \
2544 +	    " call " #callback "                          \n\t" \
2545 +	    _EVENT_TABLE(id,1b,2f) \
2546 +            "2:                                           \n\t" \
2547 +        : :  : CLOBBER_LIST)
2548 +
2549 +#define ft_event1(id, callback, param)                          \
2550 +	__asm__ __volatile__(                                   \
2551 +	    "1: jmp 2f                                    \n\t" \
2552 +	    " movq %0, %%rsi				  \n\t"	\
2553 +	    " movq $" #id ", %%rdi			  \n\t" \
2554 +	    " call " #callback "                          \n\t" \
2555 +	    _EVENT_TABLE(id,1b,2f) \
2556 +	    "2:                                           \n\t" \
2557 +	: : "r" (param)  : CLOBBER_LIST)
2558 +
2559 +#define ft_event2(id, callback, param, param2)                  \
2560 +        __asm__ __volatile__(                                   \
2561 +            "1: jmp 2f                                    \n\t" \
2562 +	    " movq %1, %%rdx				  \n\t"	\
2563 +	    " movq %0, %%rsi				  \n\t"	\
2564 +	    " movq $" #id ", %%rdi			  \n\t" \
2565 +	    " call " #callback "                          \n\t" \
2566 +            _EVENT_TABLE(id,1b,2f) \
2567 +            "2:                                           \n\t" \
2568 +        : : "r" (param), "r" (param2)  : CLOBBER_LIST)
2569 +
2570 +#define ft_event3(id, callback, p, p2, p3)                      \
2571 +        __asm__ __volatile__(                                   \
2572 +            "1: jmp 2f                                    \n\t" \
2573 +	    " movq %2, %%rcx				  \n\t"	\
2574 +	    " movq %1, %%rdx				  \n\t"	\
2575 +	    " movq %0, %%rsi				  \n\t"	\
2576 +	    " movq $" #id ", %%rdi			  \n\t" \
2577 +	    " call " #callback "                          \n\t" \
2578 +            _EVENT_TABLE(id,1b,2f) \
2579 +            "2:                                           \n\t" \
2580 +        : : "r" (p), "r" (p2), "r" (p3)  : CLOBBER_LIST)
2581 diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h
2582 index bb9efe8..c490d89 100644
2583 --- a/arch/x86/include/asm/hw_irq.h
2584 +++ b/arch/x86/include/asm/hw_irq.h
2585 @@ -77,6 +77,8 @@ extern void threshold_interrupt(void);
2586  extern void call_function_interrupt(void);
2587  extern void call_function_single_interrupt(void);
2588  
2589 +extern void pull_timers_interrupt(void);
2590 +
2591  /* IOAPIC */
2592  #define IO_APIC_IRQ(x) (((x) >= NR_IRQS_LEGACY) || ((1<<(x)) & io_apic_irqs))
2593  extern unsigned long io_apic_irqs;
2594 @@ -155,6 +157,7 @@ extern asmlinkage void smp_irq_move_cleanup_interrupt(void);
2595  extern void smp_reschedule_interrupt(struct pt_regs *);
2596  extern void smp_call_function_interrupt(struct pt_regs *);
2597  extern void smp_call_function_single_interrupt(struct pt_regs *);
2598 +extern void smp_pull_timers_interrupt(struct pt_regs *);
2599  #ifdef CONFIG_X86_32
2600  extern void smp_invalidate_interrupt(struct pt_regs *);
2601  #else
2602 diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h
2603 index 6e976ee..99a44cf 100644
2604 --- a/arch/x86/include/asm/irq_vectors.h
2605 +++ b/arch/x86/include/asm/irq_vectors.h
2606 @@ -135,6 +135,13 @@
2607  #define INVALIDATE_TLB_VECTOR_START	\
2608  	(INVALIDATE_TLB_VECTOR_END-NUM_INVALIDATE_TLB_VECTORS+1)
2609  
2610 +/*
2611 + * LITMUS^RT pull timers IRQ vector
2612 + * Make sure it's below the above max 32 vectors.
2613 + */
2614 +#define PULL_TIMERS_VECTOR		0xce
2615 +
2616 +
2617  #define NR_VECTORS			 256
2618  
2619  #define FPU_IRQ				  13
2620 diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
2621 index 2193715..b844edc 100644
2622 --- a/arch/x86/include/asm/processor.h
2623 +++ b/arch/x86/include/asm/processor.h
2624 @@ -166,6 +166,10 @@ extern void print_cpu_info(struct cpuinfo_x86 *);
2625  extern void init_scattered_cpuid_features(struct cpuinfo_x86 *c);
2626  extern unsigned int init_intel_cacheinfo(struct cpuinfo_x86 *c);
2627  extern unsigned short num_cache_leaves;
2628 +#ifdef CONFIG_SYSFS
2629 +extern int get_shared_cpu_map(cpumask_var_t mask,
2630 +			       unsigned int cpu, int index);
2631 +#endif
2632  
2633  extern void detect_extended_topology(struct cpuinfo_x86 *c);
2634  extern void detect_ht(struct cpuinfo_x86 *c);
2635 diff --git a/arch/x86/include/asm/unistd_32.h b/arch/x86/include/asm/unistd_32.h
2636 index 593485b..2f6e127 100644
2637 --- a/arch/x86/include/asm/unistd_32.h
2638 +++ b/arch/x86/include/asm/unistd_32.h
2639 @@ -353,9 +353,13 @@
2640  #define __NR_sendmmsg		345
2641  #define __NR_setns		346
2642  
2643 +#define __NR_LITMUS		347
2644 +
2645 +#include "litmus/unistd_32.h"
2646 +
2647  #ifdef __KERNEL__
2648  
2649 -#define NR_syscalls 347
2650 +#define NR_syscalls 347 + NR_litmus_syscalls
2651  
2652  #define __ARCH_WANT_IPC_PARSE_VERSION
2653  #define __ARCH_WANT_OLD_READDIR
2654 diff --git a/arch/x86/include/asm/unistd_64.h b/arch/x86/include/asm/unistd_64.h
2655 index 705bf13..e347f07 100644
2656 --- a/arch/x86/include/asm/unistd_64.h
2657 +++ b/arch/x86/include/asm/unistd_64.h
2658 @@ -682,6 +682,10 @@ __SYSCALL(__NR_sendmmsg, sys_sendmmsg)
2659  #define __NR_setns				308
2660  __SYSCALL(__NR_setns, sys_setns)
2661  
2662 +#define __NR_LITMUS				309
2663 +
2664 +#include "litmus/unistd_64.h"
2665 +
2666  #ifndef __NO_STUBS
2667  #define __ARCH_WANT_OLD_READDIR
2668  #define __ARCH_WANT_OLD_STAT
2669 diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
2670 index 90b06d4..d727f8f 100644
2671 --- a/arch/x86/kernel/Makefile
2672 +++ b/arch/x86/kernel/Makefile
2673 @@ -116,6 +116,8 @@ obj-$(CONFIG_X86_CHECK_BIOS_CORRUPTION) += check.o
2674  obj-$(CONFIG_SWIOTLB)			+= pci-swiotlb.o
2675  obj-$(CONFIG_OF)			+= devicetree.o
2676  
2677 +obj-$(CONFIG_FEATHER_TRACE)	+= ft_event.o
2678 +
2679  ###
2680  # 64 bit specific files
2681  ifeq ($(CONFIG_X86_64),y)
2682 diff --git a/arch/x86/kernel/cpu/intel_cacheinfo.c b/arch/x86/kernel/cpu/intel_cacheinfo.c
2683 index c105c53..0bf1264 100644
2684 --- a/arch/x86/kernel/cpu/intel_cacheinfo.c
2685 +++ b/arch/x86/kernel/cpu/intel_cacheinfo.c
2686 @@ -747,6 +747,23 @@ unsigned int __cpuinit init_intel_cacheinfo(struct cpuinfo_x86 *c)
2687  static DEFINE_PER_CPU(struct _cpuid4_info *, ici_cpuid4_info);
2688  #define CPUID4_INFO_IDX(x, y)	(&((per_cpu(ici_cpuid4_info, x))[y]))
2689  
2690 +/* returns CPUs that share the index cache with cpu */
2691 +int get_shared_cpu_map(cpumask_var_t mask, unsigned int cpu, int index)
2692 +{
2693 +	int ret = 0;
2694 +	struct _cpuid4_info *this_leaf;
2695 +
2696 +	if (index >= num_cache_leaves) {
2697 +		index = num_cache_leaves - 1;
2698 +		ret = index;
2699 +	}
2700 +
2701 +	this_leaf = CPUID4_INFO_IDX(cpu,index);
2702 +	cpumask_copy(mask, to_cpumask(this_leaf->shared_cpu_map));
2703 +
2704 +	return ret;
2705 +}
2706 +
2707  #ifdef CONFIG_SMP
2708  static void __cpuinit cache_shared_cpu_map_setup(unsigned int cpu, int index)
2709  {
2710 diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
2711 index 8a445a0..47a4bcd 100644
2712 --- a/arch/x86/kernel/entry_64.S
2713 +++ b/arch/x86/kernel/entry_64.S
2714 @@ -1003,6 +1003,8 @@ apicinterrupt CALL_FUNCTION_VECTOR \
2715  	call_function_interrupt smp_call_function_interrupt
2716  apicinterrupt RESCHEDULE_VECTOR \
2717  	reschedule_interrupt smp_reschedule_interrupt
2718 +apicinterrupt PULL_TIMERS_VECTOR \
2719 +	pull_timers_interrupt smp_pull_timers_interrupt
2720  #endif
2721  
2722  apicinterrupt ERROR_APIC_VECTOR \
2723 diff --git a/arch/x86/kernel/ft_event.c b/arch/x86/kernel/ft_event.c
2724 new file mode 100644
2725 index 0000000..37cc332
2726 --- /dev/null
2727 +++ b/arch/x86/kernel/ft_event.c
2728 @@ -0,0 +1,118 @@
2729 +#include <linux/types.h>
2730 +
2731 +#include <litmus/feather_trace.h>
2732 +
2733 +/* the feather trace management functions assume
2734 + * exclusive access to the event table
2735 + */
2736 +
2737 +#ifndef CONFIG_DEBUG_RODATA
2738 +
2739 +#define BYTE_JUMP      0xeb
2740 +#define BYTE_JUMP_LEN  0x02
2741 +
2742 +/* for each event, there is an entry in the event table */
2743 +struct trace_event {
2744 +	long 	id;
2745 +	long	count;
2746 +	long	start_addr;
2747 +	long	end_addr;
2748 +};
2749 +
2750 +extern struct trace_event  __start___event_table[];
2751 +extern struct trace_event  __stop___event_table[];
2752 +
2753 +/* Workaround: if no events are defined, then the event_table section does not
2754 + * exist and the above references cause linker errors. This could probably be
2755 + * fixed by adjusting the linker script, but it is easier to maintain for us if
2756 + * we simply create a dummy symbol in the event table section.
2757 + */
2758 +int __event_table_dummy[0] __attribute__ ((section("__event_table")));
2759 +
2760 +int ft_enable_event(unsigned long id)
2761 +{
2762 +	struct trace_event* te = __start___event_table;
2763 +	int count = 0;
2764 +	char* delta;
2765 +	unsigned char* instr;
2766 +
2767 +	while (te < __stop___event_table) {
2768 +		if (te->id == id && ++te->count == 1) {
2769 +			instr  = (unsigned char*) te->start_addr;
2770 +			/* make sure we don't patch over the wrong bytes */
2771 +			if (*instr == BYTE_JUMP) {
2772 +				delta  = (((unsigned char*) te->start_addr) + 1);
2773 +				*delta = 0;
2774 +			}
2775 +		}
2776 +		if (te->id == id)
2777 +			count++;
2778 +		te++;
2779 +	}
2780 +
2781 +	printk(KERN_DEBUG "ft_enable_event: enabled %d events\n", count);
2782 +	return count;
2783 +}
2784 +
2785 +int ft_disable_event(unsigned long id)
2786 +{
2787 +	struct trace_event* te = __start___event_table;
2788 +	int count = 0;
2789 +	char* delta;
2790 +	unsigned char* instr;
2791 +
2792 +	while (te < __stop___event_table) {
2793 +		if (te->id == id && --te->count == 0) {
2794 +			instr  = (unsigned char*) te->start_addr;
2795 +			if (*instr == BYTE_JUMP) {
2796 +				delta  = (((unsigned char*) te->start_addr) + 1);
2797 +				*delta = te->end_addr - te->start_addr -
2798 +					BYTE_JUMP_LEN;
2799 +			}
2800 +		}
2801 +		if (te->id == id)
2802 +			count++;
2803 +		te++;
2804 +	}
2805 +
2806 +	printk(KERN_DEBUG "ft_disable_event: disabled %d events\n", count);
2807 +	return count;
2808 +}
2809 +
2810 +int ft_disable_all_events(void)
2811 +{
2812 +	struct trace_event* te = __start___event_table;
2813 +	int count = 0;
2814 +	char* delta;
2815 +	unsigned char* instr;
2816 +
2817 +	while (te < __stop___event_table) {
2818 +		if (te->count) {
2819 +			instr  = (unsigned char*) te->start_addr;
2820 +			if (*instr == BYTE_JUMP) {
2821 +				delta  = (((unsigned char*) te->start_addr)
2822 +					  + 1);
2823 +				*delta = te->end_addr - te->start_addr -
2824 +					BYTE_JUMP_LEN;
2825 +				te->count = 0;
2826 +				count++;
2827 +			}
2828 +		}
2829 +		te++;
2830 +	}
2831 +	return count;
2832 +}
2833 +
2834 +int ft_is_event_enabled(unsigned long id)
2835 +{
2836 +	struct trace_event* te = __start___event_table;
2837 +
2838 +	while (te < __stop___event_table) {
2839 +		if (te->id == id)
2840 +			return te->count;
2841 +		te++;
2842 +	}
2843 +	return 0;
2844 +}
2845 +
2846 +#endif
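The enable/disable routines above patch a two-byte short jump at each instrumentation site. The site layout itself is emitted by the arch-specific ft_event*() macros (asm/feather_trace.h, not part of this hunk), so the sketch below is an assumption read off the patching code:

/*
 * Assumed layout of one instrumentation site:
 *
 *   start_addr:  eb XX     ; short jmp (BYTE_JUMP), displacement XX
 *                ...       ; tracing code (e.g. a callback invocation)
 *   end_addr:              ; first instruction after the tracing code
 *
 * ft_enable_event() writes XX = 0, so the jump falls through into the
 * tracing code; ft_disable_event() writes
 *
 *   XX = end_addr - start_addr - BYTE_JUMP_LEN
 *
 * so the jump skips it again.  Worked example: with start_addr = 0x100
 * and end_addr = 0x110, the disabled displacement is
 * 0x110 - 0x100 - 0x02 = 0x0e, i.e. jump to 0x102 + 0x0e = 0x110.
 */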
2847 diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
2848 index 6c0802e..680a5cb 100644
2849 --- a/arch/x86/kernel/irq.c
2850 +++ b/arch/x86/kernel/irq.c
2851 @@ -10,6 +10,10 @@
2852  #include <linux/ftrace.h>
2853  #include <linux/delay.h>
2854  
2855 +#ifdef CONFIG_LITMUS_NVIDIA
2856 +#include <litmus/sched_trace.h>
2857 +#endif
2858 +
2859  #include <asm/apic.h>
2860  #include <asm/io_apic.h>
2861  #include <asm/irq.h>
2862 diff --git a/arch/x86/kernel/irqinit.c b/arch/x86/kernel/irqinit.c
2863 index f470e4e..48acf71 100644
2864 --- a/arch/x86/kernel/irqinit.c
2865 +++ b/arch/x86/kernel/irqinit.c
2866 @@ -252,6 +252,9 @@ static void __init smp_intr_init(void)
2867  	alloc_intr_gate(CALL_FUNCTION_SINGLE_VECTOR,
2868  			call_function_single_interrupt);
2869  
2870 +	/* IPI for hrtimer pulling on remote cpus */
2871 +	alloc_intr_gate(PULL_TIMERS_VECTOR, pull_timers_interrupt);
2872 +
2873  	/* Low priority IPI to cleanup after moving an irq */
2874  	set_intr_gate(IRQ_MOVE_CLEANUP_VECTOR, irq_move_cleanup_interrupt);
2875  	set_bit(IRQ_MOVE_CLEANUP_VECTOR, used_vectors);
2876 diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
2877 index 013e7eb..ed4c4f5 100644
2878 --- a/arch/x86/kernel/smp.c
2879 +++ b/arch/x86/kernel/smp.c
2880 @@ -23,6 +23,10 @@
2881  #include <linux/cpu.h>
2882  #include <linux/gfp.h>
2883  
2884 +#include <litmus/preempt.h>
2885 +#include <litmus/debug_trace.h>
2886 +#include <litmus/trace.h>
2887 +
2888  #include <asm/mtrr.h>
2889  #include <asm/tlbflush.h>
2890  #include <asm/mmu_context.h>
2891 @@ -118,6 +122,7 @@ static void native_smp_send_reschedule(int cpu)
2892  		WARN_ON(1);
2893  		return;
2894  	}
2895 +	TS_SEND_RESCHED_START(cpu);
2896  	apic->send_IPI_mask(cpumask_of(cpu), RESCHEDULE_VECTOR);
2897  }
2898  
2899 @@ -147,6 +152,16 @@ void native_send_call_func_ipi(const struct cpumask *mask)
2900  	free_cpumask_var(allbutself);
2901  }
2902  
2903 +/* trigger timers on remote cpu */
2904 +void smp_send_pull_timers(int cpu)
2905 +{
2906 +	if (unlikely(cpu_is_offline(cpu))) {
2907 +		WARN_ON(1);
2908 +		return;
2909 +	}
2910 +	apic->send_IPI_mask(cpumask_of(cpu), PULL_TIMERS_VECTOR);
2911 +}
2912 +
2913  /*
2914   * this function calls the 'stop' function on all other CPUs in the system.
2915   */
2916 @@ -199,8 +214,15 @@ static void native_stop_other_cpus(int wait)
2917  void smp_reschedule_interrupt(struct pt_regs *regs)
2918  {
2919  	ack_APIC_irq();
2920 +	/* LITMUS^RT: this IPI might need to trigger the sched state machine. */
2921 +	sched_state_ipi();
2922  	inc_irq_stat(irq_resched_count);
2923 +	/*
2924 +	 * LITMUS^RT: starting from 3.0, scheduler_ipi() actually does something.
2925 +	 * This may increase IPI latencies compared with previous versions.
2926 +	 */
2927  	scheduler_ipi();
2928 +	TS_SEND_RESCHED_END;
2929  	/*
2930  	 * KVM uses this interrupt to force a cpu out of guest mode
2931  	 */
2932 @@ -224,6 +246,15 @@ void smp_call_function_single_interrupt(struct pt_regs *regs)
2933  	irq_exit();
2934  }
2935  
2936 +extern void hrtimer_pull(void);
2937 +
2938 +void smp_pull_timers_interrupt(struct pt_regs *regs)
2939 +{
2940 +	ack_APIC_irq();
2941 +	TRACE("pull timer interrupt\n");
2942 +	hrtimer_pull();
2943 +}
2944 +
2945  struct smp_ops smp_ops = {
2946  	.smp_prepare_boot_cpu	= native_smp_prepare_boot_cpu,
2947  	.smp_prepare_cpus	= native_smp_prepare_cpus,
2948 diff --git a/arch/x86/kernel/syscall_table_32.S b/arch/x86/kernel/syscall_table_32.S
2949 index fbb0a04..0cb4373 100644
2950 --- a/arch/x86/kernel/syscall_table_32.S
2951 +++ b/arch/x86/kernel/syscall_table_32.S
2952 @@ -346,3 +346,16 @@ ENTRY(sys_call_table)
2953  	.long sys_syncfs
2954  	.long sys_sendmmsg		/* 345 */
2955  	.long sys_setns
2956 +	.long sys_set_rt_task_param	/* LITMUS^RT 347 */
2957 +	.long sys_get_rt_task_param
2958 +	.long sys_complete_job
2959 +	.long sys_od_open
2960 +	.long sys_od_close
2961 +	.long sys_litmus_lock		/* +5 */
2962 +	.long sys_litmus_unlock
2963 +	.long sys_query_job_no
2964 +	.long sys_wait_for_job_release
2965 +	.long sys_wait_for_ts_release
2966 +	.long sys_release_ts		/* +10 */
2967 +	.long sys_null_call
2968 +	.long sys_register_nv_device
2969 diff --git a/fs/exec.c b/fs/exec.c
2970 index 6075a1e..9984562 100644
2971 --- a/fs/exec.c
2972 +++ b/fs/exec.c
2973 @@ -19,7 +19,7 @@
2974   * current->executable is only used by the procfs.  This allows a dispatch
2975   * table to check for several different types  of binary formats.  We keep
2976   * trying until we recognize the file or we run out of supported binary
2977 - * formats. 
2978 + * formats.
2979   */
2980  
2981  #include <linux/slab.h>
2982 @@ -56,6 +56,8 @@
2983  #include <linux/oom.h>
2984  #include <linux/compat.h>
2985  
2986 +#include <litmus/litmus.h>
2987 +
2988  #include <asm/uaccess.h>
2989  #include <asm/mmu_context.h>
2990  #include <asm/tlb.h>
2991 @@ -85,7 +87,7 @@ int __register_binfmt(struct linux_binfmt * fmt, int insert)
2992  	insert ? list_add(&fmt->lh, &formats) :
2993  		 list_add_tail(&fmt->lh, &formats);
2994  	write_unlock(&binfmt_lock);
2995 -	return 0;	
2996 +	return 0;
2997  }
2998  
2999  EXPORT_SYMBOL(__register_binfmt);
3000 @@ -1160,7 +1162,7 @@ void setup_new_exec(struct linux_binprm * bprm)
3001  	   group */
3002  
3003  	current->self_exec_id++;
3004 -			
3005 +
3006  	flush_signal_handlers(current, 0);
3007  	flush_old_files(current->files);
3008  }
3009 @@ -1250,8 +1252,8 @@ int check_unsafe_exec(struct linux_binprm *bprm)
3010  	return res;
3011  }
3012  
3013 -/* 
3014 - * Fill the binprm structure from the inode. 
3015 +/*
3016 + * Fill the binprm structure from the inode.
3017   * Check permissions, then read the first 128 (BINPRM_BUF_SIZE) bytes
3018   *
3019   * This may be called multiple times for binary chains (scripts for example).
3020 @@ -1459,6 +1461,7 @@ static int do_execve_common(const char *filename,
3021  		goto out_unmark;
3022  
3023  	sched_exec();
3024 +	litmus_exec();
3025  
3026  	bprm->file = file;
3027  	bprm->filename = filename;
3028 diff --git a/fs/inode.c b/fs/inode.c
3029 index 43566d1..dbf0e76 100644
3030 --- a/fs/inode.c
3031 +++ b/fs/inode.c
3032 @@ -308,6 +308,8 @@ void inode_init_once(struct inode *inode)
3033  #ifdef CONFIG_FSNOTIFY
3034  	INIT_HLIST_HEAD(&inode->i_fsnotify_marks);
3035  #endif
3036 +	INIT_LIST_HEAD(&inode->i_obj_list);
3037 +	mutex_init(&inode->i_obj_mutex);
3038  }
3039  EXPORT_SYMBOL(inode_init_once);
3040  
3041 diff --git a/include/linux/completion.h b/include/linux/completion.h
3042 index 51494e6..cff405c 100644
3043 --- a/include/linux/completion.h
3044 +++ b/include/linux/completion.h
3045 @@ -76,6 +76,7 @@ static inline void init_completion(struct completion *x)
3046  	init_waitqueue_head(&x->wait);
3047  }
3048  
3049 +extern void __wait_for_completion_locked(struct completion *);
3050  extern void wait_for_completion(struct completion *);
3051  extern int wait_for_completion_interruptible(struct completion *x);
3052  extern int wait_for_completion_killable(struct completion *x);
3053 @@ -90,6 +91,7 @@ extern bool completion_done(struct completion *x);
3054  
3055  extern void complete(struct completion *);
3056  extern void complete_all(struct completion *);
3057 +extern void complete_n(struct completion *, int n);
3058  
3059  /**
3060   * INIT_COMPLETION - reinitialize a completion structure
3061 diff --git a/include/linux/fs.h b/include/linux/fs.h
3062 index b5b9792..8d5834b 100644
3063 --- a/include/linux/fs.h
3064 +++ b/include/linux/fs.h
3065 @@ -17,8 +17,8 @@
3066   * nr_file rlimit, so it's safe to set up a ridiculously high absolute
3067   * upper limit on files-per-process.
3068   *
3069 - * Some programs (notably those using select()) may have to be 
3070 - * recompiled to take full advantage of the new limits..  
3071 + * Some programs (notably those using select()) may have to be
3072 + * recompiled to take full advantage of the new limits..
3073   */
3074  
3075  /* Fixed constants first: */
3076 @@ -172,7 +172,7 @@ struct inodes_stat_t {
3077  #define SEL_EX		4
3078  
3079  /* public flags for file_system_type */
3080 -#define FS_REQUIRES_DEV 1 
3081 +#define FS_REQUIRES_DEV 1
3082  #define FS_BINARY_MOUNTDATA 2
3083  #define FS_HAS_SUBTYPE 4
3084  #define FS_REVAL_DOT	16384	/* Check the paths ".", ".." for staleness */
3085 @@ -480,7 +480,7 @@ struct iattr {
3086   */
3087  #include <linux/quota.h>
3088  
3089 -/** 
3090 +/**
3091   * enum positive_aop_returns - aop return codes with specific semantics
3092   *
3093   * @AOP_WRITEPAGE_ACTIVATE: Informs the caller that page writeback has
3094 @@ -490,7 +490,7 @@ struct iattr {
3095   * 			    be a candidate for writeback again in the near
3096   * 			    future.  Other callers must be careful to unlock
3097   * 			    the page if they get this return.  Returned by
3098 - * 			    writepage(); 
3099 + * 			    writepage();
3100   *
3101   * @AOP_TRUNCATED_PAGE: The AOP method that was handed a locked page has
3102   *  			unlocked it and the page might have been truncated.
3103 @@ -734,6 +734,7 @@ static inline int mapping_writably_mapped(struct address_space *mapping)
3104  
3105  struct posix_acl;
3106  #define ACL_NOT_CACHED ((void *)(-1))
3107 +struct inode_obj_id_table;
3108  
3109  struct inode {
3110  	/* RCU path lookup touches following: */
3111 @@ -807,6 +808,8 @@ struct inode {
3112  	struct posix_acl	*i_acl;
3113  	struct posix_acl	*i_default_acl;
3114  #endif
3115 +	struct list_head	i_obj_list;
3116 +	struct mutex		i_obj_mutex;
3117  	void			*i_private; /* fs or device private pointer */
3118  };
3119  
3120 @@ -1032,10 +1035,10 @@ static inline int file_check_writeable(struct file *filp)
3121  
3122  #define	MAX_NON_LFS	((1UL<<31) - 1)
3123  
3124 -/* Page cache limit. The filesystems should put that into their s_maxbytes 
3125 -   limits, otherwise bad things can happen in VM. */ 
3126 +/* Page cache limit. The filesystems should put that into their s_maxbytes
3127 +   limits, otherwise bad things can happen in VM. */
3128  #if BITS_PER_LONG==32
3129 -#define MAX_LFS_FILESIZE	(((u64)PAGE_CACHE_SIZE << (BITS_PER_LONG-1))-1) 
3130 +#define MAX_LFS_FILESIZE	(((u64)PAGE_CACHE_SIZE << (BITS_PER_LONG-1))-1)
3131  #elif BITS_PER_LONG==64
3132  #define MAX_LFS_FILESIZE 	0x7fffffffffffffffUL
3133  #endif
3134 @@ -2234,7 +2237,7 @@ extern void free_write_pipe(struct file *);
3135  
3136  extern int kernel_read(struct file *, loff_t, char *, unsigned long);
3137  extern struct file * open_exec(const char *);
3138 - 
3139 +
3140  /* fs/dcache.c -- generic fs support functions */
3141  extern int is_subdir(struct dentry *, struct dentry *);
3142  extern int path_is_under(struct path *, struct path *);
3143 diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
3144 index ba36217..e6dd5a4 100644
3145 --- a/include/linux/hardirq.h
3146 +++ b/include/linux/hardirq.h
3147 @@ -6,6 +6,8 @@
3148  #include <linux/ftrace_irq.h>
3149  #include <asm/hardirq.h>
3150  
3151 +#include <litmus/trace_irq.h>
3152 +
3153  /*
3154   * We put the hardirq and softirq counter into the preemption
3155   * counter. The bitmask has the following meaning:
3156 @@ -186,6 +188,7 @@ extern void rcu_nmi_exit(void);
3157  		account_system_vtime(current);		\
3158  		add_preempt_count(HARDIRQ_OFFSET);	\
3159  		trace_hardirq_enter();			\
3160 +		ft_irq_fired();				\
3161  	} while (0)
3162  
3163  /*
3164 @@ -216,6 +219,7 @@ extern void irq_exit(void);
3165  		lockdep_off();					\
3166  		rcu_nmi_enter();				\
3167  		trace_hardirq_enter();				\
3168 +		ft_irq_fired();					\
3169  	} while (0)
3170  
3171  #define nmi_exit()						\
3172 diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
3173 index fd0dc30..d91bba5 100644
3174 --- a/include/linux/hrtimer.h
3175 +++ b/include/linux/hrtimer.h
3176 @@ -174,6 +174,7 @@ enum  hrtimer_base_type {
3177   * @nr_hangs:		Total number of hrtimer interrupt hangs
3178   * @max_hang_time:	Maximum time spent in hrtimer_interrupt
3179   * @clock_base:		array of clock bases for this cpu
3180 + * @to_pull:		LITMUS^RT list of timers to be pulled on this cpu
3181   */
3182  struct hrtimer_cpu_base {
3183  	raw_spinlock_t			lock;
3184 @@ -188,8 +189,32 @@ struct hrtimer_cpu_base {
3185  	ktime_t				max_hang_time;
3186  #endif
3187  	struct hrtimer_clock_base	clock_base[HRTIMER_MAX_CLOCK_BASES];
3188 +	struct list_head		to_pull;
3189  };
3190  
3191 +#ifdef CONFIG_ARCH_HAS_SEND_PULL_TIMERS
3192 +
3193 +#define HRTIMER_START_ON_INACTIVE	0
3194 +#define HRTIMER_START_ON_QUEUED		1
3195 +
3196 +/*
3197 + * struct hrtimer_start_on_info - save timer info on remote cpu
3198 + * @list:	list of hrtimer_start_on_info on remote cpu (to_pull)
3199 + * @timer:	timer to be triggered on remote cpu
3200 + * @time:	expiration time for the timer
3201 + * @mode:	timer mode
3202 + * @state:	activity flag
3203 + */
3204 +struct hrtimer_start_on_info {
3205 +	struct list_head	list;
3206 +	struct hrtimer		*timer;
3207 +	ktime_t			time;
3208 +	enum hrtimer_mode	mode;
3209 +	atomic_t		state;
3210 +};
3211 +
3212 +#endif
3213 +
3214  static inline void hrtimer_set_expires(struct hrtimer *timer, ktime_t time)
3215  {
3216  	timer->node.expires = time;
3217 @@ -355,6 +380,13 @@ __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
3218  			 unsigned long delta_ns,
3219  			 const enum hrtimer_mode mode, int wakeup);
3220  
3221 +#ifdef CONFIG_ARCH_HAS_SEND_PULL_TIMERS
3222 +extern void hrtimer_start_on_info_init(struct hrtimer_start_on_info *info);
3223 +extern int hrtimer_start_on(int cpu, struct hrtimer_start_on_info *info,
3224 +			struct hrtimer *timer, ktime_t time,
3225 +			const enum hrtimer_mode mode);
3226 +#endif
3227 +
3228  extern int hrtimer_cancel(struct hrtimer *timer);
3229  extern int hrtimer_try_to_cancel(struct hrtimer *timer);
3230  
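A minimal sketch of arming a timer on a specific CPU with the interface declared above. my_timer_fired() and the choice of target CPU are illustrative assumptions; the local-vs-remote behaviour described in the comment is inferred from the to_pull list and the PULL_TIMERS_VECTOR plumbing added elsewhere in this patch:

static struct hrtimer my_timer;
static struct hrtimer_start_on_info my_info;

static void arm_on_cpu(int cpu, ktime_t when)
{
	hrtimer_init(&my_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
	my_timer.function = my_timer_fired;	/* assumed callback */

	hrtimer_start_on_info_init(&my_info);

	/* on the local CPU the timer can be armed directly; for a remote
	 * CPU the request is recorded in my_info, queued on that CPU's
	 * to_pull list, and a pull-timers IPI is sent */
	hrtimer_start_on(cpu, &my_info, &my_timer, when, HRTIMER_MODE_ABS);
}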
3231 diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
3232 index f6efed0..8fb3dad 100644
3233 --- a/include/linux/interrupt.h
3234 +++ b/include/linux/interrupt.h
3235 @@ -445,6 +445,7 @@ static inline void __raise_softirq_irqoff(unsigned int nr)
3236  
3237  extern void raise_softirq_irqoff(unsigned int nr);
3238  extern void raise_softirq(unsigned int nr);
3239 +extern void wakeup_softirqd(void);
3240  
3241  /* This is the worklist that queues up per-cpu softirq work.
3242   *
3243 @@ -500,6 +501,10 @@ struct tasklet_struct
3244  	atomic_t count;
3245  	void (*func)(unsigned long);
3246  	unsigned long data;
3247 +
3248 +#if defined(CONFIG_LITMUS_SOFTIRQD) || defined(CONFIG_LITMUS_PAI_SOFTIRQD)
3249 +	struct task_struct *owner;
3250 +#endif
3251  };
3252  
3253  #define DECLARE_TASKLET(name, func, data) \
3254 @@ -523,7 +528,7 @@ static inline int tasklet_trylock(struct tasklet_struct *t)
3255  
3256  static inline void tasklet_unlock(struct tasklet_struct *t)
3257  {
3258 -	smp_mb__before_clear_bit(); 
3259 +	smp_mb__before_clear_bit();
3260  	clear_bit(TASKLET_STATE_RUN, &(t)->state);
3261  }
3262  
3263 @@ -537,6 +542,7 @@ static inline void tasklet_unlock_wait(struct tasklet_struct *t)
3264  #define tasklet_unlock(t) do { } while (0)
3265  #endif
3266  
3267 +extern void ___tasklet_schedule(struct tasklet_struct *t);
3268  extern void __tasklet_schedule(struct tasklet_struct *t);
3269  
3270  static inline void tasklet_schedule(struct tasklet_struct *t)
3271 @@ -545,6 +551,7 @@ static inline void tasklet_schedule(struct tasklet_struct *t)
3272  		__tasklet_schedule(t);
3273  }
3274  
3275 +extern void ___tasklet_hi_schedule(struct tasklet_struct *t);
3276  extern void __tasklet_hi_schedule(struct tasklet_struct *t);
3277  
3278  static inline void tasklet_hi_schedule(struct tasklet_struct *t)
3279 @@ -553,6 +560,7 @@ static inline void tasklet_hi_schedule(struct tasklet_struct *t)
3280  		__tasklet_hi_schedule(t);
3281  }
3282  
3283 +extern void ___tasklet_hi_schedule_first(struct tasklet_struct *t);
3284  extern void __tasklet_hi_schedule_first(struct tasklet_struct *t);
3285  
3286  /*
3287 @@ -651,7 +659,7 @@ void tasklet_hrtimer_cancel(struct tasklet_hrtimer *ttimer)
3288   * if more than one irq occurred.
3289   */
3290  
3291 -#if defined(CONFIG_GENERIC_HARDIRQS) && !defined(CONFIG_GENERIC_IRQ_PROBE) 
3292 +#if defined(CONFIG_GENERIC_HARDIRQS) && !defined(CONFIG_GENERIC_IRQ_PROBE)
3293  static inline unsigned long probe_irq_on(void)
3294  {
3295  	return 0;
3296 diff --git a/include/linux/mutex.h b/include/linux/mutex.h
3297 index a940fe4..cb47deb 100644
3298 --- a/include/linux/mutex.h
3299 +++ b/include/linux/mutex.h
3300 @@ -126,6 +126,15 @@ static inline int mutex_is_locked(struct mutex *lock)
3301  	return atomic_read(&lock->count) != 1;
3302  }
3303  
3304 +/* return non-zero to abort.  only pre-side-effects may abort */
3305 +typedef int (*side_effect_t)(unsigned long);
3306 +extern void mutex_lock_sfx(struct mutex *lock,
3307 +						   side_effect_t pre, unsigned long pre_arg,
3308 +						   side_effect_t post, unsigned long post_arg);
3309 +extern void mutex_unlock_sfx(struct mutex *lock,
3310 +							 side_effect_t pre, unsigned long pre_arg,
3311 +							 side_effect_t post, unsigned long post_arg);
3312 +
3313  /*
3314   * See kernel/mutex.c for detailed documentation of these APIs.
3315   * Also see Documentation/mutex-design.txt.
3316 @@ -153,6 +162,7 @@ extern void mutex_lock(struct mutex *lock);
3317  extern int __must_check mutex_lock_interruptible(struct mutex *lock);
3318  extern int __must_check mutex_lock_killable(struct mutex *lock);
3319  
3320 +
3321  # define mutex_lock_nested(lock, subclass) mutex_lock(lock)
3322  # define mutex_lock_interruptible_nested(lock, subclass) mutex_lock_interruptible(lock)
3323  # define mutex_lock_killable_nested(lock, subclass) mutex_lock_killable(lock)
3324 diff --git a/include/linux/sched.h b/include/linux/sched.h
3325 index 14a6c7b..9c990d1 100644
3326 --- a/include/linux/sched.h
3327 +++ b/include/linux/sched.h
3328 @@ -39,6 +39,7 @@
3329  #define SCHED_BATCH		3
3330  /* SCHED_ISO: reserved but not implemented yet */
3331  #define SCHED_IDLE		5
3332 +#define SCHED_LITMUS		6
3333  /* Can be ORed in to make sure the process is reverted back to SCHED_NORMAL on fork */
3334  #define SCHED_RESET_ON_FORK     0x40000000
3335  
3336 @@ -93,6 +94,9 @@ struct sched_param {
3337  
3338  #include <asm/processor.h>
3339  
3340 +#include <litmus/rt_param.h>
3341 +#include <litmus/preempt.h>
3342 +
3343  struct exec_domain;
3344  struct futex_pi_state;
3345  struct robust_list_head;
3346 @@ -1209,6 +1213,7 @@ struct sched_rt_entity {
3347  };
3348  
3349  struct rcu_node;
3350 +struct od_table_entry;
3351  
3352  enum perf_event_task_context {
3353  	perf_invalid_context = -1,
3354 @@ -1313,9 +1318,9 @@ struct task_struct {
3355  	unsigned long stack_canary;
3356  #endif
3357  
3358 -	/* 
3359 +	/*
3360  	 * pointers to (original) parent process, youngest child, younger sibling,
3361 -	 * older sibling, respectively.  (p->father can be replaced with 
3362 +	 * older sibling, respectively.  (p->father can be replaced with
3363  	 * p->real_parent->pid)
3364  	 */
3365  	struct task_struct *real_parent; /* real parent process */
3366 @@ -1526,6 +1531,13 @@ struct task_struct {
3367  	int make_it_fail;
3368  #endif
3369  	struct prop_local_single dirties;
3370 +
3371 +	/* LITMUS RT parameters and state */
3372 +	struct rt_param rt_param;
3373 +
3374 +	/* references to PI semaphores, etc. */
3375 +	struct od_table_entry *od_table;
3376 +
3377  #ifdef CONFIG_LATENCYTOP
3378  	int latency_record_count;
3379  	struct latency_record latency_record[LT_SAVECOUNT];
3380 @@ -2136,7 +2148,7 @@ static inline int dequeue_signal_lock(struct task_struct *tsk, sigset_t *mask, s
3381  	spin_unlock_irqrestore(&tsk->sighand->siglock, flags);
3382  
3383  	return ret;
3384 -}	
3385 +}
3386  
3387  extern void block_all_signals(int (*notifier)(void *priv), void *priv,
3388  			      sigset_t *mask);
3389 @@ -2446,6 +2458,7 @@ static inline int test_tsk_thread_flag(struct task_struct *tsk, int flag)
3390  static inline void set_tsk_need_resched(struct task_struct *tsk)
3391  {
3392  	set_tsk_thread_flag(tsk,TIF_NEED_RESCHED);
3393 +	sched_state_will_schedule(tsk);
3394  }
3395  
3396  static inline void clear_tsk_need_resched(struct task_struct *tsk)
3397 diff --git a/include/linux/semaphore.h b/include/linux/semaphore.h
3398 index 39fa049..c83fc2b 100644
3399 --- a/include/linux/semaphore.h
3400 +++ b/include/linux/semaphore.h
3401 @@ -43,4 +43,13 @@ extern int __must_check down_trylock(struct semaphore *sem);
3402  extern int __must_check down_timeout(struct semaphore *sem, long jiffies);
3403  extern void up(struct semaphore *sem);
3404  
3405 +extern void __down(struct semaphore *sem);
3406 +extern void __up(struct semaphore *sem);
3407 +
3408 +struct semaphore_waiter {
3409 +	struct list_head list;
3410 +	struct task_struct *task;
3411 +	int up;
3412 +};
3413 +
3414  #endif /* __LINUX_SEMAPHORE_H */
3415 diff --git a/include/linux/smp.h b/include/linux/smp.h
3416 index 8cc38d3..53b1bee 100644
3417 --- a/include/linux/smp.h
3418 +++ b/include/linux/smp.h
3419 @@ -82,6 +82,11 @@ int smp_call_function_any(const struct cpumask *mask,
3420  			  smp_call_func_t func, void *info, int wait);
3421  
3422  /*
3423 + * sends a 'pull timer' event to a remote CPU
3424 + */
3425 +extern void smp_send_pull_timers(int cpu);
3426 +
3427 +/*
3428   * Generic and arch helpers
3429   */
3430  #ifdef CONFIG_USE_GENERIC_SMP_HELPERS
3431 diff --git a/include/linux/tick.h b/include/linux/tick.h
3432 index b232ccc..1e29bd5 100644
3433 --- a/include/linux/tick.h
3434 +++ b/include/linux/tick.h
3435 @@ -74,6 +74,11 @@ extern int tick_is_oneshot_available(void);
3436  extern struct tick_device *tick_get_device(int cpu);
3437  
3438  # ifdef CONFIG_HIGH_RES_TIMERS
3439 +/* LITMUS^RT tick alignment */
3440 +#define LINUX_DEFAULT_TICKS	0
3441 +#define LITMUS_ALIGNED_TICKS	1
3442 +#define	LITMUS_STAGGERED_TICKS	2
3443 +
3444  extern int tick_init_highres(void);
3445  extern int tick_program_event(ktime_t expires, int force);
3446  extern void tick_setup_sched_timer(void);
3447 diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
3448 index f584aba..1ec2ec7 100644
3449 --- a/include/linux/workqueue.h
3450 +++ b/include/linux/workqueue.h
3451 @@ -83,6 +83,9 @@ struct work_struct {
3452  #ifdef CONFIG_LOCKDEP
3453  	struct lockdep_map lockdep_map;
3454  #endif
3455 +#ifdef CONFIG_LITMUS_SOFTIRQD
3456 +	struct task_struct *owner;
3457 +#endif
3458  };
3459  
3460  #define WORK_DATA_INIT()	ATOMIC_LONG_INIT(WORK_STRUCT_NO_CPU)
3461 @@ -115,11 +118,25 @@ struct execute_work {
3462  #define __WORK_INIT_LOCKDEP_MAP(n, k)
3463  #endif
3464  
3465 +#ifdef CONFIG_LITMUS_SOFTIRQD
3466 +#define __WORK_INIT_OWNER() \
3467 +	.owner = NULL,
3468 +
3469 +#define PREPARE_OWNER(_work, _owner) \
3470 +	do { \
3471 +		(_work)->owner = (_owner); \
3472 +	} while(0)
3473 +#else
3474 +#define __WORK_INIT_OWNER()
3475 +#define PREPARE_OWNER(_work, _owner)
3476 +#endif
3477 +
3478  #define __WORK_INITIALIZER(n, f) {				\
3479  	.data = WORK_DATA_STATIC_INIT(),			\
3480  	.entry	= { &(n).entry, &(n).entry },			\
3481  	.func = (f),						\
3482  	__WORK_INIT_LOCKDEP_MAP(#n, &(n))			\
3483 +	__WORK_INIT_OWNER() \
3484  	}
3485  
3486  #define __DELAYED_WORK_INITIALIZER(n, f) {			\
3487 @@ -357,6 +374,7 @@ extern int queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
3488  extern void flush_workqueue(struct workqueue_struct *wq);
3489  extern void flush_scheduled_work(void);
3490  
3491 +extern int __schedule_work(struct work_struct *work);
3492  extern int schedule_work(struct work_struct *work);
3493  extern int schedule_work_on(int cpu, struct work_struct *work);
3494  extern int schedule_delayed_work(struct delayed_work *work, unsigned long delay);
3495 diff --git a/include/litmus/affinity.h b/include/litmus/affinity.h
3496 new file mode 100644
3497 index 0000000..ca2e442
3498 --- /dev/null
3499 +++ b/include/litmus/affinity.h
3500 @@ -0,0 +1,80 @@
3501 +#ifndef __LITMUS_AFFINITY_H
3502 +#define __LITMUS_AFFINITY_H
3503 +
3504 +#include <linux/cpumask.h>
3505 +
3506 +/*
3507 +  L1 (instr) = depth 0
3508 +  L1 (data)  = depth 1
3509 +  L2 = depth 2
3510 +  L3 = depth 3
3511 + */
3512 +#define NUM_CACHE_LEVELS 4
3513 +
3514 +struct neighborhood
3515 +{
3516 +	unsigned int size[NUM_CACHE_LEVELS];
3517 +	cpumask_var_t neighbors[NUM_CACHE_LEVELS];
3518 +};
3519 +
3520 +/* topology info is stored redundantly in a big array for fast lookups */
3521 +extern struct neighborhood neigh_info[NR_CPUS];
3522 +
3523 +void init_topology(void); /* called by Litmus module's _init_litmus() */
3524 +
3525 +/* Works like:
3526 +void get_nearest_available_cpu(
3527 +	cpu_entry_t **nearest,
3528 +	cpu_entry_t *start,
3529 +	cpu_entry_t *entries,
3530 +	int release_master)
3531 +
3532 +Set release_master = NO_CPU for no Release Master.
3533 +
3534 +We use a macro here to exploit the fact that C-EDF and G-EDF
3535 +have similar structures for their cpu_entry_t structs, even though
3536 +they do not share a common base-struct.  The macro allows us to
3537 +avoid code duplication.
3538 +
3539 +TODO: Factor out the job-to-processor linking from C/G-EDF into
3540 +a reusable "processor mapping".  (See B.B.'s RTSS'09 paper &
3541 +dissertation.)
3542 + */
3543 +#define get_nearest_available_cpu(nearest, start, entries, release_master) \
3544 +{ \
3545 +	(nearest) = NULL; \
3546 +	if (!(start)->linked) { \
3547 +		(nearest) = (start); \
3548 +	} else { \
3549 +		int __level; \
3550 +		int __cpu; \
3551 +		int __release_master = ((release_master) == NO_CPU) ? -1 : (release_master); \
3552 +		struct neighborhood *__neighbors = &neigh_info[(start)->cpu]; \
3553 +		\
3554 +		for (__level = 0; (__level < NUM_CACHE_LEVELS) && !(nearest); ++__level) { \
3555 +			if (__neighbors->size[__level] > 1) { \
3556 +				for_each_cpu(__cpu, __neighbors->neighbors[__level]) { \
3557 +					if (__cpu != __release_master) { \
3558 +						cpu_entry_t *__entry = &per_cpu((entries), __cpu); \
3559 +						if (!__entry->linked) { \
3560 +							(nearest) = __entry; \
3561 +							break; \
3562 +						} \
3563 +					} \
3564 +				} \
3565 +			} else if (__neighbors->size[__level] == 0) { \
3566 +				break; \
3567 +			} \
3568 +		} \
3569 +	} \
3570 +	\
3571 +	if ((nearest)) { \
3572 +		TRACE("P%d is closest available CPU to P%d\n", \
3573 +				(nearest)->cpu, (start)->cpu); \
3574 +	} else { \
3575 +		TRACE("Could not find an available CPU close to P%d\n", \
3576 +				(start)->cpu); \
3577 +	} \
3578 +}
3579 +
3580 +#endif
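A sketch of how a plugin would invoke the macro above when looking for an idle CPU near a task's previous CPU. cpu_entry_t, gsnedf_cpu_entries, and link_task_to_cpu() are plugin-side names assumed for illustration:

static void try_affinity_aware_link(struct task_struct *t, cpu_entry_t *last)
{
	cpu_entry_t *target = NULL;

	get_nearest_available_cpu(target, last, gsnedf_cpu_entries, NO_CPU);

	if (target)
		link_task_to_cpu(t, target);	/* place near the previous CPU */
	/* else: fall back to the plugin's normal CPU selection */
}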
3581 diff --git a/include/litmus/bheap.h b/include/litmus/bheap.h
3582 new file mode 100644
3583 index 0000000..cf4864a
3584 --- /dev/null
3585 +++ b/include/litmus/bheap.h
3586 @@ -0,0 +1,77 @@
3587 +/* bheaps.h -- Binomial Heaps
3588 + *
3589 + * (c) 2008, 2009 Bjoern Brandenburg
3590 + */
3591 +
3592 +#ifndef BHEAP_H
3593 +#define BHEAP_H
3594 +
3595 +#define NOT_IN_HEAP UINT_MAX
3596 +
3597 +struct bheap_node {
3598 +	struct bheap_node* 	parent;
3599 +	struct bheap_node* 	next;
3600 +	struct bheap_node* 	child;
3601 +
3602 +	unsigned int 		degree;
3603 +	void*			value;
3604 +	struct bheap_node**	ref;
3605 +};
3606 +
3607 +struct bheap {
3608 +	struct bheap_node* 	head;
3609 +	/* We cache the minimum of the heap.
3610 +	 * This speeds up repeated peek operations.
3611 +	 */
3612 +	struct bheap_node*	min;
3613 +};
3614 +
3615 +typedef int (*bheap_prio_t)(struct bheap_node* a, struct bheap_node* b);
3616 +
3617 +void bheap_init(struct bheap* heap);
3618 +void bheap_node_init(struct bheap_node** ref_to_bheap_node_ptr, void* value);
3619 +
3620 +static inline int bheap_node_in_heap(struct bheap_node* h)
3621 +{
3622 +	return h->degree != NOT_IN_HEAP;
3623 +}
3624 +
3625 +static inline int bheap_empty(struct bheap* heap)
3626 +{
3627 +	return heap->head == NULL && heap->min == NULL;
3628 +}
3629 +
3630 +/* insert (and reinitialize) a node into the heap */
3631 +void bheap_insert(bheap_prio_t higher_prio,
3632 +		 struct bheap* heap,
3633 +		 struct bheap_node* node);
3634 +
3635 +/* merge addition into target */
3636 +void bheap_union(bheap_prio_t higher_prio,
3637 +		struct bheap* target,
3638 +		struct bheap* addition);
3639 +
3640 +struct bheap_node* bheap_peek(bheap_prio_t higher_prio,
3641 +			    struct bheap* heap);
3642 +
3643 +struct bheap_node* bheap_take(bheap_prio_t higher_prio,
3644 +			    struct bheap* heap);
3645 +
3646 +void bheap_uncache_min(bheap_prio_t higher_prio, struct bheap* heap);
3647 +int  bheap_decrease(bheap_prio_t higher_prio, struct bheap_node* node);
3648 +
3649 +void bheap_delete(bheap_prio_t higher_prio,
3650 +		 struct bheap* heap,
3651 +		 struct bheap_node* node);
3652 +
3653 +/* allocate from memcache */
3654 +struct bheap_node* bheap_node_alloc(int gfp_flags);
3655 +void bheap_node_free(struct bheap_node* hn);
3656 +
3657 +/* allocate a heap node for value and insert into the heap */
3658 +int bheap_add(bheap_prio_t higher_prio, struct bheap* heap,
3659 +	     void* value, int gfp_flags);
3660 +
3661 +void* bheap_take_del(bheap_prio_t higher_prio,
3662 +		    struct bheap* heap);
3663 +#endif
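A minimal usage sketch of the binomial heap API declared above; 'struct item' and the priority order are illustrative:

struct item {
	int prio;
	struct bheap_node *hn;	/* allocated via bheap_node_alloc() */
};

static int lower_value_first(struct bheap_node *a, struct bheap_node *b)
{
	struct item *ia = a->value;
	struct item *ib = b->value;
	return ia->prio < ib->prio;	/* "higher priority" == smaller value */
}

static void bheap_example(void)
{
	struct bheap heap;
	struct item x = { .prio = 10 };
	struct bheap_node *top;

	bheap_init(&heap);
	x.hn = bheap_node_alloc(GFP_ATOMIC);
	bheap_node_init(&x.hn, &x);
	bheap_insert(lower_value_first, &heap, x.hn);

	top = bheap_take(lower_value_first, &heap);	/* removes x */
	bheap_node_free(top);
}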
3664 diff --git a/include/litmus/binheap.h b/include/litmus/binheap.h
3665 new file mode 100644
3666 index 0000000..9e966e3
3667 --- /dev/null
3668 +++ b/include/litmus/binheap.h
3669 @@ -0,0 +1,207 @@
3670 +#ifndef LITMUS_BINARY_HEAP_H
3671 +#define LITMUS_BINARY_HEAP_H
3672 +
3673 +#include <linux/kernel.h>
3674 +
3675 +/**
3676 + * Simple binary heap with add, arbitrary delete, delete_root, and top
3677 + * operations.
3678 + *
3679 + * Style meant to conform with list.h.
3680 + *
3681 + * Motivation: Linux's prio_heap.h is of fixed size. Litmus's binomial
3682 + * heap may be overkill (and perhaps not general enough) for some applications.
3683 + *
3684 + * Note: In order to make node swaps fast, a node inserted with a data pointer
3685 + * may not always hold said data pointer. This is similar to the binomial heap
3686 + * implementation. This does make node deletion tricky since we have to
3687 + * (1) locate the node that currently holds the data pointer to delete, and
3688 + * (2) find the node that was originally inserted with said data pointer. These have to be
3689 + * coalesced into a single node before removal (see usage of
3690 + * __binheap_safe_swap()). We have to track node references to accomplish this.
3691 + */
3692 +
3693 +struct binheap_node {
3694 +	void	*data;
3695 +	struct binheap_node *parent;
3696 +	struct binheap_node *left;
3697 +	struct binheap_node *right;
3698 +
3699 +	/* pointer to binheap_node that holds *data for which this binheap_node
3700 +	 * was originally inserted.  (*data "owns" this node)
3701 +	 */
3702 +	struct binheap_node *ref;
3703 +	struct binheap_node **ref_ptr;
3704 +};
3705 +
3706 +/**
3707 + * Signature of comparator function.  Assumed 'less-than' (min-heap).
3708 + * Pass in 'greater-than' for max-heap.
3709 + *
3710 + * TODO: Consider macro-based implementation that allows comparator to be
3711 + * inlined (similar to Linux red/black tree) for greater efficiency.
3712 + */
3713 +typedef int (*binheap_order_t)(struct binheap_node *a,
3714 +							   struct binheap_node *b);
3715 +
3716 +
3717 +struct binheap_handle {
3718 +	struct binheap_node *root;
3719 +
3720 +	/* pointer to node to take next inserted child */
3721 +	struct binheap_node *next;
3722 +
3723 +	/* pointer to last node in complete binary tree */
3724 +	struct binheap_node *last;
3725 +
3726 +	/* comparator function pointer */
3727 +	binheap_order_t compare;
3728 +};
3729 +
3730 +
3731 +#define BINHEAP_POISON	((void*)(0xdeadbeef))
3732 +
3733 +
3734 +/**
3735 + * binheap_entry - get the struct for this heap node.
3736 + *  Only valid when called upon heap nodes other than the root handle.
3737 + * @ptr:	the heap node.
3738 + * @type:	the type of struct pointed to by binheap_node::data.
3739 + * @member:	unused.
3740 + */
3741 +#define binheap_entry(ptr, type, member) \
3742 +((type *)((ptr)->data))
3743 +
3744 +/**
3745 + * binheap_node_container - get the struct that contains this node.
3746 + *  Only valid when called upon heap nodes other than the root handle.
3747 + * @ptr:	the heap node.
3748 + * @type:	the type of struct the node is embedded in.
3749 + * @member:	the name of the binheap_struct within the (type) struct.
3750 + */
3751 +#define binheap_node_container(ptr, type, member) \
3752 +container_of((ptr), type, member)
3753 +
3754 +/**
3755 + * binheap_top_entry - get the struct for the node at the top of the heap.
3756 + *  Only valid when called upon the heap handle node.
3757 + * @ptr:    the special heap-handle node.
3758 + * @type:   the type of the struct the head is embedded in.
3759 + * @member:	the name of the binheap_struct within the (type) struct.
3760 + */
3761 +#define binheap_top_entry(ptr, type, member) \
3762 +binheap_entry((ptr)->root, type, member)
3763 +
3764 +/**
3765 + * binheap_delete_root - remove the root element from the heap.
3766 + * @handle:	 handle to the heap.
3767 + * @type:    the type of the struct the head is embedded in.
3768 + * @member:	 the name of the binheap_struct within the (type) struct.
3769 + */
3770 +#define binheap_delete_root(handle, type, member) \
3771 +__binheap_delete_root((handle), &((type *)((handle)->root->data))->member)
3772 +
3773 +/**
3774 + * binheap_delete - remove an arbitrary element from the heap.
3775 + * @to_delete:  pointer to node to be removed.
3776 + * @handle:	 handle to the heap.
3777 + */
3778 +#define binheap_delete(to_delete, handle) \
3779 +__binheap_delete((to_delete), (handle))
3780 +
3781 +/**
3782 + * binheap_add - insert an element to the heap
3783 + * new_node: node to add.
3784 + * @handle:	 handle to the heap.
3785 + * @type:    the type of the struct the head is embedded in.
3786 + * @member:	 the name of the binheap_struct within the (type) struct.
3787 + */
3788 +#define binheap_add(new_node, handle, type, member) \
3789 +__binheap_add((new_node), (handle), container_of((new_node), type, member))
3790 +
3791 +/**
3792 + * binheap_decrease - re-eval the position of a node (based upon its
3793 + * original data pointer).
3794 + * @handle: handle to the heap.
3795 + * @orig_node: node that was associated with the data pointer
3796 + *             (whose value has changed) when said pointer was
3797 + *             added to the heap.
3798 + */
3799 +#define binheap_decrease(orig_node, handle) \
3800 +__binheap_decrease((orig_node), (handle))
3801 +
3802 +#define BINHEAP_NODE_INIT() { NULL, BINHEAP_POISON, NULL, NULL, NULL, NULL }
3803 +
3804 +#define BINHEAP_NODE(name) \
3805 +	struct binheap_node name = BINHEAP_NODE_INIT()
3806 +
3807 +
3808 +static inline void INIT_BINHEAP_NODE(struct binheap_node *n)
3809 +{
3810 +	n->data = NULL;
3811 +	n->parent = BINHEAP_POISON;
3812 +	n->left = NULL;
3813 +	n->right = NULL;
3814 +	n->ref = NULL;
3815 +	n->ref_ptr = NULL;
3816 +}
3817 +
3818 +static inline void INIT_BINHEAP_HANDLE(
3819 +	struct binheap_handle *handle,
3820 +	binheap_order_t compare)
3821 +{
3822 +	handle->root = NULL;
3823 +	handle->next = NULL;
3824 +	handle->last = NULL;
3825 +	handle->compare = compare;
3826 +}
3827 +
3828 +/* Returns true (1) if binheap is empty. */
3829 +static inline int binheap_empty(struct binheap_handle *handle)
3830 +{
3831 +	return(handle->root == NULL);
3832 +}
3833 +
3834 +/* Returns true (1) if binheap node is in a heap. */
3835 +static inline int binheap_is_in_heap(struct binheap_node *node)
3836 +{
3837 +	return (node->parent != BINHEAP_POISON);
3838 +}
3839 +
3840 +
3841 +int binheap_is_in_this_heap(struct binheap_node *node, struct binheap_handle* heap);
3842 +
3843 +
3844 +
3845 +void __binheap_add(struct binheap_node *new_node,
3846 +	struct binheap_handle *handle,
3847 +	void *data);
3848 +
3849 +
3850 +/**
3851 + * Removes the root node from the heap. The node is removed after coalescing
3852 + * the binheap_node with its original data pointer at the root of the tree.
3853 + *
3854 + * The 'last' node in the tree is then swapped up to the root and bubbled
3855 + * down.
3856 + */
3857 +void __binheap_delete_root(struct binheap_handle *handle,
3858 +	struct binheap_node *container);
3859 +
3860 +/**
3861 + * Delete an arbitrary node.  Bubble node to delete up to the root,
3862 + * and then delete to root.
3863 + */
3864 +void __binheap_delete(
3865 +	struct binheap_node *node_to_delete,
3866 +	struct binheap_handle *handle);
3867 +
3868 +/**
3869 + * Bubble up a node whose pointer has decreased in value.
3870 + */
3871 +void __binheap_decrease(struct binheap_node *orig_node,
3872 +						struct binheap_handle *handle);
3873 +
3874 +
3875 +#endif
3876 +
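A minimal sketch of the embedded-node style used by this heap, in contrast to the pointer-based bheap above; 'struct job' and the deadline order are illustrative:

struct job {
	unsigned long long deadline;
	struct binheap_node node;
};

static int earlier_deadline(struct binheap_node *a, struct binheap_node *b)
{
	struct job *ja = binheap_entry(a, struct job, node);
	struct job *jb = binheap_entry(b, struct job, node);
	return ja->deadline < jb->deadline;	/* min-heap on deadlines */
}

static void binheap_example(void)
{
	struct binheap_handle pq;
	struct job j1 = { .deadline = 100 }, j2 = { .deadline = 50 };
	struct job *earliest;

	INIT_BINHEAP_HANDLE(&pq, earlier_deadline);
	INIT_BINHEAP_NODE(&j1.node);
	INIT_BINHEAP_NODE(&j2.node);

	binheap_add(&j1.node, &pq, struct job, node);
	binheap_add(&j2.node, &pq, struct job, node);

	earliest = binheap_top_entry(&pq, struct job, node);	/* j2 */
	binheap_delete_root(&pq, struct job, node);
}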
3877 diff --git a/include/litmus/budget.h b/include/litmus/budget.h
3878 new file mode 100644
3879 index 0000000..732530e
3880 --- /dev/null
3881 +++ b/include/litmus/budget.h
3882 @@ -0,0 +1,8 @@
3883 +#ifndef _LITMUS_BUDGET_H_
3884 +#define _LITMUS_BUDGET_H_
3885 +
3886 +/* Update the per-processor enforcement timer (arm/reprogram/cancel) for
3887 + * the next task. */
3888 +void update_enforcement_timer(struct task_struct* t);
3889 +
3890 +#endif
3891 diff --git a/include/litmus/clustered.h b/include/litmus/clustered.h
3892 new file mode 100644
3893 index 0000000..0c18dcb
3894 --- /dev/null
3895 +++ b/include/litmus/clustered.h
3896 @@ -0,0 +1,44 @@
3897 +#ifndef CLUSTERED_H
3898 +#define CLUSTERED_H
3899 +
3900 +/* Which cache level should be used to group CPUs into clusters?
3901 + * GLOBAL_CLUSTER means that all CPUs form a single cluster (just like under
3902 + * global scheduling).
3903 + */
3904 +enum cache_level {
3905 +	GLOBAL_CLUSTER = 0,
3906 +	L1_CLUSTER     = 1,
3907 +	L2_CLUSTER     = 2,
3908 +	L3_CLUSTER     = 3
3909 +};
3910 +
3911 +int parse_cache_level(const char *str, enum cache_level *level);
3912 +const char* cache_level_name(enum cache_level level);
3913 +
3914 +/* expose a cache level in a /proc dir */
3915 +struct proc_dir_entry* create_cluster_file(struct proc_dir_entry* parent,
3916 +					   enum cache_level* level);
3917 +
3918 +
3919 +
3920 +struct scheduling_cluster {
3921 +	unsigned int id;
3922 +	/* list of CPUs that are part of this cluster */
3923 +	struct list_head cpus;
3924 +};
3925 +
3926 +struct cluster_cpu {
3927 +	unsigned int id; /* which CPU is this? */
3928 +	struct list_head cluster_list; /* List of the CPUs in this cluster. */
3929 +	struct scheduling_cluster* cluster; /* The cluster that this CPU belongs to. */
3930 +};
3931 +
3932 +int get_cluster_size(enum cache_level level);
3933 +
3934 +int assign_cpus_to_clusters(enum cache_level level,
3935 +			    struct scheduling_cluster* clusters[],
3936 +			    unsigned int num_clusters,
3937 +			    struct cluster_cpu* cpus[],
3938 +			    unsigned int num_cpus);
3939 +
3940 +#endif
3941 diff --git a/include/litmus/debug_trace.h b/include/litmus/debug_trace.h
3942 new file mode 100644
3943 index 0000000..48d086d
3944 --- /dev/null
3945 +++ b/include/litmus/debug_trace.h
3946 @@ -0,0 +1,37 @@
3947 +#ifndef LITMUS_DEBUG_TRACE_H
3948 +#define LITMUS_DEBUG_TRACE_H
3949 +
3950 +#ifdef CONFIG_SCHED_DEBUG_TRACE
3951 +void sched_trace_log_message(const char* fmt, ...);
3952 +void dump_trace_buffer(int max);
3953 +#else
3954 +
3955 +#define sched_trace_log_message(fmt, ...)
3956 +
3957 +#endif
3958 +
3959 +extern atomic_t __log_seq_no;
3960 +
3961 +#ifdef CONFIG_SCHED_DEBUG_TRACE_CALLER
3962 +#define TRACE_PREFIX "%d P%d [%s@%s:%d]: "
3963 +#define TRACE_ARGS  atomic_add_return(1, &__log_seq_no),	\
3964 +		raw_smp_processor_id(),				\
3965 +		__FUNCTION__, __FILE__, __LINE__
3966 +#else
3967 +#define TRACE_PREFIX "%d P%d: "
3968 +#define TRACE_ARGS  atomic_add_return(1, &__log_seq_no), \
3969 +		raw_smp_processor_id()
3970 +#endif
3971 +
3972 +#define TRACE(fmt, args...)						\
3973 +	sched_trace_log_message(TRACE_PREFIX fmt,			\
3974 +				TRACE_ARGS,  ## args)
3975 +
3976 +#define TRACE_TASK(t, fmt, args...)			\
3977 +	TRACE("(%s/%d:%d) " fmt, (t)->comm, (t)->pid,	\
3978 +	      (t)->rt_param.job_params.job_no,  ##args)
3979 +
3980 +#define TRACE_CUR(fmt, args...) \
3981 +	TRACE_TASK(current, fmt, ## args)
3982 +
3983 +#endif
3984 diff --git a/include/litmus/edf_common.h b/include/litmus/edf_common.h
3985 new file mode 100644
3986 index 0000000..63dff7e
3987 --- /dev/null
3988 +++ b/include/litmus/edf_common.h
3989 @@ -0,0 +1,37 @@
3990 +/*
3991 + * EDF common data structures and utility functions shared by all EDF
3992 + * based scheduler plugins
3993 + */
3994 +
3995 +/* CLEANUP: Add comments and make it less messy.
3996 + *
3997 + */
3998 +
3999 +#ifndef __UNC_EDF_COMMON_H__
4000 +#define __UNC_EDF_COMMON_H__
4001 +
4002 +#include <litmus/rt_domain.h>
4003 +
4004 +void edf_domain_init(rt_domain_t* rt, check_resched_needed_t resched,
4005 +		     release_jobs_t release);
4006 +
4007 +int edf_higher_prio(struct task_struct* first,
4008 +		    struct task_struct* second);
4009 +
4010 +int edf_ready_order(struct bheap_node* a, struct bheap_node* b);
4011 +
4012 +#ifdef CONFIG_LITMUS_NESTED_LOCKING
4013 +/* binheap_nodes must be embedded within 'struct litmus_lock' */
4014 +int edf_max_heap_order(struct binheap_node *a, struct binheap_node *b);
4015 +int edf_min_heap_order(struct binheap_node *a, struct binheap_node *b);
4016 +int edf_max_heap_base_priority_order(struct binheap_node *a, struct binheap_node *b);
4017 +int edf_min_heap_base_priority_order(struct binheap_node *a, struct binheap_node *b);
4018 +
4019 +int __edf_higher_prio(struct task_struct* first, comparison_mode_t first_mode,
4020 +					  struct task_struct* second, comparison_mode_t second_mode);
4021 +
4022 +#endif
4023 +
4024 +int edf_preemption_needed(rt_domain_t* rt, struct task_struct *t);
4025 +
4026 +#endif
4027 diff --git a/include/litmus/fdso.h b/include/litmus/fdso.h
4028 new file mode 100644
4029 index 0000000..1f5d3bd
4030 --- /dev/null
4031 +++ b/include/litmus/fdso.h
4032 @@ -0,0 +1,83 @@
4033 +/* fdso.h - file descriptor attached shared objects
4034 + *
4035 + * (c) 2007 B. Brandenburg, LITMUS^RT project
4036 + */
4037 +
4038 +#ifndef _LINUX_FDSO_H_
4039 +#define _LINUX_FDSO_H_
4040 +
4041 +#include <linux/list.h>
4042 +#include <asm/atomic.h>
4043 +
4044 +#include <linux/fs.h>
4045 +#include <linux/slab.h>
4046 +
4047 +#define MAX_OBJECT_DESCRIPTORS 32
4048 +
4049 +typedef enum  {
4050 +	MIN_OBJ_TYPE 	= 0,
4051 +
4052 +	FMLP_SEM	= 0,
4053 +	SRP_SEM		= 1,
4054 +
4055 +	RSM_MUTEX	= 2,
4056 +	IKGLP_SEM	= 3,
4057 +	KFMLP_SEM	= 4,
4058 +
4059 +	IKGLP_SIMPLE_GPU_AFF_OBS = 5,
4060 +	IKGLP_GPU_AFF_OBS = 6,
4061 +	KFMLP_SIMPLE_GPU_AFF_OBS = 7,
4062 +	KFMLP_GPU_AFF_OBS = 8,
4063 +
4064 +	MAX_OBJ_TYPE	= 8
4065 +} obj_type_t;
4066 +
4067 +struct inode_obj_id {
4068 +	struct list_head	list;
4069 +	atomic_t		count;
4070 +	struct inode*		inode;
4071 +
4072 +	obj_type_t 		type;
4073 +	void*			obj;
4074 +	unsigned int		id;
4075 +};
4076 +
4077 +struct fdso_ops;
4078 +
4079 +struct od_table_entry {
4080 +	unsigned int		used;
4081 +
4082 +	struct inode_obj_id*	obj;
4083 +	const struct fdso_ops*	class;
4084 +};
4085 +
4086 +struct fdso_ops {
4087 +	int   (*create)(void** obj_ref, obj_type_t type, void* __user);
4088 +	void  (*destroy)(obj_type_t type, void*);
4089 +	int   (*open)	(struct od_table_entry*, void* __user);
4090 +	int   (*close)	(struct od_table_entry*);
4091 +};
4092 +
4093 +/* translate a userspace supplied od into the raw table entry
4094 + * returns NULL if od is invalid
4095 + */
4096 +struct od_table_entry* get_entry_for_od(int od);
4097 +
4098 +/* translate a userspace supplied od into the associated object
4099 + * returns NULL if od is invalid
4100 + */
4101 +static inline void* od_lookup(int od, obj_type_t type)
4102 +{
4103 +	struct od_table_entry* e = get_entry_for_od(od);
4104 +	return e && e->obj->type == type ? e->obj->obj : NULL;
4105 +}
4106 +
4107 +#define lookup_fmlp_sem(od)((struct pi_semaphore*)  od_lookup(od, FMLP_SEM))
4108 +#define lookup_kfmlp_sem(od)((struct pi_semaphore*)  od_lookup(od, KFMLP_SEM))
4109 +#define lookup_srp_sem(od) ((struct srp_semaphore*) od_lookup(od, SRP_SEM))
4110 +#define lookup_ics(od)     ((struct ics*)           od_lookup(od, ICS_ID))
4111 +
4112 +#define lookup_rsm_mutex(od)((struct litmus_lock*)  od_lookup(od, FMLP_SEM))
4113 +
4114 +
4115 +#endif
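A sketch of how a syscall path turns a user-supplied object descriptor into a kernel object; the FMLP_SEM type check mirrors what the od_lookup() helper above does, and the calling context is illustrative:

static struct litmus_lock *fmlp_lock_from_od(int od)
{
	struct od_table_entry *entry = get_entry_for_od(od);

	/* NULL if 'od' is not a valid descriptor or refers to an object
	 * of a different type */
	if (!entry || entry->obj->type != FMLP_SEM)
		return NULL;

	return (struct litmus_lock *) entry->obj->obj;
}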
4116 diff --git a/include/litmus/feather_buffer.h b/include/litmus/feather_buffer.h
4117 new file mode 100644
4118 index 0000000..6c18277
4119 --- /dev/null
4120 +++ b/include/litmus/feather_buffer.h
4121 @@ -0,0 +1,94 @@
4122 +#ifndef _FEATHER_BUFFER_H_
4123 +#define _FEATHER_BUFFER_H_
4124 +
4125 +/* requires UINT_MAX and memcpy */
4126 +
4127 +#define SLOT_FREE	0
4128 +#define	SLOT_BUSY 	1
4129 +#define	SLOT_READY	2
4130 +
4131 +struct ft_buffer {
4132 +	unsigned int	slot_count;
4133 +	unsigned int	slot_size;
4134 +
4135 +	int 		free_count;
4136 +	unsigned int 	write_idx;
4137 +	unsigned int 	read_idx;
4138 +
4139 +	char*		slots;
4140 +	void*		buffer_mem;
4141 +	unsigned int	failed_writes;
4142 +};
4143 +
4144 +static inline int init_ft_buffer(struct ft_buffer*	buf,
4145 +				 unsigned int 		slot_count,
4146 +				 unsigned int 		slot_size,
4147 +				 char*			slots,
4148 +				 void* 			buffer_mem)
4149 +{
4150 +	int i = 0;
4151 +	if (!slot_count || UINT_MAX % slot_count != slot_count - 1) {
4152 +		/* The slot count must divide UINT_MAX + 1 so that when it
4153 +		 * wraps around the index correctly points to 0.
4154 +		 */
4155 +		return 0;
4156 +	} else {
4157 +		buf->slot_count    = slot_count;
4158 +		buf->slot_size     = slot_size;
4159 +		buf->slots         = slots;
4160 +		buf->buffer_mem    = buffer_mem;
4161 +		buf->free_count    = slot_count;
4162 +		buf->write_idx     = 0;
4163 +		buf->read_idx      = 0;
4164 +		buf->failed_writes = 0;
4165 +		for (i = 0; i < slot_count; i++)
4166 +			buf->slots[i] = SLOT_FREE;
4167 +		return 1;
4168 +	}
4169 +}
4170 +
4171 +static inline int ft_buffer_start_write(struct ft_buffer* buf, void **ptr)
4172 +{
4173 +	int free = fetch_and_dec(&buf->free_count);
4174 +	unsigned int idx;
4175 +	if (free <= 0) {
4176 +		fetch_and_inc(&buf->free_count);
4177 +		*ptr = 0;
4178 +		fetch_and_inc(&buf->failed_writes);
4179 +		return 0;
4180 +	} else {
4181 +		idx  = fetch_and_inc((int*) &buf->write_idx) % buf->slot_count;
4182 +		buf->slots[idx] = SLOT_BUSY;
4183 +		*ptr = ((char*) buf->buffer_mem) + idx * buf->slot_size;
4184 +		return 1;
4185 +	}
4186 +}
4187 +
4188 +static inline void ft_buffer_finish_write(struct ft_buffer* buf, void *ptr)
4189 +{
4190 +	unsigned int idx = ((char*) ptr - (char*) buf->buffer_mem) / buf->slot_size;
4191 +	buf->slots[idx]  = SLOT_READY;
4192 +}
4193 +
4194 +
4195 +/* exclusive reader access is assumed */
4196 +static inline int ft_buffer_read(struct ft_buffer* buf, void* dest)
4197 +{
4198 +	unsigned int idx;
4199 +	if (buf->free_count == buf->slot_count)
4200 +		/* nothing available */
4201 +		return 0;
4202 +	idx = buf->read_idx % buf->slot_count;
4203 +	if (buf->slots[idx] == SLOT_READY) {
4204 +		memcpy(dest, ((char*) buf->buffer_mem) + idx * buf->slot_size,
4205 +		       buf->slot_size);
4206 +		buf->slots[idx] = SLOT_FREE;
4207 +		buf->read_idx++;
4208 +		fetch_and_inc(&buf->free_count);
4209 +		return 1;
4210 +	} else
4211 +		return 0;
4212 +}
4213 +
4214 +
4215 +#endif
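A minimal single-writer/single-reader sketch of the buffer above; the sample type and N_SLOTS are illustrative (slot_count must divide UINT_MAX + 1, i.e. be a power of two):

#define N_SLOTS	16			/* power of two */
struct sample { unsigned long long timestamp; };

static char slot_states[N_SLOTS];
static struct sample slot_mem[N_SLOTS];
static struct ft_buffer buf;

static int setup(void)
{
	return init_ft_buffer(&buf, N_SLOTS, sizeof(struct sample),
			      slot_states, slot_mem);
}

/* producer (may run in IRQ context) */
static void record(unsigned long long ts)
{
	struct sample *s;
	if (ft_buffer_start_write(&buf, (void **) &s)) {
		s->timestamp = ts;
		ft_buffer_finish_write(&buf, s);
	}
	/* otherwise the write is counted in buf.failed_writes */
}

/* single consumer */
static int drain_one(struct sample *out)
{
	return ft_buffer_read(&buf, out);
}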
4216 diff --git a/include/litmus/feather_trace.h b/include/litmus/feather_trace.h
4217 new file mode 100644
4218 index 0000000..028dfb2
4219 --- /dev/null
4220 +++ b/include/litmus/feather_trace.h
4221 @@ -0,0 +1,65 @@
4222 +#ifndef _FEATHER_TRACE_H_
4223 +#define _FEATHER_TRACE_H_
4224 +
4225 +#include <asm/atomic.h>
4226 +
4227 +int ft_enable_event(unsigned long id);
4228 +int ft_disable_event(unsigned long id);
4229 +int ft_is_event_enabled(unsigned long id);
4230 +int ft_disable_all_events(void);
4231 +
4232 +/* atomic_* functions are inline anyway */
4233 +static inline int fetch_and_inc(int *val)
4234 +{
4235 +	return atomic_add_return(1, (atomic_t*) val) - 1;
4236 +}
4237 +
4238 +static inline int fetch_and_dec(int *val)
4239 +{
4240 +	return atomic_sub_return(1, (atomic_t*) val) + 1;
4241 +}
4242 +
4243 +/* Don't use the rewriting implementation if kernel text pages are read-only.
4244 + * Ftrace gets around this by using the identity mapping, but that's more
4245 + * effort than is warranted right now for Feather-Trace.
4246 + * Eventually, it may make sense to replace Feather-Trace with ftrace.
4247 + */
4248 +#if defined(CONFIG_ARCH_HAS_FEATHER_TRACE) && !defined(CONFIG_DEBUG_RODATA)
4249 +
4250 +#include <asm/feather_trace.h>
4251 +
4252 +#else /* !__ARCH_HAS_FEATHER_TRACE */
4253 +
4254 +/* provide default implementation */
4255 +
4256 +#include <asm/timex.h> /* for get_cycles() */
4257 +
4258 +static inline unsigned long long ft_timestamp(void)
4259 +{
4260 +	return get_cycles();
4261 +}
4262 +
4263 +#define feather_callback
4264 +
4265 +#define MAX_EVENTS 1024
4266 +
4267 +extern int ft_events[MAX_EVENTS];
4268 +
4269 +#define ft_event(id, callback) \
4270 +	if (ft_events[id]) callback();
4271 +
4272 +#define ft_event0(id, callback) \
4273 +	if (ft_events[id]) callback(id);
4274 +
4275 +#define ft_event1(id, callback, param) \
4276 +	if (ft_events[id]) callback(id, param);
4277 +
4278 +#define ft_event2(id, callback, param, param2) \
4279 +	if (ft_events[id]) callback(id, param, param2);
4280 +
4281 +#define ft_event3(id, callback, p, p2, p3) \
4282 +	if (ft_events[id]) callback(id, p, p2, p3);
4283 +
4284 +#endif /* __ARCH_HAS_FEATHER_TRACE */
4285 +
4286 +#endif
4287 diff --git a/include/litmus/fpmath.h b/include/litmus/fpmath.h
4288 new file mode 100644
4289 index 0000000..04d4bca
4290 --- /dev/null
4291 +++ b/include/litmus/fpmath.h
4292 @@ -0,0 +1,145 @@
4293 +#ifndef __FP_MATH_H__
4294 +#define __FP_MATH_H__
4295 +
4296 +#ifndef __KERNEL__
4297 +#include <stdint.h>
4298 +#define abs(x) (((x) < 0) ? -(x) : x)
4299 +#endif
4300 +
4301 +// Use 64-bit because we want to track things at the nanosecond scale.
4302 +// This can lead to very large numbers.
4303 +typedef int64_t fpbuf_t;
4304 +typedef struct
4305 +{
4306 +	fpbuf_t val;
4307 +} fp_t;
4308 +
4309 +#define FP_SHIFT 10
4310 +#define ROUND_BIT (FP_SHIFT - 1)
4311 +
4312 +#define _fp(x) ((fp_t) {x})
4313 +
4314 +#ifdef __KERNEL__
4315 +static const fp_t LITMUS_FP_ZERO = {.val = 0};
4316 +static const fp_t LITMUS_FP_ONE = {.val = (1 << FP_SHIFT)};
4317 +#endif
4318 +
4319 +static inline fp_t FP(fpbuf_t x)
4320 +{
4321 +	return _fp(((fpbuf_t) x) << FP_SHIFT);
4322 +}
4323 +
4324 +/* divide two integers to obtain a fixed point value  */
4325 +static inline fp_t _frac(fpbuf_t a, fpbuf_t b)
4326 +{
4327 +	return _fp(FP(a).val / (b));
4328 +}
4329 +
4330 +static inline fpbuf_t _point(fp_t x)
4331 +{
4332 +	return (x.val % (1 << FP_SHIFT));
4333 +
4334 +}
4335 +
4336 +#define fp2str(x) x.val
4337 +/*(x.val >> FP_SHIFT), (x.val % (1 << FP_SHIFT)) */
4338 +#define _FP_  "%ld/1024"
4339 +
4340 +static inline fpbuf_t _floor(fp_t x)
4341 +{
4342 +	return x.val >> FP_SHIFT;
4343 +}
4344 +
4345 +/* FIXME: negative rounding */
4346 +static inline fpbuf_t _round(fp_t x)
4347 +{
4348 +	return _floor(x) + ((x.val >> ROUND_BIT) & 1);
4349 +}
4350 +
4351 +/* multiply two fixed point values */
4352 +static inline fp_t _mul(fp_t a, fp_t b)
4353 +{
4354 +	return _fp((a.val * b.val) >> FP_SHIFT);
4355 +}
4356 +
4357 +static inline fp_t _div(fp_t a, fp_t b)
4358 +{
4359 +#if !defined(__KERNEL__) && !defined(unlikely)
4360 +#define unlikely(x) (x)
4361 +#define DO_UNDEF_UNLIKELY
4362 +#endif
4363 +	/* try not to overflow */
4364 +	if (unlikely(  a.val > (2l << ((sizeof(fpbuf_t)*8) - FP_SHIFT)) ))
4365 +		return _fp((a.val / b.val) << FP_SHIFT);
4366 +	else
4367 +		return _fp((a.val << FP_SHIFT) / b.val);
4368 +#ifdef DO_UNDEF_UNLIKELY
4369 +#undef unlikely
4370 +#undef DO_UNDEF_UNLIKELY
4371 +#endif
4372 +}
4373 +
4374 +static inline fp_t _add(fp_t a, fp_t b)
4375 +{
4376 +	return _fp(a.val + b.val);
4377 +}
4378 +
4379 +static inline fp_t _sub(fp_t a, fp_t b)
4380 +{
4381 +	return _fp(a.val - b.val);
4382 +}
4383 +
4384 +static inline fp_t _neg(fp_t x)
4385 +{
4386 +	return _fp(-x.val);
4387 +}
4388 +
4389 +static inline fp_t _abs(fp_t x)
4390 +{
4391 +	return _fp(abs(x.val));
4392 +}
4393 +
4394 +/* works the same as casting float/double to integer */
4395 +static inline fpbuf_t _fp_to_integer(fp_t x)
4396 +{
4397 +	return _floor(_abs(x)) * ((x.val > 0) ? 1 : -1);
4398 +}
4399 +
4400 +static inline fp_t _integer_to_fp(fpbuf_t x)
4401 +{
4402 +	return _frac(x,1);
4403 +}
4404 +
4405 +static inline int _leq(fp_t a, fp_t b)
4406 +{
4407 +	return a.val <= b.val;
4408 +}
4409 +
4410 +static inline int _geq(fp_t a, fp_t b)
4411 +{
4412 +	return a.val >= b.val;
4413 +}
4414 +
4415 +static inline int _lt(fp_t a, fp_t b)
4416 +{
4417 +	return a.val < b.val;
4418 +}
4419 +
4420 +static inline int _gt(fp_t a, fp_t b)
4421 +{
4422 +	return a.val > b.val;
4423 +}
4424 +
4425 +static inline int _eq(fp_t a, fp_t b)
4426 +{
4427 +	return a.val == b.val;
4428 +}
4429 +
4430 +static inline fp_t _max(fp_t a, fp_t b)
4431 +{
4432 +	if (a.val < b.val)
4433 +		return b;
4434 +	else
4435 +		return a;
4436 +}
4437 +#endif
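With FP_SHIFT == 10 one fixed-point unit is 1/1024. A short worked example of the helpers above (values noted in the comments):

static void fp_example(void)
{
	fp_t a = _frac(1, 3);	/* 1/3 -> 341/1024 (1024/3, truncated)   */
	fp_t b = FP(2);		/* 2   -> 2048/1024                      */
	fp_t c = _mul(a, b);	/* (341*2048) >> 10 = 682/1024 ~= 2/3    */

	fpbuf_t whole   = _fp_to_integer(c);	/* 0: truncates toward zero */
	fpbuf_t rounded = _round(c);		/* 1: 682 >= 512 rounds up  */
}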
4438 diff --git a/include/litmus/ftdev.h b/include/litmus/ftdev.h
4439 new file mode 100644
4440 index 0000000..0b95987
4441 --- /dev/null
4442 +++ b/include/litmus/ftdev.h
4443 @@ -0,0 +1,55 @@
4444 +#ifndef _LITMUS_FTDEV_H_
4445 +#define	_LITMUS_FTDEV_H_
4446 +
4447 +#include <litmus/feather_trace.h>
4448 +#include <litmus/feather_buffer.h>
4449 +#include <linux/mutex.h>
4450 +#include <linux/cdev.h>
4451 +
4452 +#define FTDEV_ENABLE_CMD 	0
4453 +#define FTDEV_DISABLE_CMD 	1
4454 +
4455 +struct ftdev;
4456 +
4457 +/* return 0 if buffer can be opened, otherwise -$REASON */
4458 +typedef int  (*ftdev_can_open_t)(struct ftdev* dev, unsigned int buf_no);
4459 +/* return 0 on success, otherwise -$REASON */
4460 +typedef int  (*ftdev_alloc_t)(struct ftdev* dev, unsigned int buf_no);
4461 +typedef void (*ftdev_free_t)(struct ftdev* dev, unsigned int buf_no);
4462 +/* Let devices handle writes from userspace. No synchronization provided. */
4463 +typedef ssize_t (*ftdev_write_t)(struct ft_buffer* buf, size_t len, const char __user *from);
4464 +
4465 +struct ftdev_event;
4466 +
4467 +struct ftdev_minor {
4468 +	struct ft_buffer*	buf;
4469 +	unsigned int		readers;
4470 +	struct mutex		lock;
4471 +	/* FIXME: filter for authorized events */
4472 +	struct ftdev_event*	events;
4473 +	struct device*		device;
4474 +	struct ftdev*		ftdev;
4475 +};
4476 +
4477 +struct ftdev {
4478 +	dev_t			major;
4479 +	struct cdev		cdev;
4480 +	struct class*		class;
4481 +	const char*		name;
4482 +	struct ftdev_minor*	minor;
4483 +	unsigned int		minor_cnt;
4484 +	ftdev_alloc_t		alloc;
4485 +	ftdev_free_t		free;
4486 +	ftdev_can_open_t	can_open;
4487 +	ftdev_write_t		write;
4488 +};
4489 +
4490 +struct ft_buffer* alloc_ft_buffer(unsigned int count, size_t size);
4491 +void free_ft_buffer(struct ft_buffer* buf);
4492 +
4493 +int ftdev_init(	struct ftdev* ftdev, struct module* owner,
4494 +		const int minor_cnt, const char* name);
4495 +void ftdev_exit(struct ftdev* ftdev);
4496 +int register_ftdev(struct ftdev* ftdev);
4497 +
4498 +#endif
4499 diff --git a/include/litmus/gpu_affinity.h b/include/litmus/gpu_affinity.h
4500 new file mode 100644
4501 index 0000000..6b3fb8b
4502 --- /dev/null
4503 +++ b/include/litmus/gpu_affinity.h
4504 @@ -0,0 +1,49 @@
4505 +#ifndef LITMUS_GPU_AFFINITY_H
4506 +#define LITMUS_GPU_AFFINITY_H
4507 +
4508 +#include <litmus/rt_param.h>
4509 +#include <litmus/sched_plugin.h>
4510 +#include <litmus/litmus.h>
4511 +
4512 +void update_gpu_estimate(struct task_struct* t, lt_t observed);
4513 +gpu_migration_dist_t gpu_migration_distance(int a, int b);
4514 +
4515 +static inline void reset_gpu_tracker(struct task_struct* t)
4516 +{
4517 +	t->rt_param.accum_gpu_time = 0;
4518 +}
4519 +
4520 +static inline void start_gpu_tracker(struct task_struct* t)
4521 +{
4522 +	t->rt_param.gpu_time_stamp = litmus_clock();
4523 +}
4524 +
4525 +static inline void stop_gpu_tracker(struct task_struct* t)
4526 +{
4527 +	lt_t now = litmus_clock();
4528 +	t->rt_param.accum_gpu_time += (now - t->rt_param.gpu_time_stamp);
4529 +}
4530 +
4531 +static inline lt_t get_gpu_time(struct task_struct* t)
4532 +{
4533 +	return t->rt_param.accum_gpu_time;
4534 +}
4535 +
4536 +static inline lt_t get_gpu_estimate(struct task_struct* t, gpu_migration_dist_t dist)
4537 +{
4538 +	int i;
4539 +	fpbuf_t temp = _fp_to_integer(t->rt_param.gpu_migration_est[dist].est);
4540 +	lt_t val = (temp >= 0) ? temp : 0;  // never allow negative estimates...
4541 +
4542 +	WARN_ON(temp < 0);
4543 +
4544 +	// lower-bound a distant migration to be at least equal to the level
4545 +	// below it.
4546 +	for(i = dist-1; (val == 0) && (i >= MIG_LOCAL); --i) {
4547 +		val = _fp_to_integer(t->rt_param.gpu_migration_est[i].est);
4548 +	}
4549 +
4550 +	return ((val > 0) ? val : dist+1);
4551 +}
4552 +
4553 +#endif
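
As a usage illustration only (not code from this patch), the tracker and estimator above are meant to bracket a job's GPU critical sections roughly as follows; the wrapper function is hypothetical, and only the calls declared in this header are assumed.

#include <litmus/gpu_affinity.h>

/* Hypothetical helper: account one GPU critical section of task 't' and
 * feed the observation into the per-distance feedback estimator. */
static void track_one_gpu_section(struct task_struct* t)
{
	reset_gpu_tracker(t);		/* accum_gpu_time = 0 */
	start_gpu_tracker(t);		/* time stamp via litmus_clock() */

	/* ... the job uses the GPU; the tracker may be stopped and
	 * restarted around suspensions ... */

	stop_gpu_tracker(t);		/* accumulate elapsed time */

	update_gpu_estimate(t, get_gpu_time(t));
}
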
4554 diff --git a/include/litmus/ikglp_lock.h b/include/litmus/ikglp_lock.h
4555 new file mode 100644
4556 index 0000000..af6f151
4557 --- /dev/null
4558 +++ b/include/litmus/ikglp_lock.h
4559 @@ -0,0 +1,160 @@
4560 +#ifndef LITMUS_IKGLP_H
4561 +#define LITMUS_IKGLP_H
4562 +
4563 +#include <litmus/litmus.h>
4564 +#include <litmus/binheap.h>
4565 +#include <litmus/locking.h>
4566 +
4567 +#ifdef CONFIG_LITMUS_AFFINITY_LOCKING
4568 +#include <litmus/kexclu_affinity.h>
4569 +
4570 +struct ikglp_affinity;
4571 +#endif
4572 +
4573 +typedef struct ikglp_heap_node
4574 +{
4575 +	struct task_struct *task;
4576 +	struct binheap_node node;
4577 +} ikglp_heap_node_t;
4578 +
4579 +struct fifo_queue;
4580 +struct ikglp_wait_state;
4581 +
4582 +typedef struct ikglp_donee_heap_node
4583 +{
4584 +	struct task_struct *task;
4585 +	struct fifo_queue *fq;
4586 +	struct ikglp_wait_state *donor_info;  // cross-linked with ikglp_wait_state_t of donor
4587 +
4588 +	struct binheap_node node;
4589 +} ikglp_donee_heap_node_t;
4590 +
4591 +// Maintains the state of a request as it goes through the IKGLP
4592 +typedef struct ikglp_wait_state {
4593 +	struct task_struct *task;  // pointer back to the requesting task
4594 +
4595 +	// Data for while waiting in FIFO Queue
4596 +	wait_queue_t fq_node;
4597 +	ikglp_heap_node_t global_heap_node;
4598 +	ikglp_donee_heap_node_t donee_heap_node;
4599 +
4600 +	// Data for while waiting in PQ
4601 +	ikglp_heap_node_t pq_node;
4602 +
4603 +	// Data for while waiting as a donor
4604 +	ikglp_donee_heap_node_t *donee_info;  // cross-linked with donee's ikglp_donee_heap_node_t
4605 +	struct nested_info prio_donation;
4606 +	struct binheap_node node;
4607 +} ikglp_wait_state_t;
4608 +
4609 +/* struct for semaphore with priority inheritance */
4610 +struct fifo_queue
4611 +{
4612 +	wait_queue_head_t wait;
4613 +	struct task_struct* owner;
4614 +
4615 +	// used for bookkeeping
4616 +	ikglp_heap_node_t global_heap_node;
4617 +	ikglp_donee_heap_node_t donee_heap_node;
4618 +
4619 +	struct task_struct* hp_waiter;
4620 +	int count; /* number of waiters + holder */
4621 +
4622 +	struct nested_info nest;
4623 +};
4624 +
4625 +struct ikglp_semaphore
4626 +{
4627 +	struct litmus_lock litmus_lock;
4628 +
4629 +	raw_spinlock_t	lock;
4630 +	raw_spinlock_t	real_lock;
4631 +
4632 +	int nr_replicas; // AKA k
4633 +	int m;
4634 +
4635 +	int max_fifo_len; // max len of a fifo queue
4636 +	int nr_in_fifos;
4637 +
4638 +	struct binheap_handle top_m;  // min heap, base prio
4639 +	int top_m_size;  // number of nodes in top_m
4640 +
4641 +	struct binheap_handle not_top_m; // max heap, base prio
4642 +
4643 +	struct binheap_handle donees;	// min-heap, base prio
4644 +	struct fifo_queue *shortest_fifo_queue; // pointer to shortest fifo queue
4645 +
4646 +	/* data structures for holding requests */
4647 +	struct fifo_queue *fifo_queues; // array nr_replicas in length
4648 +	struct binheap_handle priority_queue;	// max-heap, base prio
4649 +	struct binheap_handle donors;	// max-heap, base prio
4650 +
4651 +#ifdef CONFIG_LITMUS_AFFINITY_LOCKING
4652 +	struct ikglp_affinity *aff_obs;
4653 +#endif
4654 +};
4655 +
4656 +static inline struct ikglp_semaphore* ikglp_from_lock(struct litmus_lock* lock)
4657 +{
4658 +	return container_of(lock, struct ikglp_semaphore, litmus_lock);
4659 +}
4660 +
4661 +int ikglp_lock(struct litmus_lock* l);
4662 +int ikglp_unlock(struct litmus_lock* l);
4663 +int ikglp_close(struct litmus_lock* l);
4664 +void ikglp_free(struct litmus_lock* l);
4665 +struct litmus_lock* ikglp_new(int m, struct litmus_lock_ops*, void* __user arg);
4666 +
4667 +
4668 +
4669 +#if defined(CONFIG_LITMUS_AFFINITY_LOCKING) && defined(CONFIG_LITMUS_NVIDIA)
4670 +
4671 +struct ikglp_queue_info
4672 +{
4673 +	struct fifo_queue* q;
4674 +	lt_t estimated_len;
4675 +	int *nr_cur_users;
4676 +};
4677 +
4678 +struct ikglp_affinity_ops
4679 +{
4680 +	struct fifo_queue* (*advise_enqueue)(struct ikglp_affinity* aff, struct task_struct* t);	// select FIFO
4681 +	ikglp_wait_state_t* (*advise_steal)(struct ikglp_affinity* aff, struct fifo_queue* dst);	// select steal from FIFO
4682 +	ikglp_donee_heap_node_t* (*advise_donee_selection)(struct ikglp_affinity* aff, struct task_struct* t);	// select a donee
4683 +	ikglp_wait_state_t* (*advise_donor_to_fq)(struct ikglp_affinity* aff, struct fifo_queue* dst);	// select a donor to move to PQ
4684 +
4685 +	void (*notify_enqueue)(struct ikglp_affinity* aff, struct fifo_queue* fq, struct task_struct* t);	// fifo enqueue
4686 +	void (*notify_dequeue)(struct ikglp_affinity* aff, struct fifo_queue* fq, struct task_struct* t);	// fifo dequeue
4687 +	void (*notify_acquired)(struct ikglp_affinity* aff, struct fifo_queue* fq, struct task_struct* t);	// replica acquired
4688 +	void (*notify_freed)(struct ikglp_affinity* aff, struct fifo_queue* fq, struct task_struct* t);		// replica freed
4689 +	int (*replica_to_resource)(struct ikglp_affinity* aff, struct fifo_queue* fq);		// convert a replica # to a GPU (includes offsets and simult user folding)
4690 +};
4691 +
4692 +struct ikglp_affinity
4693 +{
4694 +	struct affinity_observer obs;
4695 +	struct ikglp_affinity_ops *ops;
4696 +	struct ikglp_queue_info *q_info;
4697 +	int *nr_cur_users_on_rsrc;
4698 +	int offset;
4699 +	int nr_simult;
4700 +	int nr_rsrc;
4701 +	int relax_max_fifo_len;
4702 +};
4703 +
4704 +static inline struct ikglp_affinity* ikglp_aff_obs_from_aff_obs(struct affinity_observer* aff_obs)
4705 +{
4706 +	return container_of(aff_obs, struct ikglp_affinity, obs);
4707 +}
4708 +
4709 +int ikglp_aff_obs_close(struct affinity_observer*);
4710 +void ikglp_aff_obs_free(struct affinity_observer*);
4711 +struct affinity_observer* ikglp_gpu_aff_obs_new(struct affinity_observer_ops*,
4712 +												void* __user arg);
4713 +struct affinity_observer* ikglp_simple_gpu_aff_obs_new(struct affinity_observer_ops*,
4714 +												void* __user arg);
4715 +#endif
4716 +
4717 +
4718 +
4719 +#endif
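
The replica_to_resource() comment above mentions "offsets and simult user folding". Given the offset and nr_simult fields of struct ikglp_affinity, a plausible mapping is sketched below; the actual implementation appears later in this patch, so treat this only as an illustration.

/* Illustrative only: fold a replica index onto a physical GPU, assuming
 * nr_simult replicas share each GPU and 'offset' shifts the GPU range. */
static int example_replica_to_gpu(struct ikglp_affinity* aff, int replica)
{
	return (replica / aff->nr_simult) + aff->offset;
}
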
4720 diff --git a/include/litmus/jobs.h b/include/litmus/jobs.h
4721 new file mode 100644
4722 index 0000000..9bd361e
4723 --- /dev/null
4724 +++ b/include/litmus/jobs.h
4725 @@ -0,0 +1,9 @@
4726 +#ifndef __LITMUS_JOBS_H__
4727 +#define __LITMUS_JOBS_H__
4728 +
4729 +void prepare_for_next_period(struct task_struct *t);
4730 +void release_at(struct task_struct *t, lt_t start);
4731 +long complete_job(void);
4732 +
4733 +#endif
4734 +
4735 diff --git a/include/litmus/kexclu_affinity.h b/include/litmus/kexclu_affinity.h
4736 new file mode 100644
4737 index 0000000..f6355de
4738 --- /dev/null
4739 +++ b/include/litmus/kexclu_affinity.h
4740 @@ -0,0 +1,35 @@
4741 +#ifndef LITMUS_AFF_OBS_H
4742 +#define LITMUS_AFF_OBS_H
4743 +
4744 +#include <litmus/locking.h>
4745 +
4746 +struct affinity_observer_ops;
4747 +
4748 +struct affinity_observer
4749 +{
4750 +	struct affinity_observer_ops* ops;
4751 +	int type;
4752 +	int ident;
4753 +
4754 +	struct litmus_lock* lock;  // the lock under observation
4755 +};
4756 +
4757 +typedef int (*aff_obs_open_t)(struct affinity_observer* aff_obs,
4758 +							  void* __user arg);
4759 +typedef int (*aff_obs_close_t)(struct affinity_observer* aff_obs);
4760 +typedef void (*aff_obs_free_t)(struct affinity_observer* aff_obs);
4761 +
4762 +struct affinity_observer_ops
4763 +{
4764 +	aff_obs_open_t open;
4765 +	aff_obs_close_t close;
4766 +	aff_obs_free_t deallocate;
4767 +};
4768 +
4769 +struct litmus_lock* get_lock_from_od(int od);
4770 +
4771 +void affinity_observer_new(struct affinity_observer* aff,
4772 +						   struct affinity_observer_ops* ops,
4773 +						   struct affinity_observer_args* args);
4774 +
4775 +#endif
4776 diff --git a/include/litmus/kfmlp_lock.h b/include/litmus/kfmlp_lock.h
4777 new file mode 100644
4778 index 0000000..5f0aae6
4779 --- /dev/null
4780 +++ b/include/litmus/kfmlp_lock.h
4781 @@ -0,0 +1,97 @@
4782 +#ifndef LITMUS_KFMLP_H
4783 +#define LITMUS_KFMLP_H
4784 +
4785 +#include <litmus/litmus.h>
4786 +#include <litmus/locking.h>
4787 +
4788 +#ifdef CONFIG_LITMUS_AFFINITY_LOCKING
4789 +#include <litmus/kexclu_affinity.h>
4790 +
4791 +struct kfmlp_affinity;
4792 +#endif
4793 +
4794 +/* struct for semaphore with priority inheritance */
4795 +struct kfmlp_queue
4796 +{
4797 +	wait_queue_head_t wait;
4798 +	struct task_struct* owner;
4799 +	struct task_struct* hp_waiter;
4800 +	int count; /* number of waiters + holder */
4801 +};
4802 +
4803 +struct kfmlp_semaphore
4804 +{
4805 +	struct litmus_lock litmus_lock;
4806 +
4807 +	spinlock_t	lock;
4808 +
4809 +	int num_resources; /* aka k */
4810 +
4811 +	struct kfmlp_queue *queues; /* array */
4812 +	struct kfmlp_queue *shortest_queue; /* pointer to shortest queue */
4813 +
4814 +#ifdef CONFIG_LITMUS_AFFINITY_LOCKING
4815 +	struct kfmlp_affinity *aff_obs;
4816 +#endif
4817 +};
4818 +
4819 +static inline struct kfmlp_semaphore* kfmlp_from_lock(struct litmus_lock* lock)
4820 +{
4821 +	return container_of(lock, struct kfmlp_semaphore, litmus_lock);
4822 +}
4823 +
4824 +int kfmlp_lock(struct litmus_lock* l);
4825 +int kfmlp_unlock(struct litmus_lock* l);
4826 +int kfmlp_close(struct litmus_lock* l);
4827 +void kfmlp_free(struct litmus_lock* l);
4828 +struct litmus_lock* kfmlp_new(struct litmus_lock_ops*, void* __user arg);
4829 +
4830 +#if defined(CONFIG_LITMUS_AFFINITY_LOCKING) && defined(CONFIG_LITMUS_NVIDIA)
4831 +
4832 +struct kfmlp_queue_info
4833 +{
4834 +	struct kfmlp_queue* q;
4835 +	lt_t estimated_len;
4836 +	int *nr_cur_users;
4837 +};
4838 +
4839 +struct kfmlp_affinity_ops
4840 +{
4841 +	struct kfmlp_queue* (*advise_enqueue)(struct kfmlp_affinity* aff, struct task_struct* t);
4842 +	struct task_struct* (*advise_steal)(struct kfmlp_affinity* aff, wait_queue_t** to_steal, struct kfmlp_queue** to_steal_from);
4843 +	void (*notify_enqueue)(struct kfmlp_affinity* aff, struct kfmlp_queue* fq, struct task_struct* t);
4844 +	void (*notify_dequeue)(struct kfmlp_affinity* aff, struct kfmlp_queue* fq, struct task_struct* t);
4845 +	void (*notify_acquired)(struct kfmlp_affinity* aff, struct kfmlp_queue* fq, struct task_struct* t);
4846 +	void (*notify_freed)(struct kfmlp_affinity* aff, struct kfmlp_queue* fq, struct task_struct* t);
4847 +	int (*replica_to_resource)(struct kfmlp_affinity* aff, struct kfmlp_queue* fq);
4848 +};
4849 +
4850 +struct kfmlp_affinity
4851 +{
4852 +	struct affinity_observer obs;
4853 +	struct kfmlp_affinity_ops *ops;
4854 +	struct kfmlp_queue_info *q_info;
4855 +	int *nr_cur_users_on_rsrc;
4856 +	int offset;
4857 +	int nr_simult;
4858 +	int nr_rsrc;
4859 +};
4860 +
4861 +static inline struct kfmlp_affinity* kfmlp_aff_obs_from_aff_obs(struct affinity_observer* aff_obs)
4862 +{
4863 +	return container_of(aff_obs, struct kfmlp_affinity, obs);
4864 +}
4865 +
4866 +int kfmlp_aff_obs_close(struct affinity_observer*);
4867 +void kfmlp_aff_obs_free(struct affinity_observer*);
4868 +struct affinity_observer* kfmlp_gpu_aff_obs_new(struct affinity_observer_ops*,
4869 +											void* __user arg);
4870 +struct affinity_observer* kfmlp_simple_gpu_aff_obs_new(struct affinity_observer_ops*,
4871 +												void* __user arg);
4872 +
4873 +
4874 +#endif
4875 +
4876 +#endif
4877 +
4878 +
4879 diff --git a/include/litmus/litmus.h b/include/litmus/litmus.h
4880 new file mode 100644
4881 index 0000000..71df378
4882 --- /dev/null
4883 +++ b/include/litmus/litmus.h
4884 @@ -0,0 +1,282 @@
4885 +/*
4886 + * Constant definitions related to
4887 + * scheduling policy.
4888 + */
4889 +
4890 +#ifndef _LINUX_LITMUS_H_
4891 +#define _LINUX_LITMUS_H_
4892 +
4893 +#include <litmus/debug_trace.h>
4894 +
4895 +#ifdef CONFIG_RELEASE_MASTER
4896 +extern atomic_t release_master_cpu;
4897 +#endif
4898 +
4899 +/* in_list - is a given list_head queued on some list?
4900 + */
4901 +static inline int in_list(struct list_head* list)
4902 +{
4903 +	return !(  /* case 1: deleted */
4904 +		   (list->next == LIST_POISON1 &&
4905 +		    list->prev == LIST_POISON2)
4906 +		 ||
4907 +		   /* case 2: initialized */
4908 +		   (list->next == list &&
4909 +		    list->prev == list)
4910 +		);
4911 +}
4912 +
4913 +
4914 +struct task_struct* __waitqueue_remove_first(wait_queue_head_t *wq);
4915 +
4916 +#define NO_CPU			0xffffffff
4917 +
4918 +void litmus_fork(struct task_struct *tsk);
4919 +void litmus_exec(void);
4920 +/* clean up real-time state of a task */
4921 +void exit_litmus(struct task_struct *dead_tsk);
4922 +
4923 +long litmus_admit_task(struct task_struct *tsk);
4924 +void litmus_exit_task(struct task_struct *tsk);
4925 +
4926 +#define is_realtime(t) 		((t)->policy == SCHED_LITMUS)
4927 +#define rt_transition_pending(t) \
4928 +	((t)->rt_param.transition_pending)
4929 +
4930 +#define tsk_rt(t)		(&(t)->rt_param)
4931 +
4932 +/*	Realtime utility macros */
4933 +#define get_rt_flags(t)		(tsk_rt(t)->flags)
4934 +#define set_rt_flags(t,f) 	(tsk_rt(t)->flags=(f))
4935 +#define get_exec_cost(t)  	(tsk_rt(t)->task_params.exec_cost)
4936 +#define get_exec_time(t)	(tsk_rt(t)->job_params.exec_time)
4937 +#define get_rt_period(t)	(tsk_rt(t)->task_params.period)
4938 +#define get_rt_phase(t)		(tsk_rt(t)->task_params.phase)
4939 +#define get_partition(t) 	(tsk_rt(t)->task_params.cpu)
4940 +#define get_deadline(t)		(tsk_rt(t)->job_params.deadline)
4941 +#define get_period(t)		(tsk_rt(t)->task_params.period)
4942 +#define get_release(t)		(tsk_rt(t)->job_params.release)
4943 +#define get_class(t)		(tsk_rt(t)->task_params.cls)
4944 +
4945 +#define is_priority_boosted(t)	(tsk_rt(t)->priority_boosted)
4946 +#define get_boost_start(t)	(tsk_rt(t)->boost_start_time)
4947 +
4948 +#define effective_priority(t) ((!(tsk_rt(t)->inh_task)) ? t : tsk_rt(t)->inh_task)
4949 +#define base_priority(t) (t)
4950 +
4951 +inline static int budget_exhausted(struct task_struct* t)
4952 +{
4953 +	return get_exec_time(t) >= get_exec_cost(t);
4954 +}
4955 +
4956 +inline static lt_t budget_remaining(struct task_struct* t)
4957 +{
4958 +	if (!budget_exhausted(t))
4959 +		return get_exec_cost(t) - get_exec_time(t);
4960 +	else
4961 +		/* avoid overflow */
4962 +		return 0;
4963 +}
4964 +
4965 +#define budget_enforced(t) (tsk_rt(t)->task_params.budget_policy != NO_ENFORCEMENT)
4966 +
4967 +#define budget_precisely_enforced(t) (tsk_rt(t)->task_params.budget_policy \
4968 +				      == PRECISE_ENFORCEMENT)
4969 +
4970 +#define is_hrt(t)     		\
4971 +	(tsk_rt(t)->task_params.cls == RT_CLASS_HARD)
4972 +#define is_srt(t)     		\
4973 +	(tsk_rt(t)->task_params.cls == RT_CLASS_SOFT)
4974 +#define is_be(t)      		\
4975 +	(tsk_rt(t)->task_params.cls == RT_CLASS_BEST_EFFORT)
4976 +
4977 +/* Our notion of time within LITMUS: kernel monotonic time. */
4978 +static inline lt_t litmus_clock(void)
4979 +{
4980 +	return ktime_to_ns(ktime_get());
4981 +}
4982 +
4983 +/* A macro to convert from nanoseconds to ktime_t. */
4984 +#define ns_to_ktime(t)		ktime_add_ns(ktime_set(0, 0), t)
4985 +
4986 +#define get_domain(t) (tsk_rt(t)->domain)
4987 +
4988 +/* Honor the flag in the preempt_count variable that is set
4989 + * when scheduling is in progress.
4990 + */
4991 +#define is_running(t) 			\
4992 +	((t)->state == TASK_RUNNING || 	\
4993 +	 task_thread_info(t)->preempt_count & PREEMPT_ACTIVE)
4994 +
4995 +#define is_blocked(t)       \
4996 +	(!is_running(t))
4997 +#define is_released(t, now)	\
4998 +	(lt_before_eq(get_release(t), now))
4999 +#define is_tardy(t, now)    \
5000 +	(lt_before_eq(tsk_rt(t)->job_params.deadline, now))
5001 +
5002 +/* real-time comparison macros */
5003 +#define earlier_deadline(a, b) (lt_before(\
5004 +	(a)->rt_param.job_params.deadline,\
5005 +	(b)->rt_param.job_params.deadline))
5006 +#define shorter_period(a, b) (lt_before(\
5007 +	(a)->rt_param.task_params.period,\
5008 +	(b)->rt_param.task_params.period))
5009 +#define earlier_release(a, b)  (lt_before(\
5010 +	(a)->rt_param.job_params.release,\
5011 +	(b)->rt_param.job_params.release))
5012 +void preempt_if_preemptable(struct task_struct* t, int on_cpu);
5013 +
5014 +#ifdef CONFIG_LITMUS_LOCKING
5015 +void srp_ceiling_block(void);
5016 +#else
5017 +#define srp_ceiling_block() /* nothing */
5018 +#endif
5019 +
5020 +#define bheap2task(hn) ((struct task_struct*) hn->value)
5021 +
5022 +#ifdef CONFIG_NP_SECTION
5023 +
5024 +static inline int is_kernel_np(struct task_struct *t)
5025 +{
5026 +	return tsk_rt(t)->kernel_np;
5027 +}
5028 +
5029 +static inline int is_user_np(struct task_struct *t)
5030 +{
5031 +	return tsk_rt(t)->ctrl_page ? tsk_rt(t)->ctrl_page->sched.np.flag : 0;
5032 +}
5033 +
5034 +static inline void request_exit_np(struct task_struct *t)
5035 +{
5036 +	if (is_user_np(t)) {
5037 +		/* Set the flag that tells user space to call
5038 +		 * into the kernel at the end of a critical section. */
5039 +		if (likely(tsk_rt(t)->ctrl_page)) {
5040 +			TRACE_TASK(t, "setting delayed_preemption flag\n");
5041 +			tsk_rt(t)->ctrl_page->sched.np.preempt = 1;
5042 +		}
5043 +	}
5044 +}
5045 +
5046 +static inline void make_np(struct task_struct *t)
5047 +{
5048 +	tsk_rt(t)->kernel_np++;
5049 +}
5050 +
5051 +/* Caller should check if preemption is necessary when
5052 + * the function return 0.
5053 + */
5054 +static inline int take_np(struct task_struct *t)
5055 +{
5056 +	return --tsk_rt(t)->kernel_np;
5057 +}
5058 +
5059 +/* returns 0 if remote CPU needs an IPI to preempt, 1 if no IPI is required */
5060 +static inline int request_exit_np_atomic(struct task_struct *t)
5061 +{
5062 +	union np_flag old, new;
5063 +
5064 +	if (tsk_rt(t)->ctrl_page) {
5065 +		old.raw = tsk_rt(t)->ctrl_page->sched.raw;
5066 +		if (old.np.flag == 0) {
5067 +			/* no longer non-preemptive */
5068 +			return 0;
5069 +		} else if (old.np.preempt) {
5070 +			/* already set, nothing for us to do */
5071 +			return 1;
5072 +		} else {
5073 +			/* non preemptive and flag not set */
5074 +			new.raw = old.raw;
5075 +			new.np.preempt = 1;
5076 +			/* if we get old back, then we atomically set the flag */
5077 +			return cmpxchg(&tsk_rt(t)->ctrl_page->sched.raw, old.raw, new.raw) == old.raw;
5078 +			/* If we raced with a concurrent change, then so be
5079 +			 * it. Deliver it by IPI.  We don't want an unbounded
5080 +			 * retry loop here since tasks might exploit that to
5081 +			 * keep the kernel busy indefinitely. */
5082 +		}
5083 +	} else
5084 +		return 0;
5085 +}
5086 +
5087 +#else
5088 +
5089 +static inline int is_kernel_np(struct task_struct* t)
5090 +{
5091 +	return 0;
5092 +}
5093 +
5094 +static inline int is_user_np(struct task_struct* t)
5095 +{
5096 +	return 0;
5097 +}
5098 +
5099 +static inline void request_exit_np(struct task_struct *t)
5100 +{
5101 +	/* request_exit_np() shouldn't be called if !CONFIG_NP_SECTION */
5102 +	BUG();
5103 +}
5104 +
5105 +static inline int request_exit_np_atomic(struct task_struct *t)
5106 +{
5107 +	return 0;
5108 +}
5109 +
5110 +#endif
5111 +
5112 +static inline void clear_exit_np(struct task_struct *t)
5113 +{
5114 +	if (likely(tsk_rt(t)->ctrl_page))
5115 +		tsk_rt(t)->ctrl_page->sched.np.preempt = 0;
5116 +}
5117 +
5118 +static inline int is_np(struct task_struct *t)
5119 +{
5120 +#ifdef CONFIG_SCHED_DEBUG_TRACE
5121 +	int kernel, user;
5122 +	kernel = is_kernel_np(t);
5123 +	user   = is_user_np(t);
5124 +	if (kernel || user)
5125 +		TRACE_TASK(t, " is non-preemptive: kernel=%d user=%d\n",
5126 +			   kernel, user);
5127 +
5128 +	return kernel || user;
5129 +#else
5130 +	return unlikely(is_kernel_np(t) || is_user_np(t));
5131 +#endif
5132 +}
5133 +
5134 +static inline int is_present(struct task_struct* t)
5135 +{
5136 +	return t && tsk_rt(t)->present;
5137 +}
5138 +
5139 +
5140 +/* make the unit explicit */
5141 +typedef unsigned long quanta_t;
5142 +
5143 +enum round {
5144 +	FLOOR,
5145 +	CEIL
5146 +};
5147 +
5148 +
5149 +/* Tick period is used to convert ns-specified execution
5150 + * costs and periods into tick-based equivalents.
5151 + */
5152 +extern ktime_t tick_period;
5153 +
5154 +static inline quanta_t time2quanta(lt_t time, enum round round)
5155 +{
5156 +	s64  quantum_length = ktime_to_ns(tick_period);
5157 +
5158 +	if (do_div(time, quantum_length) && round == CEIL)
5159 +		time++;
5160 +	return (quanta_t) time;
5161 +}
5162 +
5163 +/* By how much is cpu staggered behind CPU 0? */
5164 +u64 cpu_stagger_offset(int cpu);
5165 +
5166 +#endif
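
For reference, a stand-alone sketch of the time2quanta() rounding behavior above, assuming a 1 ms quantum (the real quantum length comes from tick_period and depends on the kernel configuration):

#include <stdio.h>

typedef unsigned long long lt_t;
typedef unsigned long quanta_t;
enum round { FLOOR, CEIL };

static quanta_t time2quanta_sketch(lt_t time_ns, enum round round)
{
	const lt_t quantum_ns = 1000000ULL;	/* assumed 1 ms quantum */
	quanta_t q = (quanta_t)(time_ns / quantum_ns);

	if ((time_ns % quantum_ns) && round == CEIL)
		q++;	/* a partial quantum counts when rounding up */
	return q;
}

int main(void)
{
	/* 2.5 ms => 2 quanta with FLOOR, 3 quanta with CEIL */
	printf("%lu %lu\n",
	       time2quanta_sketch(2500000ULL, FLOOR),
	       time2quanta_sketch(2500000ULL, CEIL));
	return 0;
}
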
5167 diff --git a/include/litmus/litmus_proc.h b/include/litmus/litmus_proc.h
5168 new file mode 100644
5169 index 0000000..6800e72
5170 --- /dev/null
5171 +++ b/include/litmus/litmus_proc.h
5172 @@ -0,0 +1,25 @@
5173 +#include <litmus/sched_plugin.h>
5174 +#include <linux/proc_fs.h>
5175 +
5176 +int __init init_litmus_proc(void);
5177 +void exit_litmus_proc(void);
5178 +
5179 +/*
5180 + * On success, returns 0 and sets the pointer to the location of the new
5181 + * proc dir entry, otherwise returns an error code and sets pde to NULL.
5182 + */
5183 +long make_plugin_proc_dir(struct sched_plugin* plugin,
5184 +		struct proc_dir_entry** pde);
5185 +
5186 +/*
5187 + * Plugins should deallocate all child proc directory entries before
5188 + * calling this, to avoid memory leaks.
5189 + */
5190 +void remove_plugin_proc_dir(struct sched_plugin* plugin);
5191 +
5192 +
5193 +/* Copy at most size-1 bytes from ubuf into kbuf, null-terminate buf, and
5194 + * remove a '\n' if present. Returns the number of bytes that were read or
5195 + * -EFAULT. */
5196 +int copy_and_chomp(char *kbuf, unsigned long ksize,
5197 +		   __user const char* ubuf, unsigned long ulength);
5198 diff --git a/include/litmus/litmus_softirq.h b/include/litmus/litmus_softirq.h
5199 new file mode 100644
5200 index 0000000..1eb5ea1
5201 --- /dev/null
5202 +++ b/include/litmus/litmus_softirq.h
5203 @@ -0,0 +1,199 @@
5204 +#ifndef __LITMUS_SOFTIRQ_H
5205 +#define __LITMUS_SOFTIRQ_H
5206 +
5207 +#include <linux/interrupt.h>
5208 +#include <linux/workqueue.h>
5209 +
5210 +/*
5211 +   Threaded tasklet handling for Litmus.  Tasklets
5212 +   are scheduled with the priority of the tasklet's
5213 +   owner---that is, the RT task on whose behalf the tasklet
5214 +   runs.
5215 +
5216 +   Tasklets are currently scheduled in FIFO order with
5217 +   NO priority inheritance for "blocked" tasklets.
5218 +
5219 +   klitirqd assumes the priority of the owner of the
5220 +   tasklet when the tasklet is next to execute.
5221 +
5222 +   Currently, hi-tasklets are scheduled before
5223 +   low-tasklets, regardless of priority of low-tasklets.
5224 +   And likewise, low-tasklets are scheduled before work
5225 +   queue objects.  This priority inversion probably needs
5226 +   to be fixed, though it is not an issue in our work with
5227 +   GPUs, since GPUs (and their associated klitirqds) are owned
5228 +   for exclusive time periods, so no inversions can
5229 +   occur.
5230 + */
5231 +
5232 +
5233 +
5234 +#define NR_LITMUS_SOFTIRQD CONFIG_NR_LITMUS_SOFTIRQD
5235 +
5236 +/* Spawns NR_LITMUS_SOFTIRQD klitirqd daemons.
5237 +   Actual launch of threads is deferred to kworker's
5238 +   workqueue, so daemons will likely not be immediately
5239 +   running when this function returns, though the required
5240 +   data will be initialized.
5241 +
5242 +   @affinity_set: an array expressing the processor affinity
5243 +    for each of the NR_LITMUS_SOFTIRQD daemons.  May be set
5244 +    to NULL for global scheduling.
5245 +
5246 +	- Examples -
5247 +	8-CPU system with two CPU clusters:
5248 +		affinity[] = {0, 0, 0, 0, 3, 3, 3, 3}
5249 +		NOTE: Daemons are not actually bound to the specified CPU, but
5250 +		rather to the cluster in which the CPU resides.
5251 +
5252 +	8-CPU system, partitioned:
5253 +		affinity[] = {0, 1, 2, 3, 4, 5, 6, 7}
5254 +
5255 +	FIXME: change array to a CPU topology or array of cpumasks
5256 +
5257 + */
5258 +void spawn_klitirqd(int* affinity);
5259 +
5260 +
5261 +/* Raises a flag to tell klitirqds to terminate.
5262 +   Termination is async, so some threads may be running
5263 +   after function return. */
5264 +void kill_klitirqd(void);
5265 +
5266 +
5267 +/* Returns 1 if all NR_LITMUS_SOFTIRQD klitirqd threads are ready
5268 +   to handle tasklets. 0, otherwise.*/
5269 +int klitirqd_is_ready(void);
5270 +
5271 +/* Returns 1 if no NR_LITMUS_SOFTIRQD klitirqd threads are ready
5272 +   to handle tasklets. 0, otherwise.*/
5273 +int klitirqd_is_dead(void);
5274 +
5275 +/* Flushes all pending work out to the OS for regular
5276 + * tasklet/work processing of the specified 'owner'
5277 + *
5278 + * PRECOND: klitirqd_thread must have a clear entry
5279 + * in the GPU registry, otherwise this call will become
5280 + * a no-op as work will loop back to the klitirqd_thread.
5281 + *
5282 + * Pass NULL for owner to flush ALL pending items.
5283 + */
5284 +void flush_pending(struct task_struct* klitirqd_thread,
5285 +				   struct task_struct* owner);
5286 +
5287 +struct task_struct* get_klitirqd(unsigned int k_id);
5288 +
5289 +
5290 +extern int __litmus_tasklet_schedule(
5291 +        struct tasklet_struct *t,
5292 +        unsigned int k_id);
5293 +
5294 +/* schedule a tasklet on klitirqd #k_id */
5295 +static inline int litmus_tasklet_schedule(
5296 +    struct tasklet_struct *t,
5297 +    unsigned int k_id)
5298 +{
5299 +	int ret = 0;
5300 +	if (!test_and_set_bit(TASKLET_STATE_SCHED, &t->state))
5301 +		ret = __litmus_tasklet_schedule(t, k_id);
5302 +	return(ret);
5303 +}
5304 +
5305 +/* for use by __tasklet_schedule() */
5306 +static inline int _litmus_tasklet_schedule(
5307 +    struct tasklet_struct *t,
5308 +    unsigned int k_id)
5309 +{
5310 +    return(__litmus_tasklet_schedule(t, k_id));
5311 +}
5312 +
5313 +
5314 +
5315 +
5316 +extern int __litmus_tasklet_hi_schedule(struct tasklet_struct *t,
5317 +                                         unsigned int k_id);
5318 +
5319 +/* schedule a hi tasklet on klitirqd #k_id */
5320 +static inline int litmus_tasklet_hi_schedule(struct tasklet_struct *t,
5321 +                                              unsigned int k_id)
5322 +{
5323 +	int ret = 0;
5324 +	if (!test_and_set_bit(TASKLET_STATE_SCHED, &t->state))
5325 +		ret = __litmus_tasklet_hi_schedule(t, k_id);
5326 +	return(ret);
5327 +}
5328 +
5329 +/* for use by __tasklet_hi_schedule() */
5330 +static inline int _litmus_tasklet_hi_schedule(struct tasklet_struct *t,
5331 +                                               unsigned int k_id)
5332 +{
5333 +    return(__litmus_tasklet_hi_schedule(t, k_id));
5334 +}
5335 +
5336 +
5337 +
5338 +
5339 +
5340 +extern int __litmus_tasklet_hi_schedule_first(
5341 +    struct tasklet_struct *t,
5342 +    unsigned int k_id);
5343 +
5344 +/* schedule a hi tasklet on klitirqd #k_id on next go-around */
5345 +/* PRECONDITION: Interrupts must be disabled. */
5346 +static inline int litmus_tasklet_hi_schedule_first(
5347 +    struct tasklet_struct *t,
5348 +    unsigned int k_id)
5349 +{
5350 +	int ret = 0;
5351 +	if (!test_and_set_bit(TASKLET_STATE_SCHED, &t->state))
5352 +		ret = __litmus_tasklet_hi_schedule_first(t, k_id);
5353 +	return(ret);
5354 +}
5355 +
5356 +/* for use by __tasklet_hi_schedule_first() */
5357 +static inline int _litmus_tasklet_hi_schedule_first(
5358 +    struct tasklet_struct *t,
5359 +    unsigned int k_id)
5360 +{
5361 +    return(__litmus_tasklet_hi_schedule_first(t, k_id));
5362 +}
5363 +
5364 +
5365 +
5366 +//////////////
5367 +
5368 +extern int __litmus_schedule_work(
5369 +	struct work_struct* w,
5370 +	unsigned int k_id);
5371 +
5372 +static inline int litmus_schedule_work(
5373 +	struct work_struct* w,
5374 +	unsigned int k_id)
5375 +{
5376 +	return(__litmus_schedule_work(w, k_id));
5377 +}
5378 +
5379 +
5380 +
5381 +///////////// mutex operations for client threads.
5382 +
5383 +void down_and_set_stat(struct task_struct* t,
5384 +					 enum klitirqd_sem_status to_set,
5385 +					 struct mutex* sem);
5386 +
5387 +void __down_and_reset_and_set_stat(struct task_struct* t,
5388 +				enum klitirqd_sem_status to_reset,
5389 +				enum klitirqd_sem_status to_set,
5390 +				struct mutex* sem);
5391 +
5392 +void up_and_set_stat(struct task_struct* t,
5393 +					enum klitirqd_sem_status to_set,
5394 +					struct mutex* sem);
5395 +
5396 +
5397 +
5398 +void release_klitirqd_lock(struct task_struct* t);
5399 +
5400 +int reacquire_klitirqd_lock(struct task_struct* t);
5401 +
5402 +#endif
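
A minimal sketch of how a driver might route its bottom half through klitirqd (the tasklet and helper names are illustrative; only the functions declared above and the stock tasklet API are assumed):

#include <linux/interrupt.h>
#include <litmus/litmus_softirq.h>

static struct tasklet_struct my_tasklet;

static void my_bottom_half(unsigned long data)
{
	/* deferred interrupt work; runs at the priority of the tasklet's owner */
}

static void my_driver_init(void)
{
	tasklet_init(&my_tasklet, my_bottom_half, 0);
}

/* Hypothetical helper: prefer klitirqd #k_id, else fall back to the
 * regular softirq path. */
static void defer_bottom_half(unsigned int k_id)
{
	if (klitirqd_is_ready())
		litmus_tasklet_schedule(&my_tasklet, k_id);
	else
		tasklet_schedule(&my_tasklet);
}
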
5403 diff --git a/include/litmus/locking.h b/include/litmus/locking.h
5404 new file mode 100644
5405 index 0000000..36647fe
5406 --- /dev/null
5407 +++ b/include/litmus/locking.h
5408 @@ -0,0 +1,160 @@
5409 +#ifndef LITMUS_LOCKING_H
5410 +#define LITMUS_LOCKING_H
5411 +
5412 +#include <linux/list.h>
5413 +
5414 +struct litmus_lock_ops;
5415 +
5416 +#ifdef CONFIG_LITMUS_NESTED_LOCKING
5417 +struct nested_info
5418 +{
5419 +	struct litmus_lock *lock;
5420 +	struct task_struct *hp_waiter_eff_prio;
5421 +	struct task_struct **hp_waiter_ptr;
5422 +    struct binheap_node hp_binheap_node;
5423 +};
5424 +
5425 +static inline struct task_struct* top_priority(struct binheap_handle* handle) {
5426 +	if(!binheap_empty(handle)) {
5427 +		return (struct task_struct*)(binheap_top_entry(handle, struct nested_info, hp_binheap_node)->hp_waiter_eff_prio);
5428 +	}
5429 +	return NULL;
5430 +}
5431 +
5432 +void print_hp_waiters(struct binheap_node* n, int depth);
5433 +#endif
5434 +
5435 +
5436 +/* Generic base struct for LITMUS^RT userspace semaphores.
5437 + * This structure should be embedded in protocol-specific semaphores.
5438 + */
5439 +struct litmus_lock {
5440 +	struct litmus_lock_ops *ops;
5441 +	int type;
5442 +
5443 +	int ident;
5444 +
5445 +#ifdef CONFIG_LITMUS_NESTED_LOCKING
5446 +	struct nested_info nest;
5447 +//#ifdef CONFIG_DEBUG_SPINLOCK
5448 +	char cheat_lockdep[2];
5449 +	struct lock_class_key key;
5450 +//#endif
5451 +#endif
5452 +};
5453 +
5454 +#ifdef CONFIG_LITMUS_DGL_SUPPORT
5455 +
5456 +#define MAX_DGL_SIZE CONFIG_LITMUS_MAX_DGL_SIZE
5457 +
5458 +typedef struct dgl_wait_state {
5459 +	struct task_struct *task;	/* task waiting on DGL */
5460 +	struct litmus_lock *locks[MAX_DGL_SIZE];	/* requested locks in DGL */
5461 +	int size;			/* size of the DGL */
5462 +	int nr_remaining;	/* nr locks remaining before DGL is complete */
5463 +	int last_primary;	/* index of lock in locks[] that has the active priority */
5464 +	wait_queue_t wq_nodes[MAX_DGL_SIZE];
5465 +} dgl_wait_state_t;
5466 +
5467 +void wake_or_wait_on_next_lock(dgl_wait_state_t *dgl_wait);
5468 +void select_next_lock(dgl_wait_state_t* dgl_wait /*, struct litmus_lock* prev_lock*/);
5469 +
5470 +void init_dgl_waitqueue_entry(wait_queue_t *wq_node, dgl_wait_state_t* dgl_wait);
5471 +int dgl_wake_up(wait_queue_t *wq_node, unsigned mode, int sync, void *key);
5472 +void __waitqueue_dgl_remove_first(wait_queue_head_t *wq, dgl_wait_state_t** dgl_wait, struct task_struct **task);
5473 +#endif
5474 +
5475 +typedef int (*lock_op_t)(struct litmus_lock *l);
5476 +typedef lock_op_t lock_close_t;
5477 +typedef lock_op_t lock_lock_t;
5478 +typedef lock_op_t lock_unlock_t;
5479 +
5480 +typedef int (*lock_open_t)(struct litmus_lock *l, void* __user arg);
5481 +typedef void (*lock_free_t)(struct litmus_lock *l);
5482 +
5483 +struct litmus_lock_ops {
5484 +	/* Current task tries to obtain / drop a reference to a lock.
5485 +	 * Optional methods, allowed by default. */
5486 +	lock_open_t open;
5487 +	lock_close_t close;
5488 +
5489 +	/* Current tries to lock/unlock this lock (mandatory methods). */
5490 +	lock_lock_t lock;
5491 +	lock_unlock_t unlock;
5492 +
5493 +	/* The lock is no longer being referenced (mandatory method). */
5494 +	lock_free_t deallocate;
5495 +
5496 +#ifdef CONFIG_LITMUS_NESTED_LOCKING
5497 +	void (*propagate_increase_inheritance)(struct litmus_lock* l, struct task_struct* t, raw_spinlock_t* to_unlock, unsigned long irqflags);
5498 +	void (*propagate_decrease_inheritance)(struct litmus_lock* l, struct task_struct* t, raw_spinlock_t* to_unlock, unsigned long irqflags);
5499 +#endif
5500 +
5501 +#ifdef CONFIG_LITMUS_DGL_SUPPORT
5502 +	raw_spinlock_t* (*get_dgl_spin_lock)(struct litmus_lock *l);
5503 +	int (*dgl_lock)(struct litmus_lock *l, dgl_wait_state_t* dgl_wait, wait_queue_t* wq_node);
5504 +	int (*is_owner)(struct litmus_lock *l, struct task_struct *t);
5505 +	void (*enable_priority)(struct litmus_lock *l, dgl_wait_state_t* dgl_wait);
5506 +#endif
5507 +};
5508 +
5509 +
5510 +/*
5511 + Nested inheritance can be achieved with fine-grain locking when there is
5512 + no need for DGL support, presuming locks are acquired in a partial order
5513 + (no cycles!).  However, DGLs allow locks to be acquired in any order.  This
5514 + makes nested inheritance very difficult to realize with fine-grain locks
5515 + (we don't yet know of a solution), so we use a big lock instead.
5516 +
5517 + Code contains both fine-grain and coarse-grain methods together, side-by-side.
5518 + Each lock operation is *NOT* surrounded by ifdef/endif, to help make the code
5519 + more readable.  However, this leads to the odd situation where both code paths
5520 + appear together in code as if they were both active together.
5521 +
5522 + THIS IS NOT REALLY THE CASE!  ONLY ONE CODE PATH IS ACTUALLY ACTIVE!
5523 +
5524 + Example:
5525 +	lock_global_irqsave(coarseLock, flags);
5526 +	lock_fine_irqsave(fineLock, flags);
5527 +
5528 + Reality (coarse):
5529 +	lock_global_irqsave(coarseLock, flags);
5530 +	//lock_fine_irqsave(fineLock, flags);
5531 +
5532 + Reality (fine):
5533 +	//lock_global_irqsave(coarseLock, flags);
5534 +	lock_fine_irqsave(fineLock, flags);
5535 +
5536 + Be careful when you read code involving nested inheritance.
5537 + */
5538 +#if defined(CONFIG_LITMUS_DGL_SUPPORT)
5539 +/* DGL requires a big lock to implement nested inheritance */
5540 +#define lock_global_irqsave(lock, flags)		raw_spin_lock_irqsave((lock), (flags))
5541 +#define lock_global(lock)						raw_spin_lock((lock))
5542 +#define unlock_global_irqrestore(lock, flags)	raw_spin_unlock_irqrestore((lock), (flags))
5543 +#define unlock_global(lock)						raw_spin_unlock((lock))
5544 +
5545 +/* fine-grain lock operations are no-ops with DGL support */
5546 +#define lock_fine_irqsave(lock, flags)
5547 +#define lock_fine(lock)
5548 +#define unlock_fine_irqrestore(lock, flags)
5549 +#define unlock_fine(lock)
5550 +
5551 +#elif defined(CONFIG_LITMUS_NESTED_LOCKING)
5552 +/* Use fine-grain locking when DGLs are disabled. */
5553 +/* global lock operations are no-ops without DGL support */
5554 +#define lock_global_irqsave(lock, flags)
5555 +#define lock_global(lock)
5556 +#define unlock_global_irqrestore(lock, flags)
5557 +#define unlock_global(lock)
5558 +
5559 +#define lock_fine_irqsave(lock, flags)			raw_spin_lock_irqsave((lock), (flags))
5560 +#define lock_fine(lock)							raw_spin_lock((lock))
5561 +#define unlock_fine_irqrestore(lock, flags)		raw_spin_unlock_irqrestore((lock), (flags))
5562 +#define unlock_fine(lock)						raw_spin_unlock((lock))
5563 +
5564 +#endif
5565 +
5566 +
5567 +#endif
5568 +
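
A sketch of the side-by-side pattern the comment above describes, as it tends to appear in the lock implementations later in this patch (the function and parameter names are illustrative):

/* Illustrative only: both macro families are written out, but exactly one
 * expands to a real raw spinlock, depending on CONFIG_LITMUS_DGL_SUPPORT. */
static void example_lock_op(raw_spinlock_t* dgl_lock, raw_spinlock_t* fine_lock)
{
	unsigned long flags;

	lock_global_irqsave(dgl_lock, flags);	/* real lock only with DGL support    */
	lock_fine_irqsave(fine_lock, flags);	/* real lock only without DGL support */

	/* ... update wait queues / inheritance state ... */

	unlock_fine_irqrestore(fine_lock, flags);
	unlock_global_irqrestore(dgl_lock, flags);
}
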
5569 diff --git a/include/litmus/nvidia_info.h b/include/litmus/nvidia_info.h
5570 new file mode 100644
5571 index 0000000..97c9577
5572 --- /dev/null
5573 +++ b/include/litmus/nvidia_info.h
5574 @@ -0,0 +1,46 @@
5575 +#ifndef __LITMUS_NVIDIA_H
5576 +#define __LITMUS_NVIDIA_H
5577 +
5578 +#include <linux/interrupt.h>
5579 +
5580 +
5581 +#include <litmus/litmus_softirq.h>
5582 +
5583 +
5584 +//#define NV_DEVICE_NUM NR_LITMUS_SOFTIRQD
5585 +#define NV_DEVICE_NUM CONFIG_NV_DEVICE_NUM
5586 +#define NV_MAX_SIMULT_USERS CONFIG_NV_MAX_SIMULT_USERS
5587 +
5588 +int init_nvidia_info(void);
5589 +void shutdown_nvidia_info(void);
5590 +
5591 +int is_nvidia_func(void* func_addr);
5592 +
5593 +void dump_nvidia_info(const struct tasklet_struct *t);
5594 +
5595 +
5596 +// Returns the Nvidia device # associated with provided tasklet and work_struct.
5597 +u32 get_tasklet_nv_device_num(const struct tasklet_struct *t);
5598 +u32 get_work_nv_device_num(const struct work_struct *t);
5599 +
5600 +
5601 +int init_nv_device_reg(void);
5602 +//int get_nv_device_id(struct task_struct* owner);
5603 +
5604 +
5605 +int reg_nv_device(int reg_device_id, int register_device, struct task_struct *t);
5606 +
5607 +struct task_struct* get_nv_max_device_owner(u32 target_device_id);
5608 +//int is_nv_device_owner(u32 target_device_id);
5609 +
5610 +void lock_nv_registry(u32 reg_device_id, unsigned long* flags);
5611 +void unlock_nv_registry(u32 reg_device_id, unsigned long* flags);
5612 +
5613 +#ifdef CONFIG_LITMUS_PAI_SOFTIRQD
5614 +void pai_check_priority_increase(struct task_struct *t, int reg_device_id);
5615 +void pai_check_priority_decrease(struct task_struct *t, int reg_device_id);
5616 +#endif
5617 +
5618 +//void increment_nv_int_count(u32 device);
5619 +
5620 +#endif
5621 diff --git a/include/litmus/preempt.h b/include/litmus/preempt.h
5622 new file mode 100644
5623 index 0000000..8f3a9ca
5624 --- /dev/null
5625 +++ b/include/litmus/preempt.h
5626 @@ -0,0 +1,164 @@
5627 +#ifndef LITMUS_PREEMPT_H
5628 +#define LITMUS_PREEMPT_H
5629 +
5630 +#include <linux/types.h>
5631 +#include <linux/cache.h>
5632 +#include <linux/percpu.h>
5633 +#include <asm/atomic.h>
5634 +
5635 +#include <litmus/debug_trace.h>
5636 +
5637 +extern DEFINE_PER_CPU_SHARED_ALIGNED(atomic_t, resched_state);
5638 +
5639 +#ifdef CONFIG_PREEMPT_STATE_TRACE
5640 +const char* sched_state_name(int s);
5641 +#define TRACE_STATE(fmt, args...) TRACE("SCHED_STATE " fmt, args)
5642 +#else
5643 +#define TRACE_STATE(fmt, args...) /* ignore */
5644 +#endif
5645 +
5646 +#define VERIFY_SCHED_STATE(x)						\
5647 +	do { int __s = get_sched_state();				\
5648 +		if ((__s & (x)) == 0)					\
5649 +			TRACE_STATE("INVALID s=0x%x (%s) not "		\
5650 +				    "in 0x%x (%s) [%s]\n",		\
5651 +				    __s, sched_state_name(__s),		\
5652 +				    (x), #x, __FUNCTION__);		\
5653 +	} while (0);
5654 +
5655 +//#define TRACE_SCHED_STATE_CHANGE(x, y, cpu) /* ignore */
5656 +#define TRACE_SCHED_STATE_CHANGE(x, y, cpu)				\
5657 +	TRACE_STATE("[P%d] 0x%x (%s) -> 0x%x (%s)\n",			\
5658 +		    cpu,  (x), sched_state_name(x),			\
5659 +		    (y), sched_state_name(y))
5660 +
5661 +typedef enum scheduling_state {
5662 +	TASK_SCHEDULED    = (1 << 0),  /* The currently scheduled task is the one that
5663 +					* should be scheduled, and the processor does not
5664 +					* plan to invoke schedule(). */
5665 +	SHOULD_SCHEDULE   = (1 << 1),  /* A remote processor has determined that the
5666 +					* processor should reschedule, but this has not
5667 +					* been communicated yet (IPI still pending). */
5668 +	WILL_SCHEDULE     = (1 << 2),  /* The processor has noticed that it has to
5669 +					* reschedule and will do so shortly. */
5670 +	TASK_PICKED       = (1 << 3),  /* The processor is currently executing schedule(),
5671 +					* has selected a new task to schedule, but has not
5672 +					* yet performed the actual context switch. */
5673 +	PICKED_WRONG_TASK = (1 << 4),  /* The processor has not yet performed the context
5674 +					* switch, but a remote processor has already
5675 +					* determined that a higher-priority task became
5676 +					* eligible after the task was picked. */
5677 +} sched_state_t;
5678 +
5679 +static inline sched_state_t get_sched_state_on(int cpu)
5680 +{
5681 +	return atomic_read(&per_cpu(resched_state, cpu));
5682 +}
5683 +
5684 +static inline sched_state_t get_sched_state(void)
5685 +{
5686 +	return atomic_read(&__get_cpu_var(resched_state));
5687 +}
5688 +
5689 +static inline int is_in_sched_state(int possible_states)
5690 +{
5691 +	return get_sched_state() & possible_states;
5692 +}
5693 +
5694 +static inline int cpu_is_in_sched_state(int cpu, int possible_states)
5695 +{
5696 +	return get_sched_state_on(cpu) & possible_states;
5697 +}
5698 +
5699 +static inline void set_sched_state(sched_state_t s)
5700 +{
5701 +	TRACE_SCHED_STATE_CHANGE(get_sched_state(), s, smp_processor_id());
5702 +	atomic_set(&__get_cpu_var(resched_state), s);
5703 +}
5704 +
5705 +static inline int sched_state_transition(sched_state_t from, sched_state_t to)
5706 +{
5707 +	sched_state_t old_state;
5708 +
5709 +	old_state = atomic_cmpxchg(&__get_cpu_var(resched_state), from, to);
5710 +	if (old_state == from) {
5711 +		TRACE_SCHED_STATE_CHANGE(from, to, smp_processor_id());
5712 +		return 1;
5713 +	} else
5714 +		return 0;
5715 +}
5716 +
5717 +static inline int sched_state_transition_on(int cpu,
5718 +					    sched_state_t from,
5719 +					    sched_state_t to)
5720 +{
5721 +	sched_state_t old_state;
5722 +
5723 +	old_state = atomic_cmpxchg(&per_cpu(resched_state, cpu), from, to);
5724 +	if (old_state == from) {
5725 +		TRACE_SCHED_STATE_CHANGE(from, to, cpu);
5726 +		return 1;
5727 +	} else
5728 +		return 0;
5729 +}
5730 +
5731 +/* Plugins must call this function after they have decided which job to
5732 + * schedule next.  IMPORTANT: this function must be called while still holding
5733 + * the lock that is used to serialize scheduling decisions.
5734 + *
5735 + * (Ideally, we would like to use runqueue locks for this purpose, but that
5736 + * would lead to deadlocks with the migration code.)
5737 + */
5738 +static inline void sched_state_task_picked(void)
5739 +{
5740 +	VERIFY_SCHED_STATE(WILL_SCHEDULE);
5741 +
5742 +	/* WILL_SCHEDULE has only a local transition => simple store is ok */
5743 +	set_sched_state(TASK_PICKED);
5744 +}
5745 +
5746 +static inline void sched_state_entered_schedule(void)
5747 +{
5748 +	/* Update state for the case that we entered schedule() not due to
5749 +	 * set_tsk_need_resched() */
5750 +	set_sched_state(WILL_SCHEDULE);
5751 +}
5752 +
5753 +/* Called by schedule() to check if the scheduling decision is still valid
5754 + * after a context switch. Returns 1 if the CPU needs to reschedule. */
5755 +static inline int sched_state_validate_switch(void)
5756 +{
5757 +	int left_state_ok = 0;
5758 +
5759 +	VERIFY_SCHED_STATE(PICKED_WRONG_TASK | TASK_PICKED);
5760 +
5761 +	if (is_in_sched_state(TASK_PICKED)) {
5762 +		/* Might be good; let's try to transition out of this
5763 +		 * state. This must be done atomically since remote processors
5764 +		 * may try to change the state, too. */
5765 +		left_state_ok = sched_state_transition(TASK_PICKED, TASK_SCHEDULED);
5766 +	}
5767 +
5768 +	if (!left_state_ok) {
5769 +		/* We raced with a higher-priority task arrival => not
5770 +		 * valid. The CPU needs to reschedule. */
5771 +		set_sched_state(WILL_SCHEDULE);
5772 +		return 1;
5773 +	} else
5774 +		return 0;
5775 +}
5776 +
5777 +/* State transition events. See litmus/preempt.c for details. */
5778 +void sched_state_will_schedule(struct task_struct* tsk);
5779 +void sched_state_ipi(void);
5780 +/* Cause a CPU (remote or local) to reschedule. */
5781 +void litmus_reschedule(int cpu);
5782 +void litmus_reschedule_local(void);
5783 +
5784 +#ifdef CONFIG_DEBUG_KERNEL
5785 +void sched_state_plugin_check(void);
5786 +#else
5787 +#define sched_state_plugin_check() /* no check */
5788 +#endif
5789 +
5790 +#endif
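
A sketch of how a plugin's scheduling path is expected to interact with the state machine above (the plugin lock and job-selection stand-in are illustrative; only the sched_state_*() calls come from this header):

#include <linux/spinlock.h>
#include <litmus/preempt.h>

static DEFINE_RAW_SPINLOCK(my_plugin_lock);	/* illustrative plugin-wide lock */

/* Illustrative stand-in for the plugin's actual job-selection logic. */
static struct task_struct* my_pick_next_job(void)
{
	return NULL;	/* idle in this sketch */
}

static struct task_struct* my_plugin_schedule(struct task_struct* prev)
{
	struct task_struct* next;

	/* The core has already called sched_state_entered_schedule(). */
	raw_spin_lock(&my_plugin_lock);
	next = my_pick_next_job();

	/* Must be called while the scheduling lock is still held. */
	sched_state_task_picked();
	raw_spin_unlock(&my_plugin_lock);

	/* After the context switch, the core calls sched_state_validate_switch();
	 * if a remote CPU flagged PICKED_WRONG_TASK in the meantime, this CPU
	 * simply reschedules. */
	return next;
}
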
5791 diff --git a/include/litmus/rsm_lock.h b/include/litmus/rsm_lock.h
5792 new file mode 100644
5793 index 0000000..a151896
5794 --- /dev/null
5795 +++ b/include/litmus/rsm_lock.h
5796 @@ -0,0 +1,54 @@
5797 +#ifndef LITMUS_RSM_H
5798 +#define LITMUS_RSM_H
5799 +
5800 +#include <litmus/litmus.h>
5801 +#include <litmus/binheap.h>
5802 +#include <litmus/locking.h>
5803 +
5804 +/* struct for semaphore with priority inheritance */
5805 +struct rsm_mutex {
5806 +	struct litmus_lock litmus_lock;
5807 +
5808 +	/* current resource holder */
5809 +	struct task_struct *owner;
5810 +
5811 +	/* highest-priority waiter */
5812 +	struct task_struct *hp_waiter;
5813 +
5814 +	/* FIFO queue of waiting tasks -- for now; may become time-stamp ordered in the future. */
5815 +	wait_queue_head_t	wait;
5816 +
5817 +	/* we do some nesting within spinlocks, so we can't use the normal
5818 +	 sleeplocks found in wait_queue_head_t. */
5819 +	raw_spinlock_t		lock;
5820 +};
5821 +
5822 +static inline struct rsm_mutex* rsm_mutex_from_lock(struct litmus_lock* lock)
5823 +{
5824 +	return container_of(lock, struct rsm_mutex, litmus_lock);
5825 +}
5826 +
5827 +#ifdef CONFIG_LITMUS_DGL_SUPPORT
5828 +int rsm_mutex_is_owner(struct litmus_lock *l, struct task_struct *t);
5829 +int rsm_mutex_dgl_lock(struct litmus_lock *l, dgl_wait_state_t* dgl_wait, wait_queue_t* wq_node);
5830 +void rsm_mutex_enable_priority(struct litmus_lock *l, dgl_wait_state_t* dgl_wait);
5831 +#endif
5832 +
5833 +void rsm_mutex_propagate_increase_inheritance(struct litmus_lock* l,
5834 +											  struct task_struct* t,
5835 +											  raw_spinlock_t* to_unlock,
5836 +											  unsigned long irqflags);
5837 +
5838 +void rsm_mutex_propagate_decrease_inheritance(struct litmus_lock* l,
5839 +											  struct task_struct* t,
5840 +											  raw_spinlock_t* to_unlock,
5841 +											  unsigned long irqflags);
5842 +
5843 +int rsm_mutex_lock(struct litmus_lock* l);
5844 +int rsm_mutex_unlock(struct litmus_lock* l);
5845 +int rsm_mutex_close(struct litmus_lock* l);
5846 +void rsm_mutex_free(struct litmus_lock* l);
5847 +struct litmus_lock* rsm_mutex_new(struct litmus_lock_ops*);
5848 +
5849 +
5850 +#endif
5851 \ No newline at end of file
5852 diff --git a/include/litmus/rt_domain.h b/include/litmus/rt_domain.h
5853 new file mode 100644
5854 index 0000000..ac24929
5855 --- /dev/null
5856 +++ b/include/litmus/rt_domain.h
5857 @@ -0,0 +1,182 @@
5858 +/* CLEANUP: Add comments and make it less messy.
5859 + *
5860 + */
5861 +
5862 +#ifndef __UNC_RT_DOMAIN_H__
5863 +#define __UNC_RT_DOMAIN_H__
5864 +
5865 +#include <litmus/bheap.h>
5866 +
5867 +#define RELEASE_QUEUE_SLOTS 127 /* prime */
5868 +
5869 +struct _rt_domain;
5870 +
5871 +typedef int (*check_resched_needed_t)(struct _rt_domain *rt);
5872 +typedef void (*release_jobs_t)(struct _rt_domain *rt, struct bheap* tasks);
5873 +
5874 +struct release_queue {
5875 +	/* each slot maintains a list of release heaps sorted
5876 +	 * by release time */
5877 +	struct list_head		slot[RELEASE_QUEUE_SLOTS];
5878 +};
5879 +
5880 +typedef struct _rt_domain {
5881 +	/* runnable rt tasks are in here */
5882 +	raw_spinlock_t 			ready_lock;
5883 +	struct bheap	 		ready_queue;
5884 +
5885 +	/* real-time tasks waiting for release are in here */
5886 +	raw_spinlock_t 			release_lock;
5887 +	struct release_queue 		release_queue;
5888 +
5889 +#ifdef CONFIG_RELEASE_MASTER
5890 +	int				release_master;
5891 +#endif
5892 +
5893 +	/* for moving tasks to the release queue */
5894 +	raw_spinlock_t			tobe_lock;
5895 +	struct list_head		tobe_released;
5896 +
5897 +	/* how do we check if we need to kick another CPU? */
5898 +	check_resched_needed_t		check_resched;
5899 +
5900 +	/* how do we release jobs? */
5901 +	release_jobs_t			release_jobs;
5902 +
5903 +	/* how are tasks ordered in the ready queue? */
5904 +	bheap_prio_t			order;
5905 +} rt_domain_t;
5906 +
5907 +struct release_heap {
5908 +	/* list_head for per-time-slot list */
5909 +	struct list_head		list;
5910 +	lt_t				release_time;
5911 +	/* all tasks to be released at release_time */
5912 +	struct bheap			heap;
5913 +	/* used to trigger the release */
5914 +	struct hrtimer			timer;
5915 +
5916 +#ifdef CONFIG_RELEASE_MASTER
5917 +	/* used to delegate releases */
5918 +	struct hrtimer_start_on_info	info;
5919 +#endif
5920 +	/* required for the timer callback */
5921 +	rt_domain_t*			dom;
5922 +};
5923 +
5924 +
5925 +static inline struct task_struct* __next_ready(rt_domain_t* rt)
5926 +{
5927 +	struct bheap_node *hn = bheap_peek(rt->order, &rt->ready_queue);
5928 +	if (hn)
5929 +		return bheap2task(hn);
5930 +	else
5931 +		return NULL;
5932 +}
5933 +
5934 +void rt_domain_init(rt_domain_t *rt, bheap_prio_t order,
5935 +		    check_resched_needed_t check,
5936 +		    release_jobs_t release);
5937 +
5938 +void __add_ready(rt_domain_t* rt, struct task_struct *new);
5939 +void __merge_ready(rt_domain_t* rt, struct bheap *tasks);
5940 +void __add_release(rt_domain_t* rt, struct task_struct *task);
5941 +
5942 +static inline struct task_struct* __take_ready(rt_domain_t* rt)
5943 +{
5944 +	struct bheap_node* hn = bheap_take(rt->order, &rt->ready_queue);
5945 +	if (hn)
5946 +		return bheap2task(hn);
5947 +	else
5948 +		return NULL;
5949 +}
5950 +
5951 +static inline struct task_struct* __peek_ready(rt_domain_t* rt)
5952 +{
5953 +	struct bheap_node* hn = bheap_peek(rt->order, &rt->ready_queue);
5954 +	if (hn)
5955 +		return bheap2task(hn);
5956 +	else
5957 +		return NULL;
5958 +}
5959 +
5960 +static inline int  is_queued(struct task_struct *t)
5961 +{
5962 +	BUG_ON(!tsk_rt(t)->heap_node);
5963 +	return bheap_node_in_heap(tsk_rt(t)->heap_node);
5964 +}
5965 +
5966 +static inline void remove(rt_domain_t* rt, struct task_struct *t)
5967 +{
5968 +	bheap_delete(rt->order, &rt->ready_queue, tsk_rt(t)->heap_node);
5969 +}
5970 +
5971 +static inline void add_ready(rt_domain_t* rt, struct task_struct *new)
5972 +{
5973 +	unsigned long flags;
5974 +	/* first we need the write lock for rt_ready_queue */
5975 +	raw_spin_lock_irqsave(&rt->ready_lock, flags);
5976 +	__add_ready(rt, new);
5977 +	raw_spin_unlock_irqrestore(&rt->ready_lock, flags);
5978 +}
5979 +
5980 +static inline void merge_ready(rt_domain_t* rt, struct bheap* tasks)
5981 +{
5982 +	unsigned long flags;
5983 +	raw_spin_lock_irqsave(&rt->ready_lock, flags);
5984 +	__merge_ready(rt, tasks);
5985 +	raw_spin_unlock_irqrestore(&rt->ready_lock, flags);
5986 +}
5987 +
5988 +static inline struct task_struct* take_ready(rt_domain_t* rt)
5989 +{
5990 +	unsigned long flags;
5991 +	struct task_struct* ret;
5992 +	/* first we need the write lock for rt_ready_queue */
5993 +	raw_spin_lock_irqsave(&rt->ready_lock, flags);
5994 +	ret = __take_ready(rt);
5995 +	raw_spin_unlock_irqrestore(&rt->ready_lock, flags);
5996 +	return ret;
5997 +}
5998 +
5999 +
6000 +static inline void add_release(rt_domain_t* rt, struct task_struct *task)
6001 +{
6002 +	unsigned long flags;
6003 +	raw_spin_lock_irqsave(&rt->tobe_lock, flags);
6004 +	__add_release(rt, task);
6005 +	raw_spin_unlock_irqrestore(&rt->tobe_lock, flags);
6006 +}
6007 +
6008 +#ifdef CONFIG_RELEASE_MASTER
6009 +void __add_release_on(rt_domain_t* rt, struct task_struct *task,
6010 +		      int target_cpu);
6011 +
6012 +static inline void add_release_on(rt_domain_t* rt,
6013 +				  struct task_struct *task,
6014 +				  int target_cpu)
6015 +{
6016 +	unsigned long flags;
6017 +	raw_spin_lock_irqsave(&rt->tobe_lock, flags);
6018 +	__add_release_on(rt, task, target_cpu);
6019 +	raw_spin_unlock_irqrestore(&rt->tobe_lock, flags);
6020 +}
6021 +#endif
6022 +
6023 +static inline int __jobs_pending(rt_domain_t* rt)
6024 +{
6025 +	return !bheap_empty(&rt->ready_queue);
6026 +}
6027 +
6028 +static inline int jobs_pending(rt_domain_t* rt)
6029 +{
6030 +	unsigned long flags;
6031 +	int ret;
6032 +	/* first we need the write lock for rt_ready_queue */
6033 +	raw_spin_lock_irqsave(&rt->ready_lock, flags);
6034 +	ret = !bheap_empty(&rt->ready_queue);
6035 +	raw_spin_unlock_irqrestore(&rt->ready_lock, flags);
6036 +	return ret;
6037 +}
6038 +
6039 +#endif
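
A short usage sketch of the rt_domain API above, roughly as a plugin would wire it up (the callbacks here are simplified placeholders; real plugins supply e.g. a full EDF ordering with tie-breaking and plugin-specific preemption checks):

#include <litmus/litmus.h>
#include <litmus/rt_domain.h>

static rt_domain_t my_domain;

/* Simplified EDF ordering for the ready queue (ignores tie-breaking). */
static int my_ready_order(struct bheap_node* a, struct bheap_node* b)
{
	return earlier_deadline(bheap2task(a), bheap2task(b));
}

static int my_check_resched(rt_domain_t* rt)
{
	return 0;	/* sketch: never kicks another CPU */
}

static void my_release_jobs(rt_domain_t* rt, struct bheap* tasks)
{
	merge_ready(rt, tasks);	/* takes ready_lock, then merges released jobs */
}

static void my_plugin_setup(void)
{
	rt_domain_init(&my_domain, my_ready_order,
		       my_check_resched, my_release_jobs);
}

static void my_job_arrival(struct task_struct* t)
{
	if (is_released(t, litmus_clock()))
		add_ready(&my_domain, t);	/* takes ready_lock internally */
	else
		add_release(&my_domain, t);	/* held until its release time fires */
}
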
6040 diff --git a/include/litmus/rt_param.h b/include/litmus/rt_param.h
6041 new file mode 100644
6042 index 0000000..0198884
6043 --- /dev/null
6044 +++ b/include/litmus/rt_param.h
6045 @@ -0,0 +1,307 @@
6046 +/*
6047 + * Definition of the scheduler plugin interface.
6048 + *
6049 + */
6050 +#ifndef _LINUX_RT_PARAM_H_
6051 +#define _LINUX_RT_PARAM_H_
6052 +
6053 +#include <litmus/fpmath.h>
6054 +
6055 +/* Litmus time type. */
6056 +typedef unsigned long long lt_t;
6057 +
6058 +static inline int lt_after(lt_t a, lt_t b)
6059 +{
6060 +	return ((long long) b) - ((long long) a) < 0;
6061 +}
6062 +#define lt_before(a, b) lt_after(b, a)
6063 +
6064 +static inline int lt_after_eq(lt_t a, lt_t b)
6065 +{
6066 +	return ((long long) a) - ((long long) b) >= 0;
6067 +}
6068 +#define lt_before_eq(a, b) lt_after_eq(b, a)
6069 +
6070 +/* different types of clients */
6071 +typedef enum {
6072 +	RT_CLASS_HARD,
6073 +	RT_CLASS_SOFT,
6074 +	RT_CLASS_SOFT_W_SLIP,
6075 +	RT_CLASS_BEST_EFFORT
6076 +} task_class_t;
6077 +
6078 +typedef enum {
6079 +	NO_ENFORCEMENT,      /* job may overrun unhindered */
6080 +	QUANTUM_ENFORCEMENT, /* budgets are only checked on quantum boundaries */
6081 +	PRECISE_ENFORCEMENT  /* budgets are enforced with hrtimers */
6082 +} budget_policy_t;
6083 +
6084 +struct rt_task {
6085 +	lt_t 		exec_cost;
6086 +	lt_t 		period;
6087 +	lt_t		phase;
6088 +	unsigned int	cpu;
6089 +	task_class_t	cls;
6090 +	budget_policy_t budget_policy; /* ignored by pfair */
6091 +};
6092 +
6093 +union np_flag {
6094 +	uint32_t raw;
6095 +	struct {
6096 +		/* Is the task currently in a non-preemptive section? */
6097 +		uint32_t flag:31;
6098 +		/* Should the task call into the scheduler? */
6099 +		uint32_t preempt:1;
6100 +	} np;
6101 +};
6102 +
6103 +struct affinity_observer_args
6104 +{
6105 +	int lock_od;
6106 +};
6107 +
6108 +struct gpu_affinity_observer_args
6109 +{
6110 +	struct affinity_observer_args obs;
6111 +	int replica_to_gpu_offset;
6112 +	int nr_simult_users;
6113 +	int relaxed_rules;
6114 +};
6115 +
6116 +/* The definition of the data that is shared between the kernel and real-time
6117 + * tasks via a shared page (see litmus/ctrldev.c).
6118 + *
6119 + * WARNING: User space can write to this, so don't trust
6120 + * the correctness of the fields!
6121 + *
6122 + * This serves two purposes: to enable efficient signaling
6123 + * of non-preemptive sections (user->kernel) and
6124 + * delayed preemptions (kernel->user), and to export
6125 + * some real-time relevant statistics such as preemption and
6126 + * migration data to user space. We can't use a device to export
6127 + * statistics because we want to avoid system call overhead when
6128 + * determining preemption/migration overheads.
6129 + */
6130 +struct control_page {
6131 +	volatile union np_flag sched;
6132 +
6133 +	/* to be extended */
6134 +};
6135 +
6136 +/* don't export internal data structures to user space (liblitmus) */
6137 +#ifdef __KERNEL__
6138 +
6139 +#include <litmus/binheap.h>
6140 +#include <linux/semaphore.h>
6141 +
6142 +struct _rt_domain;
6143 +struct bheap_node;
6144 +struct release_heap;
6145 +
6146 +struct rt_job {
6147 +	/* Time instant the job was or will be released.  */
6148 +	lt_t	release;
6149 +	/* What is the current deadline? */
6150 +	lt_t   	deadline;
6151 +
6152 +	/* How much service has this job received so far? */
6153 +	lt_t	exec_time;
6154 +
6155 +	/* Which job is this? This is used to let user space
6156 +	 * specify which job to wait for, which is important if jobs
6157 +	 * overrun. If we just call sys_sleep_next_period() then we
6158 +	 * will unintentionally miss jobs after an overrun.
6159 +	 *
6160 +	 * Increase this sequence number when a job is released.
6161 +	 */
6162 +	unsigned int    job_no;
6163 +};
6164 +
6165 +struct pfair_param;
6166 +
6167 +enum klitirqd_sem_status
6168 +{
6169 +	NEED_TO_REACQUIRE,
6170 +	REACQUIRING,
6171 +	NOT_HELD,
6172 +	HELD
6173 +};
6174 +
6175 +typedef enum gpu_migration_dist
6176 +{
6177 +	// TODO: Make this scale with NR_NVIDIA_GPUS
6178 +	MIG_LOCAL = 0,
6179 +	MIG_NEAR = 1,
6180 +	MIG_MED = 2,
6181 +	MIG_FAR = 3,	// 8 GPUs in a binary tree hierarchy
6182 +	MIG_NONE = 4,
6183 +
6184 +	MIG_LAST = MIG_NONE
6185 +} gpu_migration_dist_t;
6186 +
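The enum above encodes how far apart two GPUs sit in an 8-GPU binary-tree hierarchy. One plausible way to derive the distance from two GPU indices is to compare successively larger subtrees; this is only a sketch of the idea, not necessarily the exact mapping used by the affinity code elsewhere in this patch (MIG_NONE would cover a task with no previous GPU):

	/* Illustrative only: assumes GPUs 0..7 are the leaves of a balanced
	 * binary tree, numbered left to right. */
	static inline gpu_migration_dist_t gpu_dist_sketch(int from, int to)
	{
		if (from == to)
			return MIG_LOCAL;
		if ((from >> 1) == (to >> 1))	/* same leaf pair */
			return MIG_NEAR;
		if ((from >> 2) == (to >> 2))	/* same half of the tree */
			return MIG_MED;
		return MIG_FAR;
	}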
6187 +typedef struct feedback_est{
6188 +	fp_t est;
6189 +	fp_t accum_err;
6190 +} feedback_est_t;
6191 +
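feedback_est_t pairs a current estimate with an accumulated error term, and rt_param below carries per-distance gains (gpu_fb_param_a/gpu_fb_param_b), which points at a PI-style feedback filter over observed GPU migration costs. The actual fixed-point update law lives outside this header and is not shown in this excerpt; the sketch below uses plain doubles purely to illustrate the shape of such an update:

	/* Illustrative PI-style update; the kernel code works on the fp_t
	 * fixed-point type from litmus/fpmath.h, and its exact update rule
	 * may differ from this assumed form. */
	struct est_sketch { double est, accum_err; };

	static void update_estimate_sketch(struct est_sketch *fb,
					   double a, double b, double observed)
	{
		double err = observed - fb->est;
		fb->est += a * err + b * fb->accum_err;
		fb->accum_err += err;
	}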
6192 +/*	RT task parameters for scheduling extensions
6193 + *	These parameters are inherited during clone and therefore must
6194 + *	be explicitly set up before the task set is launched.
6195 + */
6196 +struct rt_param {
6197 +	/* is the task sleeping? */
6198 +	unsigned int 		flags:8;
6199 +
6200 +	/* do we need to check for srp blocking? */
6201 +	unsigned int		srp_non_recurse:1;
6202 +
6203 +	/* is the task present? (true if it can be scheduled) */
6204 +	unsigned int		present:1;
6205 +
6206 +#ifdef CONFIG_LITMUS_SOFTIRQD
6207 +    /* proxy threads have minimum priority by default */
6208 +    unsigned int        is_proxy_thread:1;
6209 +
6210 +	/* pointer to klitirqd currently working on this
6211 +	   task_struct's behalf.  only set by the task pointed
6212 +	   to by klitirqd.
6213 +
6214 +	   ptr only valid if is_proxy_thread == 0
6215 +	 */
6216 +	struct task_struct* cur_klitirqd;
6217 +
6218 +	/* Used to implement mutual exclusion between
6219 +	 * job and klitirqd execution.  Job must always hold
6220 +	 * its klitirqd_sem to execute.  klitirqd instance
6221 +	 * must hold the semaphore before executing on behalf
6222 +	 * of a job.
6223 +	 */
6224 +	struct mutex				klitirqd_sem;
6225 +
6226 +	/* status of held klitirqd_sem, even if the held klitirqd_sem is from
6227 +	   another task (only proxy threads do this though).
6228 +	 */
6229 +	atomic_t					klitirqd_sem_stat;
6230 +#endif
6231 +
6232 +#ifdef CONFIG_LITMUS_NVIDIA
6233 +	/* number of top-half interrupts handled on behalf of current job */
6234 +	atomic_t					nv_int_count;
6235 +	long unsigned int			held_gpus;  // bitmap of held GPUs.
6236 +
6237 +#ifdef CONFIG_LITMUS_AFFINITY_LOCKING
6238 +	fp_t	gpu_fb_param_a[MIG_LAST+1];
6239 +	fp_t	gpu_fb_param_b[MIG_LAST+1];
6240 +
6241 +	gpu_migration_dist_t	gpu_migration;
6242 +	int				last_gpu;
6243 +	feedback_est_t	gpu_migration_est[MIG_LAST+1]; // local, near, med, far
6244 +
6245 +	lt_t accum_gpu_time;
6246 +	lt_t gpu_time_stamp;
6247 +
6248 +	unsigned int suspend_gpu_tracker_on_block:1;
6249 +#endif
6250 +#endif
6251 +
6252 +#ifdef CONFIG_LITMUS_LOCKING
6253 +	/* Is the task being priority-boosted by a locking protocol? */
6254 +	unsigned int		priority_boosted:1;
6255 +	/* If so, when did this start? */
6256 +	lt_t			boost_start_time;
6257 +#endif
6258 +
6259 +	/* user controlled parameters */
6260 +	struct rt_task 		task_params;
6261 +
6262 +	/* timing parameters */
6263 +	struct rt_job 		job_params;
6264 +
6265 +	/* task representing the current "inherited" task
6266 +	 * priority, assigned by inherit_priority and
6267 +	 * return_priority in the scheduler plugins.
6268 +	 * could point to self if PI does not result in
6269 +	 * an increased task priority.
6270 +	 */
6271 +	struct task_struct*	inh_task;
6272 +
6273 +#ifdef CONFIG_LITMUS_NESTED_LOCKING
6274 +	raw_spinlock_t			hp_blocked_tasks_lock;
6275 +	struct binheap_handle	hp_blocked_tasks;
6276 +
6277 +	/* pointer to lock upon which is currently blocked */
6278 +	struct litmus_lock* blocked_lock;
6279 +#endif
6280 +
6281 +#ifdef CONFIG_NP_SECTION
6282 +	/* For the FMLP under PSN-EDF, it is required to make the task
6283 +	 * non-preemptive from kernel space. In order not to interfere with
6284 +	 * user space, this counter indicates the kernel space np setting.
6285 +	 * kernel_np > 0 => task is non-preemptive
6286 +	 */
6287 +	unsigned int	kernel_np;
6288 +#endif
6289 +
6290 +	/* This field can be used by plugins to store where the task
6291 +	 * is currently scheduled. It is the responsibility of the
6292 +	 * plugin to avoid race conditions.
6293 +	 *
6294 +	 * This is used by GSN-EDF and PFAIR.
6295 +	 */
6296 +	volatile int		scheduled_on;
6297 +
6298 +	/* Is the stack of the task currently in use? This is updated by
6299 +	 * the LITMUS core.
6300 +	 *
6301 +	 * Be careful to avoid deadlocks!
6302 +	 */
6303 +	volatile int		stack_in_use;
6304 +
6305 +	/* This field can be used by plugins to store where the task
6306 +	 * is currently linked. It is the responsibility of the plugin
6307 +	 * to avoid race conditions.
6308 +	 *
6309 +	 * Used by GSN-EDF.
6310 +	 */
6311 +	volatile int		linked_on;
6312 +
6313 +	/* PFAIR/PD^2 state. Allocated on demand. */
6314 +	struct pfair_param*	pfair;
6315 +
6316 +	/* Fields saved before BE->RT transition.
6317 +	 */
6318 +	int old_policy;
6319 +	int old_prio;
6320 +
6321 +	/* ready queue for this task */
6322 +	struct _rt_domain* domain;
6323 +
6324 +	/* heap element for this task
6325 +	 *
6326 +	 * Warning: Don't statically allocate this node. The heap
6327 +	 *          implementation swaps these between tasks, thus after
6328 +	 *          dequeuing from a heap you may end up with a different node
6329 +	 *          than the one you had when enqueuing the task.  For the same
6330 +	 *          reason, don't obtain and store references to this node
6331 +	 *          other than this pointer (which is updated by the heap
6332 +	 *          implementation).
6333 +	 */
6334 +	struct bheap_node*	heap_node;
6335 +	struct release_heap*	rel_heap;
6336 +
6337 +	/* Used by rt_domain to queue task in release list.
6338 +	 */
6339 +	struct list_head list;
6340 +
6341 +	/* Pointer to the page shared between userspace and kernel. */
6342 +	struct control_page * ctrl_page;
6343 +};
6344 +
6345 +/*	Possible RT flags	*/
6346 +#define RT_F_RUNNING		0x00000000
6347 +#define RT_F_SLEEP		0x00000001
6348 +#define RT_F_EXIT_SEM		0x00000008
6349 +
6350 +#endif
6351 +
6352 +#endif
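The np_flag/control_page pair defined above is the user/kernel handshake for non-preemptive sections: user space raises np.flag before a short critical region, the kernel records a deferred preemption in np.preempt instead of preempting immediately, and user space yields when it leaves the region. A minimal user-space sketch of that protocol, assuming a pointer to the task's mapped control page has already been obtained (e.g., through liblitmus, which is not part of this patch), and with litmus_yield() standing in for whatever call re-enters the scheduler:

	/* Sketch only; 'ctrl' points at the shared page set up by litmus/ctrldev.c. */
	static void np_enter(struct control_page *ctrl)
	{
		ctrl->sched.np.flag = 1;	/* ask the kernel not to preempt us */
		__sync_synchronize();		/* publish before entering the critical region */
	}

	static void np_exit(struct control_page *ctrl)
	{
		ctrl->sched.np.flag = 0;
		__sync_synchronize();
		if (ctrl->sched.np.preempt)	/* the kernel deferred a preemption... */
			litmus_yield();		/* ...so call into the scheduler now (hypothetical wrapper) */
	}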
6353 diff --git a/include/litmus/sched_plugin.h b/include/litmus/sched_plugin.h
6354 new file mode 100644
6355 index 0000000..24a6858
6356 --- /dev/null
6357 +++ b/include/litmus/sched_plugin.h
6358 @@ -0,0 +1,183 @@
6359 +/*
6360 + * Definition of the scheduler plugin interface.
6361 + *
6362 + */
6363 +#ifndef _LINUX_SCHED_PLUGIN_H_
6364 +#define _LINUX_SCHED_PLUGIN_H_
6365 +
6366 +#include <linux/sched.h>
6367 +
6368 +#ifdef CONFIG_LITMUS_LOCKING
6369 +#include <litmus/locking.h>
6370 +#endif
6371 +
6372 +#ifdef CONFIG_LITMUS_AFFINITY_LOCKING
6373 +#include <litmus/kexclu_affinity.h>
6374 +#endif
6375 +
6376 +#include <linux/interrupt.h>
6377 +
6378 +/************************ setup/tear down ********************/
6379 +
6380 +typedef long (*activate_plugin_t) (void);
6381 +typedef long (*deactivate_plugin_t) (void);
6382 +
6383 +
6384 +
6385 +/********************* scheduler invocation ******************/
6386 +
6387 +/*  Plugin-specific realtime tick handler */
6388 +typedef void (*scheduler_tick_t) (struct task_struct *cur);
6389 +/* Plugin's main scheduling decision function */
6390 +typedef struct task_struct* (*schedule_t)(struct task_struct * prev);
6391 +/* Clean up after the task switch has occurred.
6392 + * This function is called after every (even non-rt) task switch.
6393 + */
6394 +typedef void (*finish_switch_t)(struct task_struct *prev);
6395 +
6396 +/********************* task state changes ********************/
6397 +
6398 +/* Called to setup a new real-time task.
6399 + * Release the first job, enqueue, etc.
6400 + * Task may already be running.
6401 + */
6402 +typedef void (*task_new_t) (struct task_struct *task,
6403 +			    int on_rq,
6404 +			    int running);
6405 +
6406 +/* Called to re-introduce a task after blocking.
6407 + * Can potentially be called multiple times.
6408 + */
6409 +typedef void (*task_wake_up_t) (struct task_struct *task);
6410 +/* Called to notify the plugin of a blocking real-time task.
6411 + * It will only be called for real-time tasks and before schedule() is called. */
6412 +typedef void (*task_block_t)  (struct task_struct *task);
6413 +/* Called when a real-time task exits or changes to a different scheduling
6414 + * class.
6415 + * Free any allocated resources
6416 + */
6417 +typedef void (*task_exit_t)    (struct task_struct *);
6418 +
6419 +/* Called when the current task attempts to create a new lock of a given
6420 + * protocol type. */
6421 +typedef long (*allocate_lock_t) (struct litmus_lock **lock, int type,
6422 +				 void* __user config);
6423 +
6424 +struct affinity_observer;
6425 +typedef long (*allocate_affinity_observer_t) (
6426 +								struct affinity_observer **aff_obs, int type,
6427 +								void* __user config);
6428 +
6429 +typedef void (*increase_prio_t)(struct task_struct* t, struct task_struct* prio_inh);
6430 +typedef void (*decrease_prio_t)(struct task_struct* t, struct task_struct* prio_inh);
6431 +typedef void (*nested_increase_prio_t)(struct task_struct* t, struct task_struct* prio_inh,
6432 +									  raw_spinlock_t *to_unlock, unsigned long irqflags);
6433 +typedef void (*nested_decrease_prio_t)(struct task_struct* t, struct task_struct* prio_inh,
6434 +									  raw_spinlock_t *to_unlock, unsigned long irqflags);
6435 +
6436 +typedef void (*increase_prio_klitirq_t)(struct task_struct* klitirqd,
6437 +                                        struct task_struct* old_owner,
6438 +                                        struct task_struct* new_owner);
6439 +typedef void (*decrease_prio_klitirqd_t)(struct task_struct* klitirqd,
6440 +                                         struct task_struct* old_owner);
6441 +
6442 +
6443 +typedef int (*enqueue_pai_tasklet_t)(struct tasklet_struct* tasklet);
6444 +typedef void (*change_prio_pai_tasklet_t)(struct task_struct *old_prio,
6445 +										  struct task_struct *new_prio);
6446 +typedef void (*run_tasklets_t)(struct task_struct* next);
6447 +
6448 +typedef raw_spinlock_t* (*get_dgl_spinlock_t) (struct task_struct *t);
6449 +
6450 +
6451 +typedef int (*higher_prio_t)(struct task_struct* a, struct task_struct* b);
6452 +
6453 +#ifdef CONFIG_LITMUS_NESTED_LOCKING
6454 +
6455 +typedef enum
6456 +{
6457 +	BASE,
6458 +	EFFECTIVE
6459 +} comparison_mode_t;
6460 +
6461 +typedef int (*__higher_prio_t)(struct task_struct* a, comparison_mode_t a_mod,
6462 +							  struct task_struct* b, comparison_mode_t b_mod);
6463 +#endif
6464 +
6465 +
6466 +/********************* sys call backends  ********************/
6467 +/* This function causes the caller to sleep until the next release */
6468 +typedef long (*complete_job_t) (void);
6469 +
6470 +typedef long (*admit_task_t)(struct task_struct* tsk);
6471 +
6472 +typedef void (*release_at_t)(struct task_struct *t, lt_t start);
6473 +
6474 +struct sched_plugin {
6475 +	struct list_head	list;
6476 +	/* 	basic info 		*/
6477 +	char 			*plugin_name;
6478 +
6479 +	/*	setup			*/
6480 +	activate_plugin_t	activate_plugin;
6481 +	deactivate_plugin_t	deactivate_plugin;
6482 +
6483 +	/* 	scheduler invocation 	*/
6484 +	scheduler_tick_t        tick;
6485 +	schedule_t 		schedule;
6486 +	finish_switch_t 	finish_switch;
6487 +
6488 +	/*	syscall backend 	*/
6489 +	complete_job_t 		complete_job;
6490 +	release_at_t		release_at;
6491 +
6492 +	/*	task state changes 	*/
6493 +	admit_task_t		admit_task;
6494 +
6495 +	task_new_t		task_new;
6496 +	task_wake_up_t		task_wake_up;
6497 +	task_block_t		task_block;
6498 +	task_exit_t 		task_exit;
6499 +
6500 +	higher_prio_t		compare;
6501 +
6502 +#ifdef CONFIG_LITMUS_LOCKING
6503 +	/*	locking protocols	*/
6504 +	allocate_lock_t		allocate_lock;
6505 +	increase_prio_t		increase_prio;
6506 +	decrease_prio_t		decrease_prio;
6507 +#endif
6508 +#ifdef CONFIG_LITMUS_NESTED_LOCKING
6509 +	nested_increase_prio_t nested_increase_prio;
6510 +	nested_decrease_prio_t nested_decrease_prio;
6511 +	__higher_prio_t		__compare;
6512 +#endif
6513 +#ifdef CONFIG_LITMUS_DGL_SUPPORT
6514 +	get_dgl_spinlock_t	get_dgl_spinlock;
6515 +#endif
6516 +
6517 +#ifdef CONFIG_LITMUS_AFFINITY_LOCKING
6518 +	allocate_affinity_observer_t allocate_aff_obs;
6519 +#endif
6520 +
6521 +#ifdef CONFIG_LITMUS_SOFTIRQD
6522 +	increase_prio_klitirq_t		increase_prio_klitirqd;
6523 +	decrease_prio_klitirqd_t	decrease_prio_klitirqd;
6524 +#endif
6525 +#ifdef CONFIG_LITMUS_PAI_SOFTIRQD
6526 +	enqueue_pai_tasklet_t		enqueue_pai_tasklet;
6527 +	change_prio_pai_tasklet_t	change_prio_pai_tasklet;
6528 +	run_tasklets_t				run_tasklets;
6529 +#endif
6530 +} __attribute__ ((__aligned__(SMP_CACHE_BYTES)));
6531 +
6532 +
6533 +extern struct sched_plugin *litmus;
6534 +
6535 +int register_sched_plugin(struct sched_plugin* plugin);
6536 +struct sched_plugin* find_sched_plugin(const char* name);
6537 +int print_sched_plugins(char* buf, int max);
6538 +
6539 +extern struct sched_plugin linux_sched_plugin;
6540 +
6541 +#endif
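The interface above is consumed by filling in a struct sched_plugin and handing it to register_sched_plugin(). A minimal skeleton is sketched below; the callback bodies are placeholders, and a real plugin (such as the C-EDF and GSN-EDF plugins elsewhere in this patch) wires up many more of the hooks:

	#include <linux/module.h>
	#include <linux/errno.h>
	#include <litmus/sched_plugin.h>

	static long demo_activate(void) { return 0; }
	static long demo_admit(struct task_struct *t) { return -EINVAL; /* admit nothing yet */ }
	static struct task_struct *demo_schedule(struct task_struct *prev) { return NULL; }

	static struct sched_plugin demo_plugin = {
		.plugin_name     = "DEMO",
		.activate_plugin = demo_activate,
		.admit_task      = demo_admit,
		.schedule        = demo_schedule,
	};

	static int __init init_demo(void)
	{
		return register_sched_plugin(&demo_plugin);
	}
	module_init(init_demo);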
6542 diff --git a/include/litmus/sched_trace.h b/include/litmus/sched_trace.h
6543 new file mode 100644
6544 index 0000000..b1b71f6
6545 --- /dev/null
6546 +++ b/include/litmus/sched_trace.h
6547 @@ -0,0 +1,380 @@
6548 +/*
6549 + * sched_trace.h -- record scheduler events to a byte stream for offline analysis.
6550 + */
6551 +#ifndef _LINUX_SCHED_TRACE_H_
6552 +#define _LINUX_SCHED_TRACE_H_
6553 +
6554 +/* all times in nanoseconds */
6555 +
6556 +struct st_trace_header {
6557 +	u8	type;		/* Of what type is this record?  */
6558 +	u8	cpu;		/* On which CPU was it recorded? */
6559 +	u16	pid;		/* PID of the task.              */
6560 +	u32 job:24;		/* The job sequence number.      */
6561 +	u8  extra;
6562 +} __attribute__((packed));
6563 +
6564 +#define ST_NAME_LEN 16
6565 +struct st_name_data {
6566 +	char	cmd[ST_NAME_LEN];/* The name of the executable of this process. */
6567 +} __attribute__((packed));
6568 +
6569 +struct st_param_data {		/* regular params */
6570 +	u32	wcet;
6571 +	u32	period;
6572 +	u32	phase;
6573 +	u8	partition;
6574 +	u8	class;
6575 +	u8	__unused[2];
6576 +} __attribute__((packed));
6577 +
6578 +struct st_release_data {	/* A job was/is going to be released. */
6579 +	u64	release;	/* What's the release time?              */
6580 +	u64	deadline;	/* By when must it finish?		 */
6581 +} __attribute__((packed));
6582 +
6583 +struct st_assigned_data {	/* A job was assigned to a CPU.		 */
6584 +	u64	when;
6585 +	u8	target;		/* Where should it execute?	         */
6586 +	u8	__unused[7];
6587 +} __attribute__((packed));
6588 +
6589 +struct st_switch_to_data {	/* A process was switched to on a given CPU.   */
6590 +	u64	when;		/* When did this occur?                        */
6591 +	u32	exec_time;	/* Time the current job has executed.          */
6592 +	u8	__unused[4];
6593 +} __attribute__((packed));
6594 +
6595 +struct st_switch_away_data {	/* A process was switched away from on a given CPU. */
6596 +	u64	when;
6597 +	u64	exec_time;
6598 +} __attribute__((packed));
6599 +
6600 +struct st_completion_data {	/* A job completed. */
6601 +	u64	when;
6602 +	u8	forced:1; 	/* Set to 1 if job overran and kernel advanced to the
6603 +				 * next task automatically; set to 0 otherwise.
6604 +				 */
6605 +	u8	__uflags:7;
6606 +	u16 nv_int_count;
6607 +	u8	__unused[5];
6608 +} __attribute__((packed));
6609 +
6610 +struct st_block_data {		/* A task blocks. */
6611 +	u64	when;
6612 +	u64	__unused;
6613 +} __attribute__((packed));
6614 +
6615 +struct st_resume_data {		/* A task resumes. */
6616 +	u64	when;
6617 +	u64	__unused;
6618 +} __attribute__((packed));
6619 +
6620 +struct st_action_data {
6621 +	u64	when;
6622 +	u8	action;
6623 +	u8	__unused[7];
6624 +} __attribute__((packed));
6625 +
6626 +struct st_sys_release_data {
6627 +	u64	when;
6628 +	u64	release;
6629 +} __attribute__((packed));
6630 +
6631 +
6632 +struct st_tasklet_release_data {
6633 +	u64 when;
6634 +	u64 __unused;
6635 +} __attribute__((packed));
6636 +
6637 +struct st_tasklet_begin_data {
6638 +	u64 when;
6639 +	u16 exe_pid;
6640 +	u8  __unused[6];
6641 +} __attribute__((packed));
6642 +
6643 +struct st_tasklet_end_data {
6644 +	u64 when;
6645 +	u16 exe_pid;
6646 +	u8	flushed;
6647 +	u8	__unused[5];
6648 +} __attribute__((packed));
6649 +
6650 +
6651 +struct st_work_release_data {
6652 +	u64 when;
6653 +	u64 __unused;
6654 +} __attribute__((packed));
6655 +
6656 +struct st_work_begin_data {
6657 +	u64 when;
6658 +	u16 exe_pid;
6659 +	u8	__unused[6];
6660 +} __attribute__((packed));
6661 +
6662 +struct st_work_end_data {
6663 +	u64 when;
6664 +	u16 exe_pid;
6665 +	u8	flushed;
6666 +	u8	__unused[5];
6667 +} __attribute__((packed));
6668 +
6669 +struct st_effective_priority_change_data {
6670 +	u64 when;
6671 +	u16 inh_pid;
6672 +	u8	__unused[6];
6673 +} __attribute__((packed));
6674 +
6675 +struct st_nv_interrupt_begin_data {
6676 +	u64 when;
6677 +	u32 device;
6678 +	u32 serialNumber;
6679 +} __attribute__((packed));
6680 +
6681 +struct st_nv_interrupt_end_data {
6682 +	u64 when;
6683 +	u32 device;
6684 +	u32 serialNumber;
6685 +} __attribute__((packed));
6686 +
6687 +struct st_prediction_err_data {
6688 +	u64 distance;
6689 +	u64 rel_err;
6690 +} __attribute__((packed));
6691 +
6692 +struct st_migration_data {
6693 +	u64 observed;
6694 +	u64 estimated;
6695 +} __attribute__((packed));
6696 +
6697 +struct migration_info {
6698 +	u64 observed;
6699 +	u64 estimated;
6700 +	u8 distance;
6701 +} __attribute__((packed));
6702 +
6703 +#define DATA(x) struct st_ ## x ## _data x;
6704 +
6705 +typedef enum {
6706 +    ST_NAME = 1, /* Start at one, so that we can spot
6707 +				  * uninitialized records. */
6708 +	ST_PARAM,
6709 +	ST_RELEASE,
6710 +	ST_ASSIGNED,
6711 +	ST_SWITCH_TO,
6712 +	ST_SWITCH_AWAY,
6713 +	ST_COMPLETION,
6714 +	ST_BLOCK,
6715 +	ST_RESUME,
6716 +	ST_ACTION,
6717 +	ST_SYS_RELEASE,
6718 +	ST_TASKLET_RELEASE,
6719 +	ST_TASKLET_BEGIN,
6720 +	ST_TASKLET_END,
6721 +	ST_WORK_RELEASE,
6722 +	ST_WORK_BEGIN,
6723 +	ST_WORK_END,
6724 +	ST_EFF_PRIO_CHANGE,
6725 +	ST_NV_INTERRUPT_BEGIN,
6726 +	ST_NV_INTERRUPT_END,
6727 +
6728 +	ST_PREDICTION_ERR,
6729 +	ST_MIGRATION,
6730 +} st_event_record_type_t;
6731 +
6732 +struct st_event_record {
6733 +	struct st_trace_header hdr;
6734 +	union {
6735 +		u64 raw[2];
6736 +
6737 +		DATA(name);
6738 +		DATA(param);
6739 +		DATA(release);
6740 +		DATA(assigned);
6741 +		DATA(switch_to);
6742 +		DATA(switch_away);
6743 +		DATA(completion);
6744 +		DATA(block);
6745 +		DATA(resume);
6746 +		DATA(action);
6747 +		DATA(sys_release);
6748 +		DATA(tasklet_release);
6749 +		DATA(tasklet_begin);
6750 +		DATA(tasklet_end);
6751 +		DATA(work_release);
6752 +		DATA(work_begin);
6753 +		DATA(work_end);
6754 +		DATA(effective_priority_change);
6755 +		DATA(nv_interrupt_begin);
6756 +		DATA(nv_interrupt_end);
6757 +
6758 +		DATA(prediction_err);
6759 +		DATA(migration);
6760 +	} data;
6761 +} __attribute__((packed));
6762 +
6763 +#undef DATA
6764 +
6765 +#ifdef __KERNEL__
6766 +
6767 +#include <linux/sched.h>
6768 +#include <litmus/feather_trace.h>
6769 +
6770 +#ifdef CONFIG_SCHED_TASK_TRACE
6771 +
6772 +#define SCHED_TRACE(id, callback, task) \
6773 +	ft_event1(id, callback, task)
6774 +#define SCHED_TRACE2(id, callback, task, xtra) \
6775 +	ft_event2(id, callback, task, xtra)
6776 +#define SCHED_TRACE3(id, callback, task, xtra1, xtra2) \
6777 +	ft_event3(id, callback, task, xtra1, xtra2)
6778 +
6779 +/* provide prototypes; needed on sparc64 */
6780 +#ifndef NO_TASK_TRACE_DECLS
6781 +feather_callback void do_sched_trace_task_name(unsigned long id,
6782 +					       struct task_struct* task);
6783 +feather_callback void do_sched_trace_task_param(unsigned long id,
6784 +						struct task_struct* task);
6785 +feather_callback void do_sched_trace_task_release(unsigned long id,
6786 +						  struct task_struct* task);
6787 +feather_callback void do_sched_trace_task_switch_to(unsigned long id,
6788 +						    struct task_struct* task);
6789 +feather_callback void do_sched_trace_task_switch_away(unsigned long id,
6790 +						      struct task_struct* task);
6791 +feather_callback void do_sched_trace_task_completion(unsigned long id,
6792 +						     struct task_struct* task,
6793 +						     unsigned long forced);
6794 +feather_callback void do_sched_trace_task_block(unsigned long id,
6795 +						struct task_struct* task);
6796 +feather_callback void do_sched_trace_task_resume(unsigned long id,
6797 +						 struct task_struct* task);
6798 +feather_callback void do_sched_trace_action(unsigned long id,
6799 +					    struct task_struct* task,
6800 +					    unsigned long action);
6801 +feather_callback void do_sched_trace_sys_release(unsigned long id,
6802 +						 lt_t* start);
6803 +
6804 +
6805 +feather_callback void do_sched_trace_tasklet_release(unsigned long id,
6806 +												   struct task_struct* owner);
6807 +feather_callback void do_sched_trace_tasklet_begin(unsigned long id,
6808 +												  struct task_struct* owner);
6809 +feather_callback void do_sched_trace_tasklet_end(unsigned long id,
6810 +												 struct task_struct* owner,
6811 +												 unsigned long flushed);
6812 +
6813 +feather_callback void do_sched_trace_work_release(unsigned long id,
6814 +													 struct task_struct* owner);
6815 +feather_callback void do_sched_trace_work_begin(unsigned long id,
6816 +												struct task_struct* owner,
6817 +												struct task_struct* exe);
6818 +feather_callback void do_sched_trace_work_end(unsigned long id,
6819 +											  struct task_struct* owner,
6820 +											  struct task_struct* exe,
6821 +											  unsigned long flushed);
6822 +
6823 +feather_callback void do_sched_trace_eff_prio_change(unsigned long id,
6824 +											  struct task_struct* task,
6825 +											  struct task_struct* inh);
6826 +
6827 +feather_callback void do_sched_trace_nv_interrupt_begin(unsigned long id,
6828 +												u32 device);
6829 +feather_callback void do_sched_trace_nv_interrupt_end(unsigned long id,
6830 +												unsigned long unused);
6831 +
6832 +feather_callback void do_sched_trace_prediction_err(unsigned long id,
6833 +													  struct task_struct* task,
6834 +													  gpu_migration_dist_t* distance,
6835 +													  fp_t* rel_err);
6836 +
6837 +
6838 +
6839 +
6840 +
6841 +feather_callback void do_sched_trace_migration(unsigned long id,
6842 +											  struct task_struct* task,
6843 +											  struct migration_info* mig_info);
6844 +
6845 +
6846 +/* returns true if we're tracing an interrupt on current CPU */
6847 +/* int is_interrupt_tracing_active(void); */
6848 +
6849 +#endif
6850 +
6851 +#else
6852 +
6853 +#define SCHED_TRACE(id, callback, task)        /* no tracing */
6854 +#define SCHED_TRACE2(id, callback, task, xtra) /* no tracing */
6855 +#define SCHED_TRACE3(id, callback, task, xtra1, xtra2)
6856 +
6857 +#endif
6858 +
6859 +
6860 +#define SCHED_TRACE_BASE_ID 500
6861 +
6862 +
6863 +#define sched_trace_task_name(t) \
6864 +	SCHED_TRACE(SCHED_TRACE_BASE_ID + 1, do_sched_trace_task_name, t)
6865 +#define sched_trace_task_param(t) \
6866 +	SCHED_TRACE(SCHED_TRACE_BASE_ID + 2, do_sched_trace_task_param, t)
6867 +#define sched_trace_task_release(t) \
6868 +	SCHED_TRACE(SCHED_TRACE_BASE_ID + 3, do_sched_trace_task_release, t)
6869 +#define sched_trace_task_switch_to(t) \
6870 +	SCHED_TRACE(SCHED_TRACE_BASE_ID + 4, do_sched_trace_task_switch_to, t)
6871 +#define sched_trace_task_switch_away(t) \
6872 +	SCHED_TRACE(SCHED_TRACE_BASE_ID + 5, do_sched_trace_task_switch_away, t)
6873 +#define sched_trace_task_completion(t, forced) \
6874 +	SCHED_TRACE2(SCHED_TRACE_BASE_ID + 6, do_sched_trace_task_completion, t, \
6875 +		     (unsigned long) forced)
6876 +#define sched_trace_task_block(t) \
6877 +	SCHED_TRACE(SCHED_TRACE_BASE_ID + 7, do_sched_trace_task_block, t)
6878 +#define sched_trace_task_resume(t) \
6879 +	SCHED_TRACE(SCHED_TRACE_BASE_ID + 8, do_sched_trace_task_resume, t)
6880 +#define sched_trace_action(t, action) \
6881 +	SCHED_TRACE2(SCHED_TRACE_BASE_ID + 9, do_sched_trace_action, t, \
6882 +		     (unsigned long) action);
6883 +/* when is a pointer, it does not need an explicit cast to unsigned long */
6884 +#define sched_trace_sys_release(when) \
6885 +	SCHED_TRACE(SCHED_TRACE_BASE_ID + 10, do_sched_trace_sys_release, when)
6886 +
6887 +
6888 +#define sched_trace_tasklet_release(t) \
6889 +	SCHED_TRACE(SCHED_TRACE_BASE_ID + 11, do_sched_trace_tasklet_release, t)
6890 +
6891 +#define sched_trace_tasklet_begin(t) \
6892 +	SCHED_TRACE(SCHED_TRACE_BASE_ID + 12, do_sched_trace_tasklet_begin, t)
6893 +
6894 +#define sched_trace_tasklet_end(t, flushed) \
6895 +	SCHED_TRACE2(SCHED_TRACE_BASE_ID + 13, do_sched_trace_tasklet_end, t, flushed)
6896 +
6897 +
6898 +#define sched_trace_work_release(t) \
6899 +	SCHED_TRACE(SCHED_TRACE_BASE_ID + 14, do_sched_trace_work_release, t)
6900 +
6901 +#define sched_trace_work_begin(t, e) \
6902 +	SCHED_TRACE2(SCHED_TRACE_BASE_ID + 15, do_sched_trace_work_begin, t, e)
6903 +
6904 +#define sched_trace_work_end(t, e, flushed) \
6905 +	SCHED_TRACE3(SCHED_TRACE_BASE_ID + 16, do_sched_trace_work_end, t, e, flushed)
6906 +
6907 +
6908 +#define sched_trace_eff_prio_change(t, inh) \
6909 +	SCHED_TRACE2(SCHED_TRACE_BASE_ID + 17, do_sched_trace_eff_prio_change, t, inh)
6910 +
6911 +
6912 +#define sched_trace_nv_interrupt_begin(d) \
6913 +	SCHED_TRACE(SCHED_TRACE_BASE_ID + 18, do_sched_trace_nv_interrupt_begin, d)
6914 +#define sched_trace_nv_interrupt_end(d) \
6915 +	SCHED_TRACE(SCHED_TRACE_BASE_ID + 19, do_sched_trace_nv_interrupt_end, d)
6916 +
6917 +#define sched_trace_prediction_err(t, dist, rel_err) \
6918 +	SCHED_TRACE3(SCHED_TRACE_BASE_ID + 20, do_sched_trace_prediction_err, t, dist, rel_err)
6919 +
6920 +#define sched_trace_migration(t, mig_info) \
6921 +	SCHED_TRACE2(SCHED_TRACE_BASE_ID + 21, do_sched_trace_migration, t, mig_info)
6922 +
6923 +#define sched_trace_quantum_boundary() /* NOT IMPLEMENTED */
6924 +
6925 +#endif /* __KERNEL__ */
6926 +
6927 +#endif
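Each record emitted through these hooks is a packed 8-byte header followed by a 16-byte payload (the union above), so a captured stream can be walked in fixed 24-byte steps. The user-space sketch below mirrors the header layout with <stdint.h> types, since the kernel's u8/u16/u32 types are not visible to ordinary programs; the file name is illustrative:

	#include <stdio.h>
	#include <stdint.h>

	struct record_hdr {			/* mirrors struct st_trace_header */
		uint8_t  type;
		uint8_t  cpu;
		uint16_t pid;
		uint32_t job:24;
		uint8_t  extra;
	} __attribute__((packed));

	struct record {				/* header + 16-byte payload */
		struct record_hdr hdr;
		uint64_t raw[2];
	} __attribute__((packed));

	int main(void)
	{
		struct record r;
		FILE *f = fopen("st-trace.bin", "rb");	/* a previously captured stream */
		if (!f)
			return 1;
		while (fread(&r, sizeof(r), 1, f) == 1)
			printf("type=%u cpu=%u pid=%u job=%u\n",
			       r.hdr.type, r.hdr.cpu, r.hdr.pid, r.hdr.job);
		fclose(f);
		return 0;
	}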
6928 diff --git a/include/litmus/sched_trace_external.h b/include/litmus/sched_trace_external.h
6929 new file mode 100644
6930 index 0000000..e70e45e
6931 --- /dev/null
6932 +++ b/include/litmus/sched_trace_external.h
6933 @@ -0,0 +1,78 @@
6934 +/*
6935 + * sched_trace_external.h -- hooks for recording scheduler events from outside the LITMUS^RT core.
6936 + */
6937 +#ifndef _LINUX_SCHED_TRACE_EXTERNAL_H_
6938 +#define _LINUX_SCHED_TRACE_EXTERNAL_H_
6939 +
6940 +
6941 +#ifdef CONFIG_SCHED_TASK_TRACE
6942 +extern void __sched_trace_tasklet_begin_external(struct task_struct* t);
6943 +static inline void sched_trace_tasklet_begin_external(struct task_struct* t)
6944 +{
6945 +	__sched_trace_tasklet_begin_external(t);
6946 +}
6947 +
6948 +extern void __sched_trace_tasklet_end_external(struct task_struct* t, unsigned long flushed);
6949 +static inline void sched_trace_tasklet_end_external(struct task_struct* t, unsigned long flushed)
6950 +{
6951 +	__sched_trace_tasklet_end_external(t, flushed);
6952 +}
6953 +
6954 +extern void __sched_trace_work_begin_external(struct task_struct* t, struct task_struct* e);
6955 +static inline void sched_trace_work_begin_external(struct task_struct* t, struct task_struct* e)
6956 +{
6957 +	__sched_trace_work_begin_external(t, e);
6958 +}
6959 +
6960 +extern void __sched_trace_work_end_external(struct task_struct* t, struct task_struct* e, unsigned long f);
6961 +static inline void sched_trace_work_end_external(struct task_struct* t, struct task_struct* e, unsigned long f)
6962 +{
6963 +	__sched_trace_work_end_external(t, e, f);
6964 +}
6965 +
6966 +#ifdef CONFIG_LITMUS_NVIDIA
6967 +extern void __sched_trace_nv_interrupt_begin_external(u32 device);
6968 +static inline void sched_trace_nv_interrupt_begin_external(u32 device)
6969 +{
6970 +	__sched_trace_nv_interrupt_begin_external(device);
6971 +}
6972 +
6973 +extern void __sched_trace_nv_interrupt_end_external(u32 device);
6974 +static inline void sched_trace_nv_interrupt_end_external(u32 device)
6975 +{
6976 +	__sched_trace_nv_interrupt_end_external(device);
6977 +}
6978 +#endif
6979 +
6980 +#else
6981 +
6982 +// no tracing.
6983 +static inline void sched_trace_tasklet_begin_external(struct task_struct* t){}
6984 +static inline void sched_trace_tasklet_end_external(struct task_struct* t, unsigned long flushed){}
6985 +static inline void sched_trace_work_begin_external(struct task_struct* t, struct task_struct* e){}
6986 +static inline void sched_trace_work_end_external(struct task_struct* t, struct task_struct* e, unsigned long f){}
6987 +
6988 +#ifdef CONFIG_LITMUS_NVIDIA
6989 +static inline void sched_trace_nv_interrupt_begin_external(u32 device){}
6990 +static inline void sched_trace_nv_interrupt_end_external(u32 device){}
6991 +#endif
6992 +
6993 +#endif
6994 +
6995 +
6996 +#ifdef CONFIG_LITMUS_NVIDIA
6997 +
6998 +#define EX_TS(evt) \
6999 +extern void __##evt(void); \
7000 +static inline void EX_##evt(void) { __##evt(); }
7001 +
7002 +EX_TS(TS_NV_TOPISR_START)
7003 +EX_TS(TS_NV_TOPISR_END)
7004 +EX_TS(TS_NV_BOTISR_START)
7005 +EX_TS(TS_NV_BOTISR_END)
7006 +EX_TS(TS_NV_RELEASE_BOTISR_START)
7007 +EX_TS(TS_NV_RELEASE_BOTISR_END)
7008 +
7009 +#endif
7010 +
7011 +#endif
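These wrappers exist so code outside the LITMUS^RT tree (notably the NVIDIA interrupt glue added by this patch) can emit the same trace events and overhead timestamps without pulling in the internal headers. Usage is a matched begin/end pair around the forwarded work; the surrounding handler below is only a sketch of the intended call pattern:

	/* Sketch: inside an external top-half interrupt path for GPU 'dev'. */
	sched_trace_nv_interrupt_begin_external(dev);
	EX_TS_NV_TOPISR_START();
	/* ... hand the interrupt to the vendor driver ... */
	EX_TS_NV_TOPISR_END();
	sched_trace_nv_interrupt_end_external(dev);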
7012 diff --git a/include/litmus/srp.h b/include/litmus/srp.h
7013 new file mode 100644
7014 index 0000000..c9a4552
7015 --- /dev/null
7016 +++ b/include/litmus/srp.h
7017 @@ -0,0 +1,28 @@
7018 +#ifndef LITMUS_SRP_H
7019 +#define LITMUS_SRP_H
7020 +
7021 +struct srp_semaphore;
7022 +
7023 +struct srp_priority {
7024 +	struct list_head	list;
7025 +        unsigned int 		priority;
7026 +	pid_t			pid;
7027 +};
7028 +#define list2prio(l) list_entry(l, struct srp_priority, list)
7029 +
7030 +/* struct for uniprocessor SRP "semaphore" */
7031 +struct srp_semaphore {
7032 +	struct litmus_lock litmus_lock;
7033 +	struct srp_priority ceiling;
7034 +	struct task_struct* owner;
7035 +	int cpu; /* cpu associated with this "semaphore" and resource */
7036 +};
7037 +
7038 +/* map a task to its SRP preemption level priority */
7039 +typedef unsigned int (*srp_prioritization_t)(struct task_struct* t);
7040 +/* Must be updated by each plugin that uses SRP.*/
7041 +extern srp_prioritization_t get_srp_prio;
7042 +
7043 +struct srp_semaphore* allocate_srp_semaphore(void);
7044 +
7045 +#endif
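get_srp_prio is the hook through which an SRP-using plugin supplies its notion of preemption levels; the header does not fix the numeric convention, so the mapping below is purely illustrative of the wiring, not of any particular plugin's policy:

	/* Sketch: a plugin exporting its preemption-level mapping (illustrative). */
	static unsigned int my_srp_prio(struct task_struct *t)
	{
		return (unsigned int) t->rt_param.task_params.period;
	}

	static long my_plugin_activate(void)
	{
		get_srp_prio = my_srp_prio;	/* must be set before SRP semaphores are used */
		return 0;
	}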
7046 diff --git a/include/litmus/trace.h b/include/litmus/trace.h
7047 new file mode 100644
7048 index 0000000..e078aee
7049 --- /dev/null
7050 +++ b/include/litmus/trace.h
7051 @@ -0,0 +1,148 @@
7052 +#ifndef _SYS_TRACE_H_
7053 +#define	_SYS_TRACE_H_
7054 +
7055 +#ifdef CONFIG_SCHED_OVERHEAD_TRACE
7056 +
7057 +#include <litmus/feather_trace.h>
7058 +#include <litmus/feather_buffer.h>
7059 +
7060 +
7061 +/*********************** TIMESTAMPS ************************/
7062 +
7063 +enum task_type_marker {
7064 +	TSK_BE,
7065 +	TSK_RT,
7066 +	TSK_UNKNOWN
7067 +};
7068 +
7069 +struct timestamp {
7070 +	uint64_t		timestamp;
7071 +	uint32_t		seq_no;
7072 +	uint8_t			cpu;
7073 +	uint8_t			event;
7074 +	uint8_t			task_type:2;
7075 +	uint8_t			irq_flag:1;
7076 +	uint8_t			irq_count:5;
7077 +};
7078 +
7079 +/* tracing callbacks */
7080 +feather_callback void save_timestamp(unsigned long event);
7081 +feather_callback void save_timestamp_def(unsigned long event, unsigned long type);
7082 +feather_callback void save_timestamp_task(unsigned long event, unsigned long t_ptr);
7083 +feather_callback void save_timestamp_cpu(unsigned long event, unsigned long cpu);
7084 +feather_callback void save_task_latency(unsigned long event, unsigned long when_ptr);
7085 +
7086 +#define TIMESTAMP(id) ft_event0(id, save_timestamp)
7087 +
7088 +#define DTIMESTAMP(id, def)  ft_event1(id, save_timestamp_def, (unsigned long) def)
7089 +
7090 +#define TTIMESTAMP(id, task) \
7091 +	ft_event1(id, save_timestamp_task, (unsigned long) task)
7092 +
7093 +#define CTIMESTAMP(id, cpu) \
7094 +	ft_event1(id, save_timestamp_cpu, (unsigned long) cpu)
7095 +
7096 +#define LTIMESTAMP(id, task) \
7097 +	ft_event1(id, save_task_latency, (unsigned long) task)
7098 +
7099 +#else /* !CONFIG_SCHED_OVERHEAD_TRACE */
7100 +
7101 +#define TIMESTAMP(id)        /* no tracing */
7102 +
7103 +#define DTIMESTAMP(id, def)  /* no tracing */
7104 +
7105 +#define TTIMESTAMP(id, task) /* no tracing */
7106 +
7107 +#define CTIMESTAMP(id, cpu)  /* no tracing */
7108 +
7109 +#define LTIMESTAMP(id, when_ptr) /* no tracing */
7110 +
7111 +#endif
7112 +
7113 +
7114 +/* Convention for timestamps
7115 + * =========================
7116 + *
7117 + * In order to process the trace files with a common tool, we use the following
7118 + * convention to measure execution times: the event id that marks the end of a
7119 + * code segment is always the start event id plus one.
7120 + */
7121 +
7122 +
7123 +
7124 +#define TS_SCHED_START			DTIMESTAMP(100, TSK_UNKNOWN) /* we only
7125 +								      * care
7126 +								      * about
7127 +								      * next */
7128 +#define TS_SCHED_END(t)			TTIMESTAMP(101, t)
7129 +#define TS_SCHED2_START(t) 		TTIMESTAMP(102, t)
7130 +#define TS_SCHED2_END(t)       		TTIMESTAMP(103, t)
7131 +
7132 +#define TS_CXS_START(t)			TTIMESTAMP(104, t)
7133 +#define TS_CXS_END(t)			TTIMESTAMP(105, t)
7134 +
7135 +#define TS_RELEASE_START		DTIMESTAMP(106, TSK_RT)
7136 +#define TS_RELEASE_END			DTIMESTAMP(107, TSK_RT)
7137 +
7138 +#define TS_TICK_START(t)		TTIMESTAMP(110, t)
7139 +#define TS_TICK_END(t) 			TTIMESTAMP(111, t)
7140 +
7141 +
7142 +#define TS_PLUGIN_SCHED_START		/* TIMESTAMP(120) */  /* currently unused */
7143 +#define TS_PLUGIN_SCHED_END		/* TIMESTAMP(121) */
7144 +
7145 +#define TS_PLUGIN_TICK_START		/* TIMESTAMP(130) */
7146 +#define TS_PLUGIN_TICK_END		/* TIMESTAMP(131) */
7147 +
7148 +#define TS_ENTER_NP_START		TIMESTAMP(140)
7149 +#define TS_ENTER_NP_END			TIMESTAMP(141)
7150 +
7151 +#define TS_EXIT_NP_START		TIMESTAMP(150)
7152 +#define TS_EXIT_NP_END			TIMESTAMP(151)
7153 +
7154 +#define TS_LOCK_START			TIMESTAMP(170)
7155 +#define TS_LOCK_SUSPEND			TIMESTAMP(171)
7156 +#define TS_LOCK_RESUME			TIMESTAMP(172)
7157 +#define TS_LOCK_END				TIMESTAMP(173)
7158 +
7159 +#ifdef CONFIG_LITMUS_DGL_SUPPORT
7160 +#define TS_DGL_LOCK_START			TIMESTAMP(175)
7161 +#define TS_DGL_LOCK_SUSPEND			TIMESTAMP(176)
7162 +#define TS_DGL_LOCK_RESUME			TIMESTAMP(177)
7163 +#define TS_DGL_LOCK_END				TIMESTAMP(178)
7164 +#endif
7165 +
7166 +#define TS_UNLOCK_START			TIMESTAMP(180)
7167 +#define TS_UNLOCK_END			TIMESTAMP(181)
7168 +
7169 +#ifdef CONFIG_LITMUS_DGL_SUPPORT
7170 +#define TS_DGL_UNLOCK_START			TIMESTAMP(185)
7171 +#define TS_DGL_UNLOCK_END			TIMESTAMP(186)
7172 +#endif
7173 +
7174 +#define TS_SEND_RESCHED_START(c)	CTIMESTAMP(190, c)
7175 +#define TS_SEND_RESCHED_END		DTIMESTAMP(191, TSK_UNKNOWN)
7176 +
7177 +#define TS_RELEASE_LATENCY(when)	LTIMESTAMP(208, &(when))
7178 +
7179 +
7180 +#ifdef CONFIG_LITMUS_NVIDIA
7181 +
7182 +#define TS_NV_TOPISR_START		TIMESTAMP(200)
7183 +#define TS_NV_TOPISR_END		TIMESTAMP(201)
7184 +
7185 +#define TS_NV_BOTISR_START		TIMESTAMP(202)
7186 +#define TS_NV_BOTISR_END		TIMESTAMP(203)
7187 +
7188 +#define TS_NV_RELEASE_BOTISR_START	TIMESTAMP(204)
7189 +#define TS_NV_RELEASE_BOTISR_END	TIMESTAMP(205)
7190 +
7191 +#endif
7192 +
7193 +#ifdef CONFIG_LITMUS_PAI_SOFTIRQD
7194 +#define TS_NV_SCHED_BOTISR_START	TIMESTAMP(206)
7195 +#define TS_NV_SCHED_BOTISR_END		TIMESTAMP(207)
7196 +#endif
7197 +
7198 +
7199 +#endif /* !_SYS_TRACE_H_ */
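Overhead measurements are emitted as start/end pairs and matched up by event id during post-processing (for most of the pairs above the end id is the start id plus one). Adding a new measured region therefore amounts to picking an unused id pair and wrapping the region; the ids and names below are hypothetical and would have to be chosen to avoid collisions with the table above:

	#define TS_MYOP_START	TIMESTAMP(210)	/* hypothetical new pair */
	#define TS_MYOP_END	TIMESTAMP(211)

	static void my_operation(void)
	{
		TS_MYOP_START;
		/* ... code whose overhead is being measured ... */
		TS_MYOP_END;
	}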
7200 diff --git a/include/litmus/trace_irq.h b/include/litmus/trace_irq.h
7201 new file mode 100644
7202 index 0000000..f18b127
7203 --- /dev/null
7204 +++ b/include/litmus/trace_irq.h
7205 @@ -0,0 +1,21 @@
7206 +#ifndef _LITMUS_TRACE_IRQ_H_
7207 +#define	_LITMUS_TRACE_IRQ_H_
7208 +
7209 +#ifdef CONFIG_SCHED_OVERHEAD_TRACE
7210 +
7211 +extern DEFINE_PER_CPU(atomic_t, irq_fired_count);
7212 +
7213 +static inline void ft_irq_fired(void)
7214 +{
7215 +	/* Only called with preemptions disabled.  */
7216 +	atomic_inc(&__get_cpu_var(irq_fired_count));
7217 +}
7218 +
7219 +
7220 +#else
7221 +
7222 +#define ft_irq_fired() /* nothing to do */
7223 +
7224 +#endif
7225 +
7226 +#endif
7227 diff --git a/include/litmus/unistd_32.h b/include/litmus/unistd_32.h
7228 new file mode 100644
7229 index 0000000..4fa514c
7230 --- /dev/null
7231 +++ b/include/litmus/unistd_32.h
7232 @@ -0,0 +1,24 @@
7233 +/*
7234 + * included from arch/x86/include/asm/unistd_32.h
7235 + *
7236 + * LITMUS^RT syscalls with "relative" numbers
7237 + */
7238 +#define __LSC(x) (__NR_LITMUS + x)
7239 +
7240 +#define __NR_set_rt_task_param	__LSC(0)
7241 +#define __NR_get_rt_task_param	__LSC(1)
7242 +#define __NR_complete_job	__LSC(2)
7243 +#define __NR_od_open		__LSC(3)
7244 +#define __NR_od_close		__LSC(4)
7245 +#define __NR_litmus_lock       	__LSC(5)
7246 +#define __NR_litmus_unlock	__LSC(6)
7247 +#define __NR_query_job_no	__LSC(7)
7248 +#define __NR_wait_for_job_release __LSC(8)
7249 +#define __NR_wait_for_ts_release __LSC(9)
7250 +#define __NR_release_ts		__LSC(10)
7251 +#define __NR_null_call		__LSC(11)
7252 +#define __NR_litmus_dgl_lock	__LSC(12)
7253 +#define __NR_litmus_dgl_unlock	__LSC(13)
7254 +#define __NR_register_nv_device			__LSC(14)
7255 +
7256 +#define NR_litmus_syscalls 15
7257 diff --git a/include/litmus/unistd_64.h b/include/litmus/unistd_64.h
7258 new file mode 100644
7259 index 0000000..f80dc45
7260 --- /dev/null
7261 +++ b/include/litmus/unistd_64.h
7262 @@ -0,0 +1,40 @@
7263 +/*
7264 + * included from arch/x86/include/asm/unistd_64.h
7265 + *
7266 + * LITMUS^RT syscalls with "relative" numbers
7267 + */
7268 +#define __LSC(x) (__NR_LITMUS + x)
7269 +
7270 +#define __NR_set_rt_task_param			__LSC(0)
7271 +__SYSCALL(__NR_set_rt_task_param, sys_set_rt_task_param)
7272 +#define __NR_get_rt_task_param			__LSC(1)
7273 +__SYSCALL(__NR_get_rt_task_param, sys_get_rt_task_param)
7274 +#define __NR_complete_job	  		__LSC(2)
7275 +__SYSCALL(__NR_complete_job, sys_complete_job)
7276 +#define __NR_od_open				__LSC(3)
7277 +__SYSCALL(__NR_od_open, sys_od_open)
7278 +#define __NR_od_close				__LSC(4)
7279 +__SYSCALL(__NR_od_close, sys_od_close)
7280 +#define __NR_litmus_lock	       		__LSC(5)
7281 +__SYSCALL(__NR_litmus_lock, sys_litmus_lock)
7282 +#define __NR_litmus_unlock	       		__LSC(6)
7283 +__SYSCALL(__NR_litmus_unlock, sys_litmus_unlock)
7284 +#define __NR_query_job_no			__LSC(7)
7285 +__SYSCALL(__NR_query_job_no, sys_query_job_no)
7286 +#define __NR_wait_for_job_release		__LSC(8)
7287 +__SYSCALL(__NR_wait_for_job_release, sys_wait_for_job_release)
7288 +#define __NR_wait_for_ts_release		__LSC(9)
7289 +__SYSCALL(__NR_wait_for_ts_release, sys_wait_for_ts_release)
7290 +#define __NR_release_ts				__LSC(10)
7291 +__SYSCALL(__NR_release_ts, sys_release_ts)
7292 +#define __NR_null_call				__LSC(11)
7293 +__SYSCALL(__NR_null_call, sys_null_call)
7294 +#define __NR_litmus_dgl_lock		__LSC(12)
7295 +__SYSCALL(__NR_litmus_dgl_lock, sys_litmus_dgl_lock)
7296 +#define __NR_litmus_dgl_unlock		__LSC(13)
7297 +__SYSCALL(__NR_litmus_dgl_unlock, sys_litmus_dgl_unlock)
7298 +#define __NR_register_nv_device			__LSC(14)
7299 +__SYSCALL(__NR_register_nv_device, sys_register_nv_device)
7300 +
7301 +
7302 +#define NR_litmus_syscalls 15
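Both tables assign the LITMUS^RT system calls numbers relative to __NR_LITMUS, which the architecture's unistd.h defines elsewhere in this patch. User code normally goes through liblitmus, but the raw invocation is just syscall(2) with the relative offset added; a sketch, assuming __NR_LITMUS is visible through the installed kernel headers:

	#include <unistd.h>
	#include <sys/syscall.h>

	/* complete_job is __NR_LITMUS + 2 in both tables above */
	static inline long litmus_complete_job(void)
	{
		return syscall(__NR_LITMUS + 2);
	}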
7303 diff --git a/kernel/exit.c b/kernel/exit.c
7304 index f2b321b..64879bd 100644
7305 --- a/kernel/exit.c
7306 +++ b/kernel/exit.c
7307 @@ -57,6 +57,8 @@
7308  #include <asm/pgtable.h>
7309  #include <asm/mmu_context.h>
7310  
7311 +extern void exit_od_table(struct task_struct *t);
7312 +
7313  static void exit_mm(struct task_struct * tsk);
7314  
7315  static void __unhash_process(struct task_struct *p, bool group_dead)
7316 @@ -980,6 +982,8 @@ NORET_TYPE void do_exit(long code)
7317  	if (unlikely(tsk->audit_context))
7318  		audit_free(tsk);
7319  
7320 +	exit_od_table(tsk);
7321 +
7322  	tsk->exit_code = code;
7323  	taskstats_exit(tsk, group_dead);
7324  
7325 diff --git a/kernel/fork.c b/kernel/fork.c
7326 index 0276c30..25c6111 100644
7327 --- a/kernel/fork.c
7328 +++ b/kernel/fork.c
7329 @@ -77,6 +77,9 @@
7330  
7331  #include <trace/events/sched.h>
7332  
7333 +#include <litmus/litmus.h>
7334 +#include <litmus/sched_plugin.h>
7335 +
7336  /*
7337   * Protected counters by write_lock_irq(&tasklist_lock)
7338   */
7339 @@ -191,6 +194,7 @@ void __put_task_struct(struct task_struct *tsk)
7340  	WARN_ON(atomic_read(&tsk->usage));
7341  	WARN_ON(tsk == current);
7342  
7343 +	exit_litmus(tsk);
7344  	exit_creds(tsk);
7345  	delayacct_tsk_free(tsk);
7346  	put_signal_struct(tsk->signal);
7347 @@ -275,6 +279,9 @@ static struct task_struct *dup_task_struct(struct task_struct *orig)
7348  
7349  	tsk->stack = ti;
7350  
7351 +	/* Don't let the new task be a real-time task. */
7352 +	litmus_fork(tsk);
7353 +
7354  	err = prop_local_init_single(&tsk->dirties);
7355  	if (err)
7356  		goto out;
7357 diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
7358 index a9205e3..11e8969 100644
7359 --- a/kernel/hrtimer.c
7360 +++ b/kernel/hrtimer.c
7361 @@ -46,6 +46,8 @@
7362  #include <linux/sched.h>
7363  #include <linux/timer.h>
7364  
7365 +#include <litmus/litmus.h>
7366 +
7367  #include <asm/uaccess.h>
7368  
7369  #include <trace/events/timer.h>
7370 @@ -1026,6 +1028,98 @@ hrtimer_start(struct hrtimer *timer, ktime_t tim, const enum hrtimer_mode mode)
7371  }
7372  EXPORT_SYMBOL_GPL(hrtimer_start);
7373  
7374 +#ifdef CONFIG_ARCH_HAS_SEND_PULL_TIMERS
7375 +
7376 +/**
7377 + * hrtimer_start_on_info_init - Initialize hrtimer_start_on_info
7378 + */
7379 +void hrtimer_start_on_info_init(struct hrtimer_start_on_info *info)
7380 +{
7381 +	memset(info, 0, sizeof(struct hrtimer_start_on_info));
7382 +	atomic_set(&info->state, HRTIMER_START_ON_INACTIVE);
7383 +}
7384 +
7385 +/**
7386 + *  hrtimer_pull - PULL_TIMERS_VECTOR callback on remote cpu
7387 + */
7388 +void hrtimer_pull(void)
7389 +{
7390 +	struct hrtimer_cpu_base *base = &__get_cpu_var(hrtimer_bases);
7391 +	struct hrtimer_start_on_info *info;
7392 +	struct list_head *pos, *safe, list;
7393 +
7394 +	raw_spin_lock(&base->lock);
7395 +	list_replace_init(&base->to_pull, &list);
7396 +	raw_spin_unlock(&base->lock);
7397 +
7398 +	list_for_each_safe(pos, safe, &list) {
7399 +		info = list_entry(pos, struct hrtimer_start_on_info, list);
7400 +		TRACE("pulled timer %p\n", info->timer);
7401 +		list_del(pos);
7402 +		hrtimer_start(info->timer, info->time, info->mode);
7403 +	}
7404 +}
7405 +
7406 +/**
7407 + *  hrtimer_start_on - trigger timer arming on remote cpu
7408 + *  @cpu:	remote cpu
7409 + *  @info:	save timer information for enqueuing on remote cpu
7410 + *  @timer:	timer to be pulled
7411 + *  @time:	expire time
7412 + *  @mode:	timer mode
7413 + */
7414 +int hrtimer_start_on(int cpu, struct hrtimer_start_on_info* info,
7415 +		struct hrtimer *timer, ktime_t time,
7416 +		const enum hrtimer_mode mode)
7417 +{
7418 +	unsigned long flags;
7419 +	struct hrtimer_cpu_base* base;
7420 +	int in_use = 0, was_empty;
7421 +
7422 +	/* serialize access to info through the timer base */
7423 +	lock_hrtimer_base(timer, &flags);
7424 +
7425 +	in_use = (atomic_read(&info->state) != HRTIMER_START_ON_INACTIVE);
7426 +	if (!in_use) {
7427 +		INIT_LIST_HEAD(&info->list);
7428 +		info->timer = timer;
7429 +		info->time  = time;
7430 +		info->mode  = mode;
7431 +		/* mark as in use */
7432 +		atomic_set(&info->state, HRTIMER_START_ON_QUEUED);
7433 +	}
7434 +
7435 +	unlock_hrtimer_base(timer, &flags);
7436 +
7437 +	if (!in_use) {
7438 +		/* initiate pull  */
7439 +		preempt_disable();
7440 +		if (cpu == smp_processor_id()) {
7441 +			/* start timer locally; we may get called
7442 +			 * with rq->lock held, do not wake up anything
7443 +			 */
7444 +			TRACE("hrtimer_start_on: starting on local CPU\n");
7445 +			__hrtimer_start_range_ns(info->timer, info->time,
7446 +						 0, info->mode, 0);
7447 +		} else {
7448 +			TRACE("hrtimer_start_on: pulling to remote CPU\n");
7449 +			base = &per_cpu(hrtimer_bases, cpu);
7450 +			raw_spin_lock_irqsave(&base->lock, flags);
7451 +			was_empty = list_empty(&base->to_pull);
7452 +			list_add(&info->list, &base->to_pull);
7453 +			raw_spin_unlock_irqrestore(&base->lock, flags);
7454 +			if (was_empty)
7455 +				/* only send IPI if other no else
7456 +				/* only send IPI if no one else
7457 +				 */
7458 +				smp_send_pull_timers(cpu);
7459 +		}
7460 +		preempt_enable();
7461 +	}
7462 +	return in_use;
7463 +}
7464 +
7465 +#endif
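hrtimer_start_on() is how LITMUS^RT arms a timer on a chosen CPU: a local request is started directly, while a remote one is queued on the target CPU's to_pull list and handed over via a pull-timers IPI to hrtimer_pull(). A sketch of the calling side, assuming the caller keeps one hrtimer_start_on_info per outstanding request and initialized it once with hrtimer_start_on_info_init() at setup time:

	/* Sketch: arm 'timer' on CPU 'cpu' at absolute time 'when' (nanoseconds). */
	static int arm_timer_on(int cpu, struct hrtimer *timer,
				struct hrtimer_start_on_info *info, lt_t when)
	{
		/* A nonzero return means 'info' still belongs to an earlier,
		 * not-yet-pulled request and nothing was armed. */
		return hrtimer_start_on(cpu, info, timer, ns_to_ktime(when),
					HRTIMER_MODE_ABS);
	}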
7466  
7467  /**
7468   * hrtimer_try_to_cancel - try to deactivate a timer
7469 @@ -1625,6 +1719,7 @@ static void __cpuinit init_hrtimers_cpu(int cpu)
7470  	}
7471  
7472  	hrtimer_init_hres(cpu_base);
7473 +	INIT_LIST_HEAD(&cpu_base->to_pull);
7474  }
7475  
7476  #ifdef CONFIG_HOTPLUG_CPU
7477 diff --git a/kernel/lockdep.c b/kernel/lockdep.c
7478 index 298c927..2bdcdc3 100644
7479 --- a/kernel/lockdep.c
7480 +++ b/kernel/lockdep.c
7481 @@ -542,7 +542,7 @@ static void print_lock(struct held_lock *hlock)
7482  	print_ip_sym(hlock->acquire_ip);
7483  }
7484  
7485 -static void lockdep_print_held_locks(struct task_struct *curr)
7486 +void lockdep_print_held_locks(struct task_struct *curr)
7487  {
7488  	int i, depth = curr->lockdep_depth;
7489  
7490 @@ -558,6 +558,7 @@ static void lockdep_print_held_locks(struct task_struct *curr)
7491  		print_lock(curr->held_locks + i);
7492  	}
7493  }
7494 +EXPORT_SYMBOL(lockdep_print_held_locks);
7495  
7496  static void print_kernel_version(void)
7497  {
7498 @@ -583,6 +584,10 @@ static int static_obj(void *obj)
7499  		      end   = (unsigned long) &_end,
7500  		      addr  = (unsigned long) obj;
7501  
7502 +	// GLENN
7503 +	return 1;
7504 +
7505 +
7506  	/*
7507  	 * static variable?
7508  	 */
7509 diff --git a/kernel/mutex.c b/kernel/mutex.c
7510 index d607ed5..96bcecd 100644
7511 --- a/kernel/mutex.c
7512 +++ b/kernel/mutex.c
7513 @@ -498,3 +498,128 @@ int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock)
7514  	return 1;
7515  }
7516  EXPORT_SYMBOL(atomic_dec_and_mutex_lock);
7517 +
7518 +
7519 +
7520 +
7521 +void mutex_lock_sfx(struct mutex *lock,
7522 +				   side_effect_t pre, unsigned long pre_arg,
7523 +				   side_effect_t post, unsigned long post_arg)
7524 +{
7525 +	long state = TASK_UNINTERRUPTIBLE;
7526 +
7527 +	struct task_struct *task = current;
7528 +	struct mutex_waiter waiter;
7529 +	unsigned long flags;
7530 +
7531 +	preempt_disable();
7532 +	mutex_acquire(&lock->dep_map, 0, 0, _RET_IP_);
7533 +
7534 +	spin_lock_mutex(&lock->wait_lock, flags);
7535 +
7536 +	if(pre)
7537 +	{
7538 +		if(unlikely(pre(pre_arg)))
7539 +		{
7540 +			// this will confuse lockdep's CONFIG_PROVE_LOCKING...
7541 +			spin_unlock_mutex(&lock->wait_lock, flags);
7542 +			preempt_enable();
7543 +			return;
7544 +		}
7545 +	}
7546 +
7547 +	debug_mutex_lock_common(lock, &waiter);
7548 +	debug_mutex_add_waiter(lock, &waiter, task_thread_info(task));
7549 +
7550 +	/* add waiting tasks to the end of the waitqueue (FIFO): */
7551 +	list_add_tail(&waiter.list, &lock->wait_list);
7552 +	waiter.task = task;
7553 +
7554 +	if (atomic_xchg(&lock->count, -1) == 1)
7555 +		goto done;
7556 +
7557 +	lock_contended(&lock->dep_map, _RET_IP_);
7558 +
7559 +	for (;;) {
7560 +		/*
7561 +		 * Let's try to take the lock again - this is needed even if
7562 +		 * we get here for the first time (shortly after failing to
7563 +		 * acquire the lock), to make sure that we get a wakeup once
7564 +		 * it's unlocked. Later on, if we sleep, this is the
7565 +		 * operation that gives us the lock. We xchg it to -1, so
7566 +		 * that when we release the lock, we properly wake up the
7567 +		 * other waiters:
7568 +		 */
7569 +		if (atomic_xchg(&lock->count, -1) == 1)
7570 +			break;
7571 +
7572 +		__set_task_state(task, state);
7573 +
7574 +		/* didn't get the lock, go to sleep: */
7575 +		spin_unlock_mutex(&lock->wait_lock, flags);
7576 +		preempt_enable_no_resched();
7577 +		schedule();
7578 +		preempt_disable();
7579 +		spin_lock_mutex(&lock->wait_lock, flags);
7580 +	}
7581 +
7582 +done:
7583 +	lock_acquired(&lock->dep_map, _RET_IP_);
7584 +	/* got the lock - rejoice! */
7585 +	mutex_remove_waiter(lock, &waiter, current_thread_info());
7586 +	mutex_set_owner(lock);
7587 +
7588 +	/* set it to 0 if there are no waiters left: */
7589 +	if (likely(list_empty(&lock->wait_list)))
7590 +		atomic_set(&lock->count, 0);
7591 +
7592 +	if(post)
7593 +		post(post_arg);
7594 +
7595 +	spin_unlock_mutex(&lock->wait_lock, flags);
7596 +
7597 +	debug_mutex_free_waiter(&waiter);
7598 +	preempt_enable();
7599 +}
7600 +EXPORT_SYMBOL(mutex_lock_sfx);
7601 +
7602 +void mutex_unlock_sfx(struct mutex *lock,
7603 +					side_effect_t pre, unsigned long pre_arg,
7604 +					side_effect_t post, unsigned long post_arg)
7605 +{
7606 +	unsigned long flags;
7607 +
7608 +	spin_lock_mutex(&lock->wait_lock, flags);
7609 +
7610 +	if(pre)
7611 +		pre(pre_arg);
7612 +
7613 +	//mutex_release(&lock->dep_map, nested, _RET_IP_);
7614 +	mutex_release(&lock->dep_map, 1, _RET_IP_);
7615 +	debug_mutex_unlock(lock);
7616 +
7617 +	/*
7618 +	 * some architectures leave the lock unlocked in the fastpath failure
7619 +	 * case, others need to leave it locked. In the latter case we have to
7620 +	 * unlock it here
7621 +	 */
7622 +	if (__mutex_slowpath_needs_to_unlock())
7623 +		atomic_set(&lock->count, 1);
7624 +
7625 +	if (!list_empty(&lock->wait_list)) {
7626 +		/* get the first entry from the wait-list: */
7627 +		struct mutex_waiter *waiter =
7628 +		list_entry(lock->wait_list.next,
7629 +				   struct mutex_waiter, list);
7630 +
7631 +		debug_mutex_wake_waiter(lock, waiter);
7632 +
7633 +		wake_up_process(waiter->task);
7634 +	}
7635 +
7636 +	if(post)
7637 +		post(post_arg);
7638 +
7639 +	spin_unlock_mutex(&lock->wait_lock, flags);
7640 +}
7641 +EXPORT_SYMBOL(mutex_unlock_sfx);
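mutex_lock_sfx()/mutex_unlock_sfx() let the caller run small side-effect callbacks while lock->wait_lock is held: on the lock path, pre runs before the task enqueues itself (a nonzero return aborts the attempt) and post runs once the lock is held; on the unlock path, pre runs before the count is reset and post after the next waiter has been woken. The side_effect_t typedef comes from the include/linux/mutex.h hunk of this patch, which is not shown here; the sketch below assumes it is an int-returning function taking an unsigned long:

	/* Sketch, assuming: typedef int (*side_effect_t)(unsigned long); */
	static int record_owner(unsigned long arg)
	{
		struct task_struct **slot = (struct task_struct **) arg;
		*slot = current;	/* runs under lock->wait_lock */
		return 0;		/* 0 = proceed with the lock operation */
	}

	static struct task_struct *observed_owner;

	static void example_acquire(struct mutex *m)
	{
		mutex_lock_sfx(m, NULL, 0,
			       record_owner, (unsigned long) &observed_owner);
	}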
7642 diff --git a/kernel/printk.c b/kernel/printk.c
7643 index 3518539..b799a2e 100644
7644 --- a/kernel/printk.c
7645 +++ b/kernel/printk.c
7646 @@ -70,6 +70,13 @@ int console_printk[4] = {
7647  };
7648  
7649  /*
7650 + * divert printk() messages when there is a LITMUS^RT debug listener
7651 + */
7652 +#include <litmus/litmus.h>
7653 +int trace_override = 0;
7654 +int trace_recurse  = 0;
7655 +
7656 +/*
7657   * Low level drivers may need that to know if they can schedule in
7658   * their unblank() callback or not. So let's export it.
7659   */
7660 @@ -871,6 +878,9 @@ asmlinkage int vprintk(const char *fmt, va_list args)
7661  	/* Emit the output into the temporary buffer */
7662  	printed_len += vscnprintf(printk_buf + printed_len,
7663  				  sizeof(printk_buf) - printed_len, fmt, args);
7664 +	/* if LITMUS^RT tracer is active divert printk() msgs */
7665 +	if (trace_override && !trace_recurse)
7666 +		TRACE("%s", printk_buf);
7667  
7668  	p = printk_buf;
7669  
7670 @@ -947,7 +957,7 @@ asmlinkage int vprintk(const char *fmt, va_list args)
7671  	 * Try to acquire and then immediately release the
7672  	 * console semaphore. The release will do all the
7673  	 * actual magic (print out buffers, wake up klogd,
7674 -	 * etc). 
7675 +	 * etc).
7676  	 *
7677  	 * The console_trylock_for_printk() function
7678  	 * will release 'logbuf_lock' regardless of whether it
7679 @@ -1220,7 +1230,7 @@ int printk_needs_cpu(int cpu)
7680  
7681  void wake_up_klogd(void)
7682  {
7683 -	if (waitqueue_active(&log_wait))
7684 +	if (!trace_override && waitqueue_active(&log_wait))
7685  		this_cpu_write(printk_pending, 1);
7686  }
7687  
7688 diff --git a/kernel/sched.c b/kernel/sched.c
7689 index fde6ff9..2f990b4 100644
7690 --- a/kernel/sched.c
7691 +++ b/kernel/sched.c
7692 @@ -80,6 +80,15 @@
7693  #include "workqueue_sched.h"
7694  #include "sched_autogroup.h"
7695  
7696 +#include <litmus/sched_trace.h>
7697 +#include <litmus/trace.h>
7698 +
7699 +#ifdef CONFIG_LITMUS_SOFTIRQD
7700 +#include <litmus/litmus_softirq.h>
7701 +#endif
7702 +
7703 +static void litmus_tick(struct rq*, struct task_struct*);
7704 +
7705  #define CREATE_TRACE_POINTS
7706  #include <trace/events/sched.h>
7707  
7708 @@ -410,6 +419,12 @@ struct rt_rq {
7709  #endif
7710  };
7711  
7712 +/* Litmus related fields in a runqueue */
7713 +struct litmus_rq {
7714 +	unsigned long nr_running;
7715 +	struct task_struct *prev;
7716 +};
7717 +
7718  #ifdef CONFIG_SMP
7719  
7720  /*
7721 @@ -475,6 +490,7 @@ struct rq {
7722  
7723  	struct cfs_rq cfs;
7724  	struct rt_rq rt;
7725 +	struct litmus_rq litmus;
7726  
7727  #ifdef CONFIG_FAIR_GROUP_SCHED
7728  	/* list of leaf cfs_rq on this cpu: */
7729 @@ -1045,6 +1061,7 @@ static enum hrtimer_restart hrtick(struct hrtimer *timer)
7730  	raw_spin_lock(&rq->lock);
7731  	update_rq_clock(rq);
7732  	rq->curr->sched_class->task_tick(rq, rq->curr, 1);
7733 +	litmus_tick(rq, rq->curr);
7734  	raw_spin_unlock(&rq->lock);
7735  
7736  	return HRTIMER_NORESTART;
7737 @@ -1773,7 +1790,7 @@ static inline void __set_task_cpu(struct task_struct *p, unsigned int cpu)
7738  
7739  static const struct sched_class rt_sched_class;
7740  
7741 -#define sched_class_highest (&stop_sched_class)
7742 +#define sched_class_highest (&litmus_sched_class)
7743  #define for_each_class(class) \
7744     for (class = sched_class_highest; class; class = class->next)
7745  
7746 @@ -2031,6 +2048,7 @@ static void update_rq_clock_task(struct rq *rq, s64 delta)
7747  #include "sched_rt.c"
7748  #include "sched_autogroup.c"
7749  #include "sched_stoptask.c"
7750 +#include "../litmus/sched_litmus.c"
7751  #ifdef CONFIG_SCHED_DEBUG
7752  # include "sched_debug.c"
7753  #endif
7754 @@ -2153,6 +2171,10 @@ static void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
7755  	 * A queue event has occurred, and we're going to schedule.  In
7756  	 * this case, we can save a useless back to back clock update.
7757  	 */
7758 +	/* LITMUS^RT:
7759 +	 * The "disable-clock-update" approach was buggy in Linux 2.6.36.
7760 +	 * The issue has been solved in 2.6.37.
7761 +	 */
7762  	if (rq->curr->on_rq && test_tsk_need_resched(rq->curr))
7763  		rq->skip_clock_update = 1;
7764  }
7765 @@ -2643,7 +2665,12 @@ static void ttwu_queue(struct task_struct *p, int cpu)
7766  	struct rq *rq = cpu_rq(cpu);
7767  
7768  #if defined(CONFIG_SMP)
7769 -	if (sched_feat(TTWU_QUEUE) && cpu != smp_processor_id()) {
7770 +	/*
7771 +	 * LITMUS^RT: whether to send an IPI to the remote CPU
7772 +	 * is plugin specific.
7773 +	 */
7774 +	if (!is_realtime(p) &&
7775 +			sched_feat(TTWU_QUEUE) && cpu != smp_processor_id()) {
7776  		sched_clock_cpu(cpu); /* sync clocks x-cpu */
7777  		ttwu_queue_remote(p, cpu);
7778  		return;
7779 @@ -2676,6 +2703,9 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
7780  	unsigned long flags;
7781  	int cpu, success = 0;
7782  
7783 +	if (is_realtime(p))
7784 +		TRACE_TASK(p, "try_to_wake_up() state:%d\n", p->state);
7785 +
7786  	smp_wmb();
7787  	raw_spin_lock_irqsave(&p->pi_lock, flags);
7788  	if (!(p->state & state))
7789 @@ -2712,6 +2742,12 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
7790  	 */
7791  	smp_rmb();
7792  
7793 +	/* LITMUS^RT: once the task can be safely referenced by this
7794 +	 * CPU, don't mess up with Linux load balancing stuff.
7795 +	 */
7796 +	if (is_realtime(p))
7797 +		goto litmus_out_activate;
7798 +
7799  	p->sched_contributes_to_load = !!task_contributes_to_load(p);
7800  	p->state = TASK_WAKING;
7801  
7802 @@ -2723,12 +2759,16 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
7803  		wake_flags |= WF_MIGRATED;
7804  		set_task_cpu(p, cpu);
7805  	}
7806 +
7807 +litmus_out_activate:
7808  #endif /* CONFIG_SMP */
7809  
7810  	ttwu_queue(p, cpu);
7811  stat:
7812  	ttwu_stat(p, cpu, wake_flags);
7813  out:
7814 +	if (is_realtime(p))
7815 +		TRACE_TASK(p, "try_to_wake_up() done state:%d\n", p->state);
7816  	raw_spin_unlock_irqrestore(&p->pi_lock, flags);
7817  
7818  	return success;
7819 @@ -2839,7 +2879,8 @@ void sched_fork(struct task_struct *p)
7820  	 * Revert to default priority/policy on fork if requested.
7821  	 */
7822  	if (unlikely(p->sched_reset_on_fork)) {
7823 -		if (p->policy == SCHED_FIFO || p->policy == SCHED_RR) {
7824 +		if (p->policy == SCHED_FIFO || p->policy == SCHED_RR ||
7825 +		    p->policy == SCHED_LITMUS) {
7826  			p->policy = SCHED_NORMAL;
7827  			p->normal_prio = p->static_prio;
7828  		}
7829 @@ -3050,6 +3091,8 @@ static void finish_task_switch(struct rq *rq, struct task_struct *prev)
7830  	 */
7831  	prev_state = prev->state;
7832  	finish_arch_switch(prev);
7833 +	litmus->finish_switch(prev);
7834 +	prev->rt_param.stack_in_use = NO_CPU;
7835  #ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
7836  	local_irq_disable();
7837  #endif /* __ARCH_WANT_INTERRUPTS_ON_CTXSW */
7838 @@ -3079,6 +3122,15 @@ static inline void pre_schedule(struct rq *rq, struct task_struct *prev)
7839  {
7840  	if (prev->sched_class->pre_schedule)
7841  		prev->sched_class->pre_schedule(rq, prev);
7842 +
7843 +	/* LITMUS^RT: not a very clean hack: we need to save the prev task
7844 +	 * as our scheduling decisions rely on it (as we drop the rq lock,
7845 +	 * something in prev can change...); there is no way to escape
7846 +	 * this hack apart from modifying pick_next_task(rq, _prev_) or
7847 +	 * falling back on the previous solution of decoupling
7848 +	 * scheduling decisions.
7849 +	 */
7850 +	rq->litmus.prev = prev;
7851  }
7852  
7853  /* rq->lock is NOT held, but preemption is disabled */
7854 @@ -4094,18 +4146,26 @@ void scheduler_tick(void)
7855  
7856  	sched_clock_tick();
7857  
7858 +	TS_TICK_START(current);
7859 +
7860  	raw_spin_lock(&rq->lock);
7861  	update_rq_clock(rq);
7862  	update_cpu_load_active(rq);
7863  	curr->sched_class->task_tick(rq, curr, 0);
7864 +
7865 +	/* litmus_tick may force current to resched */
7866 +	litmus_tick(rq, curr);
7867 +
7868  	raw_spin_unlock(&rq->lock);
7869  
7870  	perf_event_task_tick();
7871  
7872  #ifdef CONFIG_SMP
7873  	rq->idle_at_tick = idle_cpu(cpu);
7874 -	trigger_load_balance(rq, cpu);
7875 +	if (!is_realtime(current))
7876 +		trigger_load_balance(rq, cpu);
7877  #endif
7878 +	TS_TICK_END(current);
7879  }
7880  
7881  notrace unsigned long get_parent_ip(unsigned long addr)
7882 @@ -4225,12 +4285,20 @@ pick_next_task(struct rq *rq)
7883  	/*
7884  	 * Optimization: we know that if all tasks are in
7885  	 * the fair class we can call that function directly:
7886 -	 */
7887 -	if (likely(rq->nr_running == rq->cfs.nr_running)) {
7888 +
7889 +	 * NOT IN LITMUS^RT!
7890 +
7891 +	 * This breaks many assumptions in the plugins.
7892 +	 * Do not uncomment without thinking long and hard
7893 +	 * about how this affects global plugins such as GSN-EDF.
7894 +
7895 +	if (rq->nr_running == rq->cfs.nr_running) {
7896 +		TRACE("taking shortcut in pick_next_task()\n");
7897  		p = fair_sched_class.pick_next_task(rq);
7898  		if (likely(p))
7899  			return p;
7900  	}
7901 +	*/
7902  
7903  	for_each_class(class) {
7904  		p = class->pick_next_task(rq);
7905 @@ -4241,6 +4309,7 @@ pick_next_task(struct rq *rq)
7906  	BUG(); /* the idle class will always have a runnable task */
7907  }
7908  
7909 +
7910  /*
7911   * schedule() is the main scheduler function.
7912   */
7913 @@ -4253,11 +4322,23 @@ asmlinkage void __sched schedule(void)
7914  
7915  need_resched:
7916  	preempt_disable();
7917 +	sched_state_entered_schedule();
7918  	cpu = smp_processor_id();
7919  	rq = cpu_rq(cpu);
7920  	rcu_note_context_switch(cpu);
7921  	prev = rq->curr;
7922  
7923 +#ifdef CONFIG_LITMUS_SOFTIRQD
7924 +	release_klitirqd_lock(prev);
7925 +#endif
7926 +
7927 +	/* LITMUS^RT: quickly re-evaluate the scheduling decision
7928 +	 * if the previous one is no longer valid after the context switch.
7929 +	 */
7930 +litmus_need_resched_nonpreemptible:
7931 +	TS_SCHED_START;
7932 +	sched_trace_task_switch_away(prev);
7933 +
7934  	schedule_debug(prev);
7935  
7936  	if (sched_feat(HRTICK))
7937 @@ -4314,7 +4395,10 @@ need_resched:
7938  		rq->curr = next;
7939  		++*switch_count;
7940  
7941 +		TS_SCHED_END(next);
7942 +		TS_CXS_START(next);
7943  		context_switch(rq, prev, next); /* unlocks the rq */
7944 +		TS_CXS_END(current);
7945  		/*
7946  		 * The context switch have flipped the stack from under us
7947  		 * and restored the local variables which were saved when
7948 @@ -4323,17 +4407,37 @@ need_resched:
7949  		 */
7950  		cpu = smp_processor_id();
7951  		rq = cpu_rq(cpu);
7952 -	} else
7953 +	} else {
7954 +		TS_SCHED_END(prev);
7955  		raw_spin_unlock_irq(&rq->lock);
7956 +	}
7957 +
7958 +	sched_trace_task_switch_to(current);
7959  
7960  	post_schedule(rq);
7961  
7962 +	if (sched_state_validate_switch())
7963 +		goto litmus_need_resched_nonpreemptible;
7964 +
7965  	preempt_enable_no_resched();
7966 +
7967  	if (need_resched())
7968  		goto need_resched;
7969 +
7970 +#ifdef CONFIG_LITMUS_SOFTIRQD
7971 +	reacquire_klitirqd_lock(prev);
7972 +#endif
7973 +
7974 +#ifdef CONFIG_LITMUS_PAI_SOFTIRQD
7975 +	litmus->run_tasklets(prev);
7976 +#endif
7977 +
7978 +	srp_ceiling_block();
7979  }
7980  EXPORT_SYMBOL(schedule);
7981  
7982 +
7983 +
7984  #ifdef CONFIG_MUTEX_SPIN_ON_OWNER
7985  
7986  static inline bool owner_running(struct mutex *lock, struct task_struct *owner)
7987 @@ -4477,6 +4581,7 @@ static void __wake_up_common(wait_queue_head_t *q, unsigned int mode,
7988  	}
7989  }
7990  
7991 +
7992  /**
7993   * __wake_up - wake up threads blocked on a waitqueue.
7994   * @q: the waitqueue
7995 @@ -4600,6 +4705,17 @@ void complete_all(struct completion *x)
7996  }
7997  EXPORT_SYMBOL(complete_all);
7998  
7999 +void complete_n(struct completion *x, int n)
8000 +{
8001 +	unsigned long flags;
8002 +
8003 +	spin_lock_irqsave(&x->wait.lock, flags);
8004 +	x->done += n;
8005 +	__wake_up_common(&x->wait, TASK_NORMAL, n, 0, NULL);
8006 +	spin_unlock_irqrestore(&x->wait.lock, flags);
8007 +}
8008 +EXPORT_SYMBOL(complete_n);
8009 +
8010  static inline long __sched
8011  do_wait_for_common(struct completion *x, long timeout, int state)
8012  {
8013 @@ -4652,6 +4768,12 @@ void __sched wait_for_completion(struct completion *x)
8014  }
8015  EXPORT_SYMBOL(wait_for_completion);
8016  
8017 +void __sched __wait_for_completion_locked(struct completion *x)
8018 +{
8019 +	do_wait_for_common(x, MAX_SCHEDULE_TIMEOUT, TASK_UNINTERRUPTIBLE);
8020 +}
8021 +EXPORT_SYMBOL(__wait_for_completion_locked);
8022 +
8023  /**
8024   * wait_for_completion_timeout: - waits for completion of a task (w/timeout)
8025   * @x:  holds the state of this particular completion
8026 @@ -5039,7 +5161,9 @@ __setscheduler(struct rq *rq, struct task_struct *p, int policy, int prio)
8027  	p->normal_prio = normal_prio(p);
8028  	/* we are holding p->pi_lock already */
8029  	p->prio = rt_mutex_getprio(p);
8030 -	if (rt_prio(p->prio))
8031 +	if (p->policy == SCHED_LITMUS)
8032 +		p->sched_class = &litmus_sched_class;
8033 +	else if (rt_prio(p->prio))
8034  		p->sched_class = &rt_sched_class;
8035  	else
8036  		p->sched_class = &fair_sched_class;
8037 @@ -5087,7 +5211,7 @@ recheck:
8038  
8039  		if (policy != SCHED_FIFO && policy != SCHED_RR &&
8040  				policy != SCHED_NORMAL && policy != SCHED_BATCH &&
8041 -				policy != SCHED_IDLE)
8042 +				policy != SCHED_IDLE && policy != SCHED_LITMUS)
8043  			return -EINVAL;
8044  	}
8045  
8046 @@ -5102,6 +5226,8 @@ recheck:
8047  		return -EINVAL;
8048  	if (rt_policy(policy) != (param->sched_priority != 0))
8049  		return -EINVAL;
8050 +	if (policy == SCHED_LITMUS && policy == p->policy)
8051 +		return -EINVAL;
8052  
8053  	/*
8054  	 * Allow unprivileged RT tasks to decrease priority:
8055 @@ -5145,6 +5271,12 @@ recheck:
8056  			return retval;
8057  	}
8058  
8059 +	if (policy == SCHED_LITMUS) {
8060 +		retval = litmus_admit_task(p);
8061 +		if (retval)
8062 +			return retval;
8063 +	}
8064 +
8065  	/*
8066  	 * make sure no PI-waiters arrive (or leave) while we are
8067  	 * changing the priority of the task:
8068 @@ -5203,10 +5335,19 @@ recheck:
8069  
8070  	p->sched_reset_on_fork = reset_on_fork;
8071  
8072 +	if (p->policy == SCHED_LITMUS)
8073 +		litmus_exit_task(p);
8074 +
8075  	oldprio = p->prio;
8076  	prev_class = p->sched_class;
8077  	__setscheduler(rq, p, policy, param->sched_priority);
8078  
8079 +	if (policy == SCHED_LITMUS) {
8080 +		p->rt_param.stack_in_use = running ? rq->cpu : NO_CPU;
8081 +		p->rt_param.present = running;
8082 +		litmus->task_new(p, on_rq, running);
8083 +	}
8084 +
8085  	if (running)
8086  		p->sched_class->set_curr_task(rq);
8087  	if (on_rq)
8088 @@ -5374,10 +5515,11 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
8089  	rcu_read_lock();
8090  
8091  	p = find_process_by_pid(pid);
8092 -	if (!p) {
8093 +	/* Don't set affinity if the task was not found or is a LITMUS^RT task */
8094 +	if (!p || is_realtime(p)) {
8095  		rcu_read_unlock();
8096  		put_online_cpus();
8097 -		return -ESRCH;
8098 +		return p ? -EPERM : -ESRCH;
8099  	}
8100  
8101  	/* Prevent p going away */
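The complete_n() helper exported above bumps the completion count by n and wakes up to n waiters under a single acquisition of the wait-queue lock. A minimal usage sketch, assuming the matching declaration is added to <linux/completion.h> elsewhere in this patch; the barrier name and NR_WAITERS value are hypothetical:

	#include <linux/completion.h>

	#define NR_WAITERS 4	/* hypothetical number of blocked tasks */

	static DECLARE_COMPLETION(release_barrier);

	/* Each waiter blocks on the completion as usual. */
	static void waiter(void)
	{
		wait_for_completion(&release_barrier);
	}

	/* The releaser wakes up to NR_WAITERS blocked tasks with one call,
	 * rather than looping over complete() or waking every waiter with
	 * complete_all(). */
	static void release_waiters(void)
	{
		complete_n(&release_barrier, NR_WAITERS);
	}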
8102 diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
8103 index c768588..334eb47 100644
8104 --- a/kernel/sched_fair.c
8105 +++ b/kernel/sched_fair.c
8106 @@ -1890,6 +1890,9 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_
8107  	int scale = cfs_rq->nr_running >= sched_nr_latency;
8108  	int next_buddy_marked = 0;
8109  
8110 +	if (unlikely(rt_prio(p->prio)) || p->policy == SCHED_LITMUS)
8111 +		goto preempt;
8112 +
8113  	if (unlikely(se == pse))
8114  		return;
8115  
8116 diff --git a/kernel/sched_rt.c b/kernel/sched_rt.c
8117 index 10d0182..58cf5d1 100644
8118 --- a/kernel/sched_rt.c
8119 +++ b/kernel/sched_rt.c
8120 @@ -1078,7 +1078,7 @@ static void check_preempt_equal_prio(struct rq *rq, struct task_struct *p)
8121   */
8122  static void check_preempt_curr_rt(struct rq *rq, struct task_struct *p, int flags)
8123  {
8124 -	if (p->prio < rq->curr->prio) {
8125 +	if (p->prio < rq->curr->prio || p->policy == SCHED_LITMUS) {
8126  		resched_task(rq->curr);
8127  		return;
8128  	}
8129 diff --git a/kernel/semaphore.c b/kernel/semaphore.c
8130 index 94a62c0..c947a04 100644
8131 --- a/kernel/semaphore.c
8132 +++ b/kernel/semaphore.c
8133 @@ -33,11 +33,11 @@
8134  #include <linux/spinlock.h>
8135  #include <linux/ftrace.h>
8136  
8137 -static noinline void __down(struct semaphore *sem);
8138 +noinline void __down(struct semaphore *sem);
8139  static noinline int __down_interruptible(struct semaphore *sem);
8140  static noinline int __down_killable(struct semaphore *sem);
8141  static noinline int __down_timeout(struct semaphore *sem, long jiffies);
8142 -static noinline void __up(struct semaphore *sem);
8143 +noinline void __up(struct semaphore *sem);
8144  
8145  /**
8146   * down - acquire the semaphore
8147 @@ -190,11 +190,13 @@ EXPORT_SYMBOL(up);
8148  
8149  /* Functions for the contended case */
8150  
8151 +/*
8152  struct semaphore_waiter {
8153  	struct list_head list;
8154  	struct task_struct *task;
8155  	int up;
8156  };
8157 + */
8158  
8159  /*
8160   * Because this function is inlined, the 'state' parameter will be
8161 @@ -233,10 +235,12 @@ static inline int __sched __down_common(struct semaphore *sem, long state,
8162  	return -EINTR;
8163  }
8164  
8165 -static noinline void __sched __down(struct semaphore *sem)
8166 +noinline void __sched __down(struct semaphore *sem)
8167  {
8168  	__down_common(sem, TASK_UNINTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
8169  }
8170 +EXPORT_SYMBOL(__down);
8171 +
8172  
8173  static noinline int __sched __down_interruptible(struct semaphore *sem)
8174  {
8175 @@ -253,7 +257,7 @@ static noinline int __sched __down_timeout(struct semaphore *sem, long jiffies)
8176  	return __down_common(sem, TASK_UNINTERRUPTIBLE, jiffies);
8177  }
8178  
8179 -static noinline void __sched __up(struct semaphore *sem)
8180 +noinline void __sched __up(struct semaphore *sem)
8181  {
8182  	struct semaphore_waiter *waiter = list_first_entry(&sem->wait_list,
8183  						struct semaphore_waiter, list);
8184 @@ -261,3 +265,4 @@ static noinline void __sched __up(struct semaphore *sem)
8185  	waiter->up = 1;
8186  	wake_up_process(waiter->task);
8187  }
8188 +EXPORT_SYMBOL(__up);
8189 \ No newline at end of file
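Dropping the static qualifiers on __down()/__up() and exporting them lets code outside kernel/semaphore.c reuse the generic slow paths while managing sem->lock itself. A minimal sketch of that calling pattern, mirroring the down() fast path in this file; the litmus_down() wrapper is hypothetical:

	#include <linux/semaphore.h>

	extern noinline void __down(struct semaphore *sem);
	extern noinline void __up(struct semaphore *sem);

	/* Hypothetical wrapper: acquire the semaphore, falling back to the
	 * exported slow path when the count is exhausted.  __down() sleeps
	 * and internally drops/re-takes sem->lock, exactly as when called
	 * from down(). */
	static void litmus_down(struct semaphore *sem)
	{
		unsigned long flags;

		raw_spin_lock_irqsave(&sem->lock, flags);
		if (likely(sem->count > 0))
			sem->count--;
		else
			__down(sem);
		raw_spin_unlock_irqrestore(&sem->lock, flags);
	}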
8190 diff --git a/kernel/softirq.c b/kernel/softirq.c
8191 index fca82c3..4d7b1a3 100644
8192 --- a/kernel/softirq.c
8193 +++ b/kernel/softirq.c
8194 @@ -29,6 +29,15 @@
8195  #include <trace/events/irq.h>
8196  
8197  #include <asm/irq.h>
8198 +
8199 +#include <litmus/litmus.h>
8200 +#include <litmus/sched_trace.h>
8201 +
8202 +#ifdef CONFIG_LITMUS_NVIDIA
8203 +#include <litmus/nvidia_info.h>
8204 +#include <litmus/trace.h>
8205 +#endif
8206 +
8207  /*
8208     - No shared variables, all the data are CPU local.
8209     - If a softirq needs serialization, let it serialize itself
8210 @@ -67,7 +76,7 @@ char *softirq_to_name[NR_SOFTIRQS] = {
8211   * to the pending events, so lets the scheduler to balance
8212   * the softirq load for us.
8213   */
8214 -static void wakeup_softirqd(void)
8215 +void wakeup_softirqd(void)
8216  {
8217  	/* Interrupts are disabled: no need to stop preemption */
8218  	struct task_struct *tsk = __this_cpu_read(ksoftirqd);
8219 @@ -193,6 +202,7 @@ void local_bh_enable_ip(unsigned long ip)
8220  }
8221  EXPORT_SYMBOL(local_bh_enable_ip);
8222  
8223 +
8224  /*
8225   * We restart softirq processing MAX_SOFTIRQ_RESTART times,
8226   * and we fall back to softirqd after that.
8227 @@ -206,65 +216,65 @@ EXPORT_SYMBOL(local_bh_enable_ip);
8228  
8229  asmlinkage void __do_softirq(void)
8230  {
8231 -	struct softirq_action *h;
8232 -	__u32 pending;
8233 -	int max_restart = MAX_SOFTIRQ_RESTART;
8234 -	int cpu;
8235 +    struct softirq_action *h;
8236 +    __u32 pending;
8237 +    int max_restart = MAX_SOFTIRQ_RESTART;
8238 +    int cpu;
8239  
8240 -	pending = local_softirq_pending();
8241 -	account_system_vtime(current);
8242 +    pending = local_softirq_pending();
8243 +    account_system_vtime(current);
8244  
8245 -	__local_bh_disable((unsigned long)__builtin_return_address(0),
8246 -				SOFTIRQ_OFFSET);
8247 -	lockdep_softirq_enter();
8248 +    __local_bh_disable((unsigned long)__builtin_return_address(0),
8249 +                SOFTIRQ_OFFSET);
8250 +    lockdep_softirq_enter();
8251  
8252 -	cpu = smp_processor_id();
8253 +    cpu = smp_processor_id();
8254  restart:
8255 -	/* Reset the pending bitmask before enabling irqs */
8256 -	set_softirq_pending(0);
8257 +    /* Reset the pending bitmask before enabling irqs */
8258 +    set_softirq_pending(0);
8259  
8260 -	local_irq_enable();
8261 +    local_irq_enable();
8262  
8263 -	h = softirq_vec;
8264 -
8265 -	do {
8266 -		if (pending & 1) {
8267 -			unsigned int vec_nr = h - softirq_vec;
8268 -			int prev_count = preempt_count();
8269 -
8270 -			kstat_incr_softirqs_this_cpu(vec_nr);
8271 -
8272 -			trace_softirq_entry(vec_nr);
8273 -			h->action(h);
8274 -			trace_softirq_exit(vec_nr);
8275 -			if (unlikely(prev_count != preempt_count())) {
8276 -				printk(KERN_ERR "huh, entered softirq %u %s %p"
8277 -				       "with preempt_count %08x,"
8278 -				       " exited with %08x?\n", vec_nr,
8279 -				       softirq_to_name[vec_nr], h->action,
8280 -				       prev_count, preempt_count());
8281 -				preempt_count() = prev_count;
8282 -			}
8283 +    h = softirq_vec;
8284  
8285 -			rcu_bh_qs(cpu);
8286 -		}
8287 -		h++;
8288 -		pending >>= 1;
8289 -	} while (pending);
8290 +    do {
8291 +        if (pending & 1) {
8292 +            unsigned int vec_nr = h - softirq_vec;
8293 +            int prev_count = preempt_count();
8294  
8295 -	local_irq_disable();
8296 +            kstat_incr_softirqs_this_cpu(vec_nr);
8297  
8298 -	pending = local_softirq_pending();
8299 -	if (pending && --max_restart)
8300 -		goto restart;
8301 +            trace_softirq_entry(vec_nr);
8302 +            h->action(h);
8303 +            trace_softirq_exit(vec_nr);
8304 +            if (unlikely(prev_count != preempt_count())) {
8305 +                printk(KERN_ERR "huh, entered softirq %u %s %p"
8306 +                       "with preempt_count %08x,"
8307 +                       " exited with %08x?\n", vec_nr,
8308 +                       softirq_to_name[vec_nr], h->action,
8309 +                       prev_count, preempt_count());
8310 +                preempt_count() = prev_count;
8311 +            }
8312  
8313 -	if (pending)
8314 -		wakeup_softirqd();
8315 +            rcu_bh_qs(cpu);
8316 +        }
8317 +        h++;
8318 +        pending >>= 1;
8319 +    } while (pending);
8320  
8321 -	lockdep_softirq_exit();
8322 +    local_irq_disable();
8323  
8324 -	account_system_vtime(current);
8325 -	__local_bh_enable(SOFTIRQ_OFFSET);
8326 +    pending = local_softirq_pending();
8327 +    if (pending && --max_restart)
8328 +        goto restart;
8329 +
8330 +    if (pending)
8331 +        wakeup_softirqd();
8332 +
8333 +    lockdep_softirq_exit();
8334 +
8335 +    account_system_vtime(current);
8336 +    __local_bh_enable(SOFTIRQ_OFFSET);
8337  }
8338  
8339  #ifndef __ARCH_HAS_DO_SOFTIRQ
8340 @@ -402,8 +412,99 @@ struct tasklet_head
8341  static DEFINE_PER_CPU(struct tasklet_head, tasklet_vec);
8342  static DEFINE_PER_CPU(struct tasklet_head, tasklet_hi_vec);
8343  
8344 +#ifdef CONFIG_LITMUS_NVIDIA
8345 +static int __do_nv_now(struct tasklet_struct* tasklet)
8346 +{
8347 +	int success = 1;
8348 +
8349 +	if(tasklet_trylock(tasklet)) {
8350 +		if (!atomic_read(&tasklet->count)) {
8351 +			if (!test_and_clear_bit(TASKLET_STATE_SCHED, &tasklet->state)) {
8352 +				BUG();
8353 +			}
8354 +			tasklet->func(tasklet->data);
8356 +		}
8357 +		else {
8358 +			success = 0;
8359 +		}
8360 +
8361 +		tasklet_unlock(tasklet);
8362 +	}
8363 +	else {
8364 +		success = 0;
8365 +	}
8366 +
8367 +	return success;
8368 +}
8369 +#endif
8370 +
8371 +
8372  void __tasklet_schedule(struct tasklet_struct *t)
8373  {
8374 +#ifdef CONFIG_LITMUS_NVIDIA
8375 +	if(is_nvidia_func(t->func))
8376 +	{
8377 +#if 0
8378 +		// do nvidia tasklets right away and return
8379 +		if(__do_nv_now(t))
8380 +			return;
8381 +#else
8382 +		u32 nvidia_device = get_tasklet_nv_device_num(t);
8383 +		//		TRACE("%s: Handling NVIDIA tasklet for device\t%u\tat\t%llu\n",
8384 +		//			  __FUNCTION__, nvidia_device,litmus_clock());
8385 +
8386 +		unsigned long flags;
8387 +		struct task_struct* device_owner;
8388 +
8389 +		lock_nv_registry(nvidia_device, &flags);
8390 +
8391 +		device_owner = get_nv_max_device_owner(nvidia_device);
8392 +
8393 +		if(device_owner==NULL)
8394 +		{
8395 +			t->owner = NULL;
8396 +		}
8397 +		else
8398 +		{
8399 +			if(is_realtime(device_owner))
8400 +			{
8401 +				TRACE("%s: Handling NVIDIA tasklet for device %u at %llu\n",
8402 +					  __FUNCTION__, nvidia_device,litmus_clock());
8403 +				TRACE("%s: the owner task %d of NVIDIA Device %u is RT-task\n",
8404 +					  __FUNCTION__,device_owner->pid,nvidia_device);
8405 +
8406 +				t->owner = device_owner;
8407 +				sched_trace_tasklet_release(t->owner);
8408 +
8409 +				if(likely(_litmus_tasklet_schedule(t,nvidia_device)))
8410 +				{
8411 +					unlock_nv_registry(nvidia_device, &flags);
8412 +					return;
8413 +				}
8414 +				else
8415 +				{
8416 +					t->owner = NULL; /* fall through to normal scheduling */
8417 +				}
8418 +			}
8419 +			else
8420 +			{
8421 +				t->owner = NULL;
8422 +			}
8423 +		}
8424 +		unlock_nv_registry(nvidia_device, &flags);
8425 +#endif
8426 +	}
8427 +
8428 +#endif
8429 +
8430 +	___tasklet_schedule(t);
8431 +}
8432 +EXPORT_SYMBOL(__tasklet_schedule);
8433 +
8434 +
8435 +void ___tasklet_schedule(struct tasklet_struct *t)
8436 +{
8437  	unsigned long flags;
8438  
8439  	local_irq_save(flags);
8440 @@ -413,11 +514,65 @@ void __tasklet_schedule(struct tasklet_struct *t)
8441  	raise_softirq_irqoff(TASKLET_SOFTIRQ);
8442  	local_irq_restore(flags);
8443  }
8444 +EXPORT_SYMBOL(___tasklet_schedule);
8445  
8446 -EXPORT_SYMBOL(__tasklet_schedule);
8447  
8448  void __tasklet_hi_schedule(struct tasklet_struct *t)
8449  {
8450 +#ifdef CONFIG_LITMUS_NVIDIA
8451 +	if(is_nvidia_func(t->func))
8452 +	{
8453 +		u32 nvidia_device = get_tasklet_nv_device_num(t);
8454 +		//		TRACE("%s: Handling NVIDIA tasklet for device\t%u\tat\t%llu\n",
8455 +		//			  __FUNCTION__, nvidia_device,litmus_clock());
8456 +
8457 +		unsigned long flags;
8458 +		struct task_struct* device_owner;
8459 +
8460 +		lock_nv_registry(nvidia_device, &flags);
8461 +
8462 +		device_owner = get_nv_max_device_owner(nvidia_device);
8463 +
8464 +		if(device_owner==NULL)
8465 +		{
8466 +			t->owner = NULL;
8467 +		}
8468 +		else
8469 +		{
8470 +			if( is_realtime(device_owner))
8471 +			{
8472 +				TRACE("%s: Handling NVIDIA tasklet for device %u\tat %llu\n",
8473 +					  __FUNCTION__, nvidia_device,litmus_clock());
8474 +				TRACE("%s: the owner task %d of NVIDIA Device %u is RT-task\n",
8475 +					  __FUNCTION__,device_owner->pid,nvidia_device);
8476 +
8477 +				t->owner = device_owner;
8478 +				sched_trace_tasklet_release(t->owner);
8479 +				if(likely(_litmus_tasklet_hi_schedule(t,nvidia_device)))
8480 +				{
8481 +					unlock_nv_registry(nvidia_device, &flags);
8482 +					return;
8483 +				}
8484 +				else
8485 +				{
8486 +					t->owner = NULL; /* fall through to normal scheduling */
8487 +				}
8488 +			}
8489 +			else
8490 +			{
8491 +				t->owner = NULL;
8492 +			}
8493 +		}
8494 +		unlock_nv_registry(nvidia_device, &flags);
8495 +	}
8496 +#endif
8497 +
8498 +	___tasklet_hi_schedule(t);
8499 +}
8500 +EXPORT_SYMBOL(__tasklet_hi_schedule);
8501 +
8502 +void ___tasklet_hi_schedule(struct tasklet_struct* t)
8503 +{
8504  	unsigned long flags;
8505  
8506  	local_irq_save(flags);
8507 @@ -427,19 +582,72 @@ void __tasklet_hi_schedule(struct tasklet_struct *t)
8508  	raise_softirq_irqoff(HI_SOFTIRQ);
8509  	local_irq_restore(flags);
8510  }
8511 -
8512 -EXPORT_SYMBOL(__tasklet_hi_schedule);
8513 +EXPORT_SYMBOL(___tasklet_hi_schedule);
8514  
8515  void __tasklet_hi_schedule_first(struct tasklet_struct *t)
8516  {
8517  	BUG_ON(!irqs_disabled());
8518 +#ifdef CONFIG_LITMUS_NVIDIA
8519 +	if(is_nvidia_func(t->func))
8520 +	{
8521 +		u32 nvidia_device = get_tasklet_nv_device_num(t);
8522 +		//		TRACE("%s: Handling NVIDIA tasklet for device\t%u\tat\t%llu\n",
8523 +		//			  __FUNCTION__, nvidia_device,litmus_clock());
8524 +		unsigned long flags;
8525 +		struct task_struct* device_owner;
8526 +
8527 +		lock_nv_registry(nvidia_device, &flags);
8528 +
8529 +		device_owner = get_nv_max_device_owner(nvidia_device);
8530 +
8531 +		if(device_owner==NULL)
8532 +		{
8533 +			t->owner = NULL;
8534 +		}
8535 +		else
8536 +		{
8537 +			if(is_realtime(device_owner))
8538 +			{
8539 +				TRACE("%s: Handling NVIDIA tasklet for device %u at %llu\n",
8540 +					  __FUNCTION__, nvidia_device,litmus_clock());
8541 +
8542 +				TRACE("%s: the owner task %d of NVIDIA Device %u is RT-task\n",
8543 +					  __FUNCTION__,device_owner->pid,nvidia_device);
8544 +
8545 +				t->owner = device_owner;
8546 +				sched_trace_tasklet_release(t->owner);
8547 +				if(likely(_litmus_tasklet_hi_schedule_first(t,nvidia_device)))
8548 +				{
8549 +					unlock_nv_registry(nvidia_device, &flags);
8550 +					return;
8551 +				}
8552 +				else
8553 +				{
8554 +					t->owner = NULL; /* fall through to normal scheduling */
8555 +				}
8556 +			}
8557 +			else
8558 +			{
8559 +				t->owner = NULL;
8560 +			}
8561 +		}
8562 +		unlock_nv_registry(nvidia_device, &flags);
8563 +	}
8564 +#endif
8565 +
8566 +	___tasklet_hi_schedule_first(t);
8567 +}
8568 +EXPORT_SYMBOL(__tasklet_hi_schedule_first);
8569 +
8570 +void ___tasklet_hi_schedule_first(struct tasklet_struct* t)
8571 +{
8572 +	BUG_ON(!irqs_disabled());
8573  
8574  	t->next = __this_cpu_read(tasklet_hi_vec.head);
8575  	__this_cpu_write(tasklet_hi_vec.head, t);
8576  	__raise_softirq_irqoff(HI_SOFTIRQ);
8577  }
8578 -
8579 -EXPORT_SYMBOL(__tasklet_hi_schedule_first);
8580 +EXPORT_SYMBOL(___tasklet_hi_schedule_first);
8581  
8582  static void tasklet_action(struct softirq_action *a)
8583  {
8584 @@ -495,6 +703,7 @@ static void tasklet_hi_action(struct softirq_action *a)
8585  			if (!atomic_read(&t->count)) {
8586  				if (!test_and_clear_bit(TASKLET_STATE_SCHED, &t->state))
8587  					BUG();
8588 +
8589  				t->func(t->data);
8590  				tasklet_unlock(t);
8591  				continue;
8592 @@ -518,8 +727,13 @@ void tasklet_init(struct tasklet_struct *t,
8593  	t->next = NULL;
8594  	t->state = 0;
8595  	atomic_set(&t->count, 0);
8596 +
8597  	t->func = func;
8598  	t->data = data;
8599 +
8600 +#ifdef CONFIG_LITMUS_SOFTIRQD
8601 +	t->owner = NULL;
8602 +#endif
8603  }
8604  
8605  EXPORT_SYMBOL(tasklet_init);
8606 @@ -534,6 +748,7 @@ void tasklet_kill(struct tasklet_struct *t)
8607  			yield();
8608  		} while (test_bit(TASKLET_STATE_SCHED, &t->state));
8609  	}
8610 +
8611  	tasklet_unlock_wait(t);
8612  	clear_bit(TASKLET_STATE_SCHED, &t->state);
8613  }
8614 @@ -808,6 +1023,7 @@ void tasklet_kill_immediate(struct tasklet_struct *t, unsigned int cpu)
8615  	for (i = &per_cpu(tasklet_vec, cpu).head; *i; i = &(*i)->next) {
8616  		if (*i == t) {
8617  			*i = t->next;
8618 +
8619  			/* If this was the tail element, move the tail ptr */
8620  			if (*i == NULL)
8621  				per_cpu(tasklet_vec, cpu).tail = i;
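The three schedule variants above repeat one decision: if the tasklet's handler belongs to the NVIDIA driver and the device's current owner is a real-time task, hand the tasklet to LITMUS^RT; otherwise fall through to the stock Linux queueing path. A condensed sketch of that flow; the nv_dispatch() helper and its function-pointer signature are hypothetical, but every routine it calls appears in this patch:

	/* Returns non-zero if the tasklet was handed to LITMUS^RT. */
	static int nv_dispatch(struct tasklet_struct *t,
			       int (*litmus_sched)(struct tasklet_struct *, u32))
	{
		u32 dev;
		unsigned long flags;
		struct task_struct *owner;
		int handled = 0;

		if (!is_nvidia_func(t->func))
			return 0;

		dev = get_tasklet_nv_device_num(t);
		lock_nv_registry(dev, &flags);
		owner = get_nv_max_device_owner(dev);
		if (owner && is_realtime(owner)) {
			t->owner = owner;
			sched_trace_tasklet_release(t->owner);
			handled = litmus_sched(t, dev);
			if (!handled)
				t->owner = NULL;	/* fall back to Linux */
		} else {
			t->owner = NULL;
		}
		unlock_nv_registry(dev, &flags);
		return handled;
	}

With such a helper, __tasklet_schedule() would reduce to roughly: if (!nv_dispatch(t, _litmus_tasklet_schedule)) ___tasklet_schedule(t);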
8622 diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
8623 index d5097c4..0c0e02f 100644
8624 --- a/kernel/time/tick-sched.c
8625 +++ b/kernel/time/tick-sched.c
8626 @@ -766,12 +766,53 @@ static enum hrtimer_restart tick_sched_timer(struct hrtimer *timer)
8627  }
8628  
8629  /**
8630 + * tick_set_quanta_type - get the quanta type as a boot option
8631 + * Default is standard setup with ticks staggered over first
8632 + * half of tick period.
8633 + */
8634 +int quanta_type = LINUX_DEFAULT_TICKS;
8635 +static int __init tick_set_quanta_type(char *str)
8636 +{
8637 +	if (strcmp("aligned", str) == 0) {
8638 +		quanta_type = LITMUS_ALIGNED_TICKS;
8639 +		printk(KERN_INFO "LITMUS^RT: setting aligned quanta\n");
8640 +	}
8641 +	else if (strcmp("staggered", str) == 0) {
8642 +		quanta_type = LITMUS_STAGGERED_TICKS;
8643 +		printk(KERN_INFO "LITMUS^RT: setting staggered quanta\n");
8644 +	}
8645 +	return 1;
8646 +}
8647 +__setup("quanta=", tick_set_quanta_type);
8648 +
8649 +u64 cpu_stagger_offset(int cpu)
8650 +{
8651 +	u64 offset = 0;
8652 +	switch (quanta_type) {
8653 +		case LITMUS_ALIGNED_TICKS:
8654 +			offset = 0;
8655 +			break;
8656 +		case LITMUS_STAGGERED_TICKS:
8657 +			offset = ktime_to_ns(tick_period);
8658 +			do_div(offset, num_possible_cpus());
8659 +			offset *= cpu;
8660 +			break;
8661 +		default:
8662 +			offset = ktime_to_ns(tick_period) >> 1;
8663 +			do_div(offset, num_possible_cpus());
8664 +			offset *= cpu;
8665 +	}
8666 +	return offset;
8667 +}
8668 +
8669 +/**
8670   * tick_setup_sched_timer - setup the tick emulation timer
8671   */
8672  void tick_setup_sched_timer(void)
8673  {
8674  	struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
8675  	ktime_t now = ktime_get();
8676 +	u64 offset;
8677  
8678  	/*
8679  	 * Emulate tick processing via per-CPU hrtimers:
8680 @@ -782,6 +823,12 @@ void tick_setup_sched_timer(void)
8681  	/* Get the next period (per cpu) */
8682  	hrtimer_set_expires(&ts->sched_timer, tick_init_jiffy_update());
8683  
8684 +	/* Offset must be set correctly to achieve desired quanta type. */
8685 +	offset = cpu_stagger_offset(smp_processor_id());
8686 +
8687 +	/* Add the correct offset to expiration time */
8688 +	hrtimer_add_expires_ns(&ts->sched_timer, offset);
8689 +
8690  	for (;;) {
8691  		hrtimer_forward(&ts->sched_timer, now, tick_period);
8692  		hrtimer_start_expires(&ts->sched_timer,
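As a worked example of cpu_stagger_offset(), assume HZ=1000 (so tick_period is 1,000,000 ns) and num_possible_cpus() = 4; both values are assumptions for illustration only:

	/*
	 * quanta=aligned   -> offsets 0, 0, 0, 0 ns
	 *                     (all CPUs fire their tick together)
	 * quanta=staggered -> offsets 0, 250000, 500000, 750000 ns
	 *                     (ticks spread evenly over the whole period)
	 * default          -> offsets 0, 125000, 250000, 375000 ns
	 *                     (ticks spread over the first half of the period)
	 */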
8693 diff --git a/kernel/workqueue.c b/kernel/workqueue.c
8694 index 0400553..6b59d59 100644
8695 --- a/kernel/workqueue.c
8696 +++ b/kernel/workqueue.c
8697 @@ -44,6 +44,13 @@
8698  
8699  #include "workqueue_sched.h"
8700  
8701 +#ifdef CONFIG_LITMUS_NVIDIA
8702 +#include <litmus/litmus.h>
8703 +#include <litmus/sched_trace.h>
8704 +#include <litmus/nvidia_info.h>
8705 +#endif
8706 +
8707 +
8708  enum {
8709  	/* global_cwq flags */
8710  	GCWQ_MANAGE_WORKERS	= 1 << 0,	/* need to manage workers */
8711 @@ -1047,9 +1054,7 @@ static void __queue_work(unsigned int cpu, struct workqueue_struct *wq,
8712  		work_flags |= WORK_STRUCT_DELAYED;
8713  		worklist = &cwq->delayed_works;
8714  	}
8715 -
8716  	insert_work(cwq, work, worklist, work_flags);
8717 -
8718  	spin_unlock_irqrestore(&gcwq->lock, flags);
8719  }
8720  
8721 @@ -2687,10 +2692,70 @@ EXPORT_SYMBOL(cancel_delayed_work_sync);
8722   */
8723  int schedule_work(struct work_struct *work)
8724  {
8725 -	return queue_work(system_wq, work);
8726 +#if 0
8727 +#if defined(CONFIG_LITMUS_NVIDIA) && defined(CONFIG_LITMUS_SOFTIRQD)
8728 +	if(is_nvidia_func(work->func))
8729 +	{
8730 +		u32 nvidiaDevice = get_work_nv_device_num(work);
8731 +		
8732 +		//1) Ask Litmus which task owns GPU <nvidiaDevice>. (API to be defined.)
8733 +		unsigned long flags;
8734 +		struct task_struct* device_owner;
8735 +		
8736 +		lock_nv_registry(nvidiaDevice, &flags);
8737 +		
8738 +		device_owner = get_nv_max_device_owner(nvidiaDevice);
8739 +		
8740 +		//2) If there is an owner, set work->owner to the owner's task struct.
8741 +		if(device_owner==NULL) 
8742 +		{
8743 +			work->owner = NULL;
8744 +			//TRACE("%s: the owner task of NVIDIA Device %u is NULL\n",__FUNCTION__,nvidiaDevice);
8745 +		}
8746 +		else
8747 +		{
8748 +			if( is_realtime(device_owner))
8749 +			{
8750 +				TRACE("%s: Handling NVIDIA work for device\t%u\tat\t%llu\n",
8751 +					  __FUNCTION__, nvidiaDevice,litmus_clock());
8752 +				TRACE("%s: the owner task %d of NVIDIA Device %u is RT-task\n",
8753 +					  __FUNCTION__,
8754 +					  device_owner->pid,
8755 +					  nvidiaDevice);
8756 +				
8757 +				//3) Call litmus_schedule_work() and return (don't execute the rest
8758 +				//	of schedule_work()).
8759 +				work->owner = device_owner;
8760 +				sched_trace_work_release(work->owner);
8761 +				if(likely(litmus_schedule_work(work, nvidiaDevice)))
8762 +				{
8763 +					unlock_nv_registry(nvidiaDevice, &flags);
8764 +					return 1;
8765 +				}
8766 +				else
8767 +				{
8768 +					work->owner = NULL; /* fall through to normal work scheduling */
8769 +				}
8770 +			}
8771 +			else
8772 +			{
8773 +				work->owner = NULL;
8774 +			}
8775 +		}
8776 +		unlock_nv_registry(nvidiaDevice, &flags);
8777 +	}
8778 +#endif
8779 +#endif
8780 +	return(__schedule_work(work));
8781  }
8782  EXPORT_SYMBOL(schedule_work);
8783  
8784 +int __schedule_work(struct work_struct* work)
8785 +{
8786 +	return queue_work(system_wq, work);
8787 +}
8788 +EXPORT_SYMBOL(__schedule_work);
8789 +
8790  /*
8791   * schedule_work_on - put work task on a specific cpu
8792   * @cpu: cpu to put the work task on
8793 diff --git a/litmus/Kconfig b/litmus/Kconfig
8794 new file mode 100644
8795 index 0000000..03cc92c
8796 --- /dev/null
8797 +++ b/litmus/Kconfig
8798 @@ -0,0 +1,364 @@
8799 +menu "LITMUS^RT"
8800 +
8801 +menu "Scheduling"
8802 +
8803 +config PLUGIN_CEDF
8804 +        bool "Clustered-EDF"
8805 +	depends on X86 && SYSFS
8806 +        default y
8807 +        help
8808 +          Include the Clustered EDF (C-EDF) plugin in the kernel.
8809 +          This is appropriate for large platforms with shared caches.
8810 +          On smaller platforms (e.g., ARM PB11MPCore), using C-EDF
8811 +          makes little sense since there aren't any shared caches.
8812 +
8813 +config PLUGIN_PFAIR
8814 +	bool "PFAIR"
8815 +	depends on HIGH_RES_TIMERS && !NO_HZ
8816 +	default y
8817 +	help
8818 +	  Include the PFAIR plugin (i.e., the PD^2 scheduler) in the kernel.
8819 +	  The PFAIR plugin requires high resolution timers (for staggered quanta)
8820 +	  and does not support NO_HZ (quanta could be missed when the system is idle).
8821 +
8822 +	  If unsure, say Yes.
8823 +
8824 +config RELEASE_MASTER
8825 +        bool "Release-master Support"
8826 +	depends on ARCH_HAS_SEND_PULL_TIMERS
8827 +	default n
8828 +	help
8829 +           Allow one processor to act as a dedicated interrupt processor
8830 +           that services all timer interrupts, but that does not schedule
8831 +           real-time tasks. See RTSS'09 paper for details
8832 +	   (http://www.cs.unc.edu/~anderson/papers.html).
8833 +           Currently only supported by GSN-EDF.
8834 +
8835 +endmenu
8836 +
8837 +menu "Real-Time Synchronization"
8838 +
8839 +config NP_SECTION
8840 +        bool "Non-preemptive section support"
8841 +	default n
8842 +	help
8843 +	  Allow tasks to become non-preemptable.
8844 +          Note that plugins still need to explicitly support non-preemptivity.
8845 +          Currently, only GSN-EDF and PSN-EDF have such support.
8846 +
8847 +	  This is required to support locking protocols such as the FMLP.
8848 +	  If disabled, all tasks will be considered preemptable at all times.
8849 +
8850 +config LITMUS_LOCKING
8851 +        bool "Support for real-time locking protocols"
8852 +	depends on NP_SECTION
8853 +	default n
8854 +	help
8855 +	  Enable LITMUS^RT's deterministic multiprocessor real-time
8856 +	  locking protocols.
8857 +
8858 +	  Say Yes if you want to include locking protocols such as the FMLP and
8859 +	  Baker's SRP.
8860 +
8861 +config LITMUS_AFFINITY_LOCKING
8862 +	bool "Enable affinity infrastructure in k-exclusion locking protocols."
8863 +	depends on LITMUS_LOCKING
8864 +	default n
8865 +	help
8866 +	  Enable affinity tracking infrastructure in k-exclusion locking protocols.
8867 +	  This only enables the *infrastructure*, not actual affinity algorithms.
8868 +
8869 +	  If unsure, say No.
8870 +
8871 +config LITMUS_NESTED_LOCKING
8872 +	bool "Support for nested inheritance in locking protocols"
8873 +	depends on LITMUS_LOCKING
8874 +	default n
8875 +	help
8876 +	  Enable nested priority inheritance.
8877 +
8878 +config LITMUS_DGL_SUPPORT
8879 +	bool "Support for dynamic group locks"
8880 +	depends on LITMUS_NESTED_LOCKING
8881 +	default n
8882 +	help
8883 +	  Enable dynamic group lock support.
8884 +
8885 +config LITMUS_MAX_DGL_SIZE
8886 +	int "Maximum size of a dynamic group lock."
8887 +	depends on LITMUS_DGL_SUPPORT
8888 +	range 1 128
8889 +	default "10"
8890 +	help
8891 +		Dynamic group lock data structures are allocated on the process
8892 +		stack when a group is requested. We set a maximum number of
8893 +		locks in a dynamic group lock to avoid dynamic allocation.
8894 +
8895 +		TODO: Batch DGL requests exceeding LITMUS_MAX_DGL_SIZE.
8896 +
8897 +endmenu
8898 +
8899 +menu "Performance Enhancements"
8900 +
8901 +config SCHED_CPU_AFFINITY
8902 +	bool "Local Migration Affinity"
8903 +	depends on X86
8904 +	default y
8905 +	help
8906 +	  Rescheduled tasks prefer CPUs near their previously used CPU.  This
8907 +	  may improve performance through possible preservation of cache affinity.
8908 +
8909 +	  Warning: May make bugs harder to find since tasks may migrate less often.
8910 +
8911 +	  NOTES:
8912 +	  	* Feature is not utilized by PFair/PD^2.
8913 +
8914 +	  Say Yes if unsure.
8915 +
8916 +endmenu
8917 +
8918 +menu "Tracing"
8919 +
8920 +config FEATHER_TRACE
8921 +	bool "Feather-Trace Infrastructure"
8922 +	default y
8923 +	help
8924 +	  Feather-Trace basic tracing infrastructure. Includes device file
8925 +	  driver and instrumentation point support.
8926 +
8927 +	  There are actually two implementations of Feather-Trace.
8928 +	  1) A slower, but portable, default implementation.
8929 +	  2) Architecture-specific implementations that rewrite kernel .text at runtime.
8930 +
8931 +	  If enabled, Feather-Trace will be based on 2) if available (currently only for x86).
8932 +	  However, if DEBUG_RODATA=y, then Feather-Trace will choose option 1) in any case
8933 +	  to avoid problems with write-protected .text pages.
8934 +
8935 +	  Bottom line: to avoid increased overheads, choose DEBUG_RODATA=n.
8936 +
8937 +	  Note that this option only enables the basic Feather-Trace infrastructure;
8938 +	  you still need to enable SCHED_TASK_TRACE and/or SCHED_OVERHEAD_TRACE to
8939 +	  actually enable any events.
8940 +
8941 +config SCHED_TASK_TRACE
8942 +	bool "Trace real-time tasks"
8943 +	depends on FEATHER_TRACE
8944 +	default y
8945 +	help
8946 +	  Include support for the sched_trace_XXX() tracing functions. This
8947 +          allows the collection of real-time task events such as job
8948 +	  completions, job releases, early completions, etc. This results in a
8949 +	  small overhead in the scheduling code. Disable if the overhead is not
8950 +	  acceptable (e.g., benchmarking).
8951 +
8952 +	  Say Yes for debugging.
8953 +	  Say No for overhead tracing.
8954 +
8955 +config SCHED_TASK_TRACE_SHIFT
8956 +       int "Buffer size for sched_trace_xxx() events"
8957 +       depends on SCHED_TASK_TRACE
8958 +       range 8 15
8959 +       default 9
8960 +       help
8961 +
8962 +         Select the buffer size of sched_trace_xxx() events as a power of two.
8963 +	 These buffers are statically allocated as per-CPU data. Each event
8964 +	 requires 24 bytes storage plus one additional flag byte. Too large
8965 +	 buffers can cause issues with the per-cpu allocator (and waste
8966 +	 memory). Too small buffers can cause scheduling events to be lost. The
8967 +	 "right" size is workload dependent and depends on the number of tasks,
8968 +	 each task's period, each task's number of suspensions, and how often
8969 +	 the buffer is flushed.
8970 +
8971 +	 Examples: 12 =>   4k events
8972 +		   10 =>   1k events
8973 +		    8 =>  512 events
8974 +
8975 +config SCHED_OVERHEAD_TRACE
8976 +	bool "Record timestamps for overhead measurements"
8977 +	depends on FEATHER_TRACE
8978 +	default n
8979 +	help
8980 +	  Export event stream for overhead tracing.
8981 +	  Say Yes for overhead tracing.
8982 +
8983 +config SCHED_DEBUG_TRACE
8984 +	bool "TRACE() debugging"
8985 +	default y
8986 +	help
8987 +	  Include support for sched_trace_log_message(), which is used to
8988 +	  implement TRACE(). If disabled, no TRACE() messages will be included
8989 +	  in the kernel, and no overheads due to debugging statements will be
8990 +	  incurred by the scheduler. Disable if the overhead is not acceptable
8991 +	  (e.g. benchmarking).
8992 +
8993 +	  Say Yes for debugging.
8994 +	  Say No for overhead tracing.
8995 +
8996 +config SCHED_DEBUG_TRACE_SHIFT
8997 +       int "Buffer size for TRACE() buffer"
8998 +       depends on SCHED_DEBUG_TRACE
8999 +       range 14 22
9000 +       default 18
9001 +       help
9002 +
9003 +	Select the amount of memory needed for the TRACE() buffer, as a
9004 +	power of two. The TRACE() buffer is global and statically allocated. If
9005 +	the buffer is too small, there will be holes in the TRACE() log if the
9006 +	buffer-flushing task is starved.
9007 +
9008 +	The default should be sufficient for most systems. Increase the buffer
9009 +	size if the log contains holes. Reduce the buffer size when running on
9010 +	a memory-constrained system.
9011 +
9012 +	Examples: 14 =>  16KB
9013 +		  18 => 256KB
9014 +		  20 =>   1MB
9015 +
9016 +        This buffer is exported to userspace using a misc device as
9017 +        'litmus/log'. On a system with default udev rules, a corresponding
9018 +        character device node should be created at /dev/litmus/log. The buffer
9019 +        can be flushed using cat, e.g., 'cat /dev/litmus/log > my_log_file.txt'.
9020 +
9021 +config SCHED_DEBUG_TRACE_CALLER
9022 +       bool "Include [function@file:line] tag in TRACE() log"
9023 +       depends on SCHED_DEBUG_TRACE
9024 +       default n
9025 +       help
9026 +         With this option enabled, TRACE() prepends
9027 +
9028 +	      "[<function name>@<filename>:<line number>]"
9029 +
9030 +	 to each message in the debug log. Enable this to aid in figuring out
9031 +         what was called in which order. The downside is that it adds a lot of
9032 +         clutter.
9033 +
9034 +	 If unsure, say No.
9035 +
9036 +config PREEMPT_STATE_TRACE
9037 +       bool "Trace preemption state machine transitions"
9038 +       depends on SCHED_DEBUG_TRACE
9039 +       default n
9040 +       help
9041 +         With this option enabled, each CPU will log when it transitions
9042 +	 states in the preemption state machine. This state machine is
9043 +	 used to determine how to react to IPIs (avoid races with in-flight IPIs).
9044 +
9045 +	 Warning: this creates a lot of information in the debug trace. Only
9046 +	 recommended when you are debugging preemption-related races.
9047 +
9048 +	 If unsure, say No.
9049 +
9050 +endmenu
9051 +
9052 +menu "Interrupt Handling"
9053 +
9054 +choice
9055 +	prompt "Scheduling of interrupt bottom-halves in Litmus."
9056 +	default LITMUS_SOFTIRQD_NONE
9057 +	depends on LITMUS_LOCKING && !LITMUS_THREAD_ALL_SOFTIRQ
9058 +	help
9059 +		Schedule tasklets with known priorities in Litmus.
9060 +
9061 +config LITMUS_SOFTIRQD_NONE
9062 +	bool "No tasklet scheduling in Litmus."
9063 +	help
9064 +	  Don't schedule tasklets in Litmus.  Default.
9065 +
9066 +config LITMUS_SOFTIRQD
9067 +	bool "Spawn klitirqd interrupt handling threads."
9068 +	help
9069 +	  Create klitirqd interrupt handling threads.  Work must be
9070 +	  specifically dispatched to these workers.  (Softirqs for
9071 +	  Litmus tasks are not magically redirected to klitirqd.)
9072 +
9073 +	  G-EDF/RM, C-EDF/RM ONLY for now!
9074 +
9075 +
9076 +config LITMUS_PAI_SOFTIRQD
9077 +	bool "Defer tasklets to context switch points."
9078 +	help
9079 +	  Only execute scheduled tasklet bottom halves at
9080 +	  scheduling points.  Avoids context-switch overhead
9081 +	  at the cost of non-preemptive durations of bottom-half
9082 +	  processing.
9083 +
9084 +	  G-EDF/RM, C-EDF/RM ONLY for now!
9085 +
9086 +endchoice
9087 +
9088 +
9089 +config NR_LITMUS_SOFTIRQD
9090 +	   int "Number of klitirqd threads."
9091 +	   depends on LITMUS_SOFTIRQD
9092 +	   range 1 4096
9093 +	   default "1"
9094 +	   help
9095 +	     Should be <= the number of CPUs in your system.
9096 +
9097 +config LITMUS_NVIDIA
9098 +	  bool "Litmus handling of NVIDIA interrupts."
9099 +	  default n
9100 +	  help
9101 +	    Direct tasklets from NVIDIA devices to Litmus's klitirqd
9102 +		or PAI interrupt handling routines.
9103 +
9104 +		If unsure, say No.
9105 +
9106 +config LITMUS_AFFINITY_AWARE_GPU_ASSINGMENT
9107 +	  bool "Enable affinity-aware heuristics to improve GPU assignment."
9108 +	  depends on LITMUS_NVIDIA && LITMUS_AFFINITY_LOCKING
9109 +	  default n
9110 +	  help
9111 +	    Enable several heuristics to improve the assignment
9112 +		of GPUs to real-time tasks to reduce the overheads
9113 +		of memory migrations.
9114 +
9115 +		If unsure, say No.
9116 +
9117 +config NV_DEVICE_NUM
9118 +	   int "Number of NVIDIA GPUs."
9119 +	   depends on LITMUS_SOFTIRQD || LITMUS_PAI_SOFTIRQD
9120 +	   range 1 4096
9121 +	   default "1"
9122 +	   help
9123 +	     Should be (<= the number of CPUs) and
9124 +		 (<= the number of GPUs) in your system.
9125 +
9126 +config NV_MAX_SIMULT_USERS
9127 +	int "Maximum number of threads sharing a GPU simultaneously"
9128 +	depends on LITMUS_SOFTIRQD || LITMUS_PAI_SOFTIRQD
9129 +	range 1 3
9130 +	default "2"
9131 +	help
9132 +		Should be equal to the #copy_engines + #execution_engines
9133 +		of the GPUs in your system.
9134 +
9135 +		Scientific/Professional GPUs = 3  (ex. M2070, Quadro 6000?)
9136 +		Consumer Fermi/Kepler GPUs   = 2  (GTX-4xx thru -6xx)
9137 +		Older                        = 1  (ex. GTX-2xx)
9138 +
9139 +choice
9140 +	  prompt "CUDA/Driver Version Support"
9141 +	  default CUDA_4_0
9142 +	  depends on LITMUS_NVIDIA
9143 +	  help
9144 +	  	Select the version of CUDA/driver to support.
9145 +
9146 +config CUDA_4_0
9147 +	  bool "CUDA 4.0"
9148 +	  depends on LITMUS_NVIDIA
9149 +	  help
9150 +	  	Support CUDA 4.0 RC2 (dev. driver version: x86_64-270.40)
9151 +
9152 +config CUDA_3_2
9153 +	  bool "CUDA 3.2"
9154 +	  depends on LITMUS_NVIDIA
9155 +	  help
9156 +	  	Support CUDA 3.2 (dev. driver version: x86_64-260.24)
9157 +
9158 +endchoice
9159 +
9160 +endmenu
9161 +
9162 +endmenu
9163 diff --git a/litmus/Makefile b/litmus/Makefile
9164 new file mode 100644
9165 index 0000000..080cbf6
9166 --- /dev/null
9167 +++ b/litmus/Makefile
9168 @@ -0,0 +1,38 @@
9169 +#
9170 +# Makefile for LITMUS^RT
9171 +#
9172 +
9173 +obj-y     = sched_plugin.o litmus.o \
9174 +	    preempt.o \
9175 +	    litmus_proc.o \
9176 +	    budget.o \
9177 +	    clustered.o \
9178 +	    jobs.o \
9179 +	    sync.o \
9180 +	    rt_domain.o \
9181 +	    edf_common.o \
9182 +	    fdso.o \
9183 +	    locking.o \
9184 +	    srp.o \
9185 +	    bheap.o \
9186 +        binheap.o \
9187 +	    ctrldev.o \
9188 +	    sched_gsn_edf.o \
9189 +	    sched_psn_edf.o \
9190 +        kfmlp_lock.o
9191 +
9192 +obj-$(CONFIG_PLUGIN_CEDF) += sched_cedf.o
9193 +obj-$(CONFIG_PLUGIN_PFAIR) += sched_pfair.o
9194 +obj-$(CONFIG_SCHED_CPU_AFFINITY) += affinity.o
9195 +
9196 +obj-$(CONFIG_FEATHER_TRACE) += ft_event.o ftdev.o
9197 +obj-$(CONFIG_SCHED_TASK_TRACE) += sched_task_trace.o
9198 +obj-$(CONFIG_SCHED_DEBUG_TRACE) += sched_trace.o
9199 +obj-$(CONFIG_SCHED_OVERHEAD_TRACE) += trace.o
9200 +
9201 +obj-$(CONFIG_LITMUS_NESTED_LOCKING) += rsm_lock.o ikglp_lock.o
9202 +obj-$(CONFIG_LITMUS_SOFTIRQD) += litmus_softirq.o
9203 +obj-$(CONFIG_LITMUS_PAI_SOFTIRQD) += litmus_pai_softirq.o
9204 +obj-$(CONFIG_LITMUS_NVIDIA) += nvidia_info.o sched_trace_external.o
9205 +
9206 +obj-$(CONFIG_LITMUS_AFFINITY_LOCKING) += kexclu_affinity.o gpu_affinity.o
9207 diff --git a/litmus/affinity.c b/litmus/affinity.c
9208 new file mode 100644
9209 index 0000000..cd93249
9210 --- /dev/null
9211 +++ b/litmus/affinity.c
9212 @@ -0,0 +1,42 @@
9213 +#include <linux/cpu.h>
9214 +
9215 +#include <litmus/affinity.h>
9216 +
9217 +struct neighborhood neigh_info[NR_CPUS];
9218 +
9219 +/* called by _init_litmus() */
9220 +void init_topology(void) {
9221 +	int cpu;
9222 +	int i;
9223 +	int chk;
9224 +	int depth = num_cache_leaves;
9225 +
9226 +	if (depth > NUM_CACHE_LEVELS)
9227 +		depth = NUM_CACHE_LEVELS;
9228 +
9229 +	for_each_online_cpu(cpu) {
9230 +		for (i = 0; i < depth; ++i) {
9231 +			chk = get_shared_cpu_map((struct cpumask *)&neigh_info[cpu].neighbors[i], cpu, i);
9232 +			if (chk) {
9233 +				/* failed */
9234 +				neigh_info[cpu].size[i] = 0;
9235 +			} else {
9236 +				/* size = num bits in mask */
9237 +				neigh_info[cpu].size[i] =
9238 +					cpumask_weight((struct cpumask *)&neigh_info[cpu].neighbors[i]);
9239 +			}
9240 +			printk("CPU %d has %d neighbors at level %d. (mask = %lx)\n",
9241 +							cpu, neigh_info[cpu].size[i], i,
9242 +							*cpumask_bits(neigh_info[cpu].neighbors[i]));
9243 +		}
9244 +
9245 +		/* set data for non-existent levels */
9246 +		for (; i < NUM_CACHE_LEVELS; ++i) {
9247 +			neigh_info[cpu].size[i] = 0;
9248 +
9249 +			printk("CPU %d has %d neighbors at level %d. (mask = %lx)\n",
9250 +						cpu, neigh_info[cpu].size[i], i, 0lu);
9251 +		}
9252 +	}
9253 +}
9254 +
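A minimal sketch of how a plugin might consult the neigh_info[] table built by init_topology(); the closest_shared_cpu() helper is hypothetical and assumes that neigh_info[], NUM_CACHE_LEVELS, and the size[]/neighbors[] fields are exposed through <litmus/affinity.h>:

	#include <linux/cpumask.h>
	#include <litmus/affinity.h>

	/* Hypothetical helper: starting from the closest cache level,
	 * return some other CPU that shares a cache with 'cpu', or 'cpu'
	 * itself if no such neighbor exists. */
	static int closest_shared_cpu(int cpu)
	{
		int level, other;

		for (level = 0; level < NUM_CACHE_LEVELS; ++level) {
			if (neigh_info[cpu].size[level] <= 1)
				continue;	/* level unavailable or only 'cpu' itself */
			for_each_cpu(other, neigh_info[cpu].neighbors[level]) {
				if (other != cpu)
					return other;
			}
		}
		return cpu;
	}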
9255 diff --git a/litmus/bheap.c b/litmus/bheap.c
9256 new file mode 100644
9257 index 0000000..528af97
9258 --- /dev/null
9259 +++ b/litmus/bheap.c
9260 @@ -0,0 +1,314 @@
9261 +#include "linux/kernel.h"
9262 +#include "litmus/bheap.h"
9263 +
9264 +void bheap_init(struct bheap* heap)
9265 +{
9266 +	heap->head = NULL;
9267 +	heap->min  = NULL;
9268 +}
9269 +
9270 +void bheap_node_init(struct bheap_node** _h, void* value)
9271 +{
9272 +	struct bheap_node* h = *_h;
9273 +	h->parent = NULL;
9274 +	h->next   = NULL;
9275 +	h->child  = NULL;
9276 +	h->degree = NOT_IN_HEAP;
9277 +	h->value  = value;
9278 +	h->ref    = _h;
9279 +}
9280 +
9281 +
9282 +/* make child a subtree of root */
9283 +static void __bheap_link(struct bheap_node* root,
9284 +			struct bheap_node* child)
9285 +{
9286 +	child->parent = root;
9287 +	child->next   = root->child;
9288 +	root->child   = child;
9289 +	root->degree++;
9290 +}
9291 +
9292 +/* merge root lists */
9293 +static  struct bheap_node* __bheap_merge(struct bheap_node* a,
9294 +					     struct bheap_node* b)
9295 +{
9296 +	struct bheap_node* head = NULL;
9297 +	struct bheap_node** pos = &head;
9298 +
9299 +	while (a && b) {
9300 +		if (a->degree < b->degree) {
9301 +			*pos = a;
9302 +			a = a->next;
9303 +		} else {
9304 +			*pos = b;
9305 +			b = b->next;
9306 +		}
9307 +		pos = &(*pos)->next;
9308 +	}
9309 +	if (a)
9310 +		*pos = a;
9311 +	else
9312 +		*pos = b;
9313 +	return head;
9314 +}
9315 +
9316 +/* reverse a linked list of nodes. also clears parent pointer */
9317 +static  struct bheap_node* __bheap_reverse(struct bheap_node* h)
9318 +{
9319 +	struct bheap_node* tail = NULL;
9320 +	struct bheap_node* next;
9321 +
9322 +	if (!h)
9323 +		return h;
9324 +
9325 +	h->parent = NULL;
9326 +	while (h->next) {
9327 +		next    = h->next;
9328 +		h->next = tail;
9329 +		tail    = h;
9330 +		h       = next;
9331 +		h->parent = NULL;
9332 +	}
9333 +	h->next = tail;
9334 +	return h;
9335 +}
9336 +
9337 +static  void __bheap_min(bheap_prio_t higher_prio, struct bheap* heap,
9338 +			      struct bheap_node** prev, struct bheap_node** node)
9339 +{
9340 +	struct bheap_node *_prev, *cur;
9341 +	*prev = NULL;
9342 +
9343 +	if (!heap->head) {
9344 +		*node = NULL;
9345 +		return;
9346 +	}
9347 +
9348 +	*node = heap->head;
9349 +	_prev = heap->head;
9350 +	cur   = heap->head->next;
9351 +	while (cur) {
9352 +		if (higher_prio(cur, *node)) {
9353 +			*node = cur;
9354 +			*prev = _prev;
9355 +		}
9356 +		_prev = cur;
9357 +		cur   = cur->next;
9358 +	}
9359 +}
9360 +
9361 +static  void __bheap_union(bheap_prio_t higher_prio, struct bheap* heap,
9362 +				struct bheap_node* h2)
9363 +{
9364 +	struct bheap_node* h1;
9365 +	struct bheap_node *prev, *x, *next;
9366 +	if (!h2)
9367 +		return;
9368 +	h1 = heap->head;
9369 +	if (!h1) {
9370 +		heap->head = h2;
9371 +		return;
9372 +	}
9373 +	h1 = __bheap_merge(h1, h2);
9374 +	prev = NULL;
9375 +	x    = h1;
9376 +	next = x->next;
9377 +	while (next) {
9378 +		if (x->degree != next->degree ||
9379 +		    (next->next && next->next->degree == x->degree)) {
9380 +			/* nothing to do, advance */
9381 +			prev = x;
9382 +			x    = next;
9383 +		} else if (higher_prio(x, next)) {
9384 +			/* x becomes the root of next */
9385 +			x->next = next->next;
9386 +			__bheap_link(x, next);
9387 +		} else {
9388 +			/* next becomes the root of x */
9389 +			if (prev)
9390 +				prev->next = next;
9391 +			else
9392 +				h1 = next;
9393 +			__bheap_link(next, x);
9394 +			x = next;
9395 +		}
9396 +		next = x->next;
9397 +	}
9398 +	heap->head = h1;
9399 +}
9400 +
9401 +static struct bheap_node* __bheap_extract_min(bheap_prio_t higher_prio,
9402 +					    struct bheap* heap)
9403 +{
9404 +	struct bheap_node *prev, *node;
9405 +	__bheap_min(higher_prio, heap, &prev, &node);
9406 +	if (!node)
9407 +		return NULL;
9408 +	if (prev)
9409 +		prev->next = node->next;
9410 +	else
9411 +		heap->head = node->next;
9412 +	__bheap_union(higher_prio, heap, __bheap_reverse(node->child));
9413 +	return node;
9414 +}
9415 +
9416 +/* insert (and reinitialize) a node into the heap */
9417 +void bheap_insert(bheap_prio_t higher_prio, struct bheap* heap,
9418 +		 struct bheap_node* node)
9419 +{
9420 +	struct bheap_node *min;
9421 +	node->child  = NULL;
9422 +	node->parent = NULL;
9423 +	node->next   = NULL;
9424 +	node->degree = 0;
9425 +	if (heap->min && higher_prio(node, heap->min)) {
9426 +		/* swap min cache */
9427 +		min = heap->min;
9428 +		min->child  = NULL;
9429 +		min->parent = NULL;
9430 +		min->next   = NULL;
9431 +		min->degree = 0;
9432 +		__bheap_union(higher_prio, heap, min);
9433 +		heap->min   = node;
9434 +	} else
9435 +		__bheap_union(higher_prio, heap, node);
9436 +}
9437 +
9438 +void bheap_uncache_min(bheap_prio_t higher_prio, struct bheap* heap)
9439 +{
9440 +	struct bheap_node* min;
9441 +	if (heap->min) {
9442 +		min = heap->min;
9443 +		heap->min = NULL;
9444 +		bheap_insert(higher_prio, heap, min);
9445 +	}
9446 +}
9447 +
9448 +/* merge addition into target */
9449 +void bheap_union(bheap_prio_t higher_prio,
9450 +		struct bheap* target, struct bheap* addition)
9451 +{
9452 +	/* first insert any cached minima, if necessary */
9453 +	bheap_uncache_min(higher_prio, target);
9454 +	bheap_uncache_min(higher_prio, addition);
9455 +	__bheap_union(higher_prio, target, addition->head);
9456 +	/* this is a destructive merge */
9457 +	addition->head = NULL;
9458 +}
9459 +
9460 +struct bheap_node* bheap_peek(bheap_prio_t higher_prio,
9461 +			    struct bheap* heap)
9462 +{
9463 +	if (!heap->min)
9464 +		heap->min = __bheap_extract_min(higher_prio, heap);
9465 +	return heap->min;
9466 +}
9467 +
9468 +struct bheap_node* bheap_take(bheap_prio_t higher_prio,
9469 +			    struct bheap* heap)
9470 +{
9471 +	struct bheap_node *node;
9472 +	if (!heap->min)
9473 +		heap->min = __bheap_extract_min(higher_prio, heap);
9474 +	node = heap->min;
9475 +	heap->min = NULL;
9476 +	if (node)
9477 +		node->degree = NOT_IN_HEAP;
9478 +	return node;
9479 +}
9480 +
9481 +int bheap_decrease(bheap_prio_t higher_prio, struct bheap_node* node)
9482 +{
9483 +	struct bheap_node  *parent;
9484 +	struct bheap_node** tmp_ref;
9485 +	void* tmp;
9486 +
9487 +	/* bubble up */
9488 +	parent = node->parent;
9489 +	while (parent && higher_prio(node, parent)) {
9490 +		/* swap parent and node */
9491 +		tmp           = parent->value;
9492 +		parent->value = node->value;
9493 +		node->value   = tmp;
9494 +		/* swap references */
9495 +		*(parent->ref) = node;
9496 +		*(node->ref)   = parent;
9497 +		tmp_ref        = parent->ref;
9498 +		parent->ref    = node->ref;
9499 +		node->ref      = tmp_ref;
9500 +		/* step up */
9501 +		node   = parent;
9502 +		parent = node->parent;
9503 +	}
9504 +
9505 +	return parent != NULL;
9506 +}
9507 +
9508 +void bheap_delete(bheap_prio_t higher_prio, struct bheap* heap,
9509 +		 struct bheap_node* node)
9510 +{
9511 +	struct bheap_node *parent, *prev, *pos;
9512 +	struct bheap_node** tmp_ref;
9513 +	void* tmp;
9514 +
9515 +	if (heap->min != node) {
9516 +		/* bubble up */
9517 +		parent = node->parent;
9518 +		while (parent) {
9519 +			/* swap parent and node */
9520 +			tmp           = parent->value;
9521 +			parent->value = node->value;
9522 +			node->value   = tmp;
9523 +			/* swap references */
9524 +			*(parent->ref) = node;
9525 +			*(node->ref)   = parent;
9526 +			tmp_ref        = parent->ref;
9527 +			parent->ref    = node->ref;
9528 +			node->ref      = tmp_ref;
9529 +			/* step up */
9530 +			node   = parent;
9531 +			parent = node->parent;
9532 +		}
9533 +		/* now delete:
9534 +		 * first find prev */
9535 +		prev = NULL;
9536 +		pos  = heap->head;
9537 +		while (pos != node) {
9538 +			prev = pos;
9539 +			pos  = pos->next;
9540 +		}
9541 +		/* we have prev, now remove node */
9542 +		if (prev)
9543 +			prev->next = node->next;
9544 +		else
9545 +			heap->head = node->next;
9546 +		__bheap_union(higher_prio, heap, __bheap_reverse(node->child));
9547 +	} else
9548 +		heap->min = NULL;
9549 +	node->degree = NOT_IN_HEAP;
9550 +}
9551 +
9552 +/* allocate a heap node for value and insert into the heap */
9553 +int bheap_add(bheap_prio_t higher_prio, struct bheap* heap,
9554 +	     void* value, int gfp_flags)
9555 +{
9556 +	struct bheap_node* hn = bheap_node_alloc(gfp_flags);
9557 +	if (likely(hn)) {
9558 +		bheap_node_init(&hn, value);
9559 +		bheap_insert(higher_prio, heap, hn);
9560 +	}
9561 +	return hn != NULL;
9562 +}
9563 +
9564 +void* bheap_take_del(bheap_prio_t higher_prio,
9565 +		    struct bheap* heap)
9566 +{
9567 +	struct bheap_node* hn = bheap_take(higher_prio, heap);
9568 +	void* ret = NULL;
9569 +	if (hn) {
9570 +		ret = hn->value;
9571 +		bheap_node_free(hn);
9572 +	}
9573 +	return ret;
9574 +}
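A minimal sketch of how the binomial-heap API above is driven; the struct item element and its comparator are hypothetical, and bheap_node_alloc()/bheap_node_free() are assumed to be declared in litmus/bheap.h, since bheap_add()/bheap_take_del() above already rely on them:

	#include <linux/gfp.h>
	#include <litmus/bheap.h>

	struct item {
		int prio;
		struct bheap_node *hn;	/* back-reference kept up to date via h->ref */
	};

	/* bheap_prio_t comparator: non-zero if 'a' has higher priority. */
	static int item_higher_prio(struct bheap_node *a, struct bheap_node *b)
	{
		struct item *ia = a->value;
		struct item *ib = b->value;
		return ia->prio < ib->prio;	/* smaller value = higher priority */
	}

	static void bheap_example(void)
	{
		struct bheap heap;
		struct item it = { .prio = 42 };
		struct bheap_node *n;

		bheap_init(&heap);

		it.hn = bheap_node_alloc(GFP_ATOMIC);
		if (!it.hn)
			return;
		bheap_node_init(&it.hn, &it);
		bheap_insert(item_higher_prio, &heap, it.hn);

		n = bheap_take(item_higher_prio, &heap);	/* extracts the minimum */
		if (n)
			bheap_node_free(n);
	}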
9575 diff --git a/litmus/binheap.c b/litmus/binheap.c
9576 new file mode 100644
9577 index 0000000..8d42403
9578 --- /dev/null
9579 +++ b/litmus/binheap.c
9580 @@ -0,0 +1,443 @@
9581 +#include <litmus/binheap.h>
9582 +
9583 +//extern void dump_node_data(struct binheap_node* parent, struct binheap_node* child);
9584 +//extern void dump_node_data2(struct binheap_handle *handle, struct binheap_node* bad_node);
9585 +
9586 +int binheap_is_in_this_heap(struct binheap_node *node,
9587 +	struct binheap_handle* heap)
9588 +{
9589 +	if(!binheap_is_in_heap(node)) {
9590 +		return 0;
9591 +	}
9592 +
9593 +	while(node->parent != NULL) {
9594 +		node = node->parent;
9595 +	}
9596 +
9597 +	return (node == heap->root);
9598 +}
9599 +
9600 +/* Update the node reference pointers.  Same logic as Litmus binomial heap. */
9601 +static void __update_ref(struct binheap_node *parent,
9602 +	struct binheap_node *child)
9603 +{
9604 +	*(parent->ref_ptr) = child;
9605 +	*(child->ref_ptr) = parent;
9606 +
9607 +	swap(parent->ref_ptr, child->ref_ptr);
9608 +}
9609 +
9610 +/* Swaps data between two nodes. */
9611 +static void __binheap_swap(struct binheap_node *parent,
9612 +	struct binheap_node *child)
9613 +{
9614 +//	if(parent == BINHEAP_POISON || child == BINHEAP_POISON) {
9615 +//		dump_node_data(parent, child);
9616 +//		BUG();
9617 +//	}
9618 +
9619 +	swap(parent->data, child->data);
9620 +	__update_ref(parent, child);
9621 +}
9622 +
9623 +
9624 +/* Swaps memory and data between two nodes. Actual nodes swap instead of
9625 + * just data.  Needed when we delete nodes from the heap.
9626 + */
9627 +static void __binheap_swap_safe(struct binheap_handle *handle,
9628 +	struct binheap_node *a,
9629 +	struct binheap_node *b)
9630 +{
9631 +	swap(a->data, b->data);
9632 +	__update_ref(a, b);
9633 +
9634 +	if((a->parent != NULL) && (a->parent == b->parent)) {
9635 +		/* special case: shared parent */
9636 +		swap(a->parent->left, a->parent->right);
9637 +	}
9638 +	else {
9639 +		/* Update pointers to swap parents. */
9640 +
9641 +		if(a->parent) {
9642 +			if(a == a->parent->left) {
9643 +				a->parent->left = b;
9644 +			}
9645 +			else {
9646 +				a->parent->right = b;
9647 +			}
9648 +		}
9649 +
9650 +		if(b->parent) {
9651 +			if(b == b->parent->left) {
9652 +				b->parent->left = a;
9653 +			}
9654 +			else {
9655 +				b->parent->right = a;
9656 +			}
9657 +		}
9658 +
9659 +		swap(a->parent, b->parent);
9660 +	}
9661 +
9662 +	/* swap children */
9663 +
9664 +	if(a->left) {
9665 +		a->left->parent = b;
9666 +
9667 +		if(a->right) {
9668 +			a->right->parent = b;
9669 +		}
9670 +	}
9671 +
9672 +	if(b->left) {
9673 +		b->left->parent = a;
9674 +
9675 +		if(b->right) {
9676 +			b->right->parent = a;
9677 +		}
9678 +	}
9679 +
9680 +	swap(a->left, b->left);
9681 +	swap(a->right, b->right);
9682 +
9683 +
9684 +	/* update next/last/root pointers */
9685 +
9686 +	if(a == handle->next) {
9687 +		handle->next = b;
9688 +	}
9689 +	else if(b == handle->next) {
9690 +		handle->next = a;
9691 +	}
9692 +
9693 +	if(a == handle->last) {
9694 +		handle->last = b;
9695 +	}
9696 +	else if(b == handle->last) {
9697 +		handle->last = a;
9698 +	}
9699 +
9700 +	if(a == handle->root) {
9701 +		handle->root = b;
9702 +	}
9703 +	else if(b == handle->root) {
9704 +		handle->root = a;
9705 +	}
9706 +}
9707 +
9708 +
9709 +/**
9710 + * Update the pointer to the last node in the complete binary tree.
9711 + * Called internally after the root node has been deleted.
9712 + */
9713 +static void __binheap_update_last(struct binheap_handle *handle)
9714 +{
9715 +	struct binheap_node *temp = handle->last;
9716 +
9717 +	/* find a "bend" in the tree. */
9718 +	while(temp->parent && (temp == temp->parent->left)) {
9719 +		temp = temp->parent;
9720 +	}
9721 +
9722 +	/* step over to sibling if we're not at root */
9723 +	if(temp->parent != NULL) {
9724 +		temp = temp->parent->left;
9725 +	}
9726 +
9727 +	/* now travel right as far as possible. */
9728 +	while(temp->right != NULL) {
9729 +		temp = temp->right;
9730 +	}
9731 +
9732 +	/* take one step to the left if we're not at the bottom-most level. */
9733 +	if(temp->left != NULL) {
9734 +		temp = temp->left;
9735 +	}
9736 +
9737 +	//BUG_ON(!(temp->left == NULL && temp->right == NULL));
9738 +
9739 +	handle->last = temp;
9740 +}
9741 +
9742 +/**
9743 + * Update the pointer to the node that will take the next inserted node.
9744 + * Called internally after a node has been inserted.
9745 + */
9746 +static void __binheap_update_next(struct binheap_handle *handle)
9747 +{
9748 +	struct binheap_node *temp = handle->next;
9749 +
9750 +	/* find a "bend" in the tree. */
9751 +	while(temp->parent && (temp == temp->parent->right)) {
9752 +		temp = temp->parent;
9753 +	}
9754 +
9755 +	/* step over to sibling if we're not at root */
9756 +	if(temp->parent != NULL) {
9757 +		temp = temp->parent->right;
9758 +	}
9759 +
9760 +	/* now travel left as far as possible. */
9761 +	while(temp->left != NULL) {
9762 +		temp = temp->left;
9763 +	}
9764 +
9765 +	handle->next = temp;
9766 +}
9767 +
9768 +
9769 +
9770 +/* bubble node up towards root */
9771 +static void __binheap_bubble_up(
9772 +	struct binheap_handle *handle,
9773 +	struct binheap_node *node)
9774 +{
9775 +	//BUG_ON(!binheap_is_in_heap(node));
9776 +//	if(!binheap_is_in_heap(node))
9777 +//	{
9778 +//		dump_node_data2(handle, node);
9779 +//		BUG();
9780 +//	}
9781 +
9782 +	while((node->parent != NULL) &&
9783 +		  ((node->data == BINHEAP_POISON) /* let BINHEAP_POISON data bubble to the top */ ||
9784 +		   handle->compare(node, node->parent))) {
9785 +			  __binheap_swap(node->parent, node);
9786 +			  node = node->parent;
9787 +
9788 +//			  if(!binheap_is_in_heap(node))
9789 +//			  {
9790 +//				  dump_node_data2(handle, node);
9791 +//				  BUG();
9792 +//			  }
9793 +	}
9794 +}
9795 +
9796 +
9797 +/* bubble the root node down, swapping with its higher-priority child (per handle->compare) */
9798 +static void __binheap_bubble_down(struct binheap_handle *handle)
9799 +{
9800 +	struct binheap_node *node = handle->root;
9801 +
9802 +	while(node->left != NULL) {
9803 +		if(node->right && handle->compare(node->right, node->left)) {
9804 +			if(handle->compare(node->right, node)) {
9805 +				__binheap_swap(node, node->right);
9806 +				node = node->right;
9807 +			}
9808 +			else {
9809 +				break;
9810 +			}
9811 +		}
9812 +		else {
9813 +			if(handle->compare(node->left, node)) {
9814 +				__binheap_swap(node, node->left);
9815 +				node = node->left;
9816 +			}
9817 +			else {
9818 +				break;
9819 +			}
9820 +		}
9821 +	}
9822 +}
9823 +
9824 +
9825 +
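+/* Insert 'new_node' (carrying 'data') at the next free leaf position and
+ * restore the heap property by bubbling it up toward the root. */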
9826 +void __binheap_add(struct binheap_node *new_node,
9827 +	struct binheap_handle *handle,
9828 +	void *data)
9829 +{
9830 +//	if(binheap_is_in_heap(new_node))
9831 +//	{
9832 +//		dump_node_data2(handle, new_node);
9833 +//		BUG();
9834 +//	}
9835 +
9836 +	new_node->data = data;
9837 +	new_node->ref = new_node;
9838 +	new_node->ref_ptr = &(new_node->ref);
9839 +
9840 +	if(!binheap_empty(handle)) {
9841 +		/* insert left side first */
9842 +		if(handle->next->left == NULL) {
9843 +			handle->next->left = new_node;
9844 +			new_node->parent = handle->next;
9845 +			new_node->left = NULL;
9846 +			new_node->right = NULL;
9847 +
9848 +			handle->last = new_node;
9849 +
9850 +			__binheap_bubble_up(handle, new_node);
9851 +		}
9852 +		else {
9853 +			/* left occupied. insert right. */
9854 +			handle->next->right = new_node;
9855 +			new_node->parent = handle->next;
9856 +			new_node->left = NULL;
9857 +			new_node->right = NULL;
9858 +
9859 +			handle->last = new_node;
9860 +
9861 +			__binheap_update_next(handle);
9862 +			__binheap_bubble_up(handle, new_node);
9863 +		}
9864 +	}
9865 +	else {
9866 +		/* first node in heap */
9867 +
9868 +		new_node->parent = NULL;
9869 +		new_node->left = NULL;
9870 +		new_node->right = NULL;
9871 +
9872 +		handle->root = new_node;
9873 +		handle->next = new_node;
9874 +		handle->last = new_node;
9875 +	}
9876 +}
9877 +
9878 +
9879 +
9880 +/**
9881 + * Removes the root node from the heap. If 'container' is not currently the
9882 + * node holding the root position (node contents get shuffled by swaps), it is
9883 + * first swapped into the root so that the caller's binheap_node is the one
9884 + * physically removed. The 'last' node in the tree is then moved up to the
9885 + * root and bubbled down.
9886 + */
9887 +void __binheap_delete_root(struct binheap_handle *handle,
9888 +	struct binheap_node *container)
9889 +{
9890 +	struct binheap_node *root = handle->root;
9891 +
9892 +//	if(!binheap_is_in_heap(container))
9893 +//	{
9894 +//		dump_node_data2(handle, container);
9895 +//		BUG();
9896 +//	}
9897 +
9898 +	if(root != container) {
9899 +		/* coalesce */
9900 +		__binheap_swap_safe(handle, root, container);
9901 +		root = container;
9902 +	}
9903 +
9904 +	if(handle->last != root) {
9905 +		/* swap 'last' node up to root and bubble it down. */
9906 +
9907 +		struct binheap_node *to_move = handle->last;
9908 +
9909 +		if(to_move->parent != root) {
9910 +			handle->next = to_move->parent;
9911 +
9912 +			if(handle->next->right == to_move) {
9913 +				/* disconnect from parent */
9914 +				to_move->parent->right = NULL;
9915 +				handle->last = handle->next->left;
9916 +			}
9917 +			else {
9918 +				/* find new 'last' before we disconnect */
9919 +				__binheap_update_last(handle);
9920 +
9921 +				/* disconnect from parent */
9922 +				to_move->parent->left = NULL;
9923 +			}
9924 +		}
9925 +		else {
9926 +			/* 'last' is direct child of root */
9927 +
9928 +			handle->next = to_move;
9929 +
9930 +			if(to_move == to_move->parent->right) {
9931 +				to_move->parent->right = NULL;
9932 +				handle->last = to_move->parent->left;
9933 +			}
9934 +			else {
9935 +				to_move->parent->left = NULL;
9936 +				handle->last = to_move;
9937 +			}
9938 +		}
9939 +		to_move->parent = NULL;
9940 +
9941 +		/* reconnect as root.  We can't just swap data ptrs since root node
9942 +		 * may be freed after this function returns.
9943 +		 */
9944 +		to_move->left = root->left;
9945 +		to_move->right = root->right;
9946 +		if(to_move->left != NULL) {
9947 +			to_move->left->parent = to_move;
9948 +		}
9949 +		if(to_move->right != NULL) {
9950 +			to_move->right->parent = to_move;
9951 +		}
9952 +
9953 +		handle->root = to_move;
9954 +
9955 +		/* bubble down */
9956 +		__binheap_bubble_down(handle);
9957 +	}
9958 +	else {
9959 +		/* removing last node in tree */
9960 +		handle->root = NULL;
9961 +		handle->next = NULL;
9962 +		handle->last = NULL;
9963 +	}
9964 +
9965 +	/* mark as removed */
9966 +	container->parent = BINHEAP_POISON;
9967 +}
9968 +
9969 +
9970 +/**
9971 + * Delete an arbitrary node.  The node's data is temporarily poisoned so that
9972 + * it bubbles up to the root, where it is then removed; the original data pointer is restored.
9973 + */
9974 +void __binheap_delete(struct binheap_node *node_to_delete,
9975 +	struct binheap_handle *handle)
9976 +{
9977 +	struct binheap_node *target = node_to_delete->ref;
9978 +	void *temp_data = target->data;
9979 +
9980 +//	if(!binheap_is_in_heap(node_to_delete))
9981 +//	{
9982 +//		dump_node_data2(handle, node_to_delete);
9983 +//		BUG();
9984 +//	}
9985 +//
9986 +//	if(!binheap_is_in_heap(target))
9987 +//	{
9988 +//		dump_node_data2(handle, target);
9989 +//		BUG();
9990 +//	}
9991 +
9992 +	/* temporarily poison the data pointer so the node bubbles up to the root. */
9993 +	target->data = BINHEAP_POISON;
9994 +
9995 +	__binheap_bubble_up(handle, target);
9996 +	__binheap_delete_root(handle, node_to_delete);
9997 +
9998 +	node_to_delete->data = temp_data;  /* restore node data pointer */
9999 +	//node_to_delete->parent = BINHEAP_POISON; /* poison the node */
10000 +}
10001 +
10002 +/**
10003 + * Bubble up a node whose priority has increased under the heap's comparator.
10004 + */
10005 +void __binheap_decrease(struct binheap_node *orig_node,
10006 +	struct binheap_handle *handle)
10007 +{
10008 +	struct binheap_node *target = orig_node->ref;
10009 +
10010 +//	if(!binheap_is_in_heap(orig_node))
10011 +//	{
10012 +//		dump_node_data2(handle, orig_node);
10013 +//		BUG();
10014 +//	}
10015 +//
10016 +//	if(!binheap_is_in_heap(target))
10017 +//	{
10018 +//		dump_node_data2(handle, target);
10019 +//		BUG();
10020 +//	}
10021 +//
10022 +	__binheap_bubble_up(handle, target);
10023 +}
10024 diff --git a/litmus/budget.c b/litmus/budget.c
10025 new file mode 100644
10026 index 0000000..310e9a3
10027 --- /dev/null
10028 +++ b/litmus/budget.c
10029 @@ -0,0 +1,111 @@
10030 +#include <linux/sched.h>
10031 +#include <linux/percpu.h>
10032 +#include <linux/hrtimer.h>
10033 +
10034 +#include <litmus/litmus.h>
10035 +#include <litmus/preempt.h>
10036 +
10037 +struct enforcement_timer {
10038 +	/* The enforcement timer is used to accurately police
10039 +	 * slice budgets. */
10040 +	struct hrtimer		timer;
10041 +	int			armed;
10042 +};
10043 +
10044 +DEFINE_PER_CPU(struct enforcement_timer, budget_timer);
10045 +
10046 +static enum hrtimer_restart on_enforcement_timeout(struct hrtimer *timer)
10047 +{
10048 +	struct enforcement_timer* et = container_of(timer,
10049 +						    struct enforcement_timer,
10050 +						    timer);
10051 +	unsigned long flags;
10052 +
10053 +	local_irq_save(flags);
10054 +	TRACE("enforcement timer fired.\n");
10055 +	et->armed = 0;
10056 +	/* activate scheduler */
10057 +	litmus_reschedule_local();
10058 +	local_irq_restore(flags);
10059 +
10060 +	return  HRTIMER_NORESTART;
10061 +}
10062 +
10063 +/* assumes called with IRQs off */
10064 +static void cancel_enforcement_timer(struct enforcement_timer* et)
10065 +{
10066 +	int ret;
10067 +
10068 +	TRACE("cancelling enforcement timer.\n");
10069 +
10070 +	/* Since interrupts are disabled and et->armed is only
10071 +	 * modified locally, we do not need any locks.
10072 +	 */
10073 +
10074 +	if (et->armed) {
10075 +		ret = hrtimer_try_to_cancel(&et->timer);
10076 +		/* Should never be inactive. */
10077 +		BUG_ON(ret == 0);
10078 +		/* Should never be running concurrently. */
10079 +		BUG_ON(ret == -1);
10080 +
10081 +		et->armed = 0;
10082 +	}
10083 +}
10084 +
10085 +/* assumes called with IRQs off */
10086 +static void arm_enforcement_timer(struct enforcement_timer* et,
10087 +				  struct task_struct* t)
10088 +{
10089 +	lt_t when_to_fire;
10090 +	TRACE_TASK(t, "arming enforcement timer.\n");
10091 +
10092 +	/* Calling this when there is no budget left for the task
10093 +	 * makes no sense, unless the task is non-preemptive. */
10094 +	BUG_ON(budget_exhausted(t) && (!is_np(t)));
10095 +
10096 +	/* __hrtimer_start_range_ns() cancels the timer
10097 +	 * anyway, so we don't have to check whether it is still armed */
10098 +
10099 +	if (likely(!is_np(t))) {
10100 +		when_to_fire = litmus_clock() + budget_remaining(t);
10101 +		__hrtimer_start_range_ns(&et->timer,
10102 +					 ns_to_ktime(when_to_fire),
10103 +					 0 /* delta */,
10104 +					 HRTIMER_MODE_ABS_PINNED,
10105 +					 0 /* no wakeup */);
10106 +		et->armed = 1;
10107 +	}
10108 +}
10109 +
10110 +
10111 +/* expects to be called with IRQs off */
10112 +void update_enforcement_timer(struct task_struct* t)
10113 +{
10114 +	struct enforcement_timer* et = &__get_cpu_var(budget_timer);
10115 +
10116 +	if (t && budget_precisely_enforced(t)) {
10117 +		/* Make sure we call into the scheduler when this budget
10118 +		 * expires. */
10119 +		arm_enforcement_timer(et, t);
10120 +	} else if (et->armed) {
10121 +		/* Make sure we don't cause unnecessary interrupts. */
10122 +		cancel_enforcement_timer(et);
10123 +	}
10124 +}
10125 +
10126 +
10127 +static int __init init_budget_enforcement(void)
10128 +{
10129 +	int cpu;
10130 +	struct enforcement_timer* et;
10131 +
10132 +	for (cpu = 0; cpu < NR_CPUS; cpu++)  {
10133 +		et = &per_cpu(budget_timer, cpu);
10134 +		hrtimer_init(&et->timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
10135 +		et->timer.function = on_enforcement_timeout;
10136 +	}
10137 +	return 0;
10138 +}
10139 +
10140 +module_init(init_budget_enforcement);
10141 diff --git a/litmus/clustered.c b/litmus/clustered.c
10142 new file mode 100644
10143 index 0000000..6fe1b51
10144 --- /dev/null
10145 +++ b/litmus/clustered.c
10146 @@ -0,0 +1,111 @@
10147 +#include <linux/gfp.h>
10148 +#include <linux/cpumask.h>
10149 +#include <linux/list.h>
10150 +
10151 +#include <litmus/clustered.h>
10152 +
10153 +#ifndef CONFIG_X86
10154 +/* fake get_shared_cpu_map() on non-x86 architectures */
10155 +
10156 +int get_shared_cpu_map(cpumask_var_t mask, unsigned int cpu, int index)
10157 +{
10158 +	if (index != 1)
10159 +		return 1;
10160 +	else {
10161 +		/* Fake L1: CPU is all by itself. */
10162 +		cpumask_clear(mask);
10163 +		cpumask_set_cpu(cpu, mask);
10164 +		return 0;
10165 +	}
10166 +}
10167 +
10168 +#endif
10169 +
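+/* Number of CPUs that share a cache at the given level; for GLOBAL_CLUSTER
+ * this is simply the number of online CPUs. Returns a negative errno on failure. */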
10170 +int get_cluster_size(enum cache_level level)
10171 +{
10172 +	cpumask_var_t mask;
10173 +	int ok;
10174 +	int num_cpus;
10175 +
10176 +	if (level == GLOBAL_CLUSTER)
10177 +		return num_online_cpus();
10178 +	else {
10179 +		if (!zalloc_cpumask_var(&mask, GFP_ATOMIC))
10180 +			return -ENOMEM;
10181 +		/* assumes CPU 0 is representative of all CPUs */
10182 +		ok = get_shared_cpu_map(mask, 0, level);
10183 +		/* ok == 0 means we got the map; otherwise it's an invalid cache level */
10184 +		if (ok == 0)
10185 +			num_cpus = cpumask_weight(mask);
10186 +		free_cpumask_var(mask);
10187 +
10188 +		if (ok == 0)
10189 +			return num_cpus;
10190 +		else
10191 +			return -EINVAL;
10192 +	}
10193 +}
10194 +
10195 +int assign_cpus_to_clusters(enum cache_level level,
10196 +			    struct scheduling_cluster* clusters[],
10197 +			    unsigned int num_clusters,
10198 +			    struct cluster_cpu* cpus[],
10199 +			    unsigned int num_cpus)
10200 +{
10201 +	cpumask_var_t mask;
10202 +	unsigned int i, free_cluster = 0, low_cpu;
10203 +	int err = 0;
10204 +
10205 +	if (!zalloc_cpumask_var(&mask, GFP_ATOMIC))
10206 +		return -ENOMEM;
10207 +
10208 +	/* clear cluster pointers */
10209 +	for (i = 0; i < num_cpus; i++) {
10210 +		cpus[i]->id      = i;
10211 +		cpus[i]->cluster = NULL;
10212 +	}
10213 +
10214 +	/* initialize clusters */
10215 +	for (i = 0; i < num_clusters; i++) {
10216 +		clusters[i]->id = i;
10217 +		INIT_LIST_HEAD(&clusters[i]->cpus);
10218 +	}
10219 +
10220 +	/* Assign each CPU. Two assumptions are made:
10221 +	 * 1) The index of a cpu in cpus corresponds to its processor id (i.e., the index in a cpu mask).
10222 +	 * 2) All cpus that belong to some cluster are online.
10223 +	 */
10224 +	for_each_online_cpu(i) {
10225 +		/* get lowest-id CPU in cluster */
10226 +		if (level != GLOBAL_CLUSTER) {
10227 +			err = get_shared_cpu_map(mask, cpus[i]->id, level);
10228 +			if (err != 0) {
10229 +				/* ugh... wrong cache level? Either caller screwed up
10230 +				 * or the CPU topology is weird. */
10231 +				printk(KERN_ERR "Could not set up clusters for L%d sharing (max: L%d).\n",
10232 +				       level, err);
10233 +				err = -EINVAL;
10234 +				goto out;
10235 +			}
10236 +			low_cpu = cpumask_first(mask);
10237 +		} else
10238 +			low_cpu = 0;
10239 +		if (low_cpu == i) {
10240 +			/* caller must provide an appropriate number of clusters */
10241 +			BUG_ON(free_cluster >= num_clusters);
10242 +
10243 +			/* create new cluster */
10244 +			cpus[i]->cluster = clusters[free_cluster++];
10245 +		} else {
10246 +			/* low_cpu points to the right cluster
10247 +			 * Assumption: low_cpu is actually online and was processed earlier. */
10248 +			cpus[i]->cluster = cpus[low_cpu]->cluster;
10249 +		}
10250 +		/* enqueue in cpus list */
10251 +		list_add_tail(&cpus[i]->cluster_list, &cpus[i]->cluster->cpus);
10252 +		printk(KERN_INFO "Assigning CPU%u to cluster %u.\n", i, cpus[i]->cluster->id);
10253 +	}
10254 +out:
10255 +	free_cpumask_var(mask);
10256 +	return err;
10257 +}
10258 diff --git a/litmus/ctrldev.c b/litmus/ctrldev.c
10259 new file mode 100644
10260 index 0000000..6677a67
10261 --- /dev/null
10262 +++ b/litmus/ctrldev.c
10263 @@ -0,0 +1,150 @@
10264 +#include <linux/sched.h>
10265 +#include <linux/mm.h>
10266 +#include <linux/fs.h>
10267 +#include <linux/miscdevice.h>
10268 +#include <linux/module.h>
10269 +
10270 +#include <litmus/litmus.h>
10271 +
10272 +/* only one page for now, but we might want to add a RO version at some point */
10273 +
10274 +#define CTRL_NAME        "litmus/ctrl"
10275 +
10276 +/* allocate t->rt_param.ctrl_page*/
10277 +static int alloc_ctrl_page(struct task_struct *t)
10278 +{
10279 +	int err = 0;
10280 +
10281 +	/* only allocate if the task doesn't have one yet */
10282 +	if (!tsk_rt(t)->ctrl_page) {
10283 +		tsk_rt(t)->ctrl_page = (void*) get_zeroed_page(GFP_KERNEL);
10284 +		if (!tsk_rt(t)->ctrl_page)
10285 +			err = -ENOMEM;
10286 +		/* will get de-allocated in task teardown */
10287 +		TRACE_TASK(t, "%s ctrl_page = %p\n", __FUNCTION__,
10288 +			   tsk_rt(t)->ctrl_page);
10289 +	}
10290 +	return err;
10291 +}
10292 +
10293 +static int map_ctrl_page(struct task_struct *t, struct vm_area_struct* vma)
10294 +{
10295 +	int err;
10296 +	unsigned long pfn;
10297 +
10298 +	struct page* ctrl = virt_to_page(tsk_rt(t)->ctrl_page);
10299 +
10300 +	/* Increase ref count. Is decreased when vma is destroyed. */
10301 +	get_page(ctrl);
10302 +
10303 +	/* compute page frame number */
10304 +	pfn = page_to_pfn(ctrl);
10305 +
10306 +	TRACE_CUR(CTRL_NAME
10307 +		  ": mapping %p (pfn:%lx, %lx) to 0x%lx (prot:%lx)\n",
10308 +		  tsk_rt(t)->ctrl_page, pfn, page_to_pfn(ctrl), vma->vm_start,
10309 +		  vma->vm_page_prot);
10310 +
10311 +	/* Map it into the vma. Make sure to use PAGE_SHARED, otherwise
10312 +	 * userspace actually gets a copy-on-write page. */
10313 +	err = remap_pfn_range(vma, vma->vm_start, pfn, PAGE_SIZE, PAGE_SHARED);
10314 +
10315 +	if (err)
10316 +		TRACE_CUR(CTRL_NAME ": remap_pfn_range() failed (%d)\n", err);
10317 +
10318 +	return err;
10319 +}
10320 +
10321 +static void litmus_ctrl_vm_close(struct vm_area_struct* vma)
10322 +{
10323 +	TRACE_CUR("%s flags=0x%x prot=0x%x\n", __FUNCTION__,
10324 +		  vma->vm_flags, vma->vm_page_prot);
10325 +
10326 +	TRACE_CUR(CTRL_NAME
10327 +		  ": %p:%p vma:%p vma->vm_private_data:%p closed by %s/%d.\n",
10328 +		  (void*) vma->vm_start, (void*) vma->vm_end, vma,
10329 +		  vma->vm_private_data, current->comm,
10330 +		  current->pid);
10331 +}
10332 +
10333 +static int litmus_ctrl_vm_fault(struct vm_area_struct* vma,
10334 +				      struct vm_fault* vmf)
10335 +{
10336 +	/* This function should never be called, since
10337 +	 * all pages should have been mapped by mmap()
10338 +	 * already. */
10339 +	TRACE_CUR("%s flags=0x%x\n", __FUNCTION__, vma->vm_flags);
10340 +
10341 +	/* nope, you only get one page */
10342 +	return VM_FAULT_SIGBUS;
10343 +}
10344 +
10345 +static struct vm_operations_struct litmus_ctrl_vm_ops = {
10346 +	.close = litmus_ctrl_vm_close,
10347 +	.fault = litmus_ctrl_vm_fault,
10348 +};
10349 +
10350 +static int litmus_ctrl_mmap(struct file* filp, struct vm_area_struct* vma)
10351 +{
10352 +	int err = 0;
10353 +
10354 +	/* first make sure mapper knows what he's doing */
10355 +
10356 +	/* you can only get one page */
10357 +	if (vma->vm_end - vma->vm_start != PAGE_SIZE)
10358 +		return -EINVAL;
10359 +
10360 +	/* you can only map the "first" page */
10361 +	if (vma->vm_pgoff != 0)
10362 +		return -EINVAL;
10363 +
10364 +	/* you can't share it with anyone */
10365 +	if (vma->vm_flags & (VM_MAYSHARE | VM_SHARED))
10366 +		return -EINVAL;
10367 +
10368 +	vma->vm_ops = &litmus_ctrl_vm_ops;
10369 +	/* this mapping should not be kept across forks,
10370 +	 * and cannot be expanded */
10371 +	vma->vm_flags |= VM_DONTCOPY | VM_DONTEXPAND;
10372 +
10373 +	err = alloc_ctrl_page(current);
10374 +	if (!err)
10375 +		err = map_ctrl_page(current, vma);
10376 +
10377 +	TRACE_CUR("%s flags=0x%x prot=0x%lx\n",
10378 +		  __FUNCTION__, vma->vm_flags, vma->vm_page_prot);
10379 +
10380 +	return err;
10381 +}
10382 +
10383 +static struct file_operations litmus_ctrl_fops = {
10384 +	.owner = THIS_MODULE,
10385 +	.mmap  = litmus_ctrl_mmap,
10386 +};
10387 +
10388 +static struct miscdevice litmus_ctrl_dev = {
10389 +	.name  = CTRL_NAME,
10390 +	.minor = MISC_DYNAMIC_MINOR,
10391 +	.fops  = &litmus_ctrl_fops,
10392 +};
10393 +
10394 +static int __init init_litmus_ctrl_dev(void)
10395 +{
10396 +	int err;
10397 +
10398 +	BUILD_BUG_ON(sizeof(struct control_page) > PAGE_SIZE);
10399 +
10400 +	printk("Initializing LITMUS^RT control device.\n");
10401 +	err = misc_register(&litmus_ctrl_dev);
10402 +	if (err)
10403 +		printk("Could not allocate %s device (%d).\n", CTRL_NAME, err);
10404 +	return err;
10405 +}
10406 +
10407 +static void __exit exit_litmus_ctrl_dev(void)
10408 +{
10409 +	misc_deregister(&litmus_ctrl_dev);
10410 +}
10411 +
10412 +module_init(init_litmus_ctrl_dev);
10413 +module_exit(exit_litmus_ctrl_dev);
10414 diff --git a/litmus/edf_common.c b/litmus/edf_common.c
10415 new file mode 100644
10416 index 0000000..b346bdd
10417 --- /dev/null
10418 +++ b/litmus/edf_common.c
10419 @@ -0,0 +1,211 @@
10420 +/*
10421 + * litmus/edf_common.c
10422 + *
10423 + * Common functions for EDF based scheduler.
10424 + */
10425 +
10426 +#include <linux/percpu.h>
10427 +#include <linux/sched.h>
10428 +#include <linux/list.h>
10429 +
10430 +#include <litmus/litmus.h>
10431 +#include <litmus/sched_plugin.h>
10432 +#include <litmus/sched_trace.h>
10433 +
10434 +#ifdef CONFIG_LITMUS_NESTED_LOCKING
10435 +#include <litmus/locking.h>
10436 +#endif
10437 +
10438 +#include <litmus/edf_common.h>
10439 +
10440 +
10441 +
10442 +/* edf_higher_prio -  returns true if first has a higher EDF priority
10443 + *                    than second. Deadline ties are broken by period, then by PID, and finally by inheritance (and proxy-thread) status.
10444 + *
10445 + * both first and second may be NULL
10446 + */
10447 +#ifdef CONFIG_LITMUS_NESTED_LOCKING
10448 +int __edf_higher_prio(
10449 +	struct task_struct* first, comparison_mode_t first_mode,
10450 +	struct task_struct* second, comparison_mode_t second_mode)
10451 +#else
10452 +int edf_higher_prio(struct task_struct* first, struct task_struct* second)
10453 +#endif
10454 +{
10455 +	struct task_struct *first_task = first;
10456 +	struct task_struct *second_task = second;
10457 +
10458 +	/* There is no point in comparing a task to itself. */
10459 +	if (first && first == second) {
10460 +		TRACE_CUR("WARNING: pointless edf priority comparison: %s/%d\n", first->comm, first->pid);
10461 +		WARN_ON(1);
10462 +		return 0;
10463 +	}
10464 +
10465 +
10466 +	/* check for NULL tasks */
10467 +	if (!first || !second) {
10468 +		return first && !second;
10469 +	}
10470 +
10471 +#ifdef CONFIG_LITMUS_LOCKING
10472 +	/* Check for EFFECTIVE priorities. Change task
10473 +	 * used for comparison in such a case.
10474 +	 */
10475 +	if (unlikely(first->rt_param.inh_task)
10476 +#ifdef CONFIG_LITMUS_NESTED_LOCKING
10477 +		&& (first_mode == EFFECTIVE)
10478 +#endif
10479 +		) {
10480 +		first_task = first->rt_param.inh_task;
10481 +	}
10482 +	if (unlikely(second->rt_param.inh_task)
10483 +#ifdef CONFIG_LITMUS_NESTED_LOCKING
10484 +		&& (second_mode == EFFECTIVE)
10485 +#endif
10486 +		) {
10487 +		second_task = second->rt_param.inh_task;
10488 +	}
10489 +
10490 +	/* Check for priority boosting. Tie-break by start of boosting.
10491 +	 */
10492 +	if (unlikely(is_priority_boosted(first_task))) {
10493 +		/* first_task is boosted, how about second_task? */
10494 +		if (!is_priority_boosted(second_task) ||
10495 +		    lt_before(get_boost_start(first_task),
10496 +					  get_boost_start(second_task))) {
10497 +			return 1;
10498 +		}
10499 +		else {
10500 +			return 0;
10501 +		}
10502 +	}
10503 +	else if (unlikely(is_priority_boosted(second_task))) {
10504 +		/* second_task is boosted, first is not */
10505 +		return 0;
10506 +	}
10507 +
10508 +#endif
10509 +
10510 +//	// rate-monotonic for testing
10511 +//	if (!is_realtime(second_task)) {
10512 +//		return true;
10513 +//	}
10514 +//
10515 +//	if (shorter_period(first_task, second_task)) {
10516 +//		return true;
10517 +//	}
10518 +//
10519 +//	if (get_period(first_task) == get_period(second_task)) {
10520 +//		if (first_task->pid < second_task->pid) {
10521 +//			return true;
10522 +//		}
10523 +//		else if (first_task->pid == second_task->pid) {
10524 +//			return !second->rt_param.inh_task;
10525 +//		}
10526 +//	}
10527 +
10528 +	if (!is_realtime(second_task)) {
10529 +		return true;
10530 +	}
10531 +
10532 +	if (earlier_deadline(first_task, second_task)) {
10533 +		return true;
10534 +	}
10535 +	if (get_deadline(first_task) == get_deadline(second_task)) {
10536 +
10537 +		if (shorter_period(first_task, second_task)) {
10538 +			return true;
10539 +		}
10540 +		if (get_rt_period(first_task) == get_rt_period(second_task)) {
10541 +			if (first_task->pid < second_task->pid) {
10542 +				return true;
10543 +			}
10544 +			if (first_task->pid == second_task->pid) {
10545 +#ifdef CONFIG_LITMUS_SOFTIRQD
10546 +				if (first_task->rt_param.is_proxy_thread <
10547 +					second_task->rt_param.is_proxy_thread) {
10548 +					return true;
10549 +				}
10550 +				if(first_task->rt_param.is_proxy_thread == second_task->rt_param.is_proxy_thread) {
10551 +					return !second->rt_param.inh_task;
10552 +				}
10553 +#else
10554 +				return !second->rt_param.inh_task;
10555 +#endif
10556 +			}
10557 +
10558 +		}
10559 +	}
10560 +
10561 +	return false;
10562 +}
10563 +
10564 +
10565 +#ifdef CONFIG_LITMUS_NESTED_LOCKING
10566 +int edf_higher_prio(struct task_struct* first, struct task_struct* second)
10567 +{
10568 +	return __edf_higher_prio(first, EFFECTIVE, second, EFFECTIVE);
10569 +}
10570 +
10571 +int edf_max_heap_order(struct binheap_node *a, struct binheap_node *b)
10572 +{
10573 +	struct nested_info *l_a = (struct nested_info *)binheap_entry(a, struct nested_info, hp_binheap_node);
10574 +	struct nested_info *l_b = (struct nested_info *)binheap_entry(b, struct nested_info, hp_binheap_node);
10575 +
10576 +	return __edf_higher_prio(l_a->hp_waiter_eff_prio, EFFECTIVE, l_b->hp_waiter_eff_prio, EFFECTIVE);
10577 +}
10578 +
10579 +int edf_min_heap_order(struct binheap_node *a, struct binheap_node *b)
10580 +{
10581 +	return edf_max_heap_order(b, a);  // swap comparison
10582 +}
10583 +
10584 +int edf_max_heap_base_priority_order(struct binheap_node *a, struct binheap_node *b)
10585 +{
10586 +	struct nested_info *l_a = (struct nested_info *)binheap_entry(a, struct nested_info, hp_binheap_node);
10587 +	struct nested_info *l_b = (struct nested_info *)binheap_entry(b, struct nested_info, hp_binheap_node);
10588 +
10589 +	return __edf_higher_prio(l_a->hp_waiter_eff_prio, BASE, l_b->hp_waiter_eff_prio, BASE);
10590 +}
10591 +
10592 +int edf_min_heap_base_priority_order(struct binheap_node *a, struct binheap_node *b)
10593 +{
10594 +	return edf_max_heap_base_priority_order(b, a);  // swap comparison
10595 +}
10596 +#endif
10597 +
10598 +
10599 +int edf_ready_order(struct bheap_node* a, struct bheap_node* b)
10600 +{
10601 +	return edf_higher_prio(bheap2task(a), bheap2task(b));
10602 +}
10603 +
10604 +void edf_domain_init(rt_domain_t* rt, check_resched_needed_t resched,
10605 +		      release_jobs_t release)
10606 +{
10607 +	rt_domain_init(rt,  edf_ready_order, resched, release);
10608 +}
10609 +
10610 +/* need_to_preempt - check whether the task t needs to be preempted
10611 + *                   call only with irqs disabled and with  ready_lock acquired
10612 + *                   THIS DOES NOT TAKE NON-PREEMPTIVE SECTIONS INTO ACCOUNT!
10613 + */
10614 +int edf_preemption_needed(rt_domain_t* rt, struct task_struct *t)
10615 +{
10616 +	/* we need the read lock for edf_ready_queue */
10617 +	/* no need to preempt if there is nothing pending */
10618 +	if (!__jobs_pending(rt))
10619 +		return 0;
10620 +	/* we need to reschedule if t doesn't exist */
10621 +	if (!t)
10622 +		return 1;
10623 +
10624 +	/* NOTE: We cannot check for non-preemptibility since we
10625 +	 *       don't know what address space we're currently in.
10626 +	 */
10627 +
10628 +	/* make sure to get non-rt stuff out of the way */
10629 +	return !is_realtime(t) || edf_higher_prio(__next_ready(rt), t);
10630 +}
10631 diff --git a/litmus/fdso.c b/litmus/fdso.c
10632 new file mode 100644
10633 index 0000000..18fc61b
10634 --- /dev/null
10635 +++ b/litmus/fdso.c
10636 @@ -0,0 +1,306 @@
10637 +/* fdso.c - file descriptor attached shared objects
10638 + *
10639 + * (c) 2007 B. Brandenburg, LITMUS^RT project
10640 + *
10641 + * Notes:
10642 + *   - objects descriptor (OD) tables are not cloned during a fork.
10643 + *   - objects are created on-demand, and freed after the last reference
10644 + *     is dropped.
10645 + *   - for now, object types are hard coded.
10646 + *   - As long as we have live objects, we keep a reference to the inode.
10647 + */
10648 +
10649 +#include <linux/errno.h>
10650 +#include <linux/sched.h>
10651 +#include <linux/mutex.h>
10652 +#include <linux/file.h>
10653 +#include <asm/uaccess.h>
10654 +
10655 +#include <litmus/fdso.h>
10656 +
10657 +extern struct fdso_ops generic_lock_ops;
10658 +
10659 +#ifdef CONFIG_LITMUS_AFFINITY_LOCKING
10660 +extern struct fdso_ops generic_affinity_ops;
10661 +#endif
10662 +
10663 +static const struct fdso_ops* fdso_ops[] = {
10664 +	&generic_lock_ops, /* FMLP_SEM */
10665 +	&generic_lock_ops, /* SRP_SEM */
10666 +	&generic_lock_ops, /* RSM_MUTEX */
10667 +	&generic_lock_ops, /* IKGLP_SEM */
10668 +	&generic_lock_ops, /* KFMLP_SEM */
10669 +#ifdef CONFIG_LITMUS_AFFINITY_LOCKING
10670 +	&generic_affinity_ops, /* IKGLP_SIMPLE_GPU_AFF_OBS */
10671 +	&generic_affinity_ops, /* IKGLP_GPU_AFF_OBS */
10672 +	&generic_affinity_ops, /* KFMLP_SIMPLE_GPU_AFF_OBS */
10673 +	&generic_affinity_ops, /* KFMLP_GPU_AFF_OBS */
10674 +#endif
10675 +};
10676 +
10677 +static int fdso_create(void** obj_ref, obj_type_t type, void* __user config)
10678 +{
10679 +	if (fdso_ops[type]->create)
10680 +		return fdso_ops[type]->create(obj_ref, type, config);
10681 +	else
10682 +		return -EINVAL;
10683 +}
10684 +
10685 +static void fdso_destroy(obj_type_t type, void* obj)
10686 +{
10687 +	fdso_ops[type]->destroy(type, obj);
10688 +}
10689 +
10690 +static int fdso_open(struct od_table_entry* entry, void* __user config)
10691 +{
10692 +	if (fdso_ops[entry->obj->type]->open)
10693 +		return fdso_ops[entry->obj->type]->open(entry, config);
10694 +	else
10695 +		return 0;
10696 +}
10697 +
10698 +static int fdso_close(struct od_table_entry* entry)
10699 +{
10700 +	if (fdso_ops[entry->obj->type]->close)
10701 +		return fdso_ops[entry->obj->type]->close(entry);
10702 +	else
10703 +		return 0;
10704 +}
10705 +
10706 +/* inode must be locked already */
10707 +static int alloc_inode_obj(struct inode_obj_id** obj_ref,
10708 +			   struct inode* inode,
10709 +			   obj_type_t type,
10710 +			   unsigned int id,
10711 +			   void* __user config)
10712 +{
10713 +	struct inode_obj_id* obj;
10714 +	void* raw_obj;
10715 +	int err;
10716 +
10717 +	obj = kmalloc(sizeof(*obj), GFP_KERNEL);
10718 +	if (!obj) {
10719 +		return -ENOMEM;
10720 +	}
10721 +
10722 +	err = fdso_create(&raw_obj, type, config);
10723 +	if (err != 0) {
10724 +		kfree(obj);
10725 +		return err;
10726 +	}
10727 +
10728 +	INIT_LIST_HEAD(&obj->list);
10729 +	atomic_set(&obj->count, 1);
10730 +	obj->type  = type;
10731 +	obj->id    = id;
10732 +	obj->obj   = raw_obj;
10733 +	obj->inode = inode;
10734 +
10735 +	list_add(&obj->list, &inode->i_obj_list);
10736 +	atomic_inc(&inode->i_count);
10737 +
10738 +	printk(KERN_DEBUG "alloc_inode_obj(%p, %d, %d): object created\n", inode, type, id);
10739 +
10740 +	*obj_ref = obj;
10741 +	return 0;
10742 +}
10743 +
10744 +/* inode must be locked already */
10745 +static struct inode_obj_id* get_inode_obj(struct inode* inode,
10746 +					  obj_type_t type,
10747 +					  unsigned int id)
10748 +{
10749 +	struct list_head* pos;
10750 +	struct inode_obj_id* obj = NULL;
10751 +
10752 +	list_for_each(pos, &inode->i_obj_list) {
10753 +		obj = list_entry(pos, struct inode_obj_id, list);
10754 +		if (obj->id == id && obj->type == type) {
10755 +			atomic_inc(&obj->count);
10756 +			return obj;
10757 +		}
10758 +	}
10759 +	printk(KERN_DEBUG "get_inode_obj(%p, %d, %d): couldn't find object\n", inode, type, id);
10760 +	return NULL;
10761 +}
10762 +
10763 +
10764 +static void put_inode_obj(struct inode_obj_id* obj)
10765 +{
10766 +	struct inode* inode;
10767 +	int let_go = 0;
10768 +
10769 +	inode = obj->inode;
10770 +	if (atomic_dec_and_test(&obj->count)) {
10771 +
10772 +		mutex_lock(&inode->i_obj_mutex);
10773 +		/* no new references can be obtained */
10774 +		if (!atomic_read(&obj->count)) {
10775 +			list_del(&obj->list);
10776 +			fdso_destroy(obj->type, obj->obj);
10777 +			kfree(obj);
10778 +			let_go = 1;
10779 +		}
10780 +		mutex_unlock(&inode->i_obj_mutex);
10781 +		if (let_go)
10782 +			iput(inode);
10783 +	}
10784 +}
10785 +
10786 +static struct od_table_entry*  get_od_entry(struct task_struct* t)
10787 +{
10788 +	struct od_table_entry* table;
10789 +	int i;
10790 +
10791 +
10792 +	table = t->od_table;
10793 +	if (!table) {
10794 +		table = kzalloc(sizeof(*table) * MAX_OBJECT_DESCRIPTORS,
10795 +				GFP_KERNEL);
10796 +		t->od_table = table;
10797 +	}
10798 +
10799 +	for (i = 0; table &&  i < MAX_OBJECT_DESCRIPTORS; i++)
10800 +		if (!table[i].used) {
10801 +			table[i].used = 1;
10802 +			return table + i;
10803 +		}
10804 +	return NULL;
10805 +}
10806 +
10807 +static int put_od_entry(struct od_table_entry* od)
10808 +{
10809 +	put_inode_obj(od->obj);
10810 +	od->used = 0;
10811 +	return 0;
10812 +}
10813 +
10814 +void exit_od_table(struct task_struct* t)
10815 +{
10816 +	int i;
10817 +
10818 +	if (t->od_table) {
10819 +		for (i = 0; i < MAX_OBJECT_DESCRIPTORS; i++)
10820 +			if (t->od_table[i].used)
10821 +				put_od_entry(t->od_table + i);
10822 +		kfree(t->od_table);
10823 +		t->od_table = NULL;
10824 +	}
10825 +}
10826 +
10827 +static int do_sys_od_open(struct file* file, obj_type_t type, int id,
10828 +			  void* __user config)
10829 +{
10830 +	int idx = 0, err = 0;
10831 +	struct inode* inode;
10832 +	struct inode_obj_id* obj = NULL;
10833 +	struct od_table_entry* entry;
10834 +
10835 +	inode = file->f_dentry->d_inode;
10836 +
10837 +	entry = get_od_entry(current);
10838 +	if (!entry)
10839 +		return -ENOMEM;
10840 +
10841 +	mutex_lock(&inode->i_obj_mutex);
10842 +	obj = get_inode_obj(inode, type, id);
10843 +	if (!obj)
10844 +		err = alloc_inode_obj(&obj, inode, type, id, config);
10845 +	if (err != 0) {
10846 +		obj = NULL;
10847 +		idx = err;
10848 +		entry->used = 0;
10849 +	} else {
10850 +		entry->obj   = obj;
10851 +		entry->class = fdso_ops[type];
10852 +		idx = entry - current->od_table;
10853 +	}
10854 +
10855 +	mutex_unlock(&inode->i_obj_mutex);
10856 +
10857 +	/* open only if creation succeeded */
10858 +	if (!err)
10859 +		err = fdso_open(entry, config);
10860 +	if (err < 0) {
10861 +		/* The class rejected the open call.
10862 +		 * We need to clean up and tell user space.
10863 +		 */
10864 +		if (obj)
10865 +			put_od_entry(entry);
10866 +		idx = err;
10867 +	}
10868 +
10869 +	return idx;
10870 +}
10871 +
10872 +
10873 +struct od_table_entry* get_entry_for_od(int od)
10874 +{
10875 +	struct task_struct *t = current;
10876 +
10877 +	if (!t->od_table)
10878 +		return NULL;
10879 +	if (od < 0 || od >= MAX_OBJECT_DESCRIPTORS)
10880 +		return NULL;
10881 +	if (!t->od_table[od].used)
10882 +		return NULL;
10883 +	return t->od_table + od;
10884 +}
10885 +
10886 +
10887 +asmlinkage long sys_od_open(int fd, int type, int obj_id, void* __user config)
10888 +{
10889 +	int ret = 0;
10890 +	struct file*  file;
10891 +
10892 +	/*
10893 +	   1) get file from fd, get inode from file
10894 +	   2) lock inode
10895 +	   3) try to lookup object
10896 +	   4) if not present create and enqueue object, inc inode refcnt
10897 +	   5) increment refcnt of object
10898 +	   6) alloc od_table_entry, setup ptrs
10899 +	   7) unlock inode
10900 +	   8) return offset in od_table as OD
10901 +	 */
10902 +
10903 +	if (type < MIN_OBJ_TYPE || type > MAX_OBJ_TYPE) {
10904 +		ret = -EINVAL;
10905 +		goto out;
10906 +	}
10907 +
10908 +	file = fget(fd);
10909 +	if (!file) {
10910 +		ret = -EBADF;
10911 +		goto out;
10912 +	}
10913 +
10914 +	ret = do_sys_od_open(file, type, obj_id, config);
10915 +
10916 +	fput(file);
10917 +
10918 +out:
10919 +	return ret;
10920 +}
10921 +
10922 +
10923 +asmlinkage long sys_od_close(int od)
10924 +{
10925 +	int ret = -EINVAL;
10926 +	struct task_struct *t = current;
10927 +
10928 +	if (od < 0 || od >= MAX_OBJECT_DESCRIPTORS)
10929 +		return ret;
10930 +
10931 +	if (!t->od_table || !t->od_table[od].used)
10932 +		return ret;
10933 +
10934 +
10935 +	/* give the class a chance to reject the close
10936 +	 */
10937 +	ret = fdso_close(t->od_table + od);
10938 +	if (ret == 0)
10939 +		ret = put_od_entry(t->od_table + od);
10940 +
10941 +	return ret;
10942 +}
10943 diff --git a/litmus/ft_event.c b/litmus/ft_event.c
10944 new file mode 100644
10945 index 0000000..399a07b
10946 --- /dev/null
10947 +++ b/litmus/ft_event.c
10948 @@ -0,0 +1,43 @@
10949 +#include <linux/types.h>
10950 +
10951 +#include <litmus/feather_trace.h>
10952 +
10953 +#if !defined(CONFIG_ARCH_HAS_FEATHER_TRACE) || defined(CONFIG_DEBUG_RODATA)
10954 +/* provide dummy implementation */
10955 +
10956 +int ft_events[MAX_EVENTS];
10957 +
10958 +int ft_enable_event(unsigned long id)
10959 +{
10960 +	if (id < MAX_EVENTS) {
10961 +		ft_events[id]++;
10962 +		return 1;
10963 +	} else
10964 +		return 0;
10965 +}
10966 +
10967 +int ft_disable_event(unsigned long id)
10968 +{
10969 +	if (id < MAX_EVENTS && ft_events[id]) {
10970 +		ft_events[id]--;
10971 +		return 1;
10972 +	} else
10973 +		return 0;
10974 +}
10975 +
10976 +int ft_disable_all_events(void)
10977 +{
10978 +	int i;
10979 +
10980 +	for (i = 0; i < MAX_EVENTS; i++)
10981 +		ft_events[i] = 0;
10982 +
10983 +	return MAX_EVENTS;
10984 +}
10985 +
10986 +int ft_is_event_enabled(unsigned long id)
10987 +{
10988 +	return 	id < MAX_EVENTS && ft_events[id];
10989 +}
10990 +
10991 +#endif
10992 diff --git a/litmus/ftdev.c b/litmus/ftdev.c
10993 new file mode 100644
10994 index 0000000..06fcf4c
10995 --- /dev/null
10996 +++ b/litmus/ftdev.c
10997 @@ -0,0 +1,439 @@
10998 +#include <linux/sched.h>
10999 +#include <linux/fs.h>
11000 +#include <linux/slab.h>
11001 +#include <linux/cdev.h>
11002 +#include <asm/uaccess.h>
11003 +#include <linux/module.h>
11004 +#include <linux/device.h>
11005 +
11006 +#include <litmus/litmus.h>
11007 +#include <litmus/feather_trace.h>
11008 +#include <litmus/ftdev.h>
11009 +
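+/* Allocate a feather-trace buffer with 'count' slots of 'size' bytes each,
+ * plus one status byte per slot, rounded up to a power-of-two number of pages. */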
11010 +struct ft_buffer* alloc_ft_buffer(unsigned int count, size_t size)
11011 +{
11012 +	struct ft_buffer* buf;
11013 +	size_t total = (size + 1) * count;
11014 +	char* mem;
11015 +	int order = 0, pages = 1;
11016 +
11017 +	buf = kmalloc(sizeof(*buf), GFP_KERNEL);
11018 +	if (!buf)
11019 +		return NULL;
11020 +
11021 +	total = (total / PAGE_SIZE) + (total % PAGE_SIZE != 0);
11022 +	while (pages < total) {
11023 +		order++;
11024 +		pages *= 2;
11025 +	}
11026 +
11027 +	mem = (char*) __get_free_pages(GFP_KERNEL, order);
11028 +	if (!mem) {
11029 +		kfree(buf);
11030 +		return NULL;
11031 +	}
11032 +
11033 +	if (!init_ft_buffer(buf, count, size,
11034 +			    mem + (count * size),  /* markers at the end */
11035 +			    mem)) {                /* buffer objects     */
11036 +		free_pages((unsigned long) mem, order);
11037 +		kfree(buf);
11038 +		return NULL;
11039 +	}
11040 +	return buf;
11041 +}
11042 +
11043 +void free_ft_buffer(struct ft_buffer* buf)
11044 +{
11045 +	int order = 0, pages = 1;
11046 +	size_t total;
11047 +
11048 +	if (buf) {
11049 +		total = (buf->slot_size + 1) * buf->slot_count;
11050 +		total = (total / PAGE_SIZE) + (total % PAGE_SIZE != 0);
11051 +		while (pages < total) {
11052 +			order++;
11053 +			pages *= 2;
11054 +		}
11055 +		free_pages((unsigned long) buf->buffer_mem, order);
11056 +		kfree(buf);
11057 +	}
11058 +}
11059 +
11060 +struct ftdev_event {
11061 +	int id;
11062 +	struct ftdev_event* next;
11063 +};
11064 +
11065 +static int activate(struct ftdev_event** chain, int id)
11066 +{
11067 +	struct ftdev_event* ev = kmalloc(sizeof(*ev), GFP_KERNEL);
11068 +	if (ev) {
11069 +		printk(KERN_INFO
11070 +		       "Enabling feather-trace event %d.\n", (int) id);
11071 +		ft_enable_event(id);
11072 +		ev->id = id;
11073 +		ev->next = *chain;
11074 +		*chain    = ev;
11075 +	}
11076 +	return ev ? 0 : -ENOMEM;
11077 +}
11078 +
11079 +static void deactivate(struct ftdev_event** chain, int id)
11080 +{
11081 +	struct ftdev_event **cur = chain;
11082 +	struct ftdev_event *nxt;
11083 +	while (*cur) {
11084 +		if ((*cur)->id == id) {
11085 +			nxt   = (*cur)->next;
11086 +			kfree(*cur);
11087 +			*cur  = nxt;
11088 +			printk(KERN_INFO
11089 +			       "Disabling feather-trace event %d.\n", (int) id);
11090 +			ft_disable_event(id);
11091 +			break;
11092 +		}
11093 +		cur = &(*cur)->next;
11094 +	}
11095 +}
11096 +
11097 +static int ftdev_open(struct inode *in, struct file *filp)
11098 +{
11099 +	struct ftdev* ftdev;
11100 +	struct ftdev_minor* ftdm;
11101 +	unsigned int buf_idx = iminor(in);
11102 +	int err = 0;
11103 +
11104 +	ftdev = container_of(in->i_cdev, struct ftdev, cdev);
11105 +
11106 +	if (buf_idx >= ftdev->minor_cnt) {
11107 +		err = -ENODEV;
11108 +		goto out;
11109 +	}
11110 +	if (ftdev->can_open && (err = ftdev->can_open(ftdev, buf_idx)))
11111 +		goto out;
11112 +
11113 +	ftdm = ftdev->minor + buf_idx;
11114 +	ftdm->ftdev = ftdev;
11115 +	filp->private_data = ftdm;
11116 +
11117 +	if (mutex_lock_interruptible(&ftdm->lock)) {
11118 +		err = -ERESTARTSYS;
11119 +		goto out;
11120 +	}
11121 +
11122 +	if (!ftdm->readers && ftdev->alloc)
11123 +		err = ftdev->alloc(ftdev, buf_idx);
11124 +	if (0 == err)
11125 +		ftdm->readers++;
11126 +
11127 +	mutex_unlock(&ftdm->lock);
11128 +out:
11129 +	return err;
11130 +}
11131 +
11132 +static int ftdev_release(struct inode *in, struct file *filp)
11133 +{
11134 +	struct ftdev* ftdev;
11135 +	struct ftdev_minor* ftdm;
11136 +	unsigned int buf_idx = iminor(in);
11137 +	int err = 0;
11138 +
11139 +	ftdev = container_of(in->i_cdev, struct ftdev, cdev);
11140 +
11141 +	if (buf_idx >= ftdev->minor_cnt) {
11142 +		err = -ENODEV;
11143 +		goto out;
11144 +	}
11145 +	ftdm = ftdev->minor + buf_idx;
11146 +
11147 +	if (mutex_lock_interruptible(&ftdm->lock)) {
11148 +		err = -ERESTARTSYS;
11149 +		goto out;
11150 +	}
11151 +
11152 +	if (ftdm->readers == 1) {
11153 +		while (ftdm->events)
11154 +			deactivate(&ftdm->events, ftdm->events->id);
11155 +
11156 +		/* wait for any pending events to complete */
11157 +		set_current_state(TASK_UNINTERRUPTIBLE);
11158 +		schedule_timeout(HZ);
11159 +
11160 +		printk(KERN_ALERT "Failed trace writes: %u\n",
11161 +		       ftdm->buf->failed_writes);
11162 +
11163 +		if (ftdev->free)
11164 +			ftdev->free(ftdev, buf_idx);
11165 +	}
11166 +
11167 +	ftdm->readers--;
11168 +	mutex_unlock(&ftdm->lock);
11169 +out:
11170 +	return err;
11171 +}
11172 +
11173 +/* based on ft_buffer_read
11174 + * @returns < 0 : page fault
11175 + *          = 0 : no data available
11176 + *          = 1 : one slot copied
11177 + */
11178 +static int ft_buffer_copy_to_user(struct ft_buffer* buf, char __user *dest)
11179 +{
11180 +	unsigned int idx;
11181 +	int err = 0;
11182 +	if (buf->free_count != buf->slot_count) {
11183 +		/* data available */
11184 +		idx = buf->read_idx % buf->slot_count;
11185 +		if (buf->slots[idx] == SLOT_READY) {
11186 +			err = copy_to_user(dest, ((char*) buf->buffer_mem) +
11187 +					   idx * buf->slot_size,
11188 +					   buf->slot_size);
11189 +			if (err == 0) {
11190 +				/* copy ok */
11191 +				buf->slots[idx] = SLOT_FREE;
11192 +				buf->read_idx++;
11193 +				fetch_and_inc(&buf->free_count);
11194 +				err = 1;
11195 +			}
11196 +		}
11197 +	}
11198 +	return err;
11199 +}
11200 +
11201 +static ssize_t ftdev_read(struct file *filp,
11202 +			  char __user *to, size_t len, loff_t *f_pos)
11203 +{
11204 +	/* 	we ignore f_pos, this is strictly sequential */
11205 +
11206 +	ssize_t err = 0;
11207 +	size_t chunk;
11208 +	int copied;
11209 +	struct ftdev_minor* ftdm = filp->private_data;
11210 +
11211 +	if (mutex_lock_interruptible(&ftdm->lock)) {
11212 +		err = -ERESTARTSYS;
11213 +		goto out;
11214 +	}
11215 +
11216 +
11217 +	chunk = ftdm->buf->slot_size;
11218 +	while (len >= chunk) {
11219 +		copied = ft_buffer_copy_to_user(ftdm->buf, to);
11220 +		if (copied == 1) {
11221 +			len    -= chunk;
11222 +			to     += chunk;
11223 +			err    += chunk;
11224 +	        } else if (err == 0 && copied == 0 && ftdm->events) {
11225 +			/* Only wait if there are any events enabled and only
11226 +			 * if we haven't copied some data yet. We cannot wait
11227 +			 * here with copied data because that data would get
11228 +			 * lost if the task is interrupted (e.g., killed).
11229 +			 */
11230 +			set_current_state(TASK_INTERRUPTIBLE);
11231 +			schedule_timeout(50);
11232 +			if (signal_pending(current)) {
11233 +				if (err == 0)
11234 +					/* nothing read yet, signal problem */
11235 +					err = -ERESTARTSYS;
11236 +				break;
11237 +			}
11238 +		} else if (copied < 0) {
11239 +			/* page fault */
11240 +			err = copied;
11241 +			break;
11242 +		} else
11243 +			/* nothing left to get, return to user space */
11244 +			break;
11245 +	}
11246 +	mutex_unlock(&ftdm->lock);
11247 +out:
11248 +	return err;
11249 +}
11250 +
11251 +static long ftdev_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
11252 +{
11253 +	long err = -ENOIOCTLCMD;
11254 +	struct ftdev_minor* ftdm = filp->private_data;
11255 +
11256 +	if (mutex_lock_interruptible(&ftdm->lock)) {
11257 +		err = -ERESTARTSYS;
11258 +		goto out;
11259 +	}
11260 +
11261 +	/* FIXME: check id against list of acceptable events */
11262 +
11263 +	switch (cmd) {
11264 +	case  FTDEV_ENABLE_CMD:
11265 +		if (activate(&ftdm->events, arg))
11266 +			err = -ENOMEM;
11267 +		else
11268 +			err = 0;
11269 +		break;
11270 +
11271 +	case FTDEV_DISABLE_CMD:
11272 +		deactivate(&ftdm->events, arg);
11273 +		err = 0;
11274 +		break;
11275 +
11276 +	default:
11277 +		printk(KERN_DEBUG "ftdev: strange ioctl (%u, %lu)\n", cmd, arg);
11278 +	};
11279 +
11280 +	mutex_unlock(&ftdm->lock);
11281 +out:
11282 +	return err;
11283 +}
11284 +
11285 +static ssize_t ftdev_write(struct file *filp, const char __user *from,
11286 +			   size_t len, loff_t *f_pos)
11287 +{
11288 +	struct ftdev_minor* ftdm = filp->private_data;
11289 +	ssize_t err = -EINVAL;
11290 +	struct ftdev* ftdev = ftdm->ftdev;
11291 +
11292 +	/* dispatch write to buffer-specific code, if available */
11293 +	if (ftdev->write)
11294 +		err = ftdev->write(ftdm->buf, len, from);
11295 +
11296 +	return err;
11297 +}
11298 +
11299 +struct file_operations ftdev_fops = {
11300 +	.owner   = THIS_MODULE,
11301 +	.open    = ftdev_open,
11302 +	.release = ftdev_release,
11303 +	.write   = ftdev_write,
11304 +	.read    = ftdev_read,
11305 +	.unlocked_ioctl = ftdev_ioctl,
11306 +};
11307 +
11308 +int ftdev_init(	struct ftdev* ftdev, struct module* owner,
11309 +		const int minor_cnt, const char* name)
11310 +{
11311 +	int i, err;
11312 +
11313 +	BUG_ON(minor_cnt < 1);
11314 +
11315 +	cdev_init(&ftdev->cdev, &ftdev_fops);
11316 +	ftdev->name = name;
11317 +	ftdev->minor_cnt = minor_cnt;
11318 +	ftdev->cdev.owner = owner;
11319 +	ftdev->cdev.ops = &ftdev_fops;
11320 +	ftdev->alloc    = NULL;
11321 +	ftdev->free     = NULL;
11322 +	ftdev->can_open = NULL;
11323 +	ftdev->write	= NULL;
11324 +
11325 +	ftdev->minor = kcalloc(ftdev->minor_cnt, sizeof(*ftdev->minor),
11326 +			GFP_KERNEL);
11327 +	if (!ftdev->minor) {
11328 +		printk(KERN_WARNING "ftdev(%s): Could not allocate memory\n",
11329 +			ftdev->name);
11330 +		err = -ENOMEM;
11331 +		goto err_out;
11332 +	}
11333 +
11334 +	for (i = 0; i < ftdev->minor_cnt; i++) {
11335 +		mutex_init(&ftdev->minor[i].lock);
11336 +		ftdev->minor[i].readers = 0;
11337 +		ftdev->minor[i].buf     = NULL;
11338 +		ftdev->minor[i].events  = NULL;
11339 +	}
11340 +
11341 +	ftdev->class = class_create(owner, ftdev->name);
11342 +	if (IS_ERR(ftdev->class)) {
11343 +		err = PTR_ERR(ftdev->class);
11344 +		printk(KERN_WARNING "ftdev(%s): "
11345 +			"Could not create device class.\n", ftdev->name);
11346 +		goto err_dealloc;
11347 +	}
11348 +
11349 +	return 0;
11350 +
11351 +err_dealloc:
11352 +	kfree(ftdev->minor);
11353 +err_out:
11354 +	return err;
11355 +}
11356 +
11357 +/*
11358 + * Destroy minor devices up to, but not including, up_to.
11359 + */
11360 +static void ftdev_device_destroy(struct ftdev* ftdev, unsigned int up_to)
11361 +{
11362 +	dev_t minor_cntr;
11363 +
11364 +	if (up_to < 1)
11365 +		up_to = (ftdev->minor_cnt < 1) ? 0 : ftdev->minor_cnt;
11366 +
11367 +	for (minor_cntr = 0; minor_cntr < up_to; ++minor_cntr)
11368 +		device_destroy(ftdev->class, MKDEV(ftdev->major, minor_cntr));
11369 +}
11370 +
11371 +void ftdev_exit(struct ftdev* ftdev)
11372 +{
11373 +	printk("ftdev(%s): Exiting\n", ftdev->name);
11374 +	ftdev_device_destroy(ftdev, -1);
11375 +	cdev_del(&ftdev->cdev);
11376 +	unregister_chrdev_region(MKDEV(ftdev->major, 0), ftdev->minor_cnt);
11377 +	class_destroy(ftdev->class);
11378 +	kfree(ftdev->minor);
11379 +}
11380 +
11381 +int register_ftdev(struct ftdev* ftdev)
11382 +{
11383 +	struct device **device;
11384 +	dev_t trace_dev_tmp, minor_cntr;
11385 +	int err;
11386 +
11387 +	err = alloc_chrdev_region(&trace_dev_tmp, 0, ftdev->minor_cnt,
11388 +			ftdev->name);
11389 +	if (err) {
11390 +		printk(KERN_WARNING "ftdev(%s): "
11391 +		       "Could not allocate char. device region (%d minors)\n",
11392 +		       ftdev->name, ftdev->minor_cnt);
11393 +		goto err_out;
11394 +	}
11395 +
11396 +	ftdev->major = MAJOR(trace_dev_tmp);
11397 +
11398 +	err = cdev_add(&ftdev->cdev, trace_dev_tmp, ftdev->minor_cnt);
11399 +	if (err) {
11400 +		printk(KERN_WARNING "ftdev(%s): "
11401 +		       "Could not add cdev for major %u with %u minor(s).\n",
11402 +		       ftdev->name, ftdev->major, ftdev->minor_cnt);
11403 +		goto err_unregister;
11404 +	}
11405 +
11406 +	/* create the minor device(s) */
11407 +	for (minor_cntr = 0; minor_cntr < ftdev->minor_cnt; ++minor_cntr)
11408 +	{
11409 +		trace_dev_tmp = MKDEV(ftdev->major, minor_cntr);
11410 +		device = &ftdev->minor[minor_cntr].device;
11411 +
11412 +		*device = device_create(ftdev->class, NULL, trace_dev_tmp, NULL,
11413 +				"litmus/%s%d", ftdev->name, minor_cntr);
11414 +		if (IS_ERR(*device)) {
11415 +			err = PTR_ERR(*device);
11416 +			printk(KERN_WARNING "ftdev(%s): "
11417 +				"Could not create device major/minor number "
11418 +				"%u/%u\n", ftdev->name, ftdev->major,
11419 +				minor_cntr);
11420 +			printk(KERN_WARNING "ftdev(%s): "
11421 +				"will attempt deletion of allocated devices.\n",
11422 +				ftdev->name);
11423 +			goto err_minors;
11424 +		}
11425 +	}
11426 +
11427 +	return 0;
11428 +
11429 +err_minors:
11430 +	ftdev_device_destroy(ftdev, minor_cntr);
11431 +	cdev_del(&ftdev->cdev);
11432 +err_unregister:
11433 +	unregister_chrdev_region(MKDEV(ftdev->major, 0), ftdev->minor_cnt);
11434 +err_out:
11435 +	return err;
11436 +}
11437 diff --git a/litmus/gpu_affinity.c b/litmus/gpu_affinity.c
11438 new file mode 100644
11439 index 0000000..9762be1
11440 --- /dev/null
11441 +++ b/litmus/gpu_affinity.c
11442 @@ -0,0 +1,113 @@
11443 +
11444 +#ifdef CONFIG_LITMUS_NVIDIA
11445 +
11446 +#include <linux/sched.h>
11447 +#include <litmus/litmus.h>
11448 +#include <litmus/gpu_affinity.h>
11449 +
11450 +#include <litmus/sched_trace.h>
11451 +
11452 +#define OBSERVATION_CAP 2*1e9
11453 +
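+/* Feedback-based estimator update: err = observed - est, and the new estimate
+ * is a*err + b*accum_err. Returns the error relative to the observed value (for tracing). */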
11454 +static fp_t update_estimate(feedback_est_t* fb, fp_t a, fp_t b, lt_t observed)
11455 +{
11456 +	fp_t relative_err;
11457 +	fp_t err, new;
11458 +	fp_t actual = _integer_to_fp(observed);
11459 +
11460 +	err = _sub(actual, fb->est);
11461 +	new = _add(_mul(a, err), _mul(b, fb->accum_err));
11462 +
11463 +	relative_err = _div(err, actual);
11464 +
11465 +	fb->est = new;
11466 +	fb->accum_err = _add(fb->accum_err, err);
11467 +
11468 +	return relative_err;
11469 +}
11470 +
11471 +void update_gpu_estimate(struct task_struct *t, lt_t observed)
11472 +{
11473 +	feedback_est_t *fb = &(tsk_rt(t)->gpu_migration_est[tsk_rt(t)->gpu_migration]);
11474 +
11475 +	BUG_ON(tsk_rt(t)->gpu_migration > MIG_LAST);
11476 +
11477 +	if(unlikely(fb->est.val == 0)) {
11478 +		// kludge-- cap observed values to prevent whacky estimations.
11479 +		// whacky stuff happens during the first few jobs.
11480 +		if(unlikely(observed > OBSERVATION_CAP)) {
11481 +			TRACE_TASK(t, "Crazy observation was capped: %llu -> %llu\n",
11482 +					   observed, OBSERVATION_CAP);
11483 +			observed = OBSERVATION_CAP;
11484 +		}
11485 +
11486 +		// take the first observation as our estimate
11487 +		// (initial value of 0 was bogus anyhow)
11488 +		fb->est = _integer_to_fp(observed);
11489 +		fb->accum_err = _div(fb->est, _integer_to_fp(2));  // ...seems to work.
11490 +	}
11491 +	else {
11492 +		fp_t rel_err = update_estimate(fb,
11493 +									   tsk_rt(t)->gpu_fb_param_a[tsk_rt(t)->gpu_migration],
11494 +									   tsk_rt(t)->gpu_fb_param_b[tsk_rt(t)->gpu_migration],
11495 +									   observed);
11496 +
11497 +		if(unlikely(_fp_to_integer(fb->est) <= 0)) {
11498 +			TRACE_TASK(t, "Invalid estimate. Patching.\n");
11499 +			fb->est = _integer_to_fp(observed);
11500 +			fb->accum_err = _div(fb->est, _integer_to_fp(2));  // ...seems to work.
11501 +		}
11502 +		else {
11503 +//			struct migration_info mig_info;
11504 +
11505 +			sched_trace_prediction_err(t,
11506 +									   &(tsk_rt(t)->gpu_migration),
11507 +									   &rel_err);
11508 +
11509 +//			mig_info.observed = observed;
11510 +//			mig_info.estimated = get_gpu_estimate(t, tsk_rt(t)->gpu_migration);
11511 +//			mig_info.distance = tsk_rt(t)->gpu_migration;
11512 +//
11513 +//			sched_trace_migration(t, &mig_info);
11514 +		}
11515 +	}
11516 +
11517 +	TRACE_TASK(t, "GPU est update after (dist = %d, obs = %llu): %d.%d\n",
11518 +			   tsk_rt(t)->gpu_migration,
11519 +			   observed,
11520 +			   _fp_to_integer(fb->est),
11521 +			   _point(fb->est));
11522 +}
11523 +
11524 +gpu_migration_dist_t gpu_migration_distance(int a, int b)
11525 +{
11526 +	// GPUs organized in a binary hierarchy, no more than 2^MIG_FAR GPUs
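+	// Distance = level of the smallest shared subtree, i.e., the smallest i with a>>i == b>>i.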
11527 +	int i;
11528 +	int dist;
11529 +
11530 +	if(likely(a >= 0 && b >= 0)) {
11531 +		for(i = 0; i <= MIG_FAR; ++i) {
11532 +			if(a>>i == b>>i) {
11533 +				dist = i;
11534 +				goto out;
11535 +			}
11536 +		}
11537 +		dist = MIG_NONE; // hopefully never reached.
11538 +		TRACE_CUR("WARNING: GPU distance too far! %d -> %d\n", a, b);
11539 +	}
11540 +	else {
11541 +		dist = MIG_NONE;
11542 +	}
11543 +
11544 +out:
11545 +	TRACE_CUR("Distance %d -> %d is %d\n",
11546 +			  a, b, dist);
11547 +
11548 +	return dist;
11549 +}
11550 +
11551 +
11552 +
11553 +
11554 +#endif
11555 +
11556 diff --git a/litmus/ikglp_lock.c b/litmus/ikglp_lock.c
11557 new file mode 100644
11558 index 0000000..83b708a
11559 --- /dev/null
11560 +++ b/litmus/ikglp_lock.c
11561 @@ -0,0 +1,2838 @@
11562 +#include <linux/slab.h>
11563 +#include <linux/uaccess.h>
11564 +
11565 +#include <litmus/trace.h>
11566 +#include <litmus/sched_plugin.h>
11567 +#include <litmus/fdso.h>
11568 +
11569 +#if defined(CONFIG_LITMUS_AFFINITY_LOCKING) && defined(CONFIG_LITMUS_NVIDIA)
11570 +#include <litmus/gpu_affinity.h>
11571 +#include <litmus/nvidia_info.h>
11572 +#endif
11573 +
11574 +#include <litmus/ikglp_lock.h>
11575 +
11576 +// big signed value.
11577 +#define IKGLP_INVAL_DISTANCE 0x7FFFFFFF
11578 +
11579 +int ikglp_max_heap_base_priority_order(struct binheap_node *a,
11580 +										   struct binheap_node *b)
11581 +{
11582 +	ikglp_heap_node_t *d_a = binheap_entry(a, ikglp_heap_node_t, node);
11583 +	ikglp_heap_node_t *d_b = binheap_entry(b, ikglp_heap_node_t, node);
11584 +
11585 +	BUG_ON(!d_a);
11586 +	BUG_ON(!d_b);
11587 +
11588 +	return litmus->__compare(d_a->task, BASE, d_b->task, BASE);
11589 +}
11590 +
11591 +int ikglp_min_heap_base_priority_order(struct binheap_node *a,
11592 +										   struct binheap_node *b)
11593 +{
11594 +	ikglp_heap_node_t *d_a = binheap_entry(a, ikglp_heap_node_t, node);
11595 +	ikglp_heap_node_t *d_b = binheap_entry(b, ikglp_heap_node_t, node);
11596 +
11597 +	return litmus->__compare(d_b->task, BASE, d_a->task, BASE);
11598 +}
11599 +
11600 +int ikglp_donor_max_heap_base_priority_order(struct binheap_node *a,
11601 +												 struct binheap_node *b)
11602 +{
11603 +	ikglp_wait_state_t *d_a = binheap_entry(a, ikglp_wait_state_t, node);
11604 +	ikglp_wait_state_t *d_b = binheap_entry(b, ikglp_wait_state_t, node);
11605 +
11606 +	return litmus->__compare(d_a->task, BASE, d_b->task, BASE);
11607 +}
11608 +
11609 +
11610 +int ikglp_min_heap_donee_order(struct binheap_node *a,
11611 +								   struct binheap_node *b)
11612 +{
11613 +	struct task_struct *prio_a, *prio_b;
11614 +
11615 +	ikglp_donee_heap_node_t *d_a =
11616 +		binheap_entry(a, ikglp_donee_heap_node_t, node);
11617 +	ikglp_donee_heap_node_t *d_b =
11618 +		binheap_entry(b, ikglp_donee_heap_node_t, node);
11619 +
11620 +	if(!d_a->donor_info) {
11621 +		prio_a = d_a->task;
11622 +	}
11623 +	else {
11624 +		prio_a = d_a->donor_info->task;
11625 +		BUG_ON(d_a->task != d_a->donor_info->donee_info->task);
11626 +	}
11627 +
11628 +	if(!d_b->donor_info) {
11629 +		prio_b = d_b->task;
11630 +	}
11631 +	else {
11632 +		prio_b = d_b->donor_info->task;
11633 +		BUG_ON(d_b->task != d_b->donor_info->donee_info->task);
11634 +	}
11635 +
11636 +	// note reversed order
11637 +	return litmus->__compare(prio_b, BASE, prio_a, BASE);
11638 +}
11639 +
11640 +
11641 +
11642 +static inline int ikglp_get_idx(struct ikglp_semaphore *sem,
11643 +								struct fifo_queue *queue)
11644 +{
11645 +	return (queue - &sem->fifo_queues[0]);
11646 +}
11647 +
11648 +static inline struct fifo_queue* ikglp_get_queue(struct ikglp_semaphore *sem,
11649 +												 struct task_struct *holder)
11650 +{
11651 +	int i;
11652 +	for(i = 0; i < sem->nr_replicas; ++i)
11653 +		if(sem->fifo_queues[i].owner == holder)
11654 +			return(&sem->fifo_queues[i]);
11655 +	return(NULL);
11656 +}
11657 +
11658 +
11659 +
11660 +static struct task_struct* ikglp_find_hp_waiter(struct fifo_queue *kqueue,
11661 +												struct task_struct *skip)
11662 +{
11663 +	struct list_head *pos;
11664 +	struct task_struct *queued, *found = NULL;
11665 +
11666 +	list_for_each(pos, &kqueue->wait.task_list) {
11667 +		queued  = (struct task_struct*) list_entry(pos,
11668 +											wait_queue_t, task_list)->private;
11669 +
11670 +		/* Compare task prios, find high prio task. */
11671 +		if(queued != skip && litmus->compare(queued, found))
11672 +			found = queued;
11673 +	}
11674 +	return found;
11675 +}
11676 +
11677 +static struct fifo_queue* ikglp_find_shortest(struct ikglp_semaphore *sem,
11678 +											  struct fifo_queue *search_start)
11679 +{
11680 +	// we start our search at search_start instead of at the beginning of the
11681 +	// queue list to load-balance across all resources.
11682 +	struct fifo_queue* step = search_start;
11683 +	struct fifo_queue* shortest = sem->shortest_fifo_queue;
11684 +
11685 +	do {
11686 +		step = (step+1 != &sem->fifo_queues[sem->nr_replicas]) ?
11687 +			step+1 : &sem->fifo_queues[0];
11688 +
11689 +		if(step->count < shortest->count) {
11690 +			shortest = step;
11691 +			if(step->count == 0)
11692 +				break; /* can't get any shorter */
11693 +		}
11694 +
11695 +	} while(step != search_start);
11696 +
11697 +	return(shortest);
11698 +}
11699 +
11700 +static inline struct task_struct* ikglp_mth_highest(struct ikglp_semaphore *sem)
11701 +{
11702 +	return binheap_top_entry(&sem->top_m, ikglp_heap_node_t, node)->task;
11703 +}
11704 +
11705 +
11706 +
11707 +#if 0
11708 +static void print_global_list(struct binheap_node* n, int depth)
11709 +{
11710 +	ikglp_heap_node_t *global_heap_node;
11711 +	char padding[81] = "                                                                                ";
11712 +
11713 +	if(n == NULL) {
11714 +		TRACE_CUR("+-> %p\n", NULL);
11715 +		return;
11716 +	}
11717 +
11718 +	global_heap_node = binheap_entry(n, ikglp_heap_node_t, node);
11719 +
11720 +	if(depth*2 <= 80)
11721 +		padding[depth*2] = '\0';
11722 +
11723 +	TRACE_CUR("%s+-> %s/%d\n",
11724 +			  padding,
11725 +			  global_heap_node->task->comm,
11726 +			  global_heap_node->task->pid);
11727 +
11728 +    if(n->left) print_global_list(n->left, depth+1);
11729 +    if(n->right) print_global_list(n->right, depth+1);
11730 +}
11731 +
11732 +static void print_donees(struct ikglp_semaphore *sem, struct binheap_node *n, int depth)
11733 +{
11734 +	ikglp_donee_heap_node_t *donee_node;
11735 +	char padding[81] = "                                                                                ";
11736 +	struct task_struct* donor = NULL;
11737 +
11738 +	if(n == NULL) {
11739 +		TRACE_CUR("+-> %p\n", NULL);
11740 +		return;
11741 +	}
11742 +
11743 +	donee_node = binheap_entry(n, ikglp_donee_heap_node_t, node);
11744 +
11745 +	if(depth*2 <= 80)
11746 +		padding[depth*2] = '\0';
11747 +
11748 +	if(donee_node->donor_info) {
11749 +		donor = donee_node->donor_info->task;
11750 +	}
11751 +
11752 +	TRACE_CUR("%s+-> %s/%d (d: %s/%d) (fq: %d)\n",
11753 +			  padding,
11754 +			  donee_node->task->comm,
11755 +			  donee_node->task->pid,
11756 +			  (donor) ? donor->comm : "nil",
11757 +			  (donor) ? donor->pid : -1,
11758 +			  ikglp_get_idx(sem, donee_node->fq));
11759 +
11760 +    if(n->left) print_donees(sem, n->left, depth+1);
11761 +    if(n->right) print_donees(sem, n->right, depth+1);
11762 +}
11763 +
11764 +static void print_donors(struct binheap_node *n, int depth)
11765 +{
11766 +	ikglp_wait_state_t *donor_node;
11767 +	char padding[81] = "                                                                                ";
11768 +
11769 +	if(n == NULL) {
11770 +		TRACE_CUR("+-> %p\n", NULL);
11771 +		return;
11772 +	}
11773 +
11774 +	donor_node = binheap_entry(n, ikglp_wait_state_t, node);
11775 +
11776 +	if(depth*2 <= 80)
11777 +		padding[depth*2] = '\0';
11778 +
11779 +
11780 +	TRACE_CUR("%s+-> %s/%d (donee: %s/%d)\n",
11781 +			  padding,
11782 +			  donor_node->task->comm,
11783 +			  donor_node->task->pid,
11784 +			  donor_node->donee_info->task->comm,
11785 +			  donor_node->donee_info->task->pid);
11786 +
11787 +    if(n->left) print_donors(n->left, depth+1);
11788 +    if(n->right) print_donors(n->right, depth+1);
11789 +}
11790 +#endif
11791 +
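      +/*
      + * Maintain the m highest-priority requests in sem->top_m and everyone
      + * else in sem->not_top_m.  While fewer than m requests are present, new
      + * arrivals go straight into top-m; otherwise an arrival that outranks
      + * the current m-th highest request evicts it into not-top-m.
      + */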
11792 +static void ikglp_add_global_list(struct ikglp_semaphore *sem,
11793 +								  struct task_struct *t,
11794 +								  ikglp_heap_node_t *node)
11795 +{
11796 +
11797 +
11798 +	node->task = t;
11799 +	INIT_BINHEAP_NODE(&node->node);
11800 +
11801 +	if(sem->top_m_size < sem->m) {
11802 +		TRACE_CUR("Trivially adding %s/%d to top-m global list.\n",
11803 +				  t->comm, t->pid);
11804 +//		TRACE_CUR("Top-M Before (size = %d):\n", sem->top_m_size);
11805 +//		print_global_list(sem->top_m.root, 1);
11806 +
11807 +		binheap_add(&node->node, &sem->top_m, ikglp_heap_node_t, node);
11808 +		++(sem->top_m_size);
11809 +
11810 +//		TRACE_CUR("Top-M After (size = %d):\n", sem->top_m_size);
11811 +//		print_global_list(sem->top_m.root, 1);
11812 +	}
11813 +	else if(litmus->__compare(t, BASE, ikglp_mth_highest(sem), BASE)) {
11814 +		ikglp_heap_node_t *evicted =
11815 +			binheap_top_entry(&sem->top_m, ikglp_heap_node_t, node);
11816 +
11817 +		TRACE_CUR("Adding %s/%d to top-m and evicting %s/%d.\n",
11818 +				  t->comm, t->pid,
11819 +				  evicted->task->comm, evicted->task->pid);
11820 +
11821 +//		TRACE_CUR("Not-Top-M Before:\n");
11822 +//		print_global_list(sem->not_top_m.root, 1);
11823 +//		TRACE_CUR("Top-M Before (size = %d):\n", sem->top_m_size);
11824 +//		print_global_list(sem->top_m.root, 1);
11825 +
11826 +
11827 +		binheap_delete_root(&sem->top_m, ikglp_heap_node_t, node);
11828 +		INIT_BINHEAP_NODE(&evicted->node);
11829 +		binheap_add(&evicted->node, &sem->not_top_m, ikglp_heap_node_t, node);
11830 +
11831 +		binheap_add(&node->node, &sem->top_m, ikglp_heap_node_t, node);
11832 +
11833 +//		TRACE_CUR("Top-M After (size = %d):\n", sem->top_m_size);
11834 +//		print_global_list(sem->top_m.root, 1);
11835 +//		TRACE_CUR("Not-Top-M After:\n");
11836 +//		print_global_list(sem->not_top_m.root, 1);
11837 +	}
11838 +	else {
11839 +		TRACE_CUR("Trivially adding %s/%d to not-top-m global list.\n",
11840 +				  t->comm, t->pid);
11841 +//		TRACE_CUR("Not-Top-M Before:\n");
11842 +//		print_global_list(sem->not_top_m.root, 1);
11843 +
11844 +		binheap_add(&node->node, &sem->not_top_m, ikglp_heap_node_t, node);
11845 +
11846 +//		TRACE_CUR("Not-Top-M After:\n");
11847 +//		print_global_list(sem->not_top_m.root, 1);
11848 +	}
11849 +}
11850 +
11851 +
11852 +static void ikglp_del_global_list(struct ikglp_semaphore *sem,
11853 +								  struct task_struct *t,
11854 +								  ikglp_heap_node_t *node)
11855 +{
11856 +	BUG_ON(!binheap_is_in_heap(&node->node));
11857 +
11858 +	TRACE_CUR("Removing %s/%d from global list.\n", t->comm, t->pid);
11859 +
11860 +	if(binheap_is_in_this_heap(&node->node, &sem->top_m)) {
11861 +		TRACE_CUR("%s/%d is in top-m\n", t->comm, t->pid);
11862 +
11863 +//		TRACE_CUR("Not-Top-M Before:\n");
11864 +//		print_global_list(sem->not_top_m.root, 1);
11865 +//		TRACE_CUR("Top-M Before (size = %d):\n", sem->top_m_size);
11866 +//		print_global_list(sem->top_m.root, 1);
11867 +
11868 +
11869 +		binheap_delete(&node->node, &sem->top_m);
11870 +
11871 +		if(!binheap_empty(&sem->not_top_m)) {
11872 +			ikglp_heap_node_t *promoted =
11873 +				binheap_top_entry(&sem->not_top_m, ikglp_heap_node_t, node);
11874 +
11875 +			TRACE_CUR("Promoting %s/%d to top-m\n",
11876 +					  promoted->task->comm, promoted->task->pid);
11877 +
11878 +			binheap_delete_root(&sem->not_top_m, ikglp_heap_node_t, node);
11879 +			INIT_BINHEAP_NODE(&promoted->node);
11880 +
11881 +			binheap_add(&promoted->node, &sem->top_m, ikglp_heap_node_t, node);
11882 +		}
11883 +		else {
11884 +			TRACE_CUR("No one to promote to top-m.\n");
11885 +			--(sem->top_m_size);
11886 +		}
11887 +
11888 +//		TRACE_CUR("Top-M After (size = %d):\n", sem->top_m_size);
11889 +//		print_global_list(sem->top_m.root, 1);
11890 +//		TRACE_CUR("Not-Top-M After:\n");
11891 +//		print_global_list(sem->not_top_m.root, 1);
11892 +	}
11893 +	else {
11894 +		TRACE_CUR("%s/%d is in not-top-m\n", t->comm, t->pid);
11895 +//		TRACE_CUR("Not-Top-M Before:\n");
11896 +//		print_global_list(sem->not_top_m.root, 1);
11897 +
11898 +		binheap_delete(&node->node, &sem->not_top_m);
11899 +
11900 +//		TRACE_CUR("Not-Top-M After:\n");
11901 +//		print_global_list(sem->not_top_m.root, 1);
11902 +	}
11903 +}
11904 +
11905 +
11906 +static void ikglp_add_donees(struct ikglp_semaphore *sem,
11907 +							 struct fifo_queue *fq,
11908 +							 struct task_struct *t,
11909 +							 ikglp_donee_heap_node_t* node)
11910 +{
11911 +//	TRACE_CUR("Adding %s/%d to donee list.\n", t->comm, t->pid);
11912 +//	TRACE_CUR("donees Before:\n");
11913 +//	print_donees(sem, sem->donees.root, 1);
11914 +
11915 +	node->task = t;
11916 +	node->donor_info = NULL;
11917 +	node->fq = fq;
11918 +	INIT_BINHEAP_NODE(&node->node);
11919 +
11920 +	binheap_add(&node->node, &sem->donees, ikglp_donee_heap_node_t, node);
11921 +
11922 +//	TRACE_CUR("donees After:\n");
11923 +//	print_donees(sem, sem->donees.root, 1);
11924 +}
11925 +
11926 +
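      +/*
      + * Waiter 't' on queue 'fq' has gained priority (or has just enqueued).
      + * If it outranks fq's current hp_waiter, update hp_waiter, re-key the
      + * queue's node in the owner's hp_blocked_tasks heap, and, if the owner's
      + * maximum inherited priority changed, propagate the increase via
      + * nested_increase_prio().  Every path releases sem->lock.
      + */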
11927 +static void ikglp_refresh_owners_prio_increase(struct task_struct *t,
11928 +											   struct fifo_queue *fq,
11929 +											   struct ikglp_semaphore *sem,
11930 +											   unsigned long flags)
11931 +{
11932 +	// priority of 't' has increased (note: 't' might already be hp_waiter).
11933 +	if ((t == fq->hp_waiter) || litmus->compare(t, fq->hp_waiter)) {
11934 +		struct task_struct *old_max_eff_prio;
11935 +		struct task_struct *new_max_eff_prio;
11936 +		struct task_struct *new_prio = NULL;
11937 +		struct task_struct *owner = fq->owner;
11938 +
11939 +		if(fq->hp_waiter)
11940 +			TRACE_TASK(t, "has higher prio than hp_waiter (%s/%d).\n",
11941 +					   fq->hp_waiter->comm, fq->hp_waiter->pid);
11942 +		else
11943 +			TRACE_TASK(t, "has higher prio than hp_waiter (NIL).\n");
11944 +
11945 +		if(owner)
11946 +		{
11947 +			raw_spin_lock(&tsk_rt(owner)->hp_blocked_tasks_lock);
11948 +
11949 +//			TRACE_TASK(owner, "Heap Before:\n");
11950 +//			print_hp_waiters(tsk_rt(owner)->hp_blocked_tasks.root, 0);
11951 +
11952 +			old_max_eff_prio = top_priority(&tsk_rt(owner)->hp_blocked_tasks);
11953 +
11954 +			fq->hp_waiter = t;
11955 +			fq->nest.hp_waiter_eff_prio = effective_priority(fq->hp_waiter);
11956 +
11957 +			binheap_decrease(&fq->nest.hp_binheap_node,
11958 +							 &tsk_rt(owner)->hp_blocked_tasks);
11959 +
11960 +//			TRACE_TASK(owner, "Heap After:\n");
11961 +//			print_hp_waiters(tsk_rt(owner)->hp_blocked_tasks.root, 0);
11962 +
11963 +			new_max_eff_prio = top_priority(&tsk_rt(owner)->hp_blocked_tasks);
11964 +
11965 +			if(new_max_eff_prio != old_max_eff_prio) {
11966 +				TRACE_TASK(t, "is new hp_waiter.\n");
11967 +
11968 +				if ((effective_priority(owner) == old_max_eff_prio) ||
11969 +					(litmus->__compare(new_max_eff_prio, BASE,
11970 +									   owner, EFFECTIVE))){
11971 +					new_prio = new_max_eff_prio;
11972 +				}
11973 +			}
11974 +			else {
11975 +				TRACE_TASK(t, "no change in max_eff_prio of heap.\n");
11976 +			}
11977 +
11978 +			if(new_prio) {
11979 +				// set new inheritance and propagate
11980 +				TRACE_TASK(t, "Effective priority changed for owner %s/%d to %s/%d\n",
11981 +						   owner->comm, owner->pid,
11982 +						   new_prio->comm, new_prio->pid);
11983 +				litmus->nested_increase_prio(owner, new_prio, &sem->lock,
11984 +											 flags);  // unlocks lock.
11985 +			}
11986 +			else {
11987 +				TRACE_TASK(t, "No change in effective priority (is %s/%d).  Propagation halted.\n",
11988 +						   new_max_eff_prio->comm, new_max_eff_prio->pid);
11989 +				raw_spin_unlock(&tsk_rt(owner)->hp_blocked_tasks_lock);
11990 +				unlock_fine_irqrestore(&sem->lock, flags);
11991 +			}
11992 +		}
11993 +		else {
11994 +			fq->hp_waiter = t;
11995 +			fq->nest.hp_waiter_eff_prio = effective_priority(fq->hp_waiter);
11996 +
11997 +			TRACE_TASK(t, "no owner.\n");
11998 +			unlock_fine_irqrestore(&sem->lock, flags);
11999 +		}
12000 +	}
12001 +	else {
12002 +		TRACE_TASK(t, "hp_waiter is unaffected.\n");
12003 +		unlock_fine_irqrestore(&sem->lock, flags);
12004 +	}
12005 +}
12006 +
12007 +// hp_waiter has decreased
12008 +static void ikglp_refresh_owners_prio_decrease(struct fifo_queue *fq,
12009 +											   struct ikglp_semaphore *sem,
12010 +											   unsigned long flags)
12011 +{
12012 +	struct task_struct *owner = fq->owner;
12013 +
12014 +	struct task_struct *old_max_eff_prio;
12015 +	struct task_struct *new_max_eff_prio;
12016 +
12017 +	if(!owner) {
12018 +		TRACE_CUR("No owner.  Returning.\n");
12019 +		unlock_fine_irqrestore(&sem->lock, flags);
12020 +		return;
12021 +	}
12022 +
12023 +	TRACE_CUR("ikglp_refresh_owners_prio_decrease\n");
12024 +
12025 +	raw_spin_lock(&tsk_rt(owner)->hp_blocked_tasks_lock);
12026 +
12027 +	old_max_eff_prio = top_priority(&tsk_rt(owner)->hp_blocked_tasks);
12028 +
12029 +	binheap_delete(&fq->nest.hp_binheap_node, &tsk_rt(owner)->hp_blocked_tasks);
12030 +	fq->nest.hp_waiter_eff_prio = fq->hp_waiter;
12031 +	binheap_add(&fq->nest.hp_binheap_node, &tsk_rt(owner)->hp_blocked_tasks,
12032 +				struct nested_info, hp_binheap_node);
12033 +
12034 +	new_max_eff_prio = top_priority(&tsk_rt(owner)->hp_blocked_tasks);
12035 +
12036 +	if((old_max_eff_prio != new_max_eff_prio) &&
12037 +	   (effective_priority(owner) == old_max_eff_prio))
12038 +	{
12039 +		// Need to set new effective_priority for owner
12040 +		struct task_struct *decreased_prio;
12041 +
12042 +		TRACE_CUR("Propagating decreased inheritance to holder of fq %d.\n",
12043 +				  ikglp_get_idx(sem, fq));
12044 +
12045 +		if(litmus->__compare(new_max_eff_prio, BASE, owner, BASE)) {
12046 +			TRACE_CUR("%s/%d has greater base priority than base priority of owner (%s/%d) of fq %d.\n",
12047 +					  (new_max_eff_prio) ? new_max_eff_prio->comm : "nil",
12048 +					  (new_max_eff_prio) ? new_max_eff_prio->pid : -1,
12049 +					  owner->comm,
12050 +					  owner->pid,
12051 +					  ikglp_get_idx(sem, fq));
12052 +
12053 +			decreased_prio = new_max_eff_prio;
12054 +		}
12055 +		else {
12056 +			TRACE_CUR("%s/%d has lesser base priority than base priority of owner (%s/%d) of fq %d.\n",
12057 +					  (new_max_eff_prio) ? new_max_eff_prio->comm : "nil",
12058 +					  (new_max_eff_prio) ? new_max_eff_prio->pid : -1,
12059 +					  owner->comm,
12060 +					  owner->pid,
12061 +					  ikglp_get_idx(sem, fq));
12062 +
12063 +			decreased_prio = NULL;
12064 +		}
12065 +
12066 +		// beware: recursion
12067 +		litmus->nested_decrease_prio(owner, decreased_prio, &sem->lock, flags);	// will unlock mutex->lock
12068 +	}
12069 +	else {
12070 +		TRACE_TASK(owner, "No need to propagate priority decrease forward.\n");
12071 +		raw_spin_unlock(&tsk_rt(owner)->hp_blocked_tasks_lock);
12072 +		unlock_fine_irqrestore(&sem->lock, flags);
12073 +	}
12074 +}
12075 +
12076 +
12077 +static void ikglp_remove_donation_from_owner(struct binheap_node *n,
12078 +											 struct fifo_queue *fq,
12079 +											 struct ikglp_semaphore *sem,
12080 +											 unsigned long flags)
12081 +{
12082 +	struct task_struct *owner = fq->owner;
12083 +
12084 +	struct task_struct *old_max_eff_prio;
12085 +	struct task_struct *new_max_eff_prio;
12086 +
12087 +	BUG_ON(!owner);
12088 +
12089 +	raw_spin_lock(&tsk_rt(owner)->hp_blocked_tasks_lock);
12090 +
12091 +	old_max_eff_prio = top_priority(&tsk_rt(owner)->hp_blocked_tasks);
12092 +
12093 +	binheap_delete(n, &tsk_rt(owner)->hp_blocked_tasks);
12094 +
12095 +	new_max_eff_prio = top_priority(&tsk_rt(owner)->hp_blocked_tasks);
12096 +
12097 +	if((old_max_eff_prio != new_max_eff_prio) &&
12098 +	   (effective_priority(owner) == old_max_eff_prio))
12099 +	{
12100 +		// Need to set new effective_priority for owner
12101 +		struct task_struct *decreased_prio;
12102 +
12103 +		TRACE_CUR("Propagating decreased inheritance to holder of fq %d.\n",
12104 +				  ikglp_get_idx(sem, fq));
12105 +
12106 +		if(litmus->__compare(new_max_eff_prio, BASE, owner, BASE)) {
12107 +			TRACE_CUR("has greater base priority than base priority of owner of fq %d.\n",
12108 +					  ikglp_get_idx(sem, fq));
12109 +			decreased_prio = new_max_eff_prio;
12110 +		}
12111 +		else {
12112 +			TRACE_CUR("has lesser base priority than base priority of owner of fq %d.\n",
12113 +					  ikglp_get_idx(sem, fq));
12114 +			decreased_prio = NULL;
12115 +		}
12116 +
12117 +		// beware: recursion
12118 +		litmus->nested_decrease_prio(owner, decreased_prio, &sem->lock, flags);	// will unlock mutex->lock
12119 +	}
12120 +	else {
12121 +		TRACE_TASK(owner, "No need to propagate priority decrease forward.\n");
12122 +		raw_spin_unlock(&tsk_rt(owner)->hp_blocked_tasks_lock);
12123 +		unlock_fine_irqrestore(&sem->lock, flags);
12124 +	}
12125 +}
12126 +
12127 +static void ikglp_remove_donation_from_fq_waiter(struct task_struct *t,
12128 +												 struct binheap_node *n)
12129 +{
12130 +	struct task_struct *old_max_eff_prio;
12131 +	struct task_struct *new_max_eff_prio;
12132 +
12133 +	raw_spin_lock(&tsk_rt(t)->hp_blocked_tasks_lock);
12134 +
12135 +	old_max_eff_prio = top_priority(&tsk_rt(t)->hp_blocked_tasks);
12136 +
12137 +	binheap_delete(n, &tsk_rt(t)->hp_blocked_tasks);
12138 +
12139 +	new_max_eff_prio = top_priority(&tsk_rt(t)->hp_blocked_tasks);
12140 +
12141 +	if((old_max_eff_prio != new_max_eff_prio) &&
12142 +	   (effective_priority(t) == old_max_eff_prio))
12143 +	{
12144 +		// Need to set new effective_priority for owner
12145 +		struct task_struct *decreased_prio;
12146 +
12147 +		if(litmus->__compare(new_max_eff_prio, BASE, t, BASE)) {
12148 +			decreased_prio = new_max_eff_prio;
12149 +		}
12150 +		else {
12151 +			decreased_prio = NULL;
12152 +		}
12153 +
12154 +		tsk_rt(t)->inh_task = decreased_prio;
12155 +	}
12156 +
12157 +	raw_spin_unlock(&tsk_rt(t)->hp_blocked_tasks_lock);
12158 +}
12159 +
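      +/*
      + * Fast path: a replica is free, so the caller becomes the owner of 'fq'
      + * immediately.  The queue's nested_info is added to the new owner's
      + * hp_blocked_tasks heap, the owner is registered in the global and donee
      + * lists, the shortest-queue pointer is refreshed, and sem->lock is
      + * released.
      + */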
12160 +static void ikglp_get_immediate(struct task_struct* t,
12161 +								struct fifo_queue *fq,
12162 +								struct ikglp_semaphore *sem,
12163 +								unsigned long flags)
12164 +{
12165 +	// resource available now
12166 +	TRACE_CUR("queue %d: acquired immediately\n", ikglp_get_idx(sem, fq));
12167 +
12168 +	fq->owner = t;
12169 +
12170 +	raw_spin_lock(&tsk_rt(t)->hp_blocked_tasks_lock);
12171 +	binheap_add(&fq->nest.hp_binheap_node, &tsk_rt(t)->hp_blocked_tasks,
12172 +				struct nested_info, hp_binheap_node);
12173 +	raw_spin_unlock(&tsk_rt(t)->hp_blocked_tasks_lock);
12174 +
12175 +	++(fq->count);
12176 +
12177 +	ikglp_add_global_list(sem, t, &fq->global_heap_node);
12178 +	ikglp_add_donees(sem, fq, t, &fq->donee_heap_node);
12179 +
12180 +	sem->shortest_fifo_queue = ikglp_find_shortest(sem, sem->shortest_fifo_queue);
12181 +
12182 +#ifdef CONFIG_LITMUS_AFFINITY_LOCKING
12183 +	if(sem->aff_obs) {
12184 +		sem->aff_obs->ops->notify_enqueue(sem->aff_obs, fq, t);
12185 +		sem->aff_obs->ops->notify_acquired(sem->aff_obs, fq, t);
12186 +	}
12187 +#endif
12188 +
12189 +	unlock_fine_irqrestore(&sem->lock, flags);
12190 +}
12191 +
12192 +
12193 +
12194 +
12195 +
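      +/*
      + * Low-level FIFO enqueue: append the waiter to fq->wait, bump fq->count
      + * and sem->nr_in_fifos, (re)insert the task into the global list and the
      + * donee heap when the corresponding nodes are supplied, and recompute the
      + * shortest queue.  Priority propagation to the queue owner is the
      + * caller's responsibility (see ikglp_enqueue_on_fq()).
      + */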
12196 +static void __ikglp_enqueue_on_fq(struct ikglp_semaphore *sem,
12197 +								  struct fifo_queue* fq,
12198 +								  struct task_struct* t,
12199 +								  wait_queue_t *wait,
12200 +								  ikglp_heap_node_t *global_heap_node,
12201 +								  ikglp_donee_heap_node_t *donee_heap_node)
12202 +{
12203 +	/* resource is not free => must suspend and wait */
12204 +	TRACE_TASK(t, "Enqueuing on fq %d.\n",
12205 +			   ikglp_get_idx(sem, fq));
12206 +
12207 +	init_waitqueue_entry(wait, t);
12208 +
12209 +	__add_wait_queue_tail_exclusive(&fq->wait, wait);
12210 +
12211 +	++(fq->count);
12212 +	++(sem->nr_in_fifos);
12213 +
12214 +	// update global list.
12215 +	if(likely(global_heap_node)) {
12216 +		if(binheap_is_in_heap(&global_heap_node->node)) {
12217 +			WARN_ON(1);
12218 +			ikglp_del_global_list(sem, t, global_heap_node);
12219 +		}
12220 +		ikglp_add_global_list(sem, t, global_heap_node);
12221 +	}
12222 +	// update donor eligibility list.
12223 +	if(likely(donee_heap_node)) {
12224 +//		if(binheap_is_in_heap(&donee_heap_node->node)) {
12225 +//			WARN_ON(1);
12226 +//		}
12227 +		ikglp_add_donees(sem, fq, t, donee_heap_node);
12228 +	}
12229 +
12230 +	if(sem->shortest_fifo_queue == fq) {
12231 +		sem->shortest_fifo_queue = ikglp_find_shortest(sem, fq);
12232 +	}
12233 +
12234 +#ifdef CONFIG_LITMUS_AFFINITY_LOCKING
12235 +	if(sem->aff_obs) {
12236 +		sem->aff_obs->ops->notify_enqueue(sem->aff_obs, fq, t);
12237 +	}
12238 +#endif
12239 +
12240 +	TRACE_TASK(t, "shortest queue is now %d\n", ikglp_get_idx(sem, fq));
12241 +}
12242 +
12243 +
12244 +static void ikglp_enqueue_on_fq(
12245 +								struct ikglp_semaphore *sem,
12246 +								struct fifo_queue *fq,
12247 +								ikglp_wait_state_t *wait,
12248 +								unsigned long flags)
12249 +{
12250 +	/* resource is not free => must suspend and wait */
12251 +	TRACE_TASK(wait->task, "queue %d: Resource is not free => must suspend and wait.\n",
12252 +			   ikglp_get_idx(sem, fq));
12253 +
12254 +	INIT_BINHEAP_NODE(&wait->global_heap_node.node);
12255 +	INIT_BINHEAP_NODE(&wait->donee_heap_node.node);
12256 +
12257 +	__ikglp_enqueue_on_fq(sem, fq, wait->task, &wait->fq_node,
12258 +						  &wait->global_heap_node, &wait->donee_heap_node);
12259 +
12260 +	ikglp_refresh_owners_prio_increase(wait->task, fq, sem, flags);  // unlocks sem->lock
12261 +}
12262 +
12263 +
12264 +static void __ikglp_enqueue_on_pq(struct ikglp_semaphore *sem,
12265 +								  ikglp_wait_state_t *wait)
12266 +{
12267 +	TRACE_TASK(wait->task, "goes to PQ.\n");
12268 +
12269 +	wait->pq_node.task = wait->task; // copy over task (a little redundant...)
12270 +
12271 +	binheap_add(&wait->pq_node.node, &sem->priority_queue,
12272 +				ikglp_heap_node_t, node);
12273 +}
12274 +
12275 +static void ikglp_enqueue_on_pq(struct ikglp_semaphore *sem,
12276 +								ikglp_wait_state_t *wait)
12277 +{
12278 +	INIT_BINHEAP_NODE(&wait->global_heap_node.node);
12279 +	INIT_BINHEAP_NODE(&wait->donee_heap_node.node);
12280 +	INIT_BINHEAP_NODE(&wait->pq_node.node);
12281 +
12282 +	__ikglp_enqueue_on_pq(sem, wait);
12283 +}
12284 +
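      +/*
      + * Donor path: add 't' to the global list, pick a donee (affinity-advised
      + * when an observer is attached, otherwise the root of the donee heap),
      + * and install the donation in the donee's inheritance heap.  If the donee
      + * already had a donor, that donor is displaced onto the PQ.  When the
      + * donation raises the donee's effective priority, the increase is
      + * propagated either to the donee itself (if it owns its queue) or to the
      + * owner of the donee's queue.  Releases sem->lock on every path.
      + */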
12285 +static void ikglp_enqueue_on_donor(struct ikglp_semaphore *sem,
12286 +								   ikglp_wait_state_t* wait,
12287 +								   unsigned long flags)
12288 +{
12289 +	struct task_struct *t = wait->task;
12290 +	ikglp_donee_heap_node_t *donee_node = NULL;
12291 +	struct task_struct *donee;
12292 +
12293 +	struct task_struct *old_max_eff_prio;
12294 +	struct task_struct *new_max_eff_prio;
12295 +	struct task_struct *new_prio = NULL;
12296 +
12297 +	INIT_BINHEAP_NODE(&wait->global_heap_node.node);
12298 +	INIT_BINHEAP_NODE(&wait->donee_heap_node.node);
12299 +	INIT_BINHEAP_NODE(&wait->pq_node.node);
12300 +	INIT_BINHEAP_NODE(&wait->node);
12301 +
12302 +//	TRACE_CUR("Adding %s/%d as donor.\n", t->comm, t->pid);
12303 +//	TRACE_CUR("donors Before:\n");
12304 +//	print_donors(sem->donors.root, 1);
12305 +
12306 +	// Add donor to the global list.
12307 +	ikglp_add_global_list(sem, t, &wait->global_heap_node);
12308 +
12309 +	// Select a donee
12310 +#ifdef CONFIG_LITMUS_AFFINITY_LOCKING
12311 +	donee_node = (sem->aff_obs) ?
12312 +		sem->aff_obs->ops->advise_donee_selection(sem->aff_obs, t) :
12313 +		binheap_top_entry(&sem->donees, ikglp_donee_heap_node_t, node);
12314 +#else
12315 +	donee_node = binheap_top_entry(&sem->donees, ikglp_donee_heap_node_t, node);
12316 +#endif
12317 +
12318 +	donee = donee_node->task;
12319 +
12320 +	TRACE_TASK(t, "Donee selected: %s/%d\n", donee->comm, donee->pid);
12321 +
12322 +	TRACE_CUR("Temporarily removing %s/%d from the donee list.\n",
12323 +			  donee->comm, donee->pid);
12324 +//	TRACE_CUR("donees Before:\n");
12325 +//	print_donees(sem, sem->donees.root, 1);
12326 +
12327 +	//binheap_delete_root(&sem->donees, ikglp_donee_heap_node_t, node);  // will re-add it shortly
12328 +	binheap_delete(&donee_node->node, &sem->donees);
12329 +
12330 +//	TRACE_CUR("donees After:\n");
12331 +//	print_donees(sem, sem->donees.root, 1);
12332 +
12333 +
12334 +	wait->donee_info = donee_node;
12335 +
12336 +	// Add t to donor heap.
12337 +	binheap_add(&wait->node, &sem->donors, ikglp_wait_state_t, node);
12338 +
12339 +	// Now adjust the donee's priority.
12340 +
12341 +	// Lock the donee's inheritance heap.
12342 +	raw_spin_lock(&tsk_rt(donee)->hp_blocked_tasks_lock);
12343 +
12344 +	old_max_eff_prio = top_priority(&tsk_rt(donee)->hp_blocked_tasks);
12345 +
12346 +	if(donee_node->donor_info) {
12347 +		// Steal donation relation.  Evict old donor to PQ.
12348 +
12349 +		// Remove old donor from donor heap
12350 +		ikglp_wait_state_t *old_wait = donee_node->donor_info;
12351 +		struct task_struct *old_donor = old_wait->task;
12352 +
12353 +		TRACE_TASK(t, "Donee (%s/%d) had donor %s/%d.  Moving old donor to PQ.\n",
12354 +				   donee->comm, donee->pid, old_donor->comm, old_donor->pid);
12355 +
12356 +		binheap_delete(&old_wait->node, &sem->donors);
12357 +
12358 +		// Remove donation from donee's inheritance heap.
12359 +		binheap_delete(&old_wait->prio_donation.hp_binheap_node,
12360 +					   &tsk_rt(donee)->hp_blocked_tasks);
12361 +		// WARNING: have not updated inh_prio!
12362 +
12363 +		// Add old donor to PQ.
12364 +		__ikglp_enqueue_on_pq(sem, old_wait);
12365 +
12366 +		// Remove old donor from the global heap.
12367 +		ikglp_del_global_list(sem, old_donor, &old_wait->global_heap_node);
12368 +	}
12369 +
12370 +	// Add back donee's node to the donees heap with increased prio
12371 +	donee_node->donor_info = wait;
12372 +	INIT_BINHEAP_NODE(&donee_node->node);
12373 +
12374 +
12375 +	TRACE_CUR("Adding %s/%d back to donee list.\n", donee->comm, donee->pid);
12376 +//	TRACE_CUR("donees Before:\n");
12377 +//	print_donees(sem, sem->donees.root, 1);
12378 +
12379 +	binheap_add(&donee_node->node, &sem->donees, ikglp_donee_heap_node_t, node);
12380 +
12381 +//	TRACE_CUR("donees After:\n");
12382 +//	print_donees(sem, sem->donees.root, 1);
12383 +
12384 +	// Add an inheritance/donation to the donee's inheritance heap.
12385 +	wait->prio_donation.lock = (struct litmus_lock*)sem;
12386 +	wait->prio_donation.hp_waiter_eff_prio = t;
12387 +	wait->prio_donation.hp_waiter_ptr = NULL;
12388 +	INIT_BINHEAP_NODE(&wait->prio_donation.hp_binheap_node);
12389 +
12390 +	binheap_add(&wait->prio_donation.hp_binheap_node,
12391 +				&tsk_rt(donee)->hp_blocked_tasks,
12392 +				struct nested_info, hp_binheap_node);
12393 +
12394 +	new_max_eff_prio = top_priority(&tsk_rt(donee)->hp_blocked_tasks);
12395 +
12396 +	if(new_max_eff_prio != old_max_eff_prio) {
12397 +		if ((effective_priority(donee) == old_max_eff_prio) ||
12398 +			(litmus->__compare(new_max_eff_prio, BASE, donee, EFFECTIVE))){
12399 +			TRACE_TASK(t, "Donation increases %s/%d's effective priority\n",
12400 +					   donee->comm, donee->pid);
12401 +			new_prio = new_max_eff_prio;
12402 +		}
12403 +//		else {
12404 +//			// should be bug.  donor would not be in top-m.
12405 +//			TRACE_TASK(t, "Donation is not greater than base prio of %s/%d?\n", donee->comm, donee->pid);
12406 +//			WARN_ON(1);
12407 +//		}
12408 +//	}
12409 +//	else {
12410 +//		// should be bug.  donor would not be in top-m.
12411 +//		TRACE_TASK(t, "No change in %s/%d's inheritance heap?\n", donee->comm, donee->pid);
12412 +//		WARN_ON(1);
12413 +	}
12414 +
12415 +	if(new_prio) {
12416 +		struct fifo_queue *donee_fq = donee_node->fq;
12417 +
12418 +		if(donee != donee_fq->owner) {
12419 +			TRACE_TASK(t, "%s/%d is not the owner. Propagating priority to owner %s/%d.\n",
12420 +					   donee->comm, donee->pid,
12421 +					   donee_fq->owner->comm, donee_fq->owner->pid);
12422 +
12423 +			raw_spin_unlock(&tsk_rt(donee)->hp_blocked_tasks_lock);
12424 +			ikglp_refresh_owners_prio_increase(donee, donee_fq, sem, flags);  // unlocks sem->lock
12425 +		}
12426 +		else {
12427 +			TRACE_TASK(t, "%s/%d is the owner. Propagating priority immediately.\n",
12428 +					   donee->comm, donee->pid);
12429 +			litmus->nested_increase_prio(donee, new_prio, &sem->lock, flags);  // unlocks sem->lock and donee's heap lock
12430 +		}
12431 +	}
12432 +	else {
12433 +		TRACE_TASK(t, "No change in effective priority (it is %s/%d).  BUG?\n",
12434 +				   new_max_eff_prio->comm, new_max_eff_prio->pid);
12435 +		raw_spin_unlock(&tsk_rt(donee)->hp_blocked_tasks_lock);
12436 +		unlock_fine_irqrestore(&sem->lock, flags);
12437 +	}
12438 +
12439 +
12440 +//	TRACE_CUR("donors After:\n");
12441 +//	print_donors(sem->donors.root, 1);
12442 +}
12443 +
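      +/*
      + * Lock entry point.  If fewer than m requests occupy the FIFO queues, the
      + * request goes to the affinity-advised (or shortest) FIFO queue and
      + * acquires immediately when that queue is empty.  Otherwise the task
      + * either waits on the PQ (when the m-th highest base priority outranks
      + * it) or becomes a priority donor.  Blocked callers suspend and, once
      + * granted a replica, return its index (or the affinity-mapped resource
      + * when an affinity observer is attached).
      + */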
12444 +int ikglp_lock(struct litmus_lock* l)
12445 +{
12446 +	struct task_struct* t = current;
12447 +	struct ikglp_semaphore *sem = ikglp_from_lock(l);
12448 +	unsigned long flags = 0, real_flags;
12449 +	struct fifo_queue *fq = NULL;
12450 +	int replica = -EINVAL;
12451 +
12452 +#ifdef CONFIG_LITMUS_DGL_SUPPORT
12453 +	raw_spinlock_t *dgl_lock;
12454 +#endif
12455 +
12456 +	ikglp_wait_state_t wait;
12457 +
12458 +	if (!is_realtime(t))
12459 +		return -EPERM;
12460 +
12461 +#ifdef CONFIG_LITMUS_DGL_SUPPORT
12462 +	dgl_lock = litmus->get_dgl_spinlock(t);
12463 +#endif
12464 +
12465 +	raw_spin_lock_irqsave(&sem->real_lock, real_flags);
12466 +
12467 +	lock_global_irqsave(dgl_lock, flags);
12468 +	lock_fine_irqsave(&sem->lock, flags);
12469 +
12470 +	if(sem->nr_in_fifos < sem->m) {
12471 +		// enqueue somewhere
12472 +#ifdef CONFIG_LITMUS_AFFINITY_LOCKING
12473 +		fq = (sem->aff_obs) ?
12474 +			sem->aff_obs->ops->advise_enqueue(sem->aff_obs, t) :
12475 +			sem->shortest_fifo_queue;
12476 +#else
12477 +		fq = sem->shortest_fifo_queue;
12478 +#endif
12479 +		if(fq->count == 0) {
12480 +			// take available resource
12481 +			replica = ikglp_get_idx(sem, fq);
12482 +
12483 +			ikglp_get_immediate(t, fq, sem, flags);  // unlocks sem->lock
12484 +
12485 +			unlock_global_irqrestore(dgl_lock, flags);
12486 +			raw_spin_unlock_irqrestore(&sem->real_lock, real_flags);
12487 +			goto acquired;
12488 +		}
12489 +		else {
12490 +			wait.task = t;   // THIS IS CRITICALLY IMPORTANT!!!
12491 +
12492 +			tsk_rt(t)->blocked_lock = (struct litmus_lock*)sem;  // record where we are blocked
12493 +			mb();
12494 +
12495 +			/* FIXME: interruptible would be nice some day */
12496 +			set_task_state(t, TASK_UNINTERRUPTIBLE);
12497 +
12498 +			ikglp_enqueue_on_fq(sem, fq, &wait, flags);  // unlocks sem->lock
12499 +		}
12500 +	}
12501 +	else {
12502 +		// donor!
12503 +		wait.task = t;   // THIS IS CRITICALLY IMPORTANT!!!
12504 +
12505 +		tsk_rt(t)->blocked_lock = (struct litmus_lock*)sem;  // record where we are blocked
12506 +		mb();
12507 +
12508 +		/* FIXME: interruptible would be nice some day */
12509 +		set_task_state(t, TASK_UNINTERRUPTIBLE);
12510 +
12511 +		if(litmus->__compare(ikglp_mth_highest(sem), BASE, t, BASE)) {
12512 +			// enqueue on PQ
12513 +			ikglp_enqueue_on_pq(sem, &wait);
12514 +			unlock_fine_irqrestore(&sem->lock, flags);
12515 +		}
12516 +		else {
12517 +			// enqueue as donor
12518 +			ikglp_enqueue_on_donor(sem, &wait, flags);	 // unlocks sem->lock
12519 +		}
12520 +	}
12521 +
12522 +	unlock_global_irqrestore(dgl_lock, flags);
12523 +	raw_spin_unlock_irqrestore(&sem->real_lock, real_flags);
12524 +
12525 +	TS_LOCK_SUSPEND;
12526 +
12527 +	schedule();
12528 +
12529 +	TS_LOCK_RESUME;
12530 +
12531 +	fq = ikglp_get_queue(sem, t);
12532 +	BUG_ON(!fq);
12533 +
12534 +	replica = ikglp_get_idx(sem, fq);
12535 +
12536 +acquired:
12537 +	TRACE_CUR("Acquired lock %d, queue %d\n",
12538 +			  l->ident, replica);
12539 +
12540 +#ifdef CONFIG_LITMUS_AFFINITY_LOCKING
12541 +	if(sem->aff_obs) {
12542 +		return sem->aff_obs->ops->replica_to_resource(sem->aff_obs, fq);
12543 +	}
12544 +#endif
12545 +
12546 +	return replica;
12547 +}
12548 +
12549 +//int ikglp_lock(struct litmus_lock* l)
12550 +//{
12551 +//	struct task_struct* t = current;
12552 +//	struct ikglp_semaphore *sem = ikglp_from_lock(l);
12553 +//	unsigned long flags = 0, real_flags;
12554 +//	struct fifo_queue *fq = NULL;
12555 +//	int replica = -EINVAL;
12556 +//
12557 +//#ifdef CONFIG_LITMUS_DGL_SUPPORT
12558 +//	raw_spinlock_t *dgl_lock;
12559 +//#endif
12560 +//
12561 +//	ikglp_wait_state_t wait;
12562 +//
12563 +//	if (!is_realtime(t))
12564 +//		return -EPERM;
12565 +//
12566 +//#ifdef CONFIG_LITMUS_DGL_SUPPORT
12567 +//	dgl_lock = litmus->get_dgl_spinlock(t);
12568 +//#endif
12569 +//
12570 +//	raw_spin_lock_irqsave(&sem->real_lock, real_flags);
12571 +//
12572 +//	lock_global_irqsave(dgl_lock, flags);
12573 +//	lock_fine_irqsave(&sem->lock, flags);
12574 +//
12575 +//
12576 +//#ifdef CONFIG_LITMUS_AFFINITY_LOCKING
12577 +//	fq = (sem->aff_obs) ?
12578 +//		sem->aff_obs->ops->advise_enqueue(sem->aff_obs, t) :
12579 +//		sem->shortest_fifo_queue;
12580 +//#else
12581 +//	fq = sem->shortest_fifo_queue;
12582 +//#endif
12583 +//
12584 +//	if(fq->count == 0) {
12585 +//		// take available resource
12586 +//		replica = ikglp_get_idx(sem, fq);
12587 +//
12588 +//		ikglp_get_immediate(t, fq, sem, flags);  // unlocks sem->lock
12589 +//
12590 +//		unlock_global_irqrestore(dgl_lock, flags);
12591 +//		raw_spin_unlock_irqrestore(&sem->real_lock, real_flags);
12592 +//	}
12593 +//	else
12594 +//	{
12595 +//		// we have to suspend.
12596 +//
12597 +//		wait.task = t;   // THIS IS CRITICALLY IMPORTANT!!!
12598 +//
12599 +//		tsk_rt(t)->blocked_lock = (struct litmus_lock*)sem;  // record where we are blocked
12600 +//		mb();
12601 +//
12602 +//		/* FIXME: interruptible would be nice some day */
12603 +//		set_task_state(t, TASK_UNINTERRUPTIBLE);
12604 +//
12605 +//		if(fq->count < sem->max_fifo_len) {
12606 +//			// enqueue on fq
12607 +//			ikglp_enqueue_on_fq(sem, fq, &wait, flags);  // unlocks sem->lock
12608 +//		}
12609 +//		else {
12610 +//
12611 +//			TRACE_CUR("IKGLP fifo queues are full (at least they better be).\n");
12612 +//
12613 +//			// no room in fifos.  Go to PQ or donors.
12614 +//
12615 +//			if(litmus->__compare(ikglp_mth_highest(sem), BASE, t, BASE)) {
12616 +//				// enqueue on PQ
12617 +//				ikglp_enqueue_on_pq(sem, &wait);
12618 +//				unlock_fine_irqrestore(&sem->lock, flags);
12619 +//			}
12620 +//			else {
12621 +//				// enqueue as donor
12622 +//				ikglp_enqueue_on_donor(sem, &wait, flags);	 // unlocks sem->lock
12623 +//			}
12624 +//		}
12625 +//
12626 +//		unlock_global_irqrestore(dgl_lock, flags);
12627 +//		raw_spin_unlock_irqrestore(&sem->real_lock, real_flags);
12628 +//
12629 +//		TS_LOCK_SUSPEND;
12630 +//
12631 +//		schedule();
12632 +//
12633 +//		TS_LOCK_RESUME;
12634 +//
12635 +//		fq = ikglp_get_queue(sem, t);
12636 +//		BUG_ON(!fq);
12637 +//
12638 +//		replica = ikglp_get_idx(sem, fq);
12639 +//	}
12640 +//
12641 +//	TRACE_CUR("Acquired lock %d, queue %d\n",
12642 +//			  l->ident, replica);
12643 +//
12644 +//#ifdef CONFIG_LITMUS_AFFINITY_LOCKING
12645 +//	if(sem->aff_obs) {
12646 +//		return sem->aff_obs->ops->replica_to_resource(sem->aff_obs, fq);
12647 +//	}
12648 +//#endif
12649 +//
12650 +//	return replica;
12651 +//}
12652 +
12653 +static void ikglp_move_donor_to_fq(struct ikglp_semaphore *sem,
12654 +								   struct fifo_queue *fq,
12655 +								   ikglp_wait_state_t *donor_info)
12656 +{
12657 +	struct task_struct *t = donor_info->task;
12658 +
12659 +	TRACE_CUR("Donor %s/%d being moved to fq %d\n",
12660 +			  t->comm,
12661 +			  t->pid,
12662 +			  ikglp_get_idx(sem, fq));
12663 +
12664 +	binheap_delete(&donor_info->node, &sem->donors);
12665 +
12666 +	__ikglp_enqueue_on_fq(sem, fq, t,
12667 +						  &donor_info->fq_node,
12668 +						  NULL, // already in global_list, so pass null to prevent adding 2nd time.
12669 +						  &donor_info->donee_heap_node);
12670 +
12671 +	// warning:
12672 +	// ikglp_update_owners_prio(t, fq, sem, flags) has not been called.
12673 +}
12674 +
12675 +static void ikglp_move_pq_to_fq(struct ikglp_semaphore *sem,
12676 +								struct fifo_queue *fq,
12677 +								ikglp_wait_state_t *wait)
12678 +{
12679 +	struct task_struct *t = wait->task;
12680 +
12681 +	TRACE_CUR("PQ request %s/%d being moved to fq %d\n",
12682 +			  t->comm,
12683 +			  t->pid,
12684 +			  ikglp_get_idx(sem, fq));
12685 +
12686 +	binheap_delete(&wait->pq_node.node, &sem->priority_queue);
12687 +
12688 +	__ikglp_enqueue_on_fq(sem, fq, t,
12689 +						  &wait->fq_node,
12690 +						  &wait->global_heap_node,
12691 +						  &wait->donee_heap_node);
12692 +	// warning:
12693 +	// ikglp_update_owners_prio(t, fq, sem, flags) has not been called.
12694 +}
12695 +
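      +/*
      + * Stealing candidate selection: among all replicas whose queue holds more
      + * than one request, pick the queue whose hp_waiter has the highest
      + * priority, then walk that queue to locate and return the corresponding
      + * wait-state.  Returns NULL if no queue has a waiter to spare.  Caller
      + * must hold sem->lock.
      + */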
12696 +static ikglp_wait_state_t* ikglp_find_hp_waiter_to_steal(
12697 +	struct ikglp_semaphore* sem)
12698 +{
12699 +	/* must hold sem->lock */
12700 +
12701 +	struct fifo_queue *fq = NULL;
12702 +	struct list_head	*pos;
12703 +	struct task_struct 	*queued;
12704 +	int i;
12705 +
12706 +	for(i = 0; i < sem->nr_replicas; ++i) {
12707 +		if( (sem->fifo_queues[i].count > 1) &&
12708 +		   (!fq || litmus->compare(sem->fifo_queues[i].hp_waiter, fq->hp_waiter)) ) {
12709 +
12710 +			TRACE_CUR("hp_waiter on fq %d (%s/%d) has higher prio than hp_waiter on fq %d (%s/%d)\n",
12711 +					  ikglp_get_idx(sem, &sem->fifo_queues[i]),
12712 +					  sem->fifo_queues[i].hp_waiter->comm,
12713 +					  sem->fifo_queues[i].hp_waiter->pid,
12714 +					  (fq) ? ikglp_get_idx(sem, fq) : -1,
12715 +					  (fq) ? ((fq->hp_waiter) ? fq->hp_waiter->comm : "nil") : "nilXX",
12716 +					  (fq) ? ((fq->hp_waiter) ? fq->hp_waiter->pid : -1) : -2);
12717 +
12718 +			fq = &sem->fifo_queues[i];
12719 +
12720 +			WARN_ON(!(fq->hp_waiter));
12721 +		}
12722 +	}
12723 +
12724 +	if(fq) {
12725 +		struct task_struct *max_hp = fq->hp_waiter;
12726 +		ikglp_wait_state_t* ret = NULL;
12727 +
12728 +		TRACE_CUR("Searching for %s/%d on fq %d\n",
12729 +				  max_hp->comm,
12730 +				  max_hp->pid,
12731 +				  ikglp_get_idx(sem, fq));
12732 +
12733 +		BUG_ON(!max_hp);
12734 +
12735 +		list_for_each(pos, &fq->wait.task_list) {
12736 +			wait_queue_t *wait = list_entry(pos, wait_queue_t, task_list);
12737 +
12738 +			queued  = (struct task_struct*) wait->private;
12739 +
12740 +			TRACE_CUR("fq %d entry: %s/%d\n",
12741 +					  ikglp_get_idx(sem, fq),
12742 +					  queued->comm,
12743 +					  queued->pid);
12744 +
12745 +			/* Compare task prios, find high prio task. */
12746 +			if (queued == max_hp) {
12747 +				TRACE_CUR("Found it!\n");
12748 +				ret = container_of(wait, ikglp_wait_state_t, fq_node);
12749 +			}
12750 +		}
12751 +
12752 +		WARN_ON(!ret);
12753 +		return ret;
12754 +	}
12755 +
12756 +	return(NULL);
12757 +}
12758 +
12759 +static void ikglp_steal_to_fq(struct ikglp_semaphore *sem,
12760 +							  struct fifo_queue *fq,
12761 +							  ikglp_wait_state_t *fq_wait)
12762 +{
12763 +	struct task_struct *t = fq_wait->task;
12764 +	struct fifo_queue *fq_steal = fq_wait->donee_heap_node.fq;
12765 +
12766 +	TRACE_CUR("FQ request %s/%d being moved to fq %d\n",
12767 +			  t->comm,
12768 +			  t->pid,
12769 +			  ikglp_get_idx(sem, fq));
12770 +
12771 +	fq_wait->donee_heap_node.fq = fq;  // just to be safe
12772 +
12773 +
12774 +	__remove_wait_queue(&fq_steal->wait, &fq_wait->fq_node);
12775 +	--(fq_steal->count);
12776 +
12777 +#ifdef CONFIG_LITMUS_AFFINITY_LOCKING
12778 +	if(sem->aff_obs) {
12779 +		sem->aff_obs->ops->notify_dequeue(sem->aff_obs, fq_steal, t);
12780 +	}
12781 +#endif
12782 +
12783 +	if(t == fq_steal->hp_waiter) {
12784 +		fq_steal->hp_waiter = ikglp_find_hp_waiter(fq_steal, NULL);
12785 +		TRACE_TASK(t, "New hp_waiter for fq %d is %s/%d!\n",
12786 +				   ikglp_get_idx(sem, fq_steal),
12787 +				   (fq_steal->hp_waiter) ? fq_steal->hp_waiter->comm : "nil",
12788 +				   (fq_steal->hp_waiter) ? fq_steal->hp_waiter->pid : -1);
12789 +	}
12790 +
12791 +
12792 +	// Update shortest.
12793 +	if(fq_steal->count < sem->shortest_fifo_queue->count) {
12794 +		sem->shortest_fifo_queue = fq_steal;
12795 +	}
12796 +
12797 +	__ikglp_enqueue_on_fq(sem, fq, t,
12798 +						  &fq_wait->fq_node,
12799 +						  NULL,
12800 +						  NULL);
12801 +
12802 +	// warning: We have not checked the priority inheritance of fq's owner yet.
12803 +}
12804 +
12805 +
12806 +static void ikglp_migrate_fq_to_owner_heap_nodes(struct ikglp_semaphore *sem,
12807 +												 struct fifo_queue *fq,
12808 +												 ikglp_wait_state_t *old_wait)
12809 +{
12810 +	struct task_struct *t = old_wait->task;
12811 +
12812 +	BUG_ON(old_wait->donee_heap_node.fq != fq);
12813 +
12814 +	TRACE_TASK(t, "Migrating wait_state to memory of queue %d.\n",
12815 +			   ikglp_get_idx(sem, fq));
12816 +
12817 +	// need to migrate global_heap_node and donee_heap_node off of the stack
12818 +	// to the nodes allocated for the owner of this fq.
12819 +
12820 +	// TODO: Enhance binheap() to perform this operation in place.
12821 +
12822 +	ikglp_del_global_list(sem, t, &old_wait->global_heap_node); // remove
12823 +	fq->global_heap_node = old_wait->global_heap_node;			// copy
12824 +	ikglp_add_global_list(sem, t, &fq->global_heap_node);		// re-add
12825 +
12826 +	binheap_delete(&old_wait->donee_heap_node.node, &sem->donees);  // remove
12827 +	fq->donee_heap_node = old_wait->donee_heap_node;  // copy
12828 +
12829 +	if(fq->donee_heap_node.donor_info) {
12830 +		// let donor know that our location has changed
12831 +		BUG_ON(fq->donee_heap_node.donor_info->donee_info->task != t);	// validate cross-link
12832 +		fq->donee_heap_node.donor_info->donee_info = &fq->donee_heap_node;
12833 +	}
12834 +	INIT_BINHEAP_NODE(&fq->donee_heap_node.node);
12835 +	binheap_add(&fq->donee_heap_node.node, &sem->donees,
12836 +				ikglp_donee_heap_node_t, node);  // re-add
12837 +}
12838 +
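      +/*
      + * Release path.  The holder is removed from the global and donee heaps
      + * and a replacement request is pulled into a FIFO queue, preferring (in
      + * order): the holder's own donor, another donor (affinity-advised or the
      + * root of the donor heap), the head of the PQ, and finally a request
      + * stolen from another queue.  All inheritance held by the releasing task
      + * is then dropped, priorities affected by the move (the ex-donee or the
      + * victim queue's owner) are re-evaluated, and the next waiter on the
      + * freed queue, if any, becomes the owner and is woken.
      + */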
12839 +int ikglp_unlock(struct litmus_lock* l)
12840 +{
12841 +	struct ikglp_semaphore *sem = ikglp_from_lock(l);
12842 +	struct task_struct *t = current;
12843 +	struct task_struct *donee = NULL;
12844 +	struct task_struct *next = NULL;
12845 +	struct task_struct *new_on_fq = NULL;
12846 +	struct fifo_queue *fq_of_new_on_fq = NULL;
12847 +
12848 +	ikglp_wait_state_t *other_donor_info = NULL;
12849 +	struct fifo_queue *to_steal = NULL;
12850 +	int need_steal_prio_reeval = 0;
12851 +	struct fifo_queue *fq;
12852 +
12853 +#ifdef CONFIG_LITMUS_DGL_SUPPORT
12854 +	raw_spinlock_t *dgl_lock;
12855 +#endif
12856 +
12857 +	unsigned long flags = 0, real_flags;
12858 +
12859 +	int err = 0;
12860 +
12861 +	fq = ikglp_get_queue(sem, t);  // returns NULL if 't' is not owner.
12862 +
12863 +	if (!fq) {
12864 +		err = -EINVAL;
12865 +		goto out;
12866 +	}
12867 +
12868 +#ifdef CONFIG_LITMUS_DGL_SUPPORT
12869 +	dgl_lock = litmus->get_dgl_spinlock(t);
12870 +#endif
12871 +	raw_spin_lock_irqsave(&sem->real_lock, real_flags);
12872 +
12873 +	lock_global_irqsave(dgl_lock, flags);  // TODO: Push this deeper
12874 +	lock_fine_irqsave(&sem->lock, flags);
12875 +
12876 +	TRACE_TASK(t, "Freeing replica %d.\n", ikglp_get_idx(sem, fq));
12877 +
12878 +
12879 +	// Remove 't' from the heaps, but data in nodes will still be good.
12880 +	ikglp_del_global_list(sem, t, &fq->global_heap_node);
12881 +	binheap_delete(&fq->donee_heap_node.node, &sem->donees);
12882 +
12883 +	fq->owner = NULL;  // no longer owned!!
12884 +	--(fq->count);
12885 +	if(fq->count < sem->shortest_fifo_queue->count) {
12886 +		sem->shortest_fifo_queue = fq;
12887 +	}
12888 +	--(sem->nr_in_fifos);
12889 +
12890 +#ifdef CONFIG_LITMUS_AFFINITY_LOCKING
12891 +	if(sem->aff_obs) {
12892 +		sem->aff_obs->ops->notify_dequeue(sem->aff_obs, fq, t);
12893 +		sem->aff_obs->ops->notify_freed(sem->aff_obs, fq, t);
12894 +	}
12895 +#endif
12896 +
12897 +	// Move the next request into the FQ and update heaps as needed.
12898 +	// We defer re-evaluation of priorities to later in the function.
12899 +	if(fq->donee_heap_node.donor_info) {  // move my donor to FQ
12900 +		ikglp_wait_state_t *donor_info = fq->donee_heap_node.donor_info;
12901 +
12902 +		new_on_fq = donor_info->task;
12903 +
12904 +		// donor moved to FQ
12905 +		donee = t;
12906 +
12907 +#ifdef CONFIG_LITMUS_AFFINITY_LOCKING
12908 +		if(sem->aff_obs && sem->aff_obs->relax_max_fifo_len) {
12909 +			fq_of_new_on_fq = sem->aff_obs->ops->advise_enqueue(sem->aff_obs, new_on_fq);
12910 +			if(fq_of_new_on_fq->count == 0) {
12911 +				// ignore it?
12912 +//				fq_of_new_on_fq = fq;
12913 +			}
12914 +		}
12915 +		else {
12916 +			fq_of_new_on_fq = fq;
12917 +		}
12918 +#else
12919 +		fq_of_new_on_fq = fq;
12920 +#endif
12921 +
12922 +		TRACE_TASK(t, "Moving MY donor (%s/%d) to fq %d (non-aff wanted fq %d).\n",
12923 +				   new_on_fq->comm, new_on_fq->pid,
12924 +				   ikglp_get_idx(sem, fq_of_new_on_fq),
12925 +				   ikglp_get_idx(sem, fq));
12926 +
12927 +
12928 +		ikglp_move_donor_to_fq(sem, fq_of_new_on_fq, donor_info);
12929 +	}
12930 +	else if(!binheap_empty(&sem->donors)) {  // no donor of our own, so move the
12931 +											 // highest-priority donor to an FQ
12932 +		// Select a donor
12933 +#ifdef CONFIG_LITMUS_AFFINITY_LOCKING
12934 +		other_donor_info = (sem->aff_obs) ?
12935 +			sem->aff_obs->ops->advise_donor_to_fq(sem->aff_obs, fq) :
12936 +			binheap_top_entry(&sem->donors, ikglp_wait_state_t, node);
12937 +#else
12938 +		other_donor_info = binheap_top_entry(&sem->donors, ikglp_wait_state_t, node);
12939 +#endif
12940 +
12941 +		new_on_fq = other_donor_info->task;
12942 +		donee = other_donor_info->donee_info->task;
12943 +
12944 +		// update the donee's heap position.
12945 +		other_donor_info->donee_info->donor_info = NULL;  // clear the cross-link
12946 +		binheap_decrease(&other_donor_info->donee_info->node, &sem->donees);
12947 +
12948 +#ifdef CONFIG_LITMUS_AFFINITY_LOCKING
12949 +		if(sem->aff_obs && sem->aff_obs->relax_max_fifo_len) {
12950 +			fq_of_new_on_fq = sem->aff_obs->ops->advise_enqueue(sem->aff_obs, new_on_fq);
12951 +			if(fq_of_new_on_fq->count == 0) {
12952 +				// ignore it?
12953 +//				fq_of_new_on_fq = fq;
12954 +			}
12955 +		}
12956 +		else {
12957 +			fq_of_new_on_fq = fq;
12958 +		}
12959 +#else
12960 +		fq_of_new_on_fq = fq;
12961 +#endif
12962 +
12963 +		TRACE_TASK(t, "Moving a donor (%s/%d) to fq %d (non-aff wanted fq %d).\n",
12964 +				   new_on_fq->comm, new_on_fq->pid,
12965 +				   ikglp_get_idx(sem, fq_of_new_on_fq),
12966 +				   ikglp_get_idx(sem, fq));
12967 +
12968 +		ikglp_move_donor_to_fq(sem, fq_of_new_on_fq, other_donor_info);
12969 +	}
12970 +	else if(!binheap_empty(&sem->priority_queue)) {  // no donors at all, so move the head of the PQ
12971 +		ikglp_heap_node_t *pq_node = binheap_top_entry(&sem->priority_queue,
12972 +													   ikglp_heap_node_t, node);
12973 +		ikglp_wait_state_t *pq_wait = container_of(pq_node, ikglp_wait_state_t,
12974 +												   pq_node);
12975 +
12976 +		new_on_fq = pq_wait->task;
12977 +
12978 +#ifdef CONFIG_LITMUS_AFFINITY_LOCKING
12979 +		if(sem->aff_obs && sem->aff_obs->relax_max_fifo_len) {
12980 +			fq_of_new_on_fq = sem->aff_obs->ops->advise_enqueue(sem->aff_obs, new_on_fq);
12981 +			if(fq_of_new_on_fq->count == 0) {
12982 +				// ignore it?
12983 +//				fq_of_new_on_fq = fq;
12984 +			}
12985 +		}
12986 +		else {
12987 +			fq_of_new_on_fq = fq;
12988 +		}
12989 +#else
12990 +		fq_of_new_on_fq = fq;
12991 +#endif
12992 +
12993 +		TRACE_TASK(t, "Moving a pq waiter (%s/%d) to fq %d (non-aff wanted fq %d).\n",
12994 +				   new_on_fq->comm, new_on_fq->pid,
12995 +				   ikglp_get_idx(sem, fq_of_new_on_fq),
12996 +				   ikglp_get_idx(sem, fq));
12997 +
12998 +		ikglp_move_pq_to_fq(sem, fq_of_new_on_fq, pq_wait);
12999 +	}
13000 +	else if(fq->count == 0) {  // No PQ and this queue is empty, so steal.
13001 +		ikglp_wait_state_t *fq_wait;
13002 +
13003 +		TRACE_TASK(t, "Looking to steal a request for fq %d...\n",
13004 +				   ikglp_get_idx(sem, fq));
13005 +
13006 +#ifdef CONFIG_LITMUS_AFFINITY_LOCKING
13007 +		fq_wait = (sem->aff_obs) ?
13008 +			sem->aff_obs->ops->advise_steal(sem->aff_obs, fq) :
13009 +			ikglp_find_hp_waiter_to_steal(sem);
13010 +#else
13011 +		fq_wait = ikglp_find_hp_waiter_to_steal(sem);
13012 +#endif
13013 +
13014 +		if(fq_wait) {
13015 +			to_steal = fq_wait->donee_heap_node.fq;
13016 +
13017 +			new_on_fq = fq_wait->task;
13018 +			fq_of_new_on_fq = fq;
13019 +			need_steal_prio_reeval = (new_on_fq == to_steal->hp_waiter);
13020 +
13021 +			TRACE_TASK(t, "Found %s/%d of fq %d to steal for fq %d...\n",
13022 +					   new_on_fq->comm, new_on_fq->pid,
13023 +					   ikglp_get_idx(sem, to_steal),
13024 +					   ikglp_get_idx(sem, fq));
13025 +
13026 +			ikglp_steal_to_fq(sem, fq, fq_wait);
13027 +		}
13028 +		else {
13029 +			TRACE_TASK(t, "Found nothing to steal for fq %d.\n",
13030 +					   ikglp_get_idx(sem, fq));
13031 +		}
13032 +	}
13033 +	else { // move no one
13034 +	}
13035 +
13036 +	// 't' must drop all priority and clean up data structures before hand-off.
13037 +
13038 +	// DROP ALL INHERITANCE.  IKGLP MUST BE OUTER-MOST
13039 +	raw_spin_lock(&tsk_rt(t)->hp_blocked_tasks_lock);
13040 +	{
13041 +		int count = 0;
13042 +		while(!binheap_empty(&tsk_rt(t)->hp_blocked_tasks)) {
13043 +			binheap_delete_root(&tsk_rt(t)->hp_blocked_tasks,
13044 +								struct nested_info, hp_binheap_node);
13045 +			++count;
13046 +		}
13047 +		litmus->decrease_prio(t, NULL);
13048 +		WARN_ON(count > 2); // should not exceed 2: only the local fq inheritance and one donation are possible.
13049 +	}
13050 +	raw_spin_unlock(&tsk_rt(t)->hp_blocked_tasks_lock);
13051 +
13052 +
13053 +
13054 +	// Now patch up other priorities.
13055 +	//
13056 +	// At most one of the following:
13057 +	//   if(donee && donee != t), decrease prio, propagate to owner, or onward
13058 +	//   if(to_steal), update owner's prio (hp_waiter has already been set)
13059 +	//
13060 +
13061 +	BUG_ON((other_donor_info != NULL) && (to_steal != NULL));
13062 +
13063 +	if(other_donor_info) {
13064 +		struct fifo_queue *other_fq = other_donor_info->donee_info->fq;
13065 +
13066 +		BUG_ON(!donee);
13067 +		BUG_ON(donee == t);
13068 +
13069 +		TRACE_TASK(t, "Terminating donation relation of donor %s/%d to donee %s/%d!\n",
13070 +				   other_donor_info->task->comm, other_donor_info->task->pid,
13071 +				   donee->comm, donee->pid);
13072 +
13073 +		// need to terminate donation relation.
13074 +		if(donee == other_fq->owner) {
13075 +			TRACE_TASK(t, "Donee %s/%d is the owner of fq %d.\n",
13076 +					   donee->comm, donee->pid,
13077 +					   ikglp_get_idx(sem, other_fq));
13078 +
13079 +			ikglp_remove_donation_from_owner(&other_donor_info->prio_donation.hp_binheap_node, other_fq, sem, flags);
13080 +			lock_fine_irqsave(&sem->lock, flags);  // there should be no contention!!!!
13081 +		}
13082 +		else {
13083 +			TRACE_TASK(t, "Donee %s/%d is blocked in fq %d.\n",
13084 +					   donee->comm, donee->pid,
13085 +					   ikglp_get_idx(sem, other_fq));
13086 +
13087 +			ikglp_remove_donation_from_fq_waiter(donee, &other_donor_info->prio_donation.hp_binheap_node);
13088 +			if(donee == other_fq->hp_waiter) {
13089 +				TRACE_TASK(t, "Donee %s/%d was an hp_waiter of fq %d. Rechecking hp_waiter.\n",
13090 +						   donee->comm, donee->pid,
13091 +						   ikglp_get_idx(sem, other_fq));
13092 +
13093 +				other_fq->hp_waiter = ikglp_find_hp_waiter(other_fq, NULL);
13094 +				TRACE_TASK(t, "New hp_waiter for fq %d is %s/%d!\n",
13095 +						   ikglp_get_idx(sem, other_fq),
13096 +						   (other_fq->hp_waiter) ? other_fq->hp_waiter->comm : "nil",
13097 +						   (other_fq->hp_waiter) ? other_fq->hp_waiter->pid : -1);
13098 +
13099 +				ikglp_refresh_owners_prio_decrease(other_fq, sem, flags); // unlocks sem->lock.  reacquire it.
13100 +				lock_fine_irqsave(&sem->lock, flags);  // there should be no contention!!!!
13101 +			}
13102 +		}
13103 +	}
13104 +	else if(to_steal) {
13105 +		TRACE_TASK(t, "Rechecking priority inheritance of fq %d, triggered by stealing.\n",
13106 +				   ikglp_get_idx(sem, to_steal));
13107 +
13108 +		if(need_steal_prio_reeval) {
13109 +			ikglp_refresh_owners_prio_decrease(to_steal, sem, flags); // unlocks sem->lock.  reacquire it.
13110 +			lock_fine_irqsave(&sem->lock, flags);  // there should be no contention!!!!
13111 +		}
13112 +	}
13113 +
13114 +	// check for new HP waiter.
13115 +	if(new_on_fq) {
13116 +		if(fq == fq_of_new_on_fq) {
13117 +			// fq->owner is null, so just update the hp_waiter without locking.
13118 +			if(new_on_fq == fq->hp_waiter) {
13119 +				TRACE_TASK(t, "new_on_fq (%s/%d) is already hp_waiter.\n",
13120 +						   fq->hp_waiter->comm, fq->hp_waiter->pid);
13121 +				fq->nest.hp_waiter_eff_prio = effective_priority(fq->hp_waiter);  // set this just to be sure...
13122 +			}
13123 +			else if(litmus->compare(new_on_fq, fq->hp_waiter)) {
13124 +				if(fq->hp_waiter)
13125 +					TRACE_TASK(t, "has higher prio than hp_waiter (%s/%d).\n",
13126 +							   fq->hp_waiter->comm, fq->hp_waiter->pid);
13127 +				else
13128 +					TRACE_TASK(t, "has higher prio than hp_waiter (NIL).\n");
13129 +
13130 +				fq->hp_waiter = new_on_fq;
13131 +				fq->nest.hp_waiter_eff_prio = effective_priority(fq->hp_waiter);
13132 +
13133 +				TRACE_TASK(t, "New hp_waiter for fq %d is %s/%d!\n",
13134 +						   ikglp_get_idx(sem, fq),
13135 +						   (fq->hp_waiter) ? fq->hp_waiter->comm : "nil",
13136 +						   (fq->hp_waiter) ? fq->hp_waiter->pid : -1);
13137 +			}
13138 +		}
13139 +		else {
13140 +			ikglp_refresh_owners_prio_increase(new_on_fq, fq_of_new_on_fq, sem, flags); // unlocks sem->lock.  reacquire it.
13141 +			lock_fine_irqsave(&sem->lock, flags);  // there should be no contention!!!!
13142 +		}
13143 +	}
13144 +
13145 +wake_kludge:
13146 +	if(waitqueue_active(&fq->wait))
13147 +	{
13148 +		wait_queue_t *wait = list_entry(fq->wait.task_list.next, wait_queue_t, task_list);
13149 +		ikglp_wait_state_t *fq_wait = container_of(wait, ikglp_wait_state_t, fq_node);
13150 +		next = (struct task_struct*) wait->private;
13151 +
13152 +		__remove_wait_queue(&fq->wait, wait);
13153 +
13154 +		TRACE_CUR("queue %d: ASSIGNING %s/%d as owner - next\n",
13155 +				  ikglp_get_idx(sem, fq),
13156 +				  next->comm, next->pid);
13157 +
13158 +		// migrate wait-state to fifo-memory.
13159 +		ikglp_migrate_fq_to_owner_heap_nodes(sem, fq, fq_wait);
13160 +
13161 +		/* next becomes the resource holder */
13162 +		fq->owner = next;
13163 +		tsk_rt(next)->blocked_lock = NULL;
13164 +
13165 +#ifdef CONFIG_LITMUS_AFFINITY_LOCKING
13166 +		if(sem->aff_obs) {
13167 +			sem->aff_obs->ops->notify_acquired(sem->aff_obs, fq, next);
13168 +		}
13169 +#endif
13170 +
13171 +		/* determine new hp_waiter if necessary */
13172 +		if (next == fq->hp_waiter) {
13173 +
13174 +			TRACE_TASK(next, "was highest-prio waiter\n");
13175 +			/* next has the highest priority --- it doesn't need to
13176 +			 * inherit.  However, we need to make sure that the
13177 +			 * next-highest priority in the queue is reflected in
13178 +			 * hp_waiter. */
13179 +			fq->hp_waiter = ikglp_find_hp_waiter(fq, NULL);
13180 +			TRACE_TASK(next, "New hp_waiter for fq %d is %s/%d!\n",
13181 +					   ikglp_get_idx(sem, fq),
13182 +					   (fq->hp_waiter) ? fq->hp_waiter->comm : "nil",
13183 +					   (fq->hp_waiter) ? fq->hp_waiter->pid : -1);
13184 +
13185 +			fq->nest.hp_waiter_eff_prio = (fq->hp_waiter) ?
13186 +								effective_priority(fq->hp_waiter) : NULL;
13187 +
13188 +			if (fq->hp_waiter)
13189 +				TRACE_TASK(fq->hp_waiter, "is new highest-prio waiter\n");
13190 +			else
13191 +				TRACE("no further waiters\n");
13192 +
13193 +			raw_spin_lock(&tsk_rt(next)->hp_blocked_tasks_lock);
13194 +
13195 +//			TRACE_TASK(next, "Heap Before:\n");
13196 +//			print_hp_waiters(tsk_rt(next)->hp_blocked_tasks.root, 0);
13197 +
13198 +			binheap_add(&fq->nest.hp_binheap_node,
13199 +						&tsk_rt(next)->hp_blocked_tasks,
13200 +						struct nested_info,
13201 +						hp_binheap_node);
13202 +
13203 +//			TRACE_TASK(next, "Heap After:\n");
13204 +//			print_hp_waiters(tsk_rt(next)->hp_blocked_tasks.root, 0);
13205 +
13206 +			raw_spin_unlock(&tsk_rt(next)->hp_blocked_tasks_lock);
13207 +		}
13208 +		else {
13209 +			/* Well, if 'next' is not the highest-priority waiter,
13210 +			 * then it (probably) ought to inherit the highest-priority
13211 +			 * waiter's priority. */
13212 +			TRACE_TASK(next, "is not hp_waiter of replica %d. hp_waiter is %s/%d\n",
13213 +					   ikglp_get_idx(sem, fq),
13214 +					   (fq->hp_waiter) ? fq->hp_waiter->comm : "nil",
13215 +					   (fq->hp_waiter) ? fq->hp_waiter->pid : -1);
13216 +
13217 +			raw_spin_lock(&tsk_rt(next)->hp_blocked_tasks_lock);
13218 +
13219 +			binheap_add(&fq->nest.hp_binheap_node,
13220 +						&tsk_rt(next)->hp_blocked_tasks,
13221 +						struct nested_info,
13222 +						hp_binheap_node);
13223 +
13224 +			/* It is possible that 'next' *should* be the hp_waiter, but isn't
13225 +			 * because that update hasn't executed yet (the update operation is
13226 +			 * probably blocked on mutex->lock). So only inherit if the top of
13227 +			 * 'next's heap is indeed the effective priority of hp_waiter.
13228 +			 * (We use fq->hp_waiter_eff_prio instead of effective_priority(hp_waiter)
13229 +			 * since the effective priority of hp_waiter can change (and the
13230 +			 * update has not made it to this lock).)
13231 +			 */
13232 +			if(likely(top_priority(&tsk_rt(next)->hp_blocked_tasks) ==
13233 +												fq->nest.hp_waiter_eff_prio))
13234 +			{
13235 +				if(fq->nest.hp_waiter_eff_prio)
13236 +					litmus->increase_prio(next, fq->nest.hp_waiter_eff_prio);
13237 +				else
13238 +					WARN_ON(1);
13239 +			}
13240 +
13241 +			raw_spin_unlock(&tsk_rt(next)->hp_blocked_tasks_lock);
13242 +		}
13243 +
13244 +
13245 +		// wake up the new resource holder!
13246 +		wake_up_process(next);
13247 +	}
13248 +	if(fq_of_new_on_fq && fq_of_new_on_fq != fq && fq_of_new_on_fq->count == 1) {
13249 +		// The task we promoted went to an empty FQ. (Why didn't stealing pick this up?)
13250 +		// Wake up that task too.
13251 +
13252 +		BUG_ON(fq_of_new_on_fq->owner != NULL);
13253 +
13254 +		fq = fq_of_new_on_fq;
13255 +		fq_of_new_on_fq = NULL;
13256 +		goto wake_kludge;
13257 +	}
13258 +
13259 +	unlock_fine_irqrestore(&sem->lock, flags);
13260 +	unlock_global_irqrestore(dgl_lock, flags);
13261 +
13262 +	raw_spin_unlock_irqrestore(&sem->real_lock, real_flags);
13263 +
13264 +out:
13265 +	return err;
13266 +}
13267 +
13268 +
13269 +
13270 +int ikglp_close(struct litmus_lock* l)
13271 +{
13272 +	struct task_struct *t = current;
13273 +	struct ikglp_semaphore *sem = ikglp_from_lock(l);
13274 +	unsigned long flags;
13275 +
13276 +	int owner = 0;
13277 +	int i;
13278 +
13279 +	raw_spin_lock_irqsave(&sem->real_lock, flags);
13280 +
13281 +	for(i = 0; i < sem->nr_replicas; ++i) {
13282 +		if(sem->fifo_queues[i].owner == t) {
13283 +			owner = 1;
13284 +			break;
13285 +		}
13286 +	}
13287 +
13288 +	raw_spin_unlock_irqrestore(&sem->real_lock, flags);
13289 +
13290 +	if (owner)
13291 +		ikglp_unlock(l);
13292 +
13293 +	return 0;
13294 +}
13295 +
13296 +void ikglp_free(struct litmus_lock* l)
13297 +{
13298 +	struct ikglp_semaphore *sem = ikglp_from_lock(l);
13299 +
13300 +	kfree(sem->fifo_queues);
13301 +	kfree(sem);
13302 +}
13303 +
13304 +
13305 +
13306 +struct litmus_lock* ikglp_new(int m,
13307 +							  struct litmus_lock_ops* ops,
13308 +							  void* __user arg)
13309 +{
13310 +	struct ikglp_semaphore* sem;
13311 +	int nr_replicas = 0;
13312 +	int i;
13313 +
13314 +	if(!access_ok(VERIFY_READ, arg, sizeof(nr_replicas)))
13315 +	{
13316 +		return(NULL);
13317 +	}
13318 +	if(__copy_from_user(&nr_replicas, arg, sizeof(nr_replicas)))
13319 +	{
13320 +		return(NULL);
13321 +	}
13322 +	if(nr_replicas < 1)
13323 +	{
13324 +		return(NULL);
13325 +	}
13326 +
13327 +	sem = kmalloc(sizeof(*sem), GFP_KERNEL);
13328 +	if(!sem)
13329 +	{
13330 +		return NULL;
13331 +	}
13332 +
13333 +	sem->fifo_queues = kmalloc(sizeof(struct fifo_queue)*nr_replicas, GFP_KERNEL);
13334 +	if(!sem->fifo_queues)
13335 +	{
13336 +		kfree(sem);
13337 +		return NULL;
13338 +	}
13339 +
13340 +	sem->litmus_lock.ops = ops;
13341 +
13342 +#ifdef CONFIG_DEBUG_SPINLOCK
13343 +	{
13344 +		__raw_spin_lock_init(&sem->lock, ((struct litmus_lock*)sem)->cheat_lockdep, &((struct litmus_lock*)sem)->key);
13345 +	}
13346 +#else
13347 +	raw_spin_lock_init(&sem->lock);
13348 +#endif
13349 +
13350 +	raw_spin_lock_init(&sem->real_lock);
13351 +
13352 +	sem->nr_replicas = nr_replicas;
13353 +	sem->m = m;
13354 +	sem->max_fifo_len = (sem->m/nr_replicas) + ((sem->m%nr_replicas) != 0);
13355 +	sem->nr_in_fifos = 0;
13356 +
13357 +	TRACE("New IKGLP Sem: m = %d, k = %d, max fifo_len = %d\n",
13358 +		  sem->m,
13359 +		  sem->nr_replicas,
13360 +		  sem->max_fifo_len);
13361 +
13362 +	for(i = 0; i < nr_replicas; ++i)
13363 +	{
13364 +		struct fifo_queue* q = &(sem->fifo_queues[i]);
13365 +
13366 +		q->owner = NULL;
13367 +		q->hp_waiter = NULL;
13368 +		init_waitqueue_head(&q->wait);
13369 +		q->count = 0;
13370 +
13371 +		q->global_heap_node.task = NULL;
13372 +		INIT_BINHEAP_NODE(&q->global_heap_node.node);
13373 +
13374 +		q->donee_heap_node.task = NULL;
13375 +		q->donee_heap_node.donor_info = NULL;
13376 +		q->donee_heap_node.fq = NULL;
13377 +		INIT_BINHEAP_NODE(&q->donee_heap_node.node);
13378 +
13379 +		q->nest.lock = (struct litmus_lock*)sem;
13380 +		q->nest.hp_waiter_eff_prio = NULL;
13381 +		q->nest.hp_waiter_ptr = &q->hp_waiter;
13382 +		INIT_BINHEAP_NODE(&q->nest.hp_binheap_node);
13383 +	}
13384 +
13385 +	sem->shortest_fifo_queue = &sem->fifo_queues[0];
13386 +
13387 +	sem->top_m_size = 0;
13388 +
13389 +	// init heaps
13390 +	INIT_BINHEAP_HANDLE(&sem->top_m, ikglp_min_heap_base_priority_order);
13391 +	INIT_BINHEAP_HANDLE(&sem->not_top_m, ikglp_max_heap_base_priority_order);
13392 +	INIT_BINHEAP_HANDLE(&sem->donees, ikglp_min_heap_donee_order);
13393 +	INIT_BINHEAP_HANDLE(&sem->priority_queue, ikglp_max_heap_base_priority_order);
13394 +	INIT_BINHEAP_HANDLE(&sem->donors, ikglp_donor_max_heap_base_priority_order);
13395 +
13396 +#ifdef CONFIG_LITMUS_AFFINITY_LOCKING
13397 +	sem->aff_obs = NULL;
13398 +#endif
13399 +
13400 +	return &sem->litmus_lock;
13401 +}
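/* --- Editorial sketch: not part of the original patch. ---------------------
 * ikglp_new() sizes each FIFO queue to ceil(m/k) using only integer
 * arithmetic: (m / k) + ((m % k) != 0).  A minimal standalone restatement of
 * that expression (plain C, no kernel dependencies; the function name is a
 * hypothetical example):
 */
static inline int example_ikglp_max_fifo_len(int m, int nr_replicas)
{
	/* e.g., m = 12, nr_replicas = 5  ->  (2) + (1) = 3 = ceil(12/5) */
	return (m / nr_replicas) + ((m % nr_replicas) != 0);
}
/* ------------------------------------------------------------------------- */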
13402 +
13403 +
13404 +
13405 +
13406 +
13407 +
13408 +
13409 +
13410 +
13411 +
13412 +
13413 +
13414 +
13415 +
13416 +
13417 +
13418 +
13419 +
13420 +
13421 +
13422 +
13423 +
13424 +
13425 +
13426 +
13427 +
13428 +
13429 +
13430 +
13431 +#if defined(CONFIG_LITMUS_AFFINITY_LOCKING) && defined(CONFIG_LITMUS_NVIDIA)
13432 +
13433 +static inline int __replica_to_gpu(struct ikglp_affinity* aff, int replica)
13434 +{
13435 +	int gpu = replica % aff->nr_rsrc;
13436 +	return gpu;
13437 +}
13438 +
13439 +static inline int replica_to_gpu(struct ikglp_affinity* aff, int replica)
13440 +{
13441 +	int gpu = __replica_to_gpu(aff, replica) + aff->offset;
13442 +	return gpu;
13443 +}
13444 +
13445 +static inline int gpu_to_base_replica(struct ikglp_affinity* aff, int gpu)
13446 +{
13447 +	int replica = gpu - aff->offset;
13448 +	return replica;
13449 +}
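/* --- Editorial sketch: not part of the original patch. ---------------------
 * The helpers above fold nr_rsrc * nr_simult replicas onto nr_rsrc GPUs:
 * replicas r and r + nr_rsrc map to the same GPU, and 'offset' shifts the
 * result into the system-wide GPU numbering.  Standalone restatement (plain
 * C; the struct and names below are hypothetical stand-ins for
 * struct ikglp_affinity):
 */
struct example_gpu_aff { int nr_rsrc; int offset; };

static int example_replica_to_gpu(const struct example_gpu_aff *aff, int replica)
{
	/* nr_rsrc = 2, offset = 4: replicas 0,2 -> GPU 4; replicas 1,3 -> GPU 5 */
	return (replica % aff->nr_rsrc) + aff->offset;
}
/* ------------------------------------------------------------------------- */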
13450 +
13451 +
13452 +int ikglp_aff_obs_close(struct affinity_observer* obs)
13453 +{
13454 +	return 0;
13455 +}
13456 +
13457 +void ikglp_aff_obs_free(struct affinity_observer* obs)
13458 +{
13459 +	struct ikglp_affinity *ikglp_aff = ikglp_aff_obs_from_aff_obs(obs);
13460 +	kfree(ikglp_aff->nr_cur_users_on_rsrc);
13461 +	kfree(ikglp_aff->q_info);
13462 +	kfree(ikglp_aff);
13463 +}
13464 +
13465 +static struct affinity_observer* ikglp_aff_obs_new(struct affinity_observer_ops* ops,
13466 +												   struct ikglp_affinity_ops* ikglp_ops,
13467 +												   void* __user args)
13468 +{
13469 +	struct ikglp_affinity* ikglp_aff;
13470 +	struct gpu_affinity_observer_args aff_args;
13471 +	struct ikglp_semaphore* sem;
13472 +	int i;
13473 +	unsigned long flags;
13474 +
13475 +	if(!access_ok(VERIFY_READ, args, sizeof(aff_args))) {
13476 +		return(NULL);
13477 +	}
13478 +	if(__copy_from_user(&aff_args, args, sizeof(aff_args))) {
13479 +		return(NULL);
13480 +	}
13481 +
13482 +	sem = (struct ikglp_semaphore*) get_lock_from_od(aff_args.obs.lock_od);
13483 +
13484 +	if(sem->litmus_lock.type != IKGLP_SEM) {
13485 +		TRACE_CUR("Lock type not supported.  Type = %d\n", sem->litmus_lock.type);
13486 +		return(NULL);
13487 +	}
13488 +
13489 +	if((aff_args.nr_simult_users <= 0) ||
13490 +	   (sem->nr_replicas%aff_args.nr_simult_users != 0)) {
13491 +		TRACE_CUR("Lock %d does not support #replicas (%d) for #simult_users "
13492 +				  "(%d) per replica.  #replicas should be evenly divisible "
13493 +				  "by #simult_users.\n",
13494 +				  sem->litmus_lock.ident,
13495 +				  sem->nr_replicas,
13496 +				  aff_args.nr_simult_users);
13497 +		return(NULL);
13498 +	}
13499 +
13500 +	if(aff_args.nr_simult_users > NV_MAX_SIMULT_USERS) {
13501 +		TRACE_CUR("System does not support #simult_users > %d. %d requested.\n",
13502 +				  NV_MAX_SIMULT_USERS, aff_args.nr_simult_users);
13503 +//		return(NULL);
13504 +	}
13505 +
13506 +	ikglp_aff = kmalloc(sizeof(*ikglp_aff), GFP_KERNEL);
13507 +	if(!ikglp_aff) {
13508 +		return(NULL);
13509 +	}
13510 +
13511 +	ikglp_aff->q_info = kmalloc(sizeof(struct ikglp_queue_info)*sem->nr_replicas, GFP_KERNEL);
13512 +	if(!ikglp_aff->q_info) {
13513 +		kfree(ikglp_aff);
13514 +		return(NULL);
13515 +	}
13516 +
13517 +	ikglp_aff->nr_cur_users_on_rsrc = kmalloc(sizeof(int)*(sem->nr_replicas / aff_args.nr_simult_users), GFP_KERNEL);
13518 +	if(!ikglp_aff->nr_cur_users_on_rsrc) {
13519 +		kfree(ikglp_aff->q_info);
13520 +		kfree(ikglp_aff);
13521 +		return(NULL);
13522 +	}
13523 +
13524 +	affinity_observer_new(&ikglp_aff->obs, ops, &aff_args.obs);
13525 +
13526 +	ikglp_aff->ops = ikglp_ops;
13527 +	ikglp_aff->offset = aff_args.replica_to_gpu_offset;
13528 +	ikglp_aff->nr_simult = aff_args.nr_simult_users;
13529 +	ikglp_aff->nr_rsrc = sem->nr_replicas / ikglp_aff->nr_simult;
13530 +	ikglp_aff->relax_max_fifo_len = (aff_args.relaxed_rules) ? 1 : 0;
13531 +
13532 +	TRACE_CUR("GPU affinity_observer: offset = %d, nr_simult = %d, "
13533 +			  "nr_rsrc = %d, relaxed_fifo_len = %d\n",
13534 +			  ikglp_aff->offset, ikglp_aff->nr_simult, ikglp_aff->nr_rsrc,
13535 +			  ikglp_aff->relax_max_fifo_len);
13536 +
13537 +	memset(ikglp_aff->nr_cur_users_on_rsrc, 0, sizeof(int)*(ikglp_aff->nr_rsrc));
13538 +
13539 +	for(i = 0; i < sem->nr_replicas; ++i) {
13540 +		ikglp_aff->q_info[i].q = &sem->fifo_queues[i];
13541 +		ikglp_aff->q_info[i].estimated_len = 0;
13542 +
13543 +		// multiple q_info's will point to the same resource (aka GPU) if
13544 +		// aff_args.nr_simult_users > 1
13545 +		ikglp_aff->q_info[i].nr_cur_users = &ikglp_aff->nr_cur_users_on_rsrc[__replica_to_gpu(ikglp_aff,i)];
13546 +	}
13547 +
13548 +	// attach observer to the lock
13549 +	raw_spin_lock_irqsave(&sem->real_lock, flags);
13550 +	sem->aff_obs = ikglp_aff;
13551 +	raw_spin_unlock_irqrestore(&sem->real_lock, flags);
13552 +
13553 +	return &ikglp_aff->obs;
13554 +}
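/* --- Editorial sketch: not part of the original patch. ---------------------
 * ikglp_aff_obs_new() points several q_info[].nr_cur_users entries at the
 * *same* slot of nr_cur_users_on_rsrc[], so all FIFO queues feeding one GPU
 * share a single holder count.  A standalone illustration of that pointer
 * aliasing (plain C; names are hypothetical):
 */
static int example_shared_gpu_counter(void)
{
	int nr_cur_users_on_rsrc[2] = { 0, 0 };   /* one counter per GPU  */
	int *q0_users = &nr_cur_users_on_rsrc[0]; /* replica 0 -> GPU 0   */
	int *q2_users = &nr_cur_users_on_rsrc[0]; /* replica 2 -> GPU 0   */

	++(*q0_users);                            /* acquire on replica 0 */
	++(*q2_users);                            /* acquire on replica 2 */

	return nr_cur_users_on_rsrc[0];           /* == 2                 */
}
/* ------------------------------------------------------------------------- */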
13555 +
13556 +
13557 +
13558 +
13559 +static int gpu_replica_to_resource(struct ikglp_affinity* aff,
13560 +								   struct fifo_queue* fq) {
13561 +	struct ikglp_semaphore *sem = ikglp_from_lock(aff->obs.lock);
13562 +	return(replica_to_gpu(aff, ikglp_get_idx(sem, fq)));
13563 +}
13564 +
13565 +
13566 +// Smart IKGLP Affinity
13567 +
13568 +//static inline struct ikglp_queue_info* ikglp_aff_find_shortest(struct ikglp_affinity* aff)
13569 +//{
13570 +//	struct ikglp_semaphore *sem = ikglp_from_lock(aff->obs.lock);
13571 +//	struct ikglp_queue_info *shortest = &aff->q_info[0];
13572 +//	int i;
13573 +//
13574 +//	for(i = 1; i < sem->nr_replicas; ++i) {
13575 +//		if(aff->q_info[i].estimated_len < shortest->estimated_len) {
13576 +//			shortest = &aff->q_info[i];
13577 +//		}
13578 +//	}
13579 +//
13580 +//	return(shortest);
13581 +//}
13582 +
13583 +struct fifo_queue* gpu_ikglp_advise_enqueue(struct ikglp_affinity* aff, struct task_struct* t)
13584 +{
13585 +	// advise_enqueue must be smart so as not to break IKGLP rules:
13586 +	//  * No queue can be longer than ceil(m/k).  We may return
13587 +	//    such a queue, but IKGLP will be smart enough to send requests
13588 +	//    to donors or the PQ instead.
13589 +	//  * Cannot let a queue idle if there exist waiting PQ requests/donors
13590 +	//      -- needed to guarantee parallel progress of waiters.
13591 +	//
13592 +	// We may be able to relax some of these constraints, but this will have to
13593 +	// be carefully evaluated.
13594 +	//
13595 +	// Heuristic strategy: Find the shortest queue that is not full.
13596 +
13597 +	struct ikglp_semaphore *sem = ikglp_from_lock(aff->obs.lock);
13598 +	lt_t min_len;
13599 +	int min_nr_users;
13600 +	struct ikglp_queue_info *shortest;
13601 +	struct fifo_queue *to_enqueue;
13602 +	int i;
13603 +	int affinity_gpu;
13604 +
13605 +	int max_fifo_len = (aff->relax_max_fifo_len) ?
13606 +		sem->m : sem->max_fifo_len;
13607 +
13608 +	// simply pick the shortest queue if we have no affinity, or if we have
13609 +	// affinity with the shortest
13610 +	if(unlikely(tsk_rt(t)->last_gpu < 0)) {
13611 +		affinity_gpu = aff->offset;  // first gpu
13612 +		TRACE_CUR("no affinity\n");
13613 +	}
13614 +	else {
13615 +		affinity_gpu = tsk_rt(t)->last_gpu;
13616 +	}
13617 +
13618 +	// all things being equal, let's start with the queue with which we have
13619 +	// affinity.  this helps us maintain affinity even when we don't have
13620 +	// an estimate for local-affinity execution time (i.e., 2nd time on GPU)
13621 +	shortest = &aff->q_info[gpu_to_base_replica(aff, affinity_gpu)];
13622 +
13623 +	//	if(shortest == aff->shortest_queue) {
13624 +	//		TRACE_CUR("special case: have affinity with shortest queue\n");
13625 +	//		goto out;
13626 +	//	}
13627 +
13628 +	min_len = shortest->estimated_len + get_gpu_estimate(t, MIG_LOCAL);
13629 +	min_nr_users = *(shortest->nr_cur_users);
13630 +
13631 +	TRACE_CUR("cs is %llu on queue %d (count = %d): est len = %llu\n",
13632 +			  get_gpu_estimate(t, MIG_LOCAL),
13633 +			  ikglp_get_idx(sem, shortest->q),
13634 +			  shortest->q->count,
13635 +			  min_len);
13636 +
13637 +	for(i = 0; i < sem->nr_replicas; ++i) {
13638 +		if(&aff->q_info[i] != shortest) {
13639 +			if(aff->q_info[i].q->count < max_fifo_len) {
13640 +
13641 +				lt_t est_len =
13642 +					aff->q_info[i].estimated_len +
13643 +					get_gpu_estimate(t,
13644 +								gpu_migration_distance(tsk_rt(t)->last_gpu,
13645 +													replica_to_gpu(aff, i)));
13646 +
13647 +				// queue is shorter, or the lengths are equal and the other queue
13648 +				// has fewer total users.
13649 +				//
13650 +				// tie-break on the smallest number of simultaneous users.  this only
13651 +				// kicks in when there is more than one empty queue.
13652 +				if((shortest->q->count >= max_fifo_len) ||		/* 'shortest' is full and i-th queue is not */
13653 +				   (est_len < min_len) ||						/* i-th queue has shortest length */
13654 +				   ((est_len == min_len) &&						/* equal lengths, but one has fewer over-all users */
13655 +					(*(aff->q_info[i].nr_cur_users) < min_nr_users))) {
13656 +
13657 +					shortest = &aff->q_info[i];
13658 +					min_len = est_len;
13659 +					min_nr_users = *(aff->q_info[i].nr_cur_users);
13660 +				}
13661 +
13662 +				TRACE_CUR("cs is %llu on queue %d (count = %d): est len = %llu\n",
13663 +						  get_gpu_estimate(t,
13664 +								gpu_migration_distance(tsk_rt(t)->last_gpu,
13665 +													   replica_to_gpu(aff, i))),
13666 +						  ikglp_get_idx(sem, aff->q_info[i].q),
13667 +						  aff->q_info[i].q->count,
13668 +						  est_len);
13669 +			}
13670 +			else {
13671 +				TRACE_CUR("queue %d is too long.  ineligible for enqueue.\n",
13672 +						  ikglp_get_idx(sem, aff->q_info[i].q));
13673 +			}
13674 +		}
13675 +	}
13676 +
13677 +	if(shortest->q->count >= max_fifo_len) {
13678 +		TRACE_CUR("selected fq %d is too long, but returning it anyway.\n",
13679 +				  ikglp_get_idx(sem, shortest->q));
13680 +	}
13681 +
13682 +	to_enqueue = shortest->q;
13683 +	TRACE_CUR("enqueue on fq %d (count = %d) (non-aff wanted fq %d)\n",
13684 +			  ikglp_get_idx(sem, to_enqueue),
13685 +			  to_enqueue->count,
13686 +			  ikglp_get_idx(sem, sem->shortest_fifo_queue));
13687 +
13688 +	return to_enqueue;
13689 +
13690 +	//return(sem->shortest_fifo_queue);
13691 +}
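/* --- Editorial sketch: not part of the original patch. ---------------------
 * The selection rule in gpu_ikglp_advise_enqueue() reduces to three tests,
 * applied only to candidate queues that already passed the not-full check:
 * (1) always displace a full incumbent, (2) prefer the smaller estimated
 * length, (3) on equal lengths prefer the GPU with fewer current users.
 * Standalone form of that predicate (plain C; unsigned long long stands in
 * for lt_t, names are hypothetical):
 */
static int example_prefer_candidate_fq(int incumbent_is_full,
				       unsigned long long incumbent_len,
				       unsigned long long candidate_len,
				       int incumbent_users,
				       int candidate_users)
{
	return incumbent_is_full ||                                   /* (1) */
	       (candidate_len < incumbent_len) ||                     /* (2) */
	       (candidate_len == incumbent_len &&
		candidate_users < incumbent_users);                   /* (3) */
}
/* ------------------------------------------------------------------------- */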
13692 +
13693 +
13694 +
13695 +
13696 +static ikglp_wait_state_t* pick_steal(struct ikglp_affinity* aff,
13697 +									  int dest_gpu,
13698 +									  struct fifo_queue* fq)
13699 +{
13700 +	struct ikglp_semaphore *sem = ikglp_from_lock(aff->obs.lock);
13701 +	ikglp_wait_state_t *wait = NULL;
13702 +	int max_improvement = -(MIG_NONE+1);
13703 +	int replica = ikglp_get_idx(sem, fq);
13704 +
13705 +	if(waitqueue_active(&fq->wait)) {
13706 +		int this_gpu = replica_to_gpu(aff, replica);
13707 +		struct list_head *pos;
13708 +
13709 +		list_for_each(pos, &fq->wait.task_list) {
13710 +			wait_queue_t *fq_wait = list_entry(pos, wait_queue_t, task_list);
13711 +			ikglp_wait_state_t *tmp_wait = container_of(fq_wait, ikglp_wait_state_t, fq_node);
13712 +
13713 +			int tmp_improvement =
13714 +				gpu_migration_distance(this_gpu, tsk_rt(tmp_wait->task)->last_gpu) -
13715 +				gpu_migration_distance(dest_gpu, tsk_rt(tmp_wait->task)->last_gpu);
13716 +
13717 +			if(tmp_improvement > max_improvement) {
13718 +				wait = tmp_wait;
13719 +				max_improvement = tmp_improvement;
13720 +
13721 +				if(max_improvement >= (MIG_NONE-1)) {
13722 +					goto out;
13723 +				}
13724 +			}
13725 +		}
13726 +
13727 +		BUG_ON(!wait);
13728 +	}
13729 +	else {
13730 +		TRACE_CUR("fq %d is empty!\n", replica);
13731 +	}
13732 +
13733 +out:
13734 +
13735 +	TRACE_CUR("Candidate victim from fq %d is %s/%d.  aff improvement = %d.\n",
13736 +			  replica,
13737 +			  (wait) ? wait->task->comm : "nil",
13738 +			  (wait) ? wait->task->pid  : -1,
13739 +			  max_improvement);
13740 +
13741 +	return wait;
13742 +}
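/* --- Editorial sketch: not part of the original patch. ---------------------
 * pick_steal() scores each waiter by how much closer it would sit to its
 * last-used GPU after being stolen:
 *     improvement = dist(current queue's GPU, last_gpu)
 *                 - dist(destination GPU,     last_gpu)
 * A positive score means the steal shortens the waiter's migration.
 * Standalone form of the score (plain C; the arguments stand in for
 * gpu_migration_distance() results):
 */
static inline int example_steal_improvement(int dist_from_current_gpu,
					    int dist_from_dest_gpu)
{
	/* e.g., distance 3 where it waits now vs. 1 at the destination -> +2 */
	return dist_from_current_gpu - dist_from_dest_gpu;
}
/* ------------------------------------------------------------------------- */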
13743 +
13744 +
13745 +ikglp_wait_state_t* gpu_ikglp_advise_steal(struct ikglp_affinity* aff,
13746 +										   struct fifo_queue* dst)
13747 +{
13748 +	// Heuristic strategy: Find the task with the greatest improvement in affinity.
13749 +	//
13750 +	struct ikglp_semaphore *sem = ikglp_from_lock(aff->obs.lock);
13751 +	ikglp_wait_state_t *to_steal_state = NULL;
13752 +//	ikglp_wait_state_t *default_to_steal_state = ikglp_find_hp_waiter_to_steal(sem);
13753 +	int max_improvement = -(MIG_NONE+1);
13754 +	int replica, i;
13755 +	int dest_gpu;
13756 +
13757 +	replica = ikglp_get_idx(sem, dst);
13758 +	dest_gpu = replica_to_gpu(aff, replica);
13759 +
13760 +	for(i = 0; i < sem->nr_replicas; ++i) {
13761 +		ikglp_wait_state_t *tmp_to_steal_state =
13762 +			pick_steal(aff, dest_gpu, &sem->fifo_queues[i]);
13763 +
13764 +		if(tmp_to_steal_state) {
13765 +			int tmp_improvement =
13766 +				gpu_migration_distance(replica_to_gpu(aff, i), tsk_rt(tmp_to_steal_state->task)->last_gpu) -
13767 +				gpu_migration_distance(dest_gpu, tsk_rt(tmp_to_steal_state->task)->last_gpu);
13768 +
13769 +			if(tmp_improvement > max_improvement) {
13770 +				to_steal_state = tmp_to_steal_state;
13771 +				max_improvement = tmp_improvement;
13772 +
13773 +				if(max_improvement >= (MIG_NONE-1)) {
13774 +					goto out;
13775 +				}
13776 +			}
13777 +		}
13778 +	}
13779 +
13780 +out:
13781 +	if(!to_steal_state) {
13782 +		TRACE_CUR("Could not find anyone to steal.\n");
13783 +	}
13784 +	else {
13785 +		TRACE_CUR("Selected victim %s/%d on fq %d (GPU %d) for fq %d (GPU %d): improvement = %d\n",
13786 +				  to_steal_state->task->comm, to_steal_state->task->pid,
13787 +				  ikglp_get_idx(sem, to_steal_state->donee_heap_node.fq),
13788 +				  replica_to_gpu(aff, ikglp_get_idx(sem, to_steal_state->donee_heap_node.fq)),
13789 +				  ikglp_get_idx(sem, dst),
13790 +				  dest_gpu,
13791 +				  max_improvement);
13792 +
13793 +//		TRACE_CUR("Non-aff wanted to select victim %s/%d on fq %d (GPU %d) for fq %d (GPU %d): improvement = %d\n",
13794 +//				  default_to_steal_state->task->comm, default_to_steal_state->task->pid,
13795 +//				  ikglp_get_idx(sem, default_to_steal_state->donee_heap_node.fq),
13796 +//				  replica_to_gpu(aff, ikglp_get_idx(sem, default_to_steal_state->donee_heap_node.fq)),
13797 +//				  ikglp_get_idx(sem, dst),
13798 +//				  replica_to_gpu(aff, ikglp_get_idx(sem, dst)),
13799 +//
13800 +//				  gpu_migration_distance(
13801 +//					  replica_to_gpu(aff, ikglp_get_idx(sem, default_to_steal_state->donee_heap_node.fq)),
13802 +//					  tsk_rt(default_to_steal_state->task)->last_gpu) -
13803 +//				  gpu_migration_distance(dest_gpu, tsk_rt(default_to_steal_state->task)->last_gpu));
13804 +	}
13805 +
13806 +	return(to_steal_state);
13807 +}
13808 +
13809 +
13810 +static inline int has_donor(wait_queue_t* fq_wait)
13811 +{
13812 +	ikglp_wait_state_t *wait = container_of(fq_wait, ikglp_wait_state_t, fq_node);
13813 +	return(wait->donee_heap_node.donor_info != NULL);
13814 +}
13815 +
13816 +static ikglp_donee_heap_node_t* pick_donee(struct ikglp_affinity* aff,
13817 +					  struct fifo_queue* fq,
13818 +					  int* dist_from_head)
13819 +{
13820 +	struct ikglp_semaphore *sem = ikglp_from_lock(aff->obs.lock);
13821 +	struct task_struct *donee;
13822 +	ikglp_donee_heap_node_t *donee_node;
13823 +	struct task_struct *mth_highest = ikglp_mth_highest(sem);
13824 +
13825 +//	lt_t now = litmus_clock();
13826 +//
13827 +//	TRACE_CUR("fq %d: mth_highest: %s/%d, deadline = %d: (donor) = ??? ",
13828 +//			  ikglp_get_idx(sem, fq),
13829 +//			  mth_highest->comm, mth_highest->pid,
13830 +//			  (int)get_deadline(mth_highest) - now);
13831 +
13832 +	if(fq->owner &&
13833 +	   fq->donee_heap_node.donor_info == NULL &&
13834 +	   mth_highest != fq->owner &&
13835 +	   litmus->__compare(mth_highest, BASE, fq->owner, BASE)) {
13836 +		donee = fq->owner;
13837 +		donee_node = &(fq->donee_heap_node);
13838 +		*dist_from_head = 0;
13839 +
13840 +		BUG_ON(donee != donee_node->task);
13841 +
13842 +		TRACE_CUR("picked owner of fq %d as donee\n",
13843 +				  ikglp_get_idx(sem, fq));
13844 +
13845 +		goto out;
13846 +	}
13847 +	else if(waitqueue_active(&fq->wait)) {
13848 +		struct list_head	*pos;
13849 +
13850 +
13851 +//		TRACE_CUR("fq %d: owner: %s/%d, deadline = %d: (donor) = %s/%d "
13852 +//				  "(mth_highest != fq->owner) = %d "
13853 +//				  "(mth_highest > fq->owner) = %d\n",
13854 +//				  ikglp_get_idx(sem, fq),
13855 +//				  (fq->owner) ? fq->owner->comm : "nil",
13856 +//				  (fq->owner) ? fq->owner->pid : -1,
13857 +//				  (fq->owner) ? (int)get_deadline(fq->owner) - now : -999,
13858 +//				  (fq->donee_heap_node.donor_info) ? fq->donee_heap_node.donor_info->task->comm : "nil",
13859 +//				  (fq->donee_heap_node.donor_info) ? fq->donee_heap_node.donor_info->task->pid : -1,
13860 +//				  (mth_highest != fq->owner),
13861 +//				  (litmus->__compare(mth_highest, BASE, fq->owner, BASE)));
13862 +
13863 +
13864 +		*dist_from_head = 1;
13865 +
13866 +		// iterating from the start of the queue is nice since this means
13867 +		// the donee will be closer to obtaining a resource.
13868 +		list_for_each(pos, &fq->wait.task_list) {
13869 +			wait_queue_t *fq_wait = list_entry(pos, wait_queue_t, task_list);
13870 +			ikglp_wait_state_t *wait = container_of(fq_wait, ikglp_wait_state_t, fq_node);
13871 +
13872 +//			TRACE_CUR("fq %d: waiter %d: %s/%d, deadline = %d (donor) = %s/%d "
13873 +//					  "(mth_highest != wait->task) = %d "
13874 +//					  "(mth_highest > wait->task) = %d\n",
13875 +//					  ikglp_get_idx(sem, fq),
13876 +//					  dist_from_head,
13877 +//					  wait->task->comm, wait->task->pid,
13878 +//					  (int)get_deadline(wait->task) - now,
13879 +//					  (wait->donee_heap_node.donor_info) ? wait->donee_heap_node.donor_info->task->comm : "nil",
13880 +//					  (wait->donee_heap_node.donor_info) ? wait->donee_heap_node.donor_info->task->pid : -1,
13881 +//					  (mth_highest != wait->task),
13882 +//					  (litmus->__compare(mth_highest, BASE, wait->task, BASE)));
13883 +
13884 +
13885 +			if(!has_donor(fq_wait) &&
13886 +			   mth_highest != wait->task &&
13887 +			   litmus->__compare(mth_highest, BASE, wait->task, BASE)) {
13888 +				donee = (struct task_struct*) fq_wait->private;
13889 +				donee_node = &wait->donee_heap_node;
13890 +
13891 +				BUG_ON(donee != donee_node->task);
13892 +
13893 +				TRACE_CUR("picked waiter in fq %d as donee\n",
13894 +						  ikglp_get_idx(sem, fq));
13895 +
13896 +				goto out;
13897 +			}
13898 +			++(*dist_from_head);
13899 +		}
13900 +	}
13901 +
13902 +	donee = NULL;
13903 +	donee_node = NULL;
13904 +	//*dist_from_head = sem->max_fifo_len + 1;
13905 +	*dist_from_head = IKGLP_INVAL_DISTANCE;
13906 +
13907 +	TRACE_CUR("Found no one to be donee in fq %d!\n", ikglp_get_idx(sem, fq));
13908 +
13909 +out:
13910 +
13911 +	TRACE_CUR("Candidate donee for fq %d is %s/%d (dist_from_head = %d)\n",
13912 +			  ikglp_get_idx(sem, fq),
13913 +			  (donee) ? (donee)->comm : "nil",
13914 +			  (donee) ? (donee)->pid  : -1,
13915 +			  *dist_from_head);
13916 +
13917 +	return donee_node;
13918 +}
13919 +
13920 +ikglp_donee_heap_node_t* gpu_ikglp_advise_donee_selection(
13921 +											struct ikglp_affinity* aff,
13922 +											struct task_struct* donor)
13923 +{
13924 +	// Heuristic strategy: Find the highest-priority donee that is waiting on
13925 +	// a queue closest to our affinity.  (1) The donee CANNOT already have a
13926 +	// donor (exception: the donee is the lowest-prio task in the donee heap).
13927 +	// (2) Requests in the 'top_m' heap are ineligible.
13928 +	//
13929 +	// Further strategy: amongst eligible donees waiting for the same GPU, pick
13930 +	// the one closest to the head of the FIFO queue (including owners).
13931 +	//
13932 +	struct ikglp_semaphore *sem = ikglp_from_lock(aff->obs.lock);
13933 +	ikglp_donee_heap_node_t *donee_node;
13934 +	gpu_migration_dist_t distance;
13935 +	int start, i, j;
13936 +
13937 +	ikglp_donee_heap_node_t *default_donee;
13938 +	ikglp_wait_state_t *default_donee_donor_info;
13939 +
13940 +	if(tsk_rt(donor)->last_gpu < 0) {
13941 +		// no affinity.  just return the min prio, like standard IKGLP
13942 +		// TODO: Find something closer to the head of the queue??
13943 +		donee_node = binheap_top_entry(&sem->donees,
13944 +									   ikglp_donee_heap_node_t,
13945 +									   node);
13946 +		goto out;
13947 +	}
13948 +
13949 +
13950 +	// Temporarily break any donation relation of the default donee (the lowest
13951 +	// prio task in the FIFO queues) to make it eligible for selection below.
13952 +	//
13953 +	// NOTE: The original donor relation *must* be restored, even if we select
13954 +	// the default donee through affinity-aware selection, before returning
13955 +	// from this function, so we don't screw up our heap ordering.
13956 +	// The standard IKGLP algorithm will steal the donor relationship if needed.
13957 +	default_donee = binheap_top_entry(&sem->donees, ikglp_donee_heap_node_t, node);
13958 +	default_donee_donor_info = default_donee->donor_info;  // back-up donor relation
13959 +	default_donee->donor_info = NULL;  // temporarily break any donor relation.
13960 +
13961 +	// initialize our search
13962 +	donee_node = NULL;
13963 +	distance = MIG_NONE;
13964 +
13965 +	// TODO: The below search logic may work well for locating nodes to steal
13966 +	// when an FQ goes idle.  Validate this code and apply it to stealing.
13967 +
13968 +	// begin search with affinity GPU.
13969 +	start = gpu_to_base_replica(aff, tsk_rt(donor)->last_gpu);
13970 +	i = start;
13971 +	do {  // "for each gpu" / "for each aff->nr_rsrc"
13972 +		gpu_migration_dist_t temp_distance = gpu_migration_distance(start, i);
13973 +
13974 +		// only interested in queues that will improve our distance
13975 +		if(temp_distance < distance || donee_node == NULL) {
13976 +			int dist_from_head = IKGLP_INVAL_DISTANCE;
13977 +
13978 +			TRACE_CUR("searching for a donee on GPU %d\n", i);
13979 +
13980 +			// visit each queue and pick a donee.  bail as soon as we find
13981 +			// one for this class.
13982 +
13983 +			for(j = 0; j < aff->nr_simult; ++j) {
13984 +				int temp_dist_from_head;
13985 +				ikglp_donee_heap_node_t *temp_donee_node;
13986 +				struct fifo_queue *fq;
13987 +
13988 +				fq = &(sem->fifo_queues[i + j*aff->nr_rsrc]);
13989 +				temp_donee_node = pick_donee(aff, fq, &temp_dist_from_head);
13990 +
13991 +				if(temp_dist_from_head < dist_from_head)
13992 +				{
13993 +					// we check all the FQs for this GPU to spread priorities
13994 +					// out across the queues.  does this decrease jitter?
13995 +					donee_node = temp_donee_node;
13996 +					dist_from_head = temp_dist_from_head;
13997 +				}
13998 +			}
13999 +
14000 +			if(dist_from_head != IKGLP_INVAL_DISTANCE) {
14001 +				TRACE_CUR("found donee %s/%d and is the %d-th waiter.\n",
14002 +						  donee_node->task->comm, donee_node->task->pid,
14003 +						  dist_from_head);
14004 +			}
14005 +			else {
14006 +				TRACE_CUR("found no eligible donees on GPU %d\n", i);
14007 +			}
14008 +		}
14009 +		else {
14010 +			TRACE_CUR("skipping GPU %d (distance = %d, best donee "
14011 +					  "distance = %d)\n", i, temp_distance, distance);
14012 +		}
14013 +
14014 +		i = (i+1 < aff->nr_rsrc) ? i+1 : 0;  // increment with wrap-around
14015 +	} while (i != start);
14016 +
14017 +
14018 +	// restore old donor info state.
14019 +	default_donee->donor_info = default_donee_donor_info;
14020 +
14021 +	if(!donee_node) {
14022 +		donee_node = default_donee;
14023 +
14024 +		TRACE_CUR("Could not find a donee. We have to steal one.\n");
14025 +		WARN_ON(default_donee->donor_info == NULL);
14026 +	}
14027 +
14028 +out:
14029 +
14030 +	TRACE_CUR("Selected donee %s/%d on fq %d (GPU %d) for %s/%d with affinity for GPU %d\n",
14031 +			  donee_node->task->comm, donee_node->task->pid,
14032 +			  ikglp_get_idx(sem, donee_node->fq),
14033 +			  replica_to_gpu(aff, ikglp_get_idx(sem, donee_node->fq)),
14034 +			  donor->comm, donor->pid, tsk_rt(donor)->last_gpu);
14035 +
14036 +	return(donee_node);
14037 +}
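/* --- Editorial sketch: not part of the original patch. ---------------------
 * The donee search above visits every GPU exactly once, starting at the
 * donor's affinity GPU and wrapping around to the front of the range.  The
 * loop skeleton in isolation (plain C; visit() is a hypothetical callback,
 * and 0 <= start < nr_rsrc is assumed):
 */
static void example_wraparound_scan(int start, int nr_rsrc, void (*visit)(int))
{
	int i = start;
	do {
		visit(i);                          /* inspect GPU i's queues */
		i = (i + 1 < nr_rsrc) ? i + 1 : 0; /* increment with wrap    */
	} while (i != start);
}
/* ------------------------------------------------------------------------- */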
14038 +
14039 +
14040 +
14041 +static void __find_closest_donor(int target_gpu,
14042 +								 struct binheap_node* donor_node,
14043 +								 ikglp_wait_state_t** cur_closest,
14044 +								 int* cur_dist)
14045 +{
14046 +	ikglp_wait_state_t *this_donor =
14047 +		binheap_entry(donor_node, ikglp_wait_state_t, node);
14048 +
14049 +	int this_dist =
14050 +		gpu_migration_distance(target_gpu, tsk_rt(this_donor->task)->last_gpu);
14051 +
14052 +//	TRACE_CUR("%s/%d: dist from target = %d\n",
14053 +//			  this_donor->task->comm,
14054 +//			  this_donor->task->pid,
14055 +//			  this_dist);
14056 +
14057 +	if(this_dist < *cur_dist) {
14058 +		// take this donor
14059 +		*cur_dist = this_dist;
14060 +		*cur_closest = this_donor;
14061 +	}
14062 +	else if(this_dist == *cur_dist) {
14063 +		// priority tie-break.  Even though this is a pre-order traversal,
14064 +		// the structure is a heap, not a binary search tree, so we still need
14065 +		// an explicit priority comparison.
14066 +		if(!(*cur_closest) ||
14067 +		   litmus->compare(this_donor->task, (*cur_closest)->task)) {
14068 +			*cur_dist = this_dist;
14069 +			*cur_closest = this_donor;
14070 +		}
14071 +	}
14072 +
14073 +	if(donor_node->left) __find_closest_donor(target_gpu, donor_node->left, cur_closest, cur_dist);
14074 +	if(donor_node->right) __find_closest_donor(target_gpu, donor_node->right, cur_closest, cur_dist);
14075 +}
14076 +
14077 +ikglp_wait_state_t* gpu_ikglp_advise_donor_to_fq(struct ikglp_affinity* aff, struct fifo_queue* fq)
14078 +{
14079 +	// Heuristic strategy: Find the donor with the closest affinity to fq.
14080 +	// Tie-break on priority.
14081 +
14082 +	// We need to iterate over all the donors to do this.  Unfortunately,
14083 +	// our donors are organized in a heap.  We'll visit each node with a
14084 +	// recursive call.  This is relatively safe since there are at most sem->m
14085 +	// donors.  We won't recurse deeply enough to have to worry about
14086 +	// our stack.  (Even with 128 CPUs, our nesting depth is at most 7.)
14087 +
14088 +	struct ikglp_semaphore *sem = ikglp_from_lock(aff->obs.lock);
14089 +	ikglp_wait_state_t *donor = NULL;
14090 +	int distance = MIG_NONE;
14091 +	int gpu = replica_to_gpu(aff, ikglp_get_idx(sem, fq));
14092 +	ikglp_wait_state_t* default_donor = binheap_top_entry(&sem->donors, ikglp_wait_state_t, node);
14093 +
14094 +	__find_closest_donor(gpu, sem->donors.root, &donor, &distance);
14095 +
14096 +	TRACE_CUR("Selected donor %s/%d (distance = %d) to move to fq %d "
14097 +			  "(non-aff wanted %s/%d). differs = %d\n",
14098 +			  donor->task->comm, donor->task->pid,
14099 +			  distance,
14100 +			  ikglp_get_idx(sem, fq),
14101 +			  default_donor->task->comm, default_donor->task->pid,
14102 +			  (donor->task != default_donor->task)
14103 +			  );
14104 +
14105 +	return(donor);
14106 +}
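/* --- Editorial sketch: not part of the original patch. ---------------------
 * The recursion in __find_closest_donor() nests one call per heap level, and
 * a binary heap holding n donors spans floor(log2(n)) + 1 levels, which is
 * where the "at most 7 deep with 128 CPUs" remark above comes from (the root
 * plus at most 7 nested descents).  Standalone computation of the level
 * count (plain C):
 */
static int example_heap_levels(unsigned int n)
{
	int levels = 0;
	while (n) {            /* floor(log2(n)) + 1 for n >= 1 */
		++levels;
		n >>= 1;
	}
	return levels;         /* example_heap_levels(128) == 8 levels */
}
/* ------------------------------------------------------------------------- */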
14107 +
14108 +
14109 +
14110 +void gpu_ikglp_notify_enqueue(struct ikglp_affinity* aff, struct fifo_queue* fq, struct task_struct* t)
14111 +{
14112 +	struct ikglp_semaphore *sem = ikglp_from_lock(aff->obs.lock);
14113 +	int replica = ikglp_get_idx(sem, fq);
14114 +	int gpu = replica_to_gpu(aff, replica);
14115 +	struct ikglp_queue_info *info = &aff->q_info[replica];
14116 +	lt_t est_time;
14117 +	lt_t est_len_before;
14118 +
14119 +	if(current == t) {
14120 +		tsk_rt(t)->suspend_gpu_tracker_on_block = 1;
14121 +	}
14122 +
14123 +	est_len_before = info->estimated_len;
14124 +	est_time = get_gpu_estimate(t, gpu_migration_distance(tsk_rt(t)->last_gpu, gpu));
14125 +	info->estimated_len += est_time;
14126 +
14127 +	TRACE_CUR("fq %d: q_len (%llu) + est_cs (%llu) = %llu\n",
14128 +			  ikglp_get_idx(sem, info->q),
14129 +			  est_len_before, est_time,
14130 +			  info->estimated_len);
14131 +
14132 +	//	if(aff->shortest_queue == info) {
14133 +	//		// we may no longer be the shortest
14134 +	//		aff->shortest_queue = ikglp_aff_find_shortest(aff);
14135 +	//
14136 +	//		TRACE_CUR("shortest queue is fq %d (with %d in queue) has est len %llu\n",
14137 +	//				  ikglp_get_idx(sem, aff->shortest_queue->q),
14138 +	//				  aff->shortest_queue->q->count,
14139 +	//				  aff->shortest_queue->estimated_len);
14140 +	//	}
14141 +}
14142 +
14143 +void gpu_ikglp_notify_dequeue(struct ikglp_affinity* aff, struct fifo_queue* fq, struct task_struct* t)
14144 +{
14145 +	struct ikglp_semaphore *sem = ikglp_from_lock(aff->obs.lock);
14146 +	int replica = ikglp_get_idx(sem, fq);
14147 +	int gpu = replica_to_gpu(aff, replica);
14148 +	struct ikglp_queue_info *info = &aff->q_info[replica];
14149 +	lt_t est_time = get_gpu_estimate(t, gpu_migration_distance(tsk_rt(t)->last_gpu, gpu));
14150 +
14151 +	if(est_time > info->estimated_len) {
14152 +		WARN_ON(1);
14153 +		info->estimated_len = 0;
14154 +	}
14155 +	else {
14156 +		info->estimated_len -= est_time;
14157 +	}
14158 +
14159 +	TRACE_CUR("fq %d est len is now %llu\n",
14160 +			  ikglp_get_idx(sem, info->q),
14161 +			  info->estimated_len);
14162 +
14163 +	// check to see if we're the shortest queue now.
14164 +	//	if((aff->shortest_queue != info) &&
14165 +	//	   (aff->shortest_queue->estimated_len > info->estimated_len)) {
14166 +	//
14167 +	//		aff->shortest_queue = info;
14168 +	//
14169 +	//		TRACE_CUR("shortest queue is fq %d (with %d in queue) has est len %llu\n",
14170 +	//				  ikglp_get_idx(sem, info->q),
14171 +	//				  info->q->count,
14172 +	//				  info->estimated_len);
14173 +	//	}
14174 +}
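/* --- Editorial sketch: not part of the original patch. ---------------------
 * notify_enqueue()/notify_dequeue() above maintain a per-queue length
 * estimate: enqueue adds the task's estimated critical-section time for the
 * implied migration, dequeue subtracts the same amount and clamps at zero
 * instead of underflowing.  The bookkeeping in isolation (plain C; unsigned
 * long long stands in for lt_t, names are hypothetical):
 */
static void example_est_len_enqueue(unsigned long long *est_len,
				    unsigned long long est_cs)
{
	*est_len += est_cs;
}

static void example_est_len_dequeue(unsigned long long *est_len,
				    unsigned long long est_cs)
{
	*est_len = (est_cs > *est_len) ? 0 : (*est_len - est_cs);
}
/* ------------------------------------------------------------------------- */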
14175 +
14176 +void gpu_ikglp_notify_acquired(struct ikglp_affinity* aff,
14177 +							   struct fifo_queue* fq,
14178 +							   struct task_struct* t)
14179 +{
14180 +	struct ikglp_semaphore *sem = ikglp_from_lock(aff->obs.lock);
14181 +	int replica = ikglp_get_idx(sem, fq);
14182 +	int gpu = replica_to_gpu(aff, replica);
14183 +
14184 +	tsk_rt(t)->gpu_migration = gpu_migration_distance(tsk_rt(t)->last_gpu, gpu);  // record the type of migration
14185 +
14186 +	TRACE_CUR("%s/%d acquired gpu %d (prev = %d).  migration type = %d\n",
14187 +			  t->comm, t->pid, gpu, tsk_rt(t)->last_gpu, tsk_rt(t)->gpu_migration);
14188 +
14189 +	// count the number of resource holders
14190 +	++(*(aff->q_info[replica].nr_cur_users));
14191 +
14192 +	reg_nv_device(gpu, 1, t);  // register
14193 +
14194 +	tsk_rt(t)->suspend_gpu_tracker_on_block = 0;
14195 +	reset_gpu_tracker(t);
14196 +	start_gpu_tracker(t);
14197 +}
14198 +
14199 +void gpu_ikglp_notify_freed(struct ikglp_affinity* aff,
14200 +							struct fifo_queue* fq,
14201 +							struct task_struct* t)
14202 +{
14203 +	struct ikglp_semaphore *sem = ikglp_from_lock(aff->obs.lock);
14204 +	int replica = ikglp_get_idx(sem, fq);
14205 +	int gpu = replica_to_gpu(aff, replica);
14206 +	lt_t est_time;
14207 +
14208 +	stop_gpu_tracker(t);  // stop the tracker before we do anything else.
14209 +
14210 +	est_time = get_gpu_estimate(t, gpu_migration_distance(tsk_rt(t)->last_gpu, gpu));
14211 +
14212 +	// count the number of resource holders
14213 +	--(*(aff->q_info[replica].nr_cur_users));
14214 +
14215 +	reg_nv_device(gpu, 0, t);	// unregister
14216 +
14217 +	// update estimates
14218 +	update_gpu_estimate(t, get_gpu_time(t));
14219 +
14220 +	TRACE_CUR("%s/%d freed gpu %d (prev = %d).  mig type = %d.  actual time was %llu.  "
14221 +			  "estimated was %llu.  diff is %lld\n",
14222 +			  t->comm, t->pid, gpu, tsk_rt(t)->last_gpu,
14223 +			  tsk_rt(t)->gpu_migration,
14224 +			  get_gpu_time(t),
14225 +			  est_time,
14226 +			  (long long)get_gpu_time(t) - (long long)est_time);
14227 +
14228 +	tsk_rt(t)->last_gpu = gpu;
14229 +}
14230 +
14231 +struct ikglp_affinity_ops gpu_ikglp_affinity =
14232 +{
14233 +	.advise_enqueue = gpu_ikglp_advise_enqueue,
14234 +	.advise_steal = gpu_ikglp_advise_steal,
14235 +	.advise_donee_selection = gpu_ikglp_advise_donee_selection,
14236 +	.advise_donor_to_fq = gpu_ikglp_advise_donor_to_fq,
14237 +
14238 +	.notify_enqueue = gpu_ikglp_notify_enqueue,
14239 +	.notify_dequeue = gpu_ikglp_notify_dequeue,
14240 +	.notify_acquired = gpu_ikglp_notify_acquired,
14241 +	.notify_freed = gpu_ikglp_notify_freed,
14242 +
14243 +	.replica_to_resource = gpu_replica_to_resource,
14244 +};
14245 +
14246 +struct affinity_observer* ikglp_gpu_aff_obs_new(struct affinity_observer_ops* ops,
14247 +												void* __user args)
14248 +{
14249 +	return ikglp_aff_obs_new(ops, &gpu_ikglp_affinity, args);
14250 +}
14251 +
14252 +
14253 +
14254 +
14255 +
14256 +
14257 +
14258 +
14259 +// Simple ikglp Affinity (standard ikglp with auto-gpu registration)
14260 +
14261 +struct fifo_queue* simple_gpu_ikglp_advise_enqueue(struct ikglp_affinity* aff, struct task_struct* t)
14262 +{
14263 +	struct ikglp_semaphore *sem = ikglp_from_lock(aff->obs.lock);
14264 +	int min_count;
14265 +	int min_nr_users;
14266 +	struct ikglp_queue_info *shortest;
14267 +	struct fifo_queue *to_enqueue;
14268 +	int i;
14269 +
14270 +	//	TRACE_CUR("Simple GPU ikglp advise_enqueue invoked\n");
14271 +
14272 +	shortest = &aff->q_info[0];
14273 +	min_count = shortest->q->count;
14274 +	min_nr_users = *(shortest->nr_cur_users);
14275 +
14276 +	TRACE_CUR("queue %d: waiters = %d, total holders = %d\n",
14277 +			  ikglp_get_idx(sem, shortest->q),
14278 +			  shortest->q->count,
14279 +			  min_nr_users);
14280 +
14281 +	for(i = 1; i < sem->nr_replicas; ++i) {
14282 +		int len = aff->q_info[i].q->count;
14283 +
14284 +		// queue is shorter, or the lengths are equal and the other queue has
14285 +		// fewer total users.
14286 +		//
14287 +		// tie-break on the smallest number of simultaneous users.  this only
14288 +		// kicks in when there is more than one empty queue.
14289 +		if((len < min_count) ||
14290 +		   ((len == min_count) && (*(aff->q_info[i].nr_cur_users) < min_nr_users))) {
14291 +			shortest = &aff->q_info[i];
14292 +			min_count = shortest->q->count;
14293 +			min_nr_users = *(aff->q_info[i].nr_cur_users);
14294 +		}
14295 +
14296 +		TRACE_CUR("queue %d: waiters = %d, total holders = %d\n",
14297 +				  ikglp_get_idx(sem, aff->q_info[i].q),
14298 +				  aff->q_info[i].q->count,
14299 +				  *(aff->q_info[i].nr_cur_users));
14300 +	}
14301 +
14302 +	to_enqueue = shortest->q;
14303 +	TRACE_CUR("enqueue on fq %d (non-aff wanted fq %d)\n",
14304 +			  ikglp_get_idx(sem, to_enqueue),
14305 +			  ikglp_get_idx(sem, sem->shortest_fifo_queue));
14306 +
14307 +	return to_enqueue;
14308 +}
14309 +
14310 +ikglp_wait_state_t* simple_gpu_ikglp_advise_steal(struct ikglp_affinity* aff,
14311 +												  struct fifo_queue* dst)
14312 +{
14313 +	struct ikglp_semaphore *sem = ikglp_from_lock(aff->obs.lock);
14314 +	//	TRACE_CUR("Simple GPU ikglp advise_steal invoked\n");
14315 +	return ikglp_find_hp_waiter_to_steal(sem);
14316 +}
14317 +
14318 +ikglp_donee_heap_node_t* simple_gpu_ikglp_advise_donee_selection(struct ikglp_affinity* aff, struct task_struct* donor)
14319 +{
14320 +	struct ikglp_semaphore *sem = ikglp_from_lock(aff->obs.lock);
14321 +	ikglp_donee_heap_node_t *donee = binheap_top_entry(&sem->donees, ikglp_donee_heap_node_t, node);
14322 +	return(donee);
14323 +}
14324 +
14325 +ikglp_wait_state_t* simple_gpu_ikglp_advise_donor_to_fq(struct ikglp_affinity* aff, struct fifo_queue* fq)
14326 +{
14327 +	struct ikglp_semaphore *sem = ikglp_from_lock(aff->obs.lock);
14328 +	ikglp_wait_state_t* donor = binheap_top_entry(&sem->donors, ikglp_wait_state_t, node);
14329 +	return(donor);
14330 +}
14331 +
14332 +void simple_gpu_ikglp_notify_enqueue(struct ikglp_affinity* aff, struct fifo_queue* fq, struct task_struct* t)
14333 +{
14334 +	//	TRACE_CUR("Simple GPU ikglp notify_enqueue invoked\n");
14335 +}
14336 +
14337 +void simple_gpu_ikglp_notify_dequeue(struct ikglp_affinity* aff, struct fifo_queue* fq, struct task_struct* t)
14338 +{
14339 +	//	TRACE_CUR("Simple GPU ikglp notify_dequeue invoked\n");
14340 +}
14341 +
14342 +void simple_gpu_ikglp_notify_acquired(struct ikglp_affinity* aff, struct fifo_queue* fq, struct task_struct* t)
14343 +{
14344 +	struct ikglp_semaphore *sem = ikglp_from_lock(aff->obs.lock);
14345 +	int replica = ikglp_get_idx(sem, fq);
14346 +	int gpu = replica_to_gpu(aff, replica);
14347 +
14348 +	//	TRACE_CUR("Simple GPU ikglp notify_acquired invoked\n");
14349 +
14350 +	// count the number of resource holders
14351 +	++(*(aff->q_info[replica].nr_cur_users));
14352 +
14353 +	reg_nv_device(gpu, 1, t);  // register
14354 +}
14355 +
14356 +void simple_gpu_ikglp_notify_freed(struct ikglp_affinity* aff, struct fifo_queue* fq, struct task_struct* t)
14357 +{
14358 +	struct ikglp_semaphore *sem = ikglp_from_lock(aff->obs.lock);
14359 +	int replica = ikglp_get_idx(sem, fq);
14360 +	int gpu = replica_to_gpu(aff, replica);
14361 +
14362 +	//	TRACE_CUR("Simple GPU ikglp notify_freed invoked\n");
14363 +	// count the number of resource holders
14364 +	--(*(aff->q_info[replica].nr_cur_users));
14365 +
14366 +	reg_nv_device(gpu, 0, t);	// unregister
14367 +}
14368 +
14369 +struct ikglp_affinity_ops simple_gpu_ikglp_affinity =
14370 +{
14371 +	.advise_enqueue = simple_gpu_ikglp_advise_enqueue,
14372 +	.advise_steal = simple_gpu_ikglp_advise_steal,
14373 +	.advise_donee_selection = simple_gpu_ikglp_advise_donee_selection,
14374 +	.advise_donor_to_fq = simple_gpu_ikglp_advise_donor_to_fq,
14375 +
14376 +	.notify_enqueue = simple_gpu_ikglp_notify_enqueue,
14377 +	.notify_dequeue = simple_gpu_ikglp_notify_dequeue,
14378 +	.notify_acquired = simple_gpu_ikglp_notify_acquired,
14379 +	.notify_freed = simple_gpu_ikglp_notify_freed,
14380 +
14381 +	.replica_to_resource = gpu_replica_to_resource,
14382 +};
14383 +
14384 +struct affinity_observer* ikglp_simple_gpu_aff_obs_new(struct affinity_observer_ops* ops,
14385 +													   void* __user args)
14386 +{
14387 +	return ikglp_aff_obs_new(ops, &simple_gpu_ikglp_affinity, args);
14388 +}
14389 +
14390 +#endif
14391 +
14392 +
14393 +
14394 +
14395 +
14396 +
14397 +
14398 +
14399 +
14400 diff --git a/litmus/jobs.c b/litmus/jobs.c
14401 new file mode 100644
14402 index 0000000..1d97462
14403 --- /dev/null
14404 +++ b/litmus/jobs.c
14405 @@ -0,0 +1,56 @@
14406 +/* litmus/jobs.c - common job control code
14407 + */
14408 +
14409 +#include <linux/sched.h>
14410 +
14411 +#include <litmus/litmus.h>
14412 +#include <litmus/jobs.h>
14413 +
14414 +void prepare_for_next_period(struct task_struct *t)
14415 +{
14416 +	BUG_ON(!t);
14417 +	/* prepare next release */
14418 +
14419 +	if(tsk_rt(t)->task_params.cls == RT_CLASS_SOFT_W_SLIP) {
14420 +		/* allow the release point to slip if we've passed our deadline. */
14421 +		lt_t now = litmus_clock();
14422 +		t->rt_param.job_params.release =
14423 +			(t->rt_param.job_params.deadline < now) ?
14424 +				now : t->rt_param.job_params.deadline;
14425 +		t->rt_param.job_params.deadline =
14426 +			t->rt_param.job_params.release + get_rt_period(t);
14427 +	}
14428 +	else {
14429 +		t->rt_param.job_params.release   = t->rt_param.job_params.deadline;
14430 +		t->rt_param.job_params.deadline += get_rt_period(t);
14431 +	}
14432 +
14433 +	t->rt_param.job_params.exec_time = 0;
14434 +	/* update job sequence number */
14435 +	t->rt_param.job_params.job_no++;
14436 +
14437 +	/* don't confuse Linux */
14438 +	t->rt.time_slice = 1;
14439 +}
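/* --- Editorial sketch: not part of the original patch. ---------------------
 * The RT_CLASS_SOFT_W_SLIP branch above lets an overrunning soft task release
 * its next job immediately (release = now) rather than back-dating it to the
 * missed deadline; in either case the new deadline is release + period.
 * Standalone form of the slip rule (plain C; unsigned long long stands in
 * for lt_t, the struct and names are hypothetical):
 */
struct example_job_params { unsigned long long release, deadline; };

static void example_prepare_next_with_slip(struct example_job_params *j,
					   unsigned long long now,
					   unsigned long long period)
{
	j->release  = (j->deadline < now) ? now : j->deadline;
	j->deadline = j->release + period;
}
/* ------------------------------------------------------------------------- */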
14440 +
14441 +void release_at(struct task_struct *t, lt_t start)
14442 +{
14443 +	t->rt_param.job_params.deadline = start;
14444 +	prepare_for_next_period(t);
14445 +	set_rt_flags(t, RT_F_RUNNING);
14446 +}
14447 +
14448 +
14449 +/*
14450 + *	Deactivate current task until the beginning of the next period.
14451 + */
14452 +long complete_job(void)
14453 +{
14454 +	/* Mark that we do not execute anymore */
14455 +	set_rt_flags(current, RT_F_SLEEP);
14456 +	/* call schedule, this will return when a new job arrives
14457 +	 * it also takes care of preparing for the next release
14458 +	 */
14459 +	schedule();
14460 +	return 0;
14461 +}
14462 diff --git a/litmus/kexclu_affinity.c b/litmus/kexclu_affinity.c
14463 new file mode 100644
14464 index 0000000..5ef5e54
14465 --- /dev/null
14466 +++ b/litmus/kexclu_affinity.c
14467 @@ -0,0 +1,92 @@
14468 +#include <litmus/fdso.h>
14469 +#include <litmus/sched_plugin.h>
14470 +#include <litmus/trace.h>
14471 +#include <litmus/litmus.h>
14472 +#include <litmus/locking.h>
14473 +
14474 +#include <litmus/kexclu_affinity.h>
14475 +
14476 +static int create_generic_aff_obs(void** obj_ref, obj_type_t type, void* __user arg);
14477 +static int open_generic_aff_obs(struct od_table_entry* entry, void* __user arg);
14478 +static int close_generic_aff_obs(struct od_table_entry* entry);
14479 +static void destroy_generic_aff_obs(obj_type_t type, void* sem);
14480 +
14481 +struct fdso_ops generic_affinity_ops = {
14482 +	.create  = create_generic_aff_obs,
14483 +	.open    = open_generic_aff_obs,
14484 +	.close   = close_generic_aff_obs,
14485 +	.destroy = destroy_generic_aff_obs
14486 +};
14487 +
14488 +static atomic_t aff_obs_id_gen = ATOMIC_INIT(0);
14489 +
14490 +static inline bool is_affinity_observer(struct od_table_entry *entry)
14491 +{
14492 +	return (entry->class == &generic_affinity_ops);
14493 +}
14494 +
14495 +static inline struct affinity_observer* get_affinity_observer(struct od_table_entry* entry)
14496 +{
14497 +	BUG_ON(!is_affinity_observer(entry));
14498 +	return (struct affinity_observer*) entry->obj->obj;
14499 +}
14500 +
14501 +static int create_generic_aff_obs(void** obj_ref, obj_type_t type, void* __user arg)
14502 +{
14503 +	struct affinity_observer* aff_obs;
14504 +	int err;
14505 +
14506 +	err = litmus->allocate_aff_obs(&aff_obs, type, arg);
14507 +	if (err == 0) {
14508 +		BUG_ON(!aff_obs->lock);
14509 +		aff_obs->type = type;
14510 +		*obj_ref = aff_obs;
14511 +	}
14512 +	return err;
14513 +}
14514 +
14515 +static int open_generic_aff_obs(struct od_table_entry* entry, void* __user arg)
14516 +{
14517 +	struct affinity_observer* aff_obs = get_affinity_observer(entry);
14518 +	if (aff_obs->ops->open)
14519 +		return aff_obs->ops->open(aff_obs, arg);
14520 +	else
14521 +		return 0; /* default: any task can open it */
14522 +}
14523 +
14524 +static int close_generic_aff_obs(struct od_table_entry* entry)
14525 +{
14526 +	struct affinity_observer* aff_obs = get_affinity_observer(entry);
14527 +	if (aff_obs->ops->close)
14528 +		return aff_obs->ops->close(aff_obs);
14529 +	else
14530 +		return 0; /* default: closing succeeds */
14531 +}
14532 +
14533 +static void destroy_generic_aff_obs(obj_type_t type, void* obj)
14534 +{
14535 +	struct affinity_observer* aff_obs = (struct affinity_observer*) obj;
14536 +	aff_obs->ops->deallocate(aff_obs);
14537 +}
14538 +
14539 +
14540 +struct litmus_lock* get_lock_from_od(int od)
14541 +{
14542 +	extern struct fdso_ops generic_lock_ops;
14543 +
14544 +	struct od_table_entry *entry = get_entry_for_od(od);
14545 +
14546 +	if(entry && entry->class == &generic_lock_ops) {
14547 +		return (struct litmus_lock*) entry->obj->obj;
14548 +	}
14549 +	return NULL;
14550 +}
14551 +
14552 +void affinity_observer_new(struct affinity_observer* aff,
14553 +						   struct affinity_observer_ops* ops,
14554 +						   struct affinity_observer_args* args)
14555 +{
14556 +	aff->ops = ops;
14557 +	aff->lock = get_lock_from_od(args->lock_od);
14558 +	aff->ident = atomic_inc_return(&aff_obs_id_gen);
14559 +}
14560 \ No newline at end of file
14561 diff --git a/litmus/kfmlp_lock.c b/litmus/kfmlp_lock.c
14562 new file mode 100644
14563 index 0000000..bff857e
14564 --- /dev/null
14565 +++ b/litmus/kfmlp_lock.c
14566 @@ -0,0 +1,1002 @@
14567 +#include <linux/slab.h>
14568 +#include <linux/uaccess.h>
14569 +
14570 +#include <litmus/trace.h>
14571 +#include <litmus/sched_plugin.h>
14572 +#include <litmus/fdso.h>
14573 +
14574 +#if defined(CONFIG_LITMUS_AFFINITY_LOCKING) && defined(CONFIG_LITMUS_NVIDIA)
14575 +#include <litmus/gpu_affinity.h>
14576 +#include <litmus/nvidia_info.h>
14577 +#endif
14578 +
14579 +#include <litmus/kfmlp_lock.h>
14580 +
14581 +static inline int kfmlp_get_idx(struct kfmlp_semaphore* sem,
14582 +								struct kfmlp_queue* queue)
14583 +{
14584 +	return (queue - &sem->queues[0]);
14585 +}
14586 +
14587 +static inline struct kfmlp_queue* kfmlp_get_queue(struct kfmlp_semaphore* sem,
14588 +												  struct task_struct* holder)
14589 +{
14590 +	int i;
14591 +	for(i = 0; i < sem->num_resources; ++i)
14592 +		if(sem->queues[i].owner == holder)
14593 +			return(&sem->queues[i]);
14594 +	return(NULL);
14595 +}
14596 +
14597 +/* caller is responsible for locking */
14598 +static struct task_struct* kfmlp_find_hp_waiter(struct kfmlp_queue *kqueue,
14599 +												struct task_struct *skip)
14600 +{
14601 +	struct list_head	*pos;
14602 +	struct task_struct 	*queued, *found = NULL;
14603 +
14604 +	list_for_each(pos, &kqueue->wait.task_list) {
14605 +		queued  = (struct task_struct*) list_entry(pos, wait_queue_t,
14606 +												   task_list)->private;
14607 +
14608 +		/* Compare task prios, find high prio task. */
14609 +		//if (queued != skip && edf_higher_prio(queued, found))
14610 +		if (queued != skip && litmus->compare(queued, found))
14611 +			found = queued;
14612 +	}
14613 +	return found;
14614 +}
14615 +
14616 +static inline struct kfmlp_queue* kfmlp_find_shortest(struct kfmlp_semaphore* sem,
14617 +													  struct kfmlp_queue* search_start)
14618 +{
14619 +	// we start our search at search_start instead of at the beginning of the
14620 +	// queue list to load-balance across all resources.
14621 +	struct kfmlp_queue* step = search_start;
14622 +	struct kfmlp_queue* shortest = sem->shortest_queue;
14623 +
14624 +	do
14625 +	{
14626 +		step = (step+1 != &sem->queues[sem->num_resources]) ?
14627 +		step+1 : &sem->queues[0];
14628 +
14629 +		if(step->count < shortest->count)
14630 +		{
14631 +			shortest = step;
14632 +			if(step->count == 0)
14633 +				break; /* can't get any shorter */
14634 +		}
14635 +
14636 +	}while(step != search_start);
14637 +
14638 +	return(shortest);
14639 +}
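/* --- Editorial sketch: not part of the original patch. ---------------------
 * kfmlp_find_shortest() above is a circular minimum search: it starts just
 * past search_start so consecutive searches spread load across replicas, and
 * it bails out early when it finds an empty queue.  A simplified standalone
 * version over a plain array of queue lengths, seeded with the start index
 * as the incumbent (hypothetical names):
 */
static int example_find_shortest_idx(const int *count, int n, int start)
{
	int best = start;
	int i = start;

	do {
		i = (i + 1 == n) ? 0 : i + 1;  /* step with wrap-around */
		if (count[i] < count[best]) {
			best = i;
			if (count[i] == 0)
				break;         /* can't get any shorter */
		}
	} while (i != start);

	return best;
}
/* ------------------------------------------------------------------------- */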
14640 +
14641 +
14642 +static struct task_struct* kfmlp_select_hp_steal(struct kfmlp_semaphore* sem,
14643 +												 wait_queue_t** to_steal,
14644 +												 struct kfmlp_queue** to_steal_from)
14645 +{
14646 +	/* must hold sem->lock */
14647 +
14648 +	int i;
14649 +
14650 +	*to_steal = NULL;
14651 +	*to_steal_from = NULL;
14652 +
14653 +	for(i = 0; i < sem->num_resources; ++i)
14654 +	{
14655 +		if( (sem->queues[i].count > 1) &&
14656 +		   ((*to_steal_from == NULL) ||
14657 +			//(edf_higher_prio(sem->queues[i].hp_waiter, my_queue->hp_waiter))) )
14658 +			(litmus->compare(sem->queues[i].hp_waiter, (*to_steal_from)->hp_waiter))) )
14659 +		{
14660 +			*to_steal_from = &sem->queues[i];
14661 +		}
14662 +	}
14663 +
14664 +	if(*to_steal_from)
14665 +	{
14666 +		struct list_head *pos;
14667 +		struct task_struct *target = (*to_steal_from)->hp_waiter;
14668 +
14669 +		TRACE_CUR("want to steal hp_waiter (%s/%d) from queue %d\n",
14670 +				  target->comm,
14671 +				  target->pid,
14672 +				  kfmlp_get_idx(sem, *to_steal_from));
14673 +
14674 +		list_for_each(pos, &(*to_steal_from)->wait.task_list)
14675 +		{
14676 +			wait_queue_t *node = list_entry(pos, wait_queue_t, task_list);
14677 +			struct task_struct *queued = (struct task_struct*) node->private;
14678 +			/* Compare task prios, find high prio task. */
14679 +			if (queued == target)
14680 +			{
14681 +				*to_steal = node;
14682 +
14683 +				TRACE_CUR("steal: selected %s/%d from queue %d\n",
14684 +						  queued->comm, queued->pid,
14685 +						  kfmlp_get_idx(sem, *to_steal_from));
14686 +
14687 +				return queued;
14688 +			}
14689 +		}
14690 +
14691 +		TRACE_CUR("Could not find %s/%d in queue %d!!!  THIS IS A BUG!\n",
14692 +				  target->comm,
14693 +				  target->pid,
14694 +				  kfmlp_get_idx(sem, *to_steal_from));
14695 +	}
14696 +
14697 +	return NULL;
14698 +}
14699 +
14700 +static void kfmlp_steal_node(struct kfmlp_semaphore *sem,
14701 +							 struct kfmlp_queue *dst,
14702 +							 wait_queue_t *wait,
14703 +							 struct kfmlp_queue *src)
14704 +{
14705 +	struct task_struct* t = (struct task_struct*) wait->private;
14706 +
14707 +	__remove_wait_queue(&src->wait, wait);
14708 +	--(src->count);
14709 +
14710 +	if(t == src->hp_waiter) {
14711 +		src->hp_waiter = kfmlp_find_hp_waiter(src, NULL);
14712 +
14713 +		TRACE_CUR("queue %d: %s/%d is new hp_waiter\n",
14714 +				  kfmlp_get_idx(sem, src),
14715 +				  (src->hp_waiter) ? src->hp_waiter->comm : "nil",
14716 +				  (src->hp_waiter) ? src->hp_waiter->pid : -1);
14717 +
14718 +		if(src->owner && tsk_rt(src->owner)->inh_task == t) {
14719 +			litmus->decrease_prio(src->owner, src->hp_waiter);
14720 +		}
14721 +	}
14722 +
14723 +	if(sem->shortest_queue->count > src->count) {
14724 +		sem->shortest_queue = src;
14725 +		TRACE_CUR("queue %d is the shortest\n", kfmlp_get_idx(sem, sem->shortest_queue));
14726 +	}
14727 +
14728 +#ifdef CONFIG_LITMUS_AFFINITY_LOCKING
14729 +	if(sem->aff_obs) {
14730 +		sem->aff_obs->ops->notify_dequeue(sem->aff_obs, src, t);
14731 +	}
14732 +#endif
14733 +
14734 +	init_waitqueue_entry(wait, t);
14735 +	__add_wait_queue_tail_exclusive(&dst->wait, wait);
14736 +	++(dst->count);
14737 +
14738 +	if(litmus->compare(t, dst->hp_waiter)) {
14739 +		dst->hp_waiter = t;
14740 +
14741 +		TRACE_CUR("queue %d: %s/%d is new hp_waiter\n",
14742 +				  kfmlp_get_idx(sem, dst),
14743 +				  t->comm, t->pid);
14744 +
14745 +		if(dst->owner && litmus->compare(t, dst->owner))
14746 +		{
14747 +			litmus->increase_prio(dst->owner, t);
14748 +		}
14749 +	}
14750 +
14751 +#ifdef CONFIG_LITMUS_AFFINITY_LOCKING
14752 +	if(sem->aff_obs) {
14753 +		sem->aff_obs->ops->notify_enqueue(sem->aff_obs, dst, t);
14754 +	}
14755 +#endif
14756 +}
14757 +
14758 +
14759 +int kfmlp_lock(struct litmus_lock* l)
14760 +{
14761 +	struct task_struct* t = current;
14762 +	struct kfmlp_semaphore *sem = kfmlp_from_lock(l);
14763 +	struct kfmlp_queue* my_queue = NULL;
14764 +	wait_queue_t wait;
14765 +	unsigned long flags;
14766 +
14767 +	if (!is_realtime(t))
14768 +		return -EPERM;
14769 +
14770 +	spin_lock_irqsave(&sem->lock, flags);
14771 +
14772 +#ifdef CONFIG_LITMUS_AFFINITY_LOCKING
14773 +	if(sem->aff_obs) {
14774 +		my_queue = sem->aff_obs->ops->advise_enqueue(sem->aff_obs, t);
14775 +	}
14776 +	if(!my_queue) {
14777 +		my_queue = sem->shortest_queue;
14778 +	}
14779 +#else
14780 +	my_queue = sem->shortest_queue;
14781 +#endif
14782 +
14783 +	if (my_queue->owner) {
14784 +		/* resource is not free => must suspend and wait */
14785 +		TRACE_CUR("queue %d: Resource is not free => must suspend and wait. (queue size = %d)\n",
14786 +				  kfmlp_get_idx(sem, my_queue),
14787 +				  my_queue->count);
14788 +
14789 +		init_waitqueue_entry(&wait, t);
14790 +
14791 +		/* FIXME: interruptible would be nice some day */
14792 +		set_task_state(t, TASK_UNINTERRUPTIBLE);
14793 +
14794 +		__add_wait_queue_tail_exclusive(&my_queue->wait, &wait);
14795 +
14796 +		TRACE_CUR("queue %d: hp_waiter is currently %s/%d\n",
14797 +				  kfmlp_get_idx(sem, my_queue),
14798 +				  (my_queue->hp_waiter) ? my_queue->hp_waiter->comm : "nil",
14799 +				  (my_queue->hp_waiter) ? my_queue->hp_waiter->pid : -1);
14800 +
14801 +		/* check if we need to activate priority inheritance */
14802 +		//if (edf_higher_prio(t, my_queue->hp_waiter))
14803 +		if (litmus->compare(t, my_queue->hp_waiter)) {
14804 +			my_queue->hp_waiter = t;
14805 +			TRACE_CUR("queue %d: %s/%d is new hp_waiter\n",
14806 +					  kfmlp_get_idx(sem, my_queue),
14807 +					  t->comm, t->pid);
14808 +
14809 +			//if (edf_higher_prio(t, my_queue->owner))
14810 +			if (litmus->compare(t, my_queue->owner)) {
14811 +				litmus->increase_prio(my_queue->owner, my_queue->hp_waiter);
14812 +			}
14813 +		}
14814 +
14815 +		++(my_queue->count);
14816 +
14817 +		if(my_queue == sem->shortest_queue) {
14818 +			sem->shortest_queue = kfmlp_find_shortest(sem, my_queue);
14819 +			TRACE_CUR("queue %d is the shortest\n",
14820 +					  kfmlp_get_idx(sem, sem->shortest_queue));
14821 +		}
14822 +
14823 +#ifdef CONFIG_LITMUS_AFFINITY_LOCKING
14824 +		if(sem->aff_obs) {
14825 +			sem->aff_obs->ops->notify_enqueue(sem->aff_obs, my_queue, t);
14826 +		}
14827 +#endif
14828 +
14829 +		/* release lock before sleeping */
14830 +		spin_unlock_irqrestore(&sem->lock, flags);
14831 +
14832 +		/* We depend on the FIFO order.  Thus, we don't need to recheck
14833 +		 * when we wake up; we are guaranteed to have the lock since
14834 +		 * there is only one wake up per release (or steal).
14835 +		 */
14836 +		schedule();
14837 +
14838 +
14839 +		if(my_queue->owner == t) {
14840 +			TRACE_CUR("queue %d: acquired through waiting\n",
14841 +					  kfmlp_get_idx(sem, my_queue));
14842 +		}
14843 +		else {
14844 +			/* This can happen if our wait entry was stolen and moved
14845 +			 * to another queue.  Record where we ended up. */
14846 +			my_queue = kfmlp_get_queue(sem, t);
14847 +
14848 +			BUG_ON(!my_queue);
14849 +			TRACE_CUR("queue %d: acquired through stealing\n",
14850 +					  kfmlp_get_idx(sem, my_queue));
14851 +		}
14852 +	}
14853 +	else {
14854 +		TRACE_CUR("queue %d: acquired immediately\n",
14855 +				  kfmlp_get_idx(sem, my_queue));
14856 +
14857 +		my_queue->owner = t;
14858 +
14859 +		++(my_queue->count);
14860 +
14861 +		if(my_queue == sem->shortest_queue) {
14862 +			sem->shortest_queue = kfmlp_find_shortest(sem, my_queue);
14863 +			TRACE_CUR("queue %d is the shortest\n",
14864 +					  kfmlp_get_idx(sem, sem->shortest_queue));
14865 +		}
14866 +
14867 +#ifdef CONFIG_LITMUS_AFFINITY_LOCKING
14868 +		if(sem->aff_obs) {
14869 +			sem->aff_obs->ops->notify_enqueue(sem->aff_obs, my_queue, t);
14870 +			sem->aff_obs->ops->notify_acquired(sem->aff_obs, my_queue, t);
14871 +		}
14872 +#endif
14873 +
14874 +		spin_unlock_irqrestore(&sem->lock, flags);
14875 +	}
14876 +
14877 +
14878 +#ifdef CONFIG_LITMUS_AFFINITY_LOCKING
14879 +	if(sem->aff_obs) {
14880 +		return sem->aff_obs->ops->replica_to_resource(sem->aff_obs, my_queue);
14881 +	}
14882 +#endif
14883 +	return kfmlp_get_idx(sem, my_queue);
14884 +}
14885 +
14886 +
14887 +int kfmlp_unlock(struct litmus_lock* l)
14888 +{
14889 +	struct task_struct *t = current, *next;
14890 +	struct kfmlp_semaphore *sem = kfmlp_from_lock(l);
14891 +	struct kfmlp_queue *my_queue, *to_steal_from;
14892 +	unsigned long flags;
14893 +	int err = 0;
14894 +
14895 +	my_queue = kfmlp_get_queue(sem, t);
14896 +
14897 +	if (!my_queue) {
14898 +		err = -EINVAL;
14899 +		goto out;
14900 +	}
14901 +
14902 +	spin_lock_irqsave(&sem->lock, flags);
14903 +
14904 +	TRACE_CUR("queue %d: unlocking\n", kfmlp_get_idx(sem, my_queue));
14905 +
14906 +	my_queue->owner = NULL;  // clear ownership
14907 +	--(my_queue->count);
14908 +
14909 +	if(my_queue->count < sem->shortest_queue->count)
14910 +	{
14911 +		sem->shortest_queue = my_queue;
14912 +		TRACE_CUR("queue %d is the shortest\n",
14913 +				  kfmlp_get_idx(sem, sem->shortest_queue));
14914 +	}
14915 +
14916 +#ifdef CONFIG_LITMUS_AFFINITY_LOCKING
14917 +	if(sem->aff_obs) {
14918 +		sem->aff_obs->ops->notify_dequeue(sem->aff_obs, my_queue, t);
14919 +		sem->aff_obs->ops->notify_freed(sem->aff_obs, my_queue, t);
14920 +	}
14921 +#endif
14922 +
14923 +	/* we lose the benefit of priority inheritance (if any) */
14924 +	if (tsk_rt(t)->inh_task)
14925 +		litmus->decrease_prio(t, NULL);
14926 +
14927 +
14928 +	/* check if there are jobs waiting for this resource */
14929 +RETRY:
14930 +	next = __waitqueue_remove_first(&my_queue->wait);
14931 +	if (next) {
14932 +		/* next becomes the resource holder */
14933 +		my_queue->owner = next;
14934 +
14935 +#ifdef CONFIG_LITMUS_AFFINITY_LOCKING
14936 +		if(sem->aff_obs) {
14937 +			sem->aff_obs->ops->notify_acquired(sem->aff_obs, my_queue, next);
14938 +		}
14939 +#endif
14940 +
14941 +		TRACE_CUR("queue %d: lock ownership passed to %s/%d\n",
14942 +				  kfmlp_get_idx(sem, my_queue), next->comm, next->pid);
14943 +
14944 +		/* determine new hp_waiter if necessary */
14945 +		if (next == my_queue->hp_waiter) {
14946 +			TRACE_TASK(next, "was highest-prio waiter\n");
14947 +			my_queue->hp_waiter = kfmlp_find_hp_waiter(my_queue, next);
14948 +			if (my_queue->hp_waiter)
14949 +				TRACE_TASK(my_queue->hp_waiter, "queue %d: is new highest-prio waiter\n", kfmlp_get_idx(sem, my_queue));
14950 +			else
14951 +				TRACE("queue %d: no further waiters\n", kfmlp_get_idx(sem, my_queue));
14952 +		} else {
14953 +			/* Well, if next is not the highest-priority waiter,
14954 +			 * then it ought to inherit the highest-priority
14955 +			 * waiter's priority. */
14956 +			litmus->increase_prio(next, my_queue->hp_waiter);
14957 +		}
14958 +
14959 +		/* wake up next */
14960 +		wake_up_process(next);
14961 +	}
14962 +	else {
14963 +		// TODO: Move this stealing logic before we attempt to release
14964 +		// our resource.  (This simplifies the code and removes the ugly goto RETRY.)
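+		// Our queue has no waiters, but other replicas may be backlogged.
+		// Rather than let this replica idle, pull the highest-priority
+		// waiter from a queue that still has waiters beyond its owner and
+		// grant it this replica instead.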
14965 +		wait_queue_t *wait;
14966 +
14967 +		TRACE_CUR("queue %d: looking to steal someone...\n",
14968 +				  kfmlp_get_idx(sem, my_queue));
14969 +
14970 +#ifdef CONFIG_LITMUS_AFFINITY_LOCKING
14971 +		next = (sem->aff_obs) ?
14972 +			sem->aff_obs->ops->advise_steal(sem->aff_obs, &wait, &to_steal_from) :
14973 +			kfmlp_select_hp_steal(sem, &wait, &to_steal_from);
14974 +#else
14975 +		next = kfmlp_select_hp_steal(sem, &wait, &to_steal_from);
14976 +#endif
14977 +
14978 +		if(next) {
14979 +			TRACE_CUR("queue %d: stealing %s/%d from queue %d\n",
14980 +					  kfmlp_get_idx(sem, my_queue),
14981 +					  next->comm, next->pid,
14982 +					  kfmlp_get_idx(sem, to_steal_from));
14983 +
14984 +			kfmlp_steal_node(sem, my_queue, wait, to_steal_from);
14985 +
14986 +			goto RETRY;  // will succeed this time.
14987 +		}
14988 +		else {
14989 +			TRACE_CUR("queue %d: no one to steal.\n",
14990 +					  kfmlp_get_idx(sem, my_queue));
14991 +		}
14992 +	}
14993 +
14994 +	spin_unlock_irqrestore(&sem->lock, flags);
14995 +
14996 +out:
14997 +	return err;
14998 +}
14999 +
15000 +int kfmlp_close(struct litmus_lock* l)
15001 +{
15002 +	struct task_struct *t = current;
15003 +	struct kfmlp_semaphore *sem = kfmlp_from_lock(l);
15004 +	struct kfmlp_queue *my_queue;
15005 +	unsigned long flags;
15006 +
15007 +	int owner;
15008 +
15009 +	spin_lock_irqsave(&sem->lock, flags);
15010 +
15011 +	my_queue = kfmlp_get_queue(sem, t);
15012 +	owner = (my_queue) ? (my_queue->owner == t) : 0;
15013 +
15014 +	spin_unlock_irqrestore(&sem->lock, flags);
15015 +
15016 +	if (owner)
15017 +		kfmlp_unlock(l);
15018 +
15019 +	return 0;
15020 +}
15021 +
15022 +void kfmlp_free(struct litmus_lock* l)
15023 +{
15024 +	struct kfmlp_semaphore *sem = kfmlp_from_lock(l);
15025 +	kfree(sem->queues);
15026 +	kfree(sem);
15027 +}
15028 +
15029 +
15030 +
15031 +struct litmus_lock* kfmlp_new(struct litmus_lock_ops* ops, void* __user args)
15032 +{
15033 +	struct kfmlp_semaphore* sem;
15034 +	int num_resources = 0;
15035 +	int i;
15036 +
15037 +	if(!access_ok(VERIFY_READ, args, sizeof(num_resources)))
15038 +	{
15039 +		return(NULL);
15040 +	}
15041 +	if(__copy_from_user(&num_resources, args, sizeof(num_resources)))
15042 +	{
15043 +		return(NULL);
15044 +	}
15045 +	if(num_resources < 1)
15046 +	{
15047 +		return(NULL);
15048 +	}
15049 +
15050 +	sem = kmalloc(sizeof(*sem), GFP_KERNEL);
15051 +	if(!sem)
15052 +	{
15053 +		return(NULL);
15054 +	}
15055 +
15056 +	sem->queues = kmalloc(sizeof(struct kfmlp_queue)*num_resources, GFP_KERNEL);
15057 +	if(!sem->queues)
15058 +	{
15059 +		kfree(sem);
15060 +		return(NULL);
15061 +	}
15062 +
15063 +	sem->litmus_lock.ops = ops;
15064 +	spin_lock_init(&sem->lock);
15065 +	sem->num_resources = num_resources;
15066 +
15067 +	for(i = 0; i < num_resources; ++i)
15068 +	{
15069 +		sem->queues[i].owner = NULL;
15070 +		sem->queues[i].hp_waiter = NULL;
15071 +		init_waitqueue_head(&sem->queues[i].wait);
15072 +		sem->queues[i].count = 0;
15073 +	}
15074 +
15075 +	sem->shortest_queue = &sem->queues[0];
15076 +
15077 +#ifdef CONFIG_LITMUS_AFFINITY_LOCKING
15078 +	sem->aff_obs = NULL;
15079 +#endif
15080 +
15081 +	return &sem->litmus_lock;
15082 +}
15083 +
15084 +
15085 +
15086 +
15087 +#if defined(CONFIG_LITMUS_AFFINITY_LOCKING) && defined(CONFIG_LITMUS_NVIDIA)
15088 +
15089 +static inline int __replica_to_gpu(struct kfmlp_affinity* aff, int replica)
15090 +{
15091 +	int gpu = replica % aff->nr_rsrc;
15092 +	return gpu;
15093 +}
15094 +
15095 +static inline int replica_to_gpu(struct kfmlp_affinity* aff, int replica)
15096 +{
15097 +	int gpu = __replica_to_gpu(aff, replica) + aff->offset;
15098 +	return gpu;
15099 +}
15100 +
15101 +static inline int gpu_to_base_replica(struct kfmlp_affinity* aff, int gpu)
15102 +{
15103 +	int replica = gpu - aff->offset;
15104 +	return replica;
15105 +}
15106 +
15107 +
15108 +int kfmlp_aff_obs_close(struct affinity_observer* obs)
15109 +{
15110 +	return 0;
15111 +}
15112 +
15113 +void kfmlp_aff_obs_free(struct affinity_observer* obs)
15114 +{
15115 +	struct kfmlp_affinity *kfmlp_aff = kfmlp_aff_obs_from_aff_obs(obs);
15116 +	kfree(kfmlp_aff->nr_cur_users_on_rsrc);
15117 +	kfree(kfmlp_aff->q_info);
15118 +	kfree(kfmlp_aff);
15119 +}
15120 +
15121 +static struct affinity_observer* kfmlp_aff_obs_new(struct affinity_observer_ops* ops,
15122 +												   struct kfmlp_affinity_ops* kfmlp_ops,
15123 +												   void* __user args)
15124 +{
15125 +	struct kfmlp_affinity* kfmlp_aff;
15126 +	struct gpu_affinity_observer_args aff_args;
15127 +	struct kfmlp_semaphore* sem;
15128 +	int i;
15129 +	unsigned long flags;
15130 +
15131 +	if(!access_ok(VERIFY_READ, args, sizeof(aff_args))) {
15132 +		return(NULL);
15133 +	}
15134 +	if(__copy_from_user(&aff_args, args, sizeof(aff_args))) {
15135 +		return(NULL);
15136 +	}
15137 +
15138 +	sem = (struct kfmlp_semaphore*) get_lock_from_od(aff_args.obs.lock_od);
15139 +
15140 +	if(sem->litmus_lock.type != KFMLP_SEM) {
15141 +		TRACE_CUR("Lock type not supported.  Type = %d\n", sem->litmus_lock.type);
15142 +		return(NULL);
15143 +	}
15144 +
15145 +	if((aff_args.nr_simult_users <= 0) ||
15146 +	   (sem->num_resources%aff_args.nr_simult_users != 0)) {
15147 +		TRACE_CUR("Lock %d does not support #replicas (%d) for #simult_users "
15148 +				  "(%d) per replica.  #replicas should be evenly divisible "
15149 +				  "by #simult_users.\n",
15150 +				  sem->litmus_lock.ident,
15151 +				  sem->num_resources,
15152 +				  aff_args.nr_simult_users);
15153 +		return(NULL);
15154 +	}
15155 +
15156 +	if(aff_args.nr_simult_users > NV_MAX_SIMULT_USERS) {
15157 +		TRACE_CUR("System does not support #simult_users > %d. %d requested.\n",
15158 +				  NV_MAX_SIMULT_USERS, aff_args.nr_simult_users);
15159 +//		return(NULL);
15160 +	}
15161 +
15162 +	kfmlp_aff = kmalloc(sizeof(*kfmlp_aff), GFP_KERNEL);
15163 +	if(!kfmlp_aff) {
15164 +		return(NULL);
15165 +	}
15166 +
15167 +	kfmlp_aff->q_info = kmalloc(sizeof(struct kfmlp_queue_info)*sem->num_resources, GFP_KERNEL);
15168 +	if(!kfmlp_aff->q_info) {
15169 +		kfree(kfmlp_aff);
15170 +		return(NULL);
15171 +	}
15172 +
15173 +	kfmlp_aff->nr_cur_users_on_rsrc = kmalloc(sizeof(int)*(sem->num_resources / aff_args.nr_simult_users), GFP_KERNEL);
15174 +	if(!kfmlp_aff->nr_cur_users_on_rsrc) {
15175 +		kfree(kfmlp_aff->q_info);
15176 +		kfree(kfmlp_aff);
15177 +		return(NULL);
15178 +	}
15179 +
15180 +	affinity_observer_new(&kfmlp_aff->obs, ops, &aff_args.obs);
15181 +
15182 +	kfmlp_aff->ops = kfmlp_ops;
15183 +	kfmlp_aff->offset = aff_args.replica_to_gpu_offset;
15184 +	kfmlp_aff->nr_simult = aff_args.nr_simult_users;
15185 +	kfmlp_aff->nr_rsrc = sem->num_resources / kfmlp_aff->nr_simult;
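+	// Replicas map onto GPUs round-robin: replica r corresponds to GPU
+	// (r % nr_rsrc) + offset, so each of the nr_rsrc GPUs is backed by
+	// nr_simult replicas that may be held concurrently.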
15186 +
15187 +	memset(kfmlp_aff->nr_cur_users_on_rsrc, 0, sizeof(int)*(kfmlp_aff->nr_rsrc));
15188 +
15189 +	for(i = 0; i < sem->num_resources; ++i) {
15190 +		kfmlp_aff->q_info[i].q = &sem->queues[i];
15191 +		kfmlp_aff->q_info[i].estimated_len = 0;
15192 +
15193 +		// multiple q_info's will point to the same resource (aka GPU) if
15194 +		// aff_args.nr_simult_users > 1
15195 +		kfmlp_aff->q_info[i].nr_cur_users = &kfmlp_aff->nr_cur_users_on_rsrc[__replica_to_gpu(kfmlp_aff,i)];
15196 +	}
15197 +
15198 +	// attach observer to the lock
15199 +	spin_lock_irqsave(&sem->lock, flags);
15200 +	sem->aff_obs = kfmlp_aff;
15201 +	spin_unlock_irqrestore(&sem->lock, flags);
15202 +
15203 +	return &kfmlp_aff->obs;
15204 +}
15205 +
15206 +
15207 +
15208 +
15209 +static int gpu_replica_to_resource(struct kfmlp_affinity* aff,
15210 +								   struct kfmlp_queue* fq) {
15211 +	struct kfmlp_semaphore *sem = kfmlp_from_lock(aff->obs.lock);
15212 +	return(replica_to_gpu(aff, kfmlp_get_idx(sem, fq)));
15213 +}
15214 +
15215 +
15216 +// Smart KFMLP Affinity
15217 +
15218 +//static inline struct kfmlp_queue_info* kfmlp_aff_find_shortest(struct kfmlp_affinity* aff)
15219 +//{
15220 +//	struct kfmlp_semaphore *sem = kfmlp_from_lock(aff->obs.lock);
15221 +//	struct kfmlp_queue_info *shortest = &aff->q_info[0];
15222 +//	int i;
15223 +//
15224 +//	for(i = 1; i < sem->num_resources; ++i) {
15225 +//		if(aff->q_info[i].estimated_len < shortest->estimated_len) {
15226 +//			shortest = &aff->q_info[i];
15227 +//		}
15228 +//	}
15229 +//
15230 +//	return(shortest);
15231 +//}
15232 +
15233 +struct kfmlp_queue* gpu_kfmlp_advise_enqueue(struct kfmlp_affinity* aff, struct task_struct* t)
15234 +{
15235 +	struct kfmlp_semaphore *sem = kfmlp_from_lock(aff->obs.lock);
15236 +	lt_t min_len;
15237 +	int min_nr_users;
15238 +	struct kfmlp_queue_info *shortest;
15239 +	struct kfmlp_queue *to_enqueue;
15240 +	int i;
15241 +	int affinity_gpu;
15242 +
15243 +	// Simply pick the shortest queue if we have no affinity, or if we
15244 +	// have affinity with the shortest.
15245 +	if(unlikely(tsk_rt(t)->last_gpu < 0)) {
15246 +		affinity_gpu = aff->offset;  // first gpu
15247 +		TRACE_CUR("no affinity\n");
15248 +	}
15249 +	else {
15250 +		affinity_gpu = tsk_rt(t)->last_gpu;
15251 +	}
15252 +
15253 +	// All things being equal, start with the queue with which we have
15254 +	// affinity.  This helps us maintain affinity even when we don't have
15255 +	// an estimate for local-affinity execution time (i.e., 2nd time on GPU).
15256 +	shortest = &aff->q_info[gpu_to_base_replica(aff, affinity_gpu)];
15257 +
15258 +//	if(shortest == aff->shortest_queue) {
15259 +//		TRACE_CUR("special case: have affinity with shortest queue\n");
15260 +//		goto out;
15261 +//	}
15262 +
15263 +	min_len = shortest->estimated_len + get_gpu_estimate(t, MIG_LOCAL);
15264 +	min_nr_users = *(shortest->nr_cur_users);
15265 +
15266 +	TRACE_CUR("cs is %llu on queue %d: est len = %llu\n",
15267 +			  get_gpu_estimate(t, MIG_LOCAL),
15268 +			  kfmlp_get_idx(sem, shortest->q),
15269 +			  min_len);
15270 +
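+	// Score every other replica as its estimated queue length plus this
+	// task's estimated critical-section length for the migration distance
+	// from its last-used GPU.  The smallest score wins; ties are broken by
+	// the fewest current holders on the underlying GPU.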
15271 +	for(i = 0; i < sem->num_resources; ++i) {
15272 +		if(&aff->q_info[i] != shortest) {
15273 +
15274 +			lt_t est_len =
15275 +				aff->q_info[i].estimated_len +
15276 +				get_gpu_estimate(t, gpu_migration_distance(tsk_rt(t)->last_gpu, replica_to_gpu(aff, i)));
15277 +
15278 +			// The candidate queue is shorter, or the lengths are equal and the
15279 +			// candidate's GPU has fewer current holders.
15280 +			//
15281 +			// Tie-breaking on the number of simultaneous users only kicks in
15282 +			// when there is more than one empty queue.
15283 +			if((est_len < min_len) ||
15284 +			   ((est_len == min_len) && (*(aff->q_info[i].nr_cur_users) < min_nr_users))) {
15285 +				shortest = &aff->q_info[i];
15286 +				min_len = est_len;
15287 +				min_nr_users = *(aff->q_info[i].nr_cur_users);
15288 +			}
15289 +
15290 +			TRACE_CUR("cs is %llu on queue %d: est len = %llu\n",
15291 +					  get_gpu_estimate(t, gpu_migration_distance(tsk_rt(t)->last_gpu, replica_to_gpu(aff, i))),
15292 +					  kfmlp_get_idx(sem, aff->q_info[i].q),
15293 +					  est_len);
15294 +		}
15295 +	}
15296 +
15297 +	to_enqueue = shortest->q;
15298 +	TRACE_CUR("enqueue on fq %d (non-aff wanted fq %d)\n",
15299 +			  kfmlp_get_idx(sem, to_enqueue),
15300 +			  kfmlp_get_idx(sem, sem->shortest_queue));
15301 +
15302 +	return to_enqueue;
15303 +}
15304 +
15305 +struct task_struct* gpu_kfmlp_advise_steal(struct kfmlp_affinity* aff, wait_queue_t** to_steal, struct kfmlp_queue** to_steal_from)
15306 +{
15307 +	struct kfmlp_semaphore *sem = kfmlp_from_lock(aff->obs.lock);
15308 +
15309 +	// For now, just steal the highest-priority waiter.
15310 +	// TODO: Implement affinity-aware stealing.
15311 +
15312 +	return kfmlp_select_hp_steal(sem, to_steal, to_steal_from);
15313 +}
15314 +
15315 +
15316 +void gpu_kfmlp_notify_enqueue(struct kfmlp_affinity* aff, struct kfmlp_queue* fq, struct task_struct* t)
15317 +{
15318 +	struct kfmlp_semaphore *sem = kfmlp_from_lock(aff->obs.lock);
15319 +	int replica = kfmlp_get_idx(sem, fq);
15320 +	int gpu = replica_to_gpu(aff, replica);
15321 +	struct kfmlp_queue_info *info = &aff->q_info[replica];
15322 +	lt_t est_time;
15323 +	lt_t est_len_before;
15324 +
15325 +	if(current == t) {
15326 +		tsk_rt(t)->suspend_gpu_tracker_on_block = 1;
15327 +	}
15328 +
15329 +	est_len_before = info->estimated_len;
15330 +	est_time = get_gpu_estimate(t, gpu_migration_distance(tsk_rt(t)->last_gpu, gpu));
15331 +	info->estimated_len += est_time;
15332 +
15333 +	TRACE_CUR("fq %d: q_len (%llu) + est_cs (%llu) = %llu\n",
15334 +			  kfmlp_get_idx(sem, info->q),
15335 +			  est_len_before, est_time,
15336 +			  info->estimated_len);
15337 +
15338 +//	if(aff->shortest_queue == info) {
15339 +//		// we may no longer be the shortest
15340 +//		aff->shortest_queue = kfmlp_aff_find_shortest(aff);
15341 +//
15342 +//		TRACE_CUR("shortest queue is fq %d (with %d in queue) has est len %llu\n",
15343 +//				  kfmlp_get_idx(sem, aff->shortest_queue->q),
15344 +//				  aff->shortest_queue->q->count,
15345 +//				  aff->shortest_queue->estimated_len);
15346 +//	}
15347 +}
15348 +
15349 +void gpu_kfmlp_notify_dequeue(struct kfmlp_affinity* aff, struct kfmlp_queue* fq, struct task_struct* t)
15350 +{
15351 +	struct kfmlp_semaphore *sem = kfmlp_from_lock(aff->obs.lock);
15352 +	int replica = kfmlp_get_idx(sem, fq);
15353 +	int gpu = replica_to_gpu(aff, replica);
15354 +	struct kfmlp_queue_info *info = &aff->q_info[replica];
15355 +	lt_t est_time = get_gpu_estimate(t, gpu_migration_distance(tsk_rt(t)->last_gpu, gpu));
15356 +
15357 +	if(est_time > info->estimated_len) {
15358 +		WARN_ON(1);
15359 +		info->estimated_len = 0;
15360 +	}
15361 +	else {
15362 +		info->estimated_len -= est_time;
15363 +	}
15364 +
15365 +	TRACE_CUR("fq %d est len is now %llu\n",
15366 +			  kfmlp_get_idx(sem, info->q),
15367 +			  info->estimated_len);
15368 +
15369 +	// check to see if we're the shortest queue now.
15370 +//	if((aff->shortest_queue != info) &&
15371 +//	   (aff->shortest_queue->estimated_len > info->estimated_len)) {
15372 +//
15373 +//		aff->shortest_queue = info;
15374 +//
15375 +//		TRACE_CUR("shortest queue is fq %d (with %d in queue) has est len %llu\n",
15376 +//				  kfmlp_get_idx(sem, info->q),
15377 +//				  info->q->count,
15378 +//				  info->estimated_len);
15379 +//	}
15380 +}
15381 +
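+/* notify_acquired() and notify_freed() bracket each GPU critical section:
+ * on acquisition we record the migration distance and start the per-task
+ * GPU-time tracker; on release we stop the tracker and feed the observed
+ * time into update_gpu_estimate() so that future estimates used by
+ * advise_enqueue() improve. */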
15382 +void gpu_kfmlp_notify_acquired(struct kfmlp_affinity* aff, struct kfmlp_queue* fq, struct task_struct* t)
15383 +{
15384 +	struct kfmlp_semaphore *sem = kfmlp_from_lock(aff->obs.lock);
15385 +	int replica = kfmlp_get_idx(sem, fq);
15386 +	int gpu = replica_to_gpu(aff, replica);
15387 +
15388 +	tsk_rt(t)->gpu_migration = gpu_migration_distance(tsk_rt(t)->last_gpu, gpu);  // record the type of migration
15389 +
15390 +	TRACE_CUR("%s/%d acquired gpu %d.  migration type = %d\n",
15391 +			  t->comm, t->pid, gpu, tsk_rt(t)->gpu_migration);
15392 +
15393 +	// count the number of resource holders
15394 +	++(*(aff->q_info[replica].nr_cur_users));
15395 +
15396 +	reg_nv_device(gpu, 1, t);  // register
15397 +
15398 +	tsk_rt(t)->suspend_gpu_tracker_on_block = 0;
15399 +	reset_gpu_tracker(t);
15400 +	start_gpu_tracker(t);
15401 +}
15402 +
15403 +void gpu_kfmlp_notify_freed(struct kfmlp_affinity* aff, struct kfmlp_queue* fq, struct task_struct* t)
15404 +{
15405 +	struct kfmlp_semaphore *sem = kfmlp_from_lock(aff->obs.lock);
15406 +	int replica = kfmlp_get_idx(sem, fq);
15407 +	int gpu = replica_to_gpu(aff, replica);
15408 +	lt_t est_time;
15409 +
15410 +	stop_gpu_tracker(t);  // stop the tracker before we do anything else.
15411 +
15412 +	est_time = get_gpu_estimate(t, gpu_migration_distance(tsk_rt(t)->last_gpu, gpu));
15413 +
15414 +	tsk_rt(t)->last_gpu = gpu;
15415 +
15416 +	// count the number of resource holders
15417 +	--(*(aff->q_info[replica].nr_cur_users));
15418 +
15419 +	reg_nv_device(gpu, 0, t);	// unregister
15420 +
15421 +	// update estimates
15422 +	update_gpu_estimate(t, get_gpu_time(t));
15423 +
15424 +	TRACE_CUR("%s/%d freed gpu %d.  actual time was %llu.  estimated was %llu.  diff is %d\n",
15425 +			  t->comm, t->pid, gpu,
15426 +			  get_gpu_time(t),
15427 +			  est_time,
15428 +			  (long long)get_gpu_time(t) - (long long)est_time);
15429 +}
15430 +
15431 +struct kfmlp_affinity_ops gpu_kfmlp_affinity =
15432 +{
15433 +	.advise_enqueue = gpu_kfmlp_advise_enqueue,
15434 +	.advise_steal = gpu_kfmlp_advise_steal,
15435 +	.notify_enqueue = gpu_kfmlp_notify_enqueue,
15436 +	.notify_dequeue = gpu_kfmlp_notify_dequeue,
15437 +	.notify_acquired = gpu_kfmlp_notify_acquired,
15438 +	.notify_freed = gpu_kfmlp_notify_freed,
15439 +	.replica_to_resource = gpu_replica_to_resource,
15440 +};
15441 +
15442 +struct affinity_observer* kfmlp_gpu_aff_obs_new(struct affinity_observer_ops* ops,
15443 +											void* __user args)
15444 +{
15445 +	return kfmlp_aff_obs_new(ops, &gpu_kfmlp_affinity, args);
15446 +}
15447 +
15448 +
15449 +
15450 +
15451 +
15452 +
15453 +
15454 +
15455 +// Simple KFMLP Affinity (standard KFMLP with auto-gpu registration)
15456 +
15457 +struct kfmlp_queue* simple_gpu_kfmlp_advise_enqueue(struct kfmlp_affinity* aff, struct task_struct* t)
15458 +{
15459 +	struct kfmlp_semaphore *sem = kfmlp_from_lock(aff->obs.lock);
15460 +	int min_count;
15461 +	int min_nr_users;
15462 +	struct kfmlp_queue_info *shortest;
15463 +	struct kfmlp_queue *to_enqueue;
15464 +	int i;
15465 +
15466 +//	TRACE_CUR("Simple GPU KFMLP advise_enqueue invoked\n");
15467 +
15468 +	shortest = &aff->q_info[0];
15469 +	min_count = shortest->q->count;
15470 +	min_nr_users = *(shortest->nr_cur_users);
15471 +
15472 +	TRACE_CUR("queue %d: waiters = %d, total holders = %d\n",
15473 +			  kfmlp_get_idx(sem, shortest->q),
15474 +			  shortest->q->count,
15475 +			  min_nr_users);
15476 +
15477 +	for(i = 1; i < sem->num_resources; ++i) {
15478 +		int len = aff->q_info[i].q->count;
15479 +
15480 +		// The candidate queue is shorter, or the lengths are equal and the
15481 +		// candidate's GPU has fewer current holders.
15482 +		//
15483 +		// Tie-breaking on the number of simultaneous users only kicks in
15484 +		// when there is more than one empty queue.
15485 +		if((len < min_count) ||
15486 +		   ((len == min_count) && (*(aff->q_info[i].nr_cur_users) < min_nr_users))) {
15487 +			shortest = &aff->q_info[i];
15488 +			min_count = shortest->q->count;
15489 +			min_nr_users = *(aff->q_info[i].nr_cur_users);
15490 +		}
15491 +
15492 +		TRACE_CUR("queue %d: waiters = %d, total holders = %d\n",
15493 +				  kfmlp_get_idx(sem, aff->q_info[i].q),
15494 +				  aff->q_info[i].q->count,
15495 +				  *(aff->q_info[i].nr_cur_users));
15496 +	}
15497 +
15498 +	to_enqueue = shortest->q;
15499 +	TRACE_CUR("enqueue on fq %d (non-aff wanted fq %d)\n",
15500 +			  kfmlp_get_idx(sem, to_enqueue),
15501 +			  kfmlp_get_idx(sem, sem->shortest_queue));
15502 +
15503 +	return to_enqueue;
15504 +}
15505 +
15506 +struct task_struct* simple_gpu_kfmlp_advise_steal(struct kfmlp_affinity* aff, wait_queue_t** to_steal, struct kfmlp_queue** to_steal_from)
15507 +{
15508 +	struct kfmlp_semaphore *sem = kfmlp_from_lock(aff->obs.lock);
15509 +//	TRACE_CUR("Simple GPU KFMLP advise_steal invoked\n");
15510 +	return kfmlp_select_hp_steal(sem, to_steal, to_steal_from);
15511 +}
15512 +
15513 +void simple_gpu_kfmlp_notify_enqueue(struct kfmlp_affinity* aff, struct kfmlp_queue* fq, struct task_struct* t)
15514 +{
15515 +//	TRACE_CUR("Simple GPU KFMLP notify_enqueue invoked\n");
15516 +}
15517 +
15518 +void simple_gpu_kfmlp_notify_dequeue(struct kfmlp_affinity* aff, struct kfmlp_queue* fq, struct task_struct* t)
15519 +{
15520 +//	TRACE_CUR("Simple GPU KFMLP notify_dequeue invoked\n");
15521 +}
15522 +
15523 +void simple_gpu_kfmlp_notify_acquired(struct kfmlp_affinity* aff, struct kfmlp_queue* fq, struct task_struct* t)
15524 +{
15525 +	struct kfmlp_semaphore *sem = kfmlp_from_lock(aff->obs.lock);
15526 +	int replica = kfmlp_get_idx(sem, fq);
15527 +	int gpu = replica_to_gpu(aff, replica);
15528 +
15529 +//	TRACE_CUR("Simple GPU KFMLP notify_acquired invoked\n");
15530 +
15531 +	// count the number of resource holders
15532 +	++(*(aff->q_info[replica].nr_cur_users));
15533 +
15534 +	reg_nv_device(gpu, 1, t);  // register
15535 +}
15536 +
15537 +void simple_gpu_kfmlp_notify_freed(struct kfmlp_affinity* aff, struct kfmlp_queue* fq, struct task_struct* t)
15538 +{
15539 +	struct kfmlp_semaphore *sem = kfmlp_from_lock(aff->obs.lock);
15540 +	int replica = kfmlp_get_idx(sem, fq);
15541 +	int gpu = replica_to_gpu(aff, replica);
15542 +
15543 +//	TRACE_CUR("Simple GPU KFMLP notify_freed invoked\n");
15544 +	// count the number of resource holders
15545 +	--(*(aff->q_info[replica].nr_cur_users));
15546 +
15547 +	reg_nv_device(gpu, 0, t);	// unregister
15548 +}
15549 +
15550 +struct kfmlp_affinity_ops simple_gpu_kfmlp_affinity =
15551 +{
15552 +	.advise_enqueue = simple_gpu_kfmlp_advise_enqueue,
15553 +	.advise_steal = simple_gpu_kfmlp_advise_steal,
15554 +	.notify_enqueue = simple_gpu_kfmlp_notify_enqueue,
15555 +	.notify_dequeue = simple_gpu_kfmlp_notify_dequeue,
15556 +	.notify_acquired = simple_gpu_kfmlp_notify_acquired,
15557 +	.notify_freed = simple_gpu_kfmlp_notify_freed,
15558 +	.replica_to_resource = gpu_replica_to_resource,
15559 +};
15560 +
15561 +struct affinity_observer* kfmlp_simple_gpu_aff_obs_new(struct affinity_observer_ops* ops,
15562 +												void* __user args)
15563 +{
15564 +	return kfmlp_aff_obs_new(ops, &simple_gpu_kfmlp_affinity, args);
15565 +}
15566 +
15567 +#endif
15568 +
15569 diff --git a/litmus/litmus.c b/litmus/litmus.c
15570 new file mode 100644
15571 index 0000000..d1f836c
15572 --- /dev/null
15573 +++ b/litmus/litmus.c
15574 @@ -0,0 +1,684 @@
15575 +/*
15576 + * litmus.c -- Implementation of the LITMUS syscalls,
15577 + *             the LITMUS initialization code,
15578 + *             and the procfs interface.
15579 + */
15580 +#include <asm/uaccess.h>
15581 +#include <linux/uaccess.h>
15582 +#include <linux/sysrq.h>
15583 +#include <linux/sched.h>
15584 +#include <linux/module.h>
15585 +#include <linux/slab.h>
15586 +
15587 +#include <litmus/litmus.h>
15588 +#include <litmus/bheap.h>
15589 +#include <litmus/trace.h>
15590 +#include <litmus/rt_domain.h>
15591 +#include <litmus/litmus_proc.h>
15592 +#include <litmus/sched_trace.h>
15593 +
15594 +#ifdef CONFIG_SCHED_CPU_AFFINITY
15595 +#include <litmus/affinity.h>
15596 +#endif
15597 +
15598 +#ifdef CONFIG_LITMUS_NVIDIA
15599 +#include <litmus/nvidia_info.h>
15600 +#endif
15601 +
15602 +/* Number of RT tasks that exist in the system */
15603 +atomic_t rt_task_count 		= ATOMIC_INIT(0);
15604 +static DEFINE_RAW_SPINLOCK(task_transition_lock);
15605 +/* synchronize plugin switching */
15606 +atomic_t cannot_use_plugin	= ATOMIC_INIT(0);
15607 +
15608 +/* Give log messages sequential IDs. */
15609 +atomic_t __log_seq_no = ATOMIC_INIT(0);
15610 +
15611 +#ifdef CONFIG_RELEASE_MASTER
15612 +/* current master CPU for handling timer IRQs */
15613 +atomic_t release_master_cpu = ATOMIC_INIT(NO_CPU);
15614 +#endif
15615 +
15616 +static struct kmem_cache * bheap_node_cache;
15617 +extern struct kmem_cache * release_heap_cache;
15618 +
15619 +struct bheap_node* bheap_node_alloc(int gfp_flags)
15620 +{
15621 +	return kmem_cache_alloc(bheap_node_cache, gfp_flags);
15622 +}
15623 +
15624 +void bheap_node_free(struct bheap_node* hn)
15625 +{
15626 +	kmem_cache_free(bheap_node_cache, hn);
15627 +}
15628 +
15629 +struct release_heap* release_heap_alloc(int gfp_flags);
15630 +void release_heap_free(struct release_heap* rh);
15631 +
15632 +#ifdef CONFIG_LITMUS_NVIDIA
15633 +/*
15634 + * sys_register_nv_device
15635 + * @nv_device_id: The Nvidia device id that the task wants to register
15636 + * @reg_action: set to '1' to register the specified device; zero to unregister it.
15637 + * Syscall to register the task's designated Nvidia device in the NV_DEVICE_REG array
15638 + * Returns EFAULT  if nv_device_id is out of range.
15639 + *	   0       if success
15640 + */
15641 +asmlinkage long sys_register_nv_device(int nv_device_id, int reg_action)
15642 +{
15643 +	/* register the device to caller (aka 'current') */
15644 +	return(reg_nv_device(nv_device_id, reg_action, current));
15645 +}
15646 +#else
15647 +asmlinkage long sys_register_nv_device(int nv_device_id, int reg_action)
15648 +{
15649 +	return(-EINVAL);
15650 +}
15651 +#endif
15652 +
15653 +
15654 +/*
15655 + * sys_set_rt_task_param
15656 + * @pid: Pid of the task whose scheduling parameters must be changed
15657 + * @param: New real-time extension parameters such as the execution cost and
15658 + *         period
15659 + * Syscall for manipulating a task's rt extension params
15660 + * Returns EFAULT  if param is NULL or cannot be copied.
15661 + *         ESRCH   if pid does not correspond
15662 + *	           to a valid task.
15663 + *	   EINVAL  if either period or execution cost is <=0
15664 + *	   EBUSY   if pid refers to a task that is already real-time
15665 + *	   0       on success
15666 + *
15667 + * Only non-real-time tasks may be configured with this system call
15668 + * to avoid races with the scheduler. In practice, this means that a
15669 + * task's parameters must be set _before_ calling sys_prepare_rt_task()
15670 + *
15671 + * find_task_by_vpid() assumes that we are in the same namespace as the
15672 + * target.
15673 + */
15674 +asmlinkage long sys_set_rt_task_param(pid_t pid, struct rt_task __user * param)
15675 +{
15676 +	struct rt_task tp;
15677 +	struct task_struct *target;
15678 +	int retval = -EINVAL;
15679 +
15680 +	printk("Setting up rt task parameters for process %d.\n", pid);
15681 +
15682 +	if (pid < 0 || param == 0) {
15683 +		goto out;
15684 +	}
15685 +	if (copy_from_user(&tp, param, sizeof(tp))) {
15686 +		retval = -EFAULT;
15687 +		goto out;
15688 +	}
15689 +
15690 +	/* Task search and manipulation must be protected */
15691 +	read_lock_irq(&tasklist_lock);
15692 +	if (!(target = find_task_by_vpid(pid))) {
15693 +		retval = -ESRCH;
15694 +		goto out_unlock;
15695 +	}
15696 +
15697 +	if (is_realtime(target)) {
15698 +		/* The task is already a real-time task.
15699 +		 * We cannot allow parameter changes at this point.
15700 +		 */
15701 +		retval = -EBUSY;
15702 +		goto out_unlock;
15703 +	}
15704 +
15705 +	if (tp.exec_cost <= 0)
15706 +		goto out_unlock;
15707 +	if (tp.period <= 0)
15708 +		goto out_unlock;
15709 +	if (!cpu_online(tp.cpu))
15710 +		goto out_unlock;
15711 +	if (tp.period < tp.exec_cost)
15712 +	{
15713 +		printk(KERN_INFO "litmus: real-time task %d rejected "
15714 +		       "because wcet > period\n", pid);
15715 +		goto out_unlock;
15716 +	}
15717 +	if (	tp.cls != RT_CLASS_HARD &&
15718 +		tp.cls != RT_CLASS_SOFT &&
15719 +		tp.cls != RT_CLASS_BEST_EFFORT)
15720 +	{
15721 +		printk(KERN_INFO "litmus: real-time task %d rejected "
15722 +				 "because its class is invalid\n", pid);
15723 +		goto out_unlock;
15724 +	}
15725 +	if (tp.budget_policy != NO_ENFORCEMENT &&
15726 +	    tp.budget_policy != QUANTUM_ENFORCEMENT &&
15727 +	    tp.budget_policy != PRECISE_ENFORCEMENT)
15728 +	{
15729 +		printk(KERN_INFO "litmus: real-time task %d rejected "
15730 +		       "because unsupported budget enforcement policy "
15731 +		       "specified (%d)\n",
15732 +		       pid, tp.budget_policy);
15733 +		goto out_unlock;
15734 +	}
15735 +
15736 +	target->rt_param.task_params = tp;
15737 +
15738 +	retval = 0;
15739 +      out_unlock:
15740 +	read_unlock_irq(&tasklist_lock);
15741 +      out:
15742 +	return retval;
15743 +}
15744 +
15745 +/*
15746 + * Getter of task's RT params
15747 + *   returns EINVAL if param is NULL or pid is negative
15748 + *   returns ESRCH  if pid does not correspond to a valid task
15749 + *   returns EFAULT if copying of parameters has failed.
15750 + *
15751 + *   find_task_by_vpid() assumes that we are in the same namespace as the
15752 + *   target.
15753 + */
15754 +asmlinkage long sys_get_rt_task_param(pid_t pid, struct rt_task __user * param)
15755 +{
15756 +	int retval = -EINVAL;
15757 +	struct task_struct *source;
15758 +	struct rt_task lp;
15759 +	if (param == 0 || pid < 0)
15760 +		goto out;
15761 +	read_lock(&tasklist_lock);
15762 +	if (!(source = find_task_by_vpid(pid))) {
15763 +		retval = -ESRCH;
15764 +		goto out_unlock;
15765 +	}
15766 +	lp = source->rt_param.task_params;
15767 +	read_unlock(&tasklist_lock);
15768 +	/* Do copying outside the lock */
15769 +	retval =
15770 +	    copy_to_user(param, &lp, sizeof(lp)) ? -EFAULT : 0;
15771 +	return retval;
15772 +      out_unlock:
15773 +	read_unlock(&tasklist_lock);
15774 +      out:
15775 +	return retval;
15776 +
15777 +}
15778 +
15779 +/*
15780 + *	This is the crucial function for the periodic task implementation.
15781 + *	It checks if a task is periodic, checks whether such a sleep
15782 + *	is permitted, and calls the plugin-specific sleep, which puts the
15783 + *	task into a wait array.
15784 + *	returns 0 on successful wakeup
15785 + *	returns EPERM if current conditions do not permit such sleep
15786 + *	returns EINVAL if current task is not able to go to sleep
15787 + */
15788 +asmlinkage long sys_complete_job(void)
15789 +{
15790 +	int retval = -EPERM;
15791 +	if (!is_realtime(current)) {
15792 +		retval = -EINVAL;
15793 +		goto out;
15794 +	}
15795 +	/* Task with negative or zero period cannot sleep */
15796 +	if (get_rt_period(current) <= 0) {
15797 +		retval = -EINVAL;
15798 +		goto out;
15799 +	}
15800 +	/* The plugin has to put the task into an
15801 +	 * appropriate queue and call schedule
15802 +	 */
15803 +	retval = litmus->complete_job();
15804 +      out:
15805 +	return retval;
15806 +}
15807 +
15808 +/*	This is an "improved" version of sys_complete_job that
15809 + *      addresses the problem of unintentionally missing a job after
15810 + *      an overrun.
15811 + *
15812 + *	returns 0 on successful wakeup
15813 + *	returns EPERM if current conditions do not permit such sleep
15814 + *	returns EINVAL if current task is not able to go to sleep
15815 + */
15816 +asmlinkage long sys_wait_for_job_release(unsigned int job)
15817 +{
15818 +	int retval = -EPERM;
15819 +	if (!is_realtime(current)) {
15820 +		retval = -EINVAL;
15821 +		goto out;
15822 +	}
15823 +
15824 +	/* Task with negative or zero period cannot sleep */
15825 +	if (get_rt_period(current) <= 0) {
15826 +		retval = -EINVAL;
15827 +		goto out;
15828 +	}
15829 +
15830 +	retval = 0;
15831 +
15832 +	/* first wait until we have "reached" the desired job
15833 +	 *
15834 +	 * This implementation has at least two problems:
15835 +	 *
15836 +	 * 1) It doesn't gracefully handle the wrap around of
15837 +	 *    job_no. Since LITMUS is a prototype, this is not much
15838 +	 *    of a problem right now.
15839 +	 *
15840 +	 * 2) It is theoretically racy if a job release occurs
15841 +	 *    between checking job_no and calling sleep_next_period().
15842 +	 *    A proper solution would require adding another callback
15843 +	 *    in the plugin structure and testing the condition with
15844 +	 *    interrupts disabled.
15845 +	 *
15846 +	 * FIXME: At least problem 2 should be taken care of eventually.
15847 +	 */
15848 +	while (!retval && job > current->rt_param.job_params.job_no)
15849 +		/* If the last job overran then job <= job_no and we
15850 +		 * don't send the task to sleep.
15851 +		 */
15852 +		retval = litmus->complete_job();
15853 +      out:
15854 +	return retval;
15855 +}
15856 +
15857 +/*	This is a helper syscall to query the current job sequence number.
15858 + *
15859 + *	returns 0 on successful query
15860 + *	returns EPERM if task is not a real-time task.
15861 + *      returns EFAULT if job is not a valid pointer.
15862 + */
15863 +asmlinkage long sys_query_job_no(unsigned int __user *job)
15864 +{
15865 +	int retval = -EPERM;
15866 +	if (is_realtime(current))
15867 +		retval = put_user(current->rt_param.job_params.job_no, job);
15868 +
15869 +	return retval;
15870 +}
15871 +
15872 +
15873 +/* sys_null_call() is only used for determining raw system call
15874 + * overheads (kernel entry, kernel exit). It has no useful side effects.
15875 + * If ts is non-NULL, then the current Feather-Trace time is recorded.
15876 + */
15877 +asmlinkage long sys_null_call(cycles_t __user *ts)
15878 +{
15879 +	long ret = 0;
15880 +	cycles_t now;
15881 +
15882 +	if (ts) {
15883 +		now = get_cycles();
15884 +		ret = put_user(now, ts);
15885 +	}
15886 +
15887 +	return ret;
15888 +}
15889 +
15890 +
15891 +#if defined(CONFIG_LITMUS_NVIDIA) && defined(CONFIG_LITMUS_AFFINITY_LOCKING)
15892 +void init_gpu_affinity_state(struct task_struct* p)
15893 +{
15894 +	// under-damped
15895 +	//p->rt_param.gpu_fb_param_a = _frac(14008, 10000);
15896 +	//p->rt_param.gpu_fb_param_b = _frac(16024, 10000);
15897 +
15898 +	// empirical
15899 +	p->rt_param.gpu_fb_param_a[0] = _frac(7550, 10000);
15900 +	p->rt_param.gpu_fb_param_b[0] = _frac(45800, 10000);
15901 +
15902 +	p->rt_param.gpu_fb_param_a[1] = _frac(8600, 10000);
15903 +	p->rt_param.gpu_fb_param_b[1] = _frac(40000, 10000);
15904 +
15905 +	p->rt_param.gpu_fb_param_a[2] = _frac(6890, 10000);
15906 +	p->rt_param.gpu_fb_param_b[2] = _frac(40000, 10000);
15907 +
15908 +	p->rt_param.gpu_fb_param_a[3] = _frac(7580, 10000);
15909 +	p->rt_param.gpu_fb_param_b[3] = _frac(34590, 10000);
15910 +
15911 +	p->rt_param.gpu_migration = MIG_NONE;
15912 +	p->rt_param.last_gpu = -1;
15913 +}
15914 +#endif
15915 +
15916 +/* p is a real-time task. Re-init its state as a best-effort task. */
15917 +static void reinit_litmus_state(struct task_struct* p, int restore)
15918 +{
15919 +	struct rt_task  user_config = {};
15920 +	void*  ctrl_page     = NULL;
15921 +
15922 +#ifdef CONFIG_LITMUS_NESTED_LOCKING
15923 +	binheap_order_t	prio_order = NULL;
15924 +#endif
15925 +
15926 +	if (restore) {
15927 +		/* Save user-space provided configuration data
15928 +		 * and the allocated control page. */
15929 +		user_config = p->rt_param.task_params;
15930 +		ctrl_page   = p->rt_param.ctrl_page;
15931 +	}
15932 +
15933 +#ifdef CONFIG_LITMUS_NESTED_LOCKING
15934 +	prio_order = p->rt_param.hp_blocked_tasks.compare;
15935 +#endif
15936 +
15937 +	/* We probably should not be inheriting any task's priority
15938 +	 * at this point in time.
15939 +	 */
15940 +	WARN_ON(p->rt_param.inh_task);
15941 +
15942 +#ifdef CONFIG_LITMUS_NESTED_LOCKING
15943 +	WARN_ON(p->rt_param.blocked_lock);
15944 +	WARN_ON(!binheap_empty(&p->rt_param.hp_blocked_tasks));
15945 +#endif
15946 +
15947 +#ifdef CONFIG_LITMUS_SOFTIRQD
15948 +	/* We probably should not have any tasklets executing for
15949 +	 * us at this time.
15950 +	 */
15951 +	WARN_ON(p->rt_param.cur_klitirqd);
15952 +	WARN_ON(atomic_read(&p->rt_param.klitirqd_sem_stat) == HELD);
15953 +
15954 +	if(p->rt_param.cur_klitirqd)
15955 +		flush_pending(p->rt_param.cur_klitirqd, p);
15956 +
15957 +	if(atomic_read(&p->rt_param.klitirqd_sem_stat) == HELD)
15958 +		up_and_set_stat(p, NOT_HELD, &p->rt_param.klitirqd_sem);
15959 +#endif
15960 +
15961 +#ifdef CONFIG_LITMUS_NVIDIA
15962 +	WARN_ON(p->rt_param.held_gpus != 0);
15963 +#endif
15964 +
15965 +	/* Cleanup everything else. */
15966 +	memset(&p->rt_param, 0, sizeof(p->rt_param));
15967 +
15968 +	/* Restore preserved fields. */
15969 +	if (restore) {
15970 +		p->rt_param.task_params = user_config;
15971 +		p->rt_param.ctrl_page   = ctrl_page;
15972 +	}
15973 +
15974 +#if defined(CONFIG_LITMUS_NVIDIA) && defined(CONFIG_LITMUS_AFFINITY_LOCKING)
15975 +	init_gpu_affinity_state(p);
15976 +#endif
15977 +
15978 +#ifdef CONFIG_LITMUS_NESTED_LOCKING
15979 +	INIT_BINHEAP_HANDLE(&p->rt_param.hp_blocked_tasks, prio_order);
15980 +	raw_spin_lock_init(&p->rt_param.hp_blocked_tasks_lock);
15981 +#endif
15982 +}
15983 +
15984 +long litmus_admit_task(struct task_struct* tsk)
15985 +{
15986 +	long retval = 0;
15987 +	unsigned long flags;
15988 +
15989 +	BUG_ON(is_realtime(tsk));
15990 +
15991 +	if (get_rt_period(tsk) == 0 ||
15992 +	    get_exec_cost(tsk) > get_rt_period(tsk)) {
15993 +		TRACE_TASK(tsk, "litmus admit: invalid task parameters "
15994 +			   "(%lu, %lu)\n",
15995 +		           get_exec_cost(tsk), get_rt_period(tsk));
15996 +		retval = -EINVAL;
15997 +		goto out;
15998 +	}
15999 +
16000 +	if (!cpu_online(get_partition(tsk))) {
16001 +		TRACE_TASK(tsk, "litmus admit: cpu %d is not online\n",
16002 +			   get_partition(tsk));
16003 +		retval = -EINVAL;
16004 +		goto out;
16005 +	}
16006 +
16007 +	INIT_LIST_HEAD(&tsk_rt(tsk)->list);
16008 +
16009 +	/* avoid scheduler plugin changing underneath us */
16010 +	raw_spin_lock_irqsave(&task_transition_lock, flags);
16011 +
16012 +	/* allocate heap node for this task */
16013 +	tsk_rt(tsk)->heap_node = bheap_node_alloc(GFP_ATOMIC);
16014 +	tsk_rt(tsk)->rel_heap = release_heap_alloc(GFP_ATOMIC);
16015 +
16016 +	if (!tsk_rt(tsk)->heap_node || !tsk_rt(tsk)->rel_heap) {
16017 +		printk(KERN_WARNING "litmus: no more heap node memory!?\n");
16018 +
16019 +		bheap_node_free(tsk_rt(tsk)->heap_node);
16020 +		release_heap_free(tsk_rt(tsk)->rel_heap);
16021 +
16022 +		retval = -ENOMEM;
16023 +		goto out_unlock;
16024 +	} else {
16025 +		bheap_node_init(&tsk_rt(tsk)->heap_node, tsk);
16026 +	}
16027 +
16028 +
16029 +#ifdef CONFIG_LITMUS_NVIDIA
16030 +	atomic_set(&tsk_rt(tsk)->nv_int_count, 0);
16031 +#endif
16032 +#if defined(CONFIG_LITMUS_NVIDIA) && defined(CONFIG_LITMUS_AFFINITY_LOCKING)
16033 +	init_gpu_affinity_state(tsk);
16034 +#endif
16035 +#ifdef CONFIG_LITMUS_NESTED_LOCKING
16036 +	tsk_rt(tsk)->blocked_lock = NULL;
16037 +	raw_spin_lock_init(&tsk_rt(tsk)->hp_blocked_tasks_lock);
16038 +	//INIT_BINHEAP_HANDLE(&tsk_rt(tsk)->hp_blocked_tasks, prio_order);  // done by scheduler
16039 +#endif
16040 +#ifdef CONFIG_LITMUS_SOFTIRQD
16041 +	/* proxy thread off by default */
16042 +	tsk_rt(tsk)->is_proxy_thread = 0;
16043 +	tsk_rt(tsk)->cur_klitirqd = NULL;
16044 +	mutex_init(&tsk_rt(tsk)->klitirqd_sem);
16045 +	atomic_set(&tsk_rt(tsk)->klitirqd_sem_stat, NOT_HELD);
16046 +#endif
16047 +
16048 +	retval = litmus->admit_task(tsk);
16049 +
16050 +	if (!retval) {
16051 +		sched_trace_task_name(tsk);
16052 +		sched_trace_task_param(tsk);
16053 +		atomic_inc(&rt_task_count);
16054 +	}
16055 +
16056 +out_unlock:
16057 +	raw_spin_unlock_irqrestore(&task_transition_lock, flags);
16058 +out:
16059 +	return retval;
16060 +}
16061 +
16062 +void litmus_exit_task(struct task_struct* tsk)
16063 +{
16064 +	if (is_realtime(tsk)) {
16065 +		sched_trace_task_completion(tsk, 1);
16066 +
16067 +		litmus->task_exit(tsk);
16068 +
16069 +		BUG_ON(bheap_node_in_heap(tsk_rt(tsk)->heap_node));
16070 +		bheap_node_free(tsk_rt(tsk)->heap_node);
16071 +		release_heap_free(tsk_rt(tsk)->rel_heap);
16072 +
16073 +		atomic_dec(&rt_task_count);
16074 +		reinit_litmus_state(tsk, 1);
16075 +	}
16076 +}
16077 +
16078 +/* IPI callback to synchronize plugin switching */
16079 +static void synch_on_plugin_switch(void* info)
16080 +{
16081 +	atomic_inc(&cannot_use_plugin);
16082 +	while (atomic_read(&cannot_use_plugin) > 0)
16083 +		cpu_relax();
16084 +}
16085 +
16086 +/* Switching a plugin in use is tricky.
16087 + * We must ensure that no real-time tasks exist
16088 + * (and that none is created in parallel) and that the plugin is not
16089 + * currently in use on any processor (in theory).
16090 + */
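+/* The handshake: set cannot_use_plugin to 1, IPI all other CPUs so that
+ * they spin in synch_on_plugin_switch() (each incrementing the counter),
+ * and wait until every online CPU has checked in.  Only then, and only if
+ * no real-time tasks exist, is the plugin swapped.  Resetting the counter
+ * to 0 releases the spinning CPUs. */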
16091 +int switch_sched_plugin(struct sched_plugin* plugin)
16092 +{
16093 +	//unsigned long flags;
16094 +	int ret = 0;
16095 +
16096 +	BUG_ON(!plugin);
16097 +
16098 +	/* forbid other cpus to use the plugin */
16099 +	atomic_set(&cannot_use_plugin, 1);
16100 +	/* send IPI to force other CPUs to synch with us */
16101 +	smp_call_function(synch_on_plugin_switch, NULL, 0);
16102 +
16103 +	/* wait until all other CPUs have started synch */
16104 +	while (atomic_read(&cannot_use_plugin) < num_online_cpus())
16105 +		cpu_relax();
16106 +
16107 +#ifdef CONFIG_LITMUS_SOFTIRQD
16108 +	if(!klitirqd_is_dead())
16109 +	{
16110 +		kill_klitirqd();
16111 +	}
16112 +#endif
16113 +
16114 +	/* stop task transitions */
16115 +	//raw_spin_lock_irqsave(&task_transition_lock, flags);
16116 +
16117 +	/* don't switch if there are active real-time tasks */
16118 +	if (atomic_read(&rt_task_count) == 0) {
16119 +		ret = litmus->deactivate_plugin();
16120 +		if (0 != ret)
16121 +			goto out;
16122 +		ret = plugin->activate_plugin();
16123 +		if (0 != ret) {
16124 +			printk(KERN_INFO "Can't activate %s (%d).\n",
16125 +			       plugin->plugin_name, ret);
16126 +			plugin = &linux_sched_plugin;
16127 +		}
16128 +		printk(KERN_INFO "Switching to LITMUS^RT plugin %s.\n", plugin->plugin_name);
16129 +		litmus = plugin;
16130 +	} else
16131 +		ret = -EBUSY;
16132 +out:
16133 +	//raw_spin_unlock_irqrestore(&task_transition_lock, flags);
16134 +	atomic_set(&cannot_use_plugin, 0);
16135 +	return ret;
16136 +}
16137 +
16138 +/* Called upon fork.
16139 + * p is the newly forked task.
16140 + */
16141 +void litmus_fork(struct task_struct* p)
16142 +{
16143 +	if (is_realtime(p)) {
16144 +		/* clean out any litmus related state, don't preserve anything */
16145 +		reinit_litmus_state(p, 0);
16146 +		/* Don't let the child be a real-time task.  */
16147 +		p->sched_reset_on_fork = 1;
16148 +	} else
16149 +		/* non-rt tasks might have ctrl_page set */
16150 +		tsk_rt(p)->ctrl_page = NULL;
16151 +
16152 +	/* od tables are never inherited across a fork */
16153 +	p->od_table = NULL;
16154 +}
16155 +
16156 +/* Called upon execve().
16157 + * current is doing the exec.
16158 + * Don't let address space specific stuff leak.
16159 + */
16160 +void litmus_exec(void)
16161 +{
16162 +	struct task_struct* p = current;
16163 +
16164 +	if (is_realtime(p)) {
16165 +		WARN_ON(p->rt_param.inh_task);
16166 +		if (tsk_rt(p)->ctrl_page) {
16167 +			free_page((unsigned long) tsk_rt(p)->ctrl_page);
16168 +			tsk_rt(p)->ctrl_page = NULL;
16169 +		}
16170 +	}
16171 +}
16172 +
16173 +void exit_litmus(struct task_struct *dead_tsk)
16174 +{
16175 +	/* We also allow non-RT tasks to
16176 +	 * allocate control pages to allow
16177 +	 * measurements with non-RT tasks.
16178 +	 * So check if we need to free the page
16179 +	 * in any case.
16180 +	 */
16181 +	if (tsk_rt(dead_tsk)->ctrl_page) {
16182 +		TRACE_TASK(dead_tsk,
16183 +			   "freeing ctrl_page %p\n",
16184 +			   tsk_rt(dead_tsk)->ctrl_page);
16185 +		free_page((unsigned long) tsk_rt(dead_tsk)->ctrl_page);
16186 +	}
16187 +
16188 +	/* main cleanup only for RT tasks */
16189 +	if (is_realtime(dead_tsk))
16190 +		litmus_exit_task(dead_tsk);
16191 +}
16192 +
16193 +
16194 +#ifdef CONFIG_MAGIC_SYSRQ
16195 +int sys_kill(int pid, int sig);
16196 +
16197 +static void sysrq_handle_kill_rt_tasks(int key)
16198 +{
16199 +	struct task_struct *t;
16200 +	read_lock(&tasklist_lock);
16201 +	for_each_process(t) {
16202 +		if (is_realtime(t)) {
16203 +			sys_kill(t->pid, SIGKILL);
16204 +		}
16205 +	}
16206 +	read_unlock(&tasklist_lock);
16207 +}
16208 +
16209 +static struct sysrq_key_op sysrq_kill_rt_tasks_op = {
16210 +	.handler	= sysrq_handle_kill_rt_tasks,
16211 +	.help_msg	= "quit-rt-tasks(X)",
16212 +	.action_msg	= "sent SIGKILL to all LITMUS^RT real-time tasks",
16213 +};
16214 +#endif
16215 +
16216 +extern struct sched_plugin linux_sched_plugin;
16217 +
16218 +static int __init _init_litmus(void)
16219 +{
16220 +	/*      Common initializers,
16221 +	 *      mode change lock is used to enforce single mode change
16222 +	 *      operation.
16223 +	 */
16224 +	printk("Starting LITMUS^RT kernel\n");
16225 +
16226 +	BUILD_BUG_ON(sizeof(union np_flag) != sizeof(uint32_t));
16227 +
16228 +	register_sched_plugin(&linux_sched_plugin);
16229 +
16230 +	bheap_node_cache    = KMEM_CACHE(bheap_node, SLAB_PANIC);
16231 +	release_heap_cache = KMEM_CACHE(release_heap, SLAB_PANIC);
16232 +
16233 +#ifdef CONFIG_MAGIC_SYSRQ
16234 +	/* offer some debugging help */
16235 +	if (!register_sysrq_key('x', &sysrq_kill_rt_tasks_op))
16236 +		printk("Registered kill rt tasks magic sysrq.\n");
16237 +	else
16238 +		printk("Could not register kill rt tasks magic sysrq.\n");
16239 +#endif
16240 +
16241 +	init_litmus_proc();
16242 +
16243 +#ifdef CONFIG_SCHED_CPU_AFFINITY
16244 +	init_topology();
16245 +#endif
16246 +
16247 +	return 0;
16248 +}
16249 +
16250 +static void _exit_litmus(void)
16251 +{
16252 +	exit_litmus_proc();
16253 +	kmem_cache_destroy(bheap_node_cache);
16254 +	kmem_cache_destroy(release_heap_cache);
16255 +}
16256 +
16257 +module_init(_init_litmus);
16258 +module_exit(_exit_litmus);
16259 diff --git a/litmus/litmus_pai_softirq.c b/litmus/litmus_pai_softirq.c
16260 new file mode 100644
16261 index 0000000..300571a
16262 --- /dev/null
16263 +++ b/litmus/litmus_pai_softirq.c
16264 @@ -0,0 +1,64 @@
16265 +#include <linux/interrupt.h>
16266 +#include <linux/percpu.h>
16267 +#include <linux/cpu.h>
16268 +#include <linux/kthread.h>
16269 +#include <linux/ftrace.h>
16270 +#include <linux/smp.h>
16271 +#include <linux/slab.h>
16272 +#include <linux/mutex.h>
16273 +
16274 +#include <linux/sched.h>
16275 +#include <linux/cpuset.h>
16276 +
16277 +#include <litmus/litmus.h>
16278 +#include <litmus/sched_trace.h>
16279 +#include <litmus/jobs.h>
16280 +#include <litmus/sched_plugin.h>
16281 +#include <litmus/litmus_softirq.h>
16282 +
16283 +
16284 +
16285 +int __litmus_tasklet_schedule(struct tasklet_struct *t, unsigned int k_id)
16286 +{
16287 +	int ret = 0; /* assume failure */
16288 +	if(unlikely((t->owner == NULL) || !is_realtime(t->owner)))
16289 +	{
16290 +		TRACE("%s: No owner associated with this tasklet!\n", __FUNCTION__);
16291 +		BUG();
16292 +	}
16293 +
16294 +	ret = litmus->enqueue_pai_tasklet(t);
16295 +
16296 +	return(ret);
16297 +}
16298 +
16299 +EXPORT_SYMBOL(__litmus_tasklet_schedule);
16300 +
16301 +
16302 +
16303 +// failure causes default Linux handling.
16304 +int __litmus_tasklet_hi_schedule(struct tasklet_struct *t, unsigned int k_id)
16305 +{
16306 +	int ret = 0; /* assume failure */
16307 +	return(ret);
16308 +}
16309 +EXPORT_SYMBOL(__litmus_tasklet_hi_schedule);
16310 +
16311 +
16312 +// failure causes default Linux handling.
16313 +int __litmus_tasklet_hi_schedule_first(struct tasklet_struct *t, unsigned int k_id)
16314 +{
16315 +	int ret = 0; /* assume failure */
16316 +	return(ret);
16317 +}
16318 +EXPORT_SYMBOL(__litmus_tasklet_hi_schedule_first);
16319 +
16320 +
16321 +// failure causes default Linux handling.
16322 +int __litmus_schedule_work(struct work_struct *w, unsigned int k_id)
16323 +{
16324 +	int ret = 0; /* assume failure */
16325 +	return(ret);
16326 +}
16327 +EXPORT_SYMBOL(__litmus_schedule_work);
16328 +
16329 diff --git a/litmus/litmus_proc.c b/litmus/litmus_proc.c
16330 new file mode 100644
16331 index 0000000..9ab7e01
16332 --- /dev/null
16333 +++ b/litmus/litmus_proc.c
16334 @@ -0,0 +1,364 @@
16335 +/*
16336 + * litmus_proc.c -- Implementation of the /proc/litmus directory tree.
16337 + */
16338 +
16339 +#include <linux/sched.h>
16340 +#include <linux/uaccess.h>
16341 +
16342 +#include <litmus/litmus.h>
16343 +#include <litmus/litmus_proc.h>
16344 +
16345 +#include <litmus/clustered.h>
16346 +
16347 +/* in litmus/litmus.c */
16348 +extern atomic_t rt_task_count;
16349 +
16350 +static struct proc_dir_entry *litmus_dir = NULL,
16351 +	*curr_file = NULL,
16352 +	*stat_file = NULL,
16353 +	*plugs_dir = NULL,
16354 +#ifdef CONFIG_RELEASE_MASTER
16355 +	*release_master_file = NULL,
16356 +#endif
16357 +#ifdef CONFIG_LITMUS_SOFTIRQD
16358 +	*klitirqd_file = NULL,
16359 +#endif
16360 +	*plugs_file = NULL;
16361 +
16362 +/* in litmus/sync.c */
16363 +int count_tasks_waiting_for_release(void);
16364 +
16365 +extern int proc_read_klitirqd_stats(char *page, char **start,
16366 +									off_t off, int count,
16367 +									int *eof, void *data);
16368 +
16369 +static int proc_read_stats(char *page, char **start,
16370 +			   off_t off, int count,
16371 +			   int *eof, void *data)
16372 +{
16373 +	int len;
16374 +
16375 +	len = snprintf(page, PAGE_SIZE,
16376 +		       "real-time tasks   = %d\n"
16377 +		       "ready for release = %d\n",
16378 +		       atomic_read(&rt_task_count),
16379 +		       count_tasks_waiting_for_release());
16380 +	return len;
16381 +}
16382 +
16383 +static int proc_read_plugins(char *page, char **start,
16384 +			   off_t off, int count,
16385 +			   int *eof, void *data)
16386 +{
16387 +	int len;
16388 +
16389 +	len = print_sched_plugins(page, PAGE_SIZE);
16390 +	return len;
16391 +}
16392 +
16393 +static int proc_read_curr(char *page, char **start,
16394 +			  off_t off, int count,
16395 +			  int *eof, void *data)
16396 +{
16397 +	int len;
16398 +
16399 +	len = snprintf(page, PAGE_SIZE, "%s\n", litmus->plugin_name);
16400 +	return len;
16401 +}
16402 +
16403 +/* in litmus/litmus.c */
16404 +int switch_sched_plugin(struct sched_plugin*);
16405 +
16406 +static int proc_write_curr(struct file *file,
16407 +			   const char *buffer,
16408 +			   unsigned long count,
16409 +			   void *data)
16410 +{
16411 +	int len, ret;
16412 +	char name[65];
16413 +	struct sched_plugin* found;
16414 +
16415 +	len = copy_and_chomp(name, sizeof(name), buffer, count);
16416 +	if (len < 0)
16417 +		return len;
16418 +
16419 +	found = find_sched_plugin(name);
16420 +
16421 +	if (found) {
16422 +		ret = switch_sched_plugin(found);
16423 +		if (ret != 0)
16424 +			printk(KERN_INFO "Could not switch plugin: %d\n", ret);
16425 +	} else
16426 +		printk(KERN_INFO "Plugin '%s' is unknown.\n", name);
16427 +
16428 +	return len;
16429 +}
16430 +
16431 +#ifdef CONFIG_RELEASE_MASTER
16432 +static int proc_read_release_master(char *page, char **start,
16433 +				    off_t off, int count,
16434 +				    int *eof, void *data)
16435 +{
16436 +	int len, master;
16437 +	master = atomic_read(&release_master_cpu);
16438 +	if (master == NO_CPU)
16439 +		len = snprintf(page, PAGE_SIZE, "NO_CPU\n");
16440 +	else
16441 +		len = snprintf(page, PAGE_SIZE, "%d\n", master);
16442 +	return len;
16443 +}
16444 +
16445 +static int proc_write_release_master(struct file *file,
16446 +				     const char *buffer,
16447 +				     unsigned long count,
16448 +				     void *data)
16449 +{
16450 +	int cpu, err, len, online = 0;
16451 +	char msg[64];
16452 +
16453 +	len = copy_and_chomp(msg, sizeof(msg), buffer, count);
16454 +
16455 +	if (len < 0)
16456 +		return len;
16457 +
16458 +	if (strcmp(msg, "NO_CPU") == 0)
16459 +		atomic_set(&release_master_cpu, NO_CPU);
16460 +	else {
16461 +		err = sscanf(msg, "%d", &cpu);
16462 +		if (err == 1 && cpu >= 0 && (online = cpu_online(cpu))) {
16463 +			atomic_set(&release_master_cpu, cpu);
16464 +		} else {
16465 +			TRACE("invalid release master: '%s' "
16466 +			      "(err:%d cpu:%d online:%d)\n",
16467 +			      msg, err, cpu, online);
16468 +			len = -EINVAL;
16469 +		}
16470 +	}
16471 +	return len;
16472 +}
16473 +#endif
16474 +
16475 +int __init init_litmus_proc(void)
16476 +{
16477 +	litmus_dir = proc_mkdir("litmus", NULL);
16478 +	if (!litmus_dir) {
16479 +		printk(KERN_ERR "Could not allocate LITMUS^RT procfs entry.\n");
16480 +		return -ENOMEM;
16481 +	}
16482 +
16483 +	curr_file = create_proc_entry("active_plugin",
16484 +				      0644, litmus_dir);
16485 +	if (!curr_file) {
16486 +		printk(KERN_ERR "Could not allocate active_plugin "
16487 +		       "procfs entry.\n");
16488 +		return -ENOMEM;
16489 +	}
16490 +	curr_file->read_proc  = proc_read_curr;
16491 +	curr_file->write_proc = proc_write_curr;
16492 +
16493 +#ifdef CONFIG_RELEASE_MASTER
16494 +	release_master_file = create_proc_entry("release_master",
16495 +						0644, litmus_dir);
16496 +	if (!release_master_file) {
16497 +		printk(KERN_ERR "Could not allocate release_master "
16498 +		       "procfs entry.\n");
16499 +		return -ENOMEM;
16500 +	}
16501 +	release_master_file->read_proc = proc_read_release_master;
16502 +	release_master_file->write_proc  = proc_write_release_master;
16503 +#endif
16504 +
16505 +#ifdef CONFIG_LITMUS_SOFTIRQD
16506 +	klitirqd_file =
16507 +		create_proc_read_entry("klitirqd_stats", 0444, litmus_dir,
16508 +							   proc_read_klitirqd_stats, NULL);
16509 +#endif
16510 +
16511 +	stat_file = create_proc_read_entry("stats", 0444, litmus_dir,
16512 +					   proc_read_stats, NULL);
16513 +
16514 +	plugs_dir = proc_mkdir("plugins", litmus_dir);
16515 +	if (!plugs_dir){
16516 +		printk(KERN_ERR "Could not allocate plugins directory "
16517 +				"procfs entry.\n");
16518 +		return -ENOMEM;
16519 +	}
16520 +
16521 +	plugs_file = create_proc_read_entry("loaded", 0444, plugs_dir,
16522 +					   proc_read_plugins, NULL);
16523 +
16524 +	return 0;
16525 +}
16526 +
16527 +void exit_litmus_proc(void)
16528 +{
16529 +	if (plugs_file)
16530 +		remove_proc_entry("loaded", plugs_dir);
16531 +	if (plugs_dir)
16532 +		remove_proc_entry("plugins", litmus_dir);
16533 +	if (stat_file)
16534 +		remove_proc_entry("stats", litmus_dir);
16535 +	if (curr_file)
16536 +		remove_proc_entry("active_plugin", litmus_dir);
16537 +#ifdef CONFIG_LITMUS_SOFTIRQD
16538 +	if (klitirqd_file)
16539 +		remove_proc_entry("klitirqd_stats", litmus_dir);
16540 +#endif
16541 +#ifdef CONFIG_RELEASE_MASTER
16542 +	if (release_master_file)
16543 +		remove_proc_entry("release_master", litmus_dir);
16544 +#endif
16545 +	if (litmus_dir)
16546 +		remove_proc_entry("litmus", NULL);
16547 +}
16548 +
16549 +long make_plugin_proc_dir(struct sched_plugin* plugin,
16550 +		struct proc_dir_entry** pde_in)
16551 +{
16552 +	struct proc_dir_entry *pde_new = NULL;
16553 +	long rv;
16554 +
16555 +	if (!plugin || !plugin->plugin_name){
16556 +		printk(KERN_ERR "Invalid plugin struct passed to %s.\n",
16557 +				__func__);
16558 +		rv = -EINVAL;
16559 +		goto out_no_pde;
16560 +	}
16561 +
16562 +	if (!plugs_dir){
16563 +		printk(KERN_ERR "Could not make plugin sub-directory, because "
16564 +				"/proc/litmus/plugins does not exist.\n");
16565 +		rv = -ENOENT;
16566 +		goto out_no_pde;
16567 +	}
16568 +
16569 +	pde_new = proc_mkdir(plugin->plugin_name, plugs_dir);
16570 +	if (!pde_new){
16571 +		printk(KERN_ERR "Could not make plugin sub-directory: "
16572 +				"out of memory?\n");
16573 +		rv = -ENOMEM;
16574 +		goto out_no_pde;
16575 +	}
16576 +
16577 +	rv = 0;
16578 +	*pde_in = pde_new;
16579 +	goto out_ok;
16580 +
16581 +out_no_pde:
16582 +	*pde_in = NULL;
16583 +out_ok:
16584 +	return rv;
16585 +}
16586 +
16587 +void remove_plugin_proc_dir(struct sched_plugin* plugin)
16588 +{
16589 +	if (!plugin || !plugin->plugin_name){
16590 +		printk(KERN_ERR "Invalid plugin struct passed to %s.\n",
16591 +				__func__);
16592 +		return;
16593 +	}
16594 +	remove_proc_entry(plugin->plugin_name, plugs_dir);
16595 +}
16596 +
16597 +
16598 +
16599 +/* misc. I/O helper functions */
16600 +
16601 +int copy_and_chomp(char *kbuf, unsigned long ksize,
16602 +		   __user const char* ubuf, unsigned long ulength)
16603 +{
16604 +	/* caller must provide buffer space */
16605 +	BUG_ON(!ksize);
16606 +
16607 +	ksize--; /* leave space for null byte */
16608 +
16609 +	if (ksize > ulength)
16610 +		ksize = ulength;
16611 +
16612 +	if(copy_from_user(kbuf, ubuf, ksize))
16613 +		return -EFAULT;
16614 +
16615 +	kbuf[ksize] = '\0';
16616 +
16617 +	/* chomp kbuf */
16618 +	if (ksize > 0 && kbuf[ksize - 1] == '\n')
16619 +		kbuf[ksize - 1] = '\0';
16620 +
16621 +	return ksize;
16622 +}
16623 +
16624 +/* helper functions for clustered plugins */
16625 +static const char* cache_level_names[] = {
16626 +	"ALL",
16627 +	"L1",
16628 +	"L2",
16629 +	"L3",
16630 +};
16631 +
16632 +int parse_cache_level(const char *cache_name, enum cache_level *level)
16633 +{
16634 +	int err = -EINVAL;
16635 +	int i;
16636 +	/* do a quick and dirty comparison to find the cluster size */
16637 +	for (i = GLOBAL_CLUSTER; i <= L3_CLUSTER; i++)
16638 +		if (!strcmp(cache_name, cache_level_names[i])) {
16639 +			*level = (enum cache_level) i;
16640 +			err = 0;
16641 +			break;
16642 +		}
16643 +	return err;
16644 +}
16645 +
16646 +const char* cache_level_name(enum cache_level level)
16647 +{
16648 +	int idx = level;
16649 +
16650 +	if (idx >= GLOBAL_CLUSTER && idx <= L3_CLUSTER)
16651 +		return cache_level_names[idx];
16652 +	else
16653 +		return "INVALID";
16654 +}
16655 +
16656 +
16657 +/* proc file interface to configure the cluster size */
16658 +static int proc_read_cluster_size(char *page, char **start,
16659 +				  off_t off, int count,
16660 +				  int *eof, void *data)
16661 +{
16662 +	return snprintf(page, PAGE_SIZE, "%s\n",
16663 +			cache_level_name(*((enum cache_level*) data)));
16664 +}
16665 +
16666 +static int proc_write_cluster_size(struct file *file,
16667 +				   const char *buffer,
16668 +				   unsigned long count,
16669 +				   void *data)
16670 +{
16671 +	int len;
16672 +	char cache_name[8];
16673 +
16674 +	len = copy_and_chomp(cache_name, sizeof(cache_name), buffer, count);
16675 +
16676 +	if (len > 0 && parse_cache_level(cache_name, (enum cache_level*) data))
16677 +		printk(KERN_INFO "Cluster '%s' is unknown.\n", cache_name);
16678 +
16679 +	return len;
16680 +}
16681 +
16682 +struct proc_dir_entry* create_cluster_file(struct proc_dir_entry* parent,
16683 +					   enum cache_level* level)
16684 +{
16685 +	struct proc_dir_entry* cluster_file;
16686 +
16687 +	cluster_file = create_proc_entry("cluster", 0644, parent);
16688 +	if (!cluster_file) {
16689 +		printk(KERN_ERR "Could not allocate %s/cluster "
16690 +		       "procfs entry.\n", parent->name);
16691 +	} else {
16692 +		cluster_file->read_proc = proc_read_cluster_size;
16693 +		cluster_file->write_proc = proc_write_cluster_size;
16694 +		cluster_file->data = level;
16695 +	}
16696 +	return cluster_file;
16697 +}
16698 +
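+/* Illustrative sketch (hypothetical plugin code, not part of the original
+ * patch): a clustered plugin would typically combine make_plugin_proc_dir()
+ * and create_cluster_file() to expose its cluster-size knob under
+ * /proc/litmus/plugins/<name>/cluster.  'my_plugin', "MY-PLUGIN", and
+ * 'my_cluster_cfg' are placeholder names. */
+#if 0
+static struct sched_plugin my_plugin = { .plugin_name = "MY-PLUGIN" };
+static struct proc_dir_entry* my_plugin_dir = NULL;
+static enum cache_level my_cluster_cfg = GLOBAL_CLUSTER;
+
+static long my_plugin_activate(void)
+{
+	long err = make_plugin_proc_dir(&my_plugin, &my_plugin_dir);
+	if (!err)
+		/* cluster file accepts ALL, L1, L2, or L3 (see parse_cache_level()) */
+		create_cluster_file(my_plugin_dir, &my_cluster_cfg);
+	return err;
+}
+#endif
+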
16699 diff --git a/litmus/litmus_softirq.c b/litmus/litmus_softirq.c
16700 new file mode 100644
16701 index 0000000..9f7d9da
16702 --- /dev/null
16703 +++ b/litmus/litmus_softirq.c
16704 @@ -0,0 +1,1582 @@
16705 +#include <linux/interrupt.h>
16706 +#include <linux/percpu.h>
16707 +#include <linux/cpu.h>
16708 +#include <linux/kthread.h>
16709 +#include <linux/ftrace.h>
16710 +#include <linux/smp.h>
16711 +#include <linux/slab.h>
16712 +#include <linux/mutex.h>
16713 +
16714 +#include <linux/sched.h>
16715 +#include <linux/cpuset.h>
16716 +
16717 +#include <litmus/litmus.h>
16718 +#include <litmus/sched_trace.h>
16719 +#include <litmus/jobs.h>
16720 +#include <litmus/sched_plugin.h>
16721 +#include <litmus/litmus_softirq.h>
16722 +
16723 +/* TODO: Remove unneeded mb() and other barriers. */
16724 +
16725 +
16726 +/* counts number of daemons ready to handle litmus irqs. */
16727 +static atomic_t num_ready_klitirqds = ATOMIC_INIT(0);
16728 +
16729 +enum pending_flags
16730 +{
16731 +    LIT_TASKLET_LOW = 0x1,
16732 +    LIT_TASKLET_HI  = LIT_TASKLET_LOW<<1,
16733 +	LIT_WORK = LIT_TASKLET_HI<<1
16734 +};
16735 +
16736 +/* only support tasklet processing for now. */
16737 +struct tasklet_head
16738 +{
16739 +	struct tasklet_struct *head;
16740 +	struct tasklet_struct **tail;
16741 +};
16742 +
16743 +struct klitirqd_info
16744 +{
16745 +	struct task_struct*		klitirqd;
16746 +    struct task_struct*     current_owner;
16747 +    int						terminating;
16748 +
16749 +
16750 +	raw_spinlock_t			lock;
16751 +
16752 +	u32						pending;
16753 +	atomic_t				num_hi_pending;
16754 +	atomic_t				num_low_pending;
16755 +	atomic_t				num_work_pending;
16756 +
16757 +	/* in order of priority */
16758 +	struct tasklet_head     pending_tasklets_hi;
16759 +	struct tasklet_head		pending_tasklets;
16760 +	struct list_head		worklist;
16761 +};
16762 +
16763 +/* one list for each klitirqd */
16764 +static struct klitirqd_info klitirqds[NR_LITMUS_SOFTIRQD];
16765 +
16766 +
16767 +
16768 +
16769 +
16770 +int proc_read_klitirqd_stats(char *page, char **start,
16771 +							 off_t off, int count,
16772 +							 int *eof, void *data)
16773 +{
16774 +	int len = snprintf(page, PAGE_SIZE,
16775 +				"num ready klitirqds: %d\n\n",
16776 +				atomic_read(&num_ready_klitirqds));
16777 +
16778 +	if(klitirqd_is_ready())
16779 +	{
16780 +		int i;
16781 +		for(i = 0; i < NR_LITMUS_SOFTIRQD; ++i)
16782 +		{
16783 +			len +=
16784 +				snprintf(page + len, PAGE_SIZE - len,
16785 +						 "klitirqd_th%d: %s/%d\n"
16786 +						 "\tcurrent_owner: %s/%d\n"
16787 +						 "\tpending: %x\n"
16788 +						 "\tnum hi: %d\n"
16789 +						 "\tnum low: %d\n"
16790 +						 "\tnum work: %d\n\n",
16791 +						 i,
16792 +						 klitirqds[i].klitirqd->comm, klitirqds[i].klitirqd->pid,
16793 +						 (klitirqds[i].current_owner != NULL) ?
16794 +						 	klitirqds[i].current_owner->comm : "(null)",
16795 +						 (klitirqds[i].current_owner != NULL) ?
16796 +							klitirqds[i].current_owner->pid : 0,
16797 +						 klitirqds[i].pending,
16798 +						 atomic_read(&klitirqds[i].num_hi_pending),
16799 +						 atomic_read(&klitirqds[i].num_low_pending),
16800 +						 atomic_read(&klitirqds[i].num_work_pending));
16801 +		}
16802 +	}
16803 +
16804 +	return(len);
16805 +}
16806 +
16807 +
16808 +
16809 +
16810 +
16811 +#if 0
16812 +static atomic_t dump_id = ATOMIC_INIT(0);
16813 +
16814 +static void __dump_state(struct klitirqd_info* which, const char* caller)
16815 +{
16816 +	struct tasklet_struct* list;
16817 +
16818 +	int id = atomic_inc_return(&dump_id);
16819 +
16820 +	//if(in_interrupt())
16821 +	{
16822 +		if(which->current_owner)
16823 +		{
16824 +			TRACE("(id: %d  caller: %s)\n"
16825 +				"klitirqd: %s/%d\n"
16826 +				"current owner: %s/%d\n"
16827 +				"pending: %x\n",
16828 +				id, caller,
16829 +				which->klitirqd->comm, which->klitirqd->pid,
16830 +				which->current_owner->comm, which->current_owner->pid,
16831 +				which->pending);
16832 +		}
16833 +		else
16834 +		{
16835 +			TRACE("(id: %d  caller: %s)\n"
16836 +				"klitirqd: %s/%d\n"
16837 +				"current owner: %p\n"
16838 +				"pending: %x\n",
16839 +				id, caller,
16840 +				which->klitirqd->comm, which->klitirqd->pid,
16841 +				NULL,
16842 +				which->pending);
16843 +		}
16844 +
16845 +		list = which->pending_tasklets.head;
16846 +		while(list)
16847 +		{
16848 +			struct tasklet_struct *t = list;
16849 +			list = list->next; /* advance */
16850 +			if(t->owner)
16851 +				TRACE("(id: %d  caller: %s) Tasklet: %x, Owner = %s/%d\n", id, caller, t, t->owner->comm, t->owner->pid);
16852 +			else
16853 +				TRACE("(id: %d  caller: %s) Tasklet: %x, Owner = %p\n", id, caller, t, NULL);
16854 +		}
16855 +	}
16856 +}
16857 +
16858 +static void dump_state(struct klitirqd_info* which, const char* caller)
16859 +{
16860 +	unsigned long flags;
16861 +
16862 +	raw_spin_lock_irqsave(&which->lock, flags);
16863 +    __dump_state(which, caller);
16864 +    raw_spin_unlock_irqrestore(&which->lock, flags);
16865 +}
16866 +#endif
16867 +
16868 +
16869 +/* forward declarations */
16870 +static void ___litmus_tasklet_schedule(struct tasklet_struct *t,
16871 +									   struct klitirqd_info *which,
16872 +									   int wakeup);
16873 +static void ___litmus_tasklet_hi_schedule(struct tasklet_struct *t,
16874 +										  struct klitirqd_info *which,
16875 +										  int wakeup);
16876 +static void ___litmus_schedule_work(struct work_struct *w,
16877 +									struct klitirqd_info *which,
16878 +									int wakeup);
16879 +
16880 +
16881 +
16882 +inline unsigned int klitirqd_id(struct task_struct* tsk)
16883 +{
16884 +    int i;
16885 +    for(i = 0; i < NR_LITMUS_SOFTIRQD; ++i)
16886 +    {
16887 +        if(klitirqds[i].klitirqd == tsk)
16888 +        {
16889 +            return i;
16890 +        }
16891 +    }
16892 +
16893 +    BUG();
16894 +
16895 +    return 0;
16896 +}
16897 +
16898 +
16899 +inline static u32 litirq_pending_hi_irqoff(struct klitirqd_info* which)
16900 +{
16901 +    return (which->pending & LIT_TASKLET_HI);
16902 +}
16903 +
16904 +inline static u32 litirq_pending_low_irqoff(struct klitirqd_info* which)
16905 +{
16906 +    return (which->pending & LIT_TASKLET_LOW);
16907 +}
16908 +
16909 +inline static u32 litirq_pending_work_irqoff(struct klitirqd_info* which)
16910 +{
16911 +	return (which->pending & LIT_WORK);
16912 +}
16913 +
16914 +inline static u32 litirq_pending_irqoff(struct klitirqd_info* which)
16915 +{
16916 +    return(which->pending);
16917 +}
16918 +
16919 +
16920 +inline static u32 litirq_pending(struct klitirqd_info* which)
16921 +{
16922 +    unsigned long flags;
16923 +    u32 pending;
16924 +
16925 +    raw_spin_lock_irqsave(&which->lock, flags);
16926 +    pending = litirq_pending_irqoff(which);
16927 +    raw_spin_unlock_irqrestore(&which->lock, flags);
16928 +
16929 +    return pending;
16930 +};
16931 +
16932 +inline static u32 litirq_pending_with_owner(struct klitirqd_info* which, struct task_struct* owner)
16933 +{
16934 +	unsigned long flags;
16935 +	u32 pending;
16936 +
16937 +	raw_spin_lock_irqsave(&which->lock, flags);
16938 +	pending = litirq_pending_irqoff(which);
16939 +	if(pending)
16940 +	{
16941 +		if(which->current_owner != owner)
16942 +		{
16943 +			pending = 0;  // owner switch!
16944 +		}
16945 +	}
16946 +	raw_spin_unlock_irqrestore(&which->lock, flags);
16947 +
16948 +	return pending;
16949 +}
16950 +
16951 +
16952 +inline static u32 litirq_pending_and_sem_and_owner(struct klitirqd_info* which,
16953 +				struct mutex** sem,
16954 +				struct task_struct** t)
16955 +{
16956 +	unsigned long flags;
16957 +	u32 pending;
16958 +
16959 +	/* init values */
16960 +	*sem = NULL;
16961 +	*t = NULL;
16962 +
16963 +	raw_spin_lock_irqsave(&which->lock, flags);
16964 +
16965 +	pending = litirq_pending_irqoff(which);
16966 +	if(pending)
16967 +	{
16968 +		if(which->current_owner != NULL)
16969 +		{
16970 +			*t = which->current_owner;
16971 +			*sem = &tsk_rt(which->current_owner)->klitirqd_sem;
16972 +		}
16973 +		else
16974 +		{
16975 +			BUG();
16976 +		}
16977 +	}
16978 +	raw_spin_unlock_irqrestore(&which->lock, flags);
16979 +
16980 +	if(likely(*sem))
16981 +	{
16982 +		return pending;
16983 +	}
16984 +	else
16985 +	{
16986 +		return 0;
16987 +	}
16988 +}
16989 +
16990 +/* returns true if the next piece of work to do is from a different owner.
16991 + */
16992 +static int tasklet_ownership_change(
16993 +				struct klitirqd_info* which,
16994 +				enum pending_flags taskletQ)
16995 +{
16996 +	/* this function doesn't have to look at work objects since they have
16997 +	   priority below tasklets. */
16998 +
16999 +    unsigned long flags;
17000 +    int ret = 0;
17001 +
17002 +    raw_spin_lock_irqsave(&which->lock, flags);
17003 +
17004 +	switch(taskletQ)
17005 +	{
17006 +	case LIT_TASKLET_HI:
17007 +		if(litirq_pending_hi_irqoff(which))
17008 +		{
17009 +			ret = (which->pending_tasklets_hi.head->owner !=
17010 +						which->current_owner);
17011 +		}
17012 +		break;
17013 +	case LIT_TASKLET_LOW:
17014 +		if(litirq_pending_low_irqoff(which))
17015 +		{
17016 +			ret = (which->pending_tasklets.head->owner !=
17017 +						which->current_owner);
17018 +		}
17019 +		break;
17020 +	default:
17021 +		break;
17022 +	}
17023 +
17024 +    raw_spin_unlock_irqrestore(&which->lock, flags);
17025 +
17026 +    TRACE_TASK(which->klitirqd, "ownership change needed: %d\n", ret);
17027 +
17028 +    return ret;
17029 +}
17030 +
17031 +
17032 +static void __reeval_prio(struct klitirqd_info* which)
17033 +{
17034 +    struct task_struct* next_owner = NULL;
17035 +	struct task_struct* klitirqd = which->klitirqd;
17036 +
17037 +	/* Check in prio-order */
17038 +	u32 pending = litirq_pending_irqoff(which);
17039 +
17040 +	//__dump_state(which, "__reeval_prio: before");
17041 +
17042 +	if(pending)
17043 +	{
17044 +		if(pending & LIT_TASKLET_HI)
17045 +		{
17046 +			next_owner = which->pending_tasklets_hi.head->owner;
17047 +		}
17048 +		else if(pending & LIT_TASKLET_LOW)
17049 +		{
17050 +			next_owner = which->pending_tasklets.head->owner;
17051 +		}
17052 +		else if(pending & LIT_WORK)
17053 +		{
17054 +			struct work_struct* work =
17055 +				list_first_entry(&which->worklist, struct work_struct, entry);
17056 +			next_owner = work->owner;
17057 +		}
17058 +	}
17059 +
17060 +	if(next_owner != which->current_owner)
17061 +	{
17062 +		struct task_struct* old_owner = which->current_owner;
17063 +
17064 +		/* bind the next owner. */
17065 +		which->current_owner = next_owner;
17066 +		mb();
17067 +
17068 +        if(next_owner != NULL)
17069 +        {
17070 +			if(!in_interrupt())
17071 +			{
17072 +				TRACE_CUR("%s: Ownership change: %s/%d to %s/%d\n", __FUNCTION__,
17073 +						((tsk_rt(klitirqd)->inh_task) ? tsk_rt(klitirqd)->inh_task : klitirqd)->comm,
17074 +						((tsk_rt(klitirqd)->inh_task) ? tsk_rt(klitirqd)->inh_task : klitirqd)->pid,
17075 +						next_owner->comm, next_owner->pid);
17076 +			}
17077 +			else
17078 +			{
17079 +				TRACE("%s: Ownership change: %s/%d to %s/%d\n", __FUNCTION__,
17080 +					((tsk_rt(klitirqd)->inh_task) ? tsk_rt(klitirqd)->inh_task : klitirqd)->comm,
17081 +					((tsk_rt(klitirqd)->inh_task) ? tsk_rt(klitirqd)->inh_task : klitirqd)->pid,
17082 +					next_owner->comm, next_owner->pid);
17083 +			}
17084 +
17085 +			litmus->increase_prio_inheritance_klitirqd(klitirqd, old_owner, next_owner);
17086 +        }
17087 +        else
17088 +        {
17089 +			if(likely(!in_interrupt()))
17090 +			{
17091 +				TRACE_CUR("%s: Ownership change: %s/%d to NULL (reverting)\n",
17092 +						__FUNCTION__, klitirqd->comm, klitirqd->pid);
17093 +			}
17094 +			else
17095 +			{
17096 +				// is this a bug?
17097 +				TRACE("%s: Ownership change: %s/%d to NULL (reverting)\n",
17098 +					__FUNCTION__, klitirqd->comm, klitirqd->pid);
17099 +			}
17100 +
17101 +			BUG_ON(pending != 0);
17102 +			litmus->decrease_prio_inheritance_klitirqd(klitirqd, old_owner, NULL);
17103 +        }
17104 +    }
17105 +
17106 +	//__dump_state(which, "__reeval_prio: after");
17107 +}
17108 +
17109 +static void reeval_prio(struct klitirqd_info* which)
17110 +{
17111 +    unsigned long flags;
17112 +
17113 +    raw_spin_lock_irqsave(&which->lock, flags);
17114 +    __reeval_prio(which);
17115 +    raw_spin_unlock_irqrestore(&which->lock, flags);
17116 +}
17117 +
17118 +
17119 +static void wakeup_litirqd_locked(struct klitirqd_info* which)
17120 +{
17121 +	/* Interrupts are disabled: no need to stop preemption */
17122 +	if (which && which->klitirqd)
17123 +	{
17124 +        __reeval_prio(which); /* configure the proper priority */
17125 +
17126 +		if(which->klitirqd->state != TASK_RUNNING)
17127 +		{
17128 +        	TRACE("%s: Waking up klitirqd: %s/%d\n", __FUNCTION__,
17129 +			  	which->klitirqd->comm, which->klitirqd->pid);
17130 +
17131 +			wake_up_process(which->klitirqd);
17132 +		}
17133 +    }
17134 +}
17135 +
17136 +
17137 +static void do_lit_tasklet(struct klitirqd_info* which,
17138 +						   struct tasklet_head* pending_tasklets)
17139 +{
17140 +    unsigned long flags;
17141 +	struct tasklet_struct *list;
17142 +	atomic_t* count;
17143 +
17144 +    raw_spin_lock_irqsave(&which->lock, flags);
17145 +
17146 +	//__dump_state(which, "do_lit_tasklet: before steal");
17147 +
17148 +	/* copy out the tasklets for our private use. */
17149 +	list = pending_tasklets->head;
17150 +	pending_tasklets->head = NULL;
17151 +	pending_tasklets->tail = &pending_tasklets->head;
17152 +
17153 +	/* remove pending flag */
17154 +	which->pending &= (pending_tasklets == &which->pending_tasklets) ?
17155 +		~LIT_TASKLET_LOW :
17156 +		~LIT_TASKLET_HI;
17157 +
17158 +	count = (pending_tasklets == &which->pending_tasklets) ?
17159 +		&which->num_low_pending:
17160 +		&which->num_hi_pending;
17161 +
17162 +	//__dump_state(which, "do_lit_tasklet: after steal");
17163 +
17164 +    raw_spin_unlock_irqrestore(&which->lock, flags);
17165 +
17166 +
17167 +    while(list)
17168 +    {
17169 +        struct tasklet_struct *t = list;
17170 +
17171 +        /* advance, lest we forget */
17172 +		list = list->next;
17173 +
17174 +        /* execute tasklet if it has my priority and is free */
17175 +		if ((t->owner == which->current_owner) && tasklet_trylock(t)) {
17176 +			if (!atomic_read(&t->count)) {
17177 +
17178 +				sched_trace_tasklet_begin(t->owner);
17179 +
17180 +				if (!test_and_clear_bit(TASKLET_STATE_SCHED, &t->state))
17181 +                {
17182 +					BUG();
17183 +                }
17184 +                TRACE_CUR("%s: Invoking tasklet.\n", __FUNCTION__);
17185 +				t->func(t->data);
17186 +				tasklet_unlock(t);
17187 +
17188 +				atomic_dec(count);
17189 +
17190 +				sched_trace_tasklet_end(t->owner, 0ul);
17191 +
17192 +				continue;  /* process more tasklets */
17193 +			}
17194 +			tasklet_unlock(t);
17195 +		}
17196 +
17197 +        TRACE_CUR("%s: Could not invoke tasklet.  Requeuing.\n", __FUNCTION__);
17198 +
17199 +		/* couldn't process tasklet.  put it back at the end of the queue. */
17200 +		if(pending_tasklets == &which->pending_tasklets)
17201 +			___litmus_tasklet_schedule(t, which, 0);
17202 +		else
17203 +			___litmus_tasklet_hi_schedule(t, which, 0);
17204 +    }
17205 +}
17206 +
17207 +
17208 +// returns 1 if priorities need to be changed to continue processing
17209 +// pending tasklets.
17210 +static int do_litirq(struct klitirqd_info* which)
17211 +{
17212 +    u32 pending;
17213 +    int resched = 0;
17214 +
17215 +    if(in_interrupt())
17216 +    {
17217 +        TRACE("%s: exiting early: in interrupt context!\n", __FUNCTION__);
17218 +        return(0);
17219 +    }
17220 +
17221 +	if(which->klitirqd != current)
17222 +	{
17223 +        TRACE_CUR("%s: exiting early: thread/info mismatch! Running %s/%d but given %s/%d.\n",
17224 +				  __FUNCTION__, current->comm, current->pid,
17225 +				  which->klitirqd->comm, which->klitirqd->pid);
17226 +        return(0);
17227 +	}
17228 +
17229 +    if(!is_realtime(current))
17230 +    {
17231 +        TRACE_CUR("%s: exiting early: klitirqd is not real-time. Sched Policy = %d\n",
17232 +				  __FUNCTION__, current->policy);
17233 +        return(0);
17234 +    }
17235 +
17236 +
17237 +    /* We only handle tasklets & work objects, no need for RCU triggers? */
17238 +
17239 +    pending = litirq_pending(which);
17240 +    if(pending)
17241 +    {
17242 +        /* extract the work to do and do it! */
17243 +        if(pending & LIT_TASKLET_HI)
17244 +        {
17245 +            TRACE_CUR("%s: Invoking HI tasklets.\n", __FUNCTION__);
17246 +            do_lit_tasklet(which, &which->pending_tasklets_hi);
17247 +            resched = tasklet_ownership_change(which, LIT_TASKLET_HI);
17248 +
17249 +            if(resched)
17250 +            {
17251 +                TRACE_CUR("%s: HI tasklets of another owner remain. "
17252 +						  "Skipping any LOW tasklets.\n", __FUNCTION__);
17253 +            }
17254 +        }
17255 +
17256 +        if(!resched && (pending & LIT_TASKLET_LOW))
17257 +        {
17258 +            TRACE_CUR("%s: Invoking LOW tasklets.\n", __FUNCTION__);
17259 +			do_lit_tasklet(which, &which->pending_tasklets);
17260 +			resched = tasklet_ownership_change(which, LIT_TASKLET_LOW);
17261 +
17262 +            if(resched)
17263 +            {
17264 +                TRACE_CUR("%s: LOW tasklets of another owner remain. "
17265 +						  "Skipping any work objects.\n", __FUNCTION__);
17266 +            }
17267 +        }
17268 +    }
17269 +
17270 +	return(resched);
17271 +}
17272 +
17273 +
17274 +static void do_work(struct klitirqd_info* which)
17275 +{
17276 +	unsigned long flags;
17277 +	work_func_t f;
17278 +	struct work_struct* work;
17279 +
17280 +	// only execute one work-queue item to yield to tasklets.
17281 +	// ...is this a good idea, or should we just batch them?
17282 +	raw_spin_lock_irqsave(&which->lock, flags);
17283 +
17284 +	if(!litirq_pending_work_irqoff(which))
17285 +	{
17286 +		raw_spin_unlock_irqrestore(&which->lock, flags);
17287 +		goto no_work;
17288 +	}
17289 +
17290 +	work = list_first_entry(&which->worklist, struct work_struct, entry);
17291 +	list_del_init(&work->entry);
17292 +
17293 +	if(list_empty(&which->worklist))
17294 +	{
17295 +		which->pending &= ~LIT_WORK;
17296 +	}
17297 +
17298 +	raw_spin_unlock_irqrestore(&which->lock, flags);
17299 +
17300 +
17301 +
17302 +	/* safe to read current_owner outside of lock since only this thread
17303 +	 may write to the pointer. */
17304 +	if(work->owner == which->current_owner)
17305 +	{
17306 +		TRACE_CUR("%s: Invoking work object.\n", __FUNCTION__);
17307 +		// do the work!
17308 +		work_clear_pending(work);
17309 +		f = work->func;
17310 +		f(work);  /* can't touch 'work' after this point,
17311 +				   the user may have freed it. */
17312 +
17313 +		atomic_dec(&which->num_work_pending);
17314 +	}
17315 +	else
17316 +	{
17317 +		TRACE_CUR("%s: Could not invoke work object.  Requeuing.\n",
17318 +				  __FUNCTION__);
17319 +		___litmus_schedule_work(work, which, 0);
17320 +	}
17321 +
17322 +no_work:
17323 +	return;
17324 +}
17325 +
17326 +
17327 +static int set_litmus_daemon_sched(void)
17328 +{
17329 +    /* set up a daemon job that will never complete.
17330 +       it should only ever run on behalf of another
17331 +       real-time task.
17332 +
17333 +       TODO: Transition to a new job whenever a
17334 +       new tasklet is handled */
17335 +
17336 +    int ret = 0;
17337 +
17338 +	struct rt_task tp = {
17339 +		.exec_cost = 0,
17340 +		.period = 1000000000, /* dummy 1 second period */
17341 +		.phase = 0,
17342 +		.cpu = task_cpu(current),
17343 +		.budget_policy = NO_ENFORCEMENT,
17344 +		.cls = RT_CLASS_BEST_EFFORT
17345 +	};
17346 +
17347 +	struct sched_param param = { .sched_priority = 0};
17348 +
17349 +
17350 +	/* set task params, mark as proxy thread, and init other data */
17351 +	tsk_rt(current)->task_params = tp;
17352 +	tsk_rt(current)->is_proxy_thread = 1;
17353 +	tsk_rt(current)->cur_klitirqd = NULL;
17354 +	mutex_init(&tsk_rt(current)->klitirqd_sem);
17355 +	atomic_set(&tsk_rt(current)->klitirqd_sem_stat, NOT_HELD);
17356 +
17357 +	/* inform the OS we're SCHED_LITMUS --
17358 +	   sched_setscheduler_nocheck() calls litmus_admit_task(). */
17359 +	sched_setscheduler_nocheck(current, SCHED_LITMUS, &param);
17360 +
17361 +    return ret;
17362 +}
17363 +
17364 +static void enter_execution_phase(struct klitirqd_info* which,
17365 +								  struct mutex* sem,
17366 +								  struct task_struct* t)
17367 +{
17368 +	TRACE_CUR("%s: Trying to enter execution phase. "
17369 +			  "Acquiring semaphore of %s/%d\n", __FUNCTION__,
17370 +			  t->comm, t->pid);
17371 +	down_and_set_stat(current, HELD, sem);
17372 +	TRACE_CUR("%s: Execution phase entered! "
17373 +			  "Acquired semaphore of %s/%d\n", __FUNCTION__,
17374 +			  t->comm, t->pid);
17375 +}
17376 +
17377 +static void exit_execution_phase(struct klitirqd_info* which,
17378 +								 struct mutex* sem,
17379 +								 struct task_struct* t)
17380 +{
17381 +	TRACE_CUR("%s: Exiting execution phase. "
17382 +			  "Releasing semaphore of %s/%d\n", __FUNCTION__,
17383 +			  t->comm, t->pid);
17384 +	if(atomic_read(&tsk_rt(current)->klitirqd_sem_stat) == HELD)
17385 +	{
17386 +		up_and_set_stat(current, NOT_HELD, sem);
17387 +		TRACE_CUR("%s: Execution phase exited! "
17388 +				  "Released semaphore of %s/%d\n", __FUNCTION__,
17389 +				  t->comm, t->pid);
17390 +	}
17391 +	else
17392 +	{
17393 +		TRACE_CUR("%s: COULDN'T RELEASE SEMAPHORE BECAUSE ONE IS NOT HELD!\n", __FUNCTION__);
17394 +	}
17395 +}
17396 +
17397 +/* main loop for klitirqd */
17398 +static int run_klitirqd(void* unused)
17399 +{
17400 +	struct klitirqd_info* which = &klitirqds[klitirqd_id(current)];
17401 +	struct mutex* sem;
17402 +	struct task_struct* owner;
17403 +
17404 +    int rt_status = set_litmus_daemon_sched();
17405 +
17406 +    if(rt_status != 0)
17407 +    {
17408 +        TRACE_CUR("%s: Failed to transition to rt-task.\n", __FUNCTION__);
17409 +        goto rt_failed;
17410 +    }
17411 +
17412 +	atomic_inc(&num_ready_klitirqds);
17413 +
17414 +	set_current_state(TASK_INTERRUPTIBLE);
17415 +
17416 +	while (!kthread_should_stop())
17417 +	{
17418 +		preempt_disable();
17419 +		if (!litirq_pending(which))
17420 +		{
17421 +            /* sleep for work */
17422 +            TRACE_CUR("%s: No more tasklets or work objects. Going to sleep.\n",
17423 +					  __FUNCTION__);
17424 +			preempt_enable_no_resched();
17425 +            schedule();
17426 +
17427 +			if(kthread_should_stop()) /* bail out */
17428 +			{
17429 +				TRACE_CUR("%s:%d: Signaled to terminate.\n", __FUNCTION__, __LINE__);
17430 +				continue;
17431 +			}
17432 +
17433 +			preempt_disable();
17434 +		}
17435 +
17436 +		__set_current_state(TASK_RUNNING);
17437 +
17438 +		while (litirq_pending_and_sem_and_owner(which, &sem, &owner))
17439 +		{
17440 +			int needs_resched = 0;
17441 +
17442 +			preempt_enable_no_resched();
17443 +
17444 +			BUG_ON(sem == NULL);
17445 +
17446 +			// wait to enter execution phase; wait for 'current_owner' to block.
17447 +			enter_execution_phase(which, sem, owner);
17448 +
17449 +			if(kthread_should_stop())
17450 +			{
17451 +				TRACE_CUR("%s:%d: Signaled to terminate.\n", __FUNCTION__, __LINE__);
17452 +				break;
17453 +			}
17454 +
17455 +			preempt_disable();
17456 +
17457 +			/* Double check that there's still pending work and the owner hasn't
17458 +			 * changed. Pending items may have been flushed while we were sleeping.
17459 +			 */
17460 +			if(litirq_pending_with_owner(which, owner))
17461 +			{
17462 +				TRACE_CUR("%s: Executing tasklets and/or work objects.\n",
17463 +						  __FUNCTION__);
17464 +
17465 +				needs_resched = do_litirq(which);
17466 +
17467 +				preempt_enable_no_resched();
17468 +
17469 +				// work objects are preemptible.
17470 +				if(!needs_resched)
17471 +				{
17472 +					do_work(which);
17473 +				}
17474 +
17475 +				// exit execution phase.
17476 +				exit_execution_phase(which, sem, owner);
17477 +
17478 +				TRACE_CUR("%s: Setting up next priority.\n", __FUNCTION__);
17479 +				reeval_prio(which); /* check if we need to change priority here */
17480 +			}
17481 +			else
17482 +			{
17483 +				TRACE_CUR("%s: Pending work was flushed!  Prev owner was %s/%d\n",
17484 +								__FUNCTION__,
17485 +								owner->comm, owner->pid);
17486 +				preempt_enable_no_resched();
17487 +
17488 +				// exit execution phase.
17489 +				exit_execution_phase(which, sem, owner);
17490 +			}
17491 +
17492 +			cond_resched();
17493 +			preempt_disable();
17494 +		}
17495 +		preempt_enable();
17496 +		set_current_state(TASK_INTERRUPTIBLE);
17497 +	}
17498 +	__set_current_state(TASK_RUNNING);
17499 +
17500 +	atomic_dec(&num_ready_klitirqds);
17501 +
17502 +rt_failed:
17503 +    litmus_exit_task(current);
17504 +
17505 +	return rt_status;
17506 +}
17507 +
17508 +
17509 +struct klitirqd_launch_data
17510 +{
17511 +	int* cpu_affinity;
17512 +	struct work_struct work;
17513 +};
17514 +
17515 +/* executed by a kworker from workqueues */
17516 +static void launch_klitirqd(struct work_struct *work)
17517 +{
17518 +    int i;
17519 +
17520 +	struct klitirqd_launch_data* launch_data =
17521 +		container_of(work, struct klitirqd_launch_data, work);
17522 +
17523 +    TRACE("%s: Creating %d klitirqds\n", __FUNCTION__, NR_LITMUS_SOFTIRQD);
17524 +
17525 +    /* create the daemon threads */
17526 +    for(i = 0; i < NR_LITMUS_SOFTIRQD; ++i)
17527 +    {
17528 +		if(launch_data->cpu_affinity)
17529 +		{
17530 +			klitirqds[i].klitirqd =
17531 +				kthread_create(
17532 +				   run_klitirqd,
17533 +				   /* treat the affinity as a pointer, we'll cast it back later */
17534 +				   (void*)(long long)launch_data->cpu_affinity[i],
17535 +				   "klitirqd_th%d/%d",
17536 +				   i,
17537 +				   launch_data->cpu_affinity[i]);
17538 +
17539 +			/* litmus will put it in the right cluster. */
17540 +			kthread_bind(klitirqds[i].klitirqd, launch_data->cpu_affinity[i]);
17541 +		}
17542 +		else
17543 +		{
17544 +			klitirqds[i].klitirqd =
17545 +				kthread_create(
17546 +				   run_klitirqd,
17547 +				   /* treat the affinity as a pointer, we'll cast it back later */
17548 +				   (void*)(long long)(-1),
17549 +				   "klitirqd_th%d",
17550 +				   i);
17551 +		}
17552 +    }
17553 +
17554 +    TRACE("%s: Launching %d klitirqds\n", __FUNCTION__, NR_LITMUS_SOFTIRQD);
17555 +
17556 +    /* unleash the daemons */
17557 +    for(i = 0; i < NR_LITMUS_SOFTIRQD; ++i)
17558 +    {
17559 +        wake_up_process(klitirqds[i].klitirqd);
17560 +    }
17561 +
17562 +	if(launch_data->cpu_affinity)
17563 +		kfree(launch_data->cpu_affinity);
17564 +	kfree(launch_data);
17565 +}
17566 +
17567 +
17568 +void spawn_klitirqd(int* affinity)
17569 +{
17570 +    int i;
17571 +    struct klitirqd_launch_data* delayed_launch;
17572 +
17573 +	if(atomic_read(&num_ready_klitirqds) != 0)
17574 +	{
17575 +		TRACE("%s: At least one klitirqd is already running! Need to call kill_klitirqd()?\n", __FUNCTION__);
17576 +		return;
17577 +	}
17578 +
17579 +    /* init the tasklet & work queues */
17580 +    for(i = 0; i < NR_LITMUS_SOFTIRQD; ++i)
17581 +    {
17582 +		klitirqds[i].terminating = 0;
17583 +		klitirqds[i].pending = 0;
17584 +
17585 +		klitirqds[i].num_hi_pending.counter = 0;
17586 +		klitirqds[i].num_low_pending.counter = 0;
17587 +		klitirqds[i].num_work_pending.counter = 0;
17588 +
17589 +        klitirqds[i].pending_tasklets_hi.head = NULL;
17590 +        klitirqds[i].pending_tasklets_hi.tail = &klitirqds[i].pending_tasklets_hi.head;
17591 +
17592 +        klitirqds[i].pending_tasklets.head = NULL;
17593 +        klitirqds[i].pending_tasklets.tail = &klitirqds[i].pending_tasklets.head;
17594 +
17595 +		INIT_LIST_HEAD(&klitirqds[i].worklist);
17596 +
17597 +		raw_spin_lock_init(&klitirqds[i].lock);
17598 +    }
17599 +
17600 +    /* make sure the initializations have been flushed to memory before
17601 +       other threads access them. */
17602 +    mb();
17603 +
17604 +    /* tell a work queue to launch the threads.  we can't make scheduling
17605 +       calls since we're in an atomic state. */
17606 +    TRACE("%s: Setting callback up to launch klitirqds\n", __FUNCTION__);
17607 +	delayed_launch = kmalloc(sizeof(struct klitirqd_launch_data), GFP_ATOMIC);
17608 +	if(affinity)
17609 +	{
17610 +		delayed_launch->cpu_affinity =
17611 +			kmalloc(sizeof(int)*NR_LITMUS_SOFTIRQD, GFP_ATOMIC);
17612 +
17613 +		memcpy(delayed_launch->cpu_affinity, affinity,
17614 +			sizeof(int)*NR_LITMUS_SOFTIRQD);
17615 +	}
17616 +	else
17617 +	{
17618 +		delayed_launch->cpu_affinity = NULL;
17619 +	}
17620 +    INIT_WORK(&delayed_launch->work, launch_klitirqd);
17621 +    schedule_work(&delayed_launch->work);
17622 +}
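+
+/* Illustrative sketch (assumption, not part of the original patch): a caller
+ * that wants the daemons pinned could build a CPU array and hand it to
+ * spawn_klitirqd(); passing NULL leaves the daemons unbound.  The simple
+ * modulo mapping below is a placeholder. */
+#if 0
+static void example_spawn_daemons(void)
+{
+	int affinity[NR_LITMUS_SOFTIRQD];
+	int i;
+
+	for (i = 0; i < NR_LITMUS_SOFTIRQD; ++i)
+		affinity[i] = i % num_online_cpus();
+
+	spawn_klitirqd(affinity);	/* or spawn_klitirqd(NULL) */
+}
+#endif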
17623 +
17624 +
17625 +void kill_klitirqd(void)
17626 +{
17627 +	if(!klitirqd_is_dead())
17628 +	{
17629 +    	int i;
17630 +
17631 +    	TRACE("%s: Killing %d klitirqds\n", __FUNCTION__, NR_LITMUS_SOFTIRQD);
17632 +
17633 +    	for(i = 0; i < NR_LITMUS_SOFTIRQD; ++i)
17634 +    	{
17635 +			if(klitirqds[i].terminating != 1)
17636 +			{
17637 +				klitirqds[i].terminating = 1;
17638 +				mb(); /* just to be sure? */
17639 +				flush_pending(klitirqds[i].klitirqd, NULL);
17640 +
17641 +				/* signal termination */
17642 +       			kthread_stop(klitirqds[i].klitirqd);
17643 +			}
17644 +    	}
17645 +	}
17646 +}
17647 +
17648 +
17649 +int klitirqd_is_ready(void)
17650 +{
17651 +	return(atomic_read(&num_ready_klitirqds) == NR_LITMUS_SOFTIRQD);
17652 +}
17653 +
17654 +int klitirqd_is_dead(void)
17655 +{
17656 +	return(atomic_read(&num_ready_klitirqds) == 0);
17657 +}
17658 +
17659 +
17660 +struct task_struct* get_klitirqd(unsigned int k_id)
17661 +{
17662 +	return(klitirqds[k_id].klitirqd);
17663 +}
17664 +
17665 +
17666 +void flush_pending(struct task_struct* klitirqd_thread,
17667 +				   struct task_struct* owner)
17668 +{
17669 +	unsigned int k_id = klitirqd_id(klitirqd_thread);
17670 +	struct klitirqd_info *which = &klitirqds[k_id];
17671 +
17672 +	unsigned long flags;
17673 +	struct tasklet_struct *list;
17674 +
17675 +	u32 work_flushed = 0;
17676 +
17677 +	raw_spin_lock_irqsave(&which->lock, flags);
17678 +
17679 +	//__dump_state(which, "flush_pending: before");
17680 +
17681 +	// flush hi tasklets.
17682 +	if(litirq_pending_hi_irqoff(which))
17683 +	{
17684 +		which->pending &= ~LIT_TASKLET_HI;
17685 +
17686 +		list = which->pending_tasklets_hi.head;
17687 +		which->pending_tasklets_hi.head = NULL;
17688 +		which->pending_tasklets_hi.tail = &which->pending_tasklets_hi.head;
17689 +
17690 +		TRACE("%s: Handing HI tasklets back to Linux.\n", __FUNCTION__);
17691 +
17692 +		while(list)
17693 +		{
17694 +			struct tasklet_struct *t = list;
17695 +			list = list->next;
17696 +
17697 +			if(likely((t->owner == owner) || (owner == NULL)))
17698 +			{
17699 +				if(unlikely(!test_and_clear_bit(TASKLET_STATE_SCHED, &t->state)))
17700 +				{
17701 +					BUG();
17702 +				}
17703 +
17704 +				work_flushed |= LIT_TASKLET_HI;
17705 +
17706 +				t->owner = NULL;
17707 +
17708 +				// WTF?
17709 +				if(!test_and_set_bit(TASKLET_STATE_SCHED, &t->state))
17710 +				{
17711 +					atomic_dec(&which->num_hi_pending);
17712 +					___tasklet_hi_schedule(t);
17713 +				}
17714 +				else
17715 +				{
17716 +					TRACE("%s: dropped hi tasklet??\n", __FUNCTION__);
17717 +					BUG();
17718 +				}
17719 +			}
17720 +			else
17721 +			{
17722 +				TRACE("%s: Could not flush a HI tasklet.\n", __FUNCTION__);
17723 +				// put back on queue.
17724 +				___litmus_tasklet_hi_schedule(t, which, 0);
17725 +			}
17726 +		}
17727 +	}
17728 +
17729 +	// flush low tasklets.
17730 +	if(litirq_pending_low_irqoff(which))
17731 +	{
17732 +		which->pending &= ~LIT_TASKLET_LOW;
17733 +
17734 +		list = which->pending_tasklets.head;
17735 +		which->pending_tasklets.head = NULL;
17736 +		which->pending_tasklets.tail = &which->pending_tasklets.head;
17737 +
17738 +		TRACE("%s: Handing LOW tasklets back to Linux.\n", __FUNCTION__);
17739 +
17740 +		while(list)
17741 +		{
17742 +			struct tasklet_struct *t = list;
17743 +			list = list->next;
17744 +
17745 +			if(likely((t->owner == owner) || (owner == NULL)))
17746 +			{
17747 +				if(unlikely(!test_and_clear_bit(TASKLET_STATE_SCHED, &t->state)))
17748 +				{
17749 +					BUG();
17750 +				}
17751 +
17752 +				work_flushed |= LIT_TASKLET_LOW;
17753 +
17754 +				t->owner = NULL;
17755 +				sched_trace_tasklet_end(owner, 1ul);
17756 +
17757 +				if(!test_and_set_bit(TASKLET_STATE_SCHED, &t->state))
17758 +				{
17759 +					atomic_dec(&which->num_low_pending);
17760 +					___tasklet_schedule(t);
17761 +				}
17762 +				else
17763 +				{
17764 +					TRACE("%s: dropped tasklet??\n", __FUNCTION__);
17765 +					BUG();
17766 +				}
17767 +			}
17768 +			else
17769 +			{
17770 +				TRACE("%s: Could not flush a LOW tasklet.\n", __FUNCTION__);
17771 +				// put back on queue
17772 +				___litmus_tasklet_schedule(t, which, 0);
17773 +			}
17774 +		}
17775 +	}
17776 +
17777 +	// flush work objects
17778 +	if(litirq_pending_work_irqoff(which))
17779 +	{
17780 +		which->pending &= ~LIT_WORK;
17781 +
17782 +		TRACE("%s: Handing work objects back to Linux.\n", __FUNCTION__);
17783 +
17784 +		while(!list_empty(&which->worklist))
17785 +		{
17786 +			struct work_struct* work =
17787 +				list_first_entry(&which->worklist, struct work_struct, entry);
17788 +			list_del_init(&work->entry);
17789 +
17790 +			if(likely((work->owner == owner) || (owner == NULL)))
17791 +			{
17792 +				work_flushed |= LIT_WORK;
17793 +				atomic_dec(&which->num_work_pending);
17794 +
17795 +				work->owner = NULL;
17796 +				sched_trace_work_end(owner, current, 1ul);
17797 +				__schedule_work(work);
17798 +			}
17799 +			else
17800 +			{
17801 +				TRACE("%s: Could not flush a work object.\n", __FUNCTION__);
17802 +				// put back on queue
17803 +				___litmus_schedule_work(work, which, 0);
17804 +			}
17805 +		}
17806 +	}
17807 +
17808 +	//__dump_state(which, "flush_pending: after (before reeval prio)");
17809 +
17810 +
17811 +	mb(); /* commit changes to pending flags */
17812 +
17813 +	/* reset the scheduling priority */
17814 +	if(work_flushed)
17815 +	{
17816 +		__reeval_prio(which);
17817 +
17818 +		/* Try to offload flushed tasklets to Linux's ksoftirqd. */
17819 +		if(work_flushed & (LIT_TASKLET_LOW | LIT_TASKLET_HI))
17820 +		{
17821 +			wakeup_softirqd();
17822 +		}
17823 +	}
17824 +	else
17825 +	{
17826 +		TRACE_CUR("%s: no work flushed, so __reeval_prio() skipped\n", __FUNCTION__);
17827 +	}
17828 +
17829 +	raw_spin_unlock_irqrestore(&which->lock, flags);
17830 +}
17831 +
17832 +
17833 +
17834 +
17835 +static void ___litmus_tasklet_schedule(struct tasklet_struct *t,
17836 +									   struct klitirqd_info *which,
17837 +									   int wakeup)
17838 +{
17839 +	unsigned long flags;
17840 +	u32 old_pending;
17841 +
17842 +	t->next = NULL;
17843 +
17844 +    raw_spin_lock_irqsave(&which->lock, flags);
17845 +
17846 +	//__dump_state(which, "___litmus_tasklet_schedule: before queuing");
17847 +
17848 +    *(which->pending_tasklets.tail) = t;
17849 +    which->pending_tasklets.tail = &t->next;
17850 +
17851 +	old_pending = which->pending;
17852 +	which->pending |= LIT_TASKLET_LOW;
17853 +
17854 +	atomic_inc(&which->num_low_pending);
17855 +
17856 +	mb();
17857 +
17858 +	if(!old_pending && wakeup)
17859 +	{
17860 +		wakeup_litirqd_locked(which); /* wake up the klitirqd */
17861 +	}
17862 +
17863 +	//__dump_state(which, "___litmus_tasklet_schedule: after queuing");
17864 +
17865 +    raw_spin_unlock_irqrestore(&which->lock, flags);
17866 +}
17867 +
17868 +int __litmus_tasklet_schedule(struct tasklet_struct *t, unsigned int k_id)
17869 +{
17870 +	int ret = 0; /* assume failure */
17871 +    if(unlikely((t->owner == NULL) || !is_realtime(t->owner)))
17872 +    {
17873 +        TRACE("%s: No owner associated with this tasklet!\n", __FUNCTION__);
17874 +        BUG();
17875 +    }
17876 +
17877 +    if(unlikely(k_id >= NR_LITMUS_SOFTIRQD))
17878 +    {
17879 +        TRACE("%s: No klitirqd_th%d!\n", __FUNCTION__, k_id);
17880 +        BUG();
17881 +    }
17882 +
17883 +	if(likely(!klitirqds[k_id].terminating))
17884 +	{
17885 +		/* Can't accept tasklets while we're processing a workqueue
17886 +		   because they're handled by the same thread. This case is
17887 +		   very RARE.
17888 +
17889 +		   TODO: Use a separate thread for work objects!!!!!!
17890 +         */
17891 +		if(likely(atomic_read(&klitirqds[k_id].num_work_pending) == 0))
17892 +		{
17893 +			ret = 1;
17894 +			___litmus_tasklet_schedule(t, &klitirqds[k_id], 1);
17895 +		}
17896 +		else
17897 +		{
17898 +			TRACE("%s: rejected tasklet because of pending work.\n",
17899 +						__FUNCTION__);
17900 +		}
17901 +	}
17902 +	return(ret);
17903 +}
17904 +
17905 +EXPORT_SYMBOL(__litmus_tasklet_schedule);
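+
+/* Illustrative sketch (hypothetical driver code, not part of the original
+ * patch): an interrupt bottom half is deferred to a klitirqd by tagging the
+ * tasklet with the real-time task it serves (the 'owner' field added to
+ * struct tasklet_struct by this patch) and calling __litmus_tasklet_schedule().
+ * 'my_tasklet', 'device_owner', and 'target_daemon' are placeholders. */
+#if 0
+static void example_defer_tasklet(struct tasklet_struct* my_tasklet,
+								  struct task_struct* device_owner,
+								  unsigned int target_daemon)
+{
+	my_tasklet->owner = device_owner;
+	if (test_and_set_bit(TASKLET_STATE_SCHED, &my_tasklet->state))
+		return;	/* already scheduled elsewhere */
+	if (!__litmus_tasklet_schedule(my_tasklet, target_daemon)) {
+		/* rejected (pending work or daemon terminating): hand back to Linux */
+		my_tasklet->owner = NULL;
+		clear_bit(TASKLET_STATE_SCHED, &my_tasklet->state);
+		tasklet_schedule(my_tasklet);
+	}
+}
+#endif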
17906 +
17907 +
17908 +static void ___litmus_tasklet_hi_schedule(struct tasklet_struct *t,
17909 +									   struct klitirqd_info *which,
17910 +									   int wakeup)
17911 +{
17912 +	unsigned long flags;
17913 +	u32 old_pending;
17914 +
17915 +	t->next = NULL;
17916 +
17917 +    raw_spin_lock_irqsave(&which->lock, flags);
17918 +
17919 +    *(which->pending_tasklets_hi.tail) = t;
17920 +    which->pending_tasklets_hi.tail = &t->next;
17921 +
17922 +	old_pending = which->pending;
17923 +	which->pending |= LIT_TASKLET_HI;
17924 +
17925 +	atomic_inc(&which->num_hi_pending);
17926 +
17927 +	mb();
17928 +
17929 +	if(!old_pending && wakeup)
17930 +	{
17931 +		wakeup_litirqd_locked(which); /* wake up the klitirqd */
17932 +	}
17933 +
17934 +    raw_spin_unlock_irqrestore(&which->lock, flags);
17935 +}
17936 +
17937 +int __litmus_tasklet_hi_schedule(struct tasklet_struct *t, unsigned int k_id)
17938 +{
17939 +	int ret = 0; /* assume failure */
17940 +    if(unlikely((t->owner == NULL) || !is_realtime(t->owner)))
17941 +    {
17942 +        TRACE("%s: No owner associated with this tasklet!\n", __FUNCTION__);
17943 +        BUG();
17944 +    }
17945 +
17946 +    if(unlikely(k_id >= NR_LITMUS_SOFTIRQD))
17947 +    {
17948 +        TRACE("%s: No klitirqd_th%d!\n", __FUNCTION__, k_id);
17949 +        BUG();
17950 +    }
17951 +
17952 +    if(unlikely(!klitirqd_is_ready()))
17953 +    {
17954 +        TRACE("%s: klitirqd is not ready!\n", __FUNCTION__);
17955 +        BUG();
17956 +    }
17957 +
17958 +	if(likely(!klitirqds[k_id].terminating))
17959 +	{
17960 +		if(likely(atomic_read(&klitirqds[k_id].num_work_pending) == 0))
17961 +		{
17962 +			ret = 1;
17963 +			___litmus_tasklet_hi_schedule(t, &klitirqds[k_id], 1);
17964 +		}
17965 +		else
17966 +		{
17967 +			TRACE("%s: rejected tasklet because of pending work.\n",
17968 +						__FUNCTION__);
17969 +		}
17970 +	}
17971 +	return(ret);
17972 +}
17973 +
17974 +EXPORT_SYMBOL(__litmus_tasklet_hi_schedule);
17975 +
17976 +
17977 +int __litmus_tasklet_hi_schedule_first(struct tasklet_struct *t, unsigned int k_id)
17978 +{
17979 +	int ret = 0; /* assume failure */
17980 +	u32 old_pending;
17981 +
17982 +	BUG_ON(!irqs_disabled());
17983 +
17984 +    if(unlikely((t->owner == NULL) || !is_realtime(t->owner)))
17985 +    {
17986 +        TRACE("%s: No owner associated with this tasklet!\n", __FUNCTION__);
17987 +        BUG();
17988 +    }
17989 +
17990 +    if(unlikely(k_id >= NR_LITMUS_SOFTIRQD))
17991 +    {
17992 +        TRACE("%s: No klitirqd_th%u!\n", __FUNCTION__, k_id);
17993 +        BUG();
17994 +    }
17995 +
17996 +    if(unlikely(!klitirqd_is_ready()))
17997 +    {
17998 +        TRACE("%s: klitirqd is not ready!\n", __FUNCTION__);
17999 +        BUG();
18000 +    }
18001 +
18002 +	if(likely(!klitirqds[k_id].terminating))
18003 +	{
18004 +    	raw_spin_lock(&klitirqds[k_id].lock);
18005 +
18006 +		if(likely(atomic_read(&klitirqds[k_id].num_work_pending) == 0))
18007 +		{
18008 +			ret = 1;  // success!
18009 +
18010 +			t->next = klitirqds[k_id].pending_tasklets_hi.head;
18011 +    		klitirqds[k_id].pending_tasklets_hi.head = t;
18012 +
18013 +			old_pending = klitirqds[k_id].pending;
18014 +			klitirqds[k_id].pending |= LIT_TASKLET_HI;
18015 +
18016 +			atomic_inc(&klitirqds[k_id].num_hi_pending);
18017 +
18018 +			mb();
18019 +
18020 +			if(!old_pending)
18021 +    			wakeup_litirqd_locked(&klitirqds[k_id]); /* wake up the klitirqd */
18022 +		}
18023 +		else
18024 +		{
18025 +			TRACE("%s: rejected tasklet because of pending work.\n",
18026 +					__FUNCTION__);
18027 +		}
18028 +
18029 +    	raw_spin_unlock(&klitirqds[k_id].lock);
18030 +	}
18031 +	return(ret);
18032 +}
18033 +
18034 +EXPORT_SYMBOL(__litmus_tasklet_hi_schedule_first);
18035 +
18036 +
18037 +
18038 +static void ___litmus_schedule_work(struct work_struct *w,
18039 +									struct klitirqd_info *which,
18040 +									int wakeup)
18041 +{
18042 +	unsigned long flags;
18043 +	u32 old_pending;
18044 +
18045 +	raw_spin_lock_irqsave(&which->lock, flags);
18046 +
18047 +	work_pending(w);
18048 +	list_add_tail(&w->entry, &which->worklist);
18049 +
18050 +	old_pending = which->pending;
18051 +	which->pending |= LIT_WORK;
18052 +
18053 +	atomic_inc(&which->num_work_pending);
18054 +
18055 +	mb();
18056 +
18057 +	if(!old_pending && wakeup)
18058 +	{
18059 +		wakeup_litirqd_locked(which); /* wakeup the klitirqd */
18060 +	}
18061 +
18062 +	raw_spin_unlock_irqrestore(&which->lock, flags);
18063 +}
18064 +
18065 +int __litmus_schedule_work(struct work_struct *w, unsigned int k_id)
18066 +{
18067 +	int ret = 1; /* assume success */
18068 +	if(unlikely(w->owner == NULL) || !is_realtime(w->owner))
18069 +	{
18070 +		TRACE("%s: No owner associated with this work object!\n", __FUNCTION__);
18071 +		BUG();
18072 +	}
18073 +
18074 +	if(unlikely(k_id >= NR_LITMUS_SOFTIRQD))
18075 +	{
18076 +		TRACE("%s: No klitirqd_th%u!\n", __FUNCTION__, k_id);
18077 +		BUG();
18078 +	}
18079 +
18080 +    if(unlikely(!klitirqd_is_ready()))
18081 +    {
18082 +        TRACE("%s: klitirqd is not ready!\n", __FUNCTION__);
18083 +        BUG();
18084 +    }
18085 +
18086 +	if(likely(!klitirqds[k_id].terminating))
18087 +		___litmus_schedule_work(w, &klitirqds[k_id], 1);
18088 +	else
18089 +		ret = 0;
18090 +	return(ret);
18091 +}
18092 +EXPORT_SYMBOL(__litmus_schedule_work);
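+
+/* Illustrative sketch (hypothetical, not part of the original patch):
+ * deferring a work item mirrors the tasklet case -- set w->owner to the
+ * real-time task being served and hand the item to a klitirqd; on rejection,
+ * fall back to a regular Linux workqueue.  All names are placeholders. */
+#if 0
+static void example_defer_work(struct work_struct* w,
+							   struct task_struct* device_owner,
+							   unsigned int target_daemon)
+{
+	w->owner = device_owner;
+	if (!__litmus_schedule_work(w, target_daemon)) {
+		w->owner = NULL;
+		schedule_work(w);	/* daemon is terminating */
+	}
+}
+#endif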
18093 +
18094 +
18095 +static int set_klitirqd_sem_status(unsigned long stat)
18096 +{
18097 +	TRACE_CUR("SETTING STATUS FROM %d TO %d\n",
18098 +					atomic_read(&tsk_rt(current)->klitirqd_sem_stat),
18099 +					stat);
18100 +	atomic_set(&tsk_rt(current)->klitirqd_sem_stat, stat);
18101 +	//mb();
18102 +
18103 +	return(0);
18104 +}
18105 +
18106 +static int set_klitirqd_sem_status_if_not_held(unsigned long stat)
18107 +{
18108 +	if(atomic_read(&tsk_rt(current)->klitirqd_sem_stat) != HELD)
18109 +	{
18110 +		return(set_klitirqd_sem_status(stat));
18111 +	}
18112 +	return(-1);
18113 +}
18114 +
18115 +
18116 +void __down_and_reset_and_set_stat(struct task_struct* t,
18117 +					   enum klitirqd_sem_status to_reset,
18118 +					   enum klitirqd_sem_status to_set,
18119 +					   struct mutex* sem)
18120 +{
18121 +#if 0
18122 +	struct rt_param* param = container_of(sem, struct rt_param, klitirqd_sem);
18123 +	struct task_struct* task = container_of(param, struct task_struct, rt_param);
18124 +
18125 +	TRACE_CUR("%s: entered.  Locking semaphore of %s/%d\n",
18126 +					__FUNCTION__, task->comm, task->pid);
18127 +#endif
18128 +
18129 +	mutex_lock_sfx(sem,
18130 +				   set_klitirqd_sem_status_if_not_held, to_reset,
18131 +				   set_klitirqd_sem_status, to_set);
18132 +#if 0
18133 +	TRACE_CUR("%s: exiting.  Have semaphore of %s/%d\n",
18134 +					__FUNCTION__, task->comm, task->pid);
18135 +#endif
18136 +}
18137 +
18138 +void down_and_set_stat(struct task_struct* t,
18139 +					   enum klitirqd_sem_status to_set,
18140 +					   struct mutex* sem)
18141 +{
18142 +#if 0
18143 +	struct rt_param* param = container_of(sem, struct rt_param, klitirqd_sem);
18144 +	struct task_struct* task = container_of(param, struct task_struct, rt_param);
18145 +
18146 +	TRACE_CUR("%s: entered.  Locking semaphore of %s/%d\n",
18147 +					__FUNCTION__, task->comm, task->pid);
18148 +#endif
18149 +
18150 +	mutex_lock_sfx(sem,
18151 +				   NULL, 0,
18152 +				   set_klitirqd_sem_status, to_set);
18153 +
18154 +#if 0
18155 +	TRACE_CUR("%s: exiting.  Have semaphore of %s/%d\n",
18156 +					__FUNCTION__, task->comm, task->pid);
18157 +#endif
18158 +}
18159 +
18160 +
18161 +void up_and_set_stat(struct task_struct* t,
18162 +					 enum klitirqd_sem_status to_set,
18163 +					 struct mutex* sem)
18164 +{
18165 +#if 0
18166 +	struct rt_param* param = container_of(sem, struct rt_param, klitirqd_sem);
18167 +	struct task_struct* task = container_of(param, struct task_struct, rt_param);
18168 +
18169 +	TRACE_CUR("%s: entered.  Unlocking semaphore of %s/%d\n",
18170 +					__FUNCTION__,
18171 +					task->comm, task->pid);
18172 +#endif
18173 +
18174 +	mutex_unlock_sfx(sem, NULL, 0,
18175 +					 set_klitirqd_sem_status, to_set);
18176 +
18177 +#if 0
18178 +	TRACE_CUR("%s: exiting.  Unlocked semaphore of %s/%d\n",
18179 +					__FUNCTION__,
18180 +					task->comm, task->pid);
18181 +#endif
18182 +}
18183 +
18184 +
18185 +
18186 +void release_klitirqd_lock(struct task_struct* t)
18187 +{
18188 +	if(is_realtime(t) && (atomic_read(&tsk_rt(t)->klitirqd_sem_stat) == HELD))
18189 +	{
18190 +		struct mutex* sem;
18191 +		struct task_struct* owner = t;
18192 +
18193 +		if(t->state == TASK_RUNNING)
18194 +		{
18195 +			TRACE_TASK(t, "NOT giving up klitirqd_sem because we're not blocked!\n");
18196 +			return;
18197 +		}
18198 +
18199 +		if(likely(!tsk_rt(t)->is_proxy_thread))
18200 +		{
18201 +			sem = &tsk_rt(t)->klitirqd_sem;
18202 +		}
18203 +		else
18204 +		{
18205 +			unsigned int k_id = klitirqd_id(t);
18206 +			owner = klitirqds[k_id].current_owner;
18207 +
18208 +			BUG_ON(t != klitirqds[k_id].klitirqd);
18209 +
18210 +			if(likely(owner))
18211 +			{
18212 +				sem = &tsk_rt(owner)->klitirqd_sem;
18213 +			}
18214 +			else
18215 +			{
18216 +				BUG();
18217 +
18218 +				// We had the rug pulled out from under us.  Abort attempt
18219 +				// to reacquire the lock since our client no longer needs us.
18220 +				TRACE_CUR("HUH?!  How did this happen?\n");
18221 +				atomic_set(&tsk_rt(t)->klitirqd_sem_stat, NOT_HELD);
18222 +				return;
18223 +			}
18224 +		}
18225 +
18226 +		//TRACE_CUR("Releasing semaphore of %s/%d...\n", owner->comm, owner->pid);
18227 +		up_and_set_stat(t, NEED_TO_REACQUIRE, sem);
18228 +		//TRACE_CUR("Semaphore of %s/%d released!\n", owner->comm, owner->pid);
18229 +	}
18230 +	/*
18231 +	else if(is_realtime(t))
18232 +	{
18233 +		TRACE_CUR("%s: Nothing to do.  Stat = %d\n", __FUNCTION__, tsk_rt(t)->klitirqd_sem_stat);
18234 +	}
18235 +	*/
18236 +}
18237 +
18238 +int reacquire_klitirqd_lock(struct task_struct* t)
18239 +{
18240 +	int ret = 0;
18241 +
18242 +	if(is_realtime(t) && (atomic_read(&tsk_rt(t)->klitirqd_sem_stat) == NEED_TO_REACQUIRE))
18243 +	{
18244 +		struct mutex* sem;
18245 +		struct task_struct* owner = t;
18246 +
18247 +		if(likely(!tsk_rt(t)->is_proxy_thread))
18248 +		{
18249 +			sem = &tsk_rt(t)->klitirqd_sem;
18250 +		}
18251 +		else
18252 +		{
18253 +			unsigned int k_id = klitirqd_id(t);
18254 +			//struct task_struct* owner = klitirqds[k_id].current_owner;
18255 +			owner = klitirqds[k_id].current_owner;
18256 +
18257 +			BUG_ON(t != klitirqds[k_id].klitirqd);
18258 +
18259 +			if(likely(owner))
18260 +			{
18261 +				sem = &tsk_rt(owner)->klitirqd_sem;
18262 +			}
18263 +			else
18264 +			{
18265 +				// We had the rug pulled out from under us.  Abort attempt
18266 +				// to reacquire the lock since our client no longer needs us.
18267 +				TRACE_CUR("No longer needs to reacquire klitirqd_sem!\n");
18268 +				atomic_set(&tsk_rt(t)->klitirqd_sem_stat, NOT_HELD);
18269 +				return(0);
18270 +			}
18271 +		}
18272 +
18273 +		//TRACE_CUR("Trying to reacquire semaphore of %s/%d\n", owner->comm, owner->pid);
18274 +		__down_and_reset_and_set_stat(t, REACQUIRING, HELD, sem);
18275 +		//TRACE_CUR("Reacquired semaphore %s/%d\n", owner->comm, owner->pid);
18276 +	}
18277 +	/*
18278 +	else if(is_realtime(t))
18279 +	{
18280 +		TRACE_CUR("%s: Nothing to do.  Stat = %d\n", __FUNCTION__, tsk_rt(t)->klitirqd_sem_stat);
18281 +	}
18282 +	*/
18283 +
18284 +	return(ret);
18285 +}
18286 +
18287 diff --git a/litmus/locking.c b/litmus/locking.c
18288 new file mode 100644
18289 index 0000000..718a5a3
18290 --- /dev/null
18291 +++ b/litmus/locking.c
18292 @@ -0,0 +1,524 @@
18293 +#include <litmus/fdso.h>
18294 +
18295 +#ifdef CONFIG_LITMUS_LOCKING
18296 +
18297 +#include <litmus/sched_plugin.h>
18298 +#include <litmus/trace.h>
18299 +#include <litmus/litmus.h>
18300 +
18301 +#ifdef CONFIG_LITMUS_DGL_SUPPORT
18302 +#include <linux/uaccess.h>
18303 +#endif
18304 +
18305 +#if defined(CONFIG_LITMUS_AFFINITY_LOCKING) && defined(CONFIG_LITMUS_NVIDIA)
18306 +#include <litmus/gpu_affinity.h>
18307 +#endif
18308 +
18309 +static int create_generic_lock(void** obj_ref, obj_type_t type, void* __user arg);
18310 +static int open_generic_lock(struct od_table_entry* entry, void* __user arg);
18311 +static int close_generic_lock(struct od_table_entry* entry);
18312 +static void destroy_generic_lock(obj_type_t type, void* sem);
18313 +
18314 +struct fdso_ops generic_lock_ops = {
18315 +	.create  = create_generic_lock,
18316 +	.open    = open_generic_lock,
18317 +	.close   = close_generic_lock,
18318 +	.destroy = destroy_generic_lock
18319 +};
18320 +
18321 +static atomic_t lock_id_gen = ATOMIC_INIT(0);
18322 +
18323 +
18324 +static inline bool is_lock(struct od_table_entry* entry)
18325 +{
18326 +	return entry->class == &generic_lock_ops;
18327 +}
18328 +
18329 +static inline struct litmus_lock* get_lock(struct od_table_entry* entry)
18330 +{
18331 +	BUG_ON(!is_lock(entry));
18332 +	return (struct litmus_lock*) entry->obj->obj;
18333 +}
18334 +
18335 +static  int create_generic_lock(void** obj_ref, obj_type_t type, void* __user arg)
18336 +{
18337 +	struct litmus_lock* lock;
18338 +	int err;
18339 +
18340 +	err = litmus->allocate_lock(&lock, type, arg);
18341 +	if (err == 0) {
18342 +#ifdef CONFIG_LITMUS_NESTED_LOCKING
18343 +		lock->nest.lock = lock;
18344 +		lock->nest.hp_waiter_eff_prio = NULL;
18345 +
18346 +		INIT_BINHEAP_NODE(&lock->nest.hp_binheap_node);
18347 +		if(!lock->nest.hp_waiter_ptr) {
18348 +			TRACE_CUR("BEWARE: hp_waiter_ptr should probably not be NULL in "
18349 +					  "most uses. (exception: IKGLP donors)\n");
18350 +		}
18351 +#endif
18352 +		lock->type = type;
18353 +		lock->ident = atomic_inc_return(&lock_id_gen);
18354 +		*obj_ref = lock;
18355 +    }
18356 +	return err;
18357 +}
18358 +
18359 +static int open_generic_lock(struct od_table_entry* entry, void* __user arg)
18360 +{
18361 +	struct litmus_lock* lock = get_lock(entry);
18362 +	if (lock->ops->open)
18363 +		return lock->ops->open(lock, arg);
18364 +	else
18365 +		return 0; /* default: any task can open it */
18366 +}
18367 +
18368 +static int close_generic_lock(struct od_table_entry* entry)
18369 +{
18370 +	struct litmus_lock* lock = get_lock(entry);
18371 +	if (lock->ops->close)
18372 +		return lock->ops->close(lock);
18373 +	else
18374 +		return 0; /* default: closing succeeds */
18375 +}
18376 +
18377 +static void destroy_generic_lock(obj_type_t type, void* obj)
18378 +{
18379 +	struct litmus_lock* lock = (struct litmus_lock*) obj;
18380 +	lock->ops->deallocate(lock);
18381 +}
18382 +
18383 +asmlinkage long sys_litmus_lock(int lock_od)
18384 +{
18385 +	long err = -EINVAL;
18386 +	struct od_table_entry* entry;
18387 +	struct litmus_lock* l;
18388 +
18389 +	TS_LOCK_START;
18390 +
18391 +	entry = get_entry_for_od(lock_od);
18392 +	if (entry && is_lock(entry)) {
18393 +		l = get_lock(entry);
18394 +		//TRACE_CUR("attempts to lock 0x%p\n", l);
18395 +		TRACE_CUR("attempts to lock %d\n", l->ident);
18396 +		err = l->ops->lock(l);
18397 +	}
18398 +
18399 +	/* Note: task may have been suspended or preempted in between!  Take
18400 +	 * this into account when computing overheads. */
18401 +	TS_LOCK_END;
18402 +
18403 +	return err;
18404 +}
18405 +
18406 +asmlinkage long sys_litmus_unlock(int lock_od)
18407 +{
18408 +	long err = -EINVAL;
18409 +	struct od_table_entry* entry;
18410 +	struct litmus_lock* l;
18411 +
18412 +	TS_UNLOCK_START;
18413 +
18414 +	entry = get_entry_for_od(lock_od);
18415 +	if (entry && is_lock(entry)) {
18416 +		l = get_lock(entry);
18417 +		//TRACE_CUR("attempts to unlock 0x%p\n", l);
18418 +		TRACE_CUR("attempts to unlock %d\n", l->ident);
18419 +		err = l->ops->unlock(l);
18420 +	}
18421 +
18422 +	/* Note: task may have been preempted in between!  Take this into
18423 +	 * account when computing overheads. */
18424 +	TS_UNLOCK_END;
18425 +
18426 +	return err;
18427 +}
18428 +
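+/* Dequeue and return the first waiter on wq (NULL if the queue is empty).
+ * No locking is done here; callers serialize access to the wait queue
+ * themselves. */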
18429 +struct task_struct* __waitqueue_remove_first(wait_queue_head_t *wq)
18430 +{
18431 +	wait_queue_t* q;
18432 +	struct task_struct* t = NULL;
18433 +
18434 +	if (waitqueue_active(wq)) {
18435 +		q = list_entry(wq->task_list.next,
18436 +			       wait_queue_t, task_list);
18437 +		t = (struct task_struct*) q->private;
18438 +		__remove_wait_queue(wq, q);
18439 +	}
18440 +	return(t);
18441 +}
18442 +
18443 +#ifdef CONFIG_LITMUS_NESTED_LOCKING
18444 +
18445 +void print_hp_waiters(struct binheap_node* n, int depth)
18446 +{
18447 +	struct litmus_lock *l;
18448 +	struct nested_info *nest;
18449 +	char padding[81] = "                                                                                ";
18450 +	struct task_struct *hp = NULL;
18451 +	struct task_struct *hp_eff = NULL;
18452 +	struct task_struct *node_prio = NULL;
18453 +
18454 +
18455 +	if(n == NULL) {
18456 +		TRACE("+-> %p\n", NULL);
18457 +		return;
18458 +	}
18459 +
18460 +	nest = binheap_entry(n, struct nested_info, hp_binheap_node);
18461 +	l = nest->lock;
18462 +
18463 +	if(depth*2 <= 80)
18464 +		padding[depth*2] = '\0';
18465 +
18466 +	if(nest->hp_waiter_ptr && *(nest->hp_waiter_ptr)) {
18467 +		hp = *(nest->hp_waiter_ptr);
18468 +
18469 +		if(tsk_rt(hp)->inh_task) {
18470 +			hp_eff = tsk_rt(hp)->inh_task;
18471 +		}
18472 +	}
18473 +
18474 +	node_prio = nest->hp_waiter_eff_prio;
18475 +
18476 +	TRACE("%s+-> %s/%d [waiter = %s/%d] [waiter's inh = %s/%d] (lock = %d)\n",
18477 +		  padding,
18478 +		  (node_prio) ? node_prio->comm : "nil",
18479 +		  (node_prio) ? node_prio->pid : -1,
18480 +		  (hp) ? hp->comm : "nil",
18481 +		  (hp) ? hp->pid : -1,
18482 +		  (hp_eff) ? hp_eff->comm : "nil",
18483 +		  (hp_eff) ? hp_eff->pid : -1,
18484 +		  l->ident);
18485 +
18486 +    if(n->left) print_hp_waiters(n->left, depth+1);
18487 +    if(n->right) print_hp_waiters(n->right, depth+1);
18488 +}
18489 +#endif
18490 +
18491 +
18492 +#ifdef CONFIG_LITMUS_DGL_SUPPORT
18493 +
18494 +void select_next_lock(dgl_wait_state_t* dgl_wait /*, struct litmus_lock* prev_lock*/)
18495 +{
18496 +	/*
18497 +	 We pick the next lock in reverse order. This causes inheritance propagation
18498 +	 from locks received earlier to flow in the same direction as regular nested
18499 +	 locking. This might make fine-grain DGL easier in the future.
18500 +	 */
18501 +
18502 +	BUG_ON(tsk_rt(dgl_wait->task)->blocked_lock);
18503 +
18504 +	//WARN_ON(dgl_wait->locks[dgl_wait->last_primary] != prev_lock);
18505 +
18506 +	// note reverse order
18507 +	for(dgl_wait->last_primary = dgl_wait->last_primary - 1;
18508 +		dgl_wait->last_primary >= 0;
18509 +		--(dgl_wait->last_primary)){
18510 +		if(!dgl_wait->locks[dgl_wait->last_primary]->ops->is_owner(
18511 +				dgl_wait->locks[dgl_wait->last_primary], dgl_wait->task)) {
18512 +
18513 +			tsk_rt(dgl_wait->task)->blocked_lock =
18514 +					dgl_wait->locks[dgl_wait->last_primary];
18515 +			mb();
18516 +
18517 +			TRACE_CUR("New blocked lock is %d\n",
18518 +					  dgl_wait->locks[dgl_wait->last_primary]->ident);
18519 +
18520 +			break;
18521 +		}
18522 +	}
18523 +}
18524 +
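+/* Wake-up callback installed on DGL wait-queue nodes (see
+ * init_dgl_waitqueue_entry() below).  Its only purpose is to let
+ * __waitqueue_dgl_remove_first() tell DGL waiters apart from ordinary
+ * waiters; DGL waiters are woken explicitly by the lock code, so this
+ * function should never actually run. */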
18525 +int dgl_wake_up(wait_queue_t *wq_node, unsigned mode, int sync, void *key)
18526 +{
18527 +	// should never be called.
18528 +	BUG();
18529 +	return 1;
18530 +}
18531 +
18532 +void __waitqueue_dgl_remove_first(wait_queue_head_t *wq,
18533 +								  dgl_wait_state_t** dgl_wait,
18534 +								  struct task_struct **task)
18535 +{
18536 +	wait_queue_t *q;
18537 +
18538 +	*dgl_wait = NULL;
18539 +	*task = NULL;
18540 +
18541 +	if (waitqueue_active(wq)) {
18542 +		q = list_entry(wq->task_list.next,
18543 +					   wait_queue_t, task_list);
18544 +
18545 +		if(q->func == dgl_wake_up) {
18546 +			*dgl_wait = (dgl_wait_state_t*) q->private;
18547 +		}
18548 +		else {
18549 +			*task = (struct task_struct*) q->private;
18550 +		}
18551 +
18552 +		__remove_wait_queue(wq, q);
18553 +	}
18554 +}
18555 +
18556 +void init_dgl_waitqueue_entry(wait_queue_t *wq_node, dgl_wait_state_t* dgl_wait)
18557 +{
18558 +	init_waitqueue_entry(wq_node, dgl_wait->task);
18559 +	wq_node->private = dgl_wait;
18560 +	wq_node->func = dgl_wake_up;
18561 +}
18562 +
18563 +
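+/* Acquire all locks of a DGL under the coarse-grained dgl_lock: first try
+ * every lock without blocking; if any remain unavailable, enable priority
+ * inheritance on them (in reverse order) and suspend on the highest-indexed
+ * lock not yet owned (the "last primary").  By the time the task is woken,
+ * it holds the entire group. */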
18564 +static long do_litmus_dgl_lock(dgl_wait_state_t *dgl_wait)
18565 +{
18566 +	int i;
18567 +	unsigned long irqflags; //, dummyflags;
18568 +	raw_spinlock_t *dgl_lock = litmus->get_dgl_spinlock(dgl_wait->task);
18569 +
18570 +	BUG_ON(dgl_wait->task != current);
18571 +
18572 +	raw_spin_lock_irqsave(dgl_lock, irqflags);
18573 +
18574 +
18575 +	dgl_wait->nr_remaining = dgl_wait->size;
18576 +
18577 +	TRACE_CUR("Locking DGL with size %d\n", dgl_wait->size);
18578 +
18579 +	// try to acquire each lock.  enqueue (non-blocking) if it is unavailable.
18580 +	for(i = 0; i < dgl_wait->size; ++i) {
18581 +		struct litmus_lock *l = dgl_wait->locks[i];
18582 +
18583 +		// dgl_lock() must set task state to TASK_UNINTERRUPTIBLE if task blocks.
18584 +
18585 +		if(l->ops->dgl_lock(l, dgl_wait, &dgl_wait->wq_nodes[i])) {
18586 +			--(dgl_wait->nr_remaining);
18587 +			TRACE_CUR("Acquired lock %d immediately.\n", l->ident);
18588 +		}
18589 +	}
18590 +
18591 +	if(dgl_wait->nr_remaining == 0) {
18592 +		// acquired entire group immediately
18593 +		TRACE_CUR("Acquired all locks in DGL immediately!\n");
18594 +	}
18595 +	else {
18596 +
18597 +		TRACE_CUR("As many as %d locks in DGL are pending. Suspending.\n",
18598 +				  dgl_wait->nr_remaining);
18599 +
18600 +#if defined(CONFIG_LITMUS_AFFINITY_LOCKING) && defined(CONFIG_LITMUS_NVIDIA)
18601 +		// KLUDGE: don't count this suspension as time in the gpu
18602 +		// critical section
18603 +		if(tsk_rt(dgl_wait->task)->held_gpus) {
18604 +			tsk_rt(dgl_wait->task)->suspend_gpu_tracker_on_block = 1;
18605 +		}
18606 +#endif
18607 +
18608 +		// note reverse order.  see comments in select_next_lock for reason.
18609 +		for(i = dgl_wait->size - 1; i >= 0; --i) {
18610 +			struct litmus_lock *l = dgl_wait->locks[i];
18611 +			if(!l->ops->is_owner(l, dgl_wait->task)) {  // double-check to be thread safe
18612 +
18613 +				TRACE_CUR("Activating priority inheritance on lock %d\n",
18614 +						  l->ident);
18615 +
18616 +				TS_DGL_LOCK_SUSPEND;
18617 +
18618 +				l->ops->enable_priority(l, dgl_wait);
18619 +				dgl_wait->last_primary = i;
18620 +
18621 +				TRACE_CUR("Suspending for lock %d\n", l->ident);
18622 +
18623 +				raw_spin_unlock_irqrestore(dgl_lock, irqflags);  // free dgl_lock before suspending
18624 +
18625 +				schedule();  // suspend!!!
18626 +
18627 +				TS_DGL_LOCK_RESUME;
18628 +
18629 +				TRACE_CUR("Woken up from DGL suspension.\n");
18630 +
18631 +				goto all_acquired;  // we should hold all locks when we wake up.
18632 +			}
18633 +		}
18634 +
18635 +		TRACE_CUR("Didn't have to suspend after all, but calling schedule() anyway.\n");
18636 +		//BUG();
18637 +	}
18638 +
18639 +	raw_spin_unlock_irqrestore(dgl_lock, irqflags);
18640 +
18641 +all_acquired:
18642 +
18643 +	// SANITY CHECK FOR TESTING
18644 +//	for(i = 0; i < dgl_wait->size; ++i) {
18645 +//		struct litmus_lock *l = dgl_wait->locks[i];
18646 +//		BUG_ON(!l->ops->is_owner(l, dgl_wait->task));
18647 +//	}
18648 +
18649 +	TRACE_CUR("Acquired entire DGL\n");
18650 +
18651 +	return 0;
18652 +}
18653 +
18654 +static int supports_dgl(struct litmus_lock *l)
18655 +{
18656 +	struct litmus_lock_ops* ops = l->ops;
18657 +
18658 +	return (ops->dgl_lock			&&
18659 +			ops->is_owner			&&
18660 +			ops->enable_priority);
18661 +}
18662 +
18663 +asmlinkage long sys_litmus_dgl_lock(void* __user usr_dgl_ods, int dgl_size)
18664 +{
18665 +	struct task_struct *t = current;
18666 +	long err = -EINVAL;
18667 +	int dgl_ods[MAX_DGL_SIZE];
18668 +	int i;
18669 +
18670 +	dgl_wait_state_t dgl_wait_state;  // lives on the stack until all resources in DGL are held.
18671 +
18672 +	if(dgl_size > MAX_DGL_SIZE || dgl_size < 1)
18673 +		goto out;
18674 +
18675 +	if(!access_ok(VERIFY_READ, usr_dgl_ods, dgl_size*(sizeof(int))))
18676 +		goto out;
18677 +
18678 +	if(__copy_from_user(&dgl_ods, usr_dgl_ods, dgl_size*(sizeof(int))))
18679 +		goto out;
18680 +
18681 +	if (!is_realtime(t)) {
18682 +		err = -EPERM;
18683 +		goto out;
18684 +	}
18685 +
18686 +	for(i = 0; i < dgl_size; ++i) {
18687 +		struct od_table_entry *entry = get_entry_for_od(dgl_ods[i]);
18688 +		if(entry && is_lock(entry)) {
18689 +			dgl_wait_state.locks[i] = get_lock(entry);
18690 +			if(!supports_dgl(dgl_wait_state.locks[i])) {
18691 +				TRACE_CUR("Lock %d does not support all required DGL operations.\n",
18692 +						  dgl_wait_state.locks[i]->ident);
18693 +				goto out;
18694 +			}
18695 +		}
18696 +		else {
18697 +			TRACE_CUR("Invalid lock identifier\n");
18698 +			goto out;
18699 +		}
18700 +	}
18701 +
18702 +	dgl_wait_state.task = t;
18703 +	dgl_wait_state.size = dgl_size;
18704 +
18705 +	TS_DGL_LOCK_START;
18706 +	err = do_litmus_dgl_lock(&dgl_wait_state);
18707 +
18708 +	/* Note: task may have been suspended or preempted in between!  Take
18709 +	 * this into account when computing overheads. */
18710 +	TS_DGL_LOCK_END;
18711 +
18712 +out:
18713 +	return err;
18714 +}
18715 +
18716 +static long do_litmus_dgl_unlock(struct litmus_lock* dgl_locks[], int dgl_size)
18717 +{
18718 +	int i;
18719 +	long err = 0;
18720 +
18721 +	TRACE_CUR("Unlocking a DGL of size %d\n", dgl_size);
18722 +
18723 +	for(i = dgl_size - 1; i >= 0; --i) {  // unlock in reverse order
18724 +
18725 +		struct litmus_lock *l = dgl_locks[i];
18726 +		long tmp_err;
18727 +
18728 +		TRACE_CUR("Unlocking lock %d of DGL.\n", l->ident);
18729 +
18730 +		tmp_err = l->ops->unlock(l);
18731 +
18732 +		if(tmp_err) {
18733 +			TRACE_CUR("There was an error unlocking %d: %d.\n", l->ident, tmp_err);
18734 +			err = tmp_err;
18735 +		}
18736 +	}
18737 +
18738 +	TRACE_CUR("DGL unlocked. err = %d\n", err);
18739 +
18740 +	return err;
18741 +}
18742 +
18743 +asmlinkage long sys_litmus_dgl_unlock(void* __user usr_dgl_ods, int dgl_size)
18744 +{
18745 +	long err = -EINVAL;
18746 +	int dgl_ods[MAX_DGL_SIZE];
18747 +	struct od_table_entry* entry;
18748 +	int i;
18749 +
18750 +	struct litmus_lock* dgl_locks[MAX_DGL_SIZE];
18751 +
18752 +	if(dgl_size > MAX_DGL_SIZE || dgl_size < 1)
18753 +		goto out;
18754 +
18755 +	if(!access_ok(VERIFY_READ, usr_dgl_ods, dgl_size*(sizeof(int))))
18756 +		goto out;
18757 +
18758 +	if(__copy_from_user(&dgl_ods, usr_dgl_ods, dgl_size*(sizeof(int))))
18759 +		goto out;
18760 +
18761 +	for(i = 0; i < dgl_size; ++i) {
18762 +		entry = get_entry_for_od(dgl_ods[i]);
18763 +		if(entry && is_lock(entry)) {
18764 +			dgl_locks[i] = get_lock(entry);
18765 +			if(!supports_dgl(dgl_locks[i])) {
18766 +				TRACE_CUR("Lock %d does not support all required DGL operations.\n",
18767 +						  dgl_locks[i]->ident);
18768 +				goto out;
18769 +			}
18770 +		}
18771 +		else {
18772 +			TRACE_CUR("Invalid lock identifier\n");
18773 +			goto out;
18774 +		}
18775 +	}
18776 +
18777 +	TS_DGL_UNLOCK_START;
18778 +	err = do_litmus_dgl_unlock(dgl_locks, dgl_size);
18779 +
18780 +	/* Note: task may have been suspended or preempted in between!  Take
18781 +	 * this into account when computing overheads. */
18782 +	TS_DGL_UNLOCK_END;
18783 +
18784 +out:
18785 +	return err;
18786 +}
18787 +
18788 +#else  // CONFIG_LITMUS_DGL_SUPPORT
18789 +
18790 +asmlinkage long sys_litmus_dgl_lock(void* __user usr_dgl_ods, int dgl_size)
18791 +{
18792 +	return -ENOSYS;
18793 +}
18794 +
18795 +asmlinkage long sys_litmus_dgl_unlock(void* __user usr_dgl_ods, int dgl_size)
18796 +{
18797 +	return -ENOSYS;
18798 +}
18799 +
18800 +#endif
18801 +
18802 +#else  // CONFIG_LITMUS_LOCKING
18803 +
18804 +struct fdso_ops generic_lock_ops = {};
18805 +
18806 +asmlinkage long sys_litmus_lock(int sem_od)
18807 +{
18808 +	return -ENOSYS;
18809 +}
18810 +
18811 +asmlinkage long sys_litmus_unlock(int sem_od)
18812 +{
18813 +	return -ENOSYS;
18814 +}
18815 +
18816 +#endif
18817 diff --git a/litmus/nvidia_info.c b/litmus/nvidia_info.c
18818 new file mode 100644
18819 index 0000000..4b86a50
18820 --- /dev/null
18821 +++ b/litmus/nvidia_info.c
18822 @@ -0,0 +1,597 @@
18823 +#include <linux/module.h>
18824 +#include <linux/semaphore.h>
18825 +#include <linux/pci.h>
18826 +
18827 +#include <litmus/sched_trace.h>
18828 +#include <litmus/nvidia_info.h>
18829 +#include <litmus/litmus.h>
18830 +
18831 +#include <litmus/sched_plugin.h>
18832 +
18833 +#include <litmus/binheap.h>
18834 +
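+/* The following typedefs and structs appear to shadow the NVIDIA driver's
+ * internal device-state structures so that fields such as device_num can be
+ * read from tasklet/work data.  The layout is driver-version dependent
+ * (note the CONFIG_CUDA_4_0 conditional below). */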
18835 +typedef unsigned char      NvV8;  /* "void": enumerated or multiple fields   */
18836 +typedef unsigned short     NvV16; /* "void": enumerated or multiple fields   */
18837 +typedef unsigned char      NvU8;  /* 0 to 255                                */
18838 +typedef unsigned short     NvU16; /* 0 to 65535                              */
18839 +typedef signed char        NvS8;  /* -128 to 127                             */
18840 +typedef signed short       NvS16; /* -32768 to 32767                         */
18841 +typedef float              NvF32; /* IEEE Single Precision (S1E8M23)         */
18842 +typedef double             NvF64; /* IEEE Double Precision (S1E11M52)        */
18843 +typedef unsigned int       NvV32; /* "void": enumerated or multiple fields   */
18844 +typedef unsigned int       NvU32; /* 0 to 4294967295                         */
18845 +typedef unsigned long long NvU64; /* 0 to 18446744073709551615          */
18846 +typedef union
18847 +{
18848 +    volatile NvV8 Reg008[1];
18849 +    volatile NvV16 Reg016[1];
18850 +    volatile NvV32 Reg032[1];
18851 +} litmus_nv_hwreg_t, * litmus_nv_phwreg_t;
18852 +
18853 +typedef struct
18854 +{
18855 +    NvU64 address;
18856 +    NvU64 size;
18857 +    NvU32 offset;
18858 +    NvU32 *map;
18859 +    litmus_nv_phwreg_t map_u;
18860 +} litmus_nv_aperture_t;
18861 +
18862 +typedef struct
18863 +{
18864 +    void  *priv;                    /* private data */
18865 +    void  *os_state;                /* os-specific device state */
18866 +
18867 +    int    rmInitialized;
18868 +    int    flags;
18869 +
18870 +    /* PCI config info */
18871 +    NvU32 domain;
18872 +    NvU16 bus;
18873 +    NvU16 slot;
18874 +    NvU16 vendor_id;
18875 +    NvU16 device_id;
18876 +    NvU16 subsystem_id;
18877 +    NvU32 gpu_id;
18878 +    void *handle;
18879 +
18880 +    NvU32 pci_cfg_space[16];
18881 +
18882 +    /* physical characteristics */
18883 +    litmus_nv_aperture_t bars[3];
18884 +    litmus_nv_aperture_t *regs;
18885 +    litmus_nv_aperture_t *fb, ud;
18886 +    litmus_nv_aperture_t agp;
18887 +
18888 +    NvU32  interrupt_line;
18889 +
18890 +    NvU32 agp_config;
18891 +    NvU32 agp_status;
18892 +
18893 +    NvU32 primary_vga;
18894 +
18895 +    NvU32 sim_env;
18896 +
18897 +    NvU32 rc_timer_enabled;
18898 +
18899 +    /* list of events allocated for this device */
18900 +    void *event_list;
18901 +
18902 +    void *kern_mappings;
18903 +
18904 +} litmus_nv_state_t;
18905 +
18906 +typedef struct work_struct litmus_nv_task_t;
18907 +
18908 +typedef struct litmus_nv_work_s {
18909 +    litmus_nv_task_t task;
18910 +    void *data;
18911 +} litmus_nv_work_t;
18912 +
18913 +typedef struct litmus_nv_linux_state_s {
18914 +    litmus_nv_state_t nv_state;
18915 +    atomic_t usage_count;
18916 +
18917 +    struct pci_dev *dev;
18918 +    void *agp_bridge;
18919 +    void *alloc_queue;
18920 +
18921 +    void *timer_sp;
18922 +    void *isr_sp;
18923 +    void *pci_cfgchk_sp;
18924 +    void *isr_bh_sp;
18925 +
18926 +#ifdef CONFIG_CUDA_4_0
18927 +	char registry_keys[512];
18928 +#endif
18929 +
18930 +    /* keep track of any pending bottom halves */
18931 +    struct tasklet_struct tasklet;
18932 +    litmus_nv_work_t work;
18933 +
18934 +    /* get a timer callback every second */
18935 +    struct timer_list rc_timer;
18936 +
18937 +    /* lock for linux-specific data, not used by core rm */
18938 +    struct semaphore ldata_lock;
18939 +
18940 +    /* lock for linux-specific alloc queue */
18941 +    struct semaphore at_lock;
18942 +
18943 +#if 0
18944 +#if defined(NV_USER_MAP)
18945 +    /* list of user mappings */
18946 +    struct nv_usermap_s *usermap_list;
18947 +
18948 +    /* lock for VMware-specific mapping list */
18949 +    struct semaphore mt_lock;
18950 +#endif /* defined(NV_USER_MAP) */
18951 +#if defined(NV_PM_SUPPORT_OLD_STYLE_APM)
18952 +	void *apm_nv_dev;
18953 +#endif
18954 +#endif
18955 +
18956 +    NvU32 device_num;
18957 +    struct litmus_nv_linux_state_s *next;
18958 +} litmus_nv_linux_state_t;
18959 +
18960 +void dump_nvidia_info(const struct tasklet_struct *t)
18961 +{
18962 +	litmus_nv_state_t* nvstate = NULL;
18963 +	litmus_nv_linux_state_t* linuxstate =  NULL;
18964 +	struct pci_dev* pci = NULL;
18965 +
18966 +	nvstate = (litmus_nv_state_t*)(t->data);
18967 +
18968 +	if(nvstate)
18969 +	{
18970 +		TRACE("NV State:\n"
18971 +			  "\ttasklet ptr = %p\n"
18972 +			  "\tstate ptr = %p\n"
18973 +			  "\tprivate data ptr = %p\n"
18974 +			  "\tos state ptr = %p\n"
18975 +			  "\tdomain = %u\n"
18976 +			  "\tbus = %u\n"
18977 +			  "\tslot = %u\n"
18978 +			  "\tvendor_id = %u\n"
18979 +			  "\tdevice_id = %u\n"
18980 +			  "\tsubsystem_id = %u\n"
18981 +			  "\tgpu_id = %u\n"
18982 +			  "\tinterrupt_line = %u\n",
18983 +			  t,
18984 +			  nvstate,
18985 +			  nvstate->priv,
18986 +			  nvstate->os_state,
18987 +			  nvstate->domain,
18988 +			  nvstate->bus,
18989 +			  nvstate->slot,
18990 +			  nvstate->vendor_id,
18991 +			  nvstate->device_id,
18992 +			  nvstate->subsystem_id,
18993 +			  nvstate->gpu_id,
18994 +			  nvstate->interrupt_line);
18995 +
18996 +		linuxstate = container_of(nvstate, litmus_nv_linux_state_t, nv_state);
18997 +	}
18998 +	else
18999 +	{
19000 +		TRACE("INVALID NVSTATE????\n");
19001 +	}
19002 +
19003 +	if(linuxstate)
19004 +	{
19005 +		int ls_offset = (void*)(&(linuxstate->device_num)) - (void*)(linuxstate);
19006 +		int ns_offset_raw = (void*)(&(linuxstate->device_num)) - (void*)(&(linuxstate->nv_state));
19007 +		int ns_offset_desired = (void*)(&(linuxstate->device_num)) - (void*)(nvstate);
19008 +
19009 +
19010 +		TRACE("LINUX NV State:\n"
19011 +			  "\tlinux nv state ptr: %p\n"
19012 +			  "\taddress of tasklet: %p\n"
19013 +			  "\taddress of work: %p\n"
19014 +			  "\tusage_count: %d\n"
19015 +			  "\tdevice_num: %u\n"
19016 +			  "\ttasklet addr == this tasklet: %d\n"
19017 +			  "\tpci: %p\n",
19018 +			  linuxstate,
19019 +			  &(linuxstate->tasklet),
19020 +			  &(linuxstate->work),
19021 +			  atomic_read(&(linuxstate->usage_count)),
19022 +			  linuxstate->device_num,
19023 +			  (t == &(linuxstate->tasklet)),
19024 +			  linuxstate->dev);
19025 +
19026 +		pci = linuxstate->dev;
19027 +
19028 +		TRACE("Offsets:\n"
19029 +			  "\tOffset from LinuxState: %d, %x\n"
19030 +			  "\tOffset from NVState: %d, %x\n"
19031 +			  "\tOffset from parameter: %d, %x\n"
19032 +			  "\tdevice_num: %u\n",
19033 +			  ls_offset, ls_offset,
19034 +			  ns_offset_raw, ns_offset_raw,
19035 +			  ns_offset_desired, ns_offset_desired,
19036 +			  *((u32*)((void*)nvstate + ns_offset_desired)));
19037 +	}
19038 +	else
19039 +	{
19040 +		TRACE("INVALID LINUXNVSTATE?????\n");
19041 +	}
19042 +
19043 +#if 0
19044 +	if(pci)
19045 +	{
19046 +		TRACE("PCI DEV Info:\n"
19047 +			  "pci device ptr: %p\n"
19048 +			  "\tdevfn = %d\n"
19049 +			  "\tvendor = %d\n"
19050 +			  "\tdevice = %d\n"
19051 +			  "\tsubsystem_vendor = %d\n"
19052 +			  "\tsubsystem_device = %d\n"
19053 +			  "\tslot # = %d\n",
19054 +			  pci,
19055 +			  pci->devfn,
19056 +			  pci->vendor,
19057 +			  pci->device,
19058 +			  pci->subsystem_vendor,
19059 +			  pci->subsystem_device,
19060 +			  pci->slot->number);
19061 +	}
19062 +	else
19063 +	{
19064 +		TRACE("INVALID PCIDEV PTR?????\n");
19065 +	}
19066 +#endif
19067 +}
19068 +
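+/* Reference to the loaded "nvidia" module, looked up in init_nvidia_info().
+ * is_nvidia_func() uses it to decide whether a callback address lies within
+ * the driver's core code. */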
19069 +static struct module* nvidia_mod = NULL;
19070 +int init_nvidia_info(void)
19071 +{
19072 +	mutex_lock(&module_mutex);
19073 +	nvidia_mod = find_module("nvidia");
19074 +	mutex_unlock(&module_mutex);
19075 +	if(nvidia_mod != NULL)
19076 +	{
19077 +		TRACE("%s : Found NVIDIA module. Core Code: %p to %p\n", __FUNCTION__,
19078 +			  (void*)(nvidia_mod->module_core),
19079 +			  (void*)(nvidia_mod->module_core) + nvidia_mod->core_size);
19080 +		init_nv_device_reg();
19081 +		return(0);
19082 +	}
19083 +	else
19084 +	{
19085 +		TRACE("%s : Could not find NVIDIA module!  Loaded?\n", __FUNCTION__);
19086 +		return(-1);
19087 +	}
19088 +}
19089 +
19090 +void shutdown_nvidia_info(void)
19091 +{
19092 +	nvidia_mod = NULL;
19093 +	mb();
19094 +}
19095 +
19096 +/* works with pointers to static data inside the module too. */
19097 +int is_nvidia_func(void* func_addr)
19098 +{
19099 +	int ret = 0;
19100 +	if(nvidia_mod)
19101 +	{
19102 +		ret = within_module_core((long unsigned int)func_addr, nvidia_mod);
19103 +		/*
19104 +		if(ret)
19105 +		{
19106 +			TRACE("%s : %p is in NVIDIA module: %d\n",
19107 +			  	__FUNCTION__, func_addr, ret);
19108 +		}*/
19109 +	}
19110 +
19111 +	return(ret);
19112 +}
19113 +
19114 +u32 get_tasklet_nv_device_num(const struct tasklet_struct *t)
19115 +{
19116 +	// life is too short to use hard-coded offsets.  update this later.
19117 +	litmus_nv_state_t* nvstate = (litmus_nv_state_t*)(t->data);
19118 +	litmus_nv_linux_state_t* linuxstate = container_of(nvstate, litmus_nv_linux_state_t, nv_state);
19119 +
19120 +	BUG_ON(linuxstate->device_num >= NV_DEVICE_NUM);
19121 +
19122 +	return(linuxstate->device_num);
19123 +
19124 +	//int DEVICE_NUM_OFFSET = (void*)(&(linuxstate->device_num)) - (void*)(nvstate);
19125 +
19126 +#if 0
19127 +	// offset determined through observed behavior of the NV driver.
19128 +	//const int DEVICE_NUM_OFFSET = 0x480;  // CUDA 4.0 RC1
19129 +	//const int DEVICE_NUM_OFFSET = 0x510;  // CUDA 4.0 RC2
19130 +
19131 +	void* state = (void*)(t->data);
19132 +	void* device_num_ptr = state + DEVICE_NUM_OFFSET;
19133 +
19134 +	//dump_nvidia_info(t);
19135 +	return(*((u32*)device_num_ptr));
19136 +#endif
19137 +}
19138 +
19139 +u32 get_work_nv_device_num(const struct work_struct *t)
19140 +{
19141 +	// offset determined through observed behavior of the NV driver.
19142 +	const int DEVICE_NUM_OFFSET = sizeof(struct work_struct);
19143 +	void* state = (void*)(t);
19144 +	void** device_num_ptr = state + DEVICE_NUM_OFFSET;
19145 +	return(*((u32*)(*device_num_ptr)));
19146 +}
19147 +
19148 +
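+/* Per-GPU registry: tracks up to NV_MAX_SIMULT_USERS tasks that have
+ * registered the device and caches the highest-priority owner
+ * (max_prio_owner), which the code below keeps up to date as owners
+ * register, unregister, or change priority. */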
19149 +typedef struct {
19150 +	raw_spinlock_t	lock;
19151 +	int	nr_owners;
19152 +	struct task_struct* max_prio_owner;
19153 +	struct task_struct*	owners[NV_MAX_SIMULT_USERS];
19154 +}nv_device_registry_t;
19155 +
19156 +static nv_device_registry_t NV_DEVICE_REG[NV_DEVICE_NUM];
19157 +
19158 +int init_nv_device_reg(void)
19159 +{
19160 +	int i;
19161 +
19162 +	memset(NV_DEVICE_REG, 0, sizeof(NV_DEVICE_REG));
19163 +
19164 +	for(i = 0; i < NV_DEVICE_NUM; ++i)
19165 +	{
19166 +		raw_spin_lock_init(&NV_DEVICE_REG[i].lock);
19167 +	}
19168 +
19169 +	return(1);
19170 +}
19171 +
19172 +/* Get the nv_device_id for a given owner.
19173 + (returns -1 if no associated device id can be found) */
19174 +/*
19175 +int get_nv_device_id(struct task_struct* owner)
19176 +{
19177 +	int i;
19178 +	if(!owner)
19179 +	{
19180 +		return(-1);
19181 +	}
19182 +	for(i = 0; i < NV_DEVICE_NUM; ++i)
19183 +	{
19184 +		if(NV_DEVICE_REG[i].device_owner == owner)
19185 +			return(i);
19186 +	}
19187 +	return(-1);
19188 +}
19189 +*/
19190 +
19191 +static struct task_struct* find_hp_owner(nv_device_registry_t *reg, struct task_struct *skip) {
19192 +	int i;
19193 +	struct task_struct *found = NULL;
19194 +	for(i = 0; i < reg->nr_owners; ++i) {
19195 +		if(reg->owners[i] && reg->owners[i] != skip && litmus->compare(reg->owners[i], found)) {
19196 +			found = reg->owners[i];
19197 +		}
19198 +	}
19199 +	return found;
19200 +}
19201 +
19202 +#ifdef CONFIG_LITMUS_PAI_SOFTIRQD
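+/* Re-evaluate reg->max_prio_owner after task t's priority has increased:
+ * the cheap unlocked check is repeated under reg->lock before the PAI
+ * tasklet priority is switched over to t (double-checked locking). */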
19203 +void pai_check_priority_increase(struct task_struct *t, int reg_device_id)
19204 +{
19205 +	unsigned long flags;
19206 +	nv_device_registry_t *reg = &NV_DEVICE_REG[reg_device_id];
19207 +
19208 +	if(reg->max_prio_owner != t) {
19209 +
19210 +		raw_spin_lock_irqsave(&reg->lock, flags);
19211 +
19212 +		if(reg->max_prio_owner != t) {
19213 +			if(litmus->compare(t, reg->max_prio_owner)) {
19214 +				litmus->change_prio_pai_tasklet(reg->max_prio_owner, t);
19215 +				reg->max_prio_owner = t;
19216 +			}
19217 +		}
19218 +
19219 +		raw_spin_unlock_irqrestore(&reg->lock, flags);
19220 +	}
19221 +}
19222 +
19223 +
19224 +void pai_check_priority_decrease(struct task_struct *t, int reg_device_id)
19225 +{
19226 +	unsigned long flags;
19227 +	nv_device_registry_t *reg = &NV_DEVICE_REG[reg_device_id];
19228 +
19229 +	if(reg->max_prio_owner == t) {
19230 +
19231 +		raw_spin_lock_irqsave(&reg->lock, flags);
19232 +
19233 +		if(reg->max_prio_owner == t) {
19234 +			reg->max_prio_owner = find_hp_owner(reg, NULL);
19235 +			if(reg->max_prio_owner != t) {
19236 +				litmus->change_prio_pai_tasklet(t, reg->max_prio_owner);
19237 +			}
19238 +		}
19239 +
19240 +		raw_spin_unlock_irqrestore(&reg->lock, flags);
19241 +	}
19242 +}
19243 +#endif
19244 +
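+/* Register task t as an owner of GPU reg_device_id: claim a free owner slot,
+ * update max_prio_owner (and the PAI tasklet priority) if t has higher
+ * priority, take the klitirqd semaphore when CONFIG_LITMUS_SOFTIRQD is set,
+ * and mark the device in t's held_gpus bitmask. */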
19245 +static int __reg_nv_device(int reg_device_id, struct task_struct *t)
19246 +{
19247 +	int ret = 0;
19248 +	int i;
19249 +	struct task_struct *old_max = NULL;
19250 +	unsigned long flags;
19251 +	nv_device_registry_t *reg = &NV_DEVICE_REG[reg_device_id];
19252 +
19253 +    if(test_bit(reg_device_id, &tsk_rt(t)->held_gpus)) {
19254 +		// TODO: check if task is already registered.
19255 +		return ret;  // assume already registered.
19256 +	}
19257 +
19258 +
19259 +	raw_spin_lock_irqsave(&reg->lock, flags);
19260 +
19261 +	if(reg->nr_owners < NV_MAX_SIMULT_USERS) {
19262 +		TRACE_TASK(t, "registers GPU %d\n", reg_device_id);
19263 +		for(i = 0; i < NV_MAX_SIMULT_USERS; ++i) {
19264 +			if(reg->owners[i] == NULL) {
19265 +				reg->owners[i] = t;
19266 +
19267 +				//if(edf_higher_prio(t, reg->max_prio_owner)) {
19268 +				if(litmus->compare(t, reg->max_prio_owner)) {
19269 +					old_max = reg->max_prio_owner;
19270 +					reg->max_prio_owner = t;
19271 +
19272 +#ifdef CONFIG_LITMUS_PAI_SOFTIRQD
19273 +					litmus->change_prio_pai_tasklet(old_max, t);
19274 +#endif
19275 +				}
19276 +
19277 +#ifdef CONFIG_LITMUS_SOFTIRQD
19278 +				down_and_set_stat(t, HELD, &tsk_rt(t)->klitirqd_sem);
19279 +#endif
19280 +				++(reg->nr_owners);
19281 +
19282 +				break;
19283 +			}
19284 +		}
19285 +	}
19286 +	else
19287 +	{
19288 +		TRACE_CUR("%s: device %d already has the maximum number of simultaneous users!\n", __FUNCTION__, reg_device_id);
19289 +		//ret = -EBUSY;
19290 +	}
19291 +
19292 +	raw_spin_unlock_irqrestore(&reg->lock, flags);
19293 +
19294 +	__set_bit(reg_device_id, &tsk_rt(t)->held_gpus);
19295 +
19296 +	return(ret);
19297 +}
19298 +
19299 +static int __clear_reg_nv_device(int de_reg_device_id, struct task_struct *t)
19300 +{
19301 +	int ret = 0;
19302 +	int i;
19303 +	unsigned long flags;
19304 +	nv_device_registry_t *reg = &NV_DEVICE_REG[de_reg_device_id];
19305 +
19306 +#ifdef CONFIG_LITMUS_SOFTIRQD
19307 +    struct task_struct* klitirqd_th = get_klitirqd(de_reg_device_id);
19308 +#endif
19309 +
19310 +	if(!test_bit(de_reg_device_id, &tsk_rt(t)->held_gpus)) {
19311 +		return ret;
19312 +	}
19313 +
19314 +	raw_spin_lock_irqsave(&reg->lock, flags);
19315 +
19316 +	TRACE_TASK(t, "unregisters GPU %d\n", de_reg_device_id);
19317 +
19318 +	for(i = 0; i < NV_MAX_SIMULT_USERS; ++i) {
19319 +		if(reg->owners[i] == t) {
19320 +#ifdef CONFIG_LITMUS_SOFTIRQD
19321 +			flush_pending(klitirqd_th, t);
19322 +#endif
19323 +			if(reg->max_prio_owner == t) {
19324 +				reg->max_prio_owner = find_hp_owner(reg, t);
19325 +#ifdef CONFIG_LITMUS_PAI_SOFTIRQD
19326 +				litmus->change_prio_pai_tasklet(t, reg->max_prio_owner);
19327 +#endif
19328 +			}
19329 +
19330 +#ifdef CONFIG_LITMUS_SOFTIRQD
19331 +			up_and_set_stat(t, NOT_HELD, &tsk_rt(t)->klitirqd_sem);
19332 +#endif
19333 +
19334 +			reg->owners[i] = NULL;
19335 +			--(reg->nr_owners);
19336 +
19337 +			break;
19338 +		}
19339 +	}
19340 +
19341 +	raw_spin_unlock_irqrestore(&reg->lock, flags);
19342 +
19343 +	__clear_bit(de_reg_device_id, &tsk_rt(t)->held_gpus);
19344 +
19345 +	return(ret);
19346 +}
19347 +
19348 +
19349 +int reg_nv_device(int reg_device_id, int reg_action, struct task_struct *t)
19350 +{
19351 +	int ret;
19352 +
19353 +	if((reg_device_id < NV_DEVICE_NUM) && (reg_device_id >= 0))
19354 +	{
19355 +		if(reg_action)
19356 +			ret = __reg_nv_device(reg_device_id, t);
19357 +		else
19358 +			ret = __clear_reg_nv_device(reg_device_id, t);
19359 +	}
19360 +	else
19361 +	{
19362 +		ret = -ENODEV;
19363 +	}
19364 +
19365 +	return(ret);
19366 +}
19367 +
19368 +/* Get the highest-priority registered owner of the given nv device. */
19369 +struct task_struct* get_nv_max_device_owner(u32 target_device_id)
19370 +{
19371 +	struct task_struct *owner = NULL;
19372 +	BUG_ON(target_device_id >= NV_DEVICE_NUM);
19373 +	owner = NV_DEVICE_REG[target_device_id].max_prio_owner;
19374 +	return(owner);
19375 +}
19376 +
19377 +void lock_nv_registry(u32 target_device_id, unsigned long* flags)
19378 +{
19379 +	BUG_ON(target_device_id >= NV_DEVICE_NUM);
19380 +
19381 +	if(in_interrupt())
19382 +		TRACE("Locking registry for %d.\n", target_device_id);
19383 +	else
19384 +		TRACE_CUR("Locking registry for %d.\n", target_device_id);
19385 +
19386 +	raw_spin_lock_irqsave(&NV_DEVICE_REG[target_device_id].lock, *flags);
19387 +}
19388 +
19389 +void unlock_nv_registry(u32 target_device_id, unsigned long* flags)
19390 +{
19391 +	BUG_ON(target_device_id >= NV_DEVICE_NUM);
19392 +
19393 +	if(in_interrupt())
19394 +		TRACE("Unlocking registry for %d.\n", target_device_id);
19395 +	else
19396 +		TRACE_CUR("Unlocking registry for %d.\n", target_device_id);
19397 +
19398 +	raw_spin_unlock_irqrestore(&NV_DEVICE_REG[target_device_id].lock, *flags);
19399 +}
19400 +
19401 +
19402 +//void increment_nv_int_count(u32 device)
19403 +//{
19404 +//	unsigned long flags;
19405 +//	struct task_struct* owner;
19406 +//
19407 +//	lock_nv_registry(device, &flags);
19408 +//
19409 +//	owner = NV_DEVICE_REG[device].device_owner;
19410 +//	if(owner)
19411 +//	{
19412 +//		atomic_inc(&tsk_rt(owner)->nv_int_count);
19413 +//	}
19414 +//
19415 +//	unlock_nv_registry(device, &flags);
19416 +//}
19417 +//EXPORT_SYMBOL(increment_nv_int_count);
19418 +
19419 +
19420 diff --git a/litmus/preempt.c b/litmus/preempt.c
19421 new file mode 100644
19422 index 0000000..28368d5
19423 --- /dev/null
19424 +++ b/litmus/preempt.c
19425 @@ -0,0 +1,138 @@
19426 +#include <linux/sched.h>
19427 +
19428 +#include <litmus/litmus.h>
19429 +#include <litmus/preempt.h>
19430 +
19431 +/* The rescheduling state of each processor.
19432 + */
19433 +DEFINE_PER_CPU_SHARED_ALIGNED(atomic_t, resched_state);
19434 +
19435 +void sched_state_will_schedule(struct task_struct* tsk)
19436 +{
19437 +	/* Litmus hack: we only care about processor-local invocations of
19438 +	 * set_tsk_need_resched(). We can't reliably set the flag remotely
19439 +	 * since it might race with other updates to the scheduling state.  We
19440 +	 * can't rely on the runqueue lock protecting updates to the sched
19441 +	 * state since processors do not acquire the runqueue locks for all
19442 +	 * updates to the sched state (to avoid acquiring two runqueue locks at
19443 +	 * the same time). Further, if tsk is residing on a remote processor,
19444 +	 * then that processor doesn't actually know yet that it is going to
19445 +	 * reschedule; it still must receive an IPI (unless a local invocation
19446 +	 * races).
19447 +	 */
19448 +	if (likely(task_cpu(tsk) == smp_processor_id())) {
19449 +		VERIFY_SCHED_STATE(TASK_SCHEDULED | SHOULD_SCHEDULE | TASK_PICKED | WILL_SCHEDULE);
19450 +		if (is_in_sched_state(TASK_PICKED | PICKED_WRONG_TASK))
19451 +			set_sched_state(PICKED_WRONG_TASK);
19452 +		else
19453 +			set_sched_state(WILL_SCHEDULE);
19454 +	} else
19455 +		/* Litmus tasks should never be subject to a remote
19456 +		 * set_tsk_need_resched(). */
19457 +		BUG_ON(is_realtime(tsk));
19458 +
19459 +#ifdef CONFIG_PREEMPT_STATE_TRACE
19460 +	TRACE_TASK(tsk, "set_tsk_need_resched() ret:%p\n",
19461 +		   __builtin_return_address(0));
19462 +#endif
19463 +}
19464 +
19465 +/* Called by the IPI handler after another CPU called smp_send_resched(). */
19466 +void sched_state_ipi(void)
19467 +{
19468 +	/* If the IPI was slow, we might be in any state right now. The IPI is
19469 +	 * only meaningful if we are in SHOULD_SCHEDULE. */
19470 +	if (is_in_sched_state(SHOULD_SCHEDULE)) {
19471 +		/* Cause scheduler to be invoked.
19472 +		 * This will cause a transition to WILL_SCHEDULE. */
19473 +		set_tsk_need_resched(current);
19474 +		/*
19475 +		TRACE_STATE("IPI -> set_tsk_need_resched(%s/%d)\n",
19476 +			    current->comm, current->pid);
19477 +		*/
19478 +	} else {
19479 +		/* ignore */
19480 +		/*
19481 +		TRACE_STATE("ignoring IPI in state %x (%s)\n",
19482 +			    get_sched_state(),
19483 +			    sched_state_name(get_sched_state()));
19484 +		*/
19485 +	}
19486 +}
19487 +
19488 +/* Called by plugins to cause a CPU to reschedule. IMPORTANT: the caller must
19489 + * hold the lock that is used to serialize scheduling decisions. */
19490 +void litmus_reschedule(int cpu)
19491 +{
19492 +	int picked_transition_ok = 0;
19493 +	int scheduled_transition_ok = 0;
19494 +
19495 +	/* The (remote) CPU could be in any state. */
19496 +
19497 +	/* The critical states are TASK_PICKED and TASK_SCHEDULED, as the CPU
19498 +	 * is not aware of the need to reschedule at this point. */
19499 +
19500 +	/* is a context switch in progress? */
19501 +	if (cpu_is_in_sched_state(cpu, TASK_PICKED))
19502 +		picked_transition_ok = sched_state_transition_on(
19503 +			cpu, TASK_PICKED, PICKED_WRONG_TASK);
19504 +
19505 +	if (!picked_transition_ok &&
19506 +	    cpu_is_in_sched_state(cpu, TASK_SCHEDULED)) {
19507 +		/* We either raced with the end of the context switch, or the
19508 +		 * CPU was in TASK_SCHEDULED anyway. */
19509 +		scheduled_transition_ok = sched_state_transition_on(
19510 +			cpu, TASK_SCHEDULED, SHOULD_SCHEDULE);
19511 +	}
19512 +
19513 +	/* If the CPU was in state TASK_SCHEDULED, then we need to cause the
19514 +	 * scheduler to be invoked. */
19515 +	if (scheduled_transition_ok) {
19516 +		if (smp_processor_id() == cpu)
19517 +			set_tsk_need_resched(current);
19518 +		else
19519 +			smp_send_reschedule(cpu);
19520 +	}
19521 +
19522 +	TRACE_STATE("%s picked-ok:%d sched-ok:%d\n",
19523 +		    __FUNCTION__,
19524 +		    picked_transition_ok,
19525 +		    scheduled_transition_ok);
19526 +}
19527 +
19528 +void litmus_reschedule_local(void)
19529 +{
19530 +	if (is_in_sched_state(TASK_PICKED))
19531 +		set_sched_state(PICKED_WRONG_TASK);
19532 +	else if (is_in_sched_state(TASK_SCHEDULED | SHOULD_SCHEDULE)) {
19533 +		set_sched_state(WILL_SCHEDULE);
19534 +		set_tsk_need_resched(current);
19535 +	}
19536 +}
19537 +
19538 +#ifdef CONFIG_DEBUG_KERNEL
19539 +
19540 +void sched_state_plugin_check(void)
19541 +{
19542 +	if (!is_in_sched_state(TASK_PICKED | PICKED_WRONG_TASK)) {
19543 +		TRACE("!!!! plugin did not call sched_state_task_picked()!"
19544 +		      "Calling sched_state_task_picked() is mandatory---fix this.\n");
19545 +		set_sched_state(TASK_PICKED);
19546 +	}
19547 +}
19548 +
19549 +#define NAME_CHECK(x) case x:  return #x
19550 +const char* sched_state_name(int s)
19551 +{
19552 +	switch (s) {
19553 +		NAME_CHECK(TASK_SCHEDULED);
19554 +		NAME_CHECK(SHOULD_SCHEDULE);
19555 +		NAME_CHECK(WILL_SCHEDULE);
19556 +		NAME_CHECK(TASK_PICKED);
19557 +		NAME_CHECK(PICKED_WRONG_TASK);
19558 +	default:
19559 +		return "UNKNOWN";
19560 +	};
19561 +}
19562 +
19563 +#endif
19564 diff --git a/litmus/rsm_lock.c b/litmus/rsm_lock.c
19565 new file mode 100644
19566 index 0000000..75ed87c
19567 --- /dev/null
19568 +++ b/litmus/rsm_lock.c
19569 @@ -0,0 +1,796 @@
19570 +#include <linux/slab.h>
19571 +#include <linux/uaccess.h>
19572 +
19573 +#include <litmus/trace.h>
19574 +#include <litmus/sched_plugin.h>
19575 +#include <litmus/rsm_lock.h>
19576 +
19577 +//#include <litmus/edf_common.h>
19578 +
19579 +#if defined(CONFIG_LITMUS_AFFINITY_LOCKING) && defined(CONFIG_LITMUS_NVIDIA)
19580 +#include <litmus/gpu_affinity.h>
19581 +#endif
19582 +
19583 +
19584 +/* caller is responsible for locking */
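+/* Scans the wait queue for the highest-priority waiter (excluding 'skip'),
+ * unwrapping DGL wait nodes to the blocked task where necessary. */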
19585 +static struct task_struct* rsm_mutex_find_hp_waiter(struct rsm_mutex *mutex,
19586 +                                             struct task_struct* skip)
19587 +{
19588 +    wait_queue_t        *q;
19589 +    struct list_head    *pos;
19590 +    struct task_struct  *queued = NULL, *found = NULL;
19591 +
19592 +#ifdef CONFIG_LITMUS_DGL_SUPPORT
19593 +    dgl_wait_state_t    *dgl_wait = NULL;
19594 +#endif
19595 +
19596 +    list_for_each(pos, &mutex->wait.task_list) {
19597 +        q = list_entry(pos, wait_queue_t, task_list);
19598 +
19599 +#ifdef CONFIG_LITMUS_DGL_SUPPORT
19600 +        if(q->func == dgl_wake_up) {
19601 +            dgl_wait = (dgl_wait_state_t*) q->private;
19602 +            if(tsk_rt(dgl_wait->task)->blocked_lock == &mutex->litmus_lock) {
19603 +                queued = dgl_wait->task;
19604 +            }
19605 +            else {
19606 +                queued = NULL;  // skip it.
19607 +            }
19608 +        }
19609 +        else {
19610 +            queued = (struct task_struct*) q->private;
19611 +        }
19612 +#else
19613 +        queued = (struct task_struct*) q->private;
19614 +#endif
19615 +
19616 +        /* Compare task prios, find high prio task. */
19617 +        //if (queued && queued != skip && edf_higher_prio(queued, found)) {
19618 +		if (queued && queued != skip && litmus->compare(queued, found)) {
19619 +            found = queued;
19620 +        }
19621 +    }
19622 +    return found;
19623 +}
19624 +
19625 +
19626 +#ifdef CONFIG_LITMUS_DGL_SUPPORT
19627 +
19628 +int rsm_mutex_is_owner(struct litmus_lock *l, struct task_struct *t)
19629 +{
19630 +	struct rsm_mutex *mutex = rsm_mutex_from_lock(l);
19631 +	return(mutex->owner == t);
19632 +}
19633 +
19634 +// return 1 if resource was immediately acquired.
19635 +// Assumes mutex->lock is held.
19636 +// Must set task state to TASK_UNINTERRUPTIBLE if task blocks.
19637 +int rsm_mutex_dgl_lock(struct litmus_lock *l, dgl_wait_state_t* dgl_wait,
19638 +					   wait_queue_t* wq_node)
19639 +{
19640 +	struct rsm_mutex *mutex = rsm_mutex_from_lock(l);
19641 +	struct task_struct *t = dgl_wait->task;
19642 +
19643 +	int acquired_immediately = 0;
19644 +
19645 +	BUG_ON(t != current);
19646 +
19647 +	if (mutex->owner) {
19648 +		TRACE_TASK(t, "Enqueuing on lock %d.\n", l->ident);
19649 +
19650 +		init_dgl_waitqueue_entry(wq_node, dgl_wait);
19651 +
19652 +		set_task_state(t, TASK_UNINTERRUPTIBLE);
19653 +		__add_wait_queue_tail_exclusive(&mutex->wait, wq_node);
19654 +	} else {
19655 +		TRACE_TASK(t, "Acquired lock %d with no blocking.\n", l->ident);
19656 +
19657 +		/* it's ours now */
19658 +		mutex->owner = t;
19659 +
19660 +		raw_spin_lock(&tsk_rt(t)->hp_blocked_tasks_lock);
19661 +		binheap_add(&l->nest.hp_binheap_node, &tsk_rt(t)->hp_blocked_tasks,
19662 +					struct nested_info, hp_binheap_node);
19663 +		raw_spin_unlock(&tsk_rt(t)->hp_blocked_tasks_lock);
19664 +
19665 +		acquired_immediately = 1;
19666 +	}
19667 +
19668 +	return acquired_immediately;
19669 +}
19670 +
19671 +void rsm_mutex_enable_priority(struct litmus_lock *l,
19672 +							   dgl_wait_state_t* dgl_wait)
19673 +{
19674 +	struct rsm_mutex *mutex = rsm_mutex_from_lock(l);
19675 +	struct task_struct *t = dgl_wait->task;
19676 +	struct task_struct *owner = mutex->owner;
19677 +	unsigned long flags = 0;  // these are unused under DGL coarse-grain locking
19678 +
19679 +	BUG_ON(owner == t);
19680 +
19681 +	tsk_rt(t)->blocked_lock = l;
19682 +	mb();
19683 +
19684 +	//if (edf_higher_prio(t, mutex->hp_waiter)) {
19685 +	if (litmus->compare(t, mutex->hp_waiter)) {
19686 +
19687 +		struct task_struct *old_max_eff_prio;
19688 +		struct task_struct *new_max_eff_prio;
19689 +		struct task_struct *new_prio = NULL;
19690 +
19691 +		if(mutex->hp_waiter)
19692 +			TRACE_TASK(t, "has higher prio than hp_waiter (%s/%d).\n",
19693 +					   mutex->hp_waiter->comm, mutex->hp_waiter->pid);
19694 +		else
19695 +			TRACE_TASK(t, "has higher prio than hp_waiter (NIL).\n");
19696 +
19697 +		raw_spin_lock(&tsk_rt(owner)->hp_blocked_tasks_lock);
19698 +
19699 +		old_max_eff_prio = top_priority(&tsk_rt(owner)->hp_blocked_tasks);
19700 +		mutex->hp_waiter = t;
19701 +		l->nest.hp_waiter_eff_prio = effective_priority(mutex->hp_waiter);
19702 +		binheap_decrease(&l->nest.hp_binheap_node,
19703 +						 &tsk_rt(owner)->hp_blocked_tasks);
19704 +		new_max_eff_prio = top_priority(&tsk_rt(owner)->hp_blocked_tasks);
19705 +
19706 +		if(new_max_eff_prio != old_max_eff_prio) {
19707 +			TRACE_TASK(t, "is new hp_waiter.\n");
19708 +
19709 +			if ((effective_priority(owner) == old_max_eff_prio) ||
19710 +				//(__edf_higher_prio(new_max_eff_prio, BASE, owner, EFFECTIVE))){
19711 +				(litmus->__compare(new_max_eff_prio, BASE, owner, EFFECTIVE))){
19712 +				new_prio = new_max_eff_prio;
19713 +			}
19714 +		}
19715 +		else {
19716 +			TRACE_TASK(t, "no change in max_eff_prio of heap.\n");
19717 +		}
19718 +
19719 +		if(new_prio) {
19720 +			litmus->nested_increase_prio(owner, new_prio,
19721 +										 &mutex->lock, flags);  // unlocks lock.
19722 +		}
19723 +		else {
19724 +			raw_spin_unlock(&tsk_rt(owner)->hp_blocked_tasks_lock);
19725 +			unlock_fine_irqrestore(&mutex->lock, flags);
19726 +		}
19727 +	}
19728 +	else {
19729 +		TRACE_TASK(t, "no change in hp_waiter.\n");
19730 +		unlock_fine_irqrestore(&mutex->lock, flags);
19731 +	}
19732 +}
19733 +
19734 +static void select_next_lock_if_primary(struct litmus_lock *l,
19735 +										dgl_wait_state_t *dgl_wait)
19736 +{
19737 +	if(tsk_rt(dgl_wait->task)->blocked_lock == l) {
19738 +		TRACE_CUR("Lock %d in DGL was primary for %s/%d.\n",
19739 +				  l->ident, dgl_wait->task->comm, dgl_wait->task->pid);
19740 +		tsk_rt(dgl_wait->task)->blocked_lock = NULL;
19741 +		mb();
19742 +		select_next_lock(dgl_wait /*, l*/);  // pick the next lock to be blocked on
19743 +	}
19744 +	else {
19745 +		TRACE_CUR("Got lock early! Lock %d in DGL was NOT primary for %s/%d.\n",
19746 +				  l->ident, dgl_wait->task->comm, dgl_wait->task->pid);
19747 +	}
19748 +}
19749 +#endif
19750 +
19751 +
19752 +
19753 +
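+/* Suspension-based FIFO mutex acquisition with priority inheritance: if the
+ * lock is held, the caller enqueues itself, possibly becomes the new
+ * hp_waiter (propagating its priority to the owner via the owner's
+ * hp_blocked_tasks heap), and suspends until ownership is handed over. */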
19754 +int rsm_mutex_lock(struct litmus_lock* l)
19755 +{
19756 +	struct task_struct *t = current;
19757 +	struct task_struct *owner;
19758 +	struct rsm_mutex *mutex = rsm_mutex_from_lock(l);
19759 +	wait_queue_t wait;
19760 +	unsigned long flags;
19761 +
19762 +#ifdef CONFIG_LITMUS_DGL_SUPPORT
19763 +	raw_spinlock_t *dgl_lock;
19764 +#endif
19765 +
19766 +	if (!is_realtime(t))
19767 +		return -EPERM;
19768 +
19769 +#ifdef CONFIG_LITMUS_DGL_SUPPORT
19770 +	dgl_lock = litmus->get_dgl_spinlock(t);
19771 +#endif
19772 +
19773 +	lock_global_irqsave(dgl_lock, flags);
19774 +	lock_fine_irqsave(&mutex->lock, flags);
19775 +
19776 +	if (mutex->owner) {
19777 +		TRACE_TASK(t, "Blocking on lock %d.\n", l->ident);
19778 +
19779 +#if defined(CONFIG_LITMUS_AFFINITY_LOCKING) && defined(CONFIG_LITMUS_NVIDIA)
19780 +		// KLUDGE: don't count this suspension as time in the critical gpu
19781 +		// critical section
19782 +		if(tsk_rt(t)->held_gpus) {
19783 +			tsk_rt(t)->suspend_gpu_tracker_on_block = 1;
19784 +		}
19785 +#endif
19786 +
19787 +		/* resource is not free => must suspend and wait */
19788 +
19789 +		owner = mutex->owner;
19790 +
19791 +		init_waitqueue_entry(&wait, t);
19792 +
19793 +		tsk_rt(t)->blocked_lock = l;  /* record where we are blocked */
19794 +		mb();  // needed?
19795 +
19796 +		/* FIXME: interruptible would be nice some day */
19797 +		set_task_state(t, TASK_UNINTERRUPTIBLE);
19798 +
19799 +		__add_wait_queue_tail_exclusive(&mutex->wait, &wait);
19800 +
19801 +		/* check if we need to activate priority inheritance */
19802 +		//if (edf_higher_prio(t, mutex->hp_waiter)) {
19803 +		if (litmus->compare(t, mutex->hp_waiter)) {
19804 +
19805 +			struct task_struct *old_max_eff_prio;
19806 +			struct task_struct *new_max_eff_prio;
19807 +			struct task_struct *new_prio = NULL;
19808 +
19809 +			if(mutex->hp_waiter)
19810 +				TRACE_TASK(t, "has higher prio than hp_waiter (%s/%d).\n",
19811 +						   mutex->hp_waiter->comm, mutex->hp_waiter->pid);
19812 +			else
19813 +				TRACE_TASK(t, "has higher prio than hp_waiter (NIL).\n");
19814 +
19815 +			raw_spin_lock(&tsk_rt(owner)->hp_blocked_tasks_lock);
19816 +
19817 +			old_max_eff_prio = top_priority(&tsk_rt(owner)->hp_blocked_tasks);
19818 +			mutex->hp_waiter = t;
19819 +			l->nest.hp_waiter_eff_prio = effective_priority(mutex->hp_waiter);
19820 +			binheap_decrease(&l->nest.hp_binheap_node,
19821 +							 &tsk_rt(owner)->hp_blocked_tasks);
19822 +			new_max_eff_prio = top_priority(&tsk_rt(owner)->hp_blocked_tasks);
19823 +
19824 +			if(new_max_eff_prio != old_max_eff_prio) {
19825 +				TRACE_TASK(t, "is new hp_waiter.\n");
19826 +
19827 +				if ((effective_priority(owner) == old_max_eff_prio) ||
19828 +					//(__edf_higher_prio(new_max_eff_prio, BASE, owner, EFFECTIVE))){
19829 +					(litmus->__compare(new_max_eff_prio, BASE, owner, EFFECTIVE))){
19830 +					new_prio = new_max_eff_prio;
19831 +				}
19832 +			}
19833 +			else {
19834 +				TRACE_TASK(t, "no change in max_eff_prio of heap.\n");
19835 +			}
19836 +
19837 +			if(new_prio) {
19838 +				litmus->nested_increase_prio(owner, new_prio, &mutex->lock,
19839 +											 flags);  // unlocks lock.
19840 +			}
19841 +			else {
19842 +				raw_spin_unlock(&tsk_rt(owner)->hp_blocked_tasks_lock);
19843 +				unlock_fine_irqrestore(&mutex->lock, flags);
19844 +			}
19845 +		}
19846 +		else {
19847 +			TRACE_TASK(t, "no change in hp_waiter.\n");
19848 +
19849 +			unlock_fine_irqrestore(&mutex->lock, flags);
19850 +		}
19851 +
19852 +		unlock_global_irqrestore(dgl_lock, flags);
19853 +
19854 +		TS_LOCK_SUSPEND;
19855 +
19856 +		/* We depend on the FIFO order.  Thus, we don't need to recheck
19857 +		 * when we wake up; we are guaranteed to have the lock since
19858 +		 * there is only one wake up per release.
19859 +		 */
19860 +
19861 +		schedule();
19862 +
19863 +		TS_LOCK_RESUME;
19864 +
19865 +		/* Since we hold the lock, no other task will change
19866 +		 * ->owner. We can thus check it without acquiring the spin
19867 +		 * lock. */
19868 +		BUG_ON(mutex->owner != t);
19869 +
19870 +		TRACE_TASK(t, "Acquired lock %d.\n", l->ident);
19871 +
19872 +	} else {
19873 +		TRACE_TASK(t, "Acquired lock %d with no blocking.\n", l->ident);
19874 +
19875 +		/* it's ours now */
19876 +		mutex->owner = t;
19877 +
19878 +		raw_spin_lock(&tsk_rt(mutex->owner)->hp_blocked_tasks_lock);
19879 +		binheap_add(&l->nest.hp_binheap_node, &tsk_rt(t)->hp_blocked_tasks,
19880 +					struct nested_info, hp_binheap_node);
19881 +		raw_spin_unlock(&tsk_rt(mutex->owner)->hp_blocked_tasks_lock);
19882 +
19883 +
19884 +		unlock_fine_irqrestore(&mutex->lock, flags);
19885 +		unlock_global_irqrestore(dgl_lock, flags);
19886 +	}
19887 +
19888 +	return 0;
19889 +}
19890 +
19891 +
19892 +
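+/* Release the mutex: remove it from the owner's hp_blocked_tasks heap and
+ * revoke any priority inherited through it if necessary, then hand
+ * ownership to the next FIFO waiter.  A DGL waiter is only woken once it
+ * holds every lock in its group (nr_remaining reaches zero). */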
19893 +int rsm_mutex_unlock(struct litmus_lock* l)
19894 +{
19895 +	struct task_struct *t = current, *next = NULL;
19896 +	struct rsm_mutex *mutex = rsm_mutex_from_lock(l);
19897 +	unsigned long flags;
19898 +
19899 +	struct task_struct *old_max_eff_prio;
19900 +
19901 +	int wake_up_task = 1;
19902 +
19903 +#ifdef CONFIG_LITMUS_DGL_SUPPORT
19904 +	dgl_wait_state_t *dgl_wait = NULL;
19905 +	raw_spinlock_t *dgl_lock = litmus->get_dgl_spinlock(t);
19906 +#endif
19907 +
19908 +	int err = 0;
19909 +
19910 +	if (mutex->owner != t) {
19911 +		err = -EINVAL;
19912 +		return err;
19913 +	}
19914 +
19915 +	lock_global_irqsave(dgl_lock, flags);
19916 +	lock_fine_irqsave(&mutex->lock, flags);
19917 +
19918 +	raw_spin_lock(&tsk_rt(t)->hp_blocked_tasks_lock);
19919 +
19920 +	TRACE_TASK(t, "Freeing lock %d\n", l->ident);
19921 +
19922 +	old_max_eff_prio = top_priority(&tsk_rt(t)->hp_blocked_tasks);
19923 +	binheap_delete(&l->nest.hp_binheap_node, &tsk_rt(t)->hp_blocked_tasks);
19924 +
19925 +	if(tsk_rt(t)->inh_task){
19926 +		struct task_struct *new_max_eff_prio =
19927 +			top_priority(&tsk_rt(t)->hp_blocked_tasks);
19928 +
19929 +		if((new_max_eff_prio == NULL) ||
19930 +		      /* there was a change in eff prio */
19931 +		   (  (new_max_eff_prio != old_max_eff_prio) &&
19932 +			/* and owner had the old eff prio */
19933 +			  (effective_priority(t) == old_max_eff_prio))  )
19934 +		{
19935 +			// old_max_eff_prio > new_max_eff_prio
19936 +
19937 +			//if(__edf_higher_prio(new_max_eff_prio, BASE, t, EFFECTIVE)) {
19938 +			if(litmus->__compare(new_max_eff_prio, BASE, t, EFFECTIVE)) {
19939 +				TRACE_TASK(t, "new_max_eff_prio > task's eff_prio-- new_max_eff_prio: %s/%d   task: %s/%d [%s/%d]\n",
19940 +						   new_max_eff_prio->comm, new_max_eff_prio->pid,
19941 +						   t->comm, t->pid, tsk_rt(t)->inh_task->comm,
19942 +						   tsk_rt(t)->inh_task->pid);
19943 +				WARN_ON(1);
19944 +			}
19945 +
19946 +			litmus->decrease_prio(t, new_max_eff_prio);
19947 +		}
19948 +	}
19949 +
19950 +	if(binheap_empty(&tsk_rt(t)->hp_blocked_tasks) &&
19951 +	   tsk_rt(t)->inh_task != NULL)
19952 +	{
19953 +		WARN_ON(tsk_rt(t)->inh_task != NULL);
19954 +		TRACE_TASK(t, "No more locks are held, but eff_prio = %s/%d\n",
19955 +				   tsk_rt(t)->inh_task->comm, tsk_rt(t)->inh_task->pid);
19956 +	}
19957 +
19958 +	raw_spin_unlock(&tsk_rt(t)->hp_blocked_tasks_lock);
19959 +
19960 +
19961 +	/* check if there are jobs waiting for this resource */
19962 +#ifdef CONFIG_LITMUS_DGL_SUPPORT
19963 +	__waitqueue_dgl_remove_first(&mutex->wait, &dgl_wait, &next);
19964 +	if(dgl_wait) {
19965 +		next = dgl_wait->task;
19966 +		//select_next_lock_if_primary(l, dgl_wait);
19967 +	}
19968 +#else
19969 +	next = __waitqueue_remove_first(&mutex->wait);
19970 +#endif
19971 +	if (next) {
19972 +		/* next becomes the resource holder */
19973 +		mutex->owner = next;
19974 +		TRACE_CUR("lock ownership passed to %s/%d\n", next->comm, next->pid);
19975 +
19976 +		/* determine new hp_waiter if necessary */
19977 +		if (next == mutex->hp_waiter) {
19978 +
19979 +			TRACE_TASK(next, "was highest-prio waiter\n");
19980 +			/* next has the highest priority --- it doesn't need to
19981 +			 * inherit.  However, we need to make sure that the
19982 +			 * next-highest priority in the queue is reflected in
19983 +			 * hp_waiter. */
19984 +			mutex->hp_waiter = rsm_mutex_find_hp_waiter(mutex, next);
19985 +			l->nest.hp_waiter_eff_prio = (mutex->hp_waiter) ?
19986 +				effective_priority(mutex->hp_waiter) :
19987 +				NULL;
19988 +
19989 +			if (mutex->hp_waiter)
19990 +				TRACE_TASK(mutex->hp_waiter, "is new highest-prio waiter\n");
19991 +			else
19992 +				TRACE("no further waiters\n");
19993 +
19994 +			raw_spin_lock(&tsk_rt(next)->hp_blocked_tasks_lock);
19995 +
19996 +			binheap_add(&l->nest.hp_binheap_node,
19997 +						&tsk_rt(next)->hp_blocked_tasks,
19998 +						struct nested_info, hp_binheap_node);
19999 +
20000 +#ifdef CONFIG_LITMUS_DGL_SUPPORT
20001 +			if(dgl_wait) {
20002 +				select_next_lock_if_primary(l, dgl_wait);
20003 +				//wake_up_task = atomic_dec_and_test(&dgl_wait->nr_remaining);
20004 +				--(dgl_wait->nr_remaining);
20005 +				wake_up_task = (dgl_wait->nr_remaining == 0);
20006 +			}
20007 +#endif
20008 +			raw_spin_unlock(&tsk_rt(next)->hp_blocked_tasks_lock);
20009 +		}
20010 +		else {
20011 +			/* Well, if 'next' is not the highest-priority waiter,
20012 +			 * then it (probably) ought to inherit the highest-priority
20013 +			 * waiter's priority. */
20014 +			TRACE_TASK(next, "is not hp_waiter of lock %d.\n", l->ident);
20015 +
20016 +			raw_spin_lock(&tsk_rt(next)->hp_blocked_tasks_lock);
20017 +
20018 +			binheap_add(&l->nest.hp_binheap_node,
20019 +						&tsk_rt(next)->hp_blocked_tasks,
20020 +						struct nested_info, hp_binheap_node);
20021 +
20022 +#ifdef CONFIG_LITMUS_DGL_SUPPORT
20023 +			if(dgl_wait) {
20024 +				select_next_lock_if_primary(l, dgl_wait);
20025 +				--(dgl_wait->nr_remaining);
20026 +				wake_up_task = (dgl_wait->nr_remaining == 0);
20027 +			}
20028 +#endif
20029 +
20030 +			/* It is possible that 'next' *should* be the hp_waiter, but isn't
20031 +			 * because that update hasn't yet executed (the update operation is
20032 +			 * probably blocked on mutex->lock). So only inherit if the top of
20033 +			 * 'next's heap is indeed the effective prio. of hp_waiter.
20034 +			 * (We use l->hp_waiter_eff_prio instead of effective_priority(hp_waiter)
20035 +			 * since the effective priority of hp_waiter can change (and the
20036 +			 * update has not made it to this lock).)
20037 +			 */
20038 +#ifdef CONFIG_LITMUS_DGL_SUPPORT
20039 +			if((l->nest.hp_waiter_eff_prio != NULL) &&
20040 +			   (top_priority(&tsk_rt(next)->hp_blocked_tasks) ==
20041 +													l->nest.hp_waiter_eff_prio))
20042 +			{
20043 +				if(dgl_wait && tsk_rt(next)->blocked_lock) {
20044 +					BUG_ON(wake_up_task);
20045 +					//if(__edf_higher_prio(l->nest.hp_waiter_eff_prio, BASE, next, EFFECTIVE)) {
20046 +					if(litmus->__compare(l->nest.hp_waiter_eff_prio, BASE, next, EFFECTIVE)) {
20047 +						litmus->nested_increase_prio(next,
20048 +							l->nest.hp_waiter_eff_prio, &mutex->lock, flags);  // unlocks lock && hp_blocked_tasks_lock.
20049 +						goto out;  // all spinlocks are released.  bail out now.
20050 +					}
20051 +				}
20052 +				else {
20053 +					litmus->increase_prio(next, l->nest.hp_waiter_eff_prio);
20054 +				}
20055 +			}
20056 +
20057 +			raw_spin_unlock(&tsk_rt(next)->hp_blocked_tasks_lock);
20058 +#else
20059 +			if(likely(top_priority(&tsk_rt(next)->hp_blocked_tasks) ==
20060 +													l->nest.hp_waiter_eff_prio))
20061 +			{
20062 +				litmus->increase_prio(next, l->nest.hp_waiter_eff_prio);
20063 +			}
20064 +			raw_spin_unlock(&tsk_rt(next)->hp_blocked_tasks_lock);
20065 +#endif
20066 +		}
20067 +
20068 +		if(wake_up_task) {
20069 +			TRACE_TASK(next, "waking up since it is no longer blocked.\n");
20070 +
20071 +			tsk_rt(next)->blocked_lock = NULL;
20072 +			mb();
20073 +
20074 +#if defined(CONFIG_LITMUS_AFFINITY_LOCKING) && defined(CONFIG_LITMUS_NVIDIA)
20075 +			// re-enable tracking
20076 +			if(tsk_rt(next)->held_gpus) {
20077 +				tsk_rt(next)->suspend_gpu_tracker_on_block = 0;
20078 +			}
20079 +#endif
20080 +
20081 +			wake_up_process(next);
20082 +		}
20083 +		else {
20084 +			TRACE_TASK(next, "is still blocked.\n");
20085 +		}
20086 +	}
20087 +	else {
20088 +		/* becomes available */
20089 +		mutex->owner = NULL;
20090 +	}
20091 +
20092 +	unlock_fine_irqrestore(&mutex->lock, flags);
20093 +
20094 +#ifdef CONFIG_LITMUS_DGL_SUPPORT
20095 +out:
20096 +#endif
20097 +	unlock_global_irqrestore(dgl_lock, flags);
20098 +
20099 +	return err;
20100 +}
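
The dgl_wait->nr_remaining handling above is what lets a dynamic group lock (DGL) request wake 'next' only once the last lock in its group has been granted. A minimal userspace sketch of that counting scheme, using pthreads and hypothetical names rather than the LITMUS primitives:

#include <pthread.h>
#include <stdio.h>

/* One outstanding group request: the requester is woken only when all
 * of the locks it asked for have been granted. */
struct dgl_request {
	int nr_remaining;          /* locks still to be granted */
	pthread_mutex_t lock;      /* protects nr_remaining */
	pthread_cond_t  all_held;  /* signaled when nr_remaining reaches 0 */
};

/* Called once per lock as it is handed to the requester. */
static void grant_one(struct dgl_request *req)
{
	pthread_mutex_lock(&req->lock);
	if (--req->nr_remaining == 0)
		pthread_cond_signal(&req->all_held);  /* last grant: wake the requester */
	pthread_mutex_unlock(&req->lock);
}

/* The requester blocks here until every lock in its group is held. */
static void wait_for_group(struct dgl_request *req)
{
	pthread_mutex_lock(&req->lock);
	while (req->nr_remaining > 0)
		pthread_cond_wait(&req->all_held, &req->lock);
	pthread_mutex_unlock(&req->lock);
}

int main(void)
{
	struct dgl_request req = { .nr_remaining = 2,
				   .lock = PTHREAD_MUTEX_INITIALIZER,
				   .all_held = PTHREAD_COND_INITIALIZER };

	grant_one(&req);       /* first lock granted: requester would keep sleeping */
	grant_one(&req);       /* second lock granted: requester is woken */
	wait_for_group(&req);  /* returns immediately in this single-threaded demo */
	printf("all locks held\n");
	return 0;
}
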
20101 +
20102 +
20103 +void rsm_mutex_propagate_increase_inheritance(struct litmus_lock* l,
20104 +											struct task_struct* t,
20105 +											raw_spinlock_t* to_unlock,
20106 +											unsigned long irqflags)
20107 +{
20108 +	struct rsm_mutex *mutex = rsm_mutex_from_lock(l);
20109 +
20110 +	// relay-style locking
20111 +	lock_fine(&mutex->lock);
20112 +	unlock_fine(to_unlock);
20113 +
20114 +	if(tsk_rt(t)->blocked_lock == l) {  // prevent race on tsk_rt(t)->blocked_lock
20115 +		struct task_struct *owner = mutex->owner;
20116 +
20117 +		struct task_struct *old_max_eff_prio;
20118 +		struct task_struct *new_max_eff_prio;
20119 +
20120 +		raw_spin_lock(&tsk_rt(owner)->hp_blocked_tasks_lock);
20121 +
20122 +		old_max_eff_prio = top_priority(&tsk_rt(owner)->hp_blocked_tasks);
20123 +
20124 +		//if((t != mutex->hp_waiter) && edf_higher_prio(t, mutex->hp_waiter)) {
20125 +		if((t != mutex->hp_waiter) && litmus->compare(t, mutex->hp_waiter)) {
20126 +			TRACE_TASK(t, "is new highest-prio waiter by propagation.\n");
20127 +			mutex->hp_waiter = t;
20128 +		}
20129 +		if(t == mutex->hp_waiter) {
20130 +			// reflect the increased priority in the heap node.
20131 +			l->nest.hp_waiter_eff_prio = effective_priority(mutex->hp_waiter);
20132 +
20133 +			BUG_ON(!binheap_is_in_heap(&l->nest.hp_binheap_node));
20134 +			BUG_ON(!binheap_is_in_this_heap(&l->nest.hp_binheap_node,
20135 +											&tsk_rt(owner)->hp_blocked_tasks));
20136 +
20137 +			binheap_decrease(&l->nest.hp_binheap_node,
20138 +							 &tsk_rt(owner)->hp_blocked_tasks);
20139 +		}
20140 +
20141 +		new_max_eff_prio = top_priority(&tsk_rt(owner)->hp_blocked_tasks);
20142 +
20143 +
20144 +		if(new_max_eff_prio != old_max_eff_prio) {
20145 +			// new_max_eff_prio > old_max_eff_prio holds.
20146 +			if ((effective_priority(owner) == old_max_eff_prio) ||
20147 +				//(__edf_higher_prio(new_max_eff_prio, BASE, owner, EFFECTIVE))) {
20148 +				(litmus->__compare(new_max_eff_prio, BASE, owner, EFFECTIVE))) {
20149 +				TRACE_CUR("Propagating inheritance to holder of lock %d.\n",
20150 +						  l->ident);
20151 +
20152 +				// beware: recursion
20153 +				litmus->nested_increase_prio(owner, new_max_eff_prio,
20154 +											 &mutex->lock, irqflags);  // unlocks mutex->lock
20155 +			}
20156 +			else {
20157 +				TRACE_CUR("Lower priority than holder %s/%d.  No propagation.\n",
20158 +						  owner->comm, owner->pid);
20159 +				raw_spin_unlock(&tsk_rt(owner)->hp_blocked_tasks_lock);
20160 +				unlock_fine_irqrestore(&mutex->lock, irqflags);
20161 +			}
20162 +		}
20163 +		else {
20164 +			TRACE_TASK(mutex->owner, "No change in maximum effective priority.\n");
20165 +			raw_spin_unlock(&tsk_rt(owner)->hp_blocked_tasks_lock);
20166 +			unlock_fine_irqrestore(&mutex->lock, irqflags);
20167 +		}
20168 +	}
20169 +	else {
20170 +		struct litmus_lock *still_blocked = tsk_rt(t)->blocked_lock;
20171 +
20172 +		TRACE_TASK(t, "is not blocked on lock %d.\n", l->ident);
20173 +		if(still_blocked) {
20174 +			TRACE_TASK(t, "is still blocked on a lock though (lock %d).\n",
20175 +					   still_blocked->ident);
20176 +			if(still_blocked->ops->propagate_increase_inheritance) {
20177 +				/* Due to relay-style nesting of spinlocks (acq. A, acq. B, free A,
20178 +				 * free B), we know that task 't' has not released any locks behind
20179 +				 * us in this chain.  Propagation just needs to catch up with task 't'. */
20180 +				still_blocked->ops->propagate_increase_inheritance(still_blocked,
20181 +																   t,
20182 +																   &mutex->lock,
20183 +																   irqflags);
20184 +			}
20185 +			else {
20186 +				TRACE_TASK(t,
20187 +						   "Inheritor is blocked on lock (%p) that does not "
20188 +						   "support nesting!\n",
20189 +						   still_blocked);
20190 +				unlock_fine_irqrestore(&mutex->lock, irqflags);
20191 +			}
20192 +		}
20193 +		else {
20194 +			unlock_fine_irqrestore(&mutex->lock, irqflags);
20195 +		}
20196 +	}
20197 +}
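
The lock_fine()/unlock_fine() pair at the top of this function is relay-style (hand-over-hand) locking: the next fine-grained lock in the blocking chain is acquired before the previous one is released, so the chain cannot change underneath the traversal. The same pattern in a self-contained userspace sketch (pthreads and a hypothetical linked chain, not the LITMUS locks):

#include <pthread.h>
#include <stddef.h>

struct chain_node {
	pthread_mutex_t lock;
	int value;
	struct chain_node *next;
};

/* Walk the chain hand-over-hand: take the next node's lock before dropping
 * the current node's lock, so no link can be modified mid-traversal. */
static int chain_contains(struct chain_node *head, int value)
{
	struct chain_node *cur = head, *next;

	if (!cur)
		return 0;

	pthread_mutex_lock(&cur->lock);
	while (cur) {
		if (cur->value == value) {
			pthread_mutex_unlock(&cur->lock);
			return 1;
		}
		next = cur->next;
		if (next)
			pthread_mutex_lock(&next->lock);  /* acquire B ... */
		pthread_mutex_unlock(&cur->lock);         /* ... then release A */
		cur = next;
	}
	return 0;
}

int main(void)
{
	struct chain_node b = { PTHREAD_MUTEX_INITIALIZER, 2, NULL };
	struct chain_node a = { PTHREAD_MUTEX_INITIALIZER, 1, &b };

	return chain_contains(&a, 2) ? 0 : 1;  /* finds 2 in the second node */
}
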
20198 +
20199 +
20200 +void rsm_mutex_propagate_decrease_inheritance(struct litmus_lock* l,
20201 +											 struct task_struct* t,
20202 +											 raw_spinlock_t* to_unlock,
20203 +											 unsigned long irqflags)
20204 +{
20205 +	struct rsm_mutex *mutex = rsm_mutex_from_lock(l);
20206 +
20207 +	// relay-style locking
20208 +	lock_fine(&mutex->lock);
20209 +	unlock_fine(to_unlock);
20210 +
20211 +	if(tsk_rt(t)->blocked_lock == l) {  // prevent race on tsk_rt(t)->blocked_lock
20212 +		if(t == mutex->hp_waiter) {
20213 +			struct task_struct *owner = mutex->owner;
20214 +
20215 +			struct task_struct *old_max_eff_prio;
20216 +			struct task_struct *new_max_eff_prio;
20217 +
20218 +			raw_spin_lock(&tsk_rt(owner)->hp_blocked_tasks_lock);
20219 +
20220 +			old_max_eff_prio = top_priority(&tsk_rt(owner)->hp_blocked_tasks);
20221 +
20222 +			binheap_delete(&l->nest.hp_binheap_node, &tsk_rt(owner)->hp_blocked_tasks);
20223 +			mutex->hp_waiter = rsm_mutex_find_hp_waiter(mutex, NULL);
20224 +			l->nest.hp_waiter_eff_prio = (mutex->hp_waiter) ?
20225 +				effective_priority(mutex->hp_waiter) : NULL;
20226 +			binheap_add(&l->nest.hp_binheap_node,
20227 +						&tsk_rt(owner)->hp_blocked_tasks,
20228 +						struct nested_info, hp_binheap_node);
20229 +
20230 +			new_max_eff_prio = top_priority(&tsk_rt(owner)->hp_blocked_tasks);
20231 +
20232 +			if((old_max_eff_prio != new_max_eff_prio) &&
20233 +			   (effective_priority(owner) == old_max_eff_prio))
20234 +			{
20235 +				// Need to set new effective_priority for owner
20236 +
20237 +				struct task_struct *decreased_prio;
20238 +
20239 +				TRACE_CUR("Propagating decreased inheritance to holder of lock %d.\n",
20240 +						  l->ident);
20241 +
20242 +				//if(__edf_higher_prio(new_max_eff_prio, BASE, owner, BASE)) {
20243 +				if(litmus->__compare(new_max_eff_prio, BASE, owner, BASE)) {
20244 +					TRACE_CUR("%s/%d has greater base priority than base priority of owner (%s/%d) of lock %d.\n",
20245 +							  (new_max_eff_prio) ? new_max_eff_prio->comm : "nil",
20246 +							  (new_max_eff_prio) ? new_max_eff_prio->pid : -1,
20247 +							  owner->comm,
20248 +							  owner->pid,
20249 +							  l->ident);
20250 +
20251 +					decreased_prio = new_max_eff_prio;
20252 +				}
20253 +				else {
20254 +					TRACE_CUR("%s/%d has lesser base priority than base priority of owner (%s/%d) of lock %d.\n",
20255 +							  (new_max_eff_prio) ? new_max_eff_prio->comm : "nil",
20256 +							  (new_max_eff_prio) ? new_max_eff_prio->pid : -1,
20257 +							  owner->comm,
20258 +							  owner->pid,
20259 +							  l->ident);
20260 +
20261 +					decreased_prio = NULL;
20262 +				}
20263 +
20264 +				// beware: recursion
20265 +				litmus->nested_decrease_prio(owner, decreased_prio, &mutex->lock, irqflags);	// will unlock mutex->lock
20266 +			}
20267 +			else {
20268 +				raw_spin_unlock(&tsk_rt(owner)->hp_blocked_tasks_lock);
20269 +				unlock_fine_irqrestore(&mutex->lock, irqflags);
20270 +			}
20271 +		}
20272 +		else {
20273 +			TRACE_TASK(t, "is not hp_waiter.  No propagation.\n");
20274 +			unlock_fine_irqrestore(&mutex->lock, irqflags);
20275 +		}
20276 +	}
20277 +	else {
20278 +		struct litmus_lock *still_blocked = tsk_rt(t)->blocked_lock;
20279 +
20280 +		TRACE_TASK(t, "is not blocked on lock %d.\n", l->ident);
20281 +		if(still_blocked) {
20282 +			TRACE_TASK(t, "is still blocked on a lock though (lock %d).\n",
20283 +					   still_blocked->ident);
20284 +			if(still_blocked->ops->propagate_decrease_inheritance) {
20285 +			/* Due to linked nesting of spinlocks (acq. A, acq. B, free A,
20286 +			 * free B), we know that task 't' has not released any locks behind
20287 +			 * us in this chain.  Propagation just needs to catch up with task 't'. */
20288 +				still_blocked->ops->propagate_decrease_inheritance(still_blocked,
20289 +																   t,
20290 +																   &mutex->lock,
20291 +																   irqflags);
20292 +			}
20293 +			else {
20294 +				TRACE_TASK(t, "Inheritor is blocked on lock (%p) that does not support nesting!\n",
20295 +						   still_blocked);
20296 +				unlock_fine_irqrestore(&mutex->lock, irqflags);
20297 +			}
20298 +		}
20299 +		else {
20300 +			unlock_fine_irqrestore(&mutex->lock, irqflags);
20301 +		}
20302 +	}
20303 +}
20304 +
20305 +
20306 +int rsm_mutex_close(struct litmus_lock* l)
20307 +{
20308 +	struct task_struct *t = current;
20309 +	struct rsm_mutex *mutex = rsm_mutex_from_lock(l);
20310 +	unsigned long flags;
20311 +
20312 +	int owner;
20313 +
20314 +#ifdef CONFIG_LITMUS_DGL_SUPPORT
20315 +	raw_spinlock_t *dgl_lock = litmus->get_dgl_spinlock(t);
20316 +#endif
20317 +
20318 +	lock_global_irqsave(dgl_lock, flags);
20319 +	lock_fine_irqsave(&mutex->lock, flags);
20320 +
20321 +	owner = (mutex->owner == t);
20322 +
20323 +	unlock_fine_irqrestore(&mutex->lock, flags);
20324 +	unlock_global_irqrestore(dgl_lock, flags);
20325 +
20326 +	if (owner)
20327 +		rsm_mutex_unlock(l);
20328 +
20329 +	return 0;
20330 +}
20331 +
20332 +void rsm_mutex_free(struct litmus_lock* lock)
20333 +{
20334 +	kfree(rsm_mutex_from_lock(lock));
20335 +}
20336 +
20337 +struct litmus_lock* rsm_mutex_new(struct litmus_lock_ops* ops)
20338 +{
20339 +	struct rsm_mutex* mutex;
20340 +
20341 +	mutex = kmalloc(sizeof(*mutex), GFP_KERNEL);
20342 +	if (!mutex)
20343 +		return NULL;
20344 +
20345 +	mutex->litmus_lock.ops = ops;
20346 +	mutex->owner   = NULL;
20347 +	mutex->hp_waiter = NULL;
20348 +	init_waitqueue_head(&mutex->wait);
20349 +
20350 +
20351 +#ifdef CONFIG_DEBUG_SPINLOCK
20352 +	{
20353 +		__raw_spin_lock_init(&mutex->lock,
20354 +							 ((struct litmus_lock*)mutex)->cheat_lockdep,
20355 +							 &((struct litmus_lock*)mutex)->key);
20356 +	}
20357 +#else
20358 +	raw_spin_lock_init(&mutex->lock);
20359 +#endif
20360 +
20361 +	((struct litmus_lock*)mutex)->nest.hp_waiter_ptr = &mutex->hp_waiter;
20362 +
20363 +	return &mutex->litmus_lock;
20364 +}
20365 +
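
A recurring idea in the code above is that a task's effective priority is the better of its base priority and the top of its hp_blocked_tasks heap, i.e., the highest priority donated by tasks blocked behind it. A minimal sketch of that rule in userspace C (hypothetical types; earlier deadline = higher priority, as in EDF):

#include <stdio.h>

/* Simplified task: the EDF priority is its deadline (earlier = higher prio). */
struct task {
	const char *name;
	unsigned long long base_deadline;      /* base priority */
	unsigned long long inherited_deadline; /* 0 = nothing inherited */
};

/* Effective priority: the better (earlier) of base and inherited deadline. */
static unsigned long long effective_deadline(const struct task *t)
{
	if (t->inherited_deadline && t->inherited_deadline < t->base_deadline)
		return t->inherited_deadline;
	return t->base_deadline;
}

/* Recompute what a lock holder inherits from its blocked waiters:
 * the earliest effective deadline among them (the "top" of the heap). */
static void update_inheritance(struct task *holder,
			       struct task **waiters, int nr_waiters)
{
	unsigned long long best = 0;
	int i;

	for (i = 0; i < nr_waiters; i++) {
		unsigned long long d = effective_deadline(waiters[i]);
		if (!best || d < best)
			best = d;
	}
	/* only inherit if it actually improves the holder's priority */
	holder->inherited_deadline =
		(best && best < holder->base_deadline) ? best : 0;
}

int main(void)
{
	struct task holder = { "holder", 100, 0 };
	struct task waiter = { "waiter", 40, 0 };
	struct task *ws[] = { &waiter };

	update_inheritance(&holder, ws, 1);
	printf("%s now runs with deadline %llu\n",
	       holder.name, effective_deadline(&holder));  /* prints 40 */
	return 0;
}
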
20366 diff --git a/litmus/rt_domain.c b/litmus/rt_domain.c
20367 new file mode 100644
20368 index 0000000..d405854
20369 --- /dev/null
20370 +++ b/litmus/rt_domain.c
20371 @@ -0,0 +1,357 @@
20372 +/*
20373 + * litmus/rt_domain.c
20374 + *
20375 + * LITMUS real-time infrastructure. This file contains the
20376 + * functions that manipulate RT domains. RT domains are an abstraction
20377 + * of a ready queue and a release queue.
20378 + */
20379 +
20380 +#include <linux/percpu.h>
20381 +#include <linux/sched.h>
20382 +#include <linux/list.h>
20383 +#include <linux/slab.h>
20384 +
20385 +#include <litmus/litmus.h>
20386 +#include <litmus/sched_plugin.h>
20387 +#include <litmus/sched_trace.h>
20388 +
20389 +#include <litmus/rt_domain.h>
20390 +
20391 +#include <litmus/trace.h>
20392 +
20393 +#include <litmus/bheap.h>
20394 +
20395 +/* Uncomment when debugging timer races... */
20396 +#if 0
20397 +#define VTRACE_TASK TRACE_TASK
20398 +#define VTRACE TRACE
20399 +#else
20400 +#define VTRACE_TASK(t, fmt, args...) /* shut up */
20401 +#define VTRACE(fmt, args...) /* be quiet already */
20402 +#endif
20403 +
20404 +static int dummy_resched(rt_domain_t *rt)
20405 +{
20406 +	return 0;
20407 +}
20408 +
20409 +static int dummy_order(struct bheap_node* a, struct bheap_node* b)
20410 +{
20411 +	return 0;
20412 +}
20413 +
20414 +/* default implementation: use default lock */
20415 +static void default_release_jobs(rt_domain_t* rt, struct bheap* tasks)
20416 +{
20417 +	merge_ready(rt, tasks);
20418 +}
20419 +
20420 +static unsigned int time2slot(lt_t time)
20421 +{
20422 +	return (unsigned int) time2quanta(time, FLOOR) % RELEASE_QUEUE_SLOTS;
20423 +}
20424 +
20425 +static enum hrtimer_restart on_release_timer(struct hrtimer *timer)
20426 +{
20427 +	unsigned long flags;
20428 +	struct release_heap* rh;
20429 +	rh = container_of(timer, struct release_heap, timer);
20430 +
20431 +	TS_RELEASE_LATENCY(rh->release_time);
20432 +
20433 +	VTRACE("on_release_timer(0x%p) starts.\n", timer);
20434 +
20435 +	TS_RELEASE_START;
20436 +
20437 +
20438 +	raw_spin_lock_irqsave(&rh->dom->release_lock, flags);
20439 +	VTRACE("CB has the release_lock 0x%p\n", &rh->dom->release_lock);
20440 +	/* remove from release queue */
20441 +	list_del(&rh->list);
20442 +	raw_spin_unlock_irqrestore(&rh->dom->release_lock, flags);
20443 +	VTRACE("CB returned release_lock 0x%p\n", &rh->dom->release_lock);
20444 +
20445 +	/* call release callback */
20446 +	rh->dom->release_jobs(rh->dom, &rh->heap);
20447 +	/* WARNING: rh can be referenced from other CPUs from now on. */
20448 +
20449 +	TS_RELEASE_END;
20450 +
20451 +	VTRACE("on_release_timer(0x%p) ends.\n", timer);
20452 +
20453 +	return  HRTIMER_NORESTART;
20454 +}
20455 +
20456 +/* allocated in litmus.c */
20457 +struct kmem_cache * release_heap_cache;
20458 +
20459 +struct release_heap* release_heap_alloc(int gfp_flags)
20460 +{
20461 +	struct release_heap* rh;
20462 +	rh= kmem_cache_alloc(release_heap_cache, gfp_flags);
20463 +	if (rh) {
20464 +		/* initialize timer */
20465 +		hrtimer_init(&rh->timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
20466 +		rh->timer.function = on_release_timer;
20467 +	}
20468 +	return rh;
20469 +}
20470 +
20471 +void release_heap_free(struct release_heap* rh)
20472 +{
20473 +	/* make sure timer is no longer in use */
20474 +	hrtimer_cancel(&rh->timer);
20475 +	kmem_cache_free(release_heap_cache, rh);
20476 +}
20477 +
20478 +/* Caller must hold the release lock.
20479 + * Returns the heap for the given release time.  If no such heap exists
20480 + * prior to the invocation, it is created iff use_task_heap is set.
20481 + */
20482 +static struct release_heap* get_release_heap(rt_domain_t *rt,
20483 +					     struct task_struct* t,
20484 +					     int use_task_heap)
20485 +{
20486 +	struct list_head* pos;
20487 +	struct release_heap* heap = NULL;
20488 +	struct release_heap* rh;
20489 +	lt_t release_time = get_release(t);
20490 +	unsigned int slot = time2slot(release_time);
20491 +
20492 +	/* initialize pos for the case that the list is empty */
20493 +	pos = rt->release_queue.slot[slot].next;
20494 +	list_for_each(pos, &rt->release_queue.slot[slot]) {
20495 +		rh = list_entry(pos, struct release_heap, list);
20496 +		if (release_time == rh->release_time) {
20497 +			/* perfect match -- this happens on hyperperiod
20498 +			 * boundaries
20499 +			 */
20500 +			heap = rh;
20501 +			break;
20502 +		} else if (lt_before(release_time, rh->release_time)) {
20503 +			/* we need to insert a new node since rh is
20504 +			 * already in the future
20505 +			 */
20506 +			break;
20507 +		}
20508 +	}
20509 +	if (!heap && use_task_heap) {
20510 +		/* use pre-allocated release heap */
20511 +		rh = tsk_rt(t)->rel_heap;
20512 +
20513 +		rh->dom = rt;
20514 +		rh->release_time = release_time;
20515 +
20516 +		/* add to release queue */
20517 +		list_add(&rh->list, pos->prev);
20518 +		heap = rh;
20519 +	}
20520 +	return heap;
20521 +}
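
get_release_heap() hashes a release time into one of RELEASE_QUEUE_SLOTS buckets (via time2slot() above) and keeps each bucket's heaps sorted by release time, so jobs that share a release instant share a single heap and timer. The same bucket-lookup idea in a standalone sketch (hypothetical sizes; a plain sorted array stands in for the list of release heaps):

#include <stdio.h>

#define SLOTS 127  /* number of hash buckets (hypothetical; the patch uses RELEASE_QUEUE_SLOTS) */

struct release_bucket {
	unsigned long long times[16];  /* one entry per distinct release time */
	int count;
};

static struct release_bucket queue[SLOTS];

static unsigned int time2slot(unsigned long long time)
{
	return (unsigned int)(time % SLOTS);
}

/* Find (or create) the entry for 'time' in its bucket; the bucket is kept
 * sorted so jobs released at the same instant (e.g., on hyperperiod
 * boundaries) collapse onto a single entry. */
static int get_release_entry(unsigned long long time)
{
	struct release_bucket *b = &queue[time2slot(time)];
	int i, j;

	for (i = 0; i < b->count; i++) {
		if (b->times[i] == time)
			return i;        /* exact match: reuse the existing entry */
		if (b->times[i] > time)
			break;           /* 'time' sorts before this entry */
	}
	for (j = b->count; j > i; j--)   /* shift later entries right */
		b->times[j] = b->times[j - 1];
	b->times[i] = time;
	b->count++;
	return i;
}

int main(void)
{
	get_release_entry(1000);
	get_release_entry(1000);   /* same release time: reuses the first entry */
	get_release_entry(1127);   /* 1127 % 127 == 1000 % 127: same bucket, new entry */
	printf("bucket %u holds %d distinct release times\n",
	       time2slot(1000), queue[time2slot(1000)].count);  /* prints 2 */
	return 0;
}
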
20522 +
20523 +static void reinit_release_heap(struct task_struct* t)
20524 +{
20525 +	struct release_heap* rh;
20526 +
20527 +	/* use pre-allocated release heap */
20528 +	rh = tsk_rt(t)->rel_heap;
20529 +
20530 +	/* Make sure it is safe to use.  The timer callback could still
20531 +	 * be executing on another CPU; hrtimer_cancel() will wait
20532 +	 * until the timer callback has completed.  However, under no
20533 +	 * circumstances should the timer be active (= yet to be
20534 +	 * triggered).
20535 +	 *
20536 +	 * WARNING: If the CPU still holds the release_lock at this point,
20537 +	 *          deadlock may occur!
20538 +	 */
20539 +	BUG_ON(hrtimer_cancel(&rh->timer));
20540 +
20541 +	/* initialize */
20542 +	bheap_init(&rh->heap);
20543 +#ifdef CONFIG_RELEASE_MASTER
20544 +	atomic_set(&rh->info.state, HRTIMER_START_ON_INACTIVE);
20545 +#endif
20546 +}
20547 +/* arm_release_timer() - start local release timer or trigger
20548 + *     remote timer (pull timer)
20549 + *
20550 + * Called by add_release() with:
20551 + * - tobe_lock taken
20552 + * - IRQ disabled
20553 + */
20554 +#ifdef CONFIG_RELEASE_MASTER
20555 +#define arm_release_timer(t) arm_release_timer_on((t), NO_CPU)
20556 +static void arm_release_timer_on(rt_domain_t *_rt , int target_cpu)
20557 +#else
20558 +static void arm_release_timer(rt_domain_t *_rt)
20559 +#endif
20560 +{
20561 +	rt_domain_t *rt = _rt;
20562 +	struct list_head list;
20563 +	struct list_head *pos, *safe;
20564 +	struct task_struct* t;
20565 +	struct release_heap* rh;
20566 +
20567 +	VTRACE("arm_release_timer() at %llu\n", litmus_clock());
20568 +	list_replace_init(&rt->tobe_released, &list);
20569 +
20570 +	list_for_each_safe(pos, safe, &list) {
20571 +		/* pick task off the work list */
20572 +		t = list_entry(pos, struct task_struct, rt_param.list);
20573 +		sched_trace_task_release(t);
20574 +		list_del(pos);
20575 +
20576 +		/* put into release heap while holding release_lock */
20577 +		raw_spin_lock(&rt->release_lock);
20578 +		VTRACE_TASK(t, "I have the release_lock 0x%p\n", &rt->release_lock);
20579 +
20580 +		rh = get_release_heap(rt, t, 0);
20581 +		if (!rh) {
20582 +			/* need to use our own, but drop lock first */
20583 +			raw_spin_unlock(&rt->release_lock);
20584 +			VTRACE_TASK(t, "Dropped release_lock 0x%p\n",
20585 +				    &rt->release_lock);
20586 +
20587 +			reinit_release_heap(t);
20588 +			VTRACE_TASK(t, "release_heap ready\n");
20589 +
20590 +			raw_spin_lock(&rt->release_lock);
20591 +			VTRACE_TASK(t, "Re-acquired release_lock 0x%p\n",
20592 +				    &rt->release_lock);
20593 +
20594 +			rh = get_release_heap(rt, t, 1);
20595 +		}
20596 +		bheap_insert(rt->order, &rh->heap, tsk_rt(t)->heap_node);
20597 +		VTRACE_TASK(t, "arm_release_timer(): added to release heap\n");
20598 +
20599 +		raw_spin_unlock(&rt->release_lock);
20600 +		VTRACE_TASK(t, "Returned the release_lock 0x%p\n", &rt->release_lock);
20601 +
20602 +		/* To avoid arming the timer multiple times, we only let the
20603 +		 * owner do the arming (which is the "first" task to reference
20604 +		 * this release_heap anyway).
20605 +		 */
20606 +		if (rh == tsk_rt(t)->rel_heap) {
20607 +			VTRACE_TASK(t, "arming timer 0x%p\n", &rh->timer);
20608 +			/* we cannot arm the timer using hrtimer_start()
20609 +			 * as it may deadlock on rq->lock
20610 +			 *
20611 +			 * PINNED mode is ok on both local and remote CPU
20612 +			 */
20613 +#ifdef CONFIG_RELEASE_MASTER
20614 +			if (rt->release_master == NO_CPU &&
20615 +			    target_cpu == NO_CPU)
20616 +#endif
20617 +				__hrtimer_start_range_ns(&rh->timer,
20618 +						ns_to_ktime(rh->release_time),
20619 +						0, HRTIMER_MODE_ABS_PINNED, 0);
20620 +#ifdef CONFIG_RELEASE_MASTER
20621 +			else
20622 +				hrtimer_start_on(
20623 +					/* target_cpu overrides release master */
20624 +					(target_cpu != NO_CPU ?
20625 +					 target_cpu : rt->release_master),
20626 +					&rh->info, &rh->timer,
20627 +					ns_to_ktime(rh->release_time),
20628 +					HRTIMER_MODE_ABS_PINNED);
20629 +#endif
20630 +		} else
20631 +			VTRACE_TASK(t, "0x%p is not my timer\n", &rh->timer);
20632 +	}
20633 +}
20634 +
20635 +void rt_domain_init(rt_domain_t *rt,
20636 +		    bheap_prio_t order,
20637 +		    check_resched_needed_t check,
20638 +		    release_jobs_t release
20639 +		   )
20640 +{
20641 +	int i;
20642 +
20643 +	BUG_ON(!rt);
20644 +	if (!check)
20645 +		check = dummy_resched;
20646 +	if (!release)
20647 +		release = default_release_jobs;
20648 +	if (!order)
20649 +		order = dummy_order;
20650 +
20651 +#ifdef CONFIG_RELEASE_MASTER
20652 +	rt->release_master = NO_CPU;
20653 +#endif
20654 +
20655 +	bheap_init(&rt->ready_queue);
20656 +	INIT_LIST_HEAD(&rt->tobe_released);
20657 +	for (i = 0; i < RELEASE_QUEUE_SLOTS; i++)
20658 +		INIT_LIST_HEAD(&rt->release_queue.slot[i]);
20659 +
20660 +	raw_spin_lock_init(&rt->ready_lock);
20661 +	raw_spin_lock_init(&rt->release_lock);
20662 +	raw_spin_lock_init(&rt->tobe_lock);
20663 +
20664 +	rt->check_resched 	= check;
20665 +	rt->release_jobs	= release;
20666 +	rt->order		= order;
20667 +}
20668 +
20669 +/* add_ready - add a real-time task to the rt ready queue. It must be runnable.
20670 + * @new:       the newly released task
20671 + */
20672 +void __add_ready(rt_domain_t* rt, struct task_struct *new)
20673 +{
20674 +	TRACE("rt: adding %s/%d (%llu, %llu) rel=%llu to ready queue at %llu\n",
20675 +	      new->comm, new->pid, get_exec_cost(new), get_rt_period(new),
20676 +	      get_release(new), litmus_clock());
20677 +
20678 +	BUG_ON(bheap_node_in_heap(tsk_rt(new)->heap_node));
20679 +
20680 +	bheap_insert(rt->order, &rt->ready_queue, tsk_rt(new)->heap_node);
20681 +	rt->check_resched(rt);
20682 +}
20683 +
20684 +/* merge_ready - Add a sorted set of tasks to the rt ready queue. They must be runnable.
20685 + * @tasks      - the newly released tasks
20686 + */
20687 +void __merge_ready(rt_domain_t* rt, struct bheap* tasks)
20688 +{
20689 +	bheap_union(rt->order, &rt->ready_queue, tasks);
20690 +	rt->check_resched(rt);
20691 +}
20692 +
20693 +
20694 +#ifdef CONFIG_RELEASE_MASTER
20695 +void __add_release_on(rt_domain_t* rt, struct task_struct *task,
20696 +		      int target_cpu)
20697 +{
20698 +	TRACE_TASK(task, "add_release_on(), rel=%llu, target=%d\n",
20699 +		   get_release(task), target_cpu);
20700 +	list_add(&tsk_rt(task)->list, &rt->tobe_released);
20701 +	task->rt_param.domain = rt;
20702 +
20703 +	/* start release timer */
20704 +	TS_SCHED2_START(task);
20705 +
20706 +	arm_release_timer_on(rt, target_cpu);
20707 +
20708 +	TS_SCHED2_END(task);
20709 +}
20710 +#endif
20711 +
20712 +/* add_release - add a real-time task to the rt release queue.
20713 + * @task:        the sleeping task
20714 + */
20715 +void __add_release(rt_domain_t* rt, struct task_struct *task)
20716 +{
20717 +	TRACE_TASK(task, "add_release(), rel=%llu\n", get_release(task));
20718 +	list_add(&tsk_rt(task)->list, &rt->tobe_released);
20719 +	task->rt_param.domain = rt;
20720 +
20721 +	/* start release timer */
20722 +	TS_SCHED2_START(task);
20723 +
20724 +	arm_release_timer(rt);
20725 +
20726 +	TS_SCHED2_END(task);
20727 +}
20728 +
20729 diff --git a/litmus/sched_cedf.c b/litmus/sched_cedf.c
20730 new file mode 100644
20731 index 0000000..be14dbe
20732 --- /dev/null
20733 +++ b/litmus/sched_cedf.c
20734 @@ -0,0 +1,1849 @@
20735 +/*
20736 + * litmus/sched_cedf.c
20737 + *
20738 + * Implementation of the C-EDF scheduling algorithm.
20739 + *
20740 + * This implementation is based on G-EDF:
20741 + * - CPUs are clustered around L2 or L3 caches.
20742 + * - Cluster topology is automatically detected (this is arch-dependent
20743 + *   and currently works only on x86 --- and only with modern
20744 + *   CPUs that export cpuid4 information).
20745 + * - The plugin _does not_ attempt to put tasks in the right cluster, i.e.,
20746 + *   the programmer needs to be aware of the topology to place tasks
20747 + *   in the desired cluster.
20748 + * - Default clustering is around the L2 cache (cache index = 2);
20749 + *   supported clusters are: L1 (private cache: P-EDF), L2, L3, ALL (all
20750 + *   online CPUs are placed in a single cluster).
20751 + *
20752 + *   For details on functions, take a look at sched_gsn_edf.c
20753 + *
20754 + * Currently, we do not support changes in the number of online cpus.
20755 + * If the num_online_cpus() dynamically changes, the plugin is broken.
20756 + *
20757 + * This version uses the simple approach and serializes all scheduling
20758 + * decisions by the use of a queue lock. This is probably not the
20759 + * best way to do it, but it should suffice for now.
20760 + */
20761 +
20762 +#include <linux/spinlock.h>
20763 +#include <linux/percpu.h>
20764 +#include <linux/sched.h>
20765 +#include <linux/slab.h>
20766 +#include <linux/uaccess.h>
20767 +#include <linux/module.h>
20768 +
20769 +#include <litmus/litmus.h>
20770 +#include <litmus/jobs.h>
20771 +#include <litmus/preempt.h>
20772 +#include <litmus/sched_plugin.h>
20773 +#include <litmus/edf_common.h>
20774 +#include <litmus/sched_trace.h>
20775 +
20776 +#include <litmus/clustered.h>
20777 +
20778 +#include <litmus/bheap.h>
20779 +#include <litmus/binheap.h>
20780 +
20781 +#ifdef CONFIG_LITMUS_LOCKING
20782 +#include <litmus/kfmlp_lock.h>
20783 +#endif
20784 +
20785 +#ifdef CONFIG_LITMUS_NESTED_LOCKING
20786 +#include <litmus/rsm_lock.h>
20787 +#include <litmus/ikglp_lock.h>
20788 +#endif
20789 +
20790 +#ifdef CONFIG_SCHED_CPU_AFFINITY
20791 +#include <litmus/affinity.h>
20792 +#endif
20793 +
20794 +/* to configure the cluster size */
20795 +#include <litmus/litmus_proc.h>
20796 +
20801 +#ifdef CONFIG_LITMUS_SOFTIRQD
20802 +#include <litmus/litmus_softirq.h>
20803 +#endif
20804 +
20805 +#ifdef CONFIG_LITMUS_PAI_SOFTIRQD
20806 +#include <linux/interrupt.h>
20807 +#include <litmus/trace.h>
20808 +#endif
20809 +
20810 +#ifdef CONFIG_LITMUS_NVIDIA
20811 +#include <litmus/nvidia_info.h>
20812 +#endif
20813 +
20814 +#if defined(CONFIG_LITMUS_AFFINITY_LOCKING) && defined(CONFIG_LITMUS_NVIDIA)
20815 +#include <litmus/gpu_affinity.h>
20816 +#endif
20817 +
20818 +/* Reference configuration variable. Determines which cache level is used to
20819 + * group CPUs into clusters.  GLOBAL_CLUSTER, which is the default, means that
20820 + * all CPUs form a single cluster (just like GSN-EDF).
20821 + */
20822 +static enum cache_level cluster_config = GLOBAL_CLUSTER;
20823 +
20824 +struct clusterdomain;
20825 +
20826 +/* cpu_entry_t - maintain the linked and scheduled state
20827 + *
20828 + * A cpu also contains a pointer to the cedf_domain_t cluster
20829 + * that owns it (struct clusterdomain*)
20830 + */
20831 +typedef struct  {
20832 +	int 			cpu;
20833 +	struct clusterdomain*	cluster;	/* owning cluster */
20834 +	struct task_struct*	linked;		/* only RT tasks */
20835 +	struct task_struct*	scheduled;	/* only RT tasks */
20836 +	atomic_t		will_schedule;	/* prevent unneeded IPIs */
20837 +	struct binheap_node hn;
20838 +} cpu_entry_t;
20839 +
20840 +/* one cpu_entry_t per CPU */
20841 +DEFINE_PER_CPU(cpu_entry_t, cedf_cpu_entries);
20842 +
20843 +#define set_will_schedule() \
20844 +	(atomic_set(&__get_cpu_var(cedf_cpu_entries).will_schedule, 1))
20845 +#define clear_will_schedule() \
20846 +	(atomic_set(&__get_cpu_var(cedf_cpu_entries).will_schedule, 0))
20847 +#define test_will_schedule(cpu) \
20848 +	(atomic_read(&per_cpu(cedf_cpu_entries, cpu).will_schedule))
20849 +
20850 +#ifdef CONFIG_LITMUS_PAI_SOFTIRQD
20851 +struct tasklet_head
20852 +{
20853 +	struct tasklet_struct *head;
20854 +	struct tasklet_struct **tail;
20855 +};
20856 +#endif
20857 +
20858 +/*
20859 + * In C-EDF there is a cedf domain _per_ cluster.
20860 + * The number of clusters is dynamically determined according to the
20861 + * total CPU count and the cluster size.
20862 + */
20863 +typedef struct clusterdomain {
20864 +	/* rt_domain for this cluster */
20865 +	rt_domain_t	domain;
20866 +	/* cpus in this cluster */
20867 +	cpu_entry_t*	*cpus;
20868 +	/* map of this cluster cpus */
20869 +	cpumask_var_t	cpu_map;
20870 +	/* the cpus queue themselves according to priority in here */
20871 +	struct binheap_handle cpu_heap;
20872 +	/* lock for this cluster */
20873 +#define cluster_lock domain.ready_lock
20874 +
20875 +#ifdef CONFIG_LITMUS_PAI_SOFTIRQD
20876 +	struct tasklet_head pending_tasklets;
20877 +#endif
20878 +
20879 +#ifdef CONFIG_LITMUS_DGL_SUPPORT
20880 +	raw_spinlock_t dgl_lock;
20881 +#endif
20882 +} cedf_domain_t;
20883 +
20884 +/* a cedf_domain per cluster; allocation is done at init/activation time */
20885 +cedf_domain_t *cedf;
20886 +
20887 +#define remote_cluster(cpu)	((cedf_domain_t *) per_cpu(cedf_cpu_entries, cpu).cluster)
20888 +#define task_cpu_cluster(task)	remote_cluster(get_partition(task))
20889 +
20890 +/* total number of clusters */
20891 +static int num_clusters;
20892 +/* we do not support clusters of different sizes */
20893 +static unsigned int cluster_size;
20894 +
20895 +static int clusters_allocated = 0;
20896 +
20897 +#ifdef CONFIG_LITMUS_DGL_SUPPORT
20898 +static raw_spinlock_t* cedf_get_dgl_spinlock(struct task_struct *t)
20899 +{
20900 +	cedf_domain_t *cluster = task_cpu_cluster(t);
20901 +	return(&cluster->dgl_lock);
20902 +}
20903 +#endif
20904 +
20905 +
20906 +/* Uncomment WANT_ALL_SCHED_EVENTS if you want to see all scheduling
20907 + * decisions in the TRACE() log; uncomment VERBOSE_INIT for verbose
20908 + * information during the initialization of the plugin (e.g., topology)
20909 +#define WANT_ALL_SCHED_EVENTS
20910 + */
20911 +#define VERBOSE_INIT
20912 +
20913 +static int cpu_lower_prio(struct binheap_node *_a, struct binheap_node *_b)
20914 +{
20915 +	cpu_entry_t *a = binheap_entry(_a, cpu_entry_t, hn);
20916 +	cpu_entry_t *b = binheap_entry(_b, cpu_entry_t, hn);
20917 +
20918 +	/* Note that a and b are inverted: we want the lowest-priority CPU at
20919 +	 * the top of the heap.
20920 +	 */
20921 +	return edf_higher_prio(b->linked, a->linked);
20922 +}
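
cpu_lower_prio() deliberately inverts its arguments so that the heap, which normally surfaces the highest-priority element, instead keeps the lowest-priority CPU at the top --- exactly the CPU that should be preempted next. The trick in isolation, as a standalone sketch with a linear scan standing in for the binheap (hypothetical types):

#include <stdio.h>

struct cpu_entry {
	int cpu;
	unsigned long long linked_deadline;  /* earlier = higher priority; 0 = idle */
};

/* Inverted EDF order: nonzero if 'a' should sit ABOVE 'b', i.e., if 'a' is
 * the LOWER-priority CPU (idle CPUs are lowest of all). */
static int cpu_lower_prio(const struct cpu_entry *a, const struct cpu_entry *b)
{
	if (!a->linked_deadline)
		return 1;                 /* a is idle: lowest priority */
	if (!b->linked_deadline)
		return 0;
	return a->linked_deadline > b->linked_deadline;  /* later deadline = lower prio */
}

/* With the inverted comparator, the "top" of the order is the CPU that
 * should be preempted first. */
static const struct cpu_entry *preemption_candidate(const struct cpu_entry *cpus,
						    int n)
{
	const struct cpu_entry *top = &cpus[0];
	int i;

	for (i = 1; i < n; i++)
		if (cpu_lower_prio(&cpus[i], top))
			top = &cpus[i];
	return top;
}

int main(void)
{
	struct cpu_entry cpus[] = { {0, 10}, {1, 50}, {2, 30} };

	printf("preempt CPU %d\n", preemption_candidate(cpus, 3)->cpu);  /* CPU 1 */
	return 0;
}
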
20923 +
20924 +/* update_cpu_position - Move the cpu entry to the correct place to maintain
20925 + *                       order in the cpu queue. Caller must hold cedf lock.
20926 + */
20927 +static void update_cpu_position(cpu_entry_t *entry)
20928 +{
20929 +	cedf_domain_t *cluster = entry->cluster;
20930 +
20931 +	if (likely(binheap_is_in_heap(&entry->hn))) {
20932 +		binheap_delete(&entry->hn, &cluster->cpu_heap);
20933 +	}
20934 +
20935 +	binheap_add(&entry->hn, &cluster->cpu_heap, cpu_entry_t, hn);
20936 +}
20937 +
20938 +/* caller must hold cedf lock */
20939 +static cpu_entry_t* lowest_prio_cpu(cedf_domain_t *cluster)
20940 +{
20941 +	return binheap_top_entry(&cluster->cpu_heap, cpu_entry_t, hn);
20942 +}
20943 +
20944 +
20945 +/* link_task_to_cpu - Update the link of a CPU.
20946 + *                    Handles the case where the to-be-linked task is already
20947 + *                    scheduled on a different CPU.
20948 + */
20949 +static noinline void link_task_to_cpu(struct task_struct* linked,
20950 +				      cpu_entry_t *entry)
20951 +{
20952 +	cpu_entry_t *sched;
20953 +	struct task_struct* tmp;
20954 +	int on_cpu;
20955 +
20956 +	BUG_ON(linked && !is_realtime(linked));
20957 +
20958 +	/* Currently linked task is set to be unlinked. */
20959 +	if (entry->linked) {
20960 +		entry->linked->rt_param.linked_on = NO_CPU;
20961 +	}
20962 +
20963 +	/* Link new task to CPU. */
20964 +	if (linked) {
20965 +		set_rt_flags(linked, RT_F_RUNNING);
20966 +		/* handle the case where the task is already scheduled somewhere! */
20967 +		on_cpu = linked->rt_param.scheduled_on;
20968 +		if (on_cpu != NO_CPU) {
20969 +			sched = &per_cpu(cedf_cpu_entries, on_cpu);
20970 +			/* this should only happen if not linked already */
20971 +			BUG_ON(sched->linked == linked);
20972 +
20973 +			/* If we are already scheduled on the CPU to which we
20974 +			 * wanted to link, we don't need to do the swap --
20975 +			 * we just link ourselves to the CPU and depend on
20976 +			 * the caller to get things right.
20977 +			 */
20978 +			if (entry != sched) {
20979 +				TRACE_TASK(linked,
20980 +					   "already scheduled on %d, updating link.\n",
20981 +					   sched->cpu);
20982 +				tmp = sched->linked;
20983 +				linked->rt_param.linked_on = sched->cpu;
20984 +				sched->linked = linked;
20985 +				update_cpu_position(sched);
20986 +				linked = tmp;
20987 +			}
20988 +		}
20989 +		if (linked) /* might be NULL due to swap */
20990 +			linked->rt_param.linked_on = entry->cpu;
20991 +	}
20992 +	entry->linked = linked;
20993 +#ifdef WANT_ALL_SCHED_EVENTS
20994 +	if (linked)
20995 +		TRACE_TASK(linked, "linked to %d.\n", entry->cpu);
20996 +	else
20997 +		TRACE("NULL linked to %d.\n", entry->cpu);
20998 +#endif
20999 +	update_cpu_position(entry);
21000 +}
21001 +
21002 +/* unlink - Make sure a task is not linked any longer to an entry
21003 + *          where it was linked before. Must hold cluster_lock.
21004 + */
21005 +static noinline void unlink(struct task_struct* t)
21006 +{
21007 +    	cpu_entry_t *entry;
21008 +
21009 +	if (t->rt_param.linked_on != NO_CPU) {
21010 +		/* unlink */
21011 +		entry = &per_cpu(cedf_cpu_entries, t->rt_param.linked_on);
21012 +		t->rt_param.linked_on = NO_CPU;
21013 +		link_task_to_cpu(NULL, entry);
21014 +	} else if (is_queued(t)) {
21015 +		/* This is an interesting situation: t is scheduled,
21016 +		 * but was just recently unlinked.  It cannot be
21017 +		 * linked anywhere else (because then it would have
21018 +		 * been relinked to this CPU), thus it must be in some
21019 +		 * queue. We must remove it from the list in this
21020 +		 * case.
21021 +		 *
21022 +		 * In the C-EDF case it should be somewhere in the queue of
21023 +		 * its domain; therefore we can obtain the domain using
21024 +		 * task_cpu_cluster().
21025 +		 */
21026 +		remove(&(task_cpu_cluster(t))->domain, t);
21027 +	}
21028 +}
21029 +
21030 +
21031 +/* preempt - force a CPU to reschedule
21032 + */
21033 +static void preempt(cpu_entry_t *entry)
21034 +{
21035 +	preempt_if_preemptable(entry->scheduled, entry->cpu);
21036 +}
21037 +
21038 +/* requeue - Put an unlinked task into its cluster's C-EDF domain.
21039 + *           Caller must hold cluster_lock.
21040 + */
21041 +static noinline void requeue(struct task_struct* task)
21042 +{
21043 +	cedf_domain_t *cluster = task_cpu_cluster(task);
21044 +	BUG_ON(!task);
21045 +	/* sanity check before insertion */
21046 +	BUG_ON(is_queued(task));
21047 +
21048 +	if (is_released(task, litmus_clock()))
21049 +		__add_ready(&cluster->domain, task);
21050 +	else {
21051 +		/* it has got to wait */
21052 +		add_release(&cluster->domain, task);
21053 +	}
21054 +}
21055 +
21056 +#ifdef CONFIG_SCHED_CPU_AFFINITY
21057 +static cpu_entry_t* cedf_get_nearest_available_cpu(
21058 +				cedf_domain_t *cluster, cpu_entry_t *start)
21059 +{
21060 +	cpu_entry_t *affinity;
21061 +
21062 +	get_nearest_available_cpu(affinity, start, cedf_cpu_entries,
21063 +#ifdef CONFIG_RELEASE_MASTER
21064 +		cluster->domain.release_master
21065 +#else
21066 +		NO_CPU
21067 +#endif
21068 +		);
21069 +
21070 +	/* make sure CPU is in our cluster */
21071 +	if (affinity && cpu_isset(affinity->cpu, *cluster->cpu_map))
21072 +		return(affinity);
21073 +	else
21074 +		return(NULL);
21075 +}
21076 +#endif
21077 +
21078 +
21079 +/* check for any necessary preemptions */
21080 +static void check_for_preemptions(cedf_domain_t *cluster)
21081 +{
21082 +	struct task_struct *task;
21083 +	cpu_entry_t *last;
21084 +
21085 +	for(last = lowest_prio_cpu(cluster);
21086 +	    edf_preemption_needed(&cluster->domain, last->linked);
21087 +	    last = lowest_prio_cpu(cluster)) {
21088 +		/* preemption necessary */
21089 +		task = __take_ready(&cluster->domain);
21090 +		TRACE("check_for_preemptions: attempting to link task %d to %d\n",
21091 +		      task->pid, last->cpu);
21092 +#ifdef CONFIG_SCHED_CPU_AFFINITY
21093 +		{
21094 +			cpu_entry_t *affinity =
21095 +					cedf_get_nearest_available_cpu(cluster,
21096 +						&per_cpu(cedf_cpu_entries, task_cpu(task)));
21097 +			if(affinity)
21098 +				last = affinity;
21099 +			else if(last->linked)
21100 +				requeue(last->linked);
21101 +		}
21102 +#else
21103 +		if (last->linked)
21104 +			requeue(last->linked);
21105 +#endif
21106 +		link_task_to_cpu(task, last);
21107 +		preempt(last);
21108 +	}
21109 +}
21110 +
21111 +/* cedf_job_arrival: task is either resumed or released */
21112 +static noinline void cedf_job_arrival(struct task_struct* task)
21113 +{
21114 +	cedf_domain_t *cluster = task_cpu_cluster(task);
21115 +	BUG_ON(!task);
21116 +
21117 +	requeue(task);
21118 +	check_for_preemptions(cluster);
21119 +}
21120 +
21121 +static void cedf_release_jobs(rt_domain_t* rt, struct bheap* tasks)
21122 +{
21123 +	cedf_domain_t* cluster = container_of(rt, cedf_domain_t, domain);
21124 +	unsigned long flags;
21125 +
21126 +	raw_spin_lock_irqsave(&cluster->cluster_lock, flags);
21127 +
21128 +	__merge_ready(&cluster->domain, tasks);
21129 +	check_for_preemptions(cluster);
21130 +
21131 +	raw_spin_unlock_irqrestore(&cluster->cluster_lock, flags);
21132 +}
21133 +
21134 +/* caller holds cluster_lock */
21135 +static noinline void job_completion(struct task_struct *t, int forced)
21136 +{
21137 +	BUG_ON(!t);
21138 +
21139 +	sched_trace_task_completion(t, forced);
21140 +
21141 +#ifdef CONFIG_LITMUS_NVIDIA
21142 +	atomic_set(&tsk_rt(t)->nv_int_count, 0);
21143 +#endif
21144 +
21145 +	TRACE_TASK(t, "job_completion().\n");
21146 +
21147 +	/* set flags */
21148 +	set_rt_flags(t, RT_F_SLEEP);
21149 +	/* prepare for next period */
21150 +	prepare_for_next_period(t);
21151 +	if (is_released(t, litmus_clock()))
21152 +		sched_trace_task_release(t);
21153 +	/* unlink */
21154 +	unlink(t);
21155 +	/* requeue
21156 +	 * But don't requeue a blocking task. */
21157 +	if (is_running(t))
21158 +		cedf_job_arrival(t);
21159 +}
21160 +
21161 +/* cedf_tick - this function is called for every local timer
21162 + *                         interrupt.
21163 + *
21164 + *                   checks whether the current task has expired and checks
21165 + *                   whether we need to preempt it if it has not expired
21166 + */
21167 +static void cedf_tick(struct task_struct* t)
21168 +{
21169 +	if (is_realtime(t) && budget_enforced(t) && budget_exhausted(t)) {
21170 +		if (!is_np(t)) {
21171 +			/* np tasks will be preempted when they become
21172 +			 * preemptable again
21173 +			 */
21174 +			litmus_reschedule_local();
21175 +			set_will_schedule();
21176 +			TRACE("cedf_scheduler_tick: "
21177 +			      "%d is preemptable "
21178 +			      " => FORCE_RESCHED\n", t->pid);
21179 +		} else if (is_user_np(t)) {
21180 +			TRACE("cedf_scheduler_tick: "
21181 +			      "%d is non-preemptable, "
21182 +			      "preemption delayed.\n", t->pid);
21183 +			request_exit_np(t);
21184 +		}
21185 +	}
21186 +}
21187 +
21188 +
21189 +
21190 +
21191 +
21192 +
21193 +
21194 +
21195 +
21196 +
21197 +
21198 +
21199 +
21200 +#ifdef CONFIG_LITMUS_PAI_SOFTIRQD
21201 +
21202 +
21203 +static void __do_lit_tasklet(struct tasklet_struct* tasklet, unsigned long flushed)
21204 +{
21205 +	if (!atomic_read(&tasklet->count)) {
21206 +		if(tasklet->owner) {
21207 +			sched_trace_tasklet_begin(tasklet->owner);
21208 +		}
21209 +
21210 +		if (!test_and_clear_bit(TASKLET_STATE_SCHED, &tasklet->state))
21211 +		{
21212 +			BUG();
21213 +		}
21214 +		TRACE("%s: Invoking tasklet with owner pid = %d (flushed = %d).\n",
21215 +			  __FUNCTION__,
21216 +			  (tasklet->owner) ? tasklet->owner->pid : -1,
21217 +			  (tasklet->owner) ? 0 : 1);
21218 +		tasklet->func(tasklet->data);
21219 +		tasklet_unlock(tasklet);
21220 +
21221 +		if(tasklet->owner) {
21222 +			sched_trace_tasklet_end(tasklet->owner, flushed);
21223 +		}
21224 +	}
21225 +	else {
21226 +		BUG();
21227 +	}
21228 +}
21229 +
21230 +
21231 +static void do_lit_tasklets(cedf_domain_t* cluster, struct task_struct* sched_task)
21232 +{
21233 +	int work_to_do = 1;
21234 +	struct tasklet_struct *tasklet = NULL;
21235 +	unsigned long flags;
21236 +
21237 +	while(work_to_do) {
21238 +
21239 +		TS_NV_SCHED_BOTISR_START;
21240 +
21241 +		raw_spin_lock_irqsave(&cluster->cluster_lock, flags);
21242 +
21243 +		if(cluster->pending_tasklets.head != NULL) {
21244 +			// remove tasklet at head.
21245 +			struct tasklet_struct *prev = NULL;
21246 +			tasklet = cluster->pending_tasklets.head;
21247 +
21248 +			// find a tasklet with high enough priority to execute; skip ones
21249 +			// where sched_task has a higher priority.
21250 +			// We use the '!edf' test instead of swapping function arguments since
21251 +			// both sched_task and owner could be NULL.  In this case, we want to
21252 +			// still execute the tasklet.
21253 +			while(tasklet && !edf_higher_prio(tasklet->owner, sched_task)) {
21254 +				prev = tasklet;
21255 +				tasklet = tasklet->next;
21256 +			}
21257 +
21258 +			if(tasklet) {  // found something to execute
21259 +				// remove the tasklet from the queue
21260 +				if(prev) {
21261 +					prev->next = tasklet->next;
21262 +					if(prev->next == NULL) {
21263 +						TRACE("%s: Tasklet for %d is the last element in tasklet queue.\n", __FUNCTION__, tasklet->owner->pid);
21264 +						cluster->pending_tasklets.tail = &(prev->next);  // tail must point at the last node's next pointer
21265 +					}
21266 +				}
21267 +				else {
21268 +					cluster->pending_tasklets.head = tasklet->next;
21269 +					if(tasklet->next == NULL) {
21270 +						TRACE("%s: Tasklet for %d is the last element in tasklet queue.\n", __FUNCTION__, tasklet->owner->pid);
21271 +						cluster->pending_tasklets.tail = &(cluster->pending_tasklets.head);
21272 +					}
21273 +				}
21274 +			}
21275 +			else {
21276 +				TRACE("%s: No tasklets with eligible priority.\n", __FUNCTION__);
21277 +			}
21278 +		}
21279 +		else {
21280 +			TRACE("%s: Tasklet queue is empty.\n", __FUNCTION__);
21281 +		}
21282 +
21283 +		raw_spin_unlock_irqrestore(&cluster->cluster_lock, flags);
21284 +
21285 +		if(tasklet) {
21286 +			__do_lit_tasklet(tasklet, 0ul);
21287 +			tasklet = NULL;
21288 +		}
21289 +		else {
21290 +			work_to_do = 0;
21291 +		}
21292 +
21293 +		TS_NV_SCHED_BOTISR_END;
21294 +	}
21295 +}
21296 +
21297 +static void __add_pai_tasklet(struct tasklet_struct* tasklet, cedf_domain_t* cluster)
21298 +{
21299 +	struct tasklet_struct* step;
21300 +
21301 +	tasklet->next = NULL;  // make sure there are no old values floating around
21302 +
21303 +	step = cluster->pending_tasklets.head;
21304 +	if(step == NULL) {
21305 +		TRACE("%s: tasklet queue empty.  inserting tasklet for %d at head.\n", __FUNCTION__, tasklet->owner->pid);
21306 +		// insert at tail.
21307 +		*(cluster->pending_tasklets.tail) = tasklet;
21308 +		cluster->pending_tasklets.tail = &(tasklet->next);
21309 +	}
21310 +	else if((*(cluster->pending_tasklets.tail) != NULL) &&
21311 +			edf_higher_prio((*(cluster->pending_tasklets.tail))->owner, tasklet->owner)) {
21312 +		// insert at tail.
21313 +		TRACE("%s: tasklet belongs at end.  inserting tasklet for %d at tail.\n", __FUNCTION__, tasklet->owner->pid);
21314 +
21315 +		*(cluster->pending_tasklets.tail) = tasklet;
21316 +		cluster->pending_tasklets.tail = &(tasklet->next);
21317 +	}
21318 +	else {
21319 +
21320 +		// insert the tasklet somewhere in the middle.
21321 +
21322 +        TRACE("%s: tasklet belongs somewhere in the middle.\n", __FUNCTION__);
21323 +
21324 +		while(step->next && edf_higher_prio(step->next->owner, tasklet->owner)) {
21325 +			step = step->next;
21326 +		}
21327 +
21328 +		// insert tasklet right before step->next.
21329 +
21330 +		TRACE("%s: inserting tasklet for %d between %d and %d.\n", __FUNCTION__,
21331 +			  tasklet->owner->pid,
21332 +			  (step->owner) ?
21333 +			  step->owner->pid :
21334 +			  -1,
21335 +			  (step->next) ?
21336 +			  ((step->next->owner) ?
21337 +			   step->next->owner->pid :
21338 +			   -1) :
21339 +			  -1);
21340 +
21341 +		tasklet->next = step->next;
21342 +		step->next = tasklet;
21343 +
21344 +		// patch up the head if needed.
21345 +		if(cluster->pending_tasklets.head == step)
21346 +		{
21347 +			TRACE("%s: %d is the new tasklet queue head.\n", __FUNCTION__, tasklet->owner->pid);
21348 +			cluster->pending_tasklets.head = tasklet;
21349 +		}
21350 +	}
21351 +}
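
The pending_tasklets queue above is a singly linked list kept in priority order, with 'tail' stored as a pointer to the last node's next pointer so that appending never needs an empty-list special case. A compact standalone sketch of that representation (hypothetical node type; a pointer-to-pointer walk stands in for the prev-tracking used in the patch):

#include <stdio.h>
#include <stddef.h>

struct node {
	int prio;             /* larger value = higher priority here */
	struct node *next;
};

struct queue {
	struct node *head;
	struct node **tail;   /* points at the last node's 'next' (or at 'head') */
};

static void queue_init(struct queue *q)
{
	q->head = NULL;
	q->tail = &q->head;
}

/* Insert in non-increasing priority order; equal priorities go FIFO. */
static void queue_insert(struct queue *q, struct node *n)
{
	struct node **link = &q->head;

	while (*link && (*link)->prio >= n->prio)
		link = &(*link)->next;

	n->next = *link;
	*link = n;
	if (!n->next)             /* inserted at the end: advance the tail pointer */
		q->tail = &n->next;
}

int main(void)
{
	struct queue q;
	struct node a = { 3, NULL }, b = { 7, NULL }, c = { 5, NULL };
	struct node *it;

	queue_init(&q);
	queue_insert(&q, &a);
	queue_insert(&q, &b);
	queue_insert(&q, &c);

	for (it = q.head; it; it = it->next)
		printf("%d ", it->prio);  /* prints: 7 5 3 */
	printf("\n");
	return 0;
}
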
21352 +
21353 +static void cedf_run_tasklets(struct task_struct* sched_task)
21354 +{
21355 +	cedf_domain_t* cluster;
21356 +
21357 +	preempt_disable();
21358 +
21359 +	cluster = (is_realtime(sched_task)) ?
21360 +		task_cpu_cluster(sched_task) :
21361 +		remote_cluster(smp_processor_id());
21362 +
21363 +	if(cluster && cluster->pending_tasklets.head != NULL) {
21364 +		TRACE("%s: There are tasklets to process.\n", __FUNCTION__);
21365 +		do_lit_tasklets(cluster, sched_task);
21366 +	}
21367 +
21368 +	preempt_enable_no_resched();
21369 +}
21370 +
21371 +
21372 +
21373 +static int cedf_enqueue_pai_tasklet(struct tasklet_struct* tasklet)
21374 +{
21375 +#if 0
21376 +	cedf_domain_t *cluster = NULL;
21377 +	cpu_entry_t *targetCPU = NULL;
21378 +	int thisCPU;
21379 +	int runLocal = 0;
21380 +	int runNow = 0;
21381 +	unsigned long flags;
21382 +
21383 +    if(unlikely((tasklet->owner == NULL) || !is_realtime(tasklet->owner)))
21384 +    {
21385 +        TRACE("%s: No owner associated with this tasklet!\n", __FUNCTION__);
21386 +		return 0;
21387 +    }
21388 +
21389 +	cluster = task_cpu_cluster(tasklet->owner);
21390 +
21391 +	raw_spin_lock_irqsave(&cluster->cluster_lock, flags);
21392 +
21393 +	thisCPU = smp_processor_id();
21394 +
21395 +#ifdef CONFIG_SCHED_CPU_AFFINITY
21396 +	{
21397 +		cpu_entry_t* affinity = NULL;
21398 +
21399 +		// use this CPU if it is in our cluster and isn't running any RT work.
21400 +		if(cpu_isset(thisCPU, *cluster->cpu_map) && (__get_cpu_var(cedf_cpu_entries).linked == NULL)) {
21401 +			affinity = &(__get_cpu_var(cedf_cpu_entries));
21402 +		}
21403 +		else {
21404 +			// this CPU is busy or shouldn't run tasklet in this cluster.
21405 +			// look for an available nearby CPU.
21406 +			// NOTE: Affinity towards owner and not this CPU.  Is this right?
21407 +			affinity =
21408 +				cedf_get_nearest_available_cpu(cluster,
21409 +								&per_cpu(cedf_cpu_entries, task_cpu(tasklet->owner)));
21410 +		}
21411 +
21412 +		targetCPU = affinity;
21413 +	}
21414 +#endif
21415 +
21416 +	if (targetCPU == NULL) {
21417 +		targetCPU = lowest_prio_cpu(cluster);
21418 +	}
21419 +
21420 +	if (edf_higher_prio(tasklet->owner, targetCPU->linked)) {
21421 +		if (thisCPU == targetCPU->cpu) {
21422 +			TRACE("%s: Run tasklet locally (and now).\n", __FUNCTION__);
21423 +			runLocal = 1;
21424 +			runNow = 1;
21425 +		}
21426 +		else {
21427 +			TRACE("%s: Run tasklet remotely (and now).\n", __FUNCTION__);
21428 +			runLocal = 0;
21429 +			runNow = 1;
21430 +		}
21431 +	}
21432 +	else {
21433 +		runLocal = 0;
21434 +		runNow = 0;
21435 +	}
21436 +
21437 +	if(!runLocal) {
21438 +		// enqueue the tasklet
21439 +		__add_pai_tasklet(tasklet, cluster);
21440 +	}
21441 +
21442 +	raw_spin_unlock_irqrestore(&cluster->cluster_lock, flags);
21443 +
21444 +
21445 +	if (runLocal /*&& runNow */) {  // runNow == 1 is implied
21446 +		TRACE("%s: Running tasklet on CPU where it was received.\n", __FUNCTION__);
21447 +		__do_lit_tasklet(tasklet, 0ul);
21448 +	}
21449 +	else if (runNow /*&& !runLocal */) {  // runLocal == 0 is implied
21450 +		TRACE("%s: Triggering CPU %d to run tasklet.\n", __FUNCTION__, targetCPU->cpu);
21451 +		preempt(targetCPU);  // need to be protected by cluster_lock?
21452 +	}
21453 +	else {
21454 +		TRACE("%s: Scheduling of tasklet was deferred.\n", __FUNCTION__);
21455 +	}
21456 +#else
21457 +	TRACE("%s: Running tasklet on CPU where it was received.\n", __FUNCTION__);
21458 +	__do_lit_tasklet(tasklet, 0ul);
21459 +#endif
21460 +	return(1); // success
21461 +}
21462 +
21463 +static void cedf_change_prio_pai_tasklet(struct task_struct *old_prio,
21464 +										 struct task_struct *new_prio)
21465 +{
21466 +	struct tasklet_struct* step;
21467 +	unsigned long flags;
21468 +	cedf_domain_t *cluster;
21469 +	struct task_struct *probe;
21470 +
21471 +	// identify the cluster by the assignment of these tasks.  one should
21472 +	// be non-NULL.
21473 +	probe = (old_prio) ? old_prio : new_prio;
21474 +
21475 +	if(probe) {
21476 +		cluster = task_cpu_cluster(probe);
21477 +
21478 +		if(cluster->pending_tasklets.head != NULL) {
21479 +			raw_spin_lock_irqsave(&cluster->cluster_lock, flags);
21480 +			for(step = cluster->pending_tasklets.head; step != NULL; step = step->next) {
21481 +				if(step->owner == old_prio) {
21482 +					TRACE("%s: Found tasklet to change: %d\n", __FUNCTION__, step->owner->pid);
21483 +					step->owner = new_prio;
21484 +				}
21485 +			}
21486 +			raw_spin_unlock_irqrestore(&cluster->cluster_lock, flags);
21487 +		}
21488 +	}
21489 +	else {
21490 +		TRACE("%s: Both priorities were NULL\n", __FUNCTION__);
21491 +	}
21492 +}
21493 +
21494 +#endif  // PAI
21495 +
21496 +/* Getting schedule() right is a bit tricky. schedule() may not make any
21497 + * assumptions on the state of the current task since it may be called for a
21498 + * number of reasons. The reasons include a scheduler_tick() determined that it
21499 + * was necessary, because sys_exit_np() was called, because some Linux
21500 + * subsystem determined so, or even (in the worst case) because there is a bug
21501 + * hidden somewhere. Thus, we must take extreme care to determine what the
21502 + * current state is.
21503 + *
21504 + * The CPU could currently be scheduling a task (or not), be linked (or not).
21505 + *
21506 + * The following assertions for the scheduled task could hold:
21507 + *
21508 + *      - !is_running(scheduled)        // the job blocks
21509 + *	- scheduled->timeslice == 0	// the job completed (forcefully)
21510 + *	- get_rt_flag() == RT_F_SLEEP	// the job completed (by syscall)
21511 + * 	- linked != scheduled		// we need to reschedule (for any reason)
21512 + * 	- is_np(scheduled)		// rescheduling must be delayed,
21513 + *					   sys_exit_np must be requested
21514 + *
21515 + * Any of these can occur together.
21516 + */
21517 +static struct task_struct* cedf_schedule(struct task_struct * prev)
21518 +{
21519 +	cpu_entry_t* entry = &__get_cpu_var(cedf_cpu_entries);
21520 +	cedf_domain_t *cluster = entry->cluster;
21521 +	int out_of_time, sleep, preempt, np, exists, blocks;
21522 +	struct task_struct* next = NULL;
21523 +
21524 +#ifdef CONFIG_RELEASE_MASTER
21525 +	/* Bail out early if we are the release master.
21526 +	 * The release master never schedules any real-time tasks.
21527 +	 */
21528 +	if (unlikely(cluster->domain.release_master == entry->cpu)) {
21529 +		sched_state_task_picked();
21530 +		return NULL;
21531 +	}
21532 +#endif
21533 +
21534 +	raw_spin_lock(&cluster->cluster_lock);
21535 +	clear_will_schedule();
21536 +
21537 +	/* sanity checking */
21538 +	BUG_ON(entry->scheduled && entry->scheduled != prev);
21539 +	BUG_ON(entry->scheduled && !is_realtime(prev));
21540 +	BUG_ON(is_realtime(prev) && !entry->scheduled);
21541 +
21542 +	/* (0) Determine state */
21543 +	exists      = entry->scheduled != NULL;
21544 +	blocks      = exists && !is_running(entry->scheduled);
21545 +	out_of_time = exists &&
21546 +				  budget_enforced(entry->scheduled) &&
21547 +				  budget_exhausted(entry->scheduled);
21548 +	np 	    = exists && is_np(entry->scheduled);
21549 +	sleep	    = exists && get_rt_flags(entry->scheduled) == RT_F_SLEEP;
21550 +	preempt     = entry->scheduled != entry->linked;
21551 +
21552 +#ifdef WANT_ALL_SCHED_EVENTS
21553 +	TRACE_TASK(prev, "invoked cedf_schedule.\n");
21554 +#endif
21555 +
21556 +	if (exists)
21557 +		TRACE_TASK(prev,
21558 +			   "blocks:%d out_of_time:%d np:%d sleep:%d preempt:%d "
21559 +			   "state:%d sig:%d\n",
21560 +			   blocks, out_of_time, np, sleep, preempt,
21561 +			   prev->state, signal_pending(prev));
21562 +	if (entry->linked && preempt)
21563 +		TRACE_TASK(prev, "will be preempted by %s/%d\n",
21564 +			   entry->linked->comm, entry->linked->pid);
21565 +
21566 +
21567 +	/* If a task blocks we have no choice but to reschedule.
21568 +	 */
21569 +	if (blocks)
21570 +		unlink(entry->scheduled);
21571 +
21572 +#if defined(CONFIG_LITMUS_NVIDIA) && defined(CONFIG_LITMUS_AFFINITY_LOCKING)
21573 +	if(exists && is_realtime(entry->scheduled) && tsk_rt(entry->scheduled)->held_gpus) {
21574 +		if(!blocks || tsk_rt(entry->scheduled)->suspend_gpu_tracker_on_block) {
21575 +			// don't track preemptions or locking protocol suspensions.
21576 +			TRACE_TASK(entry->scheduled, "stopping GPU tracker.\n");
21577 +			stop_gpu_tracker(entry->scheduled);
21578 +		}
21579 +		else if(blocks && !tsk_rt(entry->scheduled)->suspend_gpu_tracker_on_block) {
21580 +			TRACE_TASK(entry->scheduled, "GPU tracker remains on during suspension.\n");
21581 +		}
21582 +	}
21583 +#endif
21584 +
21585 +	/* Request a sys_exit_np() call if we would like to preempt but cannot.
21586 +	 * We need to make sure to update the link structure anyway in case
21587 +	 * that we are still linked. Multiple calls to request_exit_np() don't
21588 +	 * hurt.
21589 +	 */
21590 +	if (np && (out_of_time || preempt || sleep)) {
21591 +		unlink(entry->scheduled);
21592 +		request_exit_np(entry->scheduled);
21593 +	}
21594 +
21595 +	/* Any task that is preemptable and either exhausts its execution
21596 +	 * budget or wants to sleep completes. We may have to reschedule after
21597 +	 * this. Don't do a job completion if we block (can't have timers running
21598 +	 * for blocked jobs). Preemptions go first for the same reason.
21599 +	 */
21600 +	if (!np && (out_of_time || sleep) && !blocks && !preempt)
21601 +		job_completion(entry->scheduled, !sleep);
21602 +
21603 +	/* Link pending task if we became unlinked.
21604 +	 */
21605 +	if (!entry->linked)
21606 +		link_task_to_cpu(__take_ready(&cluster->domain), entry);
21607 +
21608 +	/* The final scheduling decision. Do we need to switch for some reason?
21609 +	 * If linked is different from scheduled, then select linked as next.
21610 +	 */
21611 +	if ((!np || blocks) &&
21612 +	    entry->linked != entry->scheduled) {
21613 +		/* Schedule a linked job? */
21614 +		if (entry->linked) {
21615 +			entry->linked->rt_param.scheduled_on = entry->cpu;
21616 +			next = entry->linked;
21617 +		}
21618 +		if (entry->scheduled) {
21619 +			/* not gonna be scheduled soon */
21620 +			entry->scheduled->rt_param.scheduled_on = NO_CPU;
21621 +			TRACE_TASK(entry->scheduled, "scheduled_on = NO_CPU\n");
21622 +		}
21623 +	} else
21624 +		/* Only override Linux scheduler if we have a real-time task
21625 +		 * scheduled that needs to continue.
21626 +		 */
21627 +		if (exists)
21628 +			next = prev;
21629 +
21630 +	sched_state_task_picked();
21631 +	raw_spin_unlock(&cluster->cluster_lock);
21632 +
21633 +#ifdef WANT_ALL_SCHED_EVENTS
21634 +	TRACE("cluster_lock released, next=0x%p\n", next);
21635 +
21636 +	if (next)
21637 +		TRACE_TASK(next, "scheduled at %llu\n", litmus_clock());
21638 +	else if (exists && !next)
21639 +		TRACE("becomes idle at %llu.\n", litmus_clock());
21640 +#endif
21641 +
21642 +	return next;
21643 +}
21644 +
21645 +
21646 +/* _finish_switch - we just finished the switch away from prev
21647 + */
21648 +static void cedf_finish_switch(struct task_struct *prev)
21649 +{
21650 +	cpu_entry_t* 	entry = &__get_cpu_var(cedf_cpu_entries);
21651 +
21652 +	entry->scheduled = is_realtime(current) ? current : NULL;
21653 +#ifdef WANT_ALL_SCHED_EVENTS
21654 +	TRACE_TASK(prev, "switched away from\n");
21655 +#endif
21656 +}
21657 +
21658 +
21659 +/*	Prepare a task for running in RT mode
21660 + */
21661 +static void cedf_task_new(struct task_struct * t, int on_rq, int running)
21662 +{
21663 +	unsigned long 		flags;
21664 +	cpu_entry_t* 		entry;
21665 +	cedf_domain_t*		cluster;
21666 +
21667 +	TRACE("c-edf: task new %d\n", t->pid);
21668 +
21669 +	/* the cluster doesn't change even if t is running */
21670 +	cluster = task_cpu_cluster(t);
21671 +
21672 +	raw_spin_lock_irqsave(&cluster->cluster_lock, flags);
21673 +
21674 +	/* setup job params */
21675 +	release_at(t, litmus_clock());
21676 +
21677 +	if (running) {
21678 +		entry = &per_cpu(cedf_cpu_entries, task_cpu(t));
21679 +		BUG_ON(entry->scheduled);
21680 +
21681 +#ifdef CONFIG_RELEASE_MASTER
21682 +		if (entry->cpu != cluster->domain.release_master) {
21683 +#endif
21684 +			entry->scheduled = t;
21685 +			tsk_rt(t)->scheduled_on = task_cpu(t);
21686 +#ifdef CONFIG_RELEASE_MASTER
21687 +		} else {
21688 +			/* do not schedule on release master */
21689 +			preempt(entry); /* force resched */
21690 +			tsk_rt(t)->scheduled_on = NO_CPU;
21691 +		}
21692 +#endif
21693 +	} else {
21694 +		t->rt_param.scheduled_on = NO_CPU;
21695 +	}
21696 +	t->rt_param.linked_on          = NO_CPU;
21697 +
21698 +	cedf_job_arrival(t);
21699 +	raw_spin_unlock_irqrestore(&(cluster->cluster_lock), flags);
21700 +}
21701 +
21702 +static void cedf_task_wake_up(struct task_struct *task)
21703 +{
21704 +	unsigned long flags;
21705 +	//lt_t now;
21706 +	cedf_domain_t *cluster;
21707 +
21708 +	TRACE_TASK(task, "wake_up at %llu\n", litmus_clock());
21709 +
21710 +	cluster = task_cpu_cluster(task);
21711 +
21712 +	raw_spin_lock_irqsave(&cluster->cluster_lock, flags);
21713 +
21714 +#if 0 // sporadic task model
21715 +	/* We need to take suspensions because of semaphores into
21716 +	 * account! If a job resumes after being suspended due to acquiring
21717 +	 * a semaphore, it should never be treated as a new job release.
21718 +	 */
21719 +	if (get_rt_flags(task) == RT_F_EXIT_SEM) {
21720 +		set_rt_flags(task, RT_F_RUNNING);
21721 +	} else {
21722 +		now = litmus_clock();
21723 +		if (is_tardy(task, now)) {
21724 +			/* new sporadic release */
21725 +			release_at(task, now);
21726 +			sched_trace_task_release(task);
21727 +		}
21728 +		else {
21729 +			if (task->rt.time_slice) {
21730 +				/* came back in time before deadline
21731 +				*/
21732 +				set_rt_flags(task, RT_F_RUNNING);
21733 +			}
21734 +		}
21735 +	}
21736 +#else
21737 +	set_rt_flags(task, RT_F_RUNNING);  // periodic model
21738 +#endif
21739 +
21740 +	if(tsk_rt(task)->linked_on == NO_CPU)
21741 +		cedf_job_arrival(task);
21742 +
21743 +	raw_spin_unlock_irqrestore(&cluster->cluster_lock, flags);
21744 +}
21745 +
21746 +static void cedf_task_block(struct task_struct *t)
21747 +{
21748 +	unsigned long flags;
21749 +	cedf_domain_t *cluster;
21750 +
21751 +	TRACE_TASK(t, "block at %llu\n", litmus_clock());
21752 +
21753 +	cluster = task_cpu_cluster(t);
21754 +
21755 +	/* unlink if necessary */
21756 +	raw_spin_lock_irqsave(&cluster->cluster_lock, flags);
21757 +	unlink(t);
21758 +	raw_spin_unlock_irqrestore(&cluster->cluster_lock, flags);
21759 +
21760 +	BUG_ON(!is_realtime(t));
21761 +}
21762 +
21763 +
21764 +static void cedf_task_exit(struct task_struct * t)
21765 +{
21766 +	unsigned long flags;
21767 +	cedf_domain_t *cluster = task_cpu_cluster(t);
21768 +
21769 +#ifdef CONFIG_LITMUS_PAI_SOFTIRQD
21770 +	cedf_change_prio_pai_tasklet(t, NULL);
21771 +#endif
21772 +
21773 +	/* unlink if necessary */
21774 +	raw_spin_lock_irqsave(&cluster->cluster_lock, flags);
21775 +	unlink(t);
21776 +	if (tsk_rt(t)->scheduled_on != NO_CPU) {
21777 +		cpu_entry_t *cpu;
21778 +		cpu = &per_cpu(cedf_cpu_entries, tsk_rt(t)->scheduled_on);
21779 +		cpu->scheduled = NULL;
21780 +		tsk_rt(t)->scheduled_on = NO_CPU;
21781 +	}
21782 +	raw_spin_unlock_irqrestore(&cluster->cluster_lock, flags);
21783 +
21784 +	BUG_ON(!is_realtime(t));
21785 +	TRACE_TASK(t, "RIP\n");
21786 +}
21787 +
21788 +static long cedf_admit_task(struct task_struct* tsk)
21789 +{
21790 +#ifdef CONFIG_LITMUS_NESTED_LOCKING
21791 +	INIT_BINHEAP_HANDLE(&tsk_rt(tsk)->hp_blocked_tasks,
21792 +						edf_max_heap_base_priority_order);
21793 +#endif
21794 +
21795 +	return task_cpu(tsk) == tsk->rt_param.task_params.cpu ? 0 : -EINVAL;
21796 +}
21797 +
21798 +
21799 +
21800 +#ifdef CONFIG_LITMUS_LOCKING
21801 +
21802 +#include <litmus/fdso.h>
21803 +
21804 +
21805 +
21806 +/* called with IRQs off */
21807 +static void __increase_priority_inheritance(struct task_struct* t,
21808 +										    struct task_struct* prio_inh)
21809 +{
21810 +	int linked_on;
21811 +	int check_preempt = 0;
21812 +
21813 +	cedf_domain_t* cluster = task_cpu_cluster(t);
21814 +
21815 +#ifdef CONFIG_LITMUS_NESTED_LOCKING
21816 +	/* this sanity check allows for weaker locking in protocols */
21817 +	/* TODO (klitirqd): Skip this check if 't' is a proxy thread (???) */
21818 +	if(__edf_higher_prio(prio_inh, BASE, t, EFFECTIVE)) {
21819 +#endif
21820 +		TRACE_TASK(t, "inherits priority from %s/%d\n",
21821 +				   prio_inh->comm, prio_inh->pid);
21822 +		tsk_rt(t)->inh_task = prio_inh;
21823 +
21824 +		linked_on  = tsk_rt(t)->linked_on;
21825 +
21826 +		/* If it is scheduled, then we need to reorder the CPU heap. */
21827 +		if (linked_on != NO_CPU) {
21828 +			TRACE_TASK(t, "%s: linked  on %d\n",
21829 +					   __FUNCTION__, linked_on);
21830 +			/* Holder is scheduled; need to re-order CPUs.
21831 +			 * We can't use heap_decrease() here since
21832 +			 * the cpu_heap is ordered in reverse direction, so
21833 +			 * it is actually an increase. */
21834 +			binheap_delete(&per_cpu(cedf_cpu_entries, linked_on).hn,
21835 +						   &cluster->cpu_heap);
21836 +			binheap_add(&per_cpu(cedf_cpu_entries, linked_on).hn,
21837 +						&cluster->cpu_heap, cpu_entry_t, hn);
21838 +
21839 +		} else {
21840 +			/* holder may be queued: first stop queue changes */
21841 +			raw_spin_lock(&cluster->domain.release_lock);
21842 +			if (is_queued(t)) {
21843 +				TRACE_TASK(t, "%s: is queued\n",
21844 +						   __FUNCTION__);
21845 +				/* We need to update the position of holder in some
21846 +				 * heap. Note that this could be a release heap if
21847 +				 * budget enforcement is used and this job overran. */
21848 +				check_preempt =
21849 +					!bheap_decrease(edf_ready_order, tsk_rt(t)->heap_node);
21850 +			} else {
21851 +				/* Nothing to do: if it is not queued and not linked
21852 +				 * then it is either sleeping or currently being moved
21853 +				 * by other code (e.g., a timer interrupt handler) that
21854 +				 * will use the correct priority when enqueuing the
21855 +				 * task. */
21856 +				TRACE_TASK(t, "%s: is NOT queued => Done.\n",
21857 +						   __FUNCTION__);
21858 +			}
21859 +			raw_spin_unlock(&cluster->domain.release_lock);
21860 +
21861 +			/* If holder was enqueued in a release heap, then the following
21862 +			 * preemption check is pointless, but we can't easily detect
21863 +			 * that case. If you want to fix this, then consider that
21864 +			 * simply adding a state flag requires O(n) time to update when
21865 +			 * releasing n tasks, which conflicts with the goal to have
21866 +			 * O(log n) merges. */
21867 +			if (check_preempt) {
21868 +				/* heap_decrease() hit the top level of the heap: make
21869 +				 * sure preemption checks get the right task, not the
21870 +				 * potentially stale cache. */
21871 +				bheap_uncache_min(edf_ready_order,
21872 +								  &cluster->domain.ready_queue);
21873 +				check_for_preemptions(cluster);
21874 +			}
21875 +		}
21876 +#ifdef CONFIG_LITMUS_NESTED_LOCKING
21877 +	}
21878 +	else {
21879 +		TRACE_TASK(t, "Spurious invalid priority increase. "
21880 +				   "Inheritance request: %s/%d [eff_prio = %s/%d] to inherit from %s/%d\n"
21881 +				   "Occurrence is likely okay: probably due to (hopefully safe) concurrent priority updates.\n",
21882 +				   t->comm, t->pid,
21883 +				   effective_priority(t)->comm, effective_priority(t)->pid,
21884 +				   (prio_inh) ? prio_inh->comm : "nil",
21885 +				   (prio_inh) ? prio_inh->pid : -1);
21886 +		WARN_ON(!prio_inh);
21887 +	}
21888 +#endif
21889 +}
21890 +
21891 +/* called with IRQs off */
21892 +static void increase_priority_inheritance(struct task_struct* t, struct task_struct* prio_inh)
21893 +{
21894 +	cedf_domain_t* cluster = task_cpu_cluster(t);
21895 +
21896 +	raw_spin_lock(&cluster->cluster_lock);
21897 +
21898 +	__increase_priority_inheritance(t, prio_inh);
21899 +
21900 +#ifdef CONFIG_LITMUS_SOFTIRQD
21901 +	if(tsk_rt(t)->cur_klitirqd != NULL)
21902 +	{
21903 +		TRACE_TASK(t, "%s/%d inherits a new priority!\n",
21904 +				   tsk_rt(t)->cur_klitirqd->comm, tsk_rt(t)->cur_klitirqd->pid);
21905 +
21906 +		__increase_priority_inheritance(tsk_rt(t)->cur_klitirqd, prio_inh);
21907 +	}
21908 +#endif
21909 +
21910 +	raw_spin_unlock(&cluster->cluster_lock);
21911 +
21912 +#if defined(CONFIG_LITMUS_PAI_SOFTIRQD) && defined(CONFIG_LITMUS_NVIDIA)
21913 +	if(tsk_rt(t)->held_gpus) {
21914 +		int i;
21915 +		for(i = find_first_bit(&tsk_rt(t)->held_gpus, sizeof(tsk_rt(t)->held_gpus));
21916 +			i < NV_DEVICE_NUM;
21917 +			i = find_next_bit(&tsk_rt(t)->held_gpus, sizeof(tsk_rt(t)->held_gpus), i+1)) {
21918 +			pai_check_priority_increase(t, i);
21919 +		}
21920 +	}
21921 +#endif
21922 +}
21923 +
21924 +/* called with IRQs off */
21925 +static void __decrease_priority_inheritance(struct task_struct* t,
21926 +											struct task_struct* prio_inh)
21927 +{
21928 +#ifdef CONFIG_LITMUS_NESTED_LOCKING
21929 +	if(__edf_higher_prio(t, EFFECTIVE, prio_inh, BASE)) {
21930 +#endif
21931 +		/* A job only stops inheriting a priority when it releases a
21932 +		 * resource. Thus we can make the following assumption.*/
21933 +		if(prio_inh)
21934 +			TRACE_TASK(t, "EFFECTIVE priority decreased to %s/%d\n",
21935 +					   prio_inh->comm, prio_inh->pid);
21936 +		else
21937 +			TRACE_TASK(t, "base priority restored.\n");
21938 +
21939 +		tsk_rt(t)->inh_task = prio_inh;
21940 +
21941 +		if(tsk_rt(t)->scheduled_on != NO_CPU) {
21942 +			TRACE_TASK(t, "is scheduled.\n");
21943 +
21944 +			/* Check if rescheduling is necessary. We can't use heap_decrease()
21945 +			 * since the priority was effectively lowered. */
21946 +			unlink(t);
21947 +			cedf_job_arrival(t);
21948 +		}
21949 +		else {
21950 +			cedf_domain_t* cluster = task_cpu_cluster(t);
21951 +			/* task is queued */
21952 +			raw_spin_lock(&cluster->domain.release_lock);
21953 +			if (is_queued(t)) {
21954 +				TRACE_TASK(t, "is queued.\n");
21955 +
21956 +				/* decrease in priority, so we have to re-add to binomial heap */
21957 +				unlink(t);
21958 +				cedf_job_arrival(t);
21959 +			}
21960 +			else {
21961 +				TRACE_TASK(t, "is not in the scheduler. Probably on a wait queue somewhere.\n");
21962 +			}
21963 +			raw_spin_unlock(&cluster->domain.release_lock);
21964 +		}
21965 +#ifdef CONFIG_LITMUS_NESTED_LOCKING
21966 +	}
21967 +	else {
21968 +		TRACE_TASK(t, "Spurious invalid priority decrease. "
21969 +				   "Inheritance request: %s/%d [eff_prio = %s/%d] to inherit from %s/%d\n"
21970 +				   "Occurrence is likely okay: probably due to (hopefully safe) concurrent priority updates.\n",
21971 +				   t->comm, t->pid,
21972 +				   effective_priority(t)->comm, effective_priority(t)->pid,
21973 +				   (prio_inh) ? prio_inh->comm : "nil",
21974 +				   (prio_inh) ? prio_inh->pid : -1);
21975 +	}
21976 +#endif
21977 +}
21978 +
21979 +static void decrease_priority_inheritance(struct task_struct* t,
21980 +										struct task_struct* prio_inh)
21981 +{
21982 +	cedf_domain_t* cluster = task_cpu_cluster(t);
21983 +
21984 +	raw_spin_lock(&cluster->cluster_lock);
21985 +	__decrease_priority_inheritance(t, prio_inh);
21986 +
21987 +#ifdef CONFIG_LITMUS_SOFTIRQD
21988 +	if(tsk_rt(t)->cur_klitirqd != NULL)
21989 +	{
21990 +		TRACE_TASK(t, "%s/%d decreases in priority!\n",
21991 +				   tsk_rt(t)->cur_klitirqd->comm, tsk_rt(t)->cur_klitirqd->pid);
21992 +
21993 +		__decrease_priority_inheritance(tsk_rt(t)->cur_klitirqd, prio_inh);
21994 +	}
21995 +#endif
21996 +
21997 +	raw_spin_unlock(&cluster->cluster_lock);
21998 +
21999 +#if defined(CONFIG_LITMUS_PAI_SOFTIRQD) && defined(CONFIG_LITMUS_NVIDIA)
22000 +	if(tsk_rt(t)->held_gpus) {
22001 +		int i;
22002 +		for(i = find_first_bit(&tsk_rt(t)->held_gpus, sizeof(tsk_rt(t)->held_gpus));
22003 +			i < NV_DEVICE_NUM;
22004 +			i = find_next_bit(&tsk_rt(t)->held_gpus, sizeof(tsk_rt(t)->held_gpus), i+1)) {
22005 +			pai_check_priority_decrease(t, i);
22006 +		}
22007 +	}
22008 +#endif
22009 +}
22010 +
22011 +
22012 +
22013 +
22014 +
22015 +#ifdef CONFIG_LITMUS_SOFTIRQD
22016 +/* called with IRQs off */
22017 +static void increase_priority_inheritance_klitirqd(struct task_struct* klitirqd,
22018 +											  struct task_struct* old_owner,
22019 +											  struct task_struct* new_owner)
22020 +{
22021 +	cedf_domain_t* cluster = task_cpu_cluster(klitirqd);
22022 +
22023 +	BUG_ON(!(tsk_rt(klitirqd)->is_proxy_thread));
22024 +
22025 +	raw_spin_lock(&cluster->cluster_lock);
22026 +
22027 +	if(old_owner != new_owner)
22028 +	{
22029 +		if(old_owner)
22030 +		{
22031 +			// unreachable?
22032 +			tsk_rt(old_owner)->cur_klitirqd = NULL;
22033 +		}
22034 +
22035 +		TRACE_TASK(klitirqd, "giving ownership to %s/%d.\n",
22036 +				   new_owner->comm, new_owner->pid);
22037 +
22038 +		tsk_rt(new_owner)->cur_klitirqd = klitirqd;
22039 +	}
22040 +
22041 +	__decrease_priority_inheritance(klitirqd, NULL);  // kludge to clear out cur prio.
22042 +
22043 +	__increase_priority_inheritance(klitirqd,
22044 +			(tsk_rt(new_owner)->inh_task == NULL) ?
22045 +				new_owner :
22046 +				tsk_rt(new_owner)->inh_task);
22047 +
22048 +	raw_spin_unlock(&cluster->cluster_lock);
22049 +}
22050 +
22051 +
22052 +/* called with IRQs off */
22053 +static void decrease_priority_inheritance_klitirqd(struct task_struct* klitirqd,
22054 +												   struct task_struct* old_owner,
22055 +												   struct task_struct* new_owner)
22056 +{
22057 +	cedf_domain_t* cluster = task_cpu_cluster(klitirqd);
22058 +
22059 +	BUG_ON(!(tsk_rt(klitirqd)->is_proxy_thread));
22060 +
22061 +	raw_spin_lock(&cluster->cluster_lock);
22062 +
22063 +	TRACE_TASK(klitirqd, "priority restored\n");
22064 +
22065 +	__decrease_priority_inheritance(klitirqd, new_owner);
22066 +
22067 +	tsk_rt(old_owner)->cur_klitirqd = NULL;
22068 +
22069 +	raw_spin_unlock(&cluster->cluster_lock);
22070 +}
22071 +#endif // CONFIG_LITMUS_SOFTIRQD
22072 +
22073 +
22074 +
22075 +
22076 +
22077 +
22078 +
22079 +#ifdef CONFIG_LITMUS_NESTED_LOCKING
22080 +
22081 +/* called with IRQs off */
22082 +/* preconditions:
22083 + (1) The 'hp_blocked_tasks_lock' of task 't' is held.
22084 + (2) The lock 'to_unlock' is held.
22085 + */
22086 +static void nested_increase_priority_inheritance(struct task_struct* t,
22087 +												 struct task_struct* prio_inh,
22088 +												 raw_spinlock_t *to_unlock,
22089 +												 unsigned long irqflags)
22090 +{
22091 +	struct litmus_lock *blocked_lock = tsk_rt(t)->blocked_lock;
22092 +
22093 +	if(tsk_rt(t)->inh_task != prio_inh) { 		// shield against redundant calls.
22094 +		increase_priority_inheritance(t, prio_inh);  // increase our prio.
22095 +	}
22096 +
22097 +	raw_spin_unlock(&tsk_rt(t)->hp_blocked_tasks_lock);  // unlock t's blocked-tasks heap.
22098 +
22099 +
22100 +	if(blocked_lock) {
22101 +		if(blocked_lock->ops->propagate_increase_inheritance) {
22102 +			TRACE_TASK(t, "Inheritor is blocked (...perhaps).  Checking lock %d.\n",
22103 +					   blocked_lock->ident);
22104 +
22105 +			// beware: recursion
22106 +			blocked_lock->ops->propagate_increase_inheritance(blocked_lock,
22107 +															  t, to_unlock,
22108 +															  irqflags);
22109 +		}
22110 +		else {
22111 +			TRACE_TASK(t, "Inheritor is blocked on lock (%d) that does not support nesting!\n",
22112 +					   blocked_lock->ident);
22113 +			unlock_fine_irqrestore(to_unlock, irqflags);
22114 +		}
22115 +	}
22116 +	else {
22117 +		TRACE_TASK(t, "is not blocked.  No propagation.\n");
22118 +		unlock_fine_irqrestore(to_unlock, irqflags);
22119 +	}
22120 +}
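/* A minimal sketch (illustration only, not part of the diff) of the caller
 * pattern implied by preconditions (1) and (2) above.  'outer_lock' and
 * 'example_propagate_boost' are made-up names; the exact point at which
 * 'outer_lock' is released depends on whether the blocked-on lock supports
 * nesting (propagate callback) or not (unlock_fine_irqrestore()).
 */
static void example_propagate_boost(struct task_struct *t,
				    struct task_struct *new_hp,
				    raw_spinlock_t *outer_lock)
{
	unsigned long flags;

	raw_spin_lock_irqsave(outer_lock, flags);		/* precondition (2) */
	raw_spin_lock(&tsk_rt(t)->hp_blocked_tasks_lock);	/* precondition (1) */

	/* releases hp_blocked_tasks_lock itself and hands outer_lock/flags to
	 * the propagation chain, which is expected to release them */
	nested_increase_priority_inheritance(t, new_hp, outer_lock, flags);
}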
22121 +
22122 +/* called with IRQs off */
22123 +/* preconditions:
22124 + (1) The 'hp_blocked_tasks_lock' of task 't' is held.
22125 + (2) The lock 'to_unlock' is held.
22126 + */
22127 +static void nested_decrease_priority_inheritance(struct task_struct* t,
22128 +												 struct task_struct* prio_inh,
22129 +												 raw_spinlock_t *to_unlock,
22130 +												 unsigned long irqflags)
22131 +{
22132 +	struct litmus_lock *blocked_lock = tsk_rt(t)->blocked_lock;
22133 +	decrease_priority_inheritance(t, prio_inh);
22134 +
22135 +	raw_spin_unlock(&tsk_rt(t)->hp_blocked_tasks_lock);  // unlock t's blocked-tasks heap.
22136 +
22137 +	if(blocked_lock) {
22138 +		if(blocked_lock->ops->propagate_decrease_inheritance) {
22139 +			TRACE_TASK(t, "Inheritor is blocked (...perhaps).  Checking lock %d.\n",
22140 +					   blocked_lock->ident);
22141 +
22142 +			// beware: recursion
22143 +			blocked_lock->ops->propagate_decrease_inheritance(blocked_lock, t,
22144 +															  to_unlock,
22145 +															  irqflags);
22146 +		}
22147 +		else {
22148 +			TRACE_TASK(t, "Inheritor is blocked on lock (%p) that does not support nesting!\n",
22149 +					   blocked_lock);
22150 +			unlock_fine_irqrestore(to_unlock, irqflags);
22151 +		}
22152 +	}
22153 +	else {
22154 +		TRACE_TASK(t, "is not blocked.  No propagation.\n");
22155 +		unlock_fine_irqrestore(to_unlock, irqflags);
22156 +	}
22157 +}
22158 +
22159 +
22160 +/* ******************** RSM MUTEX ********************** */
22161 +
22162 +static struct litmus_lock_ops cedf_rsm_mutex_lock_ops = {
22163 +	.lock   = rsm_mutex_lock,
22164 +	.unlock = rsm_mutex_unlock,
22165 +	.close  = rsm_mutex_close,
22166 +	.deallocate = rsm_mutex_free,
22167 +
22168 +	.propagate_increase_inheritance = rsm_mutex_propagate_increase_inheritance,
22169 +	.propagate_decrease_inheritance = rsm_mutex_propagate_decrease_inheritance,
22170 +
22171 +#ifdef CONFIG_LITMUS_DGL_SUPPORT
22172 +	.dgl_lock = rsm_mutex_dgl_lock,
22173 +	.is_owner = rsm_mutex_is_owner,
22174 +	.enable_priority = rsm_mutex_enable_priority,
22175 +#endif
22176 +};
22177 +
22178 +static struct litmus_lock* cedf_new_rsm_mutex(void)
22179 +{
22180 +	return rsm_mutex_new(&cedf_rsm_mutex_lock_ops);
22181 +}
22182 +
22183 +/* ******************** IKGLP ********************** */
22184 +
22185 +static struct litmus_lock_ops cedf_ikglp_lock_ops = {
22186 +	.lock   = ikglp_lock,
22187 +	.unlock = ikglp_unlock,
22188 +	.close  = ikglp_close,
22189 +	.deallocate = ikglp_free,
22190 +
22191 +	// ikglp can only be an outer-most lock.
22192 +	.propagate_increase_inheritance = NULL,
22193 +	.propagate_decrease_inheritance = NULL,
22194 +};
22195 +
22196 +static struct litmus_lock* cedf_new_ikglp(void* __user arg)
22197 +{
22198 +	// assumes clusters of uniform size.
22199 +	return ikglp_new(cluster_size/num_clusters, &cedf_ikglp_lock_ops, arg);
22200 +}
22201 +
22202 +#endif  /* CONFIG_LITMUS_NESTED_LOCKING */
22203 +
22204 +
22205 +
22206 +
22207 +/* ******************** KFMLP support ********************** */
22208 +
22209 +static struct litmus_lock_ops cedf_kfmlp_lock_ops = {
22210 +	.lock   = kfmlp_lock,
22211 +	.unlock = kfmlp_unlock,
22212 +	.close  = kfmlp_close,
22213 +	.deallocate = kfmlp_free,
22214 +
22215 +	// kfmlp can only be an outer-most lock.
22216 +	.propagate_increase_inheritance = NULL,
22217 +	.propagate_decrease_inheritance = NULL,
22218 +};
22219 +
22220 +
22221 +static struct litmus_lock* cedf_new_kfmlp(void* __user arg)
22222 +{
22223 +	return kfmlp_new(&cedf_kfmlp_lock_ops, arg);
22224 +}
22225 +
22226 +
22227 +/* **** lock constructor **** */
22228 +
22229 +static long cedf_allocate_lock(struct litmus_lock **lock, int type,
22230 +								 void* __user args)
22231 +{
22232 +	int err;
22233 +
22234 +	switch (type) {
22235 +#ifdef CONFIG_LITMUS_NESTED_LOCKING
22236 +		case RSM_MUTEX:
22237 +			*lock = cedf_new_rsm_mutex();
22238 +			break;
22239 +
22240 +		case IKGLP_SEM:
22241 +			*lock = cedf_new_ikglp(args);
22242 +			break;
22243 +#endif
22244 +		case KFMLP_SEM:
22245 +			*lock = cedf_new_kfmlp(args);
22246 +			break;
22247 +
22248 +		default:
22249 +			err = -ENXIO;
22250 +			goto UNSUPPORTED_LOCK;
22251 +	};
22252 +
22253 +	if (*lock)
22254 +		err = 0;
22255 +	else
22256 +		err = -ENOMEM;
22257 +
22258 +UNSUPPORTED_LOCK:
22259 +	return err;
22260 +}
22261 +
22262 +#endif  // CONFIG_LITMUS_LOCKING
22263 +
22264 +
22265 +#ifdef CONFIG_LITMUS_AFFINITY_LOCKING
22266 +static struct affinity_observer_ops cedf_kfmlp_affinity_ops = {
22267 +	.close = kfmlp_aff_obs_close,
22268 +	.deallocate = kfmlp_aff_obs_free,
22269 +};
22270 +
22271 +#ifdef CONFIG_LITMUS_NESTED_LOCKING
22272 +static struct affinity_observer_ops cedf_ikglp_affinity_ops = {
22273 +	.close = ikglp_aff_obs_close,
22274 +	.deallocate = ikglp_aff_obs_free,
22275 +};
22276 +#endif
22277 +
22278 +static long cedf_allocate_affinity_observer(struct affinity_observer **aff_obs,
22279 +											int type,
22280 +											void* __user args)
22281 +{
22282 +	int err;
22283 +
22284 +	switch (type) {
22285 +
22286 +		case KFMLP_SIMPLE_GPU_AFF_OBS:
22287 +			*aff_obs = kfmlp_simple_gpu_aff_obs_new(&cedf_kfmlp_affinity_ops, args);
22288 +			break;
22289 +
22290 +		case KFMLP_GPU_AFF_OBS:
22291 +			*aff_obs = kfmlp_gpu_aff_obs_new(&cedf_kfmlp_affinity_ops, args);
22292 +			break;
22293 +
22294 +#ifdef CONFIG_LITMUS_NESTED_LOCKING
22295 +		case IKGLP_SIMPLE_GPU_AFF_OBS:
22296 +			*aff_obs = ikglp_simple_gpu_aff_obs_new(&cedf_ikglp_affinity_ops, args);
22297 +			break;
22298 +
22299 +		case IKGLP_GPU_AFF_OBS:
22300 +			*aff_obs = ikglp_gpu_aff_obs_new(&cedf_ikglp_affinity_ops, args);
22301 +			break;
22302 +#endif
22303 +		default:
22304 +			err = -ENXIO;
22305 +			goto UNSUPPORTED_AFF_OBS;
22306 +	};
22307 +
22308 +	if (*aff_obs)
22309 +		err = 0;
22310 +	else
22311 +		err = -ENOMEM;
22312 +
22313 +UNSUPPORTED_AFF_OBS:
22314 +	return err;
22315 +}
22316 +#endif
22317 +
22318 +
22319 +
22320 +
22321 +#ifdef VERBOSE_INIT
22322 +static void print_cluster_topology(cpumask_var_t mask, int cpu)
22323 +{
22324 +	int chk;
22325 +	char buf[255];
22326 +
22327 +	chk = cpulist_scnprintf(buf, 254, mask);
22328 +	buf[chk] = '\0';
22329 +	printk(KERN_INFO "CPU = %d, shared cpu(s) = %s\n", cpu, buf);
22330 +
22331 +}
22332 +#endif
22333 +
22334 +static void cleanup_cedf(void)
22335 +{
22336 +	int i;
22337 +
22338 +#ifdef CONFIG_LITMUS_NVIDIA
22339 +	shutdown_nvidia_info();
22340 +#endif
22341 +
22342 +	if (clusters_allocated) {
22343 +		for (i = 0; i < num_clusters; i++) {
22344 +			kfree(cedf[i].cpus);
22345 +			free_cpumask_var(cedf[i].cpu_map);
22346 +		}
22347 +
22348 +		kfree(cedf);
22349 +	}
22350 +}
22351 +
22352 +static long cedf_activate_plugin(void)
22353 +{
22354 +	int i, j, cpu, ccpu, cpu_count;
22355 +	cpu_entry_t *entry;
22356 +
22357 +	cpumask_var_t mask;
22358 +	int chk = 0;
22359 +
22360 +	/* de-allocate old clusters, if any */
22361 +	cleanup_cedf();
22362 +
22363 +	printk(KERN_INFO "C-EDF: Activate Plugin, cluster configuration = %d\n",
22364 +			cluster_config);
22365 +
22366 +	/* need to get cluster_size first */
22367 +	if(!zalloc_cpumask_var(&mask, GFP_ATOMIC))
22368 +		return -ENOMEM;
22369 +
22370 +	if (unlikely(cluster_config == GLOBAL_CLUSTER)) {
22371 +		cluster_size = num_online_cpus();
22372 +	} else {
22373 +		chk = get_shared_cpu_map(mask, 0, cluster_config);
22374 +		if (chk) {
22375 +			/* if chk != 0 then it is the max allowed index */
22376 +			printk(KERN_INFO "C-EDF: Cluster configuration = %d "
22377 +			       "is not supported on this hardware.\n",
22378 +			       cluster_config);
22379 +			/* User should notice that the configuration failed, so
22380 +			 * let's bail out. */
22381 +			return -EINVAL;
22382 +		}
22383 +
22384 +		cluster_size = cpumask_weight(mask);
22385 +	}
22386 +
22387 +	if ((num_online_cpus() % cluster_size) != 0) {
22388 +		/* this can't be right, some cpus are left out */
22389 +		printk(KERN_ERR "C-EDF: Trying to group %d cpus in %d!\n",
22390 +				num_online_cpus(), cluster_size);
22391 +		return -1;
22392 +	}
22393 +
22394 +	num_clusters = num_online_cpus() / cluster_size;
22395 +	printk(KERN_INFO "C-EDF: %d cluster(s) of size = %d\n",
22396 +			num_clusters, cluster_size);
22397 +
22398 +	/* initialize clusters */
22399 +	cedf = kmalloc(num_clusters * sizeof(cedf_domain_t), GFP_ATOMIC);
22400 +	for (i = 0; i < num_clusters; i++) {
22401 +
22402 +		cedf[i].cpus = kmalloc(cluster_size * sizeof(cpu_entry_t),
22403 +				GFP_ATOMIC);
22404 +		INIT_BINHEAP_HANDLE(&(cedf[i].cpu_heap), cpu_lower_prio);
22405 +		edf_domain_init(&(cedf[i].domain), NULL, cedf_release_jobs);
22406 +
22407 +
22408 +#ifdef CONFIG_LITMUS_PAI_SOFTIRQD
22409 +		cedf[i].pending_tasklets.head = NULL;
22410 +		cedf[i].pending_tasklets.tail = &(cedf[i].pending_tasklets.head);
22411 +#endif
22412 +
22413 +
22414 +		if(!zalloc_cpumask_var(&cedf[i].cpu_map, GFP_ATOMIC))
22415 +			return -ENOMEM;
22416 +#ifdef CONFIG_RELEASE_MASTER
22417 +		cedf[i].domain.release_master = atomic_read(&release_master_cpu);
22418 +#endif
22419 +	}
22420 +
22421 +	/* cycle through cluster and add cpus to them */
22422 +	for (i = 0; i < num_clusters; i++) {
22423 +
22424 +#ifdef CONFIG_LITMUS_DGL_SUPPORT
22425 +		raw_spin_lock_init(&cedf[i].dgl_lock);
22426 +#endif
22427 +
22428 +		for_each_online_cpu(cpu) {
22429 +			/* check if the cpu is already in a cluster */
22430 +			for (j = 0; j < num_clusters; j++)
22431 +				if (cpumask_test_cpu(cpu, cedf[j].cpu_map))
22432 +					break;
22433 +			/* if it is in a cluster go to next cpu */
22434 +			if (j < num_clusters &&
22435 +					cpumask_test_cpu(cpu, cedf[j].cpu_map))
22436 +				continue;
22437 +
22438 +			/* this cpu isn't in any cluster */
22439 +			/* get the shared cpus */
22440 +			if (unlikely(cluster_config == GLOBAL_CLUSTER))
22441 +				cpumask_copy(mask, cpu_online_mask);
22442 +			else
22443 +				get_shared_cpu_map(mask, cpu, cluster_config);
22444 +
22445 +			cpumask_copy(cedf[i].cpu_map, mask);
22446 +#ifdef VERBOSE_INIT
22447 +			print_cluster_topology(mask, cpu);
22448 +#endif
22449 +			/* add cpus to current cluster and init cpu_entry_t */
22450 +			cpu_count = 0;
22451 +			for_each_cpu(ccpu, cedf[i].cpu_map) {
22452 +
22453 +				entry = &per_cpu(cedf_cpu_entries, ccpu);
22454 +				cedf[i].cpus[cpu_count] = entry;
22455 +				atomic_set(&entry->will_schedule, 0);
22456 +				entry->cpu = ccpu;
22457 +				entry->cluster = &cedf[i];
22458 +
22459 +				INIT_BINHEAP_NODE(&entry->hn);
22460 +
22461 +				cpu_count++;
22462 +
22463 +				entry->linked = NULL;
22464 +				entry->scheduled = NULL;
22465 +#ifdef CONFIG_RELEASE_MASTER
22466 +				/* only add CPUs that should schedule jobs */
22467 +				if (entry->cpu != entry->cluster->domain.release_master)
22468 +#endif
22469 +					update_cpu_position(entry);
22470 +			}
22471 +			/* done with this cluster */
22472 +			break;
22473 +		}
22474 +	}
22475 +
22476 +#ifdef CONFIG_LITMUS_SOFTIRQD
22477 +	{
22478 +		/* distribute the daemons evenly across the clusters. */
22479 +		int* affinity = kmalloc(NR_LITMUS_SOFTIRQD * sizeof(int), GFP_ATOMIC);
22480 +		int num_daemons_per_cluster = NR_LITMUS_SOFTIRQD / num_clusters;
22481 +		int left_over = NR_LITMUS_SOFTIRQD % num_clusters;
22482 +
22483 +		int daemon = 0;
22484 +		for(i = 0; i < num_clusters; ++i)
22485 +		{
22486 +			int num_on_this_cluster = num_daemons_per_cluster;
22487 +			if(left_over)
22488 +			{
22489 +				++num_on_this_cluster;
22490 +				--left_over;
22491 +			}
22492 +
22493 +			for(j = 0; j < num_on_this_cluster; ++j)
22494 +			{
22495 +				// first CPU of this cluster
22496 +				affinity[daemon++] = i*cluster_size;
22497 +			}
22498 +		}
22499 +
22500 +		spawn_klitirqd(affinity);
22501 +
22502 +		kfree(affinity);
22503 +	}
22504 +#endif
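/* Worked example (illustrative values, not part of the diff): with
 * NR_LITMUS_SOFTIRQD = 4 and num_clusters = 3, num_daemons_per_cluster = 1
 * and left_over = 1, so cluster 0 gets two daemons and clusters 1 and 2 get
 * one each; every daemon is pinned to the first CPU of its cluster, giving
 * affinity = {0, 0, cluster_size, 2*cluster_size}.
 */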
22505 +
22506 +#ifdef CONFIG_LITMUS_NVIDIA
22507 +	init_nvidia_info();
22508 +#endif
22509 +
22510 +	free_cpumask_var(mask);
22511 +	clusters_allocated = 1;
22512 +	return 0;
22513 +}
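/* Worked example (illustrative hardware, not part of the diff): with 12
 * online CPUs and a cluster_config that selects 6-CPU cache domains,
 * cluster_size = 6 and num_clusters = 12 / 6 = 2.  A configuration in which
 * cluster_size does not evenly divide the number of online CPUs (say, 8 CPUs
 * with 6-CPU domains) is rejected by the modulus check above.
 */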
22514 +
22515 +/*	Plugin object	*/
22516 +static struct sched_plugin cedf_plugin __cacheline_aligned_in_smp = {
22517 +	.plugin_name		= "C-EDF",
22518 +	.finish_switch		= cedf_finish_switch,
22519 +	.tick			= cedf_tick,
22520 +	.task_new		= cedf_task_new,
22521 +	.complete_job		= complete_job,
22522 +	.task_exit		= cedf_task_exit,
22523 +	.schedule		= cedf_schedule,
22524 +	.task_wake_up		= cedf_task_wake_up,
22525 +	.task_block		= cedf_task_block,
22526 +	.admit_task		= cedf_admit_task,
22527 +	.activate_plugin	= cedf_activate_plugin,
22528 +	.compare		= edf_higher_prio,
22529 +#ifdef CONFIG_LITMUS_LOCKING
22530 +	.allocate_lock		= cedf_allocate_lock,
22531 +	.increase_prio		= increase_priority_inheritance,
22532 +	.decrease_prio		= decrease_priority_inheritance,
22533 +#endif
22534 +#ifdef CONFIG_LITMUS_NESTED_LOCKING
22535 +	.nested_increase_prio		= nested_increase_priority_inheritance,
22536 +	.nested_decrease_prio		= nested_decrease_priority_inheritance,
22537 +	.__compare					= __edf_higher_prio,
22538 +#endif
22539 +#ifdef CONFIG_LITMUS_DGL_SUPPORT
22540 +	.get_dgl_spinlock = cedf_get_dgl_spinlock,
22541 +#endif
22542 +#ifdef CONFIG_LITMUS_AFFINITY_LOCKING
22543 +	.allocate_aff_obs = cedf_allocate_affinity_observer,
22544 +#endif
22545 +#ifdef CONFIG_LITMUS_SOFTIRQD
22546 +	.increase_prio_klitirqd = increase_priority_inheritance_klitirqd,
22547 +	.decrease_prio_klitirqd = decrease_priority_inheritance_klitirqd,
22548 +#endif
22549 +#ifdef CONFIG_LITMUS_PAI_SOFTIRQD
22550 +	.enqueue_pai_tasklet = cedf_enqueue_pai_tasklet,
22551 +	.change_prio_pai_tasklet = cedf_change_prio_pai_tasklet,
22552 +	.run_tasklets = cedf_run_tasklets,
22553 +#endif
22554 +};
22555 +
22556 +static struct proc_dir_entry *cluster_file = NULL, *cedf_dir = NULL;
22557 +
22558 +static int __init init_cedf(void)
22559 +{
22560 +	int err, fs;
22561 +
22562 +	err = register_sched_plugin(&cedf_plugin);
22563 +	if (!err) {
22564 +		fs = make_plugin_proc_dir(&cedf_plugin, &cedf_dir);
22565 +		if (!fs)
22566 +			cluster_file = create_cluster_file(cedf_dir, &cluster_config);
22567 +		else
22568 +			printk(KERN_ERR "Could not allocate C-EDF procfs dir.\n");
22569 +	}
22570 +	return err;
22571 +}
22572 +
22573 +static void clean_cedf(void)
22574 +{
22575 +	cleanup_cedf();
22576 +	if (cluster_file)
22577 +		remove_proc_entry("cluster", cedf_dir);
22578 +	if (cedf_dir)
22579 +		remove_plugin_proc_dir(&cedf_plugin);
22580 +}
22581 +
22582 +module_init(init_cedf);
22583 +module_exit(clean_cedf);
22584 diff --git a/litmus/sched_gsn_edf.c b/litmus/sched_gsn_edf.c
22585 new file mode 100644
22586 index 0000000..8c48757
22587 --- /dev/null
22588 +++ b/litmus/sched_gsn_edf.c
22589 @@ -0,0 +1,1862 @@
22590 +/*
22591 + * litmus/sched_gsn_edf.c
22592 + *
22593 + * Implementation of the GSN-EDF scheduling algorithm.
22594 + *
22595 + * This version uses the simple approach and serializes all scheduling
22596 + * decisions by the use of a queue lock. This is probably not the
22597 + * best way to do it, but it should suffice for now.
22598 + */
22599 +
22600 +#include <linux/spinlock.h>
22601 +#include <linux/percpu.h>
22602 +#include <linux/sched.h>
22603 +#include <linux/slab.h>
22604 +#include <linux/uaccess.h>
22605 +#include <linux/module.h>
22606 +
22607 +#include <litmus/litmus.h>
22608 +#include <litmus/jobs.h>
22609 +#include <litmus/sched_plugin.h>
22610 +#include <litmus/edf_common.h>
22611 +#include <litmus/sched_trace.h>
22612 +
22613 +#include <litmus/preempt.h>
22614 +
22615 +#include <litmus/bheap.h>
22616 +#include <litmus/binheap.h>
22617 +
22618 +#ifdef CONFIG_LITMUS_LOCKING
22619 +#include <litmus/kfmlp_lock.h>
22620 +#endif
22621 +
22622 +#ifdef CONFIG_LITMUS_NESTED_LOCKING
22623 +#include <litmus/rsm_lock.h>
22624 +#include <litmus/ikglp_lock.h>
22625 +#endif
22626 +
22627 +#ifdef CONFIG_SCHED_CPU_AFFINITY
22628 +#include <litmus/affinity.h>
22629 +#endif
22630 +
22631 +#ifdef CONFIG_LITMUS_SOFTIRQD
22632 +#include <litmus/litmus_softirq.h>
22633 +#endif
22634 +
22635 +#ifdef CONFIG_LITMUS_PAI_SOFTIRQD
22636 +#include <linux/interrupt.h>
22637 +#include <litmus/trace.h>
22638 +#endif
22639 +
22640 +#ifdef CONFIG_LITMUS_NVIDIA
22641 +#include <litmus/nvidia_info.h>
22642 +#endif
22643 +
22644 +#if defined(CONFIG_LITMUS_AFFINITY_LOCKING) && defined(CONFIG_LITMUS_NVIDIA)
22645 +#include <litmus/gpu_affinity.h>
22646 +#endif
22647 +
22648 +/* Overview of GSN-EDF operations.
22649 + *
22650 + * For a detailed explanation of GSN-EDF have a look at the FMLP paper. This
22651 + * description only covers how the individual operations are implemented in
22652 + * LITMUS.
22653 + *
22654 + * link_task_to_cpu(T, cpu) 	- Low-level operation to update the linkage
22655 + *                                structure (NOT the actually scheduled
22656 + *                                task). If there is another linked task To
22657 + *                                already it will set To->linked_on = NO_CPU
22658 + *                                (thereby removing its association with this
22659 + *                                CPU). However, it will not requeue the
22660 + *                                previously linked task (if any). It will set
22661 + *                                T's state to RT_F_RUNNING and check whether
22662 + *                                it is already running somewhere else. If T
22663 + *                                is scheduled somewhere else it will link
22664 + *                                it to that CPU instead (and pull the linked
22665 + *                                task to cpu). T may be NULL.
22666 + *
22667 + * unlink(T)			- Unlink removes T from all scheduler data
22668 + *                                structures. If it is linked to some CPU it
22669 + *                                will link NULL to that CPU. If it is
22670 + *                                currently queued in the gsnedf queue it will
22671 + *                                be removed from the rt_domain. It is safe to
22672 + *                                call unlink(T) if T is not linked. T may not
22673 + *                                be NULL.
22674 + *
22675 + * requeue(T)			- Requeue will insert T into the appropriate
22676 + *                                queue. If the system is in real-time mode and
22677 + *                                T has already been released, it will go into
22678 + *                                the ready queue. If the system is not in
22679 + *                                real-time mode, T will go into the release
22680 + *                                queue. If T's release time is in the future,
22681 + *                                it will go into the release queue. That means
22682 + *                                that T's release time/job no/etc. has to be
22683 + *                                updated before requeue(T) is called. It is
22684 + *                                not safe to call requeue(T)
22685 + *                                when T is already queued. T may not be NULL.
22686 + *
22687 + * gsnedf_job_arrival(T)	- This is the catch all function when T enters
22688 + *                                the system after either a suspension or at a
22689 + *                                job release. It will queue T (which means it
22690 + *                                is not safe to call gsnedf_job_arrival(T) if
22691 + *                                T is already queued) and then check whether a
22692 + *                                preemption is necessary. If a preemption is
22693 + *                                necessary it will update the linkage
22694 + *                                accordingly and cause scheduled to be called
22695 + *                                (either with an IPI or need_resched). It is
22696 + *                                safe to call gsnedf_job_arrival(T) if T's
22697 + *                                next job has not been actually released yet
22698 + *                                (release time in the future). T will be put
22699 + *                                on the release queue in that case.
22700 + *
22701 + * job_completion(T)		- Take care of everything that needs to be done
22702 + *                                to prepare T for its next release and place
22703 + *                                it in the right queue with
22704 + *                                gsnedf_job_arrival().
22705 + *
22706 + *
22707 + * When we know that T is linked to a CPU, then link_task_to_cpu(NULL, CPU) is
22708 + * equivalent to unlink(T). Note that if you unlink a task from a CPU, none of
22709 + * the functions will automatically link a pending task from the ready queue
22710 + * to that CPU. This is the job of the calling function (by means of
22711 + * __take_ready).
22712 + */
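/* A condensed sketch (illustration only, not part of the diff) of how the
 * operations described above compose when a newly released job should preempt
 * the lowest-priority CPU.  It mirrors check_for_preemptions() further below;
 * 'example_preemption_path' is a made-up name and gsnedf_lock must be held.
 */
static void example_preemption_path(void)
{
	cpu_entry_t *last = lowest_prio_cpu();

	if (edf_preemption_needed(&gsnedf, last->linked)) {
		struct task_struct *t = __take_ready(&gsnedf);

		if (last->linked)
			requeue(last->linked);	/* displaced job goes back into the domain */

		link_task_to_cpu(t, last);	/* update linkage only */
		preempt(last);			/* force the victim CPU to reschedule */
	}
}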
22713 +
22714 +
22715 +/* cpu_entry_t - maintain the linked and scheduled state
22716 + */
22717 +typedef struct  {
22718 +	int 			cpu;
22719 +	struct task_struct*	linked;		/* only RT tasks */
22720 +	struct task_struct*	scheduled;	/* only RT tasks */
22721 +	struct binheap_node hn;
22722 +} cpu_entry_t;
22723 +DEFINE_PER_CPU(cpu_entry_t, gsnedf_cpu_entries);
22724 +
22725 +cpu_entry_t* gsnedf_cpus[NR_CPUS];
22726 +
22727 +/* the cpus queue themselves according to priority in here */
22728 +static struct binheap_handle gsnedf_cpu_heap;
22729 +
22730 +static rt_domain_t gsnedf;
22731 +#define gsnedf_lock (gsnedf.ready_lock)
22732 +
22733 +#ifdef CONFIG_LITMUS_DGL_SUPPORT
22734 +static raw_spinlock_t dgl_lock;
22735 +
22736 +static raw_spinlock_t* gsnedf_get_dgl_spinlock(struct task_struct *t)
22737 +{
22738 +	return(&dgl_lock);
22739 +}
22740 +#endif
22741 +
22742 +#ifdef CONFIG_LITMUS_PAI_SOFTIRQD
22743 +struct tasklet_head
22744 +{
22745 +	struct tasklet_struct *head;
22746 +	struct tasklet_struct **tail;
22747 +};
22748 +
22749 +struct tasklet_head gsnedf_pending_tasklets;
22750 +#endif
22751 +
22752 +
22753 +/* Uncomment this if you want to see all scheduling decisions in the
22754 + * TRACE() log.
22755 +#define WANT_ALL_SCHED_EVENTS
22756 + */
22757 +
22758 +static int cpu_lower_prio(struct binheap_node *_a, struct binheap_node *_b)
22759 +{
22760 +	cpu_entry_t *a = binheap_entry(_a, cpu_entry_t, hn);
22761 +	cpu_entry_t *b = binheap_entry(_b, cpu_entry_t, hn);
22762 +
22763 +	/* Note that a and b are inverted: we want the lowest-priority CPU at
22764 +	 * the top of the heap.
22765 +	 */
22766 +	return edf_higher_prio(b->linked, a->linked);
22767 +}
22768 +
22769 +
22770 +/* update_cpu_position - Move the cpu entry to the correct place to maintain
22771 + *                       order in the cpu queue. Caller must hold gsnedf lock.
22772 + */
22773 +static void update_cpu_position(cpu_entry_t *entry)
22774 +{
22775 +	if (likely(binheap_is_in_heap(&entry->hn))) {
22776 +		binheap_delete(&entry->hn, &gsnedf_cpu_heap);
22777 +	}
22778 +	binheap_add(&entry->hn, &gsnedf_cpu_heap, cpu_entry_t, hn);
22779 +}
22780 +
22781 +/* caller must hold gsnedf lock */
22782 +static cpu_entry_t* lowest_prio_cpu(void)
22783 +{
22784 +	return binheap_top_entry(&gsnedf_cpu_heap, cpu_entry_t, hn);
22785 +}
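/* Note (illustration only, not part of the diff): because cpu_lower_prio()
 * passes its arguments to edf_higher_prio() in reverse order, the binheap is
 * effectively keyed on how preemptable each CPU's linked task is, so
 * binheap_top_entry() yields the easiest CPU to preempt.  For example
 * (hypothetical deadlines), if CPU 0 is linked to a job with deadline 10 and
 * CPU 1 to a job with deadline 20, lowest_prio_cpu() returns CPU 1, and an
 * arriving job with deadline 15 would be linked there by
 * check_for_preemptions().
 */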
22786 +
22787 +
22788 +/* link_task_to_cpu - Update the link of a CPU.
22789 + *                    Handles the case where the to-be-linked task is already
22790 + *                    scheduled on a different CPU.
22791 + */
22792 +static noinline void link_task_to_cpu(struct task_struct* linked,
22793 +				      cpu_entry_t *entry)
22794 +{
22795 +	cpu_entry_t *sched;
22796 +	struct task_struct* tmp;
22797 +	int on_cpu;
22798 +
22799 +	BUG_ON(linked && !is_realtime(linked));
22800 +
22801 +	/* Currently linked task is set to be unlinked. */
22802 +	if (entry->linked) {
22803 +		entry->linked->rt_param.linked_on = NO_CPU;
22804 +	}
22805 +
22806 +	/* Link new task to CPU. */
22807 +	if (linked) {
22808 +		set_rt_flags(linked, RT_F_RUNNING);
22809 +		/* handle the case where the task is already scheduled somewhere! */
22810 +		on_cpu = linked->rt_param.scheduled_on;
22811 +		if (on_cpu != NO_CPU) {
22812 +			sched = &per_cpu(gsnedf_cpu_entries, on_cpu);
22813 +			/* this should only happen if not linked already */
22814 +			BUG_ON(sched->linked == linked);
22815 +
22816 +			/* If we are already scheduled on the CPU to which we
22817 +			 * wanted to link, we don't need to do the swap --
22818 +			 * we just link ourselves to the CPU and depend on
22819 +			 * the caller to get things right.
22820 +			 */
22821 +			if (entry != sched) {
22822 +				TRACE_TASK(linked,
22823 +					   "already scheduled on %d, updating link.\n",
22824 +					   sched->cpu);
22825 +				tmp = sched->linked;
22826 +				linked->rt_param.linked_on = sched->cpu;
22827 +				sched->linked = linked;
22828 +				update_cpu_position(sched);
22829 +				linked = tmp;
22830 +			}
22831 +		}
22832 +		if (linked) /* might be NULL due to swap */
22833 +			linked->rt_param.linked_on = entry->cpu;
22834 +	}
22835 +	entry->linked = linked;
22836 +#ifdef WANT_ALL_SCHED_EVENTS
22837 +	if (linked)
22838 +		TRACE_TASK(linked, "linked to %d.\n", entry->cpu);
22839 +	else
22840 +		TRACE("NULL linked to %d.\n", entry->cpu);
22841 +#endif
22842 +	update_cpu_position(entry);
22843 +}
22844 +
22845 +/* unlink - Make sure a task is not linked any longer to an entry
22846 + *          where it was linked before. Must hold gsnedf_lock.
22847 + */
22848 +static noinline void unlink(struct task_struct* t)
22849 +{
22850 +	cpu_entry_t *entry;
22851 +
22852 +	if (t->rt_param.linked_on != NO_CPU) {
22853 +		/* unlink */
22854 +		entry = &per_cpu(gsnedf_cpu_entries, t->rt_param.linked_on);
22855 +		t->rt_param.linked_on = NO_CPU;
22856 +		link_task_to_cpu(NULL, entry);
22857 +	} else if (is_queued(t)) {
22858 +		/* This is an interesting situation: t is scheduled,
22859 +		 * but was just recently unlinked.  It cannot be
22860 +		 * linked anywhere else (because then it would have
22861 +		 * been relinked to this CPU), thus it must be in some
22862 +		 * queue. We must remove it from the list in this
22863 +		 * case.
22864 +		 */
22865 +		remove(&gsnedf, t);
22866 +	}
22867 +}
22868 +
22869 +
22870 +/* preempt - force a CPU to reschedule
22871 + */
22872 +static void preempt(cpu_entry_t *entry)
22873 +{
22874 +	preempt_if_preemptable(entry->scheduled, entry->cpu);
22875 +}
22876 +
22877 +/* requeue - Put an unlinked task into gsn-edf domain.
22878 + *           Caller must hold gsnedf_lock.
22879 + */
22880 +static noinline void requeue(struct task_struct* task)
22881 +{
22882 +	BUG_ON(!task);
22883 +	/* sanity check before insertion */
22884 +	BUG_ON(is_queued(task));
22885 +
22886 +	if (is_released(task, litmus_clock()))
22887 +		__add_ready(&gsnedf, task);
22888 +	else {
22889 +		/* it has got to wait */
22890 +		add_release(&gsnedf, task);
22891 +	}
22892 +}
22893 +
22894 +#ifdef CONFIG_SCHED_CPU_AFFINITY
22895 +static cpu_entry_t* gsnedf_get_nearest_available_cpu(cpu_entry_t *start)
22896 +{
22897 +	cpu_entry_t *affinity;
22898 +
22899 +	get_nearest_available_cpu(affinity, start, gsnedf_cpu_entries,
22900 +#ifdef CONFIG_RELEASE_MASTER
22901 +			gsnedf.release_master
22902 +#else
22903 +			NO_CPU
22904 +#endif
22905 +			);
22906 +
22907 +	return(affinity);
22908 +}
22909 +#endif
22910 +
22911 +/* check for any necessary preemptions */
22912 +static void check_for_preemptions(void)
22913 +{
22914 +	struct task_struct *task;
22915 +	cpu_entry_t *last;
22916 +
22917 +	for (last = lowest_prio_cpu();
22918 +	     edf_preemption_needed(&gsnedf, last->linked);
22919 +	     last = lowest_prio_cpu()) {
22920 +		/* preemption necessary */
22921 +		task = __take_ready(&gsnedf);
22922 +		TRACE("check_for_preemptions: attempting to link task %d to %d\n",
22923 +		      task->pid, last->cpu);
22924 +
22925 +#ifdef CONFIG_SCHED_CPU_AFFINITY
22926 +		{
22927 +			cpu_entry_t *affinity =
22928 +					gsnedf_get_nearest_available_cpu(
22929 +						&per_cpu(gsnedf_cpu_entries, task_cpu(task)));
22930 +			if (affinity)
22931 +				last = affinity;
22932 +			else if (last->linked)
22933 +				requeue(last->linked);
22934 +		}
22935 +#else
22936 +		if (last->linked)
22937 +			requeue(last->linked);
22938 +#endif
22939 +
22940 +		link_task_to_cpu(task, last);
22941 +		preempt(last);
22942 +	}
22943 +}
22944 +
22945 +/* gsnedf_job_arrival: task is either resumed or released */
22946 +static noinline void gsnedf_job_arrival(struct task_struct* task)
22947 +{
22948 +	BUG_ON(!task);
22949 +
22950 +	requeue(task);
22951 +	check_for_preemptions();
22952 +}
22953 +
22954 +static void gsnedf_release_jobs(rt_domain_t* rt, struct bheap* tasks)
22955 +{
22956 +	unsigned long flags;
22957 +
22958 +	raw_spin_lock_irqsave(&gsnedf_lock, flags);
22959 +
22960 +	__merge_ready(rt, tasks);
22961 +	check_for_preemptions();
22962 +
22963 +	raw_spin_unlock_irqrestore(&gsnedf_lock, flags);
22964 +}
22965 +
22966 +/* caller holds gsnedf_lock */
22967 +static noinline void job_completion(struct task_struct *t, int forced)
22968 +{
22969 +	BUG_ON(!t);
22970 +
22971 +	sched_trace_task_completion(t, forced);
22972 +
22973 +#ifdef CONFIG_LITMUS_NVIDIA
22974 +	atomic_set(&tsk_rt(t)->nv_int_count, 0);
22975 +#endif
22976 +
22977 +	TRACE_TASK(t, "job_completion().\n");
22978 +
22979 +	/* set flags */
22980 +	set_rt_flags(t, RT_F_SLEEP);
22981 +	/* prepare for next period */
22982 +	prepare_for_next_period(t);
22983 +	if (is_released(t, litmus_clock()))
22984 +		sched_trace_task_release(t);
22985 +	/* unlink */
22986 +	unlink(t);
22987 +	/* requeue
22988 +	 * But don't requeue a blocking task. */
22989 +	if (is_running(t))
22990 +		gsnedf_job_arrival(t);
22991 +}
22992 +
22993 +/* gsnedf_tick - this function is called for every local timer
22994 + *                         interrupt.
22995 + *
22996 + *                   checks whether the current task has expired and checks
22997 + *                   whether we need to preempt it if it has not expired
22998 + */
22999 +static void gsnedf_tick(struct task_struct* t)
23000 +{
23001 +	if (is_realtime(t) && budget_enforced(t) && budget_exhausted(t)) {
23002 +		if (!is_np(t)) {
23003 +			/* np tasks will be preempted when they become
23004 +			 * preemptable again
23005 +			 */
23006 +			litmus_reschedule_local();
23007 +			TRACE("gsnedf_scheduler_tick: "
23008 +			      "%d is preemptable "
23009 +			      " => FORCE_RESCHED\n", t->pid);
23010 +		} else if (is_user_np(t)) {
23011 +			TRACE("gsnedf_scheduler_tick: "
23012 +			      "%d is non-preemptable, "
23013 +			      "preemption delayed.\n", t->pid);
23014 +			request_exit_np(t);
23015 +		}
23016 +	}
23017 +}
23018 +
23019 +
23020 +
23021 +
23022 +#ifdef CONFIG_LITMUS_PAI_SOFTIRQD
23023 +
23024 +
23025 +static void __do_lit_tasklet(struct tasklet_struct* tasklet, unsigned long flushed)
23026 +{
23027 +	if (!atomic_read(&tasklet->count)) {
23028 +		if(tasklet->owner) {
23029 +			sched_trace_tasklet_begin(tasklet->owner);
23030 +		}
23031 +
23032 +		if (!test_and_clear_bit(TASKLET_STATE_SCHED, &tasklet->state))
23033 +		{
23034 +			BUG();
23035 +		}
23036 +		TRACE("%s: Invoking tasklet with owner pid = %d (flushed = %d).\n",
23037 +			  __FUNCTION__,
23038 +			  (tasklet->owner) ? tasklet->owner->pid : -1,
23039 +			  (tasklet->owner) ? 0 : 1);
23040 +		tasklet->func(tasklet->data);
23041 +		tasklet_unlock(tasklet);
23042 +
23043 +		if(tasklet->owner) {
23044 +			sched_trace_tasklet_end(tasklet->owner, flushed);
23045 +		}
23046 +	}
23047 +	else {
23048 +		BUG();
23049 +	}
23050 +}
23051 +
23052 +static void do_lit_tasklets(struct task_struct* sched_task)
23053 +{
23054 +	int work_to_do = 1;
23055 +	struct tasklet_struct *tasklet = NULL;
23056 +	unsigned long flags;
23057 +
23058 +	while(work_to_do) {
23059 +
23060 +		TS_NV_SCHED_BOTISR_START;
23061 +
23062 +		// execute one tasklet that has higher priority
23063 +		raw_spin_lock_irqsave(&gsnedf_lock, flags);
23064 +
23065 +		if(gsnedf_pending_tasklets.head != NULL) {
23066 +			struct tasklet_struct *prev = NULL;
23067 +			tasklet = gsnedf_pending_tasklets.head;
23068 +
23069 +			while(tasklet && edf_higher_prio(sched_task, tasklet->owner)) {
23070 +				prev = tasklet;
23071 +				tasklet = tasklet->next;
23072 +			}
23073 +
23074 +			// remove the tasklet from the queue
23075 +			if(prev) {
23076 +				prev->next = tasklet->next;
23077 +				if(prev->next == NULL) {
23078 +					TRACE("%s: Tasklet for %d is the last element in tasklet queue.\n", __FUNCTION__, tasklet->owner->pid);
23079 +					gsnedf_pending_tasklets.tail = &(prev->next);
23080 +				}
23081 +			}
23082 +			else {
23083 +				gsnedf_pending_tasklets.head = tasklet->next;
23084 +				if(tasklet->next == NULL) {
23085 +					TRACE("%s: Tasklet for %d is the last element in tasklet queue.\n", __FUNCTION__, tasklet->owner->pid);
23086 +					gsnedf_pending_tasklets.tail = &(gsnedf_pending_tasklets.head);
23087 +				}
23088 +			}
23089 +		}
23090 +		else {
23091 +			TRACE("%s: Tasklet queue is empty.\n", __FUNCTION__);
23092 +		}
23093 +
23094 +		raw_spin_unlock_irqrestore(&gsnedf_lock, flags);
23095 +
23096 +		if(tasklet) {
23097 +			__do_lit_tasklet(tasklet, 0ul);
23098 +			tasklet = NULL;
23099 +		}
23100 +		else {
23101 +			work_to_do = 0;
23102 +		}
23103 +
23104 +		TS_NV_SCHED_BOTISR_END;
23105 +	}
23106 +}
23107 +
23108 +//static void do_lit_tasklets(struct task_struct* sched_task)
23109 +//{
23110 +//	int work_to_do = 1;
23111 +//	struct tasklet_struct *tasklet = NULL;
23112 +//	//struct tasklet_struct *step;
23113 +//	unsigned long flags;
23114 +//
23115 +//	while(work_to_do) {
23116 +//
23117 +//		TS_NV_SCHED_BOTISR_START;
23118 +//
23119 +//		// remove tasklet at head of list if it has higher priority.
23120 +//		raw_spin_lock_irqsave(&gsnedf_lock, flags);
23121 +//
23122 +//		if(gsnedf_pending_tasklets.head != NULL) {
23123 +//			// remove tasklet at head.
23124 +//			tasklet = gsnedf_pending_tasklets.head;
23125 +//
23126 +//			if(edf_higher_prio(tasklet->owner, sched_task)) {
23127 +//
23128 +//				if(NULL == tasklet->next) {
23129 +//					// tasklet is at the head, list only has one element
23130 +//					TRACE("%s: Tasklet for %d is the last element in tasklet queue.\n", __FUNCTION__, tasklet->owner->pid);
23131 +//					gsnedf_pending_tasklets.tail = &(gsnedf_pending_tasklets.head);
23132 +//				}
23133 +//
23134 +//				// remove the tasklet from the queue
23135 +//				gsnedf_pending_tasklets.head = tasklet->next;
23136 +//
23137 +//				TRACE("%s: Removed tasklet for %d from tasklet queue.\n", __FUNCTION__, tasklet->owner->pid);
23138 +//			}
23139 +//			else {
23140 +//				TRACE("%s: Pending tasklet (%d) does not have priority to run on this CPU (%d).\n", __FUNCTION__, tasklet->owner->pid, smp_processor_id());
23141 +//				tasklet = NULL;
23142 +//			}
23143 +//		}
23144 +//		else {
23145 +//			TRACE("%s: Tasklet queue is empty.\n", __FUNCTION__);
23146 +//		}
23147 +//
23148 +//		raw_spin_unlock_irqrestore(&gsnedf_lock, flags);
23149 +//
23150 +//		TS_NV_SCHED_BOTISR_END;
23151 +//
23152 +//		if(tasklet) {
23153 +//			__do_lit_tasklet(tasklet, 0ul);
23154 +//			tasklet = NULL;
23155 +//		}
23156 +//		else {
23157 +//			work_to_do = 0;
23158 +//		}
23159 +//	}
23160 +//
23161 +//	//TRACE("%s: exited.\n", __FUNCTION__);
23162 +//}
23163 +
23164 +static void __add_pai_tasklet(struct tasklet_struct* tasklet)
23165 +{
23166 +	struct tasklet_struct* step;
23167 +
23168 +	tasklet->next = NULL;  // make sure there are no old values floating around
23169 +
23170 +	step = gsnedf_pending_tasklets.head;
23171 +	if(step == NULL) {
23172 +		TRACE("%s: tasklet queue empty.  inserting tasklet for %d at head.\n", __FUNCTION__, tasklet->owner->pid);
23173 +		// insert at tail.
23174 +		*(gsnedf_pending_tasklets.tail) = tasklet;
23175 +		gsnedf_pending_tasklets.tail = &(tasklet->next);
23176 +	}
23177 +	else if((*(gsnedf_pending_tasklets.tail) != NULL) &&
23178 +			edf_higher_prio((*(gsnedf_pending_tasklets.tail))->owner, tasklet->owner)) {
23179 +		// insert at tail.
23180 +		TRACE("%s: tasklet belongs at end.  inserting tasklet for %d at tail.\n", __FUNCTION__, tasklet->owner->pid);
23181 +
23182 +		*(gsnedf_pending_tasklets.tail) = tasklet;
23183 +		gsnedf_pending_tasklets.tail = &(tasklet->next);
23184 +	}
23185 +	else {
23186 +		// insert the tasklet somewhere in the middle.
23187 +
23188 +		TRACE("%s: tasklet belongs somewhere in the middle.\n", __FUNCTION__);
23189 +
23190 +		while(step->next && edf_higher_prio(step->next->owner, tasklet->owner)) {
23191 +			step = step->next;
23192 +		}
23193 +
23194 +		// insert tasklet right before step->next.
23195 +
23196 +		TRACE("%s: inserting tasklet for %d between %d and %d.\n", __FUNCTION__, tasklet->owner->pid, step->owner->pid, (step->next) ? step->next->owner->pid : -1);
23197 +
23198 +		tasklet->next = step->next;
23199 +		step->next = tasklet;
23200 +
23201 +		// patch up the head if needed.
23202 +		if(gsnedf_pending_tasklets.head == step)
23203 +		{
23204 +			TRACE("%s: %d is the new tasklet queue head.\n", __FUNCTION__, tasklet->owner->pid);
23205 +			gsnedf_pending_tasklets.head = tasklet;
23206 +		}
23207 +	}
23208 +}
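/* The pending-tasklet list invariant used above (illustration only, not part
 * of the diff): tasklets are kept sorted by owner priority, highest first,
 * and 'tail' always holds the address of the last element's 'next' field (or
 * of 'head' when the list is empty), so an append is just two pointer writes.
 * 'example_append_tail' is a made-up name; gsnedf_lock must be held.
 */
static void example_append_tail(struct tasklet_struct *t)
{
	t->next = NULL;
	*(gsnedf_pending_tasklets.tail) = t;		/* link after the current last element */
	gsnedf_pending_tasklets.tail = &(t->next);	/* tail now tracks the new last element */
}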
23209 +
23210 +static void gsnedf_run_tasklets(struct task_struct* sched_task)
23211 +{
23212 +	preempt_disable();
23213 +
23214 +	if(gsnedf_pending_tasklets.head != NULL) {
23215 +		TRACE("%s: There are tasklets to process.\n", __FUNCTION__);
23216 +		do_lit_tasklets(sched_task);
23217 +	}
23218 +
23219 +	preempt_enable_no_resched();
23220 +}
23221 +
23222 +static int gsnedf_enqueue_pai_tasklet(struct tasklet_struct* tasklet)
23223 +{
23224 +	cpu_entry_t *targetCPU = NULL;
23225 +	int thisCPU;
23226 +	int runLocal = 0;
23227 +	int runNow = 0;
23228 +	unsigned long flags;
23229 +
23230 +	if(unlikely((tasklet->owner == NULL) || !is_realtime(tasklet->owner)))
23231 +	{
23232 +		TRACE("%s: No owner associated with this tasklet!\n", __FUNCTION__);
23233 +		return 0;
23234 +	}
23235 +
23236 +
23237 +	raw_spin_lock_irqsave(&gsnedf_lock, flags);
23238 +
23239 +	thisCPU = smp_processor_id();
23240 +
23241 +#ifdef CONFIG_SCHED_CPU_AFFINITY
23242 +	{
23243 +		cpu_entry_t* affinity = NULL;
23244 +
23245 +		// use this CPU if it is in our cluster and isn't running any RT work.
23246 +		if(
23247 +#ifdef CONFIG_RELEASE_MASTER
23248 +		   (thisCPU != gsnedf.release_master) &&
23249 +#endif
23250 +		   (__get_cpu_var(gsnedf_cpu_entries).linked == NULL)) {
23251 +			affinity = &(__get_cpu_var(gsnedf_cpu_entries));
23252 +		}
23253 +		else {
23254 +			// this CPU is busy or shouldn't run tasklet in this cluster.
23255 +			// look for available near by CPUs.
23256 +			// NOTE: Affinity towards owner and not this CPU.  Is this right?
23257 +			affinity =
23258 +				gsnedf_get_nearest_available_cpu(
23259 +					&per_cpu(gsnedf_cpu_entries, task_cpu(tasklet->owner)));
23260 +		}
23261 +
23262 +		targetCPU = affinity;
23263 +	}
23264 +#endif
23265 +
23266 +	if (targetCPU == NULL) {
23267 +		targetCPU = lowest_prio_cpu();
23268 +	}
23269 +
23270 +	if (edf_higher_prio(tasklet->owner, targetCPU->linked)) {
23271 +		if (thisCPU == targetCPU->cpu) {
23272 +			TRACE("%s: Run tasklet locally (and now).\n", __FUNCTION__);
23273 +			runLocal = 1;
23274 +			runNow = 1;
23275 +		}
23276 +		else {
23277 +			TRACE("%s: Run tasklet remotely (and now).\n", __FUNCTION__);
23278 +			runLocal = 0;
23279 +			runNow = 1;
23280 +		}
23281 +	}
23282 +	else {
23283 +		runLocal = 0;
23284 +		runNow = 0;
23285 +	}
23286 +
23287 +	if(!runLocal) {
23288 +		// enqueue the tasklet
23289 +		__add_pai_tasklet(tasklet);
23290 +	}
23291 +
23292 +	raw_spin_unlock_irqrestore(&gsnedf_lock, flags);
23293 +
23294 +
23295 +	if (runLocal /*&& runNow */) {  // runNow == 1 is implied
23296 +		TRACE("%s: Running tasklet on CPU where it was received.\n", __FUNCTION__);
23297 +		__do_lit_tasklet(tasklet, 0ul);
23298 +	}
23299 +	else if (runNow /*&& !runLocal */) {  // runLocal == 0 is implied
23300 +		TRACE("%s: Triggering CPU %d to run tasklet.\n", __FUNCTION__, targetCPU->cpu);
23301 +		preempt(targetCPU);  // needs to be protected by gsnedf_lock?
23302 +	}
23303 +	else {
23304 +		TRACE("%s: Scheduling of tasklet was deferred.\n", __FUNCTION__);
23305 +	}
23306 +
23307 +	return(1); // success
23308 +}
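/* Note (illustration only, not part of the diff): the decision above reduces
 * to three cases: (1) the tasklet's owner outranks the linked task on the
 * chosen CPU and that CPU is the local one, so the tasklet runs immediately
 * in place; (2) the owner outranks a remote CPU's linked task, so the tasklet
 * is enqueued and that CPU is kicked via preempt(); (3) the owner does not
 * outrank the candidate, so the tasklet is enqueued and deferred until
 * do_lit_tasklets() drains it from the scheduler path.
 */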
23309 +
23310 +static void gsnedf_change_prio_pai_tasklet(struct task_struct *old_prio,
23311 +										   struct task_struct *new_prio)
23312 +{
23313 +	struct tasklet_struct* step;
23314 +	unsigned long flags;
23315 +
23316 +	if(gsnedf_pending_tasklets.head != NULL) {
23317 +		raw_spin_lock_irqsave(&gsnedf_lock, flags);
23318 +		for(step = gsnedf_pending_tasklets.head; step != NULL; step = step->next) {
23319 +			if(step->owner == old_prio) {
23320 +				TRACE("%s: Found tasklet to change: %d\n", __FUNCTION__, step->owner->pid);
23321 +				step->owner = new_prio;
23322 +			}
23323 +		}
23324 +		raw_spin_unlock_irqrestore(&gsnedf_lock, flags);
23325 +	}
23326 +}
23327 +
23328 +#endif  // end PAI
23329 +
23330 +
23331 +/* Getting schedule() right is a bit tricky. schedule() may not make any
23332 + * assumptions on the state of the current task since it may be called for a
23333 + * number of reasons. The reasons include a scheduler_tick() determined that it
23334 + * was necessary, because sys_exit_np() was called, because some Linux
23335 + * subsystem determined so, or even (in the worst case) because there is a bug
23336 + * hidden somewhere. Thus, we must take extreme care to determine what the
23337 + * current state is.
23338 + *
23339 + * The CPU could currently be scheduling a task (or not), be linked (or not).
23340 + *
23341 + * The following assertions for the scheduled task could hold:
23342 + *
23343 + *      - !is_running(scheduled)        // the job blocks
23344 + *	- scheduled->timeslice == 0	// the job completed (forcefully)
23345 + *	- get_rt_flag() == RT_F_SLEEP	// the job completed (by syscall)
23346 + * 	- linked != scheduled		// we need to reschedule (for any reason)
23347 + * 	- is_np(scheduled)		// rescheduling must be delayed,
23348 + *					   sys_exit_np must be requested
23349 + *
23350 + * Any of these can occur together.
23351 + */
23352 +static struct task_struct* gsnedf_schedule(struct task_struct * prev)
23353 +{
23354 +	cpu_entry_t* entry = &__get_cpu_var(gsnedf_cpu_entries);
23355 +	int out_of_time, sleep, preempt, np, exists, blocks;
23356 +	struct task_struct* next = NULL;
23357 +
23358 +#ifdef CONFIG_RELEASE_MASTER
23359 +	/* Bail out early if we are the release master.
23360 +	 * The release master never schedules any real-time tasks.
23361 +	 */
23362 +	if (unlikely(gsnedf.release_master == entry->cpu)) {
23363 +		sched_state_task_picked();
23364 +		return NULL;
23365 +	}
23366 +#endif
23367 +
23368 +	raw_spin_lock(&gsnedf_lock);
23369 +
23370 +	/* sanity checking */
23371 +	BUG_ON(entry->scheduled && entry->scheduled != prev);
23372 +	BUG_ON(entry->scheduled && !is_realtime(prev));
23373 +	BUG_ON(is_realtime(prev) && !entry->scheduled);
23374 +
23375 +	/* (0) Determine state */
23376 +	exists      = entry->scheduled != NULL;
23377 +	blocks      = exists && !is_running(entry->scheduled);
23378 +	out_of_time = exists &&
23379 +				  budget_enforced(entry->scheduled) &&
23380 +				  budget_exhausted(entry->scheduled);
23381 +	np 	    = exists && is_np(entry->scheduled);
23382 +	sleep	    = exists && get_rt_flags(entry->scheduled) == RT_F_SLEEP;
23383 +	preempt     = entry->scheduled != entry->linked;
23384 +
23385 +#ifdef WANT_ALL_SCHED_EVENTS
23386 +	TRACE_TASK(prev, "invoked gsnedf_schedule.\n");
23387 +#endif
23388 +
23389 +	/*
23390 +	if (exists)
23391 +		TRACE_TASK(prev,
23392 +			   "blocks:%d out_of_time:%d np:%d sleep:%d preempt:%d "
23393 +			   "state:%d sig:%d\n",
23394 +			   blocks, out_of_time, np, sleep, preempt,
23395 +			   prev->state, signal_pending(prev));
23396 +	 */
23397 +
23398 +	if (entry->linked && preempt)
23399 +		TRACE_TASK(prev, "will be preempted by %s/%d\n",
23400 +			   entry->linked->comm, entry->linked->pid);
23401 +
23402 +	/* If a task blocks we have no choice but to reschedule.
23403 +	 */
23404 +	if (blocks) {
23405 +		unlink(entry->scheduled);
23406 +	}
23407 +
23408 +#if defined(CONFIG_LITMUS_NVIDIA) && defined(CONFIG_LITMUS_AFFINITY_LOCKING)
23409 +	if(exists && is_realtime(entry->scheduled) && tsk_rt(entry->scheduled)->held_gpus) {
23410 +		if(!blocks || tsk_rt(entry->scheduled)->suspend_gpu_tracker_on_block) {
23411 +			stop_gpu_tracker(entry->scheduled);
23412 +		}
23413 +	}
23414 +#endif
23415 +
23416 +	/* Request a sys_exit_np() call if we would like to preempt but cannot.
23417 +	 * We need to make sure to update the link structure anyway in case
23418 +	 * that we are still linked. Multiple calls to request_exit_np() don't
23419 +	 * hurt.
23420 +	 */
23421 +	if (np && (out_of_time || preempt || sleep)) {
23422 +		unlink(entry->scheduled);
23423 +		request_exit_np(entry->scheduled);
23424 +	}
23425 +
23426 +	/* Any task that is preemptable and either exhausts its execution
23427 +	 * budget or wants to sleep completes. We may have to reschedule after
23428 +	 * this. Don't do a job completion if we block (can't have timers running
23429 +	 * for blocked jobs). Preemptions go first for the same reason.
23430 +	 */
23431 +	if (!np && (out_of_time || sleep) && !blocks && !preempt)
23432 +		job_completion(entry->scheduled, !sleep);
23433 +
23434 +	/* Link pending task if we became unlinked.
23435 +	 */
23436 +	if (!entry->linked)
23437 +		link_task_to_cpu(__take_ready(&gsnedf), entry);
23438 +
23439 +	/* The final scheduling decision. Do we need to switch for some reason?
23440 +	 * If linked is different from scheduled, then select linked as next.
23441 +	 */
23442 +	if ((!np || blocks) &&
23443 +	    entry->linked != entry->scheduled) {
23444 +		/* Schedule a linked job? */
23445 +		if (entry->linked) {
23446 +			entry->linked->rt_param.scheduled_on = entry->cpu;
23447 +			next = entry->linked;
23448 +			TRACE_TASK(next, "scheduled_on = P%d\n", smp_processor_id());
23449 +		}
23450 +		if (entry->scheduled) {
23451 +			/* not gonna be scheduled soon */
23452 +			entry->scheduled->rt_param.scheduled_on = NO_CPU;
23453 +			TRACE_TASK(entry->scheduled, "scheduled_on = NO_CPU\n");
23454 +		}
23455 +	}
23456 +	else
23457 +	{
23458 +		/* Only override Linux scheduler if we have a real-time task
23459 +		 * scheduled that needs to continue.
23460 +		 */
23461 +		if (exists)
23462 +			next = prev;
23463 +	}
23464 +
23465 +	sched_state_task_picked();
23466 +
23467 +	raw_spin_unlock(&gsnedf_lock);
23468 +
23469 +#ifdef WANT_ALL_SCHED_EVENTS
23470 +	TRACE("gsnedf_lock released, next=0x%p\n", next);
23471 +
23472 +	if (next)
23473 +		TRACE_TASK(next, "scheduled at %llu\n", litmus_clock());
23474 +	else if (exists && !next)
23475 +		TRACE("becomes idle at %llu.\n", litmus_clock());
23476 +#endif
23477 +
23478 +
23479 +	return next;
23480 +}
23481 +
23482 +
23483 +/* _finish_switch - we just finished the switch away from prev
23484 + */
23485 +static void gsnedf_finish_switch(struct task_struct *prev)
23486 +{
23487 +	cpu_entry_t* 	entry = &__get_cpu_var(gsnedf_cpu_entries);
23488 +
23489 +	entry->scheduled = is_realtime(current) ? current : NULL;
23490 +
23491 +#ifdef WANT_ALL_SCHED_EVENTS
23492 +	TRACE_TASK(prev, "switched away from\n");
23493 +#endif
23494 +}
23495 +
23496 +
23497 +/*	Prepare a task for running in RT mode
23498 + */
23499 +static void gsnedf_task_new(struct task_struct * t, int on_rq, int running)
23500 +{
23501 +	unsigned long 		flags;
23502 +	cpu_entry_t* 		entry;
23503 +
23504 +	TRACE("gsn edf: task new %d\n", t->pid);
23505 +
23506 +	raw_spin_lock_irqsave(&gsnedf_lock, flags);
23507 +
23508 +	/* setup job params */
23509 +	release_at(t, litmus_clock());
23510 +
23511 +	if (running) {
23512 +		entry = &per_cpu(gsnedf_cpu_entries, task_cpu(t));
23513 +		BUG_ON(entry->scheduled);
23514 +
23515 +#ifdef CONFIG_RELEASE_MASTER
23516 +		if (entry->cpu != gsnedf.release_master) {
23517 +#endif
23518 +			entry->scheduled = t;
23519 +			tsk_rt(t)->scheduled_on = task_cpu(t);
23520 +#ifdef CONFIG_RELEASE_MASTER
23521 +		} else {
23522 +			/* do not schedule on release master */
23523 +			preempt(entry); /* force resched */
23524 +			tsk_rt(t)->scheduled_on = NO_CPU;
23525 +		}
23526 +#endif
23527 +	} else {
23528 +		t->rt_param.scheduled_on = NO_CPU;
23529 +	}
23530 +	t->rt_param.linked_on          = NO_CPU;
23531 +
23532 +	gsnedf_job_arrival(t);
23533 +	raw_spin_unlock_irqrestore(&gsnedf_lock, flags);
23534 +}
23535 +
23536 +static void gsnedf_task_wake_up(struct task_struct *task)
23537 +{
23538 +	unsigned long flags;
23539 +	//lt_t now;
23540 +
23541 +	TRACE_TASK(task, "wake_up at %llu\n", litmus_clock());
23542 +
23543 +	raw_spin_lock_irqsave(&gsnedf_lock, flags);
23544 +
23545 +
23546 +#if 0  // sporadic task model
23547 +	/* We need to take suspensions because of semaphores into
23548 +	 * account! If a job resumes after being suspended due to acquiring
23549 +	 * a semaphore, it should never be treated as a new job release.
23550 +	 */
23551 +	if (get_rt_flags(task) == RT_F_EXIT_SEM) {
23552 +		set_rt_flags(task, RT_F_RUNNING);
23553 +	} else {
23554 +		now = litmus_clock();
23555 +		if (is_tardy(task, now)) {
23556 +			/* new sporadic release */
23557 +			release_at(task, now);
23558 +			sched_trace_task_release(task);
23559 +		}
23560 +		else {
23561 +			if (task->rt.time_slice) {
23562 +				/* came back in time before deadline
23563 +				*/
23564 +				set_rt_flags(task, RT_F_RUNNING);
23565 +			}
23566 +		}
23567 +	}
23568 +#else  // periodic task model
23569 +	set_rt_flags(task, RT_F_RUNNING);
23570 +#endif
23571 +
23572 +	gsnedf_job_arrival(task);
23573 +	raw_spin_unlock_irqrestore(&gsnedf_lock, flags);
23574 +}
23575 +
23576 +static void gsnedf_task_block(struct task_struct *t)
23577 +{
23578 +	// TODO: is this called on preemption??
23579 +	unsigned long flags;
23580 +
23581 +	TRACE_TASK(t, "block at %llu\n", litmus_clock());
23582 +
23583 +	/* unlink if necessary */
23584 +	raw_spin_lock_irqsave(&gsnedf_lock, flags);
23585 +
23586 +	unlink(t);
23587 +
23588 +	raw_spin_unlock_irqrestore(&gsnedf_lock, flags);
23589 +
23590 +	BUG_ON(!is_realtime(t));
23591 +}
23592 +
23593 +
23594 +static void gsnedf_task_exit(struct task_struct * t)
23595 +{
23596 +	unsigned long flags;
23597 +
23598 +#ifdef CONFIG_LITMUS_PAI_SOFTIRQD
23599 +	gsnedf_change_prio_pai_tasklet(t, NULL);
23600 +#endif
23601 +
23602 +	/* unlink if necessary */
23603 +	raw_spin_lock_irqsave(&gsnedf_lock, flags);
23604 +	unlink(t);
23605 +	if (tsk_rt(t)->scheduled_on != NO_CPU) {
23606 +		gsnedf_cpus[tsk_rt(t)->scheduled_on]->scheduled = NULL;
23607 +		tsk_rt(t)->scheduled_on = NO_CPU;
23608 +	}
23609 +	raw_spin_unlock_irqrestore(&gsnedf_lock, flags);
23610 +
23611 +	BUG_ON(!is_realtime(t));
23612 +	TRACE_TASK(t, "RIP\n");
23613 +}
23614 +
23615 +
23616 +static long gsnedf_admit_task(struct task_struct* tsk)
23617 +{
23618 +#ifdef CONFIG_LITMUS_NESTED_LOCKING
23619 +	INIT_BINHEAP_HANDLE(&tsk_rt(tsk)->hp_blocked_tasks,
23620 +						edf_max_heap_base_priority_order);
23621 +#endif
23622 +
23623 +	return 0;
23624 +}
23625 +
23626 +
23627 +
23628 +
23629 +
23630 +
23631 +#ifdef CONFIG_LITMUS_LOCKING
23632 +
23633 +#include <litmus/fdso.h>
23634 +
23635 +/* called with IRQs off */
23636 +static void __increase_priority_inheritance(struct task_struct* t,
23637 +										    struct task_struct* prio_inh)
23638 +{
23639 +	int linked_on;
23640 +	int check_preempt = 0;
23641 +
23642 +#ifdef CONFIG_LITMUS_NESTED_LOCKING
23643 +	/* this sanity check allows for weaker locking in protocols */
23644 +	/* TODO (klitirqd): Skip this check if 't' is a proxy thread (???) */
23645 +	if(__edf_higher_prio(prio_inh, BASE, t, EFFECTIVE)) {
23646 +#endif
23647 +		TRACE_TASK(t, "inherits priority from %s/%d\n",
23648 +				   prio_inh->comm, prio_inh->pid);
23649 +		tsk_rt(t)->inh_task = prio_inh;
23650 +
23651 +		linked_on  = tsk_rt(t)->linked_on;
23652 +
23653 +		/* If it is scheduled, then we need to reorder the CPU heap. */
23654 +		if (linked_on != NO_CPU) {
23655 +			TRACE_TASK(t, "%s: linked  on %d\n",
23656 +				   __FUNCTION__, linked_on);
23657 +			/* Holder is scheduled; need to re-order CPUs.
23658 +			 * We can't use heap_decrease() here since
23659 +			 * the cpu_heap is ordered in reverse direction, so
23660 +			 * it is actually an increase. */
23661 +			binheap_delete(&gsnedf_cpus[linked_on]->hn, &gsnedf_cpu_heap);
23662 +			binheap_add(&gsnedf_cpus[linked_on]->hn,
23663 +					&gsnedf_cpu_heap, cpu_entry_t, hn);
23664 +		} else {
23665 +			/* holder may be queued: first stop queue changes */
23666 +			raw_spin_lock(&gsnedf.release_lock);
23667 +			if (is_queued(t)) {
23668 +				TRACE_TASK(t, "%s: is queued\n",
23669 +					   __FUNCTION__);
23670 +				/* We need to update the position of the holder in some
23671 +				 * heap. Note that this could be a release heap if
23672 +				 * budget enforcement is used and this job overran. */
23673 +				check_preempt =
23674 +					!bheap_decrease(edf_ready_order,
23675 +							   tsk_rt(t)->heap_node);
23676 +			} else {
23677 +				/* Nothing to do: if it is not queued and not linked
23678 +				 * then it is either sleeping or currently being moved
23679 +				 * by other code (e.g., a timer interrupt handler) that
23680 +				 * will use the correct priority when enqueuing the
23681 +				 * task. */
23682 +				TRACE_TASK(t, "%s: is NOT queued => Done.\n",
23683 +					   __FUNCTION__);
23684 +			}
23685 +			raw_spin_unlock(&gsnedf.release_lock);
23686 +
23687 +			/* If holder was enqueued in a release heap, then the following
23688 +			 * preemption check is pointless, but we can't easily detect
23689 +			 * that case. If you want to fix this, then consider that
23690 +			 * simply adding a state flag requires O(n) time to update when
23691 +			 * releasing n tasks, which conflicts with the goal to have
23692 +			 * O(log n) merges. */
23693 +			if (check_preempt) {
23694 +				/* heap_decrease() hit the top level of the heap: make
23695 +				 * sure preemption checks get the right task, not the
23696 +				 * potentially stale cache. */
23697 +				bheap_uncache_min(edf_ready_order,
23698 +						 &gsnedf.ready_queue);
23699 +				check_for_preemptions();
23700 +			}
23701 +		}
23702 +#ifdef CONFIG_LITMUS_NESTED_LOCKING
23703 +	}
23704 +	else {
23705 +		TRACE_TASK(t, "Spurious invalid priority increase. "
23706 +				      "Inheritance request: %s/%d [eff_prio = %s/%d] to inherit from %s/%d\n"
23707 +					  "Occurrence is likely okay: probably due to (hopefully safe) concurrent priority updates.\n",
23708 +				   t->comm, t->pid,
23709 +				   effective_priority(t)->comm, effective_priority(t)->pid,
23710 +				   (prio_inh) ? prio_inh->comm : "nil",
23711 +				   (prio_inh) ? prio_inh->pid : -1);
23712 +		WARN_ON(!prio_inh);
23713 +	}
23714 +#endif
23715 +}
23716 +
23717 +/* called with IRQs off */
23718 +static void increase_priority_inheritance(struct task_struct* t, struct task_struct* prio_inh)
23719 +{
23720 +	raw_spin_lock(&gsnedf_lock);
23721 +
23722 +	__increase_priority_inheritance(t, prio_inh);
23723 +
23724 +#ifdef CONFIG_LITMUS_SOFTIRQD
23725 +	if(tsk_rt(t)->cur_klitirqd != NULL)
23726 +	{
23727 +		TRACE_TASK(t, "%s/%d inherits a new priority!\n",
23728 +				tsk_rt(t)->cur_klitirqd->comm, tsk_rt(t)->cur_klitirqd->pid);
23729 +
23730 +		__increase_priority_inheritance(tsk_rt(t)->cur_klitirqd, prio_inh);
23731 +	}
23732 +#endif
23733 +
23734 +	raw_spin_unlock(&gsnedf_lock);
23735 +
23736 +#if defined(CONFIG_LITMUS_PAI_SOFTIRQD) && defined(CONFIG_LITMUS_NVIDIA)
23737 +	if(tsk_rt(t)->held_gpus) {
23738 +		int i;
23739 +		for(i = find_first_bit(&tsk_rt(t)->held_gpus, sizeof(tsk_rt(t)->held_gpus));
23740 +			i < NV_DEVICE_NUM;
23741 +			i = find_next_bit(&tsk_rt(t)->held_gpus, sizeof(tsk_rt(t)->held_gpus), i+1)) {
23742 +			pai_check_priority_increase(t, i);
23743 +		}
23744 +	}
23745 +#endif
23746 +}
23747 +
23748 +
23749 +/* called with IRQs off */
23750 +static void __decrease_priority_inheritance(struct task_struct* t,
23751 +											struct task_struct* prio_inh)
23752 +{
23753 +#ifdef CONFIG_LITMUS_NESTED_LOCKING
23754 +	if(__edf_higher_prio(t, EFFECTIVE, prio_inh, BASE)) {
23755 +#endif
23756 +		/* A job only stops inheriting a priority when it releases a
23757 +		 * resource. Thus we can make the following assumption.*/
23758 +		if(prio_inh)
23759 +			TRACE_TASK(t, "EFFECTIVE priority decreased to %s/%d\n",
23760 +					   prio_inh->comm, prio_inh->pid);
23761 +		else
23762 +			TRACE_TASK(t, "base priority restored.\n");
23763 +
23764 +		tsk_rt(t)->inh_task = prio_inh;
23765 +
23766 +		if(tsk_rt(t)->scheduled_on != NO_CPU) {
23767 +			TRACE_TASK(t, "is scheduled.\n");
23768 +
23769 +			/* Check if rescheduling is necessary. We can't use heap_decrease()
23770 +			 * since the priority was effectively lowered. */
23771 +			unlink(t);
23772 +			gsnedf_job_arrival(t);
23773 +		}
23774 +		else {
23775 +			/* task is queued */
23776 +			raw_spin_lock(&gsnedf.release_lock);
23777 +			if (is_queued(t)) {
23778 +				TRACE_TASK(t, "is queued.\n");
23779 +
23780 +				/* decrease in priority, so we have to re-add to binomial heap */
23781 +				unlink(t);
23782 +				gsnedf_job_arrival(t);
23783 +			}
23784 +			else {
23785 +				TRACE_TASK(t, "is not in scheduler. Probably on wait queue somewhere.\n");
23786 +			}
23787 +			raw_spin_unlock(&gsnedf.release_lock);
23788 +		}
23789 +#ifdef CONFIG_LITMUS_NESTED_LOCKING
23790 +	}
23791 +	else {
23792 +		TRACE_TASK(t, "Spurious invalid priority decrease. "
23793 +				   "Inheritance request: %s/%d [eff_prio = %s/%d] to inherit from %s/%d\n"
23794 +				   "Occurrence is likely okay: probably due to (hopefully safe) concurrent priority updates.\n",
23795 +				   t->comm, t->pid,
23796 +				   effective_priority(t)->comm, effective_priority(t)->pid,
23797 +				   (prio_inh) ? prio_inh->comm : "nil",
23798 +				   (prio_inh) ? prio_inh->pid : -1);
23799 +	}
23800 +#endif
23801 +}
23802 +
23803 +static void decrease_priority_inheritance(struct task_struct* t,
23804 +										  struct task_struct* prio_inh)
23805 +{
23806 +	raw_spin_lock(&gsnedf_lock);
23807 +	__decrease_priority_inheritance(t, prio_inh);
23808 +
23809 +#ifdef CONFIG_LITMUS_SOFTIRQD
23810 +	if(tsk_rt(t)->cur_klitirqd != NULL)
23811 +	{
23812 +		TRACE_TASK(t, "%s/%d decreases in priority!\n",
23813 +				   tsk_rt(t)->cur_klitirqd->comm, tsk_rt(t)->cur_klitirqd->pid);
23814 +
23815 +		__decrease_priority_inheritance(tsk_rt(t)->cur_klitirqd, prio_inh);
23816 +	}
23817 +#endif
23818 +
23819 +	raw_spin_unlock(&gsnedf_lock);
23820 +
23821 +#if defined(CONFIG_LITMUS_PAI_SOFTIRQD) && defined(CONFIG_LITMUS_NVIDIA)
23822 +	if(tsk_rt(t)->held_gpus) {
23823 +		int i;
23824 +		for(i = find_first_bit(&tsk_rt(t)->held_gpus, sizeof(tsk_rt(t)->held_gpus));
23825 +			i < NV_DEVICE_NUM;
23826 +			i = find_next_bit(&tsk_rt(t)->held_gpus, sizeof(tsk_rt(t)->held_gpus), i+1)) {
23827 +			pai_check_priority_decrease(t, i);
23828 +		}
23829 +	}
23830 +#endif
23831 +}
23832 +
23833 +
23834 +#ifdef CONFIG_LITMUS_SOFTIRQD
23835 +/* called with IRQs off */
23836 +static void increase_priority_inheritance_klitirqd(struct task_struct* klitirqd,
23837 +											  struct task_struct* old_owner,
23838 +											  struct task_struct* new_owner)
23839 +{
23840 +	BUG_ON(!(tsk_rt(klitirqd)->is_proxy_thread));
23841 +
23842 +	raw_spin_lock(&gsnedf_lock);
23843 +
23844 +	if(old_owner != new_owner)
23845 +	{
23846 +		if(old_owner)
23847 +		{
23848 +			// unreachable?
23849 +			tsk_rt(old_owner)->cur_klitirqd = NULL;
23850 +		}
23851 +
23852 +		TRACE_TASK(klitirqd, "giving ownership to %s/%d.\n",
23853 +				   new_owner->comm, new_owner->pid);
23854 +
23855 +		tsk_rt(new_owner)->cur_klitirqd = klitirqd;
23856 +	}
23857 +
23858 +	__decrease_priority_inheritance(klitirqd, NULL);  // kludge to clear out cur prio.
23859 +
23860 +	__increase_priority_inheritance(klitirqd,
23861 +			(tsk_rt(new_owner)->inh_task == NULL) ?
23862 +				new_owner :
23863 +				tsk_rt(new_owner)->inh_task);
23864 +
23865 +	raw_spin_unlock(&gsnedf_lock);
23866 +}
23867 +
23868 +
23869 +/* called with IRQs off */
23870 +static void decrease_priority_inheritance_klitirqd(struct task_struct* klitirqd,
23871 +												   struct task_struct* old_owner,
23872 +												   struct task_struct* new_owner)
23873 +{
23874 +	BUG_ON(!(tsk_rt(klitirqd)->is_proxy_thread));
23875 +
23876 +	raw_spin_lock(&gsnedf_lock);
23877 +
23878 +	TRACE_TASK(klitirqd, "priority restored\n");
23879 +
23880 +	__decrease_priority_inheritance(klitirqd, new_owner);
23881 +
23882 +	tsk_rt(old_owner)->cur_klitirqd = NULL;
23883 +
23884 +	raw_spin_unlock(&gsnedf_lock);
23885 +}
23886 +#endif
23887 +
23888 +
23889 +
23890 +
23891 +#ifdef CONFIG_LITMUS_NESTED_LOCKING
23892 +
23893 +/* called with IRQs off */
23894 +/* preconditions:
23895 + (1) The 'hp_blocked_tasks_lock' of task 't' is held.
23896 + (2) The lock 'to_unlock' is held.
23897 + */
23898 +static void nested_increase_priority_inheritance(struct task_struct* t,
23899 +												 struct task_struct* prio_inh,
23900 +												 raw_spinlock_t *to_unlock,
23901 +												 unsigned long irqflags)
23902 +{
23903 +	struct litmus_lock *blocked_lock = tsk_rt(t)->blocked_lock;
23904 +
23905 +	if(tsk_rt(t)->inh_task != prio_inh) {		// shield redundant calls.
23906 +		increase_priority_inheritance(t, prio_inh);  // increase our prio.
23907 +	}
23908 +
23909 +	raw_spin_unlock(&tsk_rt(t)->hp_blocked_tasks_lock);  // unlock t's heap.
23910 +
23911 +
23912 +	if(blocked_lock) {
23913 +		if(blocked_lock->ops->propagate_increase_inheritance) {
23914 +			TRACE_TASK(t, "Inheritor is blocked (...perhaps).  Checking lock %d.\n",
23915 +					   blocked_lock->ident);
23916 +
23917 +			// beware: recursion
23918 +			blocked_lock->ops->propagate_increase_inheritance(blocked_lock,
23919 +															  t, to_unlock,
23920 +															  irqflags);
23921 +		}
23922 +		else {
23923 +			TRACE_TASK(t, "Inheritor is blocked on lock (%d) that does not support nesting!\n",
23924 +					   blocked_lock->ident);
23925 +			unlock_fine_irqrestore(to_unlock, irqflags);
23926 +		}
23927 +	}
23928 +	else {
23929 +		TRACE_TASK(t, "is not blocked.  No propagation.\n");
23930 +		unlock_fine_irqrestore(to_unlock, irqflags);
23931 +	}
23932 +}
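/* Conceptual sketch of what the recursive propagate_increase_inheritance()
 * calls accomplish, written iteratively with toy types (struct fake_task,
 * struct fake_lock, integer priorities). These names are illustrative
 * assumptions, not types from this patch. */
struct fake_lock;
struct fake_task {
	int eff_prio;                   /* larger value = higher priority */
	struct fake_lock *blocked_on;   /* lock this task currently waits for */
};
struct fake_lock {
	struct fake_task *owner;        /* current holder of the lock */
};

static void propagate_prio(struct fake_task *t, int new_prio)
{
	/* Walk the blocking chain: each holder inherits the raised priority
	 * until the chain ends or a holder already has equal/higher priority. */
	while (t && new_prio > t->eff_prio) {
		t->eff_prio = new_prio;
		t = t->blocked_on ? t->blocked_on->owner : 0;
	}
}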
23933 +
23934 +/* called with IRQs off */
23935 +/* preconditions:
23936 + (1) The 'hp_blocked_tasks_lock' of task 't' is held.
23937 + (2) The lock 'to_unlock' is held.
23938 + */
23939 +static void nested_decrease_priority_inheritance(struct task_struct* t,
23940 +												 struct task_struct* prio_inh,
23941 +												 raw_spinlock_t *to_unlock,
23942 +												 unsigned long irqflags)
23943 +{
23944 +	struct litmus_lock *blocked_lock = tsk_rt(t)->blocked_lock;
23945 +	decrease_priority_inheritance(t, prio_inh);
23946 +
23947 +	raw_spin_unlock(&tsk_rt(t)->hp_blocked_tasks_lock);  // unlock t's heap.
23948 +
23949 +	if(blocked_lock) {
23950 +		if(blocked_lock->ops->propagate_decrease_inheritance) {
23951 +			TRACE_TASK(t, "Inheritor is blocked (...perhaps).  Checking lock %d.\n",
23952 +					   blocked_lock->ident);
23953 +
23954 +			// beware: recursion
23955 +			blocked_lock->ops->propagate_decrease_inheritance(blocked_lock, t,
23956 +															  to_unlock,
23957 +															  irqflags);
23958 +		}
23959 +		else {
23960 +			TRACE_TASK(t, "Inheritor is blocked on lock (%p) that does not support nesting!\n",
23961 +					   blocked_lock);
23962 +			unlock_fine_irqrestore(to_unlock, irqflags);
23963 +		}
23964 +	}
23965 +	else {
23966 +		TRACE_TASK(t, "is not blocked.  No propagation.\n");
23967 +		unlock_fine_irqrestore(to_unlock, irqflags);
23968 +	}
23969 +}
23970 +
23971 +
23972 +/* ******************** RSM MUTEX ********************** */
23973 +
23974 +static struct litmus_lock_ops gsnedf_rsm_mutex_lock_ops = {
23975 +	.lock   = rsm_mutex_lock,
23976 +	.unlock = rsm_mutex_unlock,
23977 +	.close  = rsm_mutex_close,
23978 +	.deallocate = rsm_mutex_free,
23979 +
23980 +	.propagate_increase_inheritance = rsm_mutex_propagate_increase_inheritance,
23981 +	.propagate_decrease_inheritance = rsm_mutex_propagate_decrease_inheritance,
23982 +
23983 +#ifdef CONFIG_LITMUS_DGL_SUPPORT
23984 +	.dgl_lock = rsm_mutex_dgl_lock,
23985 +	.is_owner = rsm_mutex_is_owner,
23986 +	.enable_priority = rsm_mutex_enable_priority,
23987 +#endif
23988 +};
23989 +
23990 +static struct litmus_lock* gsnedf_new_rsm_mutex(void)
23991 +{
23992 +	return rsm_mutex_new(&gsnedf_rsm_mutex_lock_ops);
23993 +}
23994 +
23995 +/* ******************** IKGLP ********************** */
23996 +
23997 +static struct litmus_lock_ops gsnedf_ikglp_lock_ops = {
23998 +	.lock   = ikglp_lock,
23999 +	.unlock = ikglp_unlock,
24000 +	.close  = ikglp_close,
24001 +	.deallocate = ikglp_free,
24002 +
24003 +	// ikglp can only be an outer-most lock.
24004 +	.propagate_increase_inheritance = NULL,
24005 +	.propagate_decrease_inheritance = NULL,
24006 +};
24007 +
24008 +static struct litmus_lock* gsnedf_new_ikglp(void* __user arg)
24009 +{
24010 +	return ikglp_new(num_online_cpus(), &gsnedf_ikglp_lock_ops, arg);
24011 +}
24012 +
24013 +#endif  /* CONFIG_LITMUS_NESTED_LOCKING */
24014 +
24015 +
24016 +/* ******************** KFMLP support ********************** */
24017 +
24018 +static struct litmus_lock_ops gsnedf_kfmlp_lock_ops = {
24019 +	.lock   = kfmlp_lock,
24020 +	.unlock = kfmlp_unlock,
24021 +	.close  = kfmlp_close,
24022 +	.deallocate = kfmlp_free,
24023 +
24024 +	// kfmlp can only be an outer-most lock.
24025 +	.propagate_increase_inheritance = NULL,
24026 +	.propagate_decrease_inheritance = NULL,
24027 +};
24028 +
24029 +
24030 +static struct litmus_lock* gsnedf_new_kfmlp(void* __user arg)
24031 +{
24032 +	return kfmlp_new(&gsnedf_kfmlp_lock_ops, arg);
24033 +}
24034 +
24035 +/* ******************** FMLP support ********************** */
24036 +
24037 +/* struct for semaphore with priority inheritance */
24038 +struct fmlp_semaphore {
24039 +	struct litmus_lock litmus_lock;
24040 +
24041 +	/* current resource holder */
24042 +	struct task_struct *owner;
24043 +
24044 +	/* highest-priority waiter */
24045 +	struct task_struct *hp_waiter;
24046 +
24047 +	/* FIFO queue of waiting tasks */
24048 +	wait_queue_head_t wait;
24049 +};
24050 +
24051 +static inline struct fmlp_semaphore* fmlp_from_lock(struct litmus_lock* lock)
24052 +{
24053 +	return container_of(lock, struct fmlp_semaphore, litmus_lock);
24054 +}
24055 +
24056 +/* caller is responsible for locking */
24057 +struct task_struct* find_hp_waiter(struct fmlp_semaphore *sem,
24058 +				   struct task_struct* skip)
24059 +{
24060 +	struct list_head	*pos;
24061 +	struct task_struct 	*queued, *found = NULL;
24062 +
24063 +	list_for_each(pos, &sem->wait.task_list) {
24064 +		queued  = (struct task_struct*) list_entry(pos, wait_queue_t,
24065 +							   task_list)->private;
24066 +
24067 +		/* Compare task prios, find high prio task. */
24068 +		if (queued != skip && edf_higher_prio(queued, found))
24069 +			found = queued;
24070 +	}
24071 +	return found;
24072 +}
24073 +
24074 +int gsnedf_fmlp_lock(struct litmus_lock* l)
24075 +{
24076 +	struct task_struct* t = current;
24077 +	struct fmlp_semaphore *sem = fmlp_from_lock(l);
24078 +	wait_queue_t wait;
24079 +	unsigned long flags;
24080 +
24081 +	if (!is_realtime(t))
24082 +		return -EPERM;
24083 +
24084 +	spin_lock_irqsave(&sem->wait.lock, flags);
24085 +
24086 +	if (sem->owner) {
24087 +		/* resource is not free => must suspend and wait */
24088 +
24089 +		init_waitqueue_entry(&wait, t);
24090 +
24091 +		/* FIXME: interruptible would be nice some day */
24092 +		set_task_state(t, TASK_UNINTERRUPTIBLE);
24093 +
24094 +		__add_wait_queue_tail_exclusive(&sem->wait, &wait);
24095 +
24096 +		/* check if we need to activate priority inheritance */
24097 +		if (edf_higher_prio(t, sem->hp_waiter)) {
24098 +			sem->hp_waiter = t;
24099 +			if (edf_higher_prio(t, sem->owner))
24100 +				increase_priority_inheritance(sem->owner, sem->hp_waiter);
24101 +		}
24102 +
24103 +		TS_LOCK_SUSPEND;
24104 +
24105 +		/* release lock before sleeping */
24106 +		spin_unlock_irqrestore(&sem->wait.lock, flags);
24107 +
24108 +		/* We depend on the FIFO order.  Thus, we don't need to recheck
24109 +		 * when we wake up; we are guaranteed to have the lock since
24110 +		 * there is only one wake up per release.
24111 +		 */
24112 +
24113 +		schedule();
24114 +
24115 +		TS_LOCK_RESUME;
24116 +
24117 +		/* Since we hold the lock, no other task will change
24118 +		 * ->owner. We can thus check it without acquiring the spin
24119 +		 * lock. */
24120 +		BUG_ON(sem->owner != t);
24121 +	} else {
24122 +		/* it's ours now */
24123 +		sem->owner = t;
24124 +
24125 +		spin_unlock_irqrestore(&sem->wait.lock, flags);
24126 +	}
24127 +
24128 +	return 0;
24129 +}
24130 +
24131 +int gsnedf_fmlp_unlock(struct litmus_lock* l)
24132 +{
24133 +	struct task_struct *t = current, *next;
24134 +	struct fmlp_semaphore *sem = fmlp_from_lock(l);
24135 +	unsigned long flags;
24136 +	int err = 0;
24137 +
24138 +	spin_lock_irqsave(&sem->wait.lock, flags);
24139 +
24140 +	if (sem->owner != t) {
24141 +		err = -EINVAL;
24142 +		goto out;
24143 +	}
24144 +
24145 +	/* check if there are jobs waiting for this resource */
24146 +	next = __waitqueue_remove_first(&sem->wait);
24147 +	if (next) {
24148 +		/* next becomes the resource holder */
24149 +		sem->owner = next;
24150 +		TRACE_CUR("lock ownership passed to %s/%d\n", next->comm, next->pid);
24151 +
24152 +		/* determine new hp_waiter if necessary */
24153 +		if (next == sem->hp_waiter) {
24154 +			TRACE_TASK(next, "was highest-prio waiter\n");
24155 +			/* next has the highest priority --- it doesn't need to
24156 +			 * inherit.  However, we need to make sure that the
24157 +			 * next-highest priority in the queue is reflected in
24158 +			 * hp_waiter. */
24159 +			sem->hp_waiter = find_hp_waiter(sem, next);
24160 +			if (sem->hp_waiter)
24161 +				TRACE_TASK(sem->hp_waiter, "is new highest-prio waiter\n");
24162 +			else
24163 +				TRACE("no further waiters\n");
24164 +		} else {
24165 +			/* Well, if next is not the highest-priority waiter,
24166 +			 * then it ought to inherit the highest-priority
24167 +			 * waiter's priority. */
24168 +			increase_priority_inheritance(next, sem->hp_waiter);
24169 +		}
24170 +
24171 +		/* wake up next */
24172 +		wake_up_process(next);
24173 +	} else
24174 +		/* becomes available */
24175 +		sem->owner = NULL;
24176 +
24177 +	/* we lose the benefit of priority inheritance (if any) */
24178 +	if (tsk_rt(t)->inh_task)
24179 +		decrease_priority_inheritance(t, NULL);
24180 +
24181 +out:
24182 +	spin_unlock_irqrestore(&sem->wait.lock, flags);
24183 +
24184 +	return err;
24185 +}
24186 +
24187 +int gsnedf_fmlp_close(struct litmus_lock* l)
24188 +{
24189 +	struct task_struct *t = current;
24190 +	struct fmlp_semaphore *sem = fmlp_from_lock(l);
24191 +	unsigned long flags;
24192 +
24193 +	int owner;
24194 +
24195 +	spin_lock_irqsave(&sem->wait.lock, flags);
24196 +
24197 +	owner = sem->owner == t;
24198 +
24199 +	spin_unlock_irqrestore(&sem->wait.lock, flags);
24200 +
24201 +	if (owner)
24202 +		gsnedf_fmlp_unlock(l);
24203 +
24204 +	return 0;
24205 +}
24206 +
24207 +void gsnedf_fmlp_free(struct litmus_lock* lock)
24208 +{
24209 +	kfree(fmlp_from_lock(lock));
24210 +}
24211 +
24212 +static struct litmus_lock_ops gsnedf_fmlp_lock_ops = {
24213 +	.close  = gsnedf_fmlp_close,
24214 +	.lock   = gsnedf_fmlp_lock,
24215 +	.unlock = gsnedf_fmlp_unlock,
24216 +	.deallocate = gsnedf_fmlp_free,
24217 +
24218 +#ifdef CONFIG_LITMUS_NESTED_LOCKING
24219 +	.propagate_increase_inheritance = NULL,
24220 +	.propagate_decrease_inheritance = NULL
24221 +#endif
24222 +};
24223 +
24224 +static struct litmus_lock* gsnedf_new_fmlp(void)
24225 +{
24226 +	struct fmlp_semaphore* sem;
24227 +
24228 +	sem = kmalloc(sizeof(*sem), GFP_KERNEL);
24229 +	if (!sem)
24230 +		return NULL;
24231 +
24232 +	sem->owner   = NULL;
24233 +	sem->hp_waiter = NULL;
24234 +	init_waitqueue_head(&sem->wait);
24235 +	sem->litmus_lock.ops = &gsnedf_fmlp_lock_ops;
24236 +
24237 +	return &sem->litmus_lock;
24238 +}
24239 +
24240 +
24241 +static long gsnedf_allocate_lock(struct litmus_lock **lock, int type,
24242 +				 void* __user args)
24243 +{
24244 +	int err;
24245 +
24246 +	switch (type) {
24247 +
24248 +	case FMLP_SEM:
24249 +		/* Flexible Multiprocessor Locking Protocol */
24250 +		*lock = gsnedf_new_fmlp();
24251 +		break;
24252 +#ifdef CONFIG_LITMUS_NESTED_LOCKING
24253 +	case RSM_MUTEX:
24254 +		*lock = gsnedf_new_rsm_mutex();
24255 +		break;
24256 +
24257 +	case IKGLP_SEM:
24258 +		*lock = gsnedf_new_ikglp(args);
24259 +		break;
24260 +#endif
24261 +	case KFMLP_SEM:
24262 +		*lock = gsnedf_new_kfmlp(args);
24263 +		break;
24264 +	default:
24265 +		err = -ENXIO;
24266 +		goto UNSUPPORTED_LOCK;
24267 +	};
24268 +
24269 +	if (*lock)
24270 +		err = 0;
24271 +	else
24272 +		err = -ENOMEM;
24273 +
24274 +UNSUPPORTED_LOCK:
24275 +	return err;
24276 +}
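/* For context, gsnedf_allocate_lock() is reached through the FDSO layer when
 * user space opens a lock object. A hedged user-space sketch, assuming the
 * usual liblitmus helpers (open_fmlp_sem(), litmus_lock(), litmus_unlock());
 * those helpers are not part of this patch. */
#include <fcntl.h>
#include <litmus.h>   /* liblitmus */

int fmlp_critical_section(const char *namespace_file)
{
	int fd = open(namespace_file, O_RDONLY | O_CREAT, 0666);
	int od = open_fmlp_sem(fd, 0);   /* ends up in gsnedf_allocate_lock(FMLP_SEM, ...) */
	if (od < 0)
		return od;

	litmus_lock(od);     /* gsnedf_fmlp_lock(): may suspend; owner may inherit prio */
	/* ... access the shared resource ... */
	litmus_unlock(od);   /* gsnedf_fmlp_unlock(): FIFO hand-off to the next waiter */
	return 0;
}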
24277 +
24278 +#endif  // CONFIG_LITMUS_LOCKING
24279 +
24280 +
24281 +
24282 +
24283 +
24284 +#ifdef CONFIG_LITMUS_AFFINITY_LOCKING
24285 +static struct affinity_observer_ops gsnedf_kfmlp_affinity_ops = {
24286 +	.close = kfmlp_aff_obs_close,
24287 +	.deallocate = kfmlp_aff_obs_free,
24288 +};
24289 +
24290 +#ifdef CONFIG_LITMUS_NESTED_LOCKING
24291 +static struct affinity_observer_ops gsnedf_ikglp_affinity_ops = {
24292 +	.close = ikglp_aff_obs_close,
24293 +	.deallocate = ikglp_aff_obs_free,
24294 +};
24295 +#endif
24296 +
24297 +static long gsnedf_allocate_affinity_observer(
24298 +								struct affinity_observer **aff_obs,
24299 +								int type,
24300 +								void* __user args)
24301 +{
24302 +	int err;
24303 +
24304 +	switch (type) {
24305 +
24306 +		case KFMLP_SIMPLE_GPU_AFF_OBS:
24307 +			*aff_obs = kfmlp_simple_gpu_aff_obs_new(&gsnedf_kfmlp_affinity_ops, args);
24308 +			break;
24309 +
24310 +		case KFMLP_GPU_AFF_OBS:
24311 +			*aff_obs = kfmlp_gpu_aff_obs_new(&gsnedf_kfmlp_affinity_ops, args);
24312 +			break;
24313 +
24314 +#ifdef CONFIG_LITMUS_NESTED_LOCKING
24315 +		case IKGLP_SIMPLE_GPU_AFF_OBS:
24316 +			*aff_obs = ikglp_simple_gpu_aff_obs_new(&gsnedf_ikglp_affinity_ops, args);
24317 +			break;
24318 +
24319 +		case IKGLP_GPU_AFF_OBS:
24320 +			*aff_obs = ikglp_gpu_aff_obs_new(&gsnedf_ikglp_affinity_ops, args);
24321 +			break;
24322 +#endif
24323 +		default:
24324 +			err = -ENXIO;
24325 +			goto UNSUPPORTED_AFF_OBS;
24326 +	};
24327 +
24328 +	if (*aff_obs)
24329 +		err = 0;
24330 +	else
24331 +		err = -ENOMEM;
24332 +
24333 +UNSUPPORTED_AFF_OBS:
24334 +	return err;
24335 +}
24336 +#endif
24337 +
24338 +
24339 +
24340 +
24341 +
24342 +static long gsnedf_activate_plugin(void)
24343 +{
24344 +	int cpu;
24345 +	cpu_entry_t *entry;
24346 +
24347 +	INIT_BINHEAP_HANDLE(&gsnedf_cpu_heap, cpu_lower_prio);
24348 +#ifdef CONFIG_RELEASE_MASTER
24349 +	gsnedf.release_master = atomic_read(&release_master_cpu);
24350 +#endif
24351 +
24352 +	for_each_online_cpu(cpu) {
24353 +		entry = &per_cpu(gsnedf_cpu_entries, cpu);
24354 +		INIT_BINHEAP_NODE(&entry->hn);
24355 +		entry->linked    = NULL;
24356 +		entry->scheduled = NULL;
24357 +#ifdef CONFIG_RELEASE_MASTER
24358 +		if (cpu != gsnedf.release_master) {
24359 +#endif
24360 +			TRACE("GSN-EDF: Initializing CPU #%d.\n", cpu);
24361 +			update_cpu_position(entry);
24362 +#ifdef CONFIG_RELEASE_MASTER
24363 +		} else {
24364 +			TRACE("GSN-EDF: CPU %d is release master.\n", cpu);
24365 +		}
24366 +#endif
24367 +	}
24368 +
24369 +#ifdef CONFIG_LITMUS_PAI_SOFTIRQD
24370 +	gsnedf_pending_tasklets.head = NULL;
24371 +	gsnedf_pending_tasklets.tail = &(gsnedf_pending_tasklets.head);
24372 +#endif
24373 +
24374 +#ifdef CONFIG_LITMUS_SOFTIRQD
24375 +	spawn_klitirqd(NULL);
24376 +#endif
24377 +
24378 +#ifdef CONFIG_LITMUS_NVIDIA
24379 +	init_nvidia_info();
24380 +#endif
24381 +
24382 +	return 0;
24383 +}
24384 +
24385 +/*	Plugin object	*/
24386 +static struct sched_plugin gsn_edf_plugin __cacheline_aligned_in_smp = {
24387 +	.plugin_name		= "GSN-EDF",
24388 +	.finish_switch		= gsnedf_finish_switch,
24389 +	.tick			= gsnedf_tick,
24390 +	.task_new		= gsnedf_task_new,
24391 +	.complete_job		= complete_job,
24392 +	.task_exit		= gsnedf_task_exit,
24393 +	.schedule		= gsnedf_schedule,
24394 +	.task_wake_up		= gsnedf_task_wake_up,
24395 +	.task_block		= gsnedf_task_block,
24396 +	.admit_task		= gsnedf_admit_task,
24397 +	.activate_plugin	= gsnedf_activate_plugin,
24398 +	.compare		= edf_higher_prio,
24399 +#ifdef CONFIG_LITMUS_LOCKING
24400 +	.allocate_lock		= gsnedf_allocate_lock,
24401 +	.increase_prio		= increase_priority_inheritance,
24402 +	.decrease_prio		= decrease_priority_inheritance,
24403 +#endif
24404 +#ifdef CONFIG_LITMUS_NESTED_LOCKING
24405 +	.nested_increase_prio		= nested_increase_priority_inheritance,
24406 +	.nested_decrease_prio		= nested_decrease_priority_inheritance,
24407 +	.__compare					= __edf_higher_prio,
24408 +#endif
24409 +#ifdef CONFIG_LITMUS_DGL_SUPPORT
24410 +	.get_dgl_spinlock = gsnedf_get_dgl_spinlock,
24411 +#endif
24412 +#ifdef CONFIG_LITMUS_AFFINITY_LOCKING
24413 +	.allocate_aff_obs = gsnedf_allocate_affinity_observer,
24414 +#endif
24415 +#ifdef CONFIG_LITMUS_SOFTIRQD
24416 +	.increase_prio_klitirqd = increase_priority_inheritance_klitirqd,
24417 +	.decrease_prio_klitirqd = decrease_priority_inheritance_klitirqd,
24418 +#endif
24419 +#ifdef CONFIG_LITMUS_PAI_SOFTIRQD
24420 +	.enqueue_pai_tasklet = gsnedf_enqueue_pai_tasklet,
24421 +	.change_prio_pai_tasklet = gsnedf_change_prio_pai_tasklet,
24422 +	.run_tasklets = gsnedf_run_tasklets,
24423 +#endif
24424 +};
24425 +
24426 +
24427 +static int __init init_gsn_edf(void)
24428 +{
24429 +	int cpu;
24430 +	cpu_entry_t *entry;
24431 +
24432 +	INIT_BINHEAP_HANDLE(&gsnedf_cpu_heap, cpu_lower_prio);
24433 +	/* initialize CPU state */
24434 +	for (cpu = 0; cpu < NR_CPUS; ++cpu)  {
24435 +		entry = &per_cpu(gsnedf_cpu_entries, cpu);
24436 +		gsnedf_cpus[cpu] = entry;
24437 +		entry->cpu 	 = cpu;
24438 +
24439 +		INIT_BINHEAP_NODE(&entry->hn);
24440 +	}
24441 +
24442 +#ifdef CONFIG_LITMUS_DGL_SUPPORT
24443 +	raw_spin_lock_init(&dgl_lock);
24444 +#endif
24445 +
24446 +	edf_domain_init(&gsnedf, NULL, gsnedf_release_jobs);
24447 +	return register_sched_plugin(&gsn_edf_plugin);
24448 +}
24449 +
24450 +
24451 +module_init(init_gsn_edf);
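/* Hedged sketch of activating the plugin registered above, assuming the
 * standard LITMUS^RT /proc interface (/proc/litmus/active_plugin), which is
 * provided elsewhere and not shown in this hunk. */
#include <stdio.h>

int activate_gsn_edf(void)
{
	FILE *f = fopen("/proc/litmus/active_plugin", "w");
	if (!f)
		return -1;
	fputs("GSN-EDF", f);   /* must match .plugin_name in gsn_edf_plugin */
	return fclose(f);
}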
24452 diff --git a/litmus/sched_litmus.c b/litmus/sched_litmus.c
24453 new file mode 100644
24454 index 0000000..9a6fe48
24455 --- /dev/null
24456 +++ b/litmus/sched_litmus.c
24457 @@ -0,0 +1,327 @@
24458 +/* This file is included from kernel/sched.c */
24459 +
24460 +#include <litmus/litmus.h>
24461 +#include <litmus/budget.h>
24462 +#include <litmus/sched_plugin.h>
24463 +#include <litmus/preempt.h>
24464 +
24465 +static void update_time_litmus(struct rq *rq, struct task_struct *p)
24466 +{
24467 +	u64 delta = rq->clock - p->se.exec_start;
24468 +	if (unlikely((s64)delta < 0))
24469 +		delta = 0;
24470 +	/* per job counter */
24471 +	p->rt_param.job_params.exec_time += delta;
24472 +	/* task counter */
24473 +	p->se.sum_exec_runtime += delta;
24474 +	/* sched_clock() */
24475 +	p->se.exec_start = rq->clock;
24476 +	cpuacct_charge(p, delta);
24477 +}
24478 +
24479 +static void double_rq_lock(struct rq *rq1, struct rq *rq2);
24480 +static void double_rq_unlock(struct rq *rq1, struct rq *rq2);
24481 +
24482 +/*
24483 + * litmus_tick gets called by scheduler_tick() with HZ freq
24484 + * Interrupts are disabled
24485 + */
24486 +static void litmus_tick(struct rq *rq, struct task_struct *p)
24487 +{
24488 +	TS_PLUGIN_TICK_START;
24489 +
24490 +	if (is_realtime(p))
24491 +		update_time_litmus(rq, p);
24492 +
24493 +	/* plugin tick */
24494 +	litmus->tick(p);
24495 +
24496 +	TS_PLUGIN_TICK_END;
24497 +
24498 +	return;
24499 +}
24500 +
24501 +static struct task_struct *
24502 +litmus_schedule(struct rq *rq, struct task_struct *prev)
24503 +{
24504 +	struct rq* other_rq;
24505 +	struct task_struct *next;
24506 +
24507 +	long was_running;
24508 +	lt_t _maybe_deadlock = 0;
24509 +
24510 +	/* let the plugin schedule */
24511 +	next = litmus->schedule(prev);
24512 +
24513 +	sched_state_plugin_check();
24514 +
24515 +	/* check if a global plugin pulled a task from a different RQ */
24516 +	if (next && task_rq(next) != rq) {
24517 +		/* we need to migrate the task */
24518 +		other_rq = task_rq(next);
24519 +		TRACE_TASK(next, "migrate from %d\n", other_rq->cpu);
24520 +
24521 +		/* while we drop the lock, the prev task could change its
24522 +		 * state
24523 +		 */
24524 +		was_running = is_running(prev);
24525 +		mb();
24526 +		raw_spin_unlock(&rq->lock);
24527 +
24528 +		/* Don't race with a concurrent switch.  This could deadlock in
24529 +		 * the case of cross or circular migrations.  It's the job of
24530 +		 * the plugin to make sure that doesn't happen.
24531 +		 */
24532 +		TRACE_TASK(next, "stack_in_use=%d\n",
24533 +			   next->rt_param.stack_in_use);
24534 +		if (next->rt_param.stack_in_use != NO_CPU) {
24535 +			TRACE_TASK(next, "waiting to deschedule\n");
24536 +			_maybe_deadlock = litmus_clock();
24537 +		}
24538 +		while (next->rt_param.stack_in_use != NO_CPU) {
24539 +			cpu_relax();
24540 +			mb();
24541 +			if (next->rt_param.stack_in_use == NO_CPU)
24542 +				TRACE_TASK(next,"descheduled. Proceeding.\n");
24543 +
24544 +			if (lt_before(_maybe_deadlock + 10000000,
24545 +				      litmus_clock())) {
24546 +				/* We've been spinning for 10ms.
24547 +				 * Something can't be right!
24548 +				 * Let's abandon the task and bail out; at least
24549 +				 * we will have debug info instead of a hard
24550 +				 * deadlock.
24551 +				 */
24552 +				TRACE_TASK(next,"stack too long in use. "
24553 +					   "Deadlock?\n");
24554 +				next = NULL;
24555 +
24556 +				/* bail out */
24557 +				raw_spin_lock(&rq->lock);
24558 +				return next;
24559 +			}
24560 +		}
24561 +#ifdef  __ARCH_WANT_UNLOCKED_CTXSW
24562 +		if (next->oncpu)
24563 +		{
24564 +			TRACE_TASK(next, "waiting for !oncpu");
24565 +		}
24566 +		while (next->oncpu) {
24567 +			cpu_relax();
24568 +			mb();
24569 +		}
24570 +#endif
24571 +		double_rq_lock(rq, other_rq);
24572 +		mb();
24573 +		if (is_realtime(prev) && is_running(prev) != was_running) {
24574 +			TRACE_TASK(prev,
24575 +				   "state changed while we dropped"
24576 +				   " the lock: is_running=%d, was_running=%d\n",
24577 +				   is_running(prev), was_running);
24578 +			if (is_running(prev) && !was_running) {
24579 +				/* prev task became unblocked
24580 +				 * we need to simulate normal sequence of events
24581 +				 * to scheduler plugins.
24582 +				 */
24583 +				litmus->task_block(prev);
24584 +				litmus->task_wake_up(prev);
24585 +			}
24586 +		}
24587 +
24588 +		set_task_cpu(next, smp_processor_id());
24589 +
24590 +		/* DEBUG: now that we have the lock we need to make sure a
24591 +		 *  couple of things still hold:
24592 +		 *  - it is still a real-time task
24593 +		 *  - it is still runnable (could have been stopped)
24594 +		 * If either is violated, then the active plugin is
24595 +		 * doing something wrong.
24596 +		 */
24597 +		if (!is_realtime(next) || !is_running(next)) {
24598 +			/* BAD BAD BAD */
24599 +			TRACE_TASK(next,"BAD: migration invariant FAILED: "
24600 +				   "rt=%d running=%d\n",
24601 +				   is_realtime(next),
24602 +				   is_running(next));
24603 +			/* drop the task */
24604 +			next = NULL;
24605 +		}
24606 +		/* release the other CPU's runqueue, but keep ours */
24607 +		raw_spin_unlock(&other_rq->lock);
24608 +	}
24609 +	if (next) {
24610 +		next->rt_param.stack_in_use = rq->cpu;
24611 +		next->se.exec_start = rq->clock;
24612 +	}
24613 +
24614 +	update_enforcement_timer(next);
24615 +	return next;
24616 +}
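/* The migration path above busy-waits until the previous CPU has descheduled
 * 'next', but gives up after roughly 10 ms so a plugin bug yields trace output
 * rather than a hard deadlock. A minimal stand-alone sketch of that bounded
 * spin, with placeholder stubs (now_ns(), stack_still_in_use()) standing in
 * for litmus_clock() and the stack_in_use check: */
#include <stdint.h>

#define MIGRATION_WATCHDOG_NS 10000000ull   /* ~10 ms, as in litmus_schedule() */

static uint64_t now_ns(void) { return 0; }          /* placeholder clock */
static int stack_still_in_use(void) { return 0; }   /* placeholder condition */

static int wait_for_stack(void)
{
	uint64_t start = now_ns();
	while (stack_still_in_use()) {
		if (now_ns() - start > MIGRATION_WATCHDOG_NS)
			return -1;   /* bail out and report instead of deadlocking */
	}
	return 0;            /* 'next' has been descheduled; safe to proceed */
}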
24617 +
24618 +static void enqueue_task_litmus(struct rq *rq, struct task_struct *p,
24619 +				int flags)
24620 +{
24621 +	if (flags & ENQUEUE_WAKEUP) {
24622 +		sched_trace_task_resume(p);
24623 +		tsk_rt(p)->present = 1;
24624 +		/* LITMUS^RT plugins need to update the state
24625 +		 * _before_ making it available in global structures.
24626 +		 * Linux gets away with being lazy about the task state
24627 +		 * update. We can't do that, hence we update the task
24628 +		 * state already here.
24629 +		 *
24630 +		 * WARNING: this needs to be re-evaluated when porting
24631 +		 *          to newer kernel versions.
24632 +		 */
24633 +		p->state = TASK_RUNNING;
24634 +		litmus->task_wake_up(p);
24635 +
24636 +		rq->litmus.nr_running++;
24637 +	} else
24638 +		TRACE_TASK(p, "ignoring an enqueue, not a wake up.\n");
24639 +}
24640 +
24641 +static void dequeue_task_litmus(struct rq *rq, struct task_struct *p,
24642 +				int flags)
24643 +{
24644 +	if (flags & DEQUEUE_SLEEP) {
24645 +		litmus->task_block(p);
24646 +		tsk_rt(p)->present = 0;
24647 +		sched_trace_task_block(p);
24648 +
24649 +		rq->litmus.nr_running--;
24650 +	} else
24651 +		TRACE_TASK(p, "ignoring a dequeue, not going to sleep.\n");
24652 +}
24653 +
24654 +static void yield_task_litmus(struct rq *rq)
24655 +{
24656 +	BUG_ON(rq->curr != current);
24657 +	/* sched_yield() is called to trigger delayed preemptions.
24658 +	 * Thus, mark the current task as needing to be rescheduled.
24659 +	 * This will cause the scheduler plugin to be invoked, which can
24660 +	 * then determine if a preemption is still required.
24661 +	 */
24662 +	clear_exit_np(current);
24663 +	litmus_reschedule_local();
24664 +}
24665 +
24666 +/* Plugins are responsible for this.
24667 + */
24668 +static void check_preempt_curr_litmus(struct rq *rq, struct task_struct *p, int flags)
24669 +{
24670 +}
24671 +
24672 +static void put_prev_task_litmus(struct rq *rq, struct task_struct *p)
24673 +{
24674 +}
24675 +
24676 +static void pre_schedule_litmus(struct rq *rq, struct task_struct *prev)
24677 +{
24678 +	update_time_litmus(rq, prev);
24679 +	if (!is_running(prev))
24680 +		tsk_rt(prev)->present = 0;
24681 +}
24682 +
24683 +/* pick_next_task_litmus() - litmus_schedule() function
24684 + *
24685 + * return the next task to be scheduled
24686 + */
24687 +static struct task_struct *pick_next_task_litmus(struct rq *rq)
24688 +{
24689 +	/* get the to-be-switched-out task (prev) */
24690 +	struct task_struct *prev = rq->litmus.prev;
24691 +	struct task_struct *next;
24692 +
24693 +	/* if not called from schedule() but from somewhere
24694 +	 * else (e.g., migration), return now!
24695 +	 */
24696 +	if(!rq->litmus.prev)
24697 +		return NULL;
24698 +
24699 +	rq->litmus.prev = NULL;
24700 +
24701 +	TS_PLUGIN_SCHED_START;
24702 +	next = litmus_schedule(rq, prev);
24703 +	TS_PLUGIN_SCHED_END;
24704 +
24705 +	return next;
24706 +}
24707 +
24708 +static void task_tick_litmus(struct rq *rq, struct task_struct *p, int queued)
24709 +{
24710 +	/* nothing to do; tick related tasks are done by litmus_tick() */
24711 +	return;
24712 +}
24713 +
24714 +static void switched_to_litmus(struct rq *rq, struct task_struct *p)
24715 +{
24716 +}
24717 +
24718 +static void prio_changed_litmus(struct rq *rq, struct task_struct *p,
24719 +				int oldprio)
24720 +{
24721 +}
24722 +
24723 +unsigned int get_rr_interval_litmus(struct rq *rq, struct task_struct *p)
24724 +{
24725 +	/* return infinity */
24726 +	return 0;
24727 +}
24728 +
24729 +/* This is called when a task became a real-time task, either due to a SCHED_*
24730 + * class transition or due to PI mutex inheritance. We don't handle Linux PI
24731 + * mutex inheritance yet (and probably never will). Use LITMUS provided
24732 + * synchronization primitives instead.
24733 + */
24734 +static void set_curr_task_litmus(struct rq *rq)
24735 +{
24736 +	rq->curr->se.exec_start = rq->clock;
24737 +}
24738 +
24739 +
24740 +#ifdef CONFIG_SMP
24741 +/* execve tries to rebalance the task in this scheduling domain.
24742 + * We don't care about the scheduling domain; this can get called from
24743 + * exec, fork, and wakeup.
24744 + */
24745 +static int
24746 +select_task_rq_litmus(struct task_struct *p, int sd_flag, int flags)
24747 +{
24748 +	/* preemption is already disabled.
24749 +	 * We don't want to change cpu here
24750 +	 */
24751 +	return task_cpu(p);
24752 +}
24753 +#endif
24754 +
24755 +static const struct sched_class litmus_sched_class = {
24756 +	/* From 34f971f6 the stop/migrate worker threads have a class on
24757 +	 * their own, which is the highest prio class. We don't support
24758 +	 * cpu-hotplug or cpu throttling. Allows Litmus to use up to 1.0
24759 +	 * CPU capacity.
24760 +	 */
24761 +	.next			= &stop_sched_class,
24762 +	.enqueue_task		= enqueue_task_litmus,
24763 +	.dequeue_task		= dequeue_task_litmus,
24764 +	.yield_task		= yield_task_litmus,
24765 +
24766 +	.check_preempt_curr	= check_preempt_curr_litmus,
24767 +
24768 +	.pick_next_task		= pick_next_task_litmus,
24769 +	.put_prev_task		= put_prev_task_litmus,
24770 +
24771 +#ifdef CONFIG_SMP
24772 +	.select_task_rq		= select_task_rq_litmus,
24773 +
24774 +	.pre_schedule		= pre_schedule_litmus,
24775 +#endif
24776 +
24777 +	.set_curr_task          = set_curr_task_litmus,
24778 +	.task_tick		= task_tick_litmus,
24779 +
24780 +	.get_rr_interval	= get_rr_interval_litmus,
24781 +
24782 +	.prio_changed		= prio_changed_litmus,
24783 +	.switched_to		= switched_to_litmus,
24784 +};
24785 diff --git a/litmus/sched_pfair.c b/litmus/sched_pfair.c
24786 new file mode 100644
24787 index 0000000..16f1065
24788 --- /dev/null
24789 +++ b/litmus/sched_pfair.c
24790 @@ -0,0 +1,1067 @@
24791 +/*
24792 + * litmus/sched_pfair.c
24793 + *
24794 + * Implementation of the PD^2 pfair scheduling algorithm. This
24795 + * implementation realizes "early releasing," i.e., it is work-conserving.
24796 + *
24797 + */
24798 +
24799 +#include <asm/div64.h>
24800 +#include <linux/delay.h>
24801 +#include <linux/module.h>
24802 +#include <linux/spinlock.h>
24803 +#include <linux/percpu.h>
24804 +#include <linux/sched.h>
24805 +#include <linux/list.h>
24806 +#include <linux/slab.h>
24807 +
24808 +#include <litmus/litmus.h>
24809 +#include <litmus/jobs.h>
24810 +#include <litmus/preempt.h>
24811 +#include <litmus/rt_domain.h>
24812 +#include <litmus/sched_plugin.h>
24813 +#include <litmus/sched_trace.h>
24814 +
24815 +#include <litmus/bheap.h>
24816 +
24817 +/* to configure the cluster size */
24818 +#include <litmus/litmus_proc.h>
24819 +
24820 +#include <litmus/clustered.h>
24821 +
24822 +static enum cache_level pfair_cluster_level = GLOBAL_CLUSTER;
24823 +
24824 +struct subtask {
24825 +	/* measured in quanta relative to job release */
24826 +	quanta_t release;
24827 +	quanta_t deadline;
24828 +	quanta_t overlap; /* called "b bit" by PD^2 */
24829 +	quanta_t group_deadline;
24830 +};
24831 +
24832 +struct pfair_param   {
24833 +	quanta_t	quanta;       /* number of subtasks */
24834 +	quanta_t	cur;          /* index of current subtask */
24835 +
24836 +	quanta_t	release;      /* in quanta */
24837 +	quanta_t	period;       /* in quanta */
24838 +
24839 +	quanta_t	last_quantum; /* when scheduled last */
24840 +	int		last_cpu;     /* where scheduled last */
24841 +
24842 +	struct pfair_cluster* cluster; /* where this task is scheduled */
24843 +
24844 +	struct subtask subtasks[0];   /* allocate together with pfair_param */
24845 +};
24846 +
24847 +#define tsk_pfair(tsk) ((tsk)->rt_param.pfair)
24848 +
24849 +struct pfair_state {
24850 +	struct cluster_cpu topology;
24851 +
24852 +	volatile quanta_t cur_tick;    /* updated by the CPU that is advancing
24853 +				        * the time */
24854 +	volatile quanta_t local_tick;  /* What tick is the local CPU currently
24855 +				        * executing? Updated only by the local
24856 +				        * CPU. In QEMU, this may lag behind the
24857 +				        * current tick. In a real system, with
24858 +				        * proper timers and aligned quanta,
24859 +				        * that should only be the case for a
24860 +				        * very short time after the time
24861 +				        * advanced. With staggered quanta, it
24862 +				        * will lag for the duration of the
24863 +				        * offset.
24864 +					*/
24865 +
24866 +	struct task_struct* linked;    /* the task that should be executing */
24867 +	struct task_struct* local;     /* the local copy of linked          */
24868 +	struct task_struct* scheduled; /* what is actually scheduled        */
24869 +
24870 +	lt_t offset;			/* stagger offset */
24871 +	unsigned int missed_updates;
24872 +	unsigned int missed_quanta;
24873 +};
24874 +
24875 +struct pfair_cluster {
24876 +	struct scheduling_cluster topology;
24877 +
24878 +	/* The "global" time in this cluster. */
24879 +	quanta_t pfair_time; /* the "official" PFAIR clock */
24880 +
24881 +	/* The ready queue for this cluster. */
24882 +	rt_domain_t pfair;
24883 +
24884 +	/* The set of jobs that should have their release enacted at the next
24885 +	 * quantum boundary.
24886 +	 */
24887 +	struct bheap release_queue;
24888 +	raw_spinlock_t release_lock;
24889 +};
24890 +
24891 +#define RT_F_REQUEUE 0x2
24892 +
24893 +static inline struct pfair_cluster* cpu_cluster(struct pfair_state* state)
24894 +{
24895 +	return container_of(state->topology.cluster, struct pfair_cluster, topology);
24896 +}
24897 +
24898 +static inline int cpu_id(struct pfair_state* state)
24899 +{
24900 +	return state->topology.id;
24901 +}
24902 +
24903 +static inline struct pfair_state* from_cluster_list(struct list_head* pos)
24904 +{
24905 +	return list_entry(pos, struct pfair_state, topology.cluster_list);
24906 +}
24907 +
24908 +static inline struct pfair_cluster* from_domain(rt_domain_t* rt)
24909 +{
24910 +	return container_of(rt, struct pfair_cluster, pfair);
24911 +}
24912 +
24913 +static inline raw_spinlock_t* cluster_lock(struct pfair_cluster* cluster)
24914 +{
24915 +	/* The ready_lock is used to serialize all scheduling events. */
24916 +	return &cluster->pfair.ready_lock;
24917 +}
24918 +
24919 +static inline raw_spinlock_t* cpu_lock(struct pfair_state* state)
24920 +{
24921 +	return cluster_lock(cpu_cluster(state));
24922 +}
24923 +
24924 +DEFINE_PER_CPU(struct pfair_state, pfair_state);
24925 +struct pfair_state* *pstate; /* short cut */
24926 +
24927 +static struct pfair_cluster* pfair_clusters;
24928 +static int num_pfair_clusters;
24929 +
24930 +/* Enable for lots of trace info.
24931 + * #define PFAIR_DEBUG
24932 + */
24933 +
24934 +#ifdef PFAIR_DEBUG
24935 +#define PTRACE_TASK(t, f, args...)  TRACE_TASK(t, f, ## args)
24936 +#define PTRACE(f, args...) TRACE(f, ## args)
24937 +#else
24938 +#define PTRACE_TASK(t, f, args...)
24939 +#define PTRACE(f, args...)
24940 +#endif
24941 +
24942 +/* gcc will inline all of these accessor functions... */
24943 +static struct subtask* cur_subtask(struct task_struct* t)
24944 +{
24945 +	return tsk_pfair(t)->subtasks + tsk_pfair(t)->cur;
24946 +}
24947 +
24948 +static quanta_t cur_deadline(struct task_struct* t)
24949 +{
24950 +	return cur_subtask(t)->deadline +  tsk_pfair(t)->release;
24951 +}
24952 +
24953 +static quanta_t cur_release(struct task_struct* t)
24954 +{
24955 +	/* This is early releasing: only the release of the first subtask
24956 +	 * counts. */
24957 +	return tsk_pfair(t)->release;
24958 +}
24959 +
24960 +static quanta_t cur_overlap(struct task_struct* t)
24961 +{
24962 +	return cur_subtask(t)->overlap;
24963 +}
24964 +
24965 +static quanta_t cur_group_deadline(struct task_struct* t)
24966 +{
24967 +	quanta_t gdl = cur_subtask(t)->group_deadline;
24968 +	if (gdl)
24969 +		return gdl + tsk_pfair(t)->release;
24970 +	else
24971 +		return gdl;
24972 +}
24973 +
24974 +
24975 +static int pfair_higher_prio(struct task_struct* first,
24976 +			     struct task_struct* second)
24977 +{
24978 +	return  /* first task must exist */
24979 +		first && (
24980 +		/* Does the second task exist and is it a real-time task?  If
24981 +		 * not, the first task (which is a RT task) has higher
24982 +		 * priority.
24983 +		 */
24984 +		!second || !is_realtime(second)  ||
24985 +
24986 +		/* Is the (subtask) deadline of the first task earlier?
24987 +		 * Then it has higher priority.
24988 +		 */
24989 +		time_before(cur_deadline(first), cur_deadline(second)) ||
24990 +
24991 +		/* Do we have a deadline tie?
24992 +		 * Then break by B-bit.
24993 +		 */
24994 +		(cur_deadline(first) == cur_deadline(second) &&
24995 +		 (cur_overlap(first) > cur_overlap(second) ||
24996 +
24997 +		/* Do we have a B-bit tie?
24998 +		 * Then break by group deadline.
24999 +		 */
25000 +		(cur_overlap(first) == cur_overlap(second) &&
25001 +		 (time_after(cur_group_deadline(first),
25002 +			     cur_group_deadline(second)) ||
25003 +
25004 +		/* Do we have a group deadline tie?
25005 +		 * Then break by PID, which are unique.
25006 +		 */
25007 +		(cur_group_deadline(first) ==
25008 +		 cur_group_deadline(second) &&
25009 +		 first->pid < second->pid))))));
25010 +}
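/* PD^2 breaks ties in the order: earlier subtask deadline, then larger b-bit
 * (overlap), then later group deadline, then smaller PID. A simplified,
 * self-contained version of that cascade on plain integers; it ignores the
 * time_before()/time_after() wraparound handling used above and the
 * non-real-time-task case, so it is illustrative only. */
struct pd2_prio {
	unsigned long deadline;         /* earlier wins */
	unsigned long overlap;          /* "b bit": larger wins on a deadline tie */
	unsigned long group_deadline;   /* later wins on a b-bit tie */
	int pid;                        /* smaller wins as the final tie-break */
};

static int pd2_higher_prio(const struct pd2_prio *a, const struct pd2_prio *b)
{
	if (a->deadline != b->deadline)
		return a->deadline < b->deadline;
	if (a->overlap != b->overlap)
		return a->overlap > b->overlap;
	if (a->group_deadline != b->group_deadline)
		return a->group_deadline > b->group_deadline;
	return a->pid < b->pid;
}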
25011 +
25012 +int pfair_ready_order(struct bheap_node* a, struct bheap_node* b)
25013 +{
25014 +	return pfair_higher_prio(bheap2task(a), bheap2task(b));
25015 +}
25016 +
25017 +static void pfair_release_jobs(rt_domain_t* rt, struct bheap* tasks)
25018 +{
25019 +	struct pfair_cluster* cluster = from_domain(rt);
25020 +	unsigned long flags;
25021 +
25022 +	raw_spin_lock_irqsave(&cluster->release_lock, flags);
25023 +
25024 +	bheap_union(pfair_ready_order, &cluster->release_queue, tasks);
25025 +
25026 +	raw_spin_unlock_irqrestore(&cluster->release_lock, flags);
25027 +}
25028 +
25029 +static void prepare_release(struct task_struct* t, quanta_t at)
25030 +{
25031 +	tsk_pfair(t)->release    = at;
25032 +	tsk_pfair(t)->cur        = 0;
25033 +}
25034 +
25035 +/* pull released tasks from the release queue */
25036 +static void poll_releases(struct pfair_cluster* cluster)
25037 +{
25038 +	raw_spin_lock(&cluster->release_lock);
25039 +	__merge_ready(&cluster->pfair, &cluster->release_queue);
25040 +	raw_spin_unlock(&cluster->release_lock);
25041 +}
25042 +
25043 +static void check_preempt(struct task_struct* t)
25044 +{
25045 +	int cpu = NO_CPU;
25046 +	if (tsk_rt(t)->linked_on != tsk_rt(t)->scheduled_on &&
25047 +	    tsk_rt(t)->present) {
25048 +		/* the task can be scheduled and
25049 +		 * is not scheduled where it ought to be scheduled
25050 +		 */
25051 +		cpu = tsk_rt(t)->linked_on != NO_CPU ?
25052 +			tsk_rt(t)->linked_on         :
25053 +			tsk_rt(t)->scheduled_on;
25054 +		PTRACE_TASK(t, "linked_on:%d, scheduled_on:%d\n",
25055 +			   tsk_rt(t)->linked_on, tsk_rt(t)->scheduled_on);
25056 +		/* preempt */
25057 +		litmus_reschedule(cpu);
25058 +	}
25059 +}
25060 +
25061 +/* caller must hold pfair.ready_lock */
25062 +static void drop_all_references(struct task_struct *t)
25063 +{
25064 +	int cpu;
25065 +	struct pfair_state* s;
25066 +	struct pfair_cluster* cluster;
25067 +	if (bheap_node_in_heap(tsk_rt(t)->heap_node)) {
25068 +		/* It must be in the ready queue; drop references isn't called
25069 +		 * when the job is in a release queue. */
25070 +		cluster = tsk_pfair(t)->cluster;
25071 +		bheap_delete(pfair_ready_order, &cluster->pfair.ready_queue,
25072 +			     tsk_rt(t)->heap_node);
25073 +	}
25074 +	for (cpu = 0; cpu < num_online_cpus(); cpu++) {
25075 +		s = &per_cpu(pfair_state, cpu);
25076 +		if (s->linked == t)
25077 +			s->linked = NULL;
25078 +		if (s->local  == t)
25079 +			s->local  = NULL;
25080 +		if (s->scheduled  == t)
25081 +			s->scheduled = NULL;
25082 +	}
25083 +	/* make sure we don't have a stale linked_on field */
25084 +	tsk_rt(t)->linked_on = NO_CPU;
25085 +}
25086 +
25087 +static void pfair_prepare_next_period(struct task_struct* t)
25088 +{
25089 +	struct pfair_param* p = tsk_pfair(t);
25090 +
25091 +	prepare_for_next_period(t);
25092 +	get_rt_flags(t) = RT_F_RUNNING;
25093 +	p->release += p->period;
25094 +}
25095 +
25096 +/* returns 1 if the task needs to go to the release queue */
25097 +static int advance_subtask(quanta_t time, struct task_struct* t, int cpu)
25098 +{
25099 +	struct pfair_param* p = tsk_pfair(t);
25100 +	int to_relq;
25101 +	p->cur = (p->cur + 1) % p->quanta;
25102 +	if (!p->cur) {
25103 +		if (tsk_rt(t)->present) {
25104 +			/* The job overran; we start a new budget allocation. */
25105 +			pfair_prepare_next_period(t);
25106 +		} else {
25107 +			/* remove task from system until it wakes */
25108 +			drop_all_references(t);
25109 +			tsk_rt(t)->flags = RT_F_REQUEUE;
25110 +			TRACE_TASK(t, "on %d advanced to subtask %lu (not present)\n",
25111 +				   cpu, p->cur);
25112 +			return 0;
25113 +		}
25114 +	}
25115 +	to_relq = time_after(cur_release(t), time);
25116 +	TRACE_TASK(t, "on %d advanced to subtask %lu -> to_relq=%d (cur_release:%lu time:%lu)\n",
25117 +		   cpu, p->cur, to_relq, cur_release(t), time);
25118 +	return to_relq;
25119 +}
25120 +
25121 +static void advance_subtasks(struct pfair_cluster *cluster, quanta_t time)
25122 +{
25123 +	struct task_struct* l;
25124 +	struct pfair_param* p;
25125 +	struct list_head* pos;
25126 +	struct pfair_state* cpu;
25127 +
25128 +	list_for_each(pos, &cluster->topology.cpus) {
25129 +		cpu = from_cluster_list(pos);
25130 +		l = cpu->linked;
25131 +		cpu->missed_updates += cpu->linked != cpu->local;
25132 +		if (l) {
25133 +			p = tsk_pfair(l);
25134 +			p->last_quantum = time;
25135 +			p->last_cpu     =  cpu_id(cpu);
25136 +			if (advance_subtask(time, l, cpu_id(cpu))) {
25137 +				//cpu->linked = NULL;
25138 +				PTRACE_TASK(l, "should go to release queue. "
25139 +					    "scheduled_on=%d present=%d\n",
25140 +					    tsk_rt(l)->scheduled_on,
25141 +					    tsk_rt(l)->present);
25142 +			}
25143 +		}
25144 +	}
25145 +}
25146 +
25147 +static int target_cpu(quanta_t time, struct task_struct* t, int default_cpu)
25148 +{
25149 +	int cpu;
25150 +	if (tsk_rt(t)->scheduled_on != NO_CPU) {
25151 +		/* always observe scheduled_on linkage */
25152 +		default_cpu = tsk_rt(t)->scheduled_on;
25153 +	} else if (tsk_pfair(t)->last_quantum == time - 1) {
25154 +		/* back2back quanta */
25155 +		/* Only observe last_quantum if no scheduled_on is in the way.
25156 +		 * This should only kick in if a CPU missed quanta, and that
25157 +		 * *should* only happen in QEMU.
25158 +		 */
25159 +		cpu = tsk_pfair(t)->last_cpu;
25160 +		if (!pstate[cpu]->linked ||
25161 +		    tsk_rt(pstate[cpu]->linked)->scheduled_on != cpu) {
25162 +			default_cpu = cpu;
25163 +		}
25164 +	}
25165 +	return default_cpu;
25166 +}
25167 +
25168 +/* returns one if linking was redirected */
25169 +static int pfair_link(quanta_t time, int cpu,
25170 +		      struct task_struct* t)
25171 +{
25172 +	int target = target_cpu(time, t, cpu);
25173 +	struct task_struct* prev  = pstate[cpu]->linked;
25174 +	struct task_struct* other;
25175 +	struct pfair_cluster* cluster = cpu_cluster(pstate[cpu]);
25176 +
25177 +	if (target != cpu) {
25178 +		BUG_ON(pstate[target]->topology.cluster != pstate[cpu]->topology.cluster);
25179 +		other = pstate[target]->linked;
25180 +		pstate[target]->linked = t;
25181 +		tsk_rt(t)->linked_on   = target;
25182 +		if (!other)
25183 +			/* linked ok, but reschedule this CPU */
25184 +			return 1;
25185 +		if (target < cpu) {
25186 +			/* link other to cpu instead */
25187 +			tsk_rt(other)->linked_on = cpu;
25188 +			pstate[cpu]->linked      = other;
25189 +			if (prev) {
25190 +				/* prev got pushed back into the ready queue */
25191 +				tsk_rt(prev)->linked_on = NO_CPU;
25192 +				__add_ready(&cluster->pfair, prev);
25193 +			}
25194 +			/* we are done with this cpu */
25195 +			return 0;
25196 +		} else {
25197 +			/* re-add other; its original CPU was not considered yet */
25198 +			tsk_rt(other)->linked_on = NO_CPU;
25199 +			__add_ready(&cluster->pfair, other);
25200 +			/* reschedule this CPU */
25201 +			return 1;
25202 +		}
25203 +	} else {
25204 +		pstate[cpu]->linked  = t;
25205 +		tsk_rt(t)->linked_on = cpu;
25206 +		if (prev) {
25207 +			/* prev got pushed back into the ready queue */
25208 +			tsk_rt(prev)->linked_on = NO_CPU;
25209 +			__add_ready(&cluster->pfair, prev);
25210 +		}
25211 +		/* we are done with this CPU */
25212 +		return 0;
25213 +	}
25214 +}
25215 +
25216 +static void schedule_subtasks(struct pfair_cluster *cluster, quanta_t time)
25217 +{
25218 +	int retry;
25219 +	struct list_head *pos;
25220 +	struct pfair_state *cpu_state;
25221 +
25222 +	list_for_each(pos, &cluster->topology.cpus) {
25223 +		cpu_state = from_cluster_list(pos);
25224 +		retry = 1;
25225 +#ifdef CONFIG_RELEASE_MASTER
25226 +		/* skip release master */
25227 +		if (cluster->pfair.release_master == cpu_id(cpu_state))
25228 +			continue;
25229 +#endif
25230 +		while (retry) {
25231 +			if (pfair_higher_prio(__peek_ready(&cluster->pfair),
25232 +					      cpu_state->linked))
25233 +				retry = pfair_link(time, cpu_id(cpu_state),
25234 +						   __take_ready(&cluster->pfair));
25235 +			else
25236 +				retry = 0;
25237 +		}
25238 +	}
25239 +}
25240 +
25241 +static void schedule_next_quantum(struct pfair_cluster *cluster, quanta_t time)
25242 +{
25243 +	struct pfair_state *cpu;
25244 +	struct list_head* pos;
25245 +
25246 +	/* called with interrupts disabled */
25247 +	PTRACE("--- Q %lu at %llu PRE-SPIN\n",
25248 +	       time, litmus_clock());
25249 +	raw_spin_lock(cluster_lock(cluster));
25250 +	PTRACE("<<< Q %lu at %llu\n",
25251 +	       time, litmus_clock());
25252 +
25253 +	sched_trace_quantum_boundary();
25254 +
25255 +	advance_subtasks(cluster, time);
25256 +	poll_releases(cluster);
25257 +	schedule_subtasks(cluster, time);
25258 +
25259 +	list_for_each(pos, &cluster->topology.cpus) {
25260 +		cpu = from_cluster_list(pos);
25261 +		if (cpu->linked)
25262 +			PTRACE_TASK(cpu->linked,
25263 +				    " linked on %d.\n", cpu_id(cpu));
25264 +		else
25265 +			PTRACE("(null) linked on %d.\n", cpu_id(cpu));
25266 +	}
25267 +	/* We are done. Advance time. */
25268 +	mb();
25269 +	list_for_each(pos, &cluster->topology.cpus) {
25270 +		cpu = from_cluster_list(pos);
25271 +		if (cpu->local_tick != cpu->cur_tick) {
25272 +			TRACE("BAD Quantum not acked on %d "
25273 +			      "(l:%lu c:%lu p:%lu)\n",
25274 +			      cpu_id(cpu),
25275 +			      cpu->local_tick,
25276 +			      cpu->cur_tick,
25277 +			      cluster->pfair_time);
25278 +			cpu->missed_quanta++;
25279 +		}
25280 +		cpu->cur_tick = time;
25281 +	}
25282 +	PTRACE(">>> Q %lu at %llu\n",
25283 +	       time, litmus_clock());
25284 +	raw_spin_unlock(cluster_lock(cluster));
25285 +}
25286 +
25287 +static noinline void wait_for_quantum(quanta_t q, struct pfair_state* state)
25288 +{
25289 +	quanta_t loc;
25290 +
25291 +	goto first; /* skip mb() on first iteration */
25292 +	do {
25293 +		cpu_relax();
25294 +		mb();
25295 +	first:	loc = state->cur_tick;
25296 +		/* FIXME: what if loc > cur? */
25297 +	} while (time_before(loc, q));
25298 +	PTRACE("observed cur_tick:%lu >= q:%lu\n",
25299 +	       loc, q);
25300 +}
25301 +
25302 +static quanta_t current_quantum(struct pfair_state* state)
25303 +{
25304 +	lt_t t = litmus_clock() - state->offset;
25305 +	return time2quanta(t, FLOOR);
25306 +}
25307 +
25308 +static void catchup_quanta(quanta_t from, quanta_t target,
25309 +			   struct pfair_state* state)
25310 +{
25311 +	quanta_t cur = from, time;
25312 +	TRACE("+++< BAD catching up quanta from %lu to %lu\n",
25313 +	      from, target);
25314 +	while (time_before(cur, target)) {
25315 +		wait_for_quantum(cur, state);
25316 +		cur++;
25317 +		time = cmpxchg(&cpu_cluster(state)->pfair_time,
25318 +			       cur - 1,   /* expected */
25319 +			       cur        /* next     */
25320 +			);
25321 +		if (time == cur - 1)
25322 +			schedule_next_quantum(cpu_cluster(state), cur);
25323 +	}
25324 +	TRACE("+++> catching up done\n");
25325 +}
25326 +
25327 +/* pfair_tick - this function is called for every local timer
25328 + * interrupt.
25329 + */
25330 +static void pfair_tick(struct task_struct* t)
25331 +{
25332 +	struct pfair_state* state = &__get_cpu_var(pfair_state);
25333 +	quanta_t time, cur;
25334 +	int retry = 10;
25335 +
25336 +	do {
25337 +		cur  = current_quantum(state);
25338 +		PTRACE("q %lu at %llu\n", cur, litmus_clock());
25339 +
25340 +		/* Attempt to advance time. First CPU to get here
25341 +		 * will prepare the next quantum.
25342 +		 */
25343 +		time = cmpxchg(&cpu_cluster(state)->pfair_time,
25344 +			       cur - 1,   /* expected */
25345 +			       cur        /* next     */
25346 +			);
25347 +		if (time == cur - 1) {
25348 +			/* exchange succeeded */
25349 +			wait_for_quantum(cur - 1, state);
25350 +			schedule_next_quantum(cpu_cluster(state), cur);
25351 +			retry = 0;
25352 +		} else if (time_before(time, cur - 1)) {
25353 +			/* the whole system missed a tick !? */
25354 +			catchup_quanta(time, cur, state);
25355 +			retry--;
25356 +		} else if (time_after(time, cur)) {
25357 +			/* our timer lagging behind!? */
25358 +			TRACE("BAD pfair_time:%lu > cur:%lu\n", time, cur);
25359 +			retry--;
25360 +		} else {
25361 +			/* Some other CPU already started scheduling
25362 +			 * this quantum. Let it do its job and then update.
25363 +			 */
25364 +			retry = 0;
25365 +		}
25366 +	} while (retry);
25367 +
25368 +	/* Spin locally until time advances. */
25369 +	wait_for_quantum(cur, state);
25370 +
25371 +	/* copy assignment */
25372 +	/* FIXME: what if we race with a future update? Corrupted state? */
25373 +	state->local      = state->linked;
25374 +	/* signal that we are done */
25375 +	mb();
25376 +	state->local_tick = state->cur_tick;
25377 +
25378 +	if (state->local != current
25379 +	    && (is_realtime(current) || is_present(state->local)))
25380 +		litmus_reschedule_local();
25381 +}
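For illustration only (not part of the patch): the cmpxchg() logic in pfair_tick() and catchup_quanta() elects exactly one CPU per quantum to call schedule_next_quantum(). A stripped-down sketch of that election follows; the helper name try_advance_quantum() is made up and the error paths are simplified:

static void try_advance_quantum(struct pfair_cluster* cluster,
				struct pfair_state* state, quanta_t cur)
{
	quanta_t old = cmpxchg(&cluster->pfair_time, cur - 1, cur);

	if (old == cur - 1) {
		/* we won the race: this CPU prepares quantum 'cur' */
		schedule_next_quantum(cluster, cur);
	} else if (time_before(old, cur - 1)) {
		/* the whole cluster fell behind: catch up one quantum at a time */
		catchup_quanta(old, cur, state);
	}
	/* otherwise another CPU is already handling this quantum */
}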
25382 +
25383 +static int safe_to_schedule(struct task_struct* t, int cpu)
25384 +{
25385 +	int where = tsk_rt(t)->scheduled_on;
25386 +	if (where != NO_CPU && where != cpu) {
25387 +		TRACE_TASK(t, "BAD: can't be scheduled on %d, "
25388 +			   "scheduled already on %d.\n", cpu, where);
25389 +		return 0;
25390 +	} else
25391 +		return tsk_rt(t)->present && get_rt_flags(t) == RT_F_RUNNING;
25392 +}
25393 +
25394 +static struct task_struct* pfair_schedule(struct task_struct * prev)
25395 +{
25396 +	struct pfair_state* state = &__get_cpu_var(pfair_state);
25397 +	struct pfair_cluster* cluster = cpu_cluster(state);
25398 +	int blocks, completion, out_of_time;
25399 +	struct task_struct* next = NULL;
25400 +
25401 +#ifdef CONFIG_RELEASE_MASTER
25402 +	/* Bail out early if we are the release master.
25403 +	 * The release master never schedules any real-time tasks.
25404 +	 */
25405 +	if (unlikely(cluster->pfair.release_master == cpu_id(state))) {
25406 +		sched_state_task_picked();
25407 +		return NULL;
25408 +	}
25409 +#endif
25410 +
25411 +	raw_spin_lock(cpu_lock(state));
25412 +
25413 +	blocks      = is_realtime(prev) && !is_running(prev);
25414 +	completion  = is_realtime(prev) && get_rt_flags(prev) == RT_F_SLEEP;
25415 +	out_of_time = is_realtime(prev) && time_after(cur_release(prev),
25416 +						      state->local_tick);
25417 +
25418 +	if (is_realtime(prev))
25419 +	    PTRACE_TASK(prev, "blocks:%d completion:%d out_of_time:%d\n",
25420 +			blocks, completion, out_of_time);
25421 +
25422 +	if (completion) {
25423 +		sched_trace_task_completion(prev, 0);
25424 +		pfair_prepare_next_period(prev);
25425 +		prepare_release(prev, cur_release(prev));
25426 +	}
25427 +
25428 +	if (!blocks && (completion || out_of_time)) {
25429 +		drop_all_references(prev);
25430 +		sched_trace_task_release(prev);
25431 +		add_release(&cluster->pfair, prev);
25432 +	}
25433 +
25434 +	if (state->local && safe_to_schedule(state->local, cpu_id(state)))
25435 +		next = state->local;
25436 +
25437 +	if (prev != next) {
25438 +		tsk_rt(prev)->scheduled_on = NO_CPU;
25439 +		if (next)
25440 +			tsk_rt(next)->scheduled_on = cpu_id(state);
25441 +	}
25442 +	sched_state_task_picked();
25443 +	raw_spin_unlock(cpu_lock(state));
25444 +
25445 +	if (next)
25446 +		TRACE_TASK(next, "scheduled rel=%lu at %lu (%llu)\n",
25447 +			   tsk_pfair(next)->release, cpu_cluster(state)->pfair_time, litmus_clock());
25448 +	else if (is_realtime(prev))
25449 +		TRACE("Becomes idle at %lu (%llu)\n", cpu_cluster(state)->pfair_time, litmus_clock());
25450 +
25451 +	return next;
25452 +}
25453 +
25454 +static void pfair_task_new(struct task_struct * t, int on_rq, int running)
25455 +{
25456 +	unsigned long flags;
25457 +	struct pfair_cluster* cluster;
25458 +
25459 +	TRACE("pfair: task new %d state:%d\n", t->pid, t->state);
25460 +
25461 +	cluster = tsk_pfair(t)->cluster;
25462 +
25463 +	raw_spin_lock_irqsave(cluster_lock(cluster), flags);
25464 +
25465 +	prepare_release(t, cluster->pfair_time + 1);
25466 +
25467 +	t->rt_param.scheduled_on = NO_CPU;
25468 +
25469 +	if (running) {
25470 +#ifdef CONFIG_RELEASE_MASTER
25471 +		if (task_cpu(t) != cluster->pfair.release_master)
25472 +#endif
25473 +			t->rt_param.scheduled_on = task_cpu(t);
25474 +		__add_ready(&cluster->pfair, t);
25475 +	}
25476 +
25477 +	check_preempt(t);
25478 +
25479 +	raw_spin_unlock_irqrestore(cluster_lock(cluster), flags);
25480 +}
25481 +
25482 +static void pfair_task_wake_up(struct task_struct *t)
25483 +{
25484 +	unsigned long flags;
25485 +	lt_t now;
25486 +	int requeue = 0;
25487 +	struct pfair_cluster* cluster;
25488 +
25489 +	cluster = tsk_pfair(t)->cluster;
25490 +
25491 +	TRACE_TASK(t, "wakes at %llu, release=%lu, pfair_time:%lu\n",
25492 +		   litmus_clock(), cur_release(t), cluster->pfair_time);
25493 +
25494 +	raw_spin_lock_irqsave(cluster_lock(cluster), flags);
25495 +
25496 +	/* If a task blocks and wakes before its next job release,
25497 +	 * then it may resume if it is currently linked somewhere
25498 +	 * (as if it never blocked at all). Otherwise, we have a
25499 +	 * new sporadic job release.
25500 +	 */
25501 +	requeue = tsk_rt(t)->flags == RT_F_REQUEUE;
25502 +	now = litmus_clock();
25503 +	if (lt_before(get_deadline(t), now)) {
25504 +		TRACE_TASK(t, "sporadic release!\n");
25505 +		release_at(t, now);
25506 +		prepare_release(t, time2quanta(now, CEIL));
25507 +		sched_trace_task_release(t);
25508 +	}
25509 +
25510 +	/* only add to ready queue if the task isn't still linked somewhere */
25511 +	if (requeue) {
25512 +		TRACE_TASK(t, "requeueing required\n");
25513 +		tsk_rt(t)->flags = RT_F_RUNNING;
25514 +		__add_ready(&cluster->pfair, t);
25515 +	}
25516 +
25517 +	check_preempt(t);
25518 +
25519 +	raw_spin_unlock_irqrestore(cluster_lock(cluster), flags);
25520 +	TRACE_TASK(t, "wake up done at %llu\n", litmus_clock());
25521 +}
25522 +
25523 +static void pfair_task_block(struct task_struct *t)
25524 +{
25525 +	BUG_ON(!is_realtime(t));
25526 +	TRACE_TASK(t, "blocks at %llu, state:%d\n",
25527 +		   litmus_clock(), t->state);
25528 +}
25529 +
25530 +static void pfair_task_exit(struct task_struct * t)
25531 +{
25532 +	unsigned long flags;
25533 +	struct pfair_cluster *cluster;
25534 +
25535 +	BUG_ON(!is_realtime(t));
25536 +
25537 +	cluster = tsk_pfair(t)->cluster;
25538 +
25539 +	/* Remove the task from the release or ready queue, and ensure
25540 +	 * that it is not the scheduled task for ANY CPU. We
25541 +	 * do this blanket check because occasionally, when
25542 +	 * tasks exit while blocked, the task_cpu of the task
25543 +	 * might not be the same as the CPU that the PFAIR scheduler
25544 +	 * has chosen for it.
25545 +	 */
25546 +	raw_spin_lock_irqsave(cluster_lock(cluster), flags);
25547 +
25548 +	TRACE_TASK(t, "RIP, state:%d\n", t->state);
25549 +	drop_all_references(t);
25550 +
25551 +	raw_spin_unlock_irqrestore(cluster_lock(cluster), flags);
25552 +
25553 +	kfree(t->rt_param.pfair);
25554 +	t->rt_param.pfair = NULL;
25555 +}
25556 +
25557 +
25558 +static void pfair_release_at(struct task_struct* task, lt_t start)
25559 +{
25560 +	unsigned long flags;
25561 +	quanta_t release;
25562 +
25563 +	struct pfair_cluster *cluster;
25564 +
25565 +	cluster = tsk_pfair(task)->cluster;
25566 +
25567 +	BUG_ON(!is_realtime(task));
25568 +
25569 +	raw_spin_lock_irqsave(cluster_lock(cluster), flags);
25570 +	release_at(task, start);
25571 +	release = time2quanta(start, CEIL);
25572 +
25573 +	TRACE_TASK(task, "sys release at %lu\n", release);
25574 +
25575 +	drop_all_references(task);
25576 +	prepare_release(task, release);
25577 +	add_release(&cluster->pfair, task);
25578 +
25579 +	raw_spin_unlock_irqrestore(cluster_lock(cluster), flags);
25580 +}
25581 +
25582 +static void init_subtask(struct subtask* sub, unsigned long i,
25583 +			 lt_t quanta, lt_t period)
25584 +{
25585 +	/* since i is zero-based, the formulas are shifted by one */
25586 +	lt_t tmp;
25587 +
25588 +	/* release */
25589 +	tmp = period * i;
25590 +	do_div(tmp, quanta); /* floor */
25591 +	sub->release = (quanta_t) tmp;
25592 +
25593 +	/* deadline */
25594 +	tmp = period * (i + 1);
25595 +	if (do_div(tmp, quanta)) /* ceil */
25596 +		tmp++;
25597 +	sub->deadline = (quanta_t) tmp;
25598 +
25599 +	/* next release */
25600 +	tmp = period * (i + 1);
25601 +	do_div(tmp, quanta); /* floor */
25602 +	sub->overlap =  sub->deadline - (quanta_t) tmp;
25603 +
25604 +	/* Group deadline.
25605 +	 * Based on the formula given in Uma's thesis.
25606 +	 */
25607 +	if (2 * quanta >= period) {
25608 +		/* heavy */
25609 +		tmp = (sub->deadline - (i + 1)) * period;
25610 +		if (period > quanta &&
25611 +		    do_div(tmp, (period - quanta))) /* ceil */
25612 +			tmp++;
25613 +		sub->group_deadline = (quanta_t) tmp;
25614 +	} else
25615 +		sub->group_deadline = 0;
25616 +}
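For illustration only (not part of the patch): with the zero-based index i, the code above computes release(i) = floor(i * period / quanta), deadline(i) = ceil((i + 1) * period / quanta), and b-bit(i) = deadline(i) - floor((i + 1) * period / quanta), all measured in quanta. For a task with quanta = 2 and period = 3 (weight 2/3, hence heavy because 2 * quanta >= period), subtask 0 gets release 0, deadline 2, b-bit 1, and subtask 1 gets release 1, deadline 3, b-bit 0; both subtasks get group deadline ceil((deadline - (i + 1)) * period / (period - quanta)) = 3.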
25617 +
25618 +static void dump_subtasks(struct task_struct* t)
25619 +{
25620 +	unsigned long i;
25621 +	for (i = 0; i < t->rt_param.pfair->quanta; i++)
25622 +		TRACE_TASK(t, "SUBTASK %lu: rel=%lu dl=%lu bbit:%lu gdl:%lu\n",
25623 +			   i + 1,
25624 +			   t->rt_param.pfair->subtasks[i].release,
25625 +			   t->rt_param.pfair->subtasks[i].deadline,
25626 +			   t->rt_param.pfair->subtasks[i].overlap,
25627 +			   t->rt_param.pfair->subtasks[i].group_deadline);
25628 +}
25629 +
25630 +static long pfair_admit_task(struct task_struct* t)
25631 +{
25632 +	lt_t quanta;
25633 +	lt_t period;
25634 +	s64  quantum_length = ktime_to_ns(tick_period);
25635 +	struct pfair_param* param;
25636 +	unsigned long i;
25637 +
25638 +	/* first check that the task is in the right cluster */
25639 +	if (cpu_cluster(pstate[tsk_rt(t)->task_params.cpu]) !=
25640 +	    cpu_cluster(pstate[task_cpu(t)]))
25641 +		return -EINVAL;
25642 +
25643 +	/* Pfair is a tick-based method, so the time
25644 +	 * of interest is jiffies. Calculate tick-based
25645 +	 * times for everything.
25646 +	 * (Ceiling of exec cost, floor of period.)
25647 +	 */
25648 +
25649 +	quanta = get_exec_cost(t);
25650 +	period = get_rt_period(t);
25651 +
25652 +	quanta = time2quanta(get_exec_cost(t), CEIL);
25653 +
25654 +	if (do_div(period, quantum_length))
25655 +		printk(KERN_WARNING
25656 +		       "The period of %s/%d is not a multiple of %llu.\n",
25657 +		       t->comm, t->pid, (unsigned long long) quantum_length);
25658 +
25659 +	if (quanta == period) {
25660 +		/* special case: task has weight 1.0 */
25661 +		printk(KERN_INFO
25662 +		       "Admitting weight 1.0 task. (%s/%d, %llu, %llu).\n",
25663 +		       t->comm, t->pid, quanta, period);
25664 +		quanta = 1;
25665 +		period = 1;
25666 +	}
25667 +
25668 +	param = kmalloc(sizeof(*param) +
25669 +			quanta * sizeof(struct subtask), GFP_ATOMIC);
25670 +
25671 +	if (!param)
25672 +		return -ENOMEM;
25673 +
25674 +	param->quanta  = quanta;
25675 +	param->cur     = 0;
25676 +	param->release = 0;
25677 +	param->period  = period;
25678 +
25679 +	param->cluster = cpu_cluster(pstate[tsk_rt(t)->task_params.cpu]);
25680 +
25681 +	for (i = 0; i < quanta; i++)
25682 +		init_subtask(param->subtasks + i, i, quanta, period);
25683 +
25684 +	if (t->rt_param.pfair)
25685 +		/* get rid of stale allocation */
25686 +		kfree(t->rt_param.pfair);
25687 +
25688 +	t->rt_param.pfair = param;
25689 +
25690 +	/* spew out some debug info */
25691 +	dump_subtasks(t);
25692 +
25693 +	return 0;
25694 +}
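For a concrete illustration (not part of the patch): with a 1 ms scheduling quantum, a task with an execution cost of 2.3 ms and a period of 10 ms is admitted with quanta = 3 (the execution cost is rounded up by time2quanta(..., CEIL)) and period = 10; a 10.5 ms period would leave a remainder in do_div(), trigger the "not a multiple" warning, and be truncated to 10 quanta.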
25695 +
25696 +static void pfair_init_cluster(struct pfair_cluster* cluster)
25697 +{
25698 +	rt_domain_init(&cluster->pfair, pfair_ready_order, NULL, pfair_release_jobs);
25699 +	bheap_init(&cluster->release_queue);
25700 +	raw_spin_lock_init(&cluster->release_lock);
25701 +	INIT_LIST_HEAD(&cluster->topology.cpus);
25702 +}
25703 +
25704 +static void cleanup_clusters(void)
25705 +{
25706 +	int i;
25707 +
25708 +	if (num_pfair_clusters)
25709 +		kfree(pfair_clusters);
25710 +	pfair_clusters = NULL;
25711 +	num_pfair_clusters = 0;
25712 +
25713 +	/* avoid stale pointers */
25714 +	for (i = 0; i < num_online_cpus(); i++) {
25715 +		pstate[i]->topology.cluster = NULL;
25716 +		printk("P%d missed %u updates and %u quanta.\n", cpu_id(pstate[i]),
25717 +		       pstate[i]->missed_updates, pstate[i]->missed_quanta);
25718 +	}
25719 +}
25720 +
25721 +static long pfair_activate_plugin(void)
25722 +{
25723 +	int err, i;
25724 +	struct pfair_state* state;
25725 +	struct pfair_cluster* cluster ;
25726 +	quanta_t now;
25727 +	int cluster_size;
25728 +	struct cluster_cpu* cpus[NR_CPUS];
25729 +	struct scheduling_cluster* clust[NR_CPUS];
25730 +
25731 +	cluster_size = get_cluster_size(pfair_cluster_level);
25732 +
25733 +	if (cluster_size <= 0 || num_online_cpus() % cluster_size != 0)
25734 +		return -EINVAL;
25735 +
25736 +	num_pfair_clusters = num_online_cpus() / cluster_size;
25737 +
25738 +	pfair_clusters = kzalloc(num_pfair_clusters * sizeof(struct pfair_cluster), GFP_ATOMIC);
25739 +	if (!pfair_clusters) {
25740 +		num_pfair_clusters = 0;
25741 +		printk(KERN_ERR "Could not allocate Pfair clusters!\n");
25742 +		return -ENOMEM;
25743 +	}
25744 +
25745 +	state = &__get_cpu_var(pfair_state);
25746 +	now = current_quantum(state);
25747 +	TRACE("Activating PFAIR at q=%lu\n", now);
25748 +
25749 +	for (i = 0; i < num_pfair_clusters; i++) {
25750 +		cluster = &pfair_clusters[i];
25751 +		pfair_init_cluster(cluster);
25752 +		cluster->pfair_time = now;
25753 +		clust[i] = &cluster->topology;
25754 +#ifdef CONFIG_RELEASE_MASTER
25755 +		cluster->pfair.release_master = atomic_read(&release_master_cpu);
25756 +#endif
25757 +	}
25758 +
25759 +	for (i = 0; i < num_online_cpus(); i++)  {
25760 +		state = &per_cpu(pfair_state, i);
25761 +		state->cur_tick   = now;
25762 +		state->local_tick = now;
25763 +		state->missed_quanta = 0;
25764 +		state->missed_updates = 0;
25765 +		state->offset     = cpu_stagger_offset(i);
25766 +		printk(KERN_ERR "cpus[%d] set; %d\n", i, num_online_cpus());
25767 +		cpus[i] = &state->topology;
25768 +	}
25769 +
25770 +	err = assign_cpus_to_clusters(pfair_cluster_level, clust, num_pfair_clusters,
25771 +				      cpus, num_online_cpus());
25772 +
25773 +	if (err < 0)
25774 +		cleanup_clusters();
25775 +
25776 +	return err;
25777 +}
25778 +
25779 +static long pfair_deactivate_plugin(void)
25780 +{
25781 +	cleanup_clusters();
25782 +	return 0;
25783 +}
25784 +
25785 +/*	Plugin object	*/
25786 +static struct sched_plugin pfair_plugin __cacheline_aligned_in_smp = {
25787 +	.plugin_name		= "PFAIR",
25788 +	.tick			= pfair_tick,
25789 +	.task_new		= pfair_task_new,
25790 +	.task_exit		= pfair_task_exit,
25791 +	.schedule		= pfair_schedule,
25792 +	.task_wake_up		= pfair_task_wake_up,
25793 +	.task_block		= pfair_task_block,
25794 +	.admit_task		= pfair_admit_task,
25795 +	.release_at		= pfair_release_at,
25796 +	.complete_job		= complete_job,
25797 +	.activate_plugin	= pfair_activate_plugin,
25798 +	.deactivate_plugin	= pfair_deactivate_plugin,
25799 +};
25800 +
25801 +
25802 +static struct proc_dir_entry *cluster_file = NULL, *pfair_dir = NULL;
25803 +
25804 +static int __init init_pfair(void)
25805 +{
25806 +	int cpu, err, fs;
25807 +	struct pfair_state *state;
25808 +
25809 +	/*
25810 +	 * Initialize the shortcut array for per-CPU pfair state.
25811 +	 * There may be a problem here if someone removes a CPU while we
25812 +	 * are doing this initialization, or if CPUs are added/removed
25813 +	 * later, but we don't support CPU hotplug at the moment anyway.
25814 +	 */
25815 +	pstate = kmalloc(sizeof(struct pfair_state*) * num_online_cpus(), GFP_KERNEL);
25816 +
25817 +	/* initialize CPU state */
25818 +	for (cpu = 0; cpu < num_online_cpus(); cpu++)  {
25819 +		state = &per_cpu(pfair_state, cpu);
25820 +		state->topology.id = cpu;
25821 +		state->cur_tick   = 0;
25822 +		state->local_tick = 0;
25823 +		state->linked     = NULL;
25824 +		state->local      = NULL;
25825 +		state->scheduled  = NULL;
25826 +		state->missed_quanta = 0;
25827 +		state->offset     = cpu_stagger_offset(cpu);
25828 +		pstate[cpu] = state;
25829 +	}
25830 +
25831 +	pfair_clusters = NULL;
25832 +	num_pfair_clusters = 0;
25833 +
25834 +	err = register_sched_plugin(&pfair_plugin);
25835 +	if (!err) {
25836 +		fs = make_plugin_proc_dir(&pfair_plugin, &pfair_dir);
25837 +		if (!fs)
25838 +			cluster_file = create_cluster_file(pfair_dir, &pfair_cluster_level);
25839 +		else
25840 +			printk(KERN_ERR "Could not allocate PFAIR procfs dir.\n");
25841 +	}
25842 +
25843 +	return err;
25844 +}
25845 +
25846 +static void __exit clean_pfair(void)
25847 +{
25848 +	kfree(pstate);
25849 +
25850 +	if (cluster_file)
25851 +		remove_proc_entry("cluster", pfair_dir);
25852 +	if (pfair_dir)
25853 +		remove_plugin_proc_dir(&pfair_plugin);
25854 +}
25855 +
25856 +module_init(init_pfair);
25857 +module_exit(clean_pfair);
25858 diff --git a/litmus/sched_plugin.c b/litmus/sched_plugin.c
25859 new file mode 100644
25860 index 0000000..245e41c
25861 --- /dev/null
25862 +++ b/litmus/sched_plugin.c
25863 @@ -0,0 +1,360 @@
25864 +/* sched_plugin.c -- core infrastructure for the scheduler plugin system
25865 + *
25866 + * This file includes the initialization of the plugin system, the no-op Linux
25867 + * scheduler plugin, some dummy functions, and some helper functions.
25868 + */
25869 +
25870 +#include <linux/list.h>
25871 +#include <linux/spinlock.h>
25872 +#include <linux/sched.h>
25873 +
25874 +#include <litmus/litmus.h>
25875 +#include <litmus/sched_plugin.h>
25876 +#include <litmus/preempt.h>
25877 +#include <litmus/jobs.h>
25878 +
25879 +#ifdef CONFIG_LITMUS_NVIDIA
25880 +#include <litmus/nvidia_info.h>
25881 +#endif
25882 +
25883 +/*
25884 + * Generic function to trigger preemption on either local or remote cpu
25885 + * from scheduler plugins. The key feature is that this function is
25886 + * non-preemptive section aware and does not invoke the scheduler / send
25887 + * IPIs if the to-be-preempted task is actually non-preemptive.
25888 + */
25889 +void preempt_if_preemptable(struct task_struct* t, int cpu)
25890 +{
25891 +	/* t is the real-time task executing on the CPU 'cpu'. If t is NULL,
25892 +	 * then that CPU is currently scheduling background work.
25893 +	 */
25894 +
25895 +	int reschedule = 0;
25896 +
25897 +	if (!t)
25898 +		/* move non-real-time task out of the way */
25899 +		reschedule = 1;
25900 +	else {
25901 +		if (smp_processor_id() == cpu) {
25902 +			/* local CPU case */
25903 +			/* check if we need to poke userspace */
25904 +			if (is_user_np(t))
25905 +				/* Yes, poke it. This doesn't have to be atomic since
25906 +				 * the task is definitely not executing. */
25907 +				request_exit_np(t);
25908 +			else if (!is_kernel_np(t))
25909 +				/* only if we are allowed to preempt the
25910 +				 * currently-executing task */
25911 +				reschedule = 1;
25912 +		} else {
25913 +			/* Remote CPU case.  Only notify if it's not a kernel
25914 +			 * NP section and if we didn't set the userspace
25915 +			 * flag. */
25916 +			reschedule = !(is_kernel_np(t) || request_exit_np_atomic(t));
25917 +		}
25918 +	}
25919 +	if (likely(reschedule))
25920 +		litmus_reschedule(cpu);
25921 +}
25922 +
25923 +
25924 +/*************************************************************
25925 + *                   Dummy plugin functions                  *
25926 + *************************************************************/
25927 +
25928 +static void litmus_dummy_finish_switch(struct task_struct * prev)
25929 +{
25930 +}
25931 +
25932 +static struct task_struct* litmus_dummy_schedule(struct task_struct * prev)
25933 +{
25934 +	sched_state_task_picked();
25935 +	return NULL;
25936 +}
25937 +
25938 +static void litmus_dummy_tick(struct task_struct* tsk)
25939 +{
25940 +}
25941 +
25942 +static long litmus_dummy_admit_task(struct task_struct* tsk)
25943 +{
25944 +	printk(KERN_CRIT "LITMUS^RT: Linux plugin rejects %s/%d.\n",
25945 +		tsk->comm, tsk->pid);
25946 +	return -EINVAL;
25947 +}
25948 +
25949 +static void litmus_dummy_task_new(struct task_struct *t, int on_rq, int running)
25950 +{
25951 +}
25952 +
25953 +static void litmus_dummy_task_wake_up(struct task_struct *task)
25954 +{
25955 +}
25956 +
25957 +static void litmus_dummy_task_block(struct task_struct *task)
25958 +{
25959 +}
25960 +
25961 +static void litmus_dummy_task_exit(struct task_struct *task)
25962 +{
25963 +}
25964 +
25965 +static long litmus_dummy_complete_job(void)
25966 +{
25967 +	return -ENOSYS;
25968 +}
25969 +
25970 +static long litmus_dummy_activate_plugin(void)
25971 +{
25972 +#ifdef CONFIG_LITMUS_NVIDIA
25973 +	shutdown_nvidia_info();
25974 +#endif
25975 +	return 0;
25976 +}
25977 +
25978 +static long litmus_dummy_deactivate_plugin(void)
25979 +{
25980 +	return 0;
25981 +}
25982 +
25983 +static int litmus_dummy_compare(struct task_struct* a, struct task_struct* b)
25984 +{
25985 +	TRACE_CUR("WARNING: Dummy compare function called!\n");
25986 +	return 0;
25987 +}
25988 +
25989 +#ifdef CONFIG_LITMUS_LOCKING
25990 +static long litmus_dummy_allocate_lock(struct litmus_lock **lock, int type,
25991 +				       void* __user config)
25992 +{
25993 +	return -ENXIO;
25994 +}
25995 +
25996 +static void litmus_dummy_increase_prio(struct task_struct* t, struct task_struct* prio_inh)
25997 +{
25998 +}
25999 +
26000 +static void litmus_dummy_decrease_prio(struct task_struct* t, struct task_struct* prio_inh)
26001 +{
26002 +}
26003 +#endif
26004 +
26005 +#ifdef CONFIG_LITMUS_SOFTIRQD
26006 +static void litmus_dummy_increase_prio_klitirqd(struct task_struct* klitirqd,
26007 +                                       struct task_struct* old_owner,
26008 +                                       struct task_struct* new_owner)
26009 +{
26010 +}
26011 +
26012 +static void litmus_dummy_decrease_prio_klitirqd(struct task_struct* klitirqd,
26013 +                                                struct task_struct* old_owner)
26014 +{
26015 +}
26016 +#endif
26017 +
26018 +#ifdef CONFIG_LITMUS_PAI_SOFTIRQD
26019 +static int litmus_dummy_enqueue_pai_tasklet(struct tasklet_struct* t)
26020 +{
26021 +	TRACE("%s: PAI Tasklet unsupported in this plugin!!!!!!\n", __FUNCTION__);
26022 +	return(0); // failure.
26023 +}
26024 +
26025 +static void litmus_dummy_change_prio_pai_tasklet(struct task_struct *old_prio,
26026 +												 struct task_struct *new_prio)
26027 +{
26028 +	TRACE("%s: PAI Tasklet unsupported in this plugin!!!!!!\n", __FUNCTION__);
26029 +}
26030 +
26031 +static void litmus_dummy_run_tasklets(struct task_struct* t)
26032 +{
26033 +	//TRACE("%s: PAI Tasklet unsupported in this plugin!!!!!!\n", __FUNCTION__);
26034 +}
26035 +#endif
26036 +
26037 +#ifdef CONFIG_LITMUS_NESTED_LOCKING
26038 +static void litmus_dummy_nested_increase_prio(struct task_struct* t, struct task_struct* prio_inh,
26039 +											raw_spinlock_t *to_unlock, unsigned long irqflags)
26040 +{
26041 +}
26042 +
26043 +static void litmus_dummy_nested_decrease_prio(struct task_struct* t, struct task_struct* prio_inh,
26044 +											raw_spinlock_t *to_unlock, unsigned long irqflags)
26045 +{
26046 +}
26047 +
26048 +static int litmus_dummy___compare(struct task_struct* a, comparison_mode_t a_mod,
26049 +								  struct task_struct* b, comparison_mode_t b_mode)
26050 +{
26051 +	TRACE_CUR("WARNING: Dummy compare function called!\n");
26052 +	return 0;
26053 +}
26054 +#endif
26055 +
26056 +#ifdef CONFIG_LITMUS_DGL_SUPPORT
26057 +static raw_spinlock_t* litmus_dummy_get_dgl_spinlock(struct task_struct *t)
26058 +{
26059 +	return NULL;
26060 +}
26061 +#endif
26062 +
26063 +#ifdef CONFIG_LITMUS_AFFINITY_LOCKING
26064 +static long litmus_dummy_allocate_aff_obs(struct affinity_observer **aff_obs,
26065 +									   int type,
26066 +									   void* __user config)
26067 +{
26068 +	return -ENXIO;
26069 +}
26070 +#endif
26071 +
26072 +
26073 +/* The default scheduler plugin. It doesn't do anything and lets Linux do its
26074 + * job.
26075 + */
26076 +struct sched_plugin linux_sched_plugin = {
26077 +	.plugin_name = "Linux",
26078 +	.tick = litmus_dummy_tick,
26079 +	.task_new   = litmus_dummy_task_new,
26080 +	.task_exit = litmus_dummy_task_exit,
26081 +	.task_wake_up = litmus_dummy_task_wake_up,
26082 +	.task_block = litmus_dummy_task_block,
26083 +	.complete_job = litmus_dummy_complete_job,
26084 +	.schedule = litmus_dummy_schedule,
26085 +	.finish_switch = litmus_dummy_finish_switch,
26086 +	.activate_plugin = litmus_dummy_activate_plugin,
26087 +	.deactivate_plugin = litmus_dummy_deactivate_plugin,
26088 +	.compare = litmus_dummy_compare,
26089 +#ifdef CONFIG_LITMUS_LOCKING
26090 +	.allocate_lock = litmus_dummy_allocate_lock,
26091 +	.increase_prio = litmus_dummy_increase_prio,
26092 +	.decrease_prio = litmus_dummy_decrease_prio,
26093 +#endif
26094 +#ifdef CONFIG_LITMUS_NESTED_LOCKING
26095 +	.nested_increase_prio = litmus_dummy_nested_increase_prio,
26096 +	.nested_decrease_prio = litmus_dummy_nested_decrease_prio,
26097 +	.__compare = litmus_dummy___compare,
26098 +#endif
26099 +#ifdef CONFIG_LITMUS_SOFTIRQD
26100 +	.increase_prio_klitirqd = litmus_dummy_increase_prio_klitirqd,
26101 +	.decrease_prio_klitirqd = litmus_dummy_decrease_prio_klitirqd,
26102 +#endif
26103 +#ifdef CONFIG_LITMUS_PAI_SOFTIRQD
26104 +	.enqueue_pai_tasklet = litmus_dummy_enqueue_pai_tasklet,
26105 +	.change_prio_pai_tasklet = litmus_dummy_change_prio_pai_tasklet,
26106 +	.run_tasklets = litmus_dummy_run_tasklets,
26107 +#endif
26108 +#ifdef CONFIG_LITMUS_DGL_SUPPORT
26109 +	.get_dgl_spinlock = litmus_dummy_get_dgl_spinlock,
26110 +#endif
26111 +#ifdef CONFIG_LITMUS_AFFINITY_LOCKING
26112 +	.allocate_aff_obs = litmus_dummy_allocate_aff_obs,
26113 +#endif
26114 +
26115 +	.admit_task = litmus_dummy_admit_task
26116 +};
26117 +
26118 +/*
26119 + *	The reference to the current plugin that is used to schedule tasks
26120 + *	within the system. It stores references to the actual function
26121 + *	implementations and should be initialized by calling "init_***_plugin()".
26122 + */
26123 +struct sched_plugin *litmus = &linux_sched_plugin;
26124 +
26125 +/* the list of registered scheduling plugins */
26126 +static LIST_HEAD(sched_plugins);
26127 +static DEFINE_RAW_SPINLOCK(sched_plugins_lock);
26128 +
26129 +#define CHECK(func) {\
26130 +	if (!plugin->func) \
26131 +		plugin->func = litmus_dummy_ ## func;}
26132 +
26133 +/* FIXME: get reference to module  */
26134 +int register_sched_plugin(struct sched_plugin* plugin)
26135 +{
26136 +	printk(KERN_INFO "Registering LITMUS^RT plugin %s.\n",
26137 +	       plugin->plugin_name);
26138 +
26139 +	/* make sure we don't trip over null pointers later */
26140 +	CHECK(finish_switch);
26141 +	CHECK(schedule);
26142 +	CHECK(tick);
26143 +	CHECK(task_wake_up);
26144 +	CHECK(task_exit);
26145 +	CHECK(task_block);
26146 +	CHECK(task_new);
26147 +	CHECK(complete_job);
26148 +	CHECK(activate_plugin);
26149 +	CHECK(deactivate_plugin);
26150 +	CHECK(compare);
26151 +#ifdef CONFIG_LITMUS_LOCKING
26152 +	CHECK(allocate_lock);
26153 +	CHECK(increase_prio);
26154 +	CHECK(decrease_prio);
26155 +#endif
26156 +#ifdef CONFIG_LITMUS_NESTED_LOCKING
26157 +	CHECK(nested_increase_prio);
26158 +	CHECK(nested_decrease_prio);
26159 +	CHECK(__compare);
26160 +#endif
26161 +#ifdef CONFIG_LITMUS_SOFTIRQD
26162 +	CHECK(increase_prio_klitirqd);
26163 +	CHECK(decrease_prio_klitirqd);
26164 +#endif
26165 +#ifdef CONFIG_LITMUS_PAI_SOFTIRQD
26166 +	CHECK(enqueue_pai_tasklet);
26167 +	CHECK(change_prio_pai_tasklet);
26168 +	CHECK(run_tasklets);
26169 +#endif
26170 +#ifdef CONFIG_LITMUS_DGL_SUPPORT
26171 +	CHECK(get_dgl_spinlock);
26172 +#endif
26173 +#ifdef CONFIG_LITMUS_AFFINITY_LOCKING
26174 +	CHECK(allocate_aff_obs);
26175 +#endif
26176 +	CHECK(admit_task);
26177 +
26178 +	if (!plugin->release_at)
26179 +		plugin->release_at = release_at;
26180 +
26181 +	raw_spin_lock(&sched_plugins_lock);
26182 +	list_add(&plugin->list, &sched_plugins);
26183 +	raw_spin_unlock(&sched_plugins_lock);
26184 +
26185 +	return 0;
26186 +}
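For illustration only (not part of the patch): because the CHECK() macro substitutes the litmus_dummy_*() stubs for every callback left NULL, a plugin only needs to fill in the hooks it actually implements. A minimal, hypothetical registration might look like the sketch below ("DEMO" and init_demo() are made-up names):

static struct sched_plugin demo_plugin __cacheline_aligned_in_smp = {
	.plugin_name = "DEMO",
	/* Every callback left NULL is replaced by litmus_dummy_*()
	 * inside register_sched_plugin(). Note that the dummy
	 * admit_task() rejects all tasks, so a real plugin would at
	 * least provide .admit_task and .schedule. */
};

static int __init init_demo(void)
{
	return register_sched_plugin(&demo_plugin);
}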
26187 +
26188 +
26189 +/* FIXME: reference counting, etc. */
26190 +struct sched_plugin* find_sched_plugin(const char* name)
26191 +{
26192 +	struct list_head *pos;
26193 +	struct sched_plugin *plugin;
26194 +
26195 +	raw_spin_lock(&sched_plugins_lock);
26196 +	list_for_each(pos, &sched_plugins) {
26197 +		plugin = list_entry(pos, struct sched_plugin, list);
26198 +		if (!strcmp(plugin->plugin_name, name))
26199 +		    goto out_unlock;
26200 +	}
26201 +	plugin = NULL;
26202 +
26203 +out_unlock:
26204 +	raw_spin_unlock(&sched_plugins_lock);
26205 +	return plugin;
26206 +}
26207 +
26208 +int print_sched_plugins(char* buf, int max)
26209 +{
26210 +	int count = 0;
26211 +	struct list_head *pos;
26212 +	struct sched_plugin *plugin;
26213 +
26214 +	raw_spin_lock(&sched_plugins_lock);
26215 +	list_for_each(pos, &sched_plugins) {
26216 +		plugin = list_entry(pos, struct sched_plugin, list);
26217 +		count += snprintf(buf + count, max - count, "%s\n", plugin->plugin_name);
26218 +		if (max - count <= 0)
26219 +			break;
26220 +	}
26221 +	raw_spin_unlock(&sched_plugins_lock);
26222 +	return 	count;
26223 +}
26224 diff --git a/litmus/sched_psn_edf.c b/litmus/sched_psn_edf.c
26225 new file mode 100644
26226 index 0000000..8e4a22d
26227 --- /dev/null
26228 +++ b/litmus/sched_psn_edf.c
26229 @@ -0,0 +1,645 @@
26230 +/*
26231 + * kernel/sched_psn_edf.c
26232 + *
26233 + * Implementation of the PSN-EDF scheduler plugin.
26234 + * Based on kern/sched_part_edf.c and kern/sched_gsn_edf.c.
26235 + *
26236 + * Suspensions and non-preemptable sections are supported.
26237 + * Priority inheritance is not supported.
26238 + */
26239 +
26240 +#include <linux/percpu.h>
26241 +#include <linux/sched.h>
26242 +#include <linux/list.h>
26243 +#include <linux/spinlock.h>
26244 +#include <linux/module.h>
26245 +
26246 +#include <litmus/litmus.h>
26247 +#include <litmus/jobs.h>
26248 +#include <litmus/preempt.h>
26249 +#include <litmus/sched_plugin.h>
26250 +#include <litmus/edf_common.h>
26251 +#include <litmus/sched_trace.h>
26252 +#include <litmus/trace.h>
26253 +
26254 +typedef struct {
26255 +	rt_domain_t 		domain;
26256 +	int          		cpu;
26257 +	struct task_struct* 	scheduled; /* only RT tasks */
26258 +/*
26259 + * The scheduling lock (slock)
26260 + * protects the domain and serializes scheduling decisions.
26261 + */
26262 +#define slock domain.ready_lock
26263 +
26264 +} psnedf_domain_t;
26265 +
26266 +DEFINE_PER_CPU(psnedf_domain_t, psnedf_domains);
26267 +
26268 +#define local_edf		(&__get_cpu_var(psnedf_domains).domain)
26269 +#define local_pedf		(&__get_cpu_var(psnedf_domains))
26270 +#define remote_edf(cpu)		(&per_cpu(psnedf_domains, cpu).domain)
26271 +#define remote_pedf(cpu)	(&per_cpu(psnedf_domains, cpu))
26272 +#define task_edf(task)		remote_edf(get_partition(task))
26273 +#define task_pedf(task)		remote_pedf(get_partition(task))
26274 +
26275 +
26276 +static void psnedf_domain_init(psnedf_domain_t* pedf,
26277 +			       check_resched_needed_t check,
26278 +			       release_jobs_t release,
26279 +			       int cpu)
26280 +{
26281 +	edf_domain_init(&pedf->domain, check, release);
26282 +	pedf->cpu      		= cpu;
26283 +	pedf->scheduled		= NULL;
26284 +}
26285 +
26286 +static void requeue(struct task_struct* t, rt_domain_t *edf)
26287 +{
26288 +	if (t->state != TASK_RUNNING)
26289 +		TRACE_TASK(t, "requeue: !TASK_RUNNING\n");
26290 +
26291 +	set_rt_flags(t, RT_F_RUNNING);
26292 +	if (is_released(t, litmus_clock()))
26293 +		__add_ready(edf, t);
26294 +	else
26295 +		add_release(edf, t); /* it has got to wait */
26296 +}
26297 +
26298 +/* we assume the lock is being held */
26299 +static void preempt(psnedf_domain_t *pedf)
26300 +{
26301 +	preempt_if_preemptable(pedf->scheduled, pedf->cpu);
26302 +}
26303 +
26304 +#ifdef CONFIG_LITMUS_LOCKING
26305 +
26306 +static void boost_priority(struct task_struct* t)
26307 +{
26308 +	unsigned long		flags;
26309 +	psnedf_domain_t* 	pedf = task_pedf(t);
26310 +	lt_t			now;
26311 +
26312 +	raw_spin_lock_irqsave(&pedf->slock, flags);
26313 +	now = litmus_clock();
26314 +
26315 +	TRACE_TASK(t, "priority boosted at %llu\n", now);
26316 +
26317 +	tsk_rt(t)->priority_boosted = 1;
26318 +	tsk_rt(t)->boost_start_time = now;
26319 +
26320 +	if (pedf->scheduled != t) {
26321 +		/* holder may be queued: first stop queue changes */
26322 +		raw_spin_lock(&pedf->domain.release_lock);
26323 +		if (is_queued(t) &&
26324 +		    /* If it is queued, then we need to re-order. */
26325 +		    bheap_decrease(edf_ready_order, tsk_rt(t)->heap_node) &&
26326 +		    /* If we bubbled to the top, then we need to check for preemptions. */
26327 +		    edf_preemption_needed(&pedf->domain, pedf->scheduled))
26328 +				preempt(pedf);
26329 +		raw_spin_unlock(&pedf->domain.release_lock);
26330 +	} /* else: nothing to do since the job is not queued while scheduled */
26331 +
26332 +	raw_spin_unlock_irqrestore(&pedf->slock, flags);
26333 +}
26334 +
26335 +static void unboost_priority(struct task_struct* t)
26336 +{
26337 +	unsigned long		flags;
26338 +	psnedf_domain_t* 	pedf = task_pedf(t);
26339 +	lt_t			now;
26340 +
26341 +	raw_spin_lock_irqsave(&pedf->slock, flags);
26342 +	now = litmus_clock();
26343 +
26344 +	/* assumption: this only happens when the job is scheduled */
26345 +	BUG_ON(pedf->scheduled != t);
26346 +
26347 +	TRACE_TASK(t, "priority restored at %llu\n", now);
26348 +
26349 +	/* priority boosted jobs must be scheduled */
26350 +	BUG_ON(pedf->scheduled != t);
26351 +
26352 +	tsk_rt(t)->priority_boosted = 0;
26353 +	tsk_rt(t)->boost_start_time = 0;
26354 +
26355 +	/* check if this changes anything */
26356 +	if (edf_preemption_needed(&pedf->domain, pedf->scheduled))
26357 +		preempt(pedf);
26358 +
26359 +	raw_spin_unlock_irqrestore(&pedf->slock, flags);
26360 +}
26361 +
26362 +#endif
26363 +
26364 +/* This check is trivial in partitioned systems as we only have to consider
26365 + * the CPU of the partition.
26366 + */
26367 +static int psnedf_check_resched(rt_domain_t *edf)
26368 +{
26369 +	psnedf_domain_t *pedf = container_of(edf, psnedf_domain_t, domain);
26370 +
26371 +	/* because this is a callback from rt_domain_t we already hold
26372 +	 * the necessary lock for the ready queue
26373 +	 */
26374 +	if (edf_preemption_needed(edf, pedf->scheduled)) {
26375 +		preempt(pedf);
26376 +		return 1;
26377 +	} else
26378 +		return 0;
26379 +}
26380 +
26381 +static void job_completion(struct task_struct* t, int forced)
26382 +{
26383 +	sched_trace_task_completion(t,forced);
26384 +	TRACE_TASK(t, "job_completion().\n");
26385 +
26386 +	set_rt_flags(t, RT_F_SLEEP);
26387 +	prepare_for_next_period(t);
26388 +}
26389 +
26390 +static void psnedf_tick(struct task_struct *t)
26391 +{
26392 +	psnedf_domain_t *pedf = local_pedf;
26393 +
26394 +	/* Check for inconsistency. We don't need the lock for this since
26395 +	 * ->scheduled is only changed in schedule, which obviously is not
26396 +	 *  executing in parallel on this CPU
26397 +	 */
26398 +	BUG_ON(is_realtime(t) && t != pedf->scheduled);
26399 +
26400 +	if (is_realtime(t) && budget_enforced(t) && budget_exhausted(t)) {
26401 +		if (!is_np(t)) {
26402 +			litmus_reschedule_local();
26403 +			TRACE("psnedf_scheduler_tick: "
26404 +			      "%d is preemptable "
26405 +			      " => FORCE_RESCHED\n", t->pid);
26406 +		} else if (is_user_np(t)) {
26407 +			TRACE("psnedf_scheduler_tick: "
26408 +			      "%d is non-preemptable, "
26409 +			      "preemption delayed.\n", t->pid);
26410 +			request_exit_np(t);
26411 +		}
26412 +	}
26413 +}
26414 +
26415 +static struct task_struct* psnedf_schedule(struct task_struct * prev)
26416 +{
26417 +	psnedf_domain_t* 	pedf = local_pedf;
26418 +	rt_domain_t*		edf  = &pedf->domain;
26419 +	struct task_struct*	next;
26420 +
26421 +	int 			out_of_time, sleep, preempt,
26422 +				np, exists, blocks, resched;
26423 +
26424 +	raw_spin_lock(&pedf->slock);
26425 +
26426 +	/* sanity checking
26427 +	 * unlike in GSN-EDF, when a task exits (is dead),
26428 +	 * pedf->scheduled may be NULL and prev _is_ realtime
26429 +	 */
26430 +	BUG_ON(pedf->scheduled && pedf->scheduled != prev);
26431 +	BUG_ON(pedf->scheduled && !is_realtime(prev));
26432 +
26433 +	/* (0) Determine state */
26434 +	exists      = pedf->scheduled != NULL;
26435 +	blocks      = exists && !is_running(pedf->scheduled);
26436 +	out_of_time = exists &&
26437 +				  budget_enforced(pedf->scheduled) &&
26438 +				  budget_exhausted(pedf->scheduled);
26439 +	np 	    = exists && is_np(pedf->scheduled);
26440 +	sleep	    = exists && get_rt_flags(pedf->scheduled) == RT_F_SLEEP;
26441 +	preempt     = edf_preemption_needed(edf, prev);
26442 +
26443 +	/* If we need to preempt do so.
26444 +	 * The following checks set resched to 1 in case of special
26445 +	 * circumstances.
26446 +	 */
26447 +	resched = preempt;
26448 +
26449 +	/* If a task blocks we have no choice but to reschedule.
26450 +	 */
26451 +	if (blocks)
26452 +		resched = 1;
26453 +
26454 +	/* Request a sys_exit_np() call if we would like to preempt but cannot.
26455 +	 * Multiple calls to request_exit_np() don't hurt.
26456 +	 */
26457 +	if (np && (out_of_time || preempt || sleep))
26458 +		request_exit_np(pedf->scheduled);
26459 +
26460 +	/* Any task that is preemptable and either exhausts its execution
26461 +	 * budget or wants to sleep completes. We may have to reschedule after
26462 +	 * this.
26463 +	 */
26464 +	if (!np && (out_of_time || sleep) && !blocks) {
26465 +		job_completion(pedf->scheduled, !sleep);
26466 +		resched = 1;
26467 +	}
26468 +
26469 +	/* The final scheduling decision. Do we need to switch for some reason?
26470 +	 * Switch if we are in RT mode and have no task or if we need to
26471 +	 * resched.
26472 +	 */
26473 +	next = NULL;
26474 +	if ((!np || blocks) && (resched || !exists)) {
26475 +		/* When preempting a task that does not block, then
26476 +		 * re-insert it into either the ready queue or the
26477 +		 * release queue (if it completed). requeue() picks
26478 +		 * the appropriate queue.
26479 +		 */
26480 +		if (pedf->scheduled && !blocks)
26481 +			requeue(pedf->scheduled, edf);
26482 +		next = __take_ready(edf);
26483 +	} else
26484 +		/* Only override Linux scheduler if we have a real-time task
26485 +		 * scheduled that needs to continue.
26486 +		 */
26487 +		if (exists)
26488 +			next = prev;
26489 +
26490 +	if (next) {
26491 +		TRACE_TASK(next, "scheduled at %llu\n", litmus_clock());
26492 +		set_rt_flags(next, RT_F_RUNNING);
26493 +	} else {
26494 +		TRACE("becoming idle at %llu\n", litmus_clock());
26495 +	}
26496 +
26497 +	pedf->scheduled = next;
26498 +	sched_state_task_picked();
26499 +	raw_spin_unlock(&pedf->slock);
26500 +
26501 +	return next;
26502 +}
26503 +
26504 +
26505 +/*	Prepare a task for running in RT mode
26506 + */
26507 +static void psnedf_task_new(struct task_struct * t, int on_rq, int running)
26508 +{
26509 +	rt_domain_t* 		edf  = task_edf(t);
26510 +	psnedf_domain_t* 	pedf = task_pedf(t);
26511 +	unsigned long		flags;
26512 +
26513 +	TRACE_TASK(t, "psn edf: task new, cpu = %d\n",
26514 +		   t->rt_param.task_params.cpu);
26515 +
26516 +	/* setup job parameters */
26517 +	release_at(t, litmus_clock());
26518 +
26519 +	/* The task should be running in the queue, otherwise signal
26520 +	 * code will try to wake it up with fatal consequences.
26521 +	 */
26522 +	raw_spin_lock_irqsave(&pedf->slock, flags);
26523 +	if (running) {
26524 +		/* there shouldn't be anything else running at the time */
26525 +		BUG_ON(pedf->scheduled);
26526 +		pedf->scheduled = t;
26527 +	} else {
26528 +		requeue(t, edf);
26529 +		/* maybe we have to reschedule */
26530 +		preempt(pedf);
26531 +	}
26532 +	raw_spin_unlock_irqrestore(&pedf->slock, flags);
26533 +}
26534 +
26535 +static void psnedf_task_wake_up(struct task_struct *task)
26536 +{
26537 +	unsigned long		flags;
26538 +	psnedf_domain_t* 	pedf = task_pedf(task);
26539 +	rt_domain_t* 		edf  = task_edf(task);
26540 +	lt_t			now;
26541 +
26542 +	TRACE_TASK(task, "wake_up at %llu\n", litmus_clock());
26543 +	raw_spin_lock_irqsave(&pedf->slock, flags);
26544 +	BUG_ON(is_queued(task));
26545 +	now = litmus_clock();
26546 +	if (is_tardy(task, now)
26547 +#ifdef CONFIG_LITMUS_LOCKING
26548 +	/* We need to take suspensions because of semaphores into
26549 +	 * account! If a job resumes after being suspended due to acquiring
26550 +	 * a semaphore, it should never be treated as a new job release.
26551 +	 */
26552 +	    && !is_priority_boosted(task)
26553 +#endif
26554 +		) {
26555 +		/* new sporadic release */
26556 +		release_at(task, now);
26557 +		sched_trace_task_release(task);
26558 +	}
26559 +
26560 +	/* Only add to ready queue if it is not the currently-scheduled
26561 +	 * task. This could be the case if a task was woken up concurrently
26562 +	 * on a remote CPU before the executing CPU got around to actually
26563 +	 * de-scheduling the task, i.e., wake_up() raced with schedule()
26564 +	 * and won.
26565 +	 */
26566 +	if (pedf->scheduled != task)
26567 +		requeue(task, edf);
26568 +
26569 +	raw_spin_unlock_irqrestore(&pedf->slock, flags);
26570 +	TRACE_TASK(task, "wake up done\n");
26571 +}
26572 +
26573 +static void psnedf_task_block(struct task_struct *t)
26574 +{
26575 +	/* only running tasks can block, thus t is in no queue */
26576 +	TRACE_TASK(t, "block at %llu, state=%d\n", litmus_clock(), t->state);
26577 +
26578 +	BUG_ON(!is_realtime(t));
26579 +	BUG_ON(is_queued(t));
26580 +}
26581 +
26582 +static void psnedf_task_exit(struct task_struct * t)
26583 +{
26584 +	unsigned long flags;
26585 +	psnedf_domain_t* 	pedf = task_pedf(t);
26586 +	rt_domain_t*		edf;
26587 +
26588 +	raw_spin_lock_irqsave(&pedf->slock, flags);
26589 +	if (is_queued(t)) {
26590 +		/* dequeue */
26591 +		edf  = task_edf(t);
26592 +		remove(edf, t);
26593 +	}
26594 +	if (pedf->scheduled == t)
26595 +		pedf->scheduled = NULL;
26596 +
26597 +	TRACE_TASK(t, "RIP, now reschedule\n");
26598 +
26599 +	preempt(pedf);
26600 +	raw_spin_unlock_irqrestore(&pedf->slock, flags);
26601 +}
26602 +
26603 +#ifdef CONFIG_LITMUS_LOCKING
26604 +
26605 +#include <litmus/fdso.h>
26606 +#include <litmus/srp.h>
26607 +
26608 +/* ******************** SRP support ************************ */
26609 +
26610 +static unsigned int psnedf_get_srp_prio(struct task_struct* t)
26611 +{
26612 +	/* assumes implicit deadlines */
26613 +	return get_rt_period(t);
26614 +}
26615 +
26616 +/* ******************** FMLP support ********************** */
26617 +
26618 +/* struct for semaphore with priority inheritance */
26619 +struct fmlp_semaphore {
26620 +	struct litmus_lock litmus_lock;
26621 +
26622 +	/* current resource holder */
26623 +	struct task_struct *owner;
26624 +
26625 +	/* FIFO queue of waiting tasks */
26626 +	wait_queue_head_t wait;
26627 +};
26628 +
26629 +static inline struct fmlp_semaphore* fmlp_from_lock(struct litmus_lock* lock)
26630 +{
26631 +	return container_of(lock, struct fmlp_semaphore, litmus_lock);
26632 +}
26633 +int psnedf_fmlp_lock(struct litmus_lock* l)
26634 +{
26635 +	struct task_struct* t = current;
26636 +	struct fmlp_semaphore *sem = fmlp_from_lock(l);
26637 +	wait_queue_t wait;
26638 +	unsigned long flags;
26639 +
26640 +	if (!is_realtime(t))
26641 +		return -EPERM;
26642 +
26643 +	spin_lock_irqsave(&sem->wait.lock, flags);
26644 +
26645 +	if (sem->owner) {
26646 +		/* resource is not free => must suspend and wait */
26647 +
26648 +		init_waitqueue_entry(&wait, t);
26649 +
26650 +		/* FIXME: interruptible would be nice some day */
26651 +		set_task_state(t, TASK_UNINTERRUPTIBLE);
26652 +
26653 +		__add_wait_queue_tail_exclusive(&sem->wait, &wait);
26654 +
26655 +		TS_LOCK_SUSPEND;
26656 +
26657 +		/* release lock before sleeping */
26658 +		spin_unlock_irqrestore(&sem->wait.lock, flags);
26659 +
26660 +		/* We depend on the FIFO order.  Thus, we don't need to recheck
26661 +		 * when we wake up; we are guaranteed to have the lock since
26662 +		 * there is only one wake up per release.
26663 +		 */
26664 +
26665 +		schedule();
26666 +
26667 +		TS_LOCK_RESUME;
26668 +
26669 +		/* Since we hold the lock, no other task will change
26670 +		 * ->owner. We can thus check it without acquiring the spin
26671 +		 * lock. */
26672 +		BUG_ON(sem->owner != t);
26673 +	} else {
26674 +		/* it's ours now */
26675 +		sem->owner = t;
26676 +
26677 +		/* mark the task as priority-boosted. */
26678 +		boost_priority(t);
26679 +
26680 +		spin_unlock_irqrestore(&sem->wait.lock, flags);
26681 +	}
26682 +
26683 +	return 0;
26684 +}
26685 +
26686 +int psnedf_fmlp_unlock(struct litmus_lock* l)
26687 +{
26688 +	struct task_struct *t = current, *next;
26689 +	struct fmlp_semaphore *sem = fmlp_from_lock(l);
26690 +	unsigned long flags;
26691 +	int err = 0;
26692 +
26693 +	spin_lock_irqsave(&sem->wait.lock, flags);
26694 +
26695 +	if (sem->owner != t) {
26696 +		err = -EINVAL;
26697 +		goto out;
26698 +	}
26699 +
26700 +	/* we lose the benefit of priority boosting */
26701 +
26702 +	unboost_priority(t);
26703 +
26704 +	/* check if there are jobs waiting for this resource */
26705 +	next = __waitqueue_remove_first(&sem->wait);
26706 +	if (next) {
26707 +		/* boost next job */
26708 +		boost_priority(next);
26709 +
26710 +		/* next becomes the resource holder */
26711 +		sem->owner = next;
26712 +
26713 +		/* wake up next */
26714 +		wake_up_process(next);
26715 +	} else
26716 +		/* resource becomes available */
26717 +		sem->owner = NULL;
26718 +
26719 +out:
26720 +	spin_unlock_irqrestore(&sem->wait.lock, flags);
26721 +	return err;
26722 +}
26723 +
26724 +int psnedf_fmlp_close(struct litmus_lock* l)
26725 +{
26726 +	struct task_struct *t = current;
26727 +	struct fmlp_semaphore *sem = fmlp_from_lock(l);
26728 +	unsigned long flags;
26729 +
26730 +	int owner;
26731 +
26732 +	spin_lock_irqsave(&sem->wait.lock, flags);
26733 +
26734 +	owner = sem->owner == t;
26735 +
26736 +	spin_unlock_irqrestore(&sem->wait.lock, flags);
26737 +
26738 +	if (owner)
26739 +		psnedf_fmlp_unlock(l);
26740 +
26741 +	return 0;
26742 +}
26743 +
26744 +void psnedf_fmlp_free(struct litmus_lock* lock)
26745 +{
26746 +	kfree(fmlp_from_lock(lock));
26747 +}
26748 +
26749 +static struct litmus_lock_ops psnedf_fmlp_lock_ops = {
26750 +	.close  = psnedf_fmlp_close,
26751 +	.lock   = psnedf_fmlp_lock,
26752 +	.unlock = psnedf_fmlp_unlock,
26753 +	.deallocate = psnedf_fmlp_free,
26754 +};
26755 +
26756 +static struct litmus_lock* psnedf_new_fmlp(void)
26757 +{
26758 +	struct fmlp_semaphore* sem;
26759 +
26760 +	sem = kmalloc(sizeof(*sem), GFP_KERNEL);
26761 +	if (!sem)
26762 +		return NULL;
26763 +
26764 +	sem->owner   = NULL;
26765 +	init_waitqueue_head(&sem->wait);
26766 +	sem->litmus_lock.ops = &psnedf_fmlp_lock_ops;
26767 +
26768 +	return &sem->litmus_lock;
26769 +}
26770 +
26771 +/* **** lock constructor **** */
26772 +
26773 +
26774 +static long psnedf_allocate_lock(struct litmus_lock **lock, int type,
26775 +				 void* __user unused)
26776 +{
26777 +	int err = -ENXIO;
26778 +	struct srp_semaphore* srp;
26779 +
26780 +	/* PSN-EDF currently supports the SRP for local resources and the FMLP
26781 +	 * for global resources. */
26782 +	switch (type) {
26783 +	case FMLP_SEM:
26784 +		/* Flexible Multiprocessor Locking Protocol */
26785 +		*lock = psnedf_new_fmlp();
26786 +		if (*lock)
26787 +			err = 0;
26788 +		else
26789 +			err = -ENOMEM;
26790 +		break;
26791 +
26792 +	case SRP_SEM:
26793 +		/* Baker's Stack Resource Policy */
26794 +		srp = allocate_srp_semaphore();
26795 +		if (srp) {
26796 +			*lock = &srp->litmus_lock;
26797 +			err = 0;
26798 +		} else
26799 +			err = -ENOMEM;
26800 +		break;
26801 +	};
26802 +
26803 +	return err;
26804 +}
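For illustration only (not part of the patch): from user space, an FMLP_SEM allocated through this hook is typically used via liblitmus. The sketch below assumes the liblitmus helpers open_fmlp_sem(), litmus_lock(), litmus_unlock(), and od_close(); check the liblitmus release that matches this kernel for the exact API.

#include <litmus.h>

/* fd refers to a shared namespace file; 0 is the resource id */
void use_shared_resource(int fd)
{
	int od = open_fmlp_sem(fd, 0);
	if (od < 0)
		return;
	litmus_lock(od);    /* suspends FIFO-style and boosts priority */
	/* ... critical section ... */
	litmus_unlock(od);  /* unboosts and hands the lock to the next waiter */
	od_close(od);
}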
26805 +
26806 +#endif
26807 +
26808 +
26809 +static long psnedf_activate_plugin(void)
26810 +{
26811 +#ifdef CONFIG_RELEASE_MASTER
26812 +	int cpu;
26813 +
26814 +	for_each_online_cpu(cpu) {
26815 +		remote_edf(cpu)->release_master = atomic_read(&release_master_cpu);
26816 +	}
26817 +#endif
26818 +
26819 +#ifdef CONFIG_LITMUS_LOCKING
26820 +	get_srp_prio = psnedf_get_srp_prio;
26821 +#endif
26822 +
26823 +	return 0;
26824 +}
26825 +
26826 +static long psnedf_admit_task(struct task_struct* tsk)
26827 +{
26828 +	if (task_cpu(tsk) == tsk->rt_param.task_params.cpu
26829 +#ifdef CONFIG_RELEASE_MASTER
26830 +	    /* don't allow tasks on release master CPU */
26831 +	     && task_cpu(tsk) != remote_edf(task_cpu(tsk))->release_master
26832 +#endif
26833 +		)
26834 +		return 0;
26835 +	else
26836 +		return -EINVAL;
26837 +}
26838 +
26839 +/*	Plugin object	*/
26840 +static struct sched_plugin psn_edf_plugin __cacheline_aligned_in_smp = {
26841 +	.plugin_name		= "PSN-EDF",
26842 +	.tick			= psnedf_tick,
26843 +	.task_new		= psnedf_task_new,
26844 +	.complete_job		= complete_job,
26845 +	.task_exit		= psnedf_task_exit,
26846 +	.schedule		= psnedf_schedule,
26847 +	.task_wake_up		= psnedf_task_wake_up,
26848 +	.task_block		= psnedf_task_block,
26849 +	.admit_task		= psnedf_admit_task,
26850 +	.activate_plugin	= psnedf_activate_plugin,
26851 +#ifdef CONFIG_LITMUS_LOCKING
26852 +	.allocate_lock		= psnedf_allocate_lock,
26853 +#endif
26854 +};
26855 +
26856 +
26857 +static int __init init_psn_edf(void)
26858 +{
26859 +	int i;
26860 +
26861 +	/* We do not really want to support CPU hotplug, do we? ;)
26862 +	 * However, if we were so crazy as to do so,
26863 +	 * we could not use num_online_cpus() here.
26864 +	 */
26865 +	for (i = 0; i < num_online_cpus(); i++) {
26866 +		psnedf_domain_init(remote_pedf(i),
26867 +				   psnedf_check_resched,
26868 +				   NULL, i);
26869 +	}
26870 +	return register_sched_plugin(&psn_edf_plugin);
26871 +}
26872 +
26873 +module_init(init_psn_edf);
26874 +
26875 diff --git a/litmus/sched_task_trace.c b/litmus/sched_task_trace.c
26876 new file mode 100644
26877 index 0000000..f7f5753
26878 --- /dev/null
26879 +++ b/litmus/sched_task_trace.c
26880 @@ -0,0 +1,509 @@
26881 +/*
26882 + * sched_task_trace.c -- record scheduling events to a byte stream
26883 + */
26884 +
26885 +#define NO_TASK_TRACE_DECLS
26886 +
26887 +#include <linux/module.h>
26888 +#include <linux/sched.h>
26889 +#include <linux/percpu.h>
26890 +#include <linux/hardirq.h>
26891 +
26892 +#include <litmus/ftdev.h>
26893 +#include <litmus/litmus.h>
26894 +
26895 +#include <litmus/sched_trace.h>
26896 +#include <litmus/feather_trace.h>
26897 +#include <litmus/ftdev.h>
26898 +
26899 +
26900 +#define NUM_EVENTS		(1 << (CONFIG_SCHED_TASK_TRACE_SHIFT+11))
26901 +
26902 +#define now() litmus_clock()
26903 +
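+/* Per-CPU event buffer: NUM_EVENTS fixed-size st_event_record slots plus one
+ * flag byte per slot, wrapped in a Feather-Trace ft_buffer.  Each CPU's
+ * buffer is exported to userspace as its own ftdev minor device (see
+ * init_sched_task_trace() below).
+ */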
26904 +struct local_buffer {
26905 +	struct st_event_record record[NUM_EVENTS];
26906 +	char   flag[NUM_EVENTS];
26907 +	struct ft_buffer ftbuf;
26908 +};
26909 +
26910 +DEFINE_PER_CPU(struct local_buffer, st_event_buffer);
26911 +
26912 +static struct ftdev st_dev;
26913 +
26914 +static int st_dev_can_open(struct ftdev *dev, unsigned int cpu)
26915 +{
26916 +	return cpu_online(cpu) ? 0 : -ENODEV;
26917 +}
26918 +
26919 +static int __init init_sched_task_trace(void)
26920 +{
26921 +	struct local_buffer* buf;
26922 +	int i, ok = 0, err;
26923 +	printk("Allocated %u sched_trace_xxx() events per CPU "
26924 +	       "(buffer size: %d bytes)\n",
26925 +	       NUM_EVENTS, (int) sizeof(struct local_buffer));
26926 +
26927 +	err = ftdev_init(&st_dev, THIS_MODULE,
26928 +			num_online_cpus(), "sched_trace");
26929 +	if (err)
26930 +		goto err_out;
26931 +
26932 +	for (i = 0; i < st_dev.minor_cnt; i++) {
26933 +		buf = &per_cpu(st_event_buffer, i);
26934 +		ok += init_ft_buffer(&buf->ftbuf, NUM_EVENTS,
26935 +				     sizeof(struct st_event_record),
26936 +				     buf->flag,
26937 +				     buf->record);
26938 +		st_dev.minor[i].buf = &buf->ftbuf;
26939 +	}
26940 +	if (ok == st_dev.minor_cnt) {
26941 +		st_dev.can_open = st_dev_can_open;
26942 +		err = register_ftdev(&st_dev);
26943 +		if (err)
26944 +			goto err_dealloc;
26945 +	} else {
26946 +		err = -EINVAL;
26947 +		goto err_dealloc;
26948 +	}
26949 +
26950 +	return 0;
26951 +
26952 +err_dealloc:
26953 +	ftdev_exit(&st_dev);
26954 +err_out:
26955 +	printk(KERN_WARNING "Could not register sched_trace module\n");
26956 +	return err;
26957 +}
26958 +
26959 +static void __exit exit_sched_task_trace(void)
26960 +{
26961 +	ftdev_exit(&st_dev);
26962 +}
26963 +
26964 +module_init(init_sched_task_trace);
26965 +module_exit(exit_sched_task_trace);
26966 +
26967 +
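+/* Reserve a slot in the local CPU's buffer and pre-fill the record header.
+ * On success, the per-CPU reference taken by get_cpu_var() is intentionally
+ * held until the matching put_record(); on failure, the reference is dropped
+ * here and NULL is returned, so callers must call put_record() only for a
+ * non-NULL record.
+ */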
26968 +static inline struct st_event_record* get_record(u8 type, struct task_struct* t)
26969 +{
26970 +	struct st_event_record* rec = NULL;
26971 +	struct local_buffer* buf;
26972 +
26973 +	buf = &get_cpu_var(st_event_buffer);
26974 +	if (ft_buffer_start_write(&buf->ftbuf, (void**) &rec)) {
26975 +		rec->hdr.type = type;
26976 +		rec->hdr.cpu  = smp_processor_id();
26977 +		rec->hdr.pid  = t ? t->pid : 0;
26978 +		rec->hdr.job  = t ? t->rt_param.job_params.job_no : 0;
26979 +	} else {
26980 +		put_cpu_var(st_event_buffer);
26981 +	}
26982 +	/* rec will be NULL if it failed */
26983 +	return rec;
26984 +}
26985 +
26986 +static inline void put_record(struct st_event_record* rec)
26987 +{
26988 +	struct local_buffer* buf;
26989 +	buf = &__get_cpu_var(st_event_buffer);
26990 +	ft_buffer_finish_write(&buf->ftbuf, rec);
26991 +	put_cpu_var(st_event_buffer);
26992 +}
26993 +
26994 +feather_callback void do_sched_trace_task_name(unsigned long id, unsigned long _task)
26995 +{
26996 +	struct task_struct *t = (struct task_struct*) _task;
26997 +	struct st_event_record* rec = get_record(ST_NAME, t);
26998 +	int i;
26999 +	if (rec) {
27000 +		for (i = 0; i < min(TASK_COMM_LEN, ST_NAME_LEN); i++)
27001 +			rec->data.name.cmd[i] = t->comm[i];
27002 +		put_record(rec);
27003 +	}
27004 +}
27005 +
27006 +feather_callback void do_sched_trace_task_param(unsigned long id, unsigned long _task)
27007 +{
27008 +	struct task_struct *t = (struct task_struct*) _task;
27009 +	struct st_event_record* rec = get_record(ST_PARAM, t);
27010 +	if (rec) {
27011 +		rec->data.param.wcet      = get_exec_cost(t);
27012 +		rec->data.param.period    = get_rt_period(t);
27013 +		rec->data.param.phase     = get_rt_phase(t);
27014 +		rec->data.param.partition = get_partition(t);
27015 +		rec->data.param.class     = get_class(t);
27016 +		put_record(rec);
27017 +	}
27018 +}
27019 +
27020 +feather_callback void do_sched_trace_task_release(unsigned long id, unsigned long _task)
27021 +{
27022 +	struct task_struct *t = (struct task_struct*) _task;
27023 +	struct st_event_record* rec = get_record(ST_RELEASE, t);
27024 +	if (rec) {
27025 +		rec->data.release.release  = get_release(t);
27026 +		rec->data.release.deadline = get_deadline(t);
27027 +		put_record(rec);
27028 +	}
27029 +}
27030 +
27031 +/* skipped: st_assigned_data, we don't use it atm */
27032 +
27033 +feather_callback void do_sched_trace_task_switch_to(unsigned long id,
27034 +						    unsigned long _task)
27035 +{
27036 +	struct task_struct *t = (struct task_struct*) _task;
27037 +	struct st_event_record* rec;
27038 +	//if (is_realtime(t))  /* comment out to trace EVERYTHING */
27039 +	{
27040 +		rec = get_record(ST_SWITCH_TO, t);
27041 +		if (rec) {
27042 +			rec->data.switch_to.when      = now();
27043 +			rec->data.switch_to.exec_time = get_exec_time(t);
27044 +			put_record(rec);
27045 +		}
27046 +	}
27047 +}
27048 +
27049 +feather_callback void do_sched_trace_task_switch_away(unsigned long id,
27050 +						      unsigned long _task)
27051 +{
27052 +	struct task_struct *t = (struct task_struct*) _task;
27053 +	struct st_event_record* rec;
27054 +	//if (is_realtime(t))  /* comment out to trace EVERYTHING */
27055 +	{
27056 +		rec = get_record(ST_SWITCH_AWAY, t);
27057 +		if (rec) {
27058 +			rec->data.switch_away.when      = now();
27059 +			rec->data.switch_away.exec_time = get_exec_time(t);
27060 +			put_record(rec);
27061 +		}
27062 +	}
27063 +}
27064 +
27065 +feather_callback void do_sched_trace_task_completion(unsigned long id,
27066 +						     unsigned long _task,
27067 +						     unsigned long forced)
27068 +{
27069 +	struct task_struct *t = (struct task_struct*) _task;
27070 +	struct st_event_record* rec = get_record(ST_COMPLETION, t);
27071 +	if (rec) {
27072 +		rec->data.completion.when   = now();
27073 +		rec->data.completion.forced = forced;
27074 +#ifdef LITMUS_NVIDIA
27075 +		rec->data.completion.nv_int_count = (u16)atomic_read(&tsk_rt(t)->nv_int_count);
27076 +#endif
27077 +		put_record(rec);
27078 +	}
27079 +}
27080 +
27081 +feather_callback void do_sched_trace_task_block(unsigned long id,
27082 +						unsigned long _task)
27083 +{
27084 +	struct task_struct *t = (struct task_struct*) _task;
27085 +	struct st_event_record* rec = get_record(ST_BLOCK, t);
27086 +	if (rec) {
27087 +		rec->data.block.when      = now();
27088 +		put_record(rec);
27089 +	}
27090 +}
27091 +
27092 +feather_callback void do_sched_trace_task_resume(unsigned long id,
27093 +						 unsigned long _task)
27094 +{
27095 +	struct task_struct *t = (struct task_struct*) _task;
27096 +	struct st_event_record* rec = get_record(ST_RESUME, t);
27097 +	if (rec) {
27098 +		rec->data.resume.when      = now();
27099 +		put_record(rec);
27100 +	}
27101 +}
27102 +
27103 +feather_callback void do_sched_trace_sys_release(unsigned long id,
27104 +						 unsigned long _start)
27105 +{
27106 +	lt_t *start = (lt_t*) _start;
27107 +	struct st_event_record* rec = get_record(ST_SYS_RELEASE, NULL);
27108 +	if (rec) {
27109 +		rec->data.sys_release.when    = now();
27110 +		rec->data.sys_release.release = *start;
27111 +		put_record(rec);
27112 +	}
27113 +}
27114 +
27115 +feather_callback void do_sched_trace_action(unsigned long id,
27116 +					    unsigned long _task,
27117 +					    unsigned long action)
27118 +{
27119 +	struct task_struct *t = (struct task_struct*) _task;
27120 +	struct st_event_record* rec = get_record(ST_ACTION, t);
27121 +
27122 +	if (rec) {
27123 +		rec->data.action.when   = now();
27124 +		rec->data.action.action = action;
27125 +		put_record(rec);
27126 +	}
27127 +}
27128 +
27129 +
27130 +
27131 +
27132 +feather_callback void do_sched_trace_prediction_err(unsigned long id,
27133 +													unsigned long _task,
27134 +													unsigned long _distance,
27135 +													unsigned long _rel_err)
27136 +{
27137 +	struct task_struct *t = (struct task_struct*) _task;
27138 +	struct st_event_record *rec = get_record(ST_PREDICTION_ERR, t);
27139 +
27140 +	if (rec) {
27141 +		gpu_migration_dist_t* distance = (gpu_migration_dist_t*) _distance;
27142 +		fp_t* rel_err = (fp_t*) _rel_err;
27143 +
27144 +		rec->data.prediction_err.distance = *distance;
27145 +		rec->data.prediction_err.rel_err = rel_err->val;
27146 +		put_record(rec);
27147 +	}
27148 +}
27149 +
27150 +
27151 +feather_callback void do_sched_trace_migration(unsigned long id,
27152 +													unsigned long _task,
27153 +													unsigned long _mig_info)
27154 +{
27155 +	struct task_struct *t = (struct task_struct*) _task;
27156 +	struct st_event_record *rec = get_record(ST_MIGRATION, t);
27157 +
27158 +	if (rec) {
27159 +		struct migration_info* mig_info = (struct migration_info*) _mig_info;
27160 +
27161 +		rec->hdr.extra = mig_info->distance;
27162 +		rec->data.migration.observed = mig_info->observed;
27163 +		rec->data.migration.estimated = mig_info->estimated;
27164 +
27165 +		put_record(rec);
27166 +	}
27167 +}
27168 +
27169 +
27170 +
27171 +
27172 +
27173 +
27174 +
27175 +
27176 +
27177 +feather_callback void do_sched_trace_tasklet_release(unsigned long id,
27178 +												   unsigned long _owner)
27179 +{
27180 +	struct task_struct *t = (struct task_struct*) _owner;
27181 +	struct st_event_record *rec = get_record(ST_TASKLET_RELEASE, t);
27182 +
27183 +	if (rec) {
27184 +		rec->data.tasklet_release.when = now();
27185 +		put_record(rec);
27186 +	}
27187 +}
27188 +
27189 +
27190 +feather_callback void do_sched_trace_tasklet_begin(unsigned long id,
27191 +												   unsigned long _owner)
27192 +{
27193 +	struct task_struct *t = (struct task_struct*) _owner;
27194 +	struct st_event_record *rec = get_record(ST_TASKLET_BEGIN, t);
27195 +
27196 +	if (rec) {
27197 +		rec->data.tasklet_begin.when = now();
27198 +
27199 +		if(!in_interrupt())
27200 +			rec->data.tasklet_begin.exe_pid = current->pid;
27201 +		else
27202 +			rec->data.tasklet_begin.exe_pid = 0;
27203 +
27204 +		put_record(rec);
27205 +	}
27206 +}
27207 +EXPORT_SYMBOL(do_sched_trace_tasklet_begin);
27208 +
27209 +
27210 +feather_callback void do_sched_trace_tasklet_end(unsigned long id,
27211 +												 unsigned long _owner,
27212 +												 unsigned long _flushed)
27213 +{
27214 +	struct task_struct *t = (struct task_struct*) _owner;
27215 +	struct st_event_record *rec = get_record(ST_TASKLET_END, t);
27216 +
27217 +	if (rec) {
27218 +		rec->data.tasklet_end.when = now();
27219 +		rec->data.tasklet_end.flushed = _flushed;
27220 +
27221 +		if(!in_interrupt())
27222 +			rec->data.tasklet_end.exe_pid = current->pid;
27223 +		else
27224 +			rec->data.tasklet_end.exe_pid = 0;
27225 +
27226 +		put_record(rec);
27227 +	}
27228 +}
27229 +EXPORT_SYMBOL(do_sched_trace_tasklet_end);
27230 +
27231 +
27232 +feather_callback void do_sched_trace_work_release(unsigned long id,
27233 +													 unsigned long _owner)
27234 +{
27235 +	struct task_struct *t = (struct task_struct*) _owner;
27236 +	struct st_event_record *rec = get_record(ST_WORK_RELEASE, t);
27237 +
27238 +	if (rec) {
27239 +		rec->data.work_release.when = now();
27240 +		put_record(rec);
27241 +	}
27242 +}
27243 +
27244 +
27245 +feather_callback void do_sched_trace_work_begin(unsigned long id,
27246 +												unsigned long _owner,
27247 +												unsigned long _exe)
27248 +{
27249 +	struct task_struct *t = (struct task_struct*) _owner;
27250 +	struct st_event_record *rec = get_record(ST_WORK_BEGIN, t);
27251 +
27252 +	if (rec) {
27253 +		struct task_struct *exe = (struct task_struct*) _exe;
27254 +		rec->data.work_begin.exe_pid = exe->pid;
27255 +		rec->data.work_begin.when = now();
27256 +		put_record(rec);
27257 +	}
27258 +}
27259 +EXPORT_SYMBOL(do_sched_trace_work_begin);
27260 +
27261 +
27262 +feather_callback void do_sched_trace_work_end(unsigned long id,
27263 +											  unsigned long _owner,
27264 +											  unsigned long _exe,
27265 +											  unsigned long _flushed)
27266 +{
27267 +	struct task_struct *t = (struct task_struct*) _owner;
27268 +	struct st_event_record *rec = get_record(ST_WORK_END, t);
27269 +
27270 +	if (rec) {
27271 +		struct task_struct *exe = (struct task_struct*) _exe;
27272 +		rec->data.work_end.exe_pid = exe->pid;
27273 +		rec->data.work_end.flushed = _flushed;
27274 +		rec->data.work_end.when = now();
27275 +		put_record(rec);
27276 +	}
27277 +}
27278 +EXPORT_SYMBOL(do_sched_trace_work_end);
27279 +
27280 +
27281 +feather_callback void do_sched_trace_eff_prio_change(unsigned long id,
27282 +											  unsigned long _task,
27283 +											  unsigned long _inh)
27284 +{
27285 +	struct task_struct *t = (struct task_struct*) _task;
27286 +	struct st_event_record *rec = get_record(ST_EFF_PRIO_CHANGE, t);
27287 +
27288 +	if (rec) {
27289 +		struct task_struct *inh = (struct task_struct*) _inh;
27290 +		rec->data.effective_priority_change.when = now();
27291 +		rec->data.effective_priority_change.inh_pid = (inh != NULL) ?
27292 +			inh->pid :
27293 +			0xffff;
27294 +
27295 +		put_record(rec);
27296 +	}
27297 +}
27298 +
27299 +/* pray for no nesting of nv interrupts on same CPU... */
27300 +struct tracing_interrupt_map
27301 +{
27302 +	int active;
27303 +	int count;
27304 +	unsigned long data[128]; // assume nesting less than 128...
27305 +	unsigned long serial[128];
27306 +};
27307 +DEFINE_PER_CPU(struct tracing_interrupt_map, active_interrupt_tracing);
27308 +
27309 +
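+/* Per-CPU serial counter for NV interrupt records; advanced by
+ * num_online_cpus() per interrupt in do_sched_trace_nv_interrupt_begin()
+ * and echoed in the matching ST_NV_INTERRUPT_END record.
+ */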
27310 +DEFINE_PER_CPU(u32, intCounter);
27311 +
27312 +feather_callback void do_sched_trace_nv_interrupt_begin(unsigned long id,
27313 +												unsigned long _device)
27314 +{
27315 +	struct st_event_record *rec;
27316 +	u32 serialNum;
27317 +
27318 +	{
27319 +		u32* serial;
27320 +		struct tracing_interrupt_map* int_map = &per_cpu(active_interrupt_tracing, smp_processor_id());
27321 +		if (int_map->active == 0xcafebabe)
27322 +		{
27323 +			int_map->count++;
27324 +		}
27325 +		else
27326 +		{
27327 +			int_map->active = 0xcafebabe;
27328 +			int_map->count = 1;
27329 +		}
27330 +		//int_map->data[int_map->count-1] = _device;
27331 +
27332 +		serial = &per_cpu(intCounter, smp_processor_id());
27333 +		*serial += num_online_cpus();
27334 +		serialNum = *serial;
27335 +		int_map->serial[int_map->count-1] = serialNum;
27336 +	}
27337 +
27338 +	rec = get_record(ST_NV_INTERRUPT_BEGIN, NULL);
27339 +	if(rec) {
27340 +		u32 device = _device;
27341 +		rec->data.nv_interrupt_begin.when = now();
27342 +		rec->data.nv_interrupt_begin.device = device;
27343 +		rec->data.nv_interrupt_begin.serialNumber = serialNum;
27344 +		put_record(rec);
27345 +	}
27346 +}
27347 +EXPORT_SYMBOL(do_sched_trace_nv_interrupt_begin);
27348 +
27349 +/*
27350 +int is_interrupt_tracing_active(void)
27351 +{
27352 +	struct tracing_interrupt_map* int_map = &per_cpu(active_interrupt_tracing, smp_processor_id());
27353 +	if(int_map->active == 0xcafebabe)
27354 +		return 1;
27355 +	return 0;
27356 +}
27357 +*/
27358 +
27359 +feather_callback void do_sched_trace_nv_interrupt_end(unsigned long id, unsigned long _device)
27360 +{
27361 +	struct tracing_interrupt_map* int_map = &per_cpu(active_interrupt_tracing, smp_processor_id());
27362 +	if(int_map->active == 0xcafebabe)
27363 +	{
27364 +		struct st_event_record *rec = get_record(ST_NV_INTERRUPT_END, NULL);
27365 +
27366 +		int_map->count--;
27367 +		if(int_map->count == 0)
27368 +			int_map->active = 0;
27369 +
27370 +		if(rec) {
27371 +			u32 device = _device;
27372 +			rec->data.nv_interrupt_end.when = now();
27373 +			//rec->data.nv_interrupt_end.device = int_map->data[int_map->count];
27374 +			rec->data.nv_interrupt_end.device = device;
27375 +			rec->data.nv_interrupt_end.serialNumber = int_map->serial[int_map->count];
27376 +			put_record(rec);
27377 +		}
27378 +	}
27379 +}
27380 +EXPORT_SYMBOL(do_sched_trace_nv_interrupt_end);
27381 +
27382 +
27383 +
27384 +
27385 +
27386 +
27387 +
27388 +
27389 +
27390 diff --git a/litmus/sched_trace.c b/litmus/sched_trace.c
27391 new file mode 100644
27392 index 0000000..f4171fd
27393 --- /dev/null
27394 +++ b/litmus/sched_trace.c
27395 @@ -0,0 +1,252 @@
27396 +/*
27397 + * sched_trace.c -- record scheduling events to a byte stream.
27398 + */
27399 +#include <linux/spinlock.h>
27400 +#include <linux/mutex.h>
27401 +
27402 +#include <linux/fs.h>
27403 +#include <linux/slab.h>
27404 +#include <linux/miscdevice.h>
27405 +#include <asm/uaccess.h>
27406 +#include <linux/module.h>
27407 +#include <linux/sysrq.h>
27408 +
27409 +#include <linux/kfifo.h>
27410 +
27411 +#include <litmus/sched_trace.h>
27412 +#include <litmus/litmus.h>
27413 +
27414 +#define SCHED_TRACE_NAME "litmus/log"
27415 +
27416 +/* Compute size of TRACE() buffer */
27417 +#define LITMUS_TRACE_BUF_SIZE (1 << CONFIG_SCHED_DEBUG_TRACE_SHIFT)
27418 +
27419 +/* Max length of one read from the buffer */
27420 +#define MAX_READ_LEN (64 * 1024)
27421 +
27422 +/* Max length for one write --- by TRACE() --- to the buffer. This is used to
27423 + * allocate a per-cpu buffer for printf() formatting. */
27424 +#define MSG_SIZE 255
27425 +
27426 +
27427 +static DEFINE_MUTEX(reader_mutex);
27428 +static atomic_t reader_cnt = ATOMIC_INIT(0);
27429 +static DEFINE_KFIFO(debug_buffer, char, LITMUS_TRACE_BUF_SIZE);
27430 +
27431 +
27432 +static DEFINE_RAW_SPINLOCK(log_buffer_lock);
27433 +static DEFINE_PER_CPU(char[MSG_SIZE], fmt_buffer);
27434 +
27435 +/*
27436 + * sched_trace_log_message - Write to the trace buffer (log_buffer)
27437 + *
27438 + * This is the only function accessing the log_buffer from inside the
27439 + * kernel for writing.
27440 + * Concurrent access to sched_trace_log_message must be serialized using
27441 + * log_buffer_lock.
27442 + * The maximum length of a formatted message is MSG_SIZE (255) characters.
27443 + */
27444 +void sched_trace_log_message(const char* fmt, ...)
27445 +{
27446 +	unsigned long 	flags;
27447 +	va_list 	args;
27448 +	size_t		len;
27449 +	char*		buf;
27450 +
27451 +	if (!atomic_read(&reader_cnt))
27452 +		/* early exit if nobody is listening */
27453 +		return;
27454 +
27455 +	va_start(args, fmt);
27456 +	local_irq_save(flags);
27457 +
27458 +	/* format message */
27459 +	buf = __get_cpu_var(fmt_buffer);
27460 +	len = vscnprintf(buf, MSG_SIZE, fmt, args);
27461 +
27462 +	raw_spin_lock(&log_buffer_lock);
27463 +	/* Don't copy the trailing null byte, we don't want null bytes in a
27464 +	 * text file.
27465 +	 */
27466 +	kfifo_in(&debug_buffer, buf, len);
27467 +	raw_spin_unlock(&log_buffer_lock);
27468 +
27469 +	local_irq_restore(flags);
27470 +	va_end(args);
27471 +}
27472 +
27473 +
27474 +/*
27475 + * log_read - Read the trace buffer
27476 + *
27477 + * This function is called as a file operation from userspace.
27478 + * Readers can sleep. Access is serialized through reader_mutex
27479 + */
27480 +static ssize_t log_read(struct file *filp,
27481 +			char __user *to, size_t len,
27482 +			loff_t *f_pos)
27483 +{
27484 +	/* we ignore f_pos, this is strictly sequential */
27485 +
27486 +	ssize_t error = -EINVAL;
27487 +	char* mem;
27488 +
27489 +	if (mutex_lock_interruptible(&reader_mutex)) {
27490 +		error = -ERESTARTSYS;
27491 +		goto out;
27492 +	}
27493 +
27494 +	if (len > MAX_READ_LEN)
27495 +		len = MAX_READ_LEN;
27496 +
27497 +	mem = kmalloc(len, GFP_KERNEL);
27498 +	if (!mem) {
27499 +		error = -ENOMEM;
27500 +		goto out_unlock;
27501 +	}
27502 +
27503 +	error = kfifo_out(&debug_buffer, mem, len);
27504 +	while (!error) {
27505 +		set_current_state(TASK_INTERRUPTIBLE);
27506 +		schedule_timeout(110);
27507 +		if (signal_pending(current))
27508 +			error = -ERESTARTSYS;
27509 +		else
27510 +			error = kfifo_out(&debug_buffer, mem, len);
27511 +	}
27512 +
27513 +	if (error > 0 && copy_to_user(to, mem, error))
27514 +		error = -EFAULT;
27515 +
27516 +	kfree(mem);
27517 + out_unlock:
27518 +	mutex_unlock(&reader_mutex);
27519 + out:
27520 +	return error;
27521 +}
27522 +
27523 +/*
27524 + * Enable redirection of printk() messages to the trace buffer.
27525 + * Defined in kernel/printk.c
27526 + */
27527 +extern int trace_override;
27528 +extern int trace_recurse;
27529 +
27530 +/*
27531 + * log_open - open the global log message ring buffer.
27532 + */
27533 +static int log_open(struct inode *in, struct file *filp)
27534 +{
27535 +	int error = -EINVAL;
27536 +
27537 +	if (mutex_lock_interruptible(&reader_mutex)) {
27538 +		error = -ERESTARTSYS;
27539 +		goto out;
27540 +	}
27541 +
27542 +	atomic_inc(&reader_cnt);
27543 +	error = 0;
27544 +
27545 +	printk(KERN_DEBUG
27546 +	       "sched_trace kfifo with buffer starting at: 0x%p\n",
27547 +	       debug_buffer.buf);
27548 +
27549 +	/* override printk() */
27550 +	trace_override++;
27551 +
27552 +	mutex_unlock(&reader_mutex);
27553 + out:
27554 +	return error;
27555 +}
27556 +
27557 +static int log_release(struct inode *in, struct file *filp)
27558 +{
27559 +	int error = -EINVAL;
27560 +
27561 +	if (mutex_lock_interruptible(&reader_mutex)) {
27562 +		error = -ERESTARTSYS;
27563 +		goto out;
27564 +	}
27565 +
27566 +	atomic_dec(&reader_cnt);
27567 +
27568 +	/* release printk() overriding */
27569 +	trace_override--;
27570 +
27571 +	printk(KERN_DEBUG "sched_trace kfifo released\n");
27572 +
27573 +	mutex_unlock(&reader_mutex);
27574 + out:
27575 +	return error;
27576 +}
27577 +
27578 +/*
27579 + * log_fops  - The file operations for accessing the global LITMUS log message
27580 + *             buffer.
27581 + *
27582 + * Except for opening the device file it uses the same operations as trace_fops.
27583 + */
27584 +static struct file_operations log_fops = {
27585 +	.owner   = THIS_MODULE,
27586 +	.open    = log_open,
27587 +	.release = log_release,
27588 +	.read    = log_read,
27589 +};
27590 +
27591 +static struct miscdevice litmus_log_dev = {
27592 +	.name    = SCHED_TRACE_NAME,
27593 +	.minor   = MISC_DYNAMIC_MINOR,
27594 +	.fops    = &log_fops,
27595 +};
27596 +
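+/* The TRACE() log is exposed through the misc device registered below
+ * (SCHED_TRACE_NAME, i.e. "litmus/log").  With devtmpfs/udev this typically
+ * shows up as /dev/litmus/log and can be drained with, e.g.,
+ * `cat /dev/litmus/log` (the exact path depends on the userspace setup).
+ */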
27597 +#ifdef CONFIG_MAGIC_SYSRQ
27598 +void dump_trace_buffer(int max)
27599 +{
27600 +	char line[80];
27601 +	int len;
27602 +	int count = 0;
27603 +
27604 +	/* potential, but very unlikely, race... */
27605 +	trace_recurse = 1;
27606 +	while ((max == 0 || count++ < max) &&
27607 +	       (len = kfifo_out(&debug_buffer, line, sizeof(line) - 1)) > 0) {
27608 +		line[len] = '\0';
27609 +		printk("%s", line);
27610 +	}
27611 +	trace_recurse = 0;
27612 +}
27613 +
27614 +static void sysrq_dump_trace_buffer(int key)
27615 +{
27616 +	dump_trace_buffer(100);
27617 +}
27618 +
27619 +static struct sysrq_key_op sysrq_dump_trace_buffer_op = {
27620 +	.handler	= sysrq_dump_trace_buffer,
27621 +	.help_msg	= "dump-trace-buffer(Y)",
27622 +	.action_msg	= "writing content of TRACE() buffer",
27623 +};
27624 +#endif
27625 +
27626 +static int __init init_sched_trace(void)
27627 +{
27628 +	printk("Initializing TRACE() device\n");
27629 +
27630 +#ifdef CONFIG_MAGIC_SYSRQ
27631 +	/* offer some debugging help */
27632 +	if (!register_sysrq_key('y', &sysrq_dump_trace_buffer_op))
27633 +		printk("Registered dump-trace-buffer(Y) magic sysrq.\n");
27634 +	else
27635 +		printk("Could not register dump-trace-buffer(Y) magic sysrq.\n");
27636 +#endif
27637 +
27638 +	return misc_register(&litmus_log_dev);
27639 +}
27640 +
27641 +static void __exit exit_sched_trace(void)
27642 +{
27643 +	misc_deregister(&litmus_log_dev);
27644 +}
27645 +
27646 +module_init(init_sched_trace);
27647 +module_exit(exit_sched_trace);
27648 diff --git a/litmus/sched_trace_external.c b/litmus/sched_trace_external.c
27649 new file mode 100644
27650 index 0000000..cf8e1d7
27651 --- /dev/null
27652 +++ b/litmus/sched_trace_external.c
27653 @@ -0,0 +1,64 @@
27654 +#include <linux/module.h>
27655 +
27656 +#include <litmus/trace.h>
27657 +#include <litmus/sched_trace.h>
27658 +#include <litmus/litmus.h>
27659 +
27660 +void __sched_trace_tasklet_begin_external(struct task_struct* t)
27661 +{
27662 +	sched_trace_tasklet_begin(t);
27663 +}
27664 +EXPORT_SYMBOL(__sched_trace_tasklet_begin_external);
27665 +
27666 +void __sched_trace_tasklet_end_external(struct task_struct* t, unsigned long flushed)
27667 +{
27668 +	sched_trace_tasklet_end(t, flushed);
27669 +}
27670 +EXPORT_SYMBOL(__sched_trace_tasklet_end_external);
27671 +
27672 +
27673 +
27674 +void __sched_trace_work_begin_external(struct task_struct* t, struct task_struct* e)
27675 +{
27676 +	sched_trace_work_begin(t, e);
27677 +}
27678 +EXPORT_SYMBOL(__sched_trace_work_begin_external);
27679 +
27680 +void __sched_trace_work_end_external(struct task_struct* t, struct task_struct* e, unsigned long f)
27681 +{
27682 +	sched_trace_work_end(t, e, f);
27683 +}
27684 +EXPORT_SYMBOL(__sched_trace_work_end_external);
27685 +
27686 +
27687 +
27688 +void __sched_trace_nv_interrupt_begin_external(u32 device)
27689 +{
27690 +	//unsigned long _device = device;
27691 +	sched_trace_nv_interrupt_begin((unsigned long)device);
27692 +}
27693 +EXPORT_SYMBOL(__sched_trace_nv_interrupt_begin_external);
27694 +
27695 +void __sched_trace_nv_interrupt_end_external(u32 device)
27696 +{
27697 +	//unsigned long _device = device;
27698 +	sched_trace_nv_interrupt_end((unsigned long)device);
27699 +}
27700 +EXPORT_SYMBOL(__sched_trace_nv_interrupt_end_external);
27701 +
27702 +
27703 +#ifdef CONFIG_LITMUS_NVIDIA
27704 +
27705 +#define EXX_TS(evt) \
27706 +void __##evt(void) { evt; } \
27707 +EXPORT_SYMBOL(__##evt);
27708 +
27709 +EXX_TS(TS_NV_TOPISR_START)
27710 +EXX_TS(TS_NV_TOPISR_END)
27711 +EXX_TS(TS_NV_BOTISR_START)
27712 +EXX_TS(TS_NV_BOTISR_END)
27713 +EXX_TS(TS_NV_RELEASE_BOTISR_START)
27714 +EXX_TS(TS_NV_RELEASE_BOTISR_END)
27715 +
27716 +#endif
27717 +
27718 diff --git a/litmus/srp.c b/litmus/srp.c
27719 new file mode 100644
27720 index 0000000..2ed4ec1
27721 --- /dev/null
27722 +++ b/litmus/srp.c
27723 @@ -0,0 +1,295 @@
27724 +/* ************************************************************************** */
27725 +/*                          STACK RESOURCE POLICY                             */
27726 +/* ************************************************************************** */
27727 +
27728 +#include <asm/atomic.h>
27729 +#include <linux/sched.h>
27730 +#include <linux/wait.h>
27731 +
27732 +#include <litmus/litmus.h>
27733 +#include <litmus/sched_plugin.h>
27734 +#include <litmus/fdso.h>
27735 +#include <litmus/trace.h>
27736 +
27737 +
27738 +#ifdef CONFIG_LITMUS_LOCKING
27739 +
27740 +#include <litmus/srp.h>
27741 +
27742 +srp_prioritization_t get_srp_prio;
27743 +
27744 +struct srp {
27745 +	struct list_head	ceiling;
27746 +	wait_queue_head_t	ceiling_blocked;
27747 +};
27748 +#define system_ceiling(srp) list2prio(srp->ceiling.next)
27749 +#define ceiling2sem(c) container_of(c, struct srp_semaphore, ceiling)
27750 +
27751 +#define UNDEF_SEM -2
27752 +
27753 +atomic_t srp_objects_in_use = ATOMIC_INIT(0);
27754 +
27755 +DEFINE_PER_CPU(struct srp, srp);
27756 +
27757 +/* Initialize SRP semaphores at boot time. */
27758 +static int __init srp_init(void)
27759 +{
27760 +	int i;
27761 +
27762 +	printk("Initializing SRP per-CPU ceilings...");
27763 +	for (i = 0; i < NR_CPUS; i++) {
27764 +		init_waitqueue_head(&per_cpu(srp, i).ceiling_blocked);
27765 +		INIT_LIST_HEAD(&per_cpu(srp, i).ceiling);
27766 +	}
27767 +	printk(" done!\n");
27768 +
27769 +	return 0;
27770 +}
27771 +module_init(srp_init);
27772 +
27773 +/* SRP task priority comparison function. Smaller numeric values have higher
27774 + * priority, tie-break is PID. Special case: priority == 0 <=> no priority
27775 + */
27776 +static int srp_higher_prio(struct srp_priority* first,
27777 +			   struct srp_priority* second)
27778 +{
27779 +	if (!first->priority)
27780 +		return 0;
27781 +	else
27782 +		return  !second->priority ||
27783 +			first->priority < second->priority || (
27784 +			first->priority == second->priority &&
27785 +			first->pid < second->pid);
27786 +}
27787 +
27788 +
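+/* A task exceeds the local ceiling iff no ceiling is currently recorded,
+ * its SRP priority is higher than the current system ceiling, or it is
+ * itself the owner of the semaphore that establishes that ceiling.
+ */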
27789 +static int srp_exceeds_ceiling(struct task_struct* first,
27790 +			       struct srp* srp)
27791 +{
27792 +	struct srp_priority prio;
27793 +
27794 +	if (list_empty(&srp->ceiling))
27795 +		return 1;
27796 +	else {
27797 +		prio.pid = first->pid;
27798 +		prio.priority = get_srp_prio(first);
27799 +		return srp_higher_prio(&prio, system_ceiling(srp)) ||
27800 +			ceiling2sem(system_ceiling(srp))->owner == first;
27801 +	}
27802 +}
27803 +
27804 +static void srp_add_prio(struct srp* srp, struct srp_priority* prio)
27805 +{
27806 +	struct list_head *pos;
27807 +	if (in_list(&prio->list)) {
27808 +		printk(KERN_CRIT "WARNING: SRP violation detected, prio is already in "
27809 +		       "ceiling list! cpu=%d, srp=%p\n", smp_processor_id(), ceiling2sem(prio));
27810 +		return;
27811 +	}
27812 +	list_for_each(pos, &srp->ceiling)
27813 +		if (unlikely(srp_higher_prio(prio, list2prio(pos)))) {
27814 +			__list_add(&prio->list, pos->prev, pos);
27815 +			return;
27816 +		}
27817 +
27818 +	list_add_tail(&prio->list, &srp->ceiling);
27819 +}
27820 +
27821 +
27822 +static int lock_srp_semaphore(struct litmus_lock* l)
27823 +{
27824 +	struct srp_semaphore* sem = container_of(l, struct srp_semaphore, litmus_lock);
27825 +
27826 +	if (!is_realtime(current))
27827 +		return -EPERM;
27828 +
27829 +	preempt_disable();
27830 +
27831 +	/* Update ceiling. */
27832 +	srp_add_prio(&__get_cpu_var(srp), &sem->ceiling);
27833 +
27834 +	/* SRP invariant: all resources available */
27835 +	BUG_ON(sem->owner != NULL);
27836 +
27837 +	sem->owner = current;
27838 +	TRACE_CUR("acquired srp 0x%p\n", sem);
27839 +
27840 +	preempt_enable();
27841 +
27842 +	return 0;
27843 +}
27844 +
27845 +static int unlock_srp_semaphore(struct litmus_lock* l)
27846 +{
27847 +	struct srp_semaphore* sem = container_of(l, struct srp_semaphore, litmus_lock);
27848 +	int err = 0;
27849 +
27850 +	preempt_disable();
27851 +
27852 +	if (sem->owner != current) {
27853 +		err = -EINVAL;
27854 +	} else {
27855 +		/* Determine new system priority ceiling for this CPU. */
27856 +		BUG_ON(!in_list(&sem->ceiling.list));
27857 +
27858 +		list_del(&sem->ceiling.list);
27859 +		sem->owner = NULL;
27860 +
27861 +		/* Wake tasks on this CPU, if they exceed current ceiling. */
27862 +		TRACE_CUR("released srp 0x%p\n", sem);
27863 +		wake_up_all(&__get_cpu_var(srp).ceiling_blocked);
27864 +	}
27865 +
27866 +	preempt_enable();
27867 +	return err;
27868 +}
27869 +
27870 +static int open_srp_semaphore(struct litmus_lock* l, void* __user arg)
27871 +{
27872 +	struct srp_semaphore* sem = container_of(l, struct srp_semaphore, litmus_lock);
27873 +	int err = 0;
27874 +	struct task_struct* t = current;
27875 +	struct srp_priority t_prio;
27876 +
27877 +	if (!is_realtime(t))
27878 +		return -EPERM;
27879 +
27880 +	TRACE_CUR("opening SRP semaphore %p, cpu=%d\n", sem, sem->cpu);
27881 +
27882 +	preempt_disable();
27883 +
27884 +	if (sem->owner != NULL)
27885 +		err = -EBUSY;
27886 +
27887 +	if (err == 0) {
27888 +		if (sem->cpu == UNDEF_SEM)
27889 +			sem->cpu = get_partition(t);
27890 +		else if (sem->cpu != get_partition(t))
27891 +			err = -EPERM;
27892 +	}
27893 +
27894 +	if (err == 0) {
27895 +		t_prio.priority = get_srp_prio(t);
27896 +		t_prio.pid      = t->pid;
27897 +		if (srp_higher_prio(&t_prio, &sem->ceiling)) {
27898 +			sem->ceiling.priority = t_prio.priority;
27899 +			sem->ceiling.pid      = t_prio.pid;
27900 +		}
27901 +	}
27902 +
27903 +	preempt_enable();
27904 +
27905 +	return err;
27906 +}
27907 +
27908 +static int close_srp_semaphore(struct litmus_lock* l)
27909 +{
27910 +	struct srp_semaphore* sem = container_of(l, struct srp_semaphore, litmus_lock);
27911 +	int err = 0;
27912 +
27913 +	preempt_disable();
27914 +
27915 +	if (sem->owner == current)
27916 +		unlock_srp_semaphore(l);
27917 +
27918 +	preempt_enable();
27919 +
27920 +	return err;
27921 +}
27922 +
27923 +static void deallocate_srp_semaphore(struct litmus_lock* l)
27924 +{
27925 +	struct srp_semaphore* sem = container_of(l, struct srp_semaphore, litmus_lock);
27926 +	atomic_dec(&srp_objects_in_use);
27927 +	kfree(sem);
27928 +}
27929 +
27930 +static struct litmus_lock_ops srp_lock_ops = {
27931 +	.open   = open_srp_semaphore,
27932 +	.close  = close_srp_semaphore,
27933 +	.lock   = lock_srp_semaphore,
27934 +	.unlock = unlock_srp_semaphore,
27935 +	.deallocate = deallocate_srp_semaphore,
27936 +};
27937 +
27938 +struct srp_semaphore* allocate_srp_semaphore(void)
27939 +{
27940 +	struct srp_semaphore* sem;
27941 +
27942 +	sem = kmalloc(sizeof(*sem), GFP_KERNEL);
27943 +	if (!sem)
27944 +		return NULL;
27945 +
27946 +	INIT_LIST_HEAD(&sem->ceiling.list);
27947 +	sem->ceiling.priority = 0;
27948 +	sem->cpu     = UNDEF_SEM;
27949 +	sem->owner   = NULL;
27950 +
27951 +	sem->litmus_lock.ops = &srp_lock_ops;
27952 +
27953 +	atomic_inc(&srp_objects_in_use);
27954 +	return sem;
27955 +}
27956 +
27957 +static int srp_wake_up(wait_queue_t *wait, unsigned mode, int sync,
27958 +		       void *key)
27959 +{
27960 +	int cpu = smp_processor_id();
27961 +	struct task_struct *tsk = wait->private;
27962 +	if (cpu != get_partition(tsk))
27963 +		TRACE_TASK(tsk, "srp_wake_up on wrong cpu, partition is %d\n",
27964 +			   get_partition(tsk));
27965 +	else if (srp_exceeds_ceiling(tsk, &__get_cpu_var(srp)))
27966 +		return default_wake_function(wait, mode, sync, key);
27967 +	return 0;
27968 +}
27969 +
27970 +static void do_ceiling_block(struct task_struct *tsk)
27971 +{
27972 +	wait_queue_t wait = {
27973 +		.private   = tsk,
27974 +		.func      = srp_wake_up,
27975 +		.task_list = {NULL, NULL}
27976 +	};
27977 +
27978 +	tsk->state = TASK_UNINTERRUPTIBLE;
27979 +	add_wait_queue(&__get_cpu_var(srp).ceiling_blocked, &wait);
27980 +	tsk->rt_param.srp_non_recurse = 1;
27981 +	preempt_enable_no_resched();
27982 +	schedule();
27983 +	preempt_disable();
27984 +	tsk->rt_param.srp_non_recurse = 0;
27985 +	remove_wait_queue(&__get_cpu_var(srp).ceiling_blocked, &wait);
27986 +}
27987 +
27988 +/* Wait for current task priority to exceed system-wide priority ceiling.
27989 + * FIXME: the hotpath should be inline.
27990 + */
27991 +void srp_ceiling_block(void)
27992 +{
27993 +	struct task_struct *tsk = current;
27994 +
27995 +	/* Only applies to real-time tasks, but optimize for RT tasks. */
27996 +	if (unlikely(!is_realtime(tsk)))
27997 +		return;
27998 +
27999 +	/* Avoid recursive ceiling blocking. */
28000 +	if (unlikely(tsk->rt_param.srp_non_recurse))
28001 +		return;
28002 +
28003 +	/* Bail out early if there aren't any SRP resources around. */
28004 +	if (likely(!atomic_read(&srp_objects_in_use)))
28005 +		return;
28006 +
28007 +	preempt_disable();
28008 +	if (!srp_exceeds_ceiling(tsk, &__get_cpu_var(srp))) {
28009 +		TRACE_CUR("is priority ceiling blocked.\n");
28010 +		while (!srp_exceeds_ceiling(tsk, &__get_cpu_var(srp)))
28011 +			do_ceiling_block(tsk);
28012 +		TRACE_CUR("finally exceeds system ceiling.\n");
28013 +	} else
28014 +		TRACE_CUR("is not priority ceiling blocked\n");
28015 +	preempt_enable();
28016 +}
28017 +
28018 +#endif
28019 diff --git a/litmus/sync.c b/litmus/sync.c
28020 new file mode 100644
28021 index 0000000..bf75fde
28022 --- /dev/null
28023 +++ b/litmus/sync.c
28024 @@ -0,0 +1,104 @@
28025 +/* litmus/sync.c - Support for synchronous and asynchronous task system releases.
28026 + *
28027 + *
28028 + */
28029 +
28030 +#include <asm/atomic.h>
28031 +#include <asm/uaccess.h>
28032 +#include <linux/spinlock.h>
28033 +#include <linux/list.h>
28034 +#include <linux/sched.h>
28035 +#include <linux/completion.h>
28036 +
28037 +#include <litmus/litmus.h>
28038 +#include <litmus/sched_plugin.h>
28039 +#include <litmus/jobs.h>
28040 +
28041 +#include <litmus/sched_trace.h>
28042 +
28043 +static DECLARE_COMPLETION(ts_release);
28044 +
28045 +static long do_wait_for_ts_release(void)
28046 +{
28047 +	long ret = 0;
28048 +
28049 +	/* If the interruption races with a release, the completion object
28050 +	 * may have a non-zero counter. To avoid this problem, this should
28051 +	 * be replaced by wait_for_completion().
28052 +	 *
28053 +	 * For debugging purposes, this is interruptible for now.
28054 +	 */
28055 +	ret = wait_for_completion_interruptible(&ts_release);
28056 +
28057 +	return ret;
28058 +}
28059 +
28060 +int count_tasks_waiting_for_release(void)
28061 +{
28062 +	unsigned long flags;
28063 +	int task_count = 0;
28064 +	struct list_head *pos;
28065 +
28066 +	spin_lock_irqsave(&ts_release.wait.lock, flags);
28067 +	list_for_each(pos, &ts_release.wait.task_list) {
28068 +		task_count++;
28069 +	}
28070 +	spin_unlock_irqrestore(&ts_release.wait.lock, flags);
28071 +
28072 +	return task_count;
28073 +}
28074 +
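+/* Release all tasks blocked in sys_wait_for_ts_release(): while holding the
+ * completion's wait-queue lock, set each waiter's release time to
+ * start + its phase, emit the corresponding trace records, and then wake
+ * exactly task_count waiters via complete_n().
+ */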
28075 +static long do_release_ts(lt_t start)
28076 +{
28077 +	int  task_count = 0;
28078 +	unsigned long flags;
28079 +	struct list_head	*pos;
28080 +	struct task_struct 	*t;
28081 +
28082 +
28083 +	spin_lock_irqsave(&ts_release.wait.lock, flags);
28084 +	TRACE("<<<<<< synchronous task system release >>>>>>\n");
28085 +
28086 +	sched_trace_sys_release(&start);
28087 +	list_for_each(pos, &ts_release.wait.task_list) {
28088 +		t = (struct task_struct*) list_entry(pos,
28089 +						     struct __wait_queue,
28090 +						     task_list)->private;
28091 +		task_count++;
28092 +		litmus->release_at(t, start + t->rt_param.task_params.phase);
28093 +		sched_trace_task_release(t);
28094 +	}
28095 +
28096 +	spin_unlock_irqrestore(&ts_release.wait.lock, flags);
28097 +
28098 +	complete_n(&ts_release, task_count);
28099 +
28100 +	return task_count;
28101 +}
28102 +
28103 +
28104 +asmlinkage long sys_wait_for_ts_release(void)
28105 +{
28106 +	long ret = -EPERM;
28107 +	struct task_struct *t = current;
28108 +
28109 +	if (is_realtime(t))
28110 +		ret = do_wait_for_ts_release();
28111 +
28112 +	return ret;
28113 +}
28114 +
28115 +
28116 +asmlinkage long sys_release_ts(lt_t __user *__delay)
28117 +{
28118 +	long ret;
28119 +	lt_t delay;
28120 +
28121 +	/* FIXME: check capabilities... */
28122 +
28123 +	ret = copy_from_user(&delay, __delay, sizeof(delay));
28124 +	if (ret == 0)
28125 +		ret = do_release_ts(litmus_clock() + delay);
28126 +
28127 +	return ret;
28128 +}
28129 diff --git a/litmus/trace.c b/litmus/trace.c
28130 new file mode 100644
28131 index 0000000..3c35c52
28132 --- /dev/null
28133 +++ b/litmus/trace.c
28134 @@ -0,0 +1,225 @@
28135 +#include <linux/sched.h>
28136 +#include <linux/module.h>
28137 +#include <linux/uaccess.h>
28138 +
28139 +#include <litmus/ftdev.h>
28140 +#include <litmus/litmus.h>
28141 +#include <litmus/trace.h>
28142 +
28143 +/******************************************************************************/
28144 +/*                          Allocation                                        */
28145 +/******************************************************************************/
28146 +
28147 +static struct ftdev overhead_dev;
28148 +
28149 +#define trace_ts_buf overhead_dev.minor[0].buf
28150 +
28151 +static unsigned int ts_seq_no = 0;
28152 +
28153 +DEFINE_PER_CPU(atomic_t, irq_fired_count);
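+/* irq_fired_count is presumably incremented from interrupt-entry hooks
+ * elsewhere in this patch (assumption: the increment site is not in this
+ * file).  It is sampled and cleared whenever a timestamp is written, so
+ * post-processing can discard overhead samples that were disturbed by
+ * interrupts.
+ */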
28154 +
28155 +static inline void clear_irq_fired(void)
28156 +{
28157 +	atomic_set(&__raw_get_cpu_var(irq_fired_count), 0);
28158 +}
28159 +
28160 +static inline unsigned int get_and_clear_irq_fired(void)
28161 +{
28162 +	/* This is potentially not atomic  since we might migrate if
28163 +	 * preemptions are not disabled. As a tradeoff between
28164 +	 * accuracy and tracing overheads, this seems acceptable.
28165 +	 * If it proves to be a problem, then one could add a callback
28166 +	 * from the migration code to invalidate irq_fired_count.
28167 +	 */
28168 +	return atomic_xchg(&__raw_get_cpu_var(irq_fired_count), 0);
28169 +}
28170 +
28171 +static inline void __save_irq_flags(struct timestamp *ts)
28172 +{
28173 +	unsigned int irq_count;
28174 +
28175 +	irq_count     = get_and_clear_irq_fired();
28176 +	/* Store how many interrupts occurred. */
28177 +	ts->irq_count = irq_count;
28178 +	/* Extra flag because ts->irq_count overflows quickly. */
28179 +	ts->irq_flag  = irq_count > 0;
28180 +}
28181 +
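+/* Write one timestamp record.  The globally increasing sequence number
+ * (ts_seq_no) lets userspace order records and detect dropped samples:
+ * gaps appear whenever the buffer is full and ft_buffer_start_write() fails.
+ */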
28182 +static inline void __save_timestamp_cpu(unsigned long event,
28183 +					uint8_t type, uint8_t cpu)
28184 +{
28185 +	unsigned int seq_no;
28186 +	struct timestamp *ts;
28187 +	seq_no = fetch_and_inc((int *) &ts_seq_no);
28188 +	if (ft_buffer_start_write(trace_ts_buf, (void**)  &ts)) {
28189 +		ts->event     = event;
28190 +		ts->seq_no    = seq_no;
28191 +		ts->cpu       = cpu;
28192 +		ts->task_type = type;
28193 +		__save_irq_flags(ts);
28194 +		barrier();
28195 +		/* prevent re-ordering of ft_timestamp() */
28196 +		ts->timestamp = ft_timestamp();
28197 +		ft_buffer_finish_write(trace_ts_buf, ts);
28198 +	}
28199 +}
28200 +
28201 +static void __add_timestamp_user(struct timestamp *pre_recorded)
28202 +{
28203 +	unsigned int seq_no;
28204 +	struct timestamp *ts;
28205 +	seq_no = fetch_and_inc((int *) &ts_seq_no);
28206 +	if (ft_buffer_start_write(trace_ts_buf, (void**)  &ts)) {
28207 +		*ts = *pre_recorded;
28208 +		ts->seq_no = seq_no;
28209 +		__save_irq_flags(ts);
28210 +		ft_buffer_finish_write(trace_ts_buf, ts);
28211 +	}
28212 +}
28213 +
28214 +static inline void __save_timestamp(unsigned long event,
28215 +				   uint8_t type)
28216 +{
28217 +	__save_timestamp_cpu(event, type, raw_smp_processor_id());
28218 +}
28219 +
28220 +feather_callback void save_timestamp(unsigned long event)
28221 +{
28222 +	__save_timestamp(event, TSK_UNKNOWN);
28223 +}
28224 +
28225 +feather_callback void save_timestamp_def(unsigned long event,
28226 +					 unsigned long type)
28227 +{
28228 +	__save_timestamp(event, (uint8_t) type);
28229 +}
28230 +
28231 +feather_callback void save_timestamp_task(unsigned long event,
28232 +					  unsigned long t_ptr)
28233 +{
28234 +	int rt = is_realtime((struct task_struct *) t_ptr);
28235 +	__save_timestamp(event, rt ? TSK_RT : TSK_BE);
28236 +}
28237 +
28238 +feather_callback void save_timestamp_cpu(unsigned long event,
28239 +					 unsigned long cpu)
28240 +{
28241 +	__save_timestamp_cpu(event, TSK_UNKNOWN, cpu);
28242 +}
28243 +
28244 +feather_callback void save_task_latency(unsigned long event,
28245 +					unsigned long when_ptr)
28246 +{
28247 +	lt_t now = litmus_clock();
28248 +	lt_t *when = (lt_t*) when_ptr;
28249 +	unsigned int seq_no;
28250 +	int cpu = raw_smp_processor_id();
28251 +	struct timestamp *ts;
28252 +
28253 +	seq_no = fetch_and_inc((int *) &ts_seq_no);
28254 +	if (ft_buffer_start_write(trace_ts_buf, (void**)  &ts)) {
28255 +		ts->event     = event;
28256 +		ts->timestamp = now - *when;
28257 +		ts->seq_no    = seq_no;
28258 +		ts->cpu       = cpu;
28259 +		ts->task_type = TSK_RT;
28260 +		__save_irq_flags(ts);
28261 +		ft_buffer_finish_write(trace_ts_buf, ts);
28262 +	}
28263 +}
28264 +
28265 +/******************************************************************************/
28266 +/*                        DEVICE FILE DRIVER                                  */
28267 +/******************************************************************************/
28268 +
28269 +/*
28270 + * should be 8M; this is the most we can ask of the buddy system allocator
28271 + * (MAX_ORDER), and we might not get even that much
28272 + */
28273 +#define NO_TIMESTAMPS (2 << 16)
28274 +
28275 +static int alloc_timestamp_buffer(struct ftdev* ftdev, unsigned int idx)
28276 +{
28277 +	unsigned int count = NO_TIMESTAMPS;
28278 +
28279 +	/* An overhead-tracing timestamp should be exactly 16 bytes long. */
28280 +	BUILD_BUG_ON(sizeof(struct timestamp) != 16);
28281 +
28282 +	while (count && !trace_ts_buf) {
28283 +		printk("time stamp buffer: trying to allocate %u time stamps.\n", count);
28284 +		ftdev->minor[idx].buf = alloc_ft_buffer(count, sizeof(struct timestamp));
28285 +		count /= 2;
28286 +	}
28287 +	return ftdev->minor[idx].buf ? 0 : -ENOMEM;
28288 +}
28289 +
28290 +static void free_timestamp_buffer(struct ftdev* ftdev, unsigned int idx)
28291 +{
28292 +	free_ft_buffer(ftdev->minor[idx].buf);
28293 +	ftdev->minor[idx].buf = NULL;
28294 +}
28295 +
28296 +static ssize_t write_timestamp_from_user(struct ft_buffer* buf, size_t len,
28297 +					 const char __user *from)
28298 +{
28299 +	ssize_t consumed = 0;
28300 +	struct timestamp ts;
28301 +
28302 +	/* don't give us partial timestamps */
28303 +	if (len % sizeof(ts))
28304 +		return -EINVAL;
28305 +
28306 +	while (len >= sizeof(ts)) {
28307 +		if (copy_from_user(&ts, from, sizeof(ts))) {
28308 +			consumed = -EFAULT;
28309 +			goto out;
28310 +		}
28311 +		len  -= sizeof(ts);
28312 +		from += sizeof(ts);
28313 +		consumed += sizeof(ts);
28314 +
28315 +		__add_timestamp_user(&ts);
28316 +	}
28317 +
28318 +out:
28319 +	return consumed;
28320 +}
28321 +
28322 +static int __init init_ft_overhead_trace(void)
28323 +{
28324 +	int err, cpu;
28325 +
28326 +	printk("Initializing Feather-Trace overhead tracing device.\n");
28327 +	err = ftdev_init(&overhead_dev, THIS_MODULE, 1, "ft_trace");
28328 +	if (err)
28329 +		goto err_out;
28330 +
28331 +	overhead_dev.alloc = alloc_timestamp_buffer;
28332 +	overhead_dev.free  = free_timestamp_buffer;
28333 +	overhead_dev.write = write_timestamp_from_user;
28334 +
28335 +	err = register_ftdev(&overhead_dev);
28336 +	if (err)
28337 +		goto err_dealloc;
28338 +
28339 +	/* initialize IRQ flags */
28340 +	for (cpu = 0; cpu < NR_CPUS; cpu++)  {
28341 +		clear_irq_fired();
28342 +	}
28343 +
28344 +	return 0;
28345 +
28346 +err_dealloc:
28347 +	ftdev_exit(&overhead_dev);
28348 +err_out:
28349 +	printk(KERN_WARNING "Could not register ft_trace module.\n");
28350 +	return err;
28351 +}
28352 +
28353 +static void __exit exit_ft_overhead_trace(void)
28354 +{
28355 +	ftdev_exit(&overhead_dev);
28356 +}
28357 +
28358 +module_init(init_ft_overhead_trace);
28359 +module_exit(exit_ft_overhead_trace);
28360 -- 
28361 1.7.10.4

Attached Files

  • [get | view] (2015-06-04 21:22:59, 244.8 KB) [[attachment:MC2-liblitmus-imx6-rtss15.patch]]
  • [get | view] (2016-05-12 14:35:37, 51.9 KB) [[attachment:MC2-liblitmus-rtss16.patch]]
  • [get | view] (2016-05-12 14:36:06, 190.4 KB) [[attachment:MC2-litmus-rt-rtss16.patch]]
  • [get | view] (2015-07-19 10:27:52, 1119.9 KB) [[attachment:MC2-litmut-rt-imx6-rtss15.patch]]
  • [get | view] (2014-05-27 20:46:19, 58.3 KB) [[attachment:MC2_liblitmus_ipdps15.patch]]
  • [get | view] (2014-05-27 20:45:43, 1044.3 KB) [[attachment:MC2_litmusrt_ipdps15.patch]]
  • [get | view] (2017-04-07 21:48:09, 6099.5 KB) [[attachment:buff_sharing.tar]]
  • [get | view] (2015-01-08 14:20:07, 61.0 KB) [[attachment:feather-trace-patch-against-sched-deadline-v8.patch]]
  • [get | view] (2014-04-01 23:10:10, 38.9 KB) [[attachment:gedf-mp-rtas14.patch]]
  • [get | view] (2012-03-02 20:13:59, 1.9 KB) [[attachment:gpu-klmirqd-liblitmus-rt-ecrts12.patch]]
  • [get | view] (2012-03-02 20:14:25, 389.8 KB) [[attachment:gpu-klmirqd-litmus-rt-ecrts12.patch]]
  • [get | view] (2012-05-26 21:41:34, 418.0 KB) [[attachment:gpusync-rtss12.patch]]
  • [get | view] (2012-05-26 21:42:20, 8.6 KB) [[attachment:gpusync_liblitmus-rtss12.patch]]
  • [get | view] (2013-05-21 15:32:08, 208.6 KB) [[attachment:gpusync_rtss13_liblitmus.patch]]
  • [get | view] (2013-05-21 15:31:32, 779.5 KB) [[attachment:gpusync_rtss13_litmus.patch]]
  • [get | view] (2012-05-26 21:42:41, 71.4 KB) [[attachment:klt_tracker_v1.0.litmus.tgz]]
  • [get | view] (2016-10-13 21:14:05, 19.6 KB) [[attachment:liblitmus-rtas17.patch]]
  • [get | view] (2017-05-01 20:46:22, 90.0 KB) [[attachment:liblitmus-rtns17.patch]]
  • [get | view] (2018-12-11 01:38:53, 49.1 KB) [[attachment:liblitmus-semi-part-with-edfos.patch]]
  • [get | view] (2017-10-09 19:16:09, 304.0 KB) [[attachment:litmus-rt-os-isolation.patch]]
  • [get | view] (2016-10-13 21:13:27, 207.6 KB) [[attachment:litmus-rt-rtas17.patch]]
  • [get | view] (2017-05-01 20:46:40, 207.6 KB) [[attachment:litmus-rt-rtns17.patch]]
  • [get | view] (2018-12-11 01:39:04, 100.5 KB) [[attachment:litmus-rt-semi-part-with-edfos.patch]]
  • [get | view] (2018-06-26 04:31:48, 7.0 KB) [[attachment:mc2_liblitmus_2015.1-rtns18.patch]]
  • [get | view] (2018-06-26 04:31:33, 292.7 KB) [[attachment:mc2_litmus-rt_2015.1-rtns18.patch]]
  • [get | view] (2017-05-01 20:45:10, 2596.9 KB) [[attachment:mcp_study.zip]]
  • [get | view] (2013-07-13 14:11:53, 58.0 KB) [[attachment:omip-ecrts13.patch]]
  • [get | view] (2014-02-19 21:48:33, 17.2 KB) [[attachment:pgmrt-liblitmus-ecrts14.patch]]
  • [get | view] (2014-02-19 21:47:57, 87.8 KB) [[attachment:pgmrt-litmusrt-ecrts14.patch]]
  • [get | view] (2015-01-08 14:22:32, 61.0 KB) [[attachment:sched-deadline-v8-feather-trace-rtas14.patch]]
  • [get | view] (2018-06-26 04:32:13, 2545.1 KB) [[attachment:sched_study_rtns2018.tar.gz]]
  • [get | view] (2017-04-07 21:53:39, 5969.5 KB) [[attachment:seminal.tar]]
  • [get | view] (2017-04-07 21:51:13, 6064.0 KB) [[attachment:shared_libraries.tar]]
  • [get | view] (2013-07-13 13:58:25, 42.7 KB) [[attachment:tracing-and-dflp-rtas13.patch]]