1. 22 Dec, 2017 1 commit
    • blockjob: Pause job on draining any job BDS · ad90feba
      Kevin Wolf authored
      Block jobs already paused themselves when their main BlockBackend
      entered a drained section. This is not good enough: we also want to
      pause a block job, so that it does not submit new requests, if, for
      example, the mirror target node is drained.
      
      This implements .drained_begin/end callbacks in child_job in order to
      consider all block nodes related to the job, and removes the
      BlockBackend callbacks which are unnecessary now because the root of the
      job main BlockBackend is always referenced with a child_job, too.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
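The idea of attaching drain callbacks to every node a job references can be sketched roughly as follows. This is a toy model, not QEMU's code: `ToyJob`, `ToyChild`, `ToyChildRole` and all `toy_*` functions are hypothetical names invented for illustration.

```c
#include <assert.h>

/* Hypothetical stand-ins; none of these are QEMU types. */
typedef struct ToyJob {
    int pause_count;   /* > 0 means the job must not submit new requests */
} ToyJob;

typedef struct ToyChild ToyChild;

typedef struct ToyChildRole {
    void (*drained_begin)(ToyChild *c);
    void (*drained_end)(ToyChild *c);
} ToyChildRole;

struct ToyChild {
    const ToyChildRole *role;
    ToyJob *job;       /* the job that holds this reference to a node */
};

static void toy_job_drained_begin(ToyChild *c)
{
    c->job->pause_count++;
}

static void toy_job_drained_end(ToyChild *c)
{
    assert(c->job->pause_count > 0);
    c->job->pause_count--;
}

/* One role shared by every node the job references (source, target, ...). */
static const ToyChildRole toy_child_job = {
    .drained_begin = toy_job_drained_begin,
    .drained_end   = toy_job_drained_end,
};

/* Draining a node notifies its parent through the child's role. */
static void toy_drain_begin(ToyChild *c) { c->role->drained_begin(c); }
static void toy_drain_end(ToyChild *c)   { c->role->drained_end(c); }

static int toy_job_is_paused(const ToyJob *job)
{
    return job->pause_count > 0;
}
```

In this sketch, draining either the source or the target child pauses the job, and the job stays paused until every drained section that touched one of its nodes has ended.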
  2. 18 Dec, 2017 1 commit
  3. 04 Dec, 2017 1 commit
    • blockjob: Make block_job_pause_all() keep a reference to the jobs · 3d5d319e
      Alberto Garcia authored
      Starting from commit 40840e41 we are
      pausing all block jobs during bdrv_reopen_multiple() to prevent any of
      them from finishing and removing nodes from the graph while they are
      being reopened.
      
      It turns out that pausing a block job doesn't necessarily prevent it
      from finishing: a paused block job can still run its exit function
      from the main loop and call block_job_completed(). The mirror block
      job in particular always goes to the main loop while it is paused (by
      virtue of the bdrv_drained_begin() call in mirror_run()).
      
      Destroying a paused block job during bdrv_reopen_multiple() has two
      consequences:
      
         1) The references to the nodes involved in the job are released,
            possibly destroying some of them. If those nodes were in the
            reopen queue this would trigger the problem originally described
            in commit 40840e41, crashing QEMU.
      
         2) At the end of bdrv_reopen_multiple(), bdrv_drain_all_end() would
            not be doing all necessary bdrv_parent_drained_end() calls.
      
      I can reproduce problem 1) easily with iotest 030 by increasing
      STREAM_BUFFER_SIZE from 512KB to 8MB in block/stream.c, or by tweaking
      the iotest like in this example:
      
         https://lists.gnu.org/archive/html/qemu-block/2017-11/msg00934.html
      
      This patch keeps an additional reference to all block jobs between
      block_job_pause_all() and block_job_resume_all(), guaranteeing that
      they are kept alive.
      Signed-off-by: Alberto Garcia <berto@igalia.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
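A minimal sketch of the fix described above, assuming a simple reference-counted job. The names (`ToyJob`, `toy_pause_all`, `toy_resume_all`, `toy_jobs_freed`) are hypothetical and do not mirror QEMU's actual structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical toy job; not QEMU's BlockJob. */
typedef struct ToyJob {
    int refcnt;
    int pause_count;
    bool completed;
    struct ToyJob *next;
} ToyJob;

static int toy_jobs_freed;   /* counts destroyed jobs, for illustration only */

static void toy_job_ref(ToyJob *job)
{
    job->refcnt++;
}

static void toy_job_unref(ToyJob *job)
{
    assert(job->refcnt > 0);
    if (--job->refcnt == 0) {
        free(job);
        toy_jobs_freed++;
    }
}

static ToyJob *toy_job_new(ToyJob *next)
{
    ToyJob *job = calloc(1, sizeof(*job));
    assert(job != NULL);
    job->refcnt = 1;
    job->next = next;
    return job;
}

/* A paused job can still complete from the main loop and drop its reference. */
static void toy_job_completed(ToyJob *job)
{
    job->completed = true;
    toy_job_unref(job);
}

static void toy_pause_all(ToyJob *head)
{
    for (ToyJob *j = head; j != NULL; j = j->next) {
        toy_job_ref(j);          /* extra reference keeps the job alive */
        j->pause_count++;
    }
}

static void toy_resume_all(ToyJob *head)
{
    ToyJob *next;
    for (ToyJob *j = head; j != NULL; j = next) {
        next = j->next;          /* read before the job may be freed */
        j->pause_count--;
        toy_job_unref(j);        /* drop the reference taken in toy_pause_all() */
    }
}
```

If a job completes between `toy_pause_all()` and `toy_resume_all()`, the extra reference defers its destruction until the resume call, so nothing is freed in the middle of the critical section.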
  4. 29 Nov, 2017 3 commits
  5. 28 Nov, 2017 1 commit
  6. 21 Nov, 2017 1 commit
    • blockjob: do not allow coroutine double entry or entry-after-completion · 4afeffc8
      Jeff Cody authored
      When block_job_sleep_ns() is called, the coroutine is scheduled for
      future execution.  If we allow the job to be re-entered prior to the
      scheduled time, we create a race condition in which a coroutine can be
      entered recursively, or even entered after the coroutine is deleted.
      
      The job->busy flag is used by blockjobs when a coroutine is busy
      executing. The function 'block_job_enter()' obeys the busy flag,
      and will not enter a coroutine if set.  If we sleep a job, we need to
      leave the busy flag set, so that subsequent calls to block_job_enter()
      are prevented.
      
      This changes the prior behavior of block_job_cancel() being able to
      immediately wake up and cancel a job; in practice, this should not be an
      issue, as the coroutine sleep times are generally very small, and the
      cancel will occur the next time the coroutine wakes up.
      
      This fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1508708

      Signed-off-by: Jeff Cody <jcody@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
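The role of the busy flag can be illustrated with a small sketch. It models only the flag logic, not real coroutines or timers, and every name in it (`ToyJob`, `toy_job_enter`, `toy_job_sleep`, and so on) is a hypothetical stand-in.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy job; the fields only mimic the roles described above. */
typedef struct ToyJob {
    bool busy;        /* set while the coroutine runs or has a wakeup pending */
    bool cancelled;
    int entries;      /* number of times the coroutine was (re-)entered */
} ToyJob;

/* Enter the coroutine unless it is busy. Returns true if it was entered. */
static bool toy_job_enter(ToyJob *job)
{
    if (job->busy) {
        return false;
    }
    job->busy = true;
    job->entries++;
    return true;
}

/* Yield at a pause point: nothing is scheduled, so outside entry is allowed. */
static void toy_job_yield(ToyJob *job)
{
    assert(job->busy);   /* only a running job can yield */
    job->busy = false;
}

/* Sleep: a timer will resume the job later, so busy stays set on purpose. */
static void toy_job_sleep(ToyJob *job)
{
    assert(job->busy);   /* only a running job can go to sleep */
    job->busy = true;
}

/* Cancel records the request; a sleeping job only notices it on wakeup. */
static bool toy_job_cancel(ToyJob *job)
{
    job->cancelled = true;
    return toy_job_enter(job);
}
```

With this model, cancelling a sleeping job sets the `cancelled` flag but does not enter the coroutine a second time, which is the behavior change the commit message describes.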
  7. 04 Sep, 2017 1 commit
  8. 26 Jun, 2017 1 commit
  9. 24 May, 2017 10 commits
  10. 11 Apr, 2017 1 commit
  11. 22 Mar, 2017 3 commits
    • blockjob: add devops to blockjob backends · 600ac6a0
      John Snow authored
      This lets us hook into drained_begin and drained_end requests from the
      backend level, which is particularly useful for making sure that all
      jobs associated with a particular node (whether the source or the target)
      receive a drain request.
      Suggested-by: Kevin Wolf <kwolf@redhat.com>
      Signed-off-by: John Snow <jsnow@redhat.com>
      Reviewed-by: Jeff Cody <jcody@redhat.com>
      Message-id: 20170316212351.13797-4-jsnow@redhat.com
      Signed-off-by: Jeff Cody <jcody@redhat.com>
    • blockjob: add block_job_start_shim · e3796a24
      John Snow authored
      The purpose of this shim is to allow us to pause pre-started jobs.
      The purpose of *that* is to allow us to buffer a pause request that
      will be able to take effect before the job ever does any work, allowing
      us to create jobs during a quiescent state (under which they will be
      automatically paused), then resuming the jobs after the critical section
      in any order, either:
      
      (1) -block_job_start
          -block_job_resume (via e.g. drained_end)
      
      (2) -block_job_resume (via e.g. drained_end)
          -block_job_start
      
      The problem that requires a startup wrapper is the idea that a job must
      start in the busy=true state only its first time-- all subsequent entries
      require busy to be false, and the toggling of this state is otherwise
      handled during existing pause and yield points.
      
      The wrapper simply allows us to mandate that a job can "start," set busy
      to true, then immediately pause only if necessary. We could avoid
      requiring a wrapper, but all jobs would need to do it, so it's been
      factored out here.
      Signed-off-by: John Snow <jsnow@redhat.com>
      Reviewed-by: Jeff Cody <jcody@redhat.com>
      Message-id: 20170316212351.13797-2-jsnow@redhat.com
      Signed-off-by: Jeff Cody <jcody@redhat.com>
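The buffered pause described above can be sketched as a small state machine. This is an illustrative model under simplifying assumptions (no coroutines, a single "body" step); `ToyJob` and the `toy_*` functions are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy job; not QEMU's BlockJob. */
typedef struct ToyJob {
    int pause_count;   /* outstanding pause requests */
    bool started;
    bool busy;
    bool paused;       /* parked at a pause point */
    int work_done;     /* number of times the job body ran */
} ToyJob;

static void toy_job_body(ToyJob *job)
{
    job->work_done++;
}

/* The shim: honour a buffered pause before the body does any work. */
static void toy_job_start_shim(ToyJob *job)
{
    job->busy = true;
    if (job->pause_count > 0) {
        job->paused = true;
        job->busy = false;
        return;
    }
    toy_job_body(job);
}

static void toy_job_start(ToyJob *job)
{
    assert(!job->started);
    job->started = true;
    toy_job_start_shim(job);
}

static void toy_job_pause(ToyJob *job)
{
    job->pause_count++;
}

static void toy_job_resume(ToyJob *job)
{
    assert(job->pause_count > 0);
    if (--job->pause_count == 0 && job->started && job->paused) {
        job->paused = false;
        job->busy = true;
        toy_job_body(job);
    }
}
```

In both orderings from the commit message (start then resume, or resume then start), the body in this sketch runs exactly once, and only after the pause has been lifted and the job has been started.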
    • blockjob: avoid recursive AioContext locking · d79df2a2
      Paolo Bonzini authored
      Streaming or any other block job hangs when performed on a block device
      that has a non-default iothread.  This happens because the AioContext
      is acquired twice by block_job_defer_to_main_loop_bh and then released
      only once by BDRV_POLL_WHILE.  (Insert rants on recursive mutexes, which
      unfortunately are a temporary but necessary evil for iothreads at the
      moment).
      
      Luckily, the reason for the double acquisition is simple; the function
      acquires the AioContext for both the job iothread and the BDS iothread,
      in case the BDS iothread was changed while the job was running.  It
      is therefore enough to skip the second acquisition when the two
      AioContexts are one and the same.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Jeff Cody <jcody@redhat.com>
      Message-id: 1490118490-5597-1-git-send-email-pbonzini@redhat.com
      Signed-off-by: Jeff Cody <jcody@redhat.com>
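The fix amounts to a pointer comparison before the second acquisition. The sketch below models a recursive lock as a plain depth counter; `ToyContext` and the `toy_*` functions are hypothetical and only illustrate the control flow, not QEMU's AioContext API.

```c
#include <assert.h>

/* Hypothetical recursive lock, modelled as a nesting depth. */
typedef struct ToyContext {
    int lock_depth;
} ToyContext;

static void toy_ctx_acquire(ToyContext *c)
{
    c->lock_depth++;
}

static void toy_ctx_release(ToyContext *c)
{
    assert(c->lock_depth > 0);
    c->lock_depth--;
}

/*
 * Acquire the job's context and, only if it differs, the node's context.
 * Returns the lock depth of the node's context while both are held; a
 * polling loop that releases the lock once needs this to be exactly 1.
 */
static int toy_defer_to_main_loop(ToyContext *job_ctx, ToyContext *bds_ctx)
{
    int depth;

    toy_ctx_acquire(job_ctx);
    if (bds_ctx != job_ctx) {
        toy_ctx_acquire(bds_ctx);   /* skipped when both are the same */
    }

    depth = bds_ctx->lock_depth;

    if (bds_ctx != job_ctx) {
        toy_ctx_release(bds_ctx);
    }
    toy_ctx_release(job_ctx);
    return depth;
}
```

Without the comparison, passing the same context twice would leave the depth at 2 inside the critical section, so a single release in a polling loop would not actually drop the lock.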
  12. 28 Feb, 2017 5 commits
  13. 31 Jan, 2017 1 commit
  14. 15 Nov, 2016 3 commits
    • blockjob: add block_job_start · 5ccac6f1
      John Snow authored
      Instead of automatically starting jobs at creation time via backup_start
      et al, we'd like to return a job object pointer that can be started
      manually at a later point in time.
      
      For now, add the block_job_start mechanism and start the jobs
      automatically as we have been doing, with conversions job-by-job coming
      in later patches.
      
      Of note: cancellation of unstarted jobs will perform all the normal
      cleanup as if the job had started, particularly abort and clean. The
      only difference is that we will not emit any events, because the job
      never actually started.
      Signed-off-by: John Snow <jsnow@redhat.com>
      Message-id: 1478587839-9834-5-git-send-email-jsnow@redhat.com
      Signed-off-by: Jeff Cody <jcody@redhat.com>
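The split between creating and starting a job, and the rule that cancelling an unstarted job cleans up without emitting events, might look like this in a toy model. All names here (`ToyJob`, `toy_job_create`, `events_emitted`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy job; not QEMU's BlockJob. */
typedef struct ToyJob {
    bool started;
    bool cleaned;
    int events_emitted;
} ToyJob;

/* Creation only sets the job up; it does not run anything. */
static ToyJob toy_job_create(void)
{
    ToyJob job = { false, false, 0 };
    return job;
}

/* Starting is a separate, explicit step taken by the caller. */
static void toy_job_start(ToyJob *job)
{
    assert(!job->started);
    job->started = true;
}

static void toy_job_cancel(ToyJob *job)
{
    job->cleaned = true;           /* cleanup runs whether or not the job started */
    if (job->started) {
        job->events_emitted++;     /* events only for jobs that actually started */
    }
}
```

A caller can therefore create a job, hold on to the pointer, and decide later whether to start it or cancel it.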
    • blockjob: add .clean property · e8a40bf7
      John Snow authored
      Cleaning up after we have deferred to the main thread but before the
      transaction has converged can be dangerous and result in deadlocks
      if the job cleanup invokes any BH polling loops.
      
      A job may attempt to begin cleaning up, but may induce another job to
      enter its cleanup routine. The second job, part of our same transaction,
      will block waiting for the first job to finish, so neither job may now
      make progress.
      
      To rectify this, allow jobs to register a cleanup operation that will
      always run, regardless of whether the job was in a transaction and
      whether the transaction job group completed successfully.
      
      Move sensitive cleanup to this callback instead, which is guaranteed to
      be run only after the transaction has converged; this removes sensitive
      timing constraints from said cleanup.
      
      Furthermore, in future patches these cleanup operations will be performed
      regardless of whether or not we actually started the job. Therefore,
      cleanup callbacks should essentially confine themselves to undoing create
      operations, e.g. setup actions taken in what is now backup_start.
      Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      Signed-off-by: John Snow <jsnow@redhat.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Message-id: 1478587839-9834-3-git-send-email-jsnow@redhat.com
      Signed-off-by: Jeff Cody <jcody@redhat.com>
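A rough sketch of a per-driver cleanup callback that only runs once every job in the transaction has completed. The fixed-size `ToyTxn`, the `ToyDriver` table and the `toy_*` functions are hypothetical simplifications, not QEMU's transaction code.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct ToyJob ToyJob;

/* Hypothetical driver table with an optional cleanup hook. */
typedef struct ToyDriver {
    void (*clean)(ToyJob *job);
} ToyDriver;

struct ToyJob {
    const ToyDriver *driver;
    bool completed;
    bool cleaned;
};

#define TOY_TXN_MAX 8

typedef struct ToyTxn {
    ToyJob *jobs[TOY_TXN_MAX];
    int n;
} ToyTxn;

static void toy_txn_add(ToyTxn *txn, ToyJob *job)
{
    assert(txn->n < TOY_TXN_MAX);
    txn->jobs[txn->n++] = job;
}

/* The transaction has converged once every member job has completed. */
static bool toy_txn_converged(const ToyTxn *txn)
{
    for (int i = 0; i < txn->n; i++) {
        if (!txn->jobs[i]->completed) {
            return false;
        }
    }
    return true;
}

static void toy_job_completed(ToyTxn *txn, ToyJob *job)
{
    job->completed = true;
    if (!toy_txn_converged(txn)) {
        return;                    /* others still running: defer all cleanup */
    }
    for (int i = 0; i < txn->n; i++) {
        ToyJob *j = txn->jobs[i];
        if (j->driver->clean) {
            j->driver->clean(j);   /* safe: no job is waiting on another now */
        }
    }
}

static void toy_backup_clean(ToyJob *job)
{
    job->cleaned = true;
}

static const ToyDriver toy_backup_driver = { .clean = toy_backup_clean };
```

Because no cleanup hook runs while another job in the same transaction is still pending, one job's cleanup cannot end up blocking on a sibling that is itself waiting to clean up.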
    • blockjob: fix dead pointer in txn list · 1e93b9fb
      Vladimir Sementsov-Ogievskiy authored
      Though it is not intended to be reached under normal circumstances,
      if we do not gracefully deconstruct the transaction QLIST, we may wind
      up with stale pointers in the list.
      
      The rest of this series attempts to address the underlying issues,
      but this should fix list inconsistencies.
      Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      Tested-by: John Snow <jsnow@redhat.com>
      Reviewed-by: John Snow <jsnow@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Signed-off-by: John Snow <jsnow@redhat.com>
      Message-id: 1478587839-9834-2-git-send-email-jsnow@redhat.com
      [Rewrote commit message. --js]
      Signed-off-by: John Snow <jsnow@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Signed-off-by: John Snow <jsnow@redhat.com>
      Signed-off-by: Jeff Cody <jcody@redhat.com>
  15. 01 Nov, 2016 5 commits
  16. 31 Oct, 2016 1 commit
  17. 28 Oct, 2016 1 commit