1. 22 Dec, 2017 5 commits
    • Kevin Wolf's avatar
      block: Keep nodes drained between reopen_queue/multiple · 1a63a907
      Kevin Wolf authored
      The bdrv_reopen*() implementation doesn't like it if the graph is
      changed between queuing nodes for reopen and actually reopening them
      (one of the reasons is that queuing can be recursive).
      
      So instead of draining the device only in bdrv_reopen_multiple(),
      require that callers already drained all affected nodes, and assert this
      in bdrv_reopen_queue().
      Signed-off-by: 's avatarKevin Wolf <kwolf@redhat.com>
      Reviewed-by: 's avatarFam Zheng <famz@redhat.com>
      1a63a907
    • Kevin Wolf's avatar
      block: Allow graph changes in subtree drained section · d736f119
      Kevin Wolf authored
      We need to remember how many of the drain sections in which a node is
      were recursive (i.e. subtree drain rather than node drain), so that they
      can be correctly applied when children are added or removed during the
      drained section.
      
      With this change, it is safe to modify the graph even inside a
      bdrv_subtree_drained_begin/end() section.
      Signed-off-by: 's avatarKevin Wolf <kwolf@redhat.com>
      d736f119
    • Kevin Wolf's avatar
      block: Don't notify parents in drain call chain · 0152bf40
      Kevin Wolf authored
      This is in preparation for subtree drains, i.e. drained sections that
      affect not only a single node, but recursively all child nodes, too.
      
      Calling the parent callbacks for drain is pointless when we just came
      from that parent node recursively and leads to multiple increases of
      bs->quiesce_counter in a single drain call. Don't do it.
      
      In order for this to work correctly, the parent callback must be called
      for every bdrv_drain_begin/end() call, not only for the outermost one:
      
      If we have a node N with two parents A and B, recursive draining of A
      should cause the quiesce_counter of B to increase because its child N is
      drained independently of B. If now B is recursively drained, too, A must
      increase its quiesce_counter because N is drained independently of A
      only now, even if N is going from quiesce_counter 1 to 2.
      Signed-off-by: 's avatarKevin Wolf <kwolf@redhat.com>
      0152bf40
    • Fam Zheng's avatar
      block: Open backing image in force share mode for size probe · cc954f01
      Fam Zheng authored
      Management tools create overlays of running guests with qemu-img:
      
        $ qemu-img create -b /image/in/use.qcow2 -f qcow2 /overlay/image.qcow2
      
      but this doesn't work anymore due to image locking:
      
          qemu-img: /overlay/image.qcow2: Failed to get shared "write" lock
          Is another process using the image?
          Could not open backing image to determine size.
      Use the force share option to allow this use case again.
      
      Cc: qemu-stable@nongnu.org
      Signed-off-by: 's avatarFam Zheng <famz@redhat.com>
      Reviewed-by: 's avatarEric Blake <eblake@redhat.com>
      Signed-off-by: 's avatarKevin Wolf <kwolf@redhat.com>
      cc954f01
    • Kevin Wolf's avatar
      block: Formats don't need CONSISTENT_READ with NO_IO · 5fbfabd3
      Kevin Wolf authored
      Commit 1f4ad7d3 fixed 'qemu-img info' for raw images that are currently
      in use as a mirror target. It is not enough for image formats, though,
      as these still unconditionally request BLK_PERM_CONSISTENT_READ.
      
      As this permission is geared towards whether the guest-visible data is
      consistent, and has no impact on whether the metadata is sane, and
      'qemu-img info' does not read guest-visible data (except for the raw
      format), it makes sense to not require BLK_PERM_CONSISTENT_READ if there
      is not going to be any guest I/O performed, regardless of image format.
      Signed-off-by: 's avatarKevin Wolf <kwolf@redhat.com>
      5fbfabd3
  2. 19 Dec, 2017 1 commit
    • Paolo Bonzini's avatar
      block: avoid recursive AioContext acquire in bdrv_inactivate_all() · bd6458e4
      Paolo Bonzini authored
      BDRV_POLL_WHILE() does not support recursive AioContext locking.  It
      only releases the AioContext lock once regardless of how many times the
      caller has acquired it.  This results in a hang since the IOThread does
      not make progress while the AioContext is still locked.
      
      The following steps trigger the hang:
      
        $ qemu-system-x86_64 -M accel=kvm -m 1G -cpu host \
                             -object iothread,id=iothread0 \
                             -device virtio-scsi-pci,iothread=iothread0 \
                             -drive if=none,id=drive0,file=test.img,format=raw \
                             -device scsi-hd,drive=drive0 \
                             -drive if=none,id=drive1,file=test.img,format=raw \
                             -device scsi-hd,drive=drive1
        $ qemu-system-x86_64 ...same options... \
                             -incoming tcp::1234
        (qemu) migrate tcp:127.0.0.1:1234
        ...hang...
      Tested-by: 's avatarStefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: 's avatarPaolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: 's avatarEric Blake <eblake@redhat.com>
      Message-id: 20171207201320.19284-2-stefanha@redhat.com
      Signed-off-by: 's avatarStefan Hajnoczi <stefanha@redhat.com>
      bd6458e4
  3. 21 Nov, 2017 2 commits
    • Alberto Garcia's avatar
      block: Close a BlockDriverState completely even when bs->drv is NULL · 50a3efb0
      Alberto Garcia authored
      bdrv_close() skips much of its logic when bs->drv is NULL. This is
      fine when we're closing a BlockDriverState that has just been created
      (because e.g the initialization process failed), but it's not enough
      in other cases.
      
      For example, when a valid qcow2 image is found to be corrupted then
      QEMU marks it as such in the file header and then sets bs->drv to
      NULL in order to make the BlockDriverState unusable. When that BDS is
      later closed then many of its data structures are not freed (leaking
      their memory) and none of its children are detached. This results in
      bdrv_close_all() failing to close all BDSs and making this assertion
      fail when QEMU is being shut down:
      
         bdrv_close_all: Assertion `QTAILQ_EMPTY(&all_bdrv_states)' failed.
      
      This patch makes bdrv_close() do the full uninitialization process
      in all cases. This fixes the problem with corrupted images and still
      works fine with freshly created BDSs.
      Signed-off-by: 's avatarAlberto Garcia <berto@igalia.com>
      Message-id: 20171106145345.12038-1-berto@igalia.com
      Reviewed-by: 's avatarEric Blake <eblake@redhat.com>
      Signed-off-by: 's avatarMax Reitz <mreitz@redhat.com>
      50a3efb0
    • Kevin Wolf's avatar
      block: Don't use BLK_PERM_CONSISTENT_READ for format probing · dacaa162
      Kevin Wolf authored
      For format probing, we don't really care whether all of the image
      content is consistent. The only thing we're looking at is the image
      header, and specifically the magic numbers that are expected to never
      change, no matter how inconsistent the guest visible disk content is.
      
      Therefore, don't request BLK_PERM_CONSISTENT_READ. This allows to use
      format probing, e.g. in the context of 'qemu-img info', even while the
      guest visible data in the image is inconsistent during a running block
      job.
      Signed-off-by: 's avatarKevin Wolf <kwolf@redhat.com>
      Reviewed-by: 's avatarFam Zheng <famz@redhat.com>
      dacaa162
  4. 17 Nov, 2017 6 commits
    • Max Reitz's avatar
      block: Make bdrv_next() keep strong references · 5e003f17
      Max Reitz authored
      On one hand, it is a good idea for bdrv_next() to return a strong
      reference because ideally nearly every pointer should be refcounted.
      This fixes intermittent failure of iotest 194.
      
      On the other, it is absolutely necessary for bdrv_next() itself to keep
      a strong reference to both the BB (in its first phase) and the BDS (at
      least in the second phase) because when called the next time, it will
      dereference those objects to get a link to the next one.  Therefore, it
      needs these objects to stay around until then.  Just storing the pointer
      to the next in the iterator is not really viable because that pointer
      might become invalid as well.
      
      Both arguments taken together means we should probably just invoke
      bdrv_ref() and blk_ref() in bdrv_next().  This means we have to assert
      that bdrv_next() is always called from the main loop, but that was
      probably necessary already before this patch and judging from the
      callers, it also looks to actually be the case.
      
      Keeping these strong references means however that callers need to give
      them up if they decide to abort the iteration early.  They can do so
      through the new bdrv_next_cleanup() function.
      Suggested-by: 's avatarKevin Wolf <kwolf@redhat.com>
      Signed-off-by: 's avatarMax Reitz <mreitz@redhat.com>
      Message-id: 20171110172545.32609-1-mreitz@redhat.com
      Reviewed-by: 's avatarStefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: 's avatarMax Reitz <mreitz@redhat.com>
      5e003f17
    • Max Reitz's avatar
      block: Guard against NULL bs->drv · d470ad42
      Max Reitz authored
      We currently do not guard everywhere against a NULL bs->drv where we
      should be doing so.  Most of the places fixed here just do not care
      about that case at all.
      
      Some care implicitly, e.g. through a prior function call to
      bdrv_getlength() which would always fail for an ejected BDS.  Add an
      assert there to make it more obvious.
      
      Other places seem to care, but do so insufficiently: Freeing clusters in
      a qcow2 image is an error-free operation, but it may leave the image in
      an unusable state anyway.  Giving qcow2_free_clusters() an error code is
      not really viable, it is much easier to note that bs->drv may be NULL
      even after a successful driver call.  This concerns bdrv_co_flush(), and
      the way the check is added to bdrv_co_pdiscard() (in every iteration
      instead of only once).
      
      Finally, some places employ at least an assert(bs->drv); somewhere, that
      may be reasonable (such as in the reopen code), but in
      bdrv_has_zero_init(), it is definitely not.  Returning 0 there in case
      of an ejected BDS saves us much headache instead.
      Reported-by: 's avatarR. Nageswara Sastry <nasastry@in.ibm.com>
      Buglink: https://bugs.launchpad.net/qemu/+bug/1728660Signed-off-by: 's avatarMax Reitz <mreitz@redhat.com>
      Message-id: 20171110203111.7666-4-mreitz@redhat.com
      Reviewed-by: 's avatarEric Blake <eblake@redhat.com>
      Signed-off-by: 's avatarMax Reitz <mreitz@redhat.com>
      d470ad42
    • Max Reitz's avatar
      block: qobject_is_equal() in bdrv_reopen_prepare() · 54fd1b0d
      Max Reitz authored
      Currently, bdrv_reopen_prepare() assumes that all BDS options are
      strings. However, this is not the case if the BDS has been created
      through the json: pseudo-protocol or blockdev-add.
      
      Note that the user-invokable reopen command is an HMP command, so you
      can only specify strings there. Therefore, specifying a non-string
      option with the "same" value as it was when originally created will now
      return an error because the values are supposedly similar (and there is
      no way for the user to circumvent this but to just not specify the
      option again -- however, this is still strictly better than just
      crashing).
      Signed-off-by: 's avatarMax Reitz <mreitz@redhat.com>
      Reviewed-by: 's avatarEric Blake <eblake@redhat.com>
      Message-id: 20171114180128.17076-5-mreitz@redhat.com
      Signed-off-by: 's avatarMax Reitz <mreitz@redhat.com>
      54fd1b0d
    • Kevin Wolf's avatar
      block: Fix permissions in image activation · dafe0960
      Kevin Wolf authored
      Inactive images generally request less permissions for their image files
      than they would if they were active (in particular, write permissions).
      Activating the image involves extending the permissions, therefore.
      
      drv->bdrv_invalidate_cache() can already require write access to the
      image file, so we have to update the permissions earlier than that.
      The current code does it only later, so we have to move up this part.
      Signed-off-by: 's avatarKevin Wolf <kwolf@redhat.com>
      Reviewed-by: 's avatarMax Reitz <mreitz@redhat.com>
      dafe0960
    • Kevin Wolf's avatar
      block: Deprecate bdrv_set_read_only() and users · 398e6ad0
      Kevin Wolf authored
      bdrv_set_read_only() is used by some block drivers to override the
      read-only option given by the user. This is not how read-only images
      generally work in QEMU: Instead of second guessing what the user really
      meant (which currently includes making an image read-only even if the
      user didn't only use the default, but explicitly said read-only=off), we
      should error out if we can't provide what the user requested.
      
      This adds deprecation warnings to all callers of bdrv_set_read_only() so
      that the behaviour can be corrected after the usual deprecation period.
      Signed-off-by: 's avatarKevin Wolf <kwolf@redhat.com>
      398e6ad0
    • Kevin Wolf's avatar
      block: Fix error path in bdrv_backing_update_filename() · 64730694
      Kevin Wolf authored
      error_setg_errno() takes a positive errno code. Spotted by Coverity
      (CID 1381628).
      Signed-off-by: 's avatarKevin Wolf <kwolf@redhat.com>
      Reviewed-by: 's avatarAlberto Garcia <berto@igalia.com>
      Reviewed-by: 's avatarEric Blake <eblake@redhat.com>
      Reviewed-by: 's avatarStefan Hajnoczi <stefanha@redhat.com>
      64730694
  5. 26 Oct, 2017 1 commit
    • Peter Krempa's avatar
      block: don't add 'driver' to options when referring to backing via node name · 6bff597b
      Peter Krempa authored
      When referring to a backing file of an image via node name
      bdrv_open_backing_file would add the 'driver' option to the option list
      filling it with the backing format driver. This breaks construction of
      the backing chain via -blockdev, as bdrv_open_inherit reports an error
      if both 'reference' and 'options' are provided.
      
      $ qemu-img create -f raw /tmp/backing.raw 64M
      $ qemu-img create -f qcow2 -F raw -b /tmp/backing.raw /tmp/test.qcow2
      $ qemu-system-x86_64 \
        -blockdev driver=file,filename=/tmp/backing.raw,node-name=backing \
        -blockdev driver=qcow2,file.driver=file,file.filename=/tmp/test.qcow2,node-name=root,backing=backing
      qemu-system-x86_64: -blockdev driver=qcow2,file.driver=file,file.filename=/tmp/test.qcow2,node-name=root,backing=backing: Could not open backing file: Cannot reference an existing block device with additional options or a new filename
      Signed-off-by: 's avatarPeter Krempa <pkrempa@redhat.com>
      Signed-off-by: 's avatarKevin Wolf <kwolf@redhat.com>
      6bff597b
  6. 06 Oct, 2017 5 commits
    • Kevin Wolf's avatar
      commit: Remove overlay_bs · bde70715
      Kevin Wolf authored
      We don't need to make any assumptions about the graph layout above the
      top node of the commit operation any more. Remove the use of
      bdrv_find_overlay() and related variables from the commit job code.
      
      bdrv_drop_intermediate() doesn't use the 'active' parameter any more, so
      we can just drop it.
      
      The overlay node was previously added to the block job to get a
      BLK_PERM_GRAPH_MOD. We really need to respect those permissions in
      bdrv_drop_intermediate() now, but as long as we haven't figured out yet
      how BLK_PERM_GRAPH_MOD is actually supposed to work, just leave a TODO
      comment there.
      
      With this change, it is now possible to perform another block job on an
      overlay node without conflicts. qemu-iotests 030 is changed accordingly.
      Signed-off-by: 's avatarKevin Wolf <kwolf@redhat.com>
      Reviewed-by: 's avatarEric Blake <eblake@redhat.com>
      bde70715
    • Kevin Wolf's avatar
      commit: Support multiple roots above top node · 61f09cea
      Kevin Wolf authored
      This changes the commit block job to support operation in a graph where
      there is more than a single active layer that references the top node.
      
      This involves inserting the commit filter node not only on the path
      between the given active node and the top node, but between the top node
      and all of its parents.
      
      On completion, bdrv_drop_intermediate() must consider all parents for
      updating the backing file link. These parents may be backing files
      themselves and as such read-only; reopen them temporarily if necessary.
      Previously this was achieved by the bdrv_reopen() calls in the commit
      block job that made overlay_bs read-write for the whole duration of the
      block job, even though write access is only needed on completion.
      
      Now that we consider all parents, overlay_bs is meaningless. It is left
      in place in this commit, but we'll remove it soon.
      Signed-off-by: 's avatarKevin Wolf <kwolf@redhat.com>
      61f09cea
    • Kevin Wolf's avatar
      block: Introduce BdrvChildRole.update_filename · 6858eba0
      Kevin Wolf authored
      There is no good reason for bdrv_drop_intermediate() to know the active
      layer above the subchain it is operating on - even more so, because
      the assumption that there is a single active layer above it is not
      generally true.
      
      In order to prepare removal of the active parameter, use a BdrvChildRole
      callback to update the backing file string in the overlay image instead
      of directly calling bdrv_change_backing_file().
      Signed-off-by: 's avatarKevin Wolf <kwolf@redhat.com>
      Reviewed-by: 's avatarEric Blake <eblake@redhat.com>
      6858eba0
    • Eric Blake's avatar
      dirty-bitmap: Avoid size query failure during truncate · 1b6cc579
      Eric Blake authored
      We've previously fixed several places where we failed to account
      for possible errors from bdrv_nb_sectors().  Fix another one by
      making bdrv_dirty_bitmap_truncate() take the new size from the
      caller instead of querying itself; then adjust the sole caller
      bdrv_truncate() to pass the size just determined by a successful
      resize, or to reuse the size given to the original truncate
      operation when refresh_total_sectors() was not able to confirm the
      actual size (the two sizes can potentially differ according to
      rounding constraints), thus avoiding sizing the bitmaps to -1.
      This also fixes a bug where not all failure paths in
      bdrv_truncate() would set errp.
      
      Note that bdrv_truncate() is still a bit awkward.  We may want
      to revisit it later and clean up things to better guarantee that
      a resize attempt either fails cleanly up front, or cannot fail
      after guest-visible changes have been made (if temporary changes
      are made, then they need to be cleanly rolled back).  But that
      is a task for another day; for now, the goal is the bare minimum
      fix to ensure that just bdrv_dirty_bitmap_truncate() cannot fail.
      Signed-off-by: 's avatarEric Blake <eblake@redhat.com>
      Reviewed-by: 's avatarVladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      Reviewed-by: 's avatarJohn Snow <jsnow@redhat.com>
      Reviewed-by: 's avatarFam Zheng <famz@redhat.com>
      Signed-off-by: 's avatarKevin Wolf <kwolf@redhat.com>
      1b6cc579
    • Eric Blake's avatar
      block: Make bdrv_img_create() size selection easier to read · a8b42a1c
      Eric Blake authored
      All callers of bdrv_img_create() pass in a size, or -1 to read the
      size from the backing file.  We then set that size as the QemuOpt
      default, which means we will reuse that default rather than the
      final parameter to qemu_opt_get_size() several lines later.  But
      it is rather confusing to read subsequent checks of 'size == -1'
      when it looks (without seeing the full context) like size defaults
      to 0; it also doesn't help that a size of 0 is valid (for some
      formats).
      
      Rework the logic to make things more legible.
      Signed-off-by: 's avatarEric Blake <eblake@redhat.com>
      Reviewed-by: 's avatarJohn Snow <jsnow@redhat.com>
      Reviewed-by: 's avatarKevin Wolf <kwolf@redhat.com>
      Reviewed-by: 's avatarFam Zheng <famz@redhat.com>
      Signed-off-by: 's avatarKevin Wolf <kwolf@redhat.com>
      a8b42a1c
  7. 26 Sep, 2017 5 commits
  8. 04 Sep, 2017 5 commits
  9. 23 Aug, 2017 1 commit
  10. 08 Aug, 2017 3 commits
  11. 01 Aug, 2017 2 commits
    • Manos Pitsidianakis's avatar
      block: fix leaks in bdrv_open_driver() · 180ca19a
      Manos Pitsidianakis authored
      bdrv_open_driver() is called in two places, bdrv_new_open_driver() and
      bdrv_open_common(). In the latter, failure cleanup in is in its caller,
      bdrv_open_inherit(), which unrefs the bs->file of the failed driver open
      if it exists.
      
      Let's move the bs->file cleanup to bdrv_open_driver() to take care of
      all callers and do not set bs->drv to NULL unless the driver's open
      function failed. When bs is destroyed by removing its last reference, it
      calls bdrv_close() which checks bs->drv to perform the needed cleanups
      and also call the driver's close function. Since it cleans up options
      and opaque we must take care not leave dangling pointers.
      
      The error paths in bdrv_open_driver() are now two:
      If open fails, drv->bdrv_close() should not be called. Unref the child
      if it exists, free what we allocated and set bs->drv to NULL. Return the
      error and let callers free their stuff.
      
      If open succeeds but we fail after, return the error and let callers
      unref and delete their bs, while cleaning up their allocations.
      Signed-off-by: 's avatarManos Pitsidianakis <el13635@mail.ntua.gr>
      Signed-off-by: 's avatarKevin Wolf <kwolf@redhat.com>
      180ca19a
    • Manos Pitsidianakis's avatar
      block: fix dangling bs->explicit_options in block.c · 998cbd6a
      Manos Pitsidianakis authored
      In some error paths it is possible to QDECREF a freed dangling
      explicit_options, resulting in a heap overflow crash.  For example
      bdrv_open_inherit()'s fail unrefs it, then calls bdrv_unref which calls
      bdrv_close which also unrefs it.
      Signed-off-by: 's avatarManos Pitsidianakis <el13635@mail.ntua.gr>
      Signed-off-by: 's avatarKevin Wolf <kwolf@redhat.com>
      998cbd6a
  12. 24 Jul, 2017 1 commit
    • Kevin Wolf's avatar
      block: Skip implicit nodes in query-block/blockstats · d3c8c674
      Kevin Wolf authored
      Commits 0db832f4 and 6cdbceb1 introduced the automatic insertion of filter
      nodes above the top layer of mirror and commit block jobs. The
      assumption made there was that since libvirt doesn't do node-level
      management of the block layer yet, it shouldn't be affected by added
      nodes.
      
      This is true as far as commands issued by libvirt are concerned. It only
      uses BlockBackend names to address nodes, so any operations it performs
      still operate on the root of the tree as intended.
      
      However, the assumption breaks down when you consider query commands,
      which return data for the wrong node now. These commands also return
      information on some child nodes (bs->file and/or bs->backing), which
      libvirt does make use of, and which refer to the wrong nodes, too.
      
      One of the consequences is that oVirt gets wrong information about the
      image size and stops the VM in response as long as a mirror or commit
      job is running:
      
      https://bugzilla.redhat.com/show_bug.cgi?id=1470634
      
      This patch fixes the problem by hiding the implicit nodes created
      automatically by the mirror and commit block jobs in the output of
      query-block and BlockBackend-based query-blockstats as long as the user
      doesn't indicate that they are aware of those nodes by providing a node
      name for them in the QMP command to start the block job.
      
      The node-based commands query-named-block-nodes and query-blockstats
      with query-nodes=true still show all nodes, including implicit ones.
      This ensures that users that are capable of node-level management can
      still access the full information; users that only know BlockBackends
      won't use these commands.
      
      Cc: qemu-stable@nongnu.org
      Signed-off-by: 's avatarKevin Wolf <kwolf@redhat.com>
      Reviewed-by: 's avatarPeter Krempa <pkrempa@redhat.com>
      Reviewed-by: 's avatarMax Reitz <mreitz@redhat.com>
      Tested-by: 's avatarEric Blake <eblake@redhat.com>
      d3c8c674
  13. 18 Jul, 2017 1 commit
    • John Snow's avatar
      qemu-img: Check for backing image if specified during create · 6e6e55f5
      John Snow authored
      Or, rather, force the open of a backing image if one was specified
      for creation. Using a similar -unsafe option as rebase, allow qemu-img
      to ignore the backing file validation if possible.
      
      It may not always be possible, as in the existing case when a filesize
      for the new image was not specified.
      
      This is accomplished by shifting around the conditionals in
      bdrv_img_create, such that a backing file is always opened unless we
      provide BDRV_O_NO_BACKING. qemu-img is adjusted to pass this new flag
      when -u is provided to create.
      
      Sorry for the heinous looking diffstat, but it's mostly whitespace.
      
      Inspired by: https://bugzilla.redhat.com/show_bug.cgi?id=1213786Signed-off-by: 's avatarJohn Snow <jsnow@redhat.com>
      Signed-off-by: 's avatarKevin Wolf <kwolf@redhat.com>
      6e6e55f5
  14. 13 Jul, 2017 1 commit
  15. 11 Jul, 2017 1 commit