1. 25 Jan, 2018 1 commit
    • accel/tcg: add size paremeter in tlb_fill() · 98670d47
      Laurent Vivier authored
      The MC68040 MMU provides the size of the access that
      triggers the page fault.
      
      This size is set in the Special Status Word which
      is written in the stack frame of the access fault
      exception.
      
      So we need the size in m68k_cpu_unassigned_access() and
      m68k_cpu_handle_mmu_fault().
      
      To be able to do that, this patch modifies the prototype of
      handle_mmu_fault handler, tlb_fill() and probe_write().
      do_unassigned_access() already includes a size parameter.
      
      This patch also updates handle_mmu_fault handlers and
      tlb_fill() of all targets (only parameter, no code change).
      Signed-off-by: Laurent Vivier <laurent@vivier.eu>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
      Message-Id: <20180118193846.24953-2-laurent@vivier.eu>
  2. 24 Jan, 2018 1 commit
  3. 23 Jan, 2018 2 commits
    • page_unprotect(): handle calls to pages that are PAGE_WRITE · 9c4bbee9
      Peter Maydell authored
      If multiple guest threads in user-mode emulation write to a
      page which QEMU has marked read-only because of cached TCG
      translations, the threads can race in page_unprotect:
      
       * threads A & B both try to do a write to a page with code in it at
         the same time (ie which we've made non-writeable, so SEGV)
       * they race into the signal handler with this faulting address
       * thread A happens to get to page_unprotect() first and takes the
         mmap lock, so thread B sits waiting for it to be done
       * A then finds the page, marks it PAGE_WRITE and mprotect()s it writable
       * A can then continue OK (returns from signal handler to retry the
         memory access)
       * ...but when B gets the mmap lock it finds that the page is already
         PAGE_WRITE, and so it exits page_unprotect() via the "not due to
         protected translation" code path, and wrongly delivers the signal
         to the guest rather than just retrying the access
      
      In particular, this meant that trying to run 'javac' in user-mode
      emulation would fail with a spurious guest SIGSEGV.
      
      Handle this by making page_unprotect() assume that a call for a page
      which is already PAGE_WRITE is due to a race of this sort and return
      a "fault handled" indication.
      
      Since this would cause an infinite loop if we ever called
      page_unprotect() for some other kind of fault than "write failed due
      to bad access permissions", tighten the condition in
      handle_cpu_signal() to check the signal number and si_code, and add a
      comment so that if somebody does ever find themselves debugging an
      infinite loop of faults they have some clue about why.
      
      (The trick for identifying the correct setting for
      current_tb_invalidated for thread B (needed to handle the precise-SMC
      case) is due to Richard Henderson.  Paolo Bonzini suggested just
      relying on si_code rather than trying anything more complicated.)
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      Message-Id: <1511879725-9576-3-git-send-email-peter.maydell@linaro.org>
      Signed-off-by: Laurent Vivier <laurent@vivier.eu>
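The decision logic described in the page_unprotect() commit above can be sketched in C roughly as follows. This is a simplified model, not QEMU's code: `toy_page_unprotect()`, `toy_should_try_unprotect()` and the `TOY_PAGE_*` flag values are hypothetical names introduced for illustration; only `SIGSEGV` and `SEGV_ACCERR` are real POSIX constants.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for this sketch; QEMU's real values may differ. */
#define TOY_PAGE_WRITE      0x1u  /* page is currently writable */
#define TOY_PAGE_WRITE_ORG  0x2u  /* page was writable before it was protected */

/*
 * Returns true if the fault was handled and the access should be retried.
 * *flags models the per-page flags; the caller is assumed to hold the
 * mmap lock.
 */
static bool toy_page_unprotect(unsigned *flags)
{
    assert(flags != NULL);
    if (*flags & TOY_PAGE_WRITE) {
        /* Another thread won the race and already unprotected the page. */
        return true;
    }
    if (*flags & TOY_PAGE_WRITE_ORG) {
        /* The emulator write-protected this page; restore write access. */
        *flags |= TOY_PAGE_WRITE;
        return true;
    }
    /* Genuinely read-only page: the guest must see the signal. */
    return false;
}

/* Only write faults caused by bad permissions may take the unprotect path. */
static bool toy_should_try_unprotect(int sig, int si_code, bool is_write)
{
    return is_write && sig == SIGSEGV && si_code == SEGV_ACCERR;
}
```

In this model, a second thread that arrives after the page has been made writable still gets "handled" back, and the tightened signal check is what keeps other kinds of fault from looping forever.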
    • linux-user: Propagate siginfo_t through to handle_cpu_signal() · a78b1299
      Peter Maydell authored
      Currently all the architecture/OS specific cpu_signal_handler()
      functions call handle_cpu_signal() without passing it the
      siginfo_t. We're going to want that so we can look at the si_code
      to determine whether this is a SEGV_ACCERR access violation or
      some other kind of fault, so change the functions to pass through
      the pointer to the siginfo_t rather than just the si_addr value.
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
      Message-Id: <1511879725-9576-2-git-send-email-peter.maydell@linaro.org>
      Signed-off-by: Laurent Vivier <laurent@vivier.eu>
  4. 18 Jan, 2018 1 commit
  5. 29 Dec, 2017 1 commit
  6. 22 Dec, 2017 1 commit
    • i386: hvf: add code base from Google's QEMU repository · c97d6d2c
      Sergio Andres Gomez Del Real authored
      This file begins tracking the files that will be the code base for HVF
      support in QEMU. This code base is part of Google's QEMU version of
      their Android emulator, and can be found at
      https://android.googlesource.com/platform/external/qemu/+/emu-master-dev
      
      This code is based on Veertu Inc's vdhh (Veertu Desktop Hosted
      Hypervisor), found at https://github.com/veertuinc/vdhh. Everything is
      appropriately licensed under GPL v2-or-later, except for the code inside
      x86_task.c and x86_task.h, which, deriving from KVM (the Linux kernel),
      is licensed GPL v2-only.
      
      This code base already implements a great deal of functionality,
      although Google's version removed from Veertu's the support for APIC
      page and hyperv-related features. According to the Android Emulator Release
      Notes, Revision 26.1.3 (August 2017), "Hypervisor.framework is now
      enabled by default on macOS for 32-bit x86 images to improve performance
      and macOS compatibility", although we had better use it with caution since,
      as the same Revision warns us, "If you experience issues with it specifically,
      please file a bug report...". The code hasn't seen much update in the
      last 5 months, so I think that we can develop the code further while
      occasionally visiting Google's repository to see if there has been any
      update.
      
      On top of Google's code, the following changes were made:
      
      - add code to the configure script to support the --enable-hvf argument.
      If the OS is Darwin, it checks for presence of HVF in the system. The
      patch also adds strings related to HVF in the file qemu-options.hx.
      QEMU will only support the modern syntax style '-M accel=hvf' to enable
      hvf; the legacy '-enable-hvf' will not be supported.
      
      - fix styling issues
      
      - add glue code to cpus.c
      
      - move HVFX86EmulatorState field to CPUX86State, changing
      the emulation functions to have a parameter with signature 'CPUX86State *'
      instead of 'CPUState *' so we don't have to get the 'env'.
      Signed-off-by: Sergio Andres Gomez Del Real <Sergio.G.DelReal@gmail.com>
      Message-Id: <20170913090522.4022-2-Sergio.G.DelReal@gmail.com>
      Message-Id: <20170913090522.4022-3-Sergio.G.DelReal@gmail.com>
      Message-Id: <20170913090522.4022-5-Sergio.G.DelReal@gmail.com>
      Message-Id: <20170913090522.4022-6-Sergio.G.DelReal@gmail.com>
      Message-Id: <20170905035457.3753-7-Sergio.G.DelReal@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  7. 21 Dec, 2017 1 commit
    • cpu-exec: fix missed CPU kick during interrupt injection · d84be02d
      David Hildenbrand authored
      The conditional memory barrier not only looks strange but actually is
      wrong.
      
      On s390x, I can reproduce interrupts via cpu_interrupt() not leading to
      a proper kick out of emulation every now and then. cpu_interrupt() is
      especially used for inter CPU communication via SIGP (esp. external
      calls and emergency interrupts).
      
      With this patch, I was no longer able to reproduce the problem (in
      particular, no stalls or hangs in the guest).
      
      My setup is s390x MTTCG with 16 VCPUs on an 8-CPU host, running make -j16.
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Message-Id: <20171129191319.11483-1-david@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  8. 18 Dec, 2017 3 commits
  9. 23 Nov, 2017 1 commit
  10. 21 Nov, 2017 1 commit
    • accel/tcg: Handle atomic accesses to notdirty memory correctly · 34d49937
      Peter Maydell authored
      To do a write to memory that is marked as notdirty, we need
      to invalidate any TBs we have cached for that memory, and
      update the cpu physical memory dirty flags for VGA and migration.
      The slowpath code in notdirty_mem_write() does all this correctly,
      but the new atomic handling code in atomic_mmu_lookup() doesn't
      do anything at all, it just clears the dirty bit in the TLB.
      
      The effect of this bug is that if the first write to a notdirty
      page for which we have cached TBs is by a guest atomic access,
      we fail to invalidate the TBs and subsequently will execute
      incorrect code. This can be seen by trying to run 'javac' on AArch64.
      
      Use the new notdirty_call_before() and notdirty_call_after()
      functions to correctly handle the update to notdirty memory
      in the atomic codepath.
      
      Cc: qemu-stable@nongnu.org
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
      Message-id: 1511201308-23580-3-git-send-email-peter.maydell@linaro.org
  11. 20 Nov, 2017 1 commit
  12. 15 Nov, 2017 1 commit
  13. 14 Nov, 2017 2 commits
  14. 13 Nov, 2017 1 commit
  15. 03 Nov, 2017 1 commit
  16. 24 Oct, 2017 16 commits
    • translate-all: exit from tb_phys_invalidate if qht_remove fails · cc689485
      Emilio G. Cota authored
      Two or more threads might race while invalidating the same TB. We currently
      do not check for this at all despite taking tb_lock, which means we would
      wrongly invalidate the same TB more than once. This bug has actually been
      hit by users: I recently saw a report on IRC, although I have yet to see
      the corresponding test case.
      
      Fix this by using qht_remove as the synchronization point; if it fails,
      that means the TB has already been invalidated, and therefore there
      is nothing left to do in tb_phys_invalidate.
      
      Note that this solution works now, while we still have tb_lock, and will
      continue working once we remove tb_lock.
      Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Message-Id: <1508445114-4717-1-git-send-email-cota@braap.org>
      Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
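The idea of using a removal as the synchronization point, as in the tb_phys_invalidate commit above, can be illustrated with a minimal C11 sketch. It does not use QEMU's qht API: `struct toy_tb`, `toy_table_remove()` and `toy_tb_invalidate()` are hypothetical stand-ins, and the hash table is reduced to a single atomic membership flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a translation block tracked in a hash table. */
struct toy_tb {
    atomic_bool in_table;   /* true while the TB is still in the table */
    int invalidate_count;   /* how many times the teardown actually ran */
};

/* Models a hash-table removal: succeeds for exactly one caller. */
static bool toy_table_remove(struct toy_tb *tb)
{
    /* atomic_exchange returns the previous value of the flag. */
    return atomic_exchange(&tb->in_table, false);
}

static void toy_tb_invalidate(struct toy_tb *tb)
{
    assert(tb != NULL);
    if (!toy_table_remove(tb)) {
        return;  /* already invalidated by someone else: nothing left to do */
    }
    tb->invalidate_count++;  /* the rest of the teardown runs only once */
}
```

Because only the caller whose removal succeeds continues, the scheme does not depend on an outer lock to prevent double invalidation.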
    • tcg: enable multiple TCG contexts in softmmu · 3468b59e
      Emilio G. Cota authored
      This enables parallel TCG code generation. However, we do not take
      advantage of it yet since tb_lock is still held during tb_gen_code.
      
      In user-mode we use a single TCG context; see the documentation
      added to tcg_region_init for the rationale.
      
      Note that targets do not need any conversion: targets initialize a
      TCGContext (e.g. defining TCG globals), and after this initialization
      has finished, the context is cloned by the vCPU threads, each of
      them keeping a separate copy.
      
      TCG threads claim one entry in tcg_ctxs[] by atomically increasing
      n_tcg_ctxs. Do not be too annoyed by the subsequent atomic_read's
      of that variable and tcg_ctxs; they are there just to play nice with
      analysis tools such as thread sanitizer.
      
      Note that we do not allocate an array of contexts (we allocate
      an array of pointers instead) because when tcg_context_init
      is called, we do not know yet how many contexts we'll use since
      the bool behind qemu_tcg_mttcg_enabled() isn't set yet.
      
      Previous patches folded some TCG globals into TCGContext. The non-const
      globals remaining are only set at init time, i.e. before the TCG
      threads are spawned. Here is a list of these set-at-init-time globals
      under tcg/:
      
      Only written by tcg_context_init:
      - indirect_reg_alloc_order
      - tcg_op_defs
      Only written by tcg_target_init (called from tcg_context_init):
      - tcg_target_available_regs
      - tcg_target_call_clobber_regs
      - arm: arm_arch, use_idiv_instructions
      - i386: have_cmov, have_bmi1, have_bmi2, have_lzcnt,
              have_movbe, have_popcnt
      - mips: use_movnz_instructions, use_mips32_instructions,
              use_mips32r2_instructions, got_sigill (tcg_target_detect_isa)
      - ppc: have_isa_2_06, have_isa_3_00, tb_ret_addr
      - s390: tb_ret_addr, s390_facilities
      - sparc: qemu_ld_trampoline, qemu_st_trampoline (build_trampolines),
               use_vis3_instructions
      
      Only written by tcg_prologue_init:
      - 'struct jit_code_entry one_entry'
      - aarch64: tb_ret_addr
      - arm: tb_ret_addr
      - i386: tb_ret_addr, guest_base_flags
      - ia64: tb_ret_addr
      - mips: tb_ret_addr, bswap32_addr, bswap32u_addr, bswap64_addr
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
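The way threads claim an entry in an array of context pointers by atomically increasing a counter, as the commit above describes for tcg_ctxs[] and n_tcg_ctxs, might look like the following C11 sketch. The names `toy_ctxs`, `toy_n_ctxs`, `toy_register_ctx()` and the bound `TOY_MAX_CTXS` are assumptions for illustration, not QEMU's definitions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define TOY_MAX_CTXS 8   /* hypothetical upper bound for this sketch */

struct toy_ctx {
    int id;
};

/* An array of pointers, not of contexts: the final count is unknown at init. */
static struct toy_ctx *toy_ctxs[TOY_MAX_CTXS];
static atomic_uint toy_n_ctxs;   /* number of slots claimed so far */

/* Called once per thread: claim the next free slot and publish the context. */
static unsigned toy_register_ctx(struct toy_ctx *ctx)
{
    /* atomic_fetch_add returns the old count, which becomes our slot. */
    unsigned n = atomic_fetch_add(&toy_n_ctxs, 1);
    assert(n < TOY_MAX_CTXS);
    ctx->id = (int)n;
    toy_ctxs[n] = ctx;
    return n;
}
```

Each caller receives a distinct index because the fetch-and-add is a single atomic step, so no lock is needed around the claim itself.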
    • tcg: introduce regions to split code_gen_buffer · e8feb96f
      Emilio G. Cota authored
      This is groundwork for supporting multiple TCG contexts.
      
      The naive solution here is to split code_gen_buffer statically
      among the TCG threads; this however results in poor utilization
      if translation needs are different across TCG threads.
      
      What we do here is to add an extra layer of indirection, assigning
      regions that act just like pages do in virtual memory allocation.
      (BTW if you are wondering about the chosen naming, I did not want
      to use blocks or pages because those are already heavily used in QEMU).
      
      We use a global lock to serialize allocations as well as statistics
      reporting (we now export the size of the used code_gen_buffer with
      tcg_code_size()). Note that for the allocator we could just use
      a counter and atomic_inc; however, that would complicate the gathering
      of tcg_code_size()-like stats. So given that the region operations are
      not a fast path, a lock seems the most reasonable choice.
      
      The effectiveness of this approach is clear after seeing some numbers.
      I used the bootup+shutdown of debian-arm with '-tb-size 80' as a benchmark.
      Note that I'm evaluating this after enabling per-thread TCG (which
      is done by a subsequent commit).
      
      * -smp 1, 1 region (entire buffer):
          qemu: flush code_size=83885014 nb_tbs=154739 avg_tb_size=357
          qemu: flush code_size=83884902 nb_tbs=153136 avg_tb_size=363
          qemu: flush code_size=83885014 nb_tbs=152777 avg_tb_size=364
          qemu: flush code_size=83884950 nb_tbs=150057 avg_tb_size=373
          qemu: flush code_size=83884998 nb_tbs=150234 avg_tb_size=373
          qemu: flush code_size=83885014 nb_tbs=154009 avg_tb_size=360
          qemu: flush code_size=83885014 nb_tbs=151007 avg_tb_size=370
          qemu: flush code_size=83885014 nb_tbs=151816 avg_tb_size=367
      
      That is, 8 flushes.
      
      * -smp 8, 32 regions (80/32 MB per region) [i.e. this patch]:
      
          qemu: flush code_size=76328008 nb_tbs=141040 avg_tb_size=356
          qemu: flush code_size=75366534 nb_tbs=138000 avg_tb_size=361
          qemu: flush code_size=76864546 nb_tbs=140653 avg_tb_size=361
          qemu: flush code_size=76309084 nb_tbs=135945 avg_tb_size=375
          qemu: flush code_size=74581856 nb_tbs=132909 avg_tb_size=375
          qemu: flush code_size=73927256 nb_tbs=135616 avg_tb_size=360
          qemu: flush code_size=78629426 nb_tbs=142896 avg_tb_size=365
          qemu: flush code_size=76667052 nb_tbs=138508 avg_tb_size=368
      
      Again, 8 flushes. Note how buffer utilization is not 100%, but it
      is close. Smaller region sizes would yield higher utilization,
      but we want region allocation to be rare (it acquires a lock), so
      we do not want to go too small.
      
      * -smp 8, static partitioning of 8 regions (10 MB per region):
          qemu: flush code_size=21936504 nb_tbs=40570 avg_tb_size=354
          qemu: flush code_size=11472174 nb_tbs=20633 avg_tb_size=370
          qemu: flush code_size=11603976 nb_tbs=21059 avg_tb_size=365
          qemu: flush code_size=23254872 nb_tbs=41243 avg_tb_size=377
          qemu: flush code_size=28289496 nb_tbs=52057 avg_tb_size=358
          qemu: flush code_size=43605160 nb_tbs=78896 avg_tb_size=367
          qemu: flush code_size=45166552 nb_tbs=82158 avg_tb_size=364
          qemu: flush code_size=63289640 nb_tbs=116494 avg_tb_size=358
          qemu: flush code_size=51389960 nb_tbs=93937 avg_tb_size=362
          qemu: flush code_size=59665928 nb_tbs=107063 avg_tb_size=372
          qemu: flush code_size=38380824 nb_tbs=68597 avg_tb_size=374
          qemu: flush code_size=44884568 nb_tbs=79901 avg_tb_size=376
          qemu: flush code_size=50782632 nb_tbs=90681 avg_tb_size=374
          qemu: flush code_size=39848888 nb_tbs=71433 avg_tb_size=372
          qemu: flush code_size=64708840 nb_tbs=119052 avg_tb_size=359
          qemu: flush code_size=49830008 nb_tbs=90992 avg_tb_size=362
          qemu: flush code_size=68372408 nb_tbs=123442 avg_tb_size=368
          qemu: flush code_size=33555560 nb_tbs=59514 avg_tb_size=378
          qemu: flush code_size=44748344 nb_tbs=80974 avg_tb_size=367
          qemu: flush code_size=37104248 nb_tbs=67609 avg_tb_size=364
      
      That is, 20 flushes. Note how a static partitioning approach uses
      the code buffer poorly, leading to many unnecessary flushes.
      Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
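A lock-protected region allocator of the kind the commit above describes could be sketched as below. This is an illustrative model only: `struct toy_region_pool` and the `toy_*` functions are hypothetical, regions are handed out strictly in order, and offsets stand in for real buffer addresses.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pool: one buffer split into n equally sized regions. */
struct toy_region_pool {
    pthread_mutex_t lock;
    size_t n;            /* total number of regions */
    size_t current;      /* index of the next unassigned region */
    size_t region_size;  /* bytes per region */
};

static void toy_pool_init(struct toy_region_pool *p, size_t buf_size, size_t n)
{
    assert(n > 0);
    pthread_mutex_init(&p->lock, NULL);
    p->n = n;
    p->current = 0;
    p->region_size = buf_size / n;
}

/*
 * Hand the caller the next free region (as an offset into the buffer).
 * Returns false when every region is assigned, i.e. a flush is needed.
 */
static bool toy_region_alloc(struct toy_region_pool *p, size_t *start)
{
    bool ok = false;
    pthread_mutex_lock(&p->lock);
    if (p->current < p->n) {
        *start = p->current * p->region_size;
        p->current++;
        ok = true;
    }
    pthread_mutex_unlock(&p->lock);
    return ok;
}

/* Bytes handed out so far; taking the lock keeps the statistic consistent. */
static size_t toy_pool_assigned(struct toy_region_pool *p)
{
    pthread_mutex_lock(&p->lock);
    size_t bytes = p->current * p->region_size;
    pthread_mutex_unlock(&p->lock);
    return bytes;
}
```

A thread that fills its region simply asks for another one, so a busy thread can end up with many regions while an idle one holds few, which is the utilization benefit over static partitioning.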
    • translate-all: use qemu_protect_rwx/none helpers · f51f315a
      Emilio G. Cota authored
      The helpers require the address and size to be page-aligned, so
      do that before calling them.
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
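Page-aligning an address and size before calling a protection helper, as the commit above requires, can be done along these lines. The function `toy_page_align_range()` is hypothetical, and rounding the range outward is an assumption of this sketch; the commit message does not say in which direction QEMU rounds at each call site.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Round [addr, addr + size) outward to page boundaries.
 * page_size must be a non-zero power of two.
 */
static void toy_page_align_range(uintptr_t addr, size_t size, size_t page_size,
                                 uintptr_t *start, size_t *len)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    uintptr_t mask = ~(uintptr_t)(page_size - 1);
    uintptr_t end = (addr + size + page_size - 1) & mask;  /* round up */
    *start = addr & mask;                                   /* round down */
    *len = end - *start;
}
```

With a 4096-byte page, a 100-byte range starting at 4100 becomes the single page starting at 4096.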
    • tcg: distribute profiling counters across TCGContext's · c3fac113
      Emilio G. Cota authored
      This is groundwork for supporting multiple TCG contexts.
      
      To avoid scalability issues when profiling info is enabled, this patch
      makes the profiling info counters distributed via the following changes:
      
      1) Consolidate profile info into its own struct, TCGProfile, which
         TCGContext also includes. Note that tcg_table_op_count is brought
         into TCGProfile after dropping the tcg_ prefix.
      2) Iterate over the TCG contexts in the system to obtain the total counts.
      
      This change also requires updating the accessors to TCGProfile fields to
      use atomic_read/set whenever there may be conflicting accesses (as defined
      in C11) to them.
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
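The two steps in the commit above (per-context profile structs, then a sum over all contexts) can be sketched in C11 as follows. `struct toy_profile`, `struct toy_prof_ctx` and both functions are hypothetical; relaxed atomic loads and stores are used here as an assumed equivalent of QEMU's atomic_read/atomic_set.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-context profile data, embedded in each context. */
struct toy_profile {
    _Atomic int64_t tb_count;
};

struct toy_prof_ctx {
    struct toy_profile prof;
};

/* Writer side: each thread updates only its own context's counter. */
static void toy_count_tb(struct toy_prof_ctx *ctx)
{
    int64_t v = atomic_load_explicit(&ctx->prof.tb_count, memory_order_relaxed);
    atomic_store_explicit(&ctx->prof.tb_count, v + 1, memory_order_relaxed);
}

/* Reader side: walk all contexts and add up their counters. */
static int64_t toy_total_tb_count(struct toy_prof_ctx *const *ctxs, size_t n)
{
    int64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        assert(ctxs[i] != NULL);
        total += atomic_load_explicit(&ctxs[i]->prof.tb_count,
                                      memory_order_relaxed);
    }
    return total;
}
```

Since every counter has a single writer, no read-modify-write instruction is needed; the atomics only make the concurrent reads by the reporting code well defined.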
    • tcg: define tcg_init_ctx and make tcg_ctx a pointer · b1311c4a
      Emilio G. Cota authored
      Groundwork for supporting multiple TCG contexts.
      
      The core of this patch is this change to tcg/tcg.h:
      
      > -extern TCGContext tcg_ctx;
      > +extern TCGContext tcg_init_ctx;
      > +extern TCGContext *tcg_ctx;
      
      Note that for now we set *tcg_ctx to whatever TCGContext is passed
      to tcg_context_init -- in this case &tcg_init_ctx.
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
    • tcg: take tb_ctx out of TCGContext · 44ded3d0
      Emilio G. Cota authored
      Groundwork for supporting multiple TCG contexts.
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
    • translate-all: report correct avg host TB size · f19c6cc6
      Emilio G. Cota authored
      Since commit 6e3b2bfd ("tcg: allocate TB structs before the
      corresponding translated code") we are not fully utilizing
      code_gen_buffer for translated code, and therefore are
      incorrectly reporting the amount of translated code as well as
      the average host TB size. Address this by:
      
      - Making the conscious choice of misreporting the total translated code;
        doing otherwise would mislead users into thinking "-tb-size" is not
        honoured.
      
      - Expanding tb_tree_stats to accurately count the bytes of translated code on
        the host, and using this for reporting the average tb host size,
        as well as the expansion ratio.
      
      In the future we might want to consider reporting the accurate numbers for
      the total translated code, together with a "bookkeeping/overhead" field to
      account for the TB structs.
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
    • exec-all: rename tb_free to tb_remove · be1e0117
      Emilio G. Cota authored
      We don't really free anything in this function anymore; we just remove
      the TB from the binary search tree.
      Suggested-by: Alex Bennée <alex.bennee@linaro.org>
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
    • translate-all: use a binary search tree to track TBs in TBContext · 2ac01d6d
      Emilio G. Cota authored
      This is a prerequisite for supporting multiple TCG contexts, since
      we will have threads generating code in separate regions of
      code_gen_buffer.
      
      For this we need a new field (.size) in struct tb_tc to keep
      track of the size of the translated code. This field uses a size_t
      to avoid adding a hole to the struct, although really an unsigned
      int would have been enough.
      
      The comparison function we use is optimized for the common case:
      insertions. Profiling shows that upon booting debian-arm, 98%
      of comparisons are between existing tb's (i.e. a->size and b->size
      are both !0), which happens during insertions (and removals, but
      those are rare). The remaining cases are lookups. From reading the glib
      sources we see that the first key is always the lookup key. However,
      the code does not assume this to always be the case because this
      behaviour is not guaranteed in the glib docs. However, we embed
      this knowledge in the code as a branch hint for the compiler.
      
      Note that tb_free does not free space in the code_gen_buffer anymore,
      since we cannot easily know whether the tb is the last one inserted
      in code_gen_buffer. The next patch in this series renames tb_free
      to tb_remove to reflect this.
      
      Performance-wise, lookups in tb_find_pc are the same as before:
      O(log n). However, insertions are O(log n) instead of O(1), which
      results in a small slowdown when booting debian-arm:
      
      Performance counter stats for 'build/arm-softmmu/qemu-system-arm \
      	-machine type=virt -nographic -smp 1 -m 4096 \
      	-netdev user,id=unet,hostfwd=tcp::2222-:22 \
      	-device virtio-net-device,netdev=unet \
      	-drive file=img/arm/jessie-arm32.qcow2,id=myblock,index=0,if=none \
      	-device virtio-blk-device,drive=myblock \
      	-kernel img/arm/aarch32-current-linux-kernel-only.img \
      	-append console=ttyAMA0 root=/dev/vda1 \
      	-name arm,debug-threads=on -smp 1' (10 runs):
      
      - Before:
      
             8048.598422      task-clock (msec)         #    0.931 CPUs utilized            ( +-  0.28% )
                  16,974      context-switches          #    0.002 M/sec                    ( +-  0.12% )
                       0      cpu-migrations            #    0.000 K/sec
                  10,125      page-faults               #    0.001 M/sec                    ( +-  1.23% )
          35,144,901,879      cycles                    #    4.367 GHz                      ( +-  0.14% )
         <not supported>      stalled-cycles-frontend
         <not supported>      stalled-cycles-backend
          65,758,252,643      instructions              #    1.87  insns per cycle          ( +-  0.33% )
          10,871,298,668      branches                  # 1350.707 M/sec                    ( +-  0.41% )
             192,322,212      branch-misses             #    1.77% of all branches          ( +-  0.32% )
      
             8.640869419 seconds time elapsed                                          ( +-  0.57% )
      
      - After:
             8146.242027      task-clock (msec)         #    0.923 CPUs utilized            ( +-  1.23% )
                  17,016      context-switches          #    0.002 M/sec                    ( +-  0.40% )
                       0      cpu-migrations            #    0.000 K/sec
                  18,769      page-faults               #    0.002 M/sec                    ( +-  0.45% )
          35,660,956,120      cycles                    #    4.378 GHz                      ( +-  1.22% )
         <not supported>      stalled-cycles-frontend
         <not supported>      stalled-cycles-backend
          65,095,366,607      instructions              #    1.83  insns per cycle          ( +-  1.73% )
          10,803,480,261      branches                  # 1326.192 M/sec                    ( +-  1.95% )
             195,601,289      branch-misses             #    1.81% of all branches          ( +-  0.39% )
      
             8.828660235 seconds time elapsed                                          ( +-  0.38% )
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
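The comparison function described in the binary-search-tree commit above, where two real entries both have a non-zero size and a lookup key is a bare pointer with size 0, can be modelled in plain C like this. It is a sketch without glib and without the branch hints the commit mentions; `struct toy_tc`, `toy_ptr_cmp()` and `toy_tc_cmp()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key: a code range; size == 0 marks a lookup key. */
struct toy_tc {
    uintptr_t ptr;
    size_t size;
};

/* Compare a bare pointer with a range: 0 if the pointer falls inside it. */
static int toy_ptr_cmp(uintptr_t p, const struct toy_tc *r)
{
    if (p < r->ptr) {
        return -1;
    }
    if (p >= r->ptr + r->size) {
        return 1;
    }
    return 0;
}

static int toy_tc_cmp(const struct toy_tc *a, const struct toy_tc *b)
{
    if (a->size && b->size) {
        /* Common case (insertions): both keys are real, disjoint ranges. */
        if (a->ptr != b->ptr) {
            return a->ptr > b->ptr ? 1 : -1;
        }
        assert(a->size == b->size);
        return 0;
    }
    /* Lookup: one side is a bare pointer; do not assume which one. */
    if (a->size == 0) {
        return toy_ptr_cmp(a->ptr, b);
    }
    return -toy_ptr_cmp(b->ptr, a);
}
```

Handling both argument orders in the lookup case mirrors the commit's point that the position of the lookup key is an observed glib behaviour rather than a documented guarantee.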
    • tcg: Remove CF_IGNORE_ICOUNT · 416986d3
      Richard Henderson authored
      Now that we have curr_cflags, we can include CF_USE_ICOUNT
      early and then remove it as necessary.
      Reviewed-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
    • cpu-exec: lookup/generate TB outside exclusive region during step_atomic · ac03ee53
      Emilio G. Cota authored
      Now that all code generation has been converted to check CF_PARALLEL, we can
      generate !CF_PARALLEL code without having yet set !parallel_cpus --
      and therefore without having to be in the exclusive region during
      cpu_exec_step_atomic.
      
      While at it, merge cpu_exec_step into cpu_exec_step_atomic.
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
    • tcg: check CF_PARALLEL instead of parallel_cpus · e82d5a24
      Emilio G. Cota authored
      Thereby decoupling the resulting translated code from the current state
      of the system.
      
      The tb->cflags field is not passed to tcg generation functions. So
      we add a field to TCGContext, storing there a copy of tb->cflags.
      
      Most architectures have <= 32 registers, which results in a 4-byte hole
      in TCGContext. Use this hole for the new field.
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
    • tcg: convert tb->cflags reads to tb_cflags(tb) · c5a49c63
      Emilio G. Cota authored
      Convert all existing readers of tb->cflags to tb_cflags, so that we
      use atomic_read and therefore avoid undefined behaviour in C11.
      
      Note that the remaining setters/getters of the field are protected
      by tb_lock, and therefore do not need conversion.
      
      Luckily all readers access the field via 'tb->cflags' (so no foo.cflags,
      bar->cflags in the code base), which makes the conversion easily
      scriptable:
      
      FILES=$(git grep 'tb->cflags' target include/exec/gen-icount.h \
      	 accel/tcg/translator.c | cut -f1 -d':' | sort | uniq)
      
      perl -pi -e 's/([^.>])tb->cflags/$1tb_cflags(tb)/g' $FILES
      perl -pi -e 's/([a-z->.]*)(->|\.)tb->cflags/tb_cflags($1$2tb)/g' $FILES
      
      Then manually fixed the few errors that checkpatch reported.
      
      Compile-tested for all targets.
      Suggested-by: Richard Henderson <rth@twiddle.net>
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
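An accessor of the kind the tb_cflags() commit above introduces might look like the following in standard C11. This is only an approximation: `struct toy_cf_tb` and `toy_tb_cflags()` are hypothetical, and a relaxed `atomic_load_explicit` is assumed here to play the role of QEMU's atomic_read.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical translation-block struct with a concurrently read field. */
struct toy_cf_tb {
    _Atomic uint32_t cflags;
};

/* Readers go through this accessor so that every load is atomic. */
static inline uint32_t toy_tb_cflags(struct toy_cf_tb *tb)
{
    assert(tb != NULL);
    return atomic_load_explicit(&tb->cflags, memory_order_relaxed);
}
```

Funnelling all reads through one function is also what makes a scripted conversion practical, since every call site has the same shape.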
    • tcg: Add CPUState cflags_next_tb · 9b990ee5
      Richard Henderson authored
      We were generating code during tb_invalidate_phys_page_range,
      check_watchpoint, cpu_io_recompile, and (seemingly) discarding
      the TB, assuming that it would magically be picked up during
      the next iteration through the cpu_exec loop.
      
      Instead, record the desired cflags in CPUState so that we explicitly
      request the proper TB, with no more magic.
      Reviewed-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
    • tcg: define CF_PARALLEL and use it for TB hashing along with CF_COUNT_MASK · 4e2ca83e
      Emilio G. Cota authored
      This will enable us to decouple code translation from the value
      of parallel_cpus at any given time. It will also help us minimize
      TB flushes when generating code via EXCP_ATOMIC.
      
      Note that the declaration of parallel_cpus is brought to exec-all.h
      to be able to define there the "curr_cflags" inline.
      Signed-off-by: Emilio G. Cota <cota@braap.org>
      Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
  17. 20 Oct, 2017 1 commit
    • accel/tcg: allow to invalidate a write TLB entry immediately · f52bfb12
      David Hildenbrand authored
      Background: s390x implements Low-Address Protection (LAP). If LAP is
      enabled, writing to effective addresses (before any translation)
      0-511 and 4096-4607 triggers a protection exception.
      
      So we have subpage protection on the first two pages of every address
      space (where the lowcore - the CPU private data resides).
      
      By immediately invalidating the write entry but allowing the caller to
      continue, we force every write access onto these first two pages into
      the slow path. We will get a TLB fault with the specific accessed
      addresses and can then evaluate whether protection applies or not.
      
      We have to make sure to ignore the invalid bit if tlb_fill() succeeds.
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Message-Id: <20171016202358.3633-2-david@redhat.com>
      Signed-off-by: Cornelia Huck <cohuck@redhat.com>
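The address test behind Low-Address Protection, as described in the commit above (effective addresses 0-511 and 4096-4607), can be written as a small C predicate. The function names are hypothetical and the sketch covers only the range check, not the TLB handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if addr lies in one of the two protected ranges: 0-511 or 4096-4607. */
static bool toy_is_low_address(uint64_t addr)
{
    return addr <= 511 || (addr >= 4096 && addr <= 4607);
}

/* A write triggers a protection exception only when LAP is enabled. */
static bool toy_lap_write_faults(uint64_t addr, bool lap_enabled)
{
    return lap_enabled && toy_is_low_address(addr);
}
```

Because the protected ranges cover only part of each of the first two pages, a per-page TLB entry cannot express them, which is why writes to those pages are forced through the slow path where a check like this can run per access.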
  18. 18 Oct, 2017 4 commits