Simics Microarchitectural Interface




















Simics provides a language for modelling devices, together with effective tools for debugging and profiling. There are other simulators, such as GEMS [24] or TFSim [25], based on Simics, that provide accurate timing models, but they are focused on specific systems. For instance, GEMS is a Simics-based simulator for Sparc-based computers.

We have built our Simics simulation model by defining two customized machines and a standard Ethernet network connecting them, in the same way we could have in the real world (Figure 1, model for offloading simulation). We use a scaled model to avoid slowing down the simulation: we have assumed a memory 10 times slower than the CPU.

This is not a precise way to model contention, but it provides an adequate simulation of the contention behaviour. Thus, in our model (Figure 2), the north bridge and the buses use timing models and do not act only as connectors.
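The stated assumption that memory is 10 times slower than the CPU can be sketched as a per-access stall cost. The following is an illustrative model only, not the Simics timing-model API; the function name and workload numbers are ours.

```python
# Illustrative sketch (not the Simics timing-model API): a memory that is
# 10x slower than the CPU, expressed as a per-access stall in CPU cycles.
MEM_SLOWDOWN = 10  # assumption from the text: memory 10 times slower than CPU

def total_cycles(n_instructions: int, mem_access_ratio: float) -> int:
    """Cycles to run a workload: 1 cycle per instruction, plus a
    MEM_SLOWDOWN-cycle stall for every instruction that touches memory."""
    mem_ops = int(n_instructions * mem_access_ratio)
    return n_instructions + mem_ops * MEM_SLOWDOWN

# With no timing model every instruction takes 1 cycle; with the model,
# a workload where 30% of instructions access memory takes 4x longer.
ideal = total_cycles(1000, 0.0)     # 1000 cycles
modelled = total_cycles(1000, 0.3)  # 1000 + 300*10 = 4000 cycles
```

This is the sense in which timing models make the buses and north bridge more than simple connectors: each access now consumes simulated time that can delay other transfers.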

The model is shown in Figure 1. We add a north bridge to our architecture in order to simulate a real and standard machine on which we can install a standard operating system.

Figure 2 shows the model for the non-offloading simulation. The two machines communicate through the Simics Central module and a gigabit network. In this way, we have isolated the processor in the NIC to remove it from cross-calls. In Figure 3, the loads of CPU0 in the non-offloaded and offloaded cases are compared.

Furthermore, since the processor in the NIC and the host processor have their own memory spaces that can be accessed concurrently, the host processor is not slowed down by memory or bus traffic. The partition is made with Linux cpusets: they are lightweight kernel objects that allow us to partition the machine. The communication subsystem consists of two parts: a protocol-independent driver and a protocol-specific communication section. The communication section depends on the specific protocol used, since it implements the connection and transfer functions, whereas the driver remains the same.
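The two-part structure described above, a fixed driver delegating to a per-protocol section, can be sketched as follows. Class and method names are illustrative, not taken from the paper's code.

```python
# Hedged sketch of the split described in the text: a protocol-independent
# driver that delegates connection/transfer to a protocol-specific section.
from abc import ABC, abstractmethod

class CommSection(ABC):
    """Protocol-specific part: implements connection and transfer."""
    @abstractmethod
    def connect(self) -> str: ...
    @abstractmethod
    def transfer(self, data: bytes) -> int: ...

class TCPSection(CommSection):
    def connect(self) -> str:
        return "tcp-connected"
    def transfer(self, data: bytes) -> int:
        return len(data)        # stand-in for a real send

class UDPSection(CommSection):
    def connect(self) -> str:
        return "udp-ready"      # connectionless: nothing to set up
    def transfer(self, data: bytes) -> int:
        return len(data)

class Driver:
    """Protocol-independent part: identical whatever section is plugged in."""
    def __init__(self, section: CommSection):
        self.section = section
    def send(self, data: bytes) -> int:
        self.section.connect()
        return self.section.transfer(data)
```

Swapping `TCPSection` for `UDPSection` changes the connection and transfer behaviour while `Driver` stays untouched, which is exactly the property the text attributes to the design.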

For each measurement, netpipe increments the block size following its own algorithm. For UDP measurements we have used netperf [29]. In our experiments we have used optimized network parameters in order to achieve the maximum throughput. The curves of Figure 4 (decrease in the number of interrupts) show the benefit in CPU time associated with interrupt handling when the communication protocol is offloaded.
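The text says only that netpipe increments the block size "following its own algorithm". As a hedged approximation of that kind of sweep, the sketch below doubles the size and also probes the sizes just around each power of two (the perturbation idea netpipe is known for); the exact algorithm may differ.

```python
# Approximation of a netpipe-style block-size sweep: powers of two plus
# small perturbations around each, capped at max_size. Illustrative only.
def block_sizes(max_size: int, delta: int = 1):
    sizes, n = [], 1
    while n <= max_size:
        for s in (n - delta, n, n + delta):   # probe just around each step
            if 0 < s <= max_size:
                sizes.append(s)
        n *= 2
    return sorted(set(sizes))
```

Probing around each power of two catches performance cliffs (e.g. at MTU or buffer boundaries) that an exact power-of-two sweep would step over.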

So, with regard to the offloading effects on the overall performance, the more cycles that are required for protocol processing, the higher the improvement in the time spent in interrupt servicing: fewer interrupts and less CPU time spent processing them.

This ideal behaviour can be removed by modelling a non-ideal connection between CPU0 and CPU1. In order to simulate this, we have introduced the corresponding timing models in the NIC bus and in the memory accesses from the processor of the NIC.

The benefits are more apparent in the case of TCP, as Figure 8 makes clear. Figures 5 to 8 provide the throughput for each transfer block size and the maximum attainable throughput, as well as the latency. In the notation Offload x, x is the number of delays, with respect to a reference value, needed to access memory from CPU1. As we can see, memory latency is an important bottleneck in the communications path, and its effect is more noticeable for small packets. Therefore, techniques that improve memory accesses, such as DMA or any improved DMA technique [30], are necessary in order to obtain benefits from any protocol offloading technique.
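The interrupt-related saving discussed around Figure 4 is simple arithmetic: interrupts that are no longer delivered to CPU0 return their servicing cycles to the host. All numbers below are illustrative, not measured values from the paper.

```python
# Back-of-the-envelope sketch of the interrupt-offload effect: CPU0 time
# recovered when a fraction of interrupts moves to the NIC processor.
def cpu_time_saved(interrupts_per_s: float, offloaded_fraction: float,
                   cycles_per_interrupt: int, cpu_hz: float) -> float:
    """Seconds of CPU0 time recovered per second of wall-clock time."""
    avoided = interrupts_per_s * offloaded_fraction
    return avoided * cycles_per_interrupt / cpu_hz

# Hypothetical figures: 50k interrupts/s, 80% offloaded, 20k cycles each,
# 1 GHz host CPU -> 0.8 s of CPU0 time recovered per second.
saved = cpu_time_saved(50_000, 0.8, 20_000, 1e9)
```

The model also shows why the gain grows with protocol cost, as the text argues: `cycles_per_interrupt` multiplies directly into the saving.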

Higher throughputs do not imply lower latencies. In the tests we have performed, TCP parameters have been optimized, and this is the reason why the improvement obtained with offloading could be less noticeable than other effects such as the throughput improvement. This can be seen in Figure 5. However, Table 1 shows the latency improvement, and Figure 6a the saturation points for the different offloading alternatives in both the offloaded and non-offloaded TCP cases, in which we can see how the saturation point depends on the offloading capabilities. Figure 6b shows the saturation graph in latency detail.

In Figure 6b we can see the improvement in latency.

Although Simics presents some limitations, and it would be possible to use other simulators for our purposes, the resources provided by Simics for device modelling and its debugging facilities make Simics an appropriate tool. Moreover, it allows a relatively fast simulation of the different models, not only with ideal buses but also in more realistic situations in which memory latencies and non-ideal buses are modelled.

In Figure 7, the throughputs for different latency values in the NIC accesses are shown; as we can see, the memory latency is decisive in the performance obtained.

Thanks to the Simics model, it is possible to analyse the most important parameters and the conditions in which offloading determines greater improvements in the overall communication performance and in CPU utilization. On the other side, we also present results that show how the technology of the processor included in the NIC affects the overall communication performance. In order to do the corresponding simulations, we have modified the step rate of the NIC processor; the curves in Figure 8 show the results. This constitutes evidence of the correctness of our Simics model for protocol offloading.

Although this model can be sufficient to simulate the effects of caches, it still enforces Simics's concept of atomic, in-order execution.

The Simics Micro Architectural Interface (MAI) was designed to overcome these limitations while keeping the power of a functional full-system simulator.

Using MAI, Simics can model the timing behavior of modern processors with deep pipelines and still run unmodified system-level software.

The basic idea behind MAI is to let the user decide when things happen, while Simics handles how things happen. A user module chooses when to fetch, decode, execute and commit instructions, using MAI to tell Simics to actually perform the actions.

The use of a text serial console is due to a limitation in Simics, which at the moment is not able to run more than one machine with graphical consoles over a single Simics instance. It can only simulate several Simics instances and communicate them through the Simics Central module. Furthermore, by using text serial consoles instead of graphical consoles, we reach a faster simulation.

In the ideal case, where no timing models are used, transferences between CPUs and memory would not hold up any other transfer.
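The MAI division of labour described above, a user module deciding *when* while the functional simulator knows *how*, can be mimicked with a toy driver loop. This is a sketch of the idea only; it is not the real Simics MAI C API, and all names below are ours.

```python
# Toy sketch of the MAI idea: the user module schedules WHEN each
# instruction moves through fetch/decode/execute/commit; the functional
# core knows HOW to perform each step (here: it just records it).
class FunctionalCore:
    """Knows how things happen: the semantics of each pipeline step."""
    def __init__(self, program):
        self.program, self.log = program, []
    def do(self, step, pc):
        self.log.append((step, pc))   # stand-in for the real side effects

class UserModule:
    """Decides when things happen: drives instructions through stages."""
    STAGES = ("fetch", "decode", "execute", "commit")
    def run(self, core):
        for pc, _insn in enumerate(core.program):
            for stage in self.STAGES:
                core.do(stage, pc)    # tell the simulator to act now

core = FunctionalCore(["add", "load"])
UserModule().run(core)
```

A different `UserModule` could interleave stages of several in-flight instructions to model a deep pipeline, without the functional core changing at all, which is the flexibility the MAI design aims for.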

Once we have the two machines defined and networked, Simics allows an operating system to be installed on them. For our purposes we have used Debian Linux.

Experimental results

In order to evaluate protocol offloading, we have used several Simics and operating system features. We dedicate one processor, CPU0, to running the applications and the operating system processes, and another processor, CPU1, to running the communication subsystem. This can be done with Linux cpusets, which prevent other processes from being attached to the isolated CPU.
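The cpuset isolation mentioned above is configured through the kernel's cpuset filesystem, not from Python. As a minimal related illustration, a single process can be pinned to one CPU with the standard-library affinity call (Linux-only); note that a cpuset additionally keeps *other* processes off the reserved CPU, which plain per-process affinity does not.

```python
# Related sketch, not the paper's setup: pin the calling process to one
# CPU via os.sched_setaffinity (available on Linux only).
import os

def pin_to_cpu(cpu: int) -> set:
    """Pin the calling process to `cpu`; return the new affinity set
    (empty set on platforms without sched_setaffinity)."""
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, {cpu})      # 0 = the calling process
        return os.sched_getaffinity(0)
    return set()
```

Pinning the communication subsystem this way mirrors the paper's CPU0/CPU1 split at process granularity.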

When the protocol is offloaded, the CPU0 load is lower, as it only executes the application that generates the data. The described architecture makes this partitioned execution possible. The communication subsystem consists of two parts: a protocol-independent driver and a protocol-specific communication section. The communication section depends on the specific protocol used, since it implements the connection and transfer functions, whereas the driver remains the same.

The curves of Figure 4 (decrease in the number of interrupts) show the percentages of the interrupts requested of CPU0 in the non-offloading case that are not requested of CPU0 when the communication protocol is offloaded.

For instance, we are using MTU jumbo frames; this could be avoided by using oversized TCP windows. So, with regard to the offloading effects, the benefits are more apparent in the case of TCP than for UDP, as the CPU loads in Figure 3 show. Figures 5 and 6 provide the throughput for each transfer block size and the maximum attainable throughput.

As we can see in Figure 5 (throughput comparison), the memory latency is decisive in the performance obtained. The lower throughputs obtained for small block sizes are due to the ACKs required by the TCP protocol in every block transfer. To obtain the following results (Figure 6), we have introduced the corresponding timing models in the NIC bus and in the memory accesses from the processor of the NIC; as memory access cost is one of the arguments used to question the benefits of protocol offloading, this analysis is relevant.

In order to do the corresponding simulations, we have modified the step rate of the NIC processor. As we can see from Figure 7 (throughput for different NIC processor speeds), the speed of the processor at the NIC (CPU1) affects the throughput in a decisive way: with a very slow NIC processor, the performance with protocol offloading is even worse than the performance without offloading.
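The trend in Figures 7 and 8 can be captured by a two-stage pipeline model: the offloaded path is limited by whichever of the host (application) or the NIC (protocol) is slower. The cost figures below are illustrative, not measurements from the paper.

```python
# Hedged analytic sketch of the NIC-speed effect: offloading helps only
# while the NIC processor is fast enough; below some speed it becomes the
# bottleneck and offloaded throughput drops under the non-offloaded one.
def throughput(app_cost, proto_cost, nic_speed=None, host_rate=1.0):
    """Blocks/s. nic_speed is the NIC CPU rate relative to the host;
    None means no offloading (the host runs both app and protocol)."""
    if nic_speed is None:
        return host_rate / (app_cost + proto_cost)
    host = host_rate / app_cost               # host only runs the app
    nic = nic_speed * host_rate / proto_cost  # NIC runs the protocol
    return min(host, nic)                     # limited by the slower stage

no_off = throughput(1.0, 2.0)                  # ~0.333 blocks/s
fast = throughput(1.0, 2.0, nic_speed=1.0)     # 0.5: offload wins
slow = throughput(1.0, 2.0, nic_speed=0.25)    # 0.125: slow NIC loses
```

Solving `nic >= no_off` gives the break-even NIC speed, the crossover point the figures illustrate.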

The performance gets worse as the NIC processor speed decreases; below a certain speed, offloading could even diminish the performance. So, it is clear that offloading improves the communication performance only when the NIC processor is fast enough.

In this paper we have considered the use of Simics to analyze protocol offloading. Thanks to the Simics model, it is possible to analyse the improvement provided by offloading heavy protocols. Moreover, the analysis of our experimental results shows how the technology of the processor included in the NIC affects the overall communication performance.

6. References

[2] Binkert, N., et al.
[7] Cruz, R. IEEE Trans.
[11] Clark, D. Message on the TSV mailing list.


