Compared performance of Varnish Cache on x86_64 and aarch64
    Martin Grigorov 
    martin.grigorov at gmail.com
       
    Wed Aug  5 09:17:11 UTC 2020
    
    
  
Hi Guillaume,
On Tue, Aug 4, 2020 at 5:47 PM Guillaume Quintard <
guillaume at varnish-software.com> wrote:
> Hi,
>
> > Varnish gives around 20% less throughput than the Golang HTTP server but
> I guess this is because the Golang server is much simpler than Varnish.
>
> Since the backend and vegeta are written in Go, it's pretty safe to assume
> they are going to use H/2 by default, and that's not the case for your Varnish
> instance, so that possibly explains some of the differences you are seeing.
>
To use H/2 with Vegeta one has to pass the -http2 parameter (
https://github.com/tsenart/vegeta#-http2).
In addition, I would need to start the HTTP server with
svr.ListenAndServeTLS(cert, key).
I've added log.Printf("Protocol: %s", r.Proto) to the handler function, and
it prints "HTTP/1.1" whether or not I pass the -http2 parameter to Vegeta.
>
> Cheers,
>
> --
> Guillaume Quintard
>
>
> On Tue, Aug 4, 2020 at 4:33 AM Martin Grigorov <martin.grigorov at gmail.com>
> wrote:
>
>> Hi,
>>
>> I've updated the data in the article -
>> https://medium.com/@martin.grigorov/compare-varnish-cache-performance-on-x86-64-and-aarch64-cpu-architectures-cef5ad5fee5f
>> Now x86_64 and aarch64 are almost the same!
>> Varnish gives around 20% less throughput than the Golang HTTP server but
>> I guess this is because the Golang server is much simpler than Varnish.
>>
>> 3 min run produces around 3GB of Vegeta reports (130MB gzipped). If
>> anyone wants me to extract some extra data just let me know!
>>
>> Regards,
>> Martin
>>
>> On Mon, Aug 3, 2020 at 6:14 PM Martin Grigorov <martin.grigorov at gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> Thank you all for the feedback!
>>> After some debugging it turned out to be a bug in wrk: most of the
>>> requests' latencies were 0 in the raw reports.
>>>
>>> I've looked for a better maintained HTTP load testing tool and I liked
>>> https://github.com/tsenart/vegeta. It provides (correct-looking)
>>> statistics, can measure latencies while using a constant rate, and last but
>>> not least can produce plot charts!
>>> I will update my article and let you know once I'm done!
>>>
>>> Regards,
>>> Martin
>>>
>>> On Fri, Jul 31, 2020 at 4:43 PM Pål Hermunn Johansen <
>>> hermunn at varnish-software.com> wrote:
>>>
>>>> I am sorry for being so late to the game, but here it goes:
>>>>
>>>> On Wed, Jul 29, 2020 at 14:12, Poul-Henning Kamp <
>>>> phk at phk.freebsd.dk> wrote:
>>>> > Your measurement says that there is 2/3 chance that the latency
>>>> > is between:
>>>> >
>>>> >         655.40µs - 798.70µs     = -143.30µs
>>>> >
>>>> > and
>>>> >         655.40µs + 798.70µs     = 1454.10µs
>>>>
>>>> No, it does not. There is no claim anywhere that the numbers are
>>>> following a normal distribution or an approximation of it. Of course,
>>>> the calculations you do demonstrate that the data is far from normally
>>>> distributed (as expected).
>>>>
>>>> > You cannot conclude _anything_ from those numbers.
>>>>
>>>> There are two numbers, the average and the standard deviation, and
>>>> they are calculated from the data, but the truth is hidden deeper in
>>>> the data. By looking at the particular numbers, I agree completely
>>>> that it is wrong to conclude that one is better than the other. I am
>>>> not saying that the statements in the article are false, just that you
>>>> do not have data to draw the conclusions.
>>>>
>>>> Furthermore I have to say that Geoff got things right (see below). As
>>>> a mathematician, I have to say that statistics is hard, and trusting
>>>> the output of wrk to draw conclusions is outright the wrong thing to
>>>> do.
>>>>
>>>> In this case we have a luxury which you typically do not have: Data is
>>>> essentially free. You can run many tests and you can run short or long
>>>> tests with different parameters. A 30 second test is simply not enough
>>>> for anything.
>>>>
>>>> As Geoff indicated, for each transaction you can extract many relevant
>>>> values from varnishlog, with the status, hit/miss, time to first byte
>>>> and time to last byte being the most obvious ones. They can be
>>>> extracted and saved to a csv file by using varnishncsa with a custom
>>>> format string, and you can use R (used it myself as a tool in my
>>>> previous job - not a fan) to do statistical analysis on the data. The
>>>> Student's t-test suggestion from Geoff is a good idea, but just looking at
>>>> one set of numbers without considering other factors is mathematically
>>>> problematic.
>>>>
>>>> Anyway, some obvious questions then arise. For example:
>>>> - How do the numbers between wrk and varnishlog/varnishncsa compare?
>>>> Did wrk report a different total number of transactions than Varnish? If
>>>> there is a discrepancy, then the errors might be because of some resource
>>>> constraint (number of sockets or dropped SYN packets?).
>>>> - How do the average and maximum compare between Varnish and wrk?
>>>> - What is the CPU usage of the kernel, the benchmarking tool and the
>>>> varnish processes in the tests?
>>>> - What is the difference between the time to first byte and the time
>>>> to last byte in Varnish for different object sizes?
>>>>
>>>> When Varnish writes to a socket, it hands bytes over to the kernel,
>>>> and when the write call returns, we do not know how far the bytes have
>>>> come, and how long it will take before they get to the final
>>>> destination. The bytes may be in a kernel buffer, they might be on the
>>>> network card, and they might be already received at the client's
>>>> kernel, and they might have made it all into wrk (which may or may not
>>>> have timestamped the response). Typically, depending on many things,
>>>> Varnish will report faster times than wrk does, but since returning
>>>> from the write call means that the calling thread must be rescheduled,
>>>> it is even possible that wrk will see that some requests are faster
>>>> than what Varnish reports. Running wrk2 with different speeds in a
>>>> series of tests seems natural to me, so that you can observe when (and
>>>> how) the system starts running into bottlenecks. Note that the
>>>> bottleneck can just as well be in wrk2 itself or on the combined CPU
>>>> usage of kernel + Varnish + wrk2.
>>>>
>>>> To complicate things even further: On your ARM vs. x64 tests, my guess
>>>> is that both kernel parameters and parameters for the network are
>>>> different, and the distributions probably have good reason to choose
>>>> different values. It is very likely that these differences affect the
>>>> performance of the systems in many ways, and that different tests will
>>>> have different "optimal" tunings of kernel and network parameters.
>>>>
>>>> Sorry for rambling, but getting the statistics wrong is so easy. The
>>>> question is very interesting, but if you want to draw conclusions, you
>>>> should do the analysis, and (ideally) give access to the raw data in
>>>> case anyone wants to have a look.
>>>>
>>>> Best,
>>>> Pål
>>>>
>>>> On Fri, Jul 31, 2020 at 08:45, Geoff Simmons <geoff at uplex.de> wrote:
>>>> >
>>>> > On 7/28/20 13:52, Martin Grigorov wrote:
>>>> > >
>>>> > > I've just posted an article [1] about comparing the performance of
>>>> > > Varnish Cache on two similar machines - the main difference is the
>>>> > > CPU architecture - x86_64 vs aarch64. It uses a specific use case -
>>>> > > the backend service just returns static content. The idea is to
>>>> > > compare Varnish on the different architectures but also to compare
>>>> > > Varnish against the backend HTTP server.
>>>> > > What is interesting is that Varnish gives the same throughput as the
>>>> > > backend server on x86_64 but on aarch64 it is around 30% slower than
>>>> > > the backend.
>>>> >
>>>> > Does your test have an account of whether there were any errors in
>>>> > backend fetches? Don't know if that explains anything, but with a
>>>> > connect timeout of 10s and first byte timeout of 5m, any error would
>>>> > have a considerable effect on the results of a 30 second test.
>>>> >
>>>> > The test tool output doesn't say anything I can see about error rates --
>>>> > whether all responses had status 200, and if not, how many had which
>>>> > other status. Ideally it should be all 200, otherwise the results may
>>>> > not be valid.
>>>> >
>>>> > I agree with phk that a statistical analysis is needed for a robust
>>>> > statement about differences between the two platforms. For that, you'd
>>>> > need more than the summary stats shown in your blog post -- you need to
>>>> > collect all of the response times. What I usually do is query Varnish
>>>> > client request logs for Timestamp:Resp and save the number in the last
>>>> > column.
>>>> >
>>>> > t.test() in R runs Student's t-test (me R fanboi).
>>>> >
>>>> >
>>>>