- Jan 25, 2019
- Jan 24, 2019
-
-
Qi Wang authored
This feature uses a dedicated arena to handle huge requests, which significantly reduces VM fragmentation. In the production workloads we tested, it often reduces VM size by more than 30%.
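As a rough sketch of the idea only: requests above some size cutoff go to a single dedicated arena instead of the normal per-thread one. The threshold, names, and toy structures below are assumptions for illustration, not jemalloc's actual implementation.

```c
#include <stddef.h>

/*
 * Illustrative only: send requests above a size threshold to one
 * dedicated "huge" arena so they do not interleave with (and fragment)
 * the VM ranges used by the normal arenas. Threshold and names are
 * hypothetical.
 */
#define HUGE_THRESHOLD ((size_t)4 << 20)   /* hypothetical cutoff */

typedef struct arena_s { unsigned ind; } arena_t;

static arena_t normal_arena = { 0 };  /* stand-in for the usual per-thread arena */
static arena_t huge_arena   = { 1 };  /* dedicated arena for huge requests */

static inline arena_t *
arena_choose_for_size(size_t size) {
    return (size >= HUGE_THRESHOLD) ? &huge_arena : &normal_arena;
}
```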
-
Qi Wang authored
This helps us avoid issues with size-based routing (i.e. the huge_threshold feature).
-
Edward Tomasz Napierala authored
-
Qi Wang authored
-
Qi Wang authored
The rate calculation for the total row was missing.
-
- Jan 16, 2019
- Jan 14, 2019
-
-
Jason Evans authored
This reverts commit 646af596.
-
Jason Evans authored
This reverts commit fc13a7f1.
-
- Jan 12, 2019
- Jan 11, 2019
-
-
Jason Evans authored
The --branch parameter is unnecessary, and removing it may avoid problems when testing directly on the dev branch.
-
Jason Evans authored
-
Li-Wen Hsu authored
-
- Jan 09, 2019
-
-
Faidon Liambotis authored
This automatically adds -latomic if and when needed, e.g. on riscv64 systems. Fixes #1401.
-
- Jan 08, 2019
-
-
Leonardo Santagada authored
-
- Dec 19, 2018
-
-
John Ericson authored
My distro offers a custom toolchain where it's not possible to make static libs, so it's insufficient to just delete the libs I don't want. I actually need to avoid building them in the first place.
-
Qi Wang authored
Add extent_arena_ind_get() to avoid loading the actual arena pointer when we only need to check arena matching.
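A toy sketch of why the index helps; extent_arena_matches and the struct layouts below are illustrative assumptions, not jemalloc's real definitions.

```c
#include <stdbool.h>

/* Toy stand-ins: the real extent_t packs the arena index into its
 * header bits, and the arena pointer is resolved through a global
 * arenas array rather than stored in a field like this. */
typedef struct arena_s  { unsigned ind; } arena_t;
typedef struct extent_s { unsigned arena_ind; } extent_t;

static inline unsigned
extent_arena_ind_get(const extent_t *extent) {
    return extent->arena_ind;
}

/*
 * When the caller only needs "does this extent belong to that arena?",
 * comparing indices avoids resolving and dereferencing the arena
 * pointer at all.
 */
static inline bool
extent_arena_matches(const extent_t *extent, const arena_t *arena) {
    return extent_arena_ind_get(extent) == arena->ind;
}
```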
-
Qi Wang authored
-
- Dec 18, 2018
-
-
Alexander Zinoviev authored
-
- Dec 08, 2018
-
-
Qi Wang authored
With sharded bins, we may not flush all items from the same arena in one run. Adjust the stats merging logic accordingly.
-
- Dec 04, 2018
-
-
Qi Wang authored
-
Qi Wang authored
This avoids having to choose a bin shard on the fly, and will also allow flexible bin binding for each thread.
-
Qi Wang authored
-
Qi Wang authored
The option uses the same format as "slab_sizes" to specify the number of shards for each bin size.
-
Qi Wang authored
This makes it possible to have multiple sets of bins in an arena, which improves arena scalability because the bins (especially the small ones) are always the limiting factor in production workloads. A bin shard is picked on allocation; each extent tracks its bin shard id for deallocation. The number of shards is determined via runtime options.
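A simplified sketch of the sharding idea; the shard count, the selection policy, and every name below are assumptions rather than the actual jemalloc code.

```c
/* Illustrative only: each small size class gets several independent
 * bins (shards), each with its own lock, so that threads contend on
 * different shards instead of a single bin per size class. */
#define N_SHARDS 4u   /* hypothetical; the real count comes from runtime options */

typedef struct bin_s {
    /* lock, slab lists, stats, ... */
    unsigned placeholder;
} bin_t;

typedef struct bins_s {
    bin_t shards[N_SHARDS];
} bins_t;

/*
 * Pick a shard at allocation time, e.g. keyed off a per-thread seed.
 * The chosen shard index is recorded in the extent so that a later
 * deallocation (possibly from another thread) returns the region to
 * the same shard.
 */
static inline unsigned
bin_shard_pick(unsigned thread_seed) {
    return thread_seed % N_SHARDS;
}
```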
-
- Nov 29, 2018
-
-
Dave Watson authored
If there are 3 or more threads spin-waiting on the same mutex, there will be excessive exclusive cacheline contention, because pthread_mutex_trylock() immediately tries to CAS in a new value instead of first checking whether the lock is already held. This diff adds a 'locked' hint flag, and we only spin-wait without trylock()ing while it is set. I don't know of any other portable way to get the same behavior as pthread_mutex_lock().

This is easy to test via ttest, e.g. ./ttest1 500 3 10000 1 100; throughput is nearly 3x as fast with the fix. The regression blames to the mutex profiling changes. However, we almost never have 3 or more threads contending in properly configured production workloads, so while the impact is limited, it is still worth fixing.
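A minimal sketch of the pattern (check the hint before trying the lock), assuming a plain pthread mutex paired with an atomic 'locked' flag; this is not jemalloc's malloc_mutex_t, only the shape of the fix.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative only: a mutex paired with a "locked" hint flag. */
typedef struct {
    pthread_mutex_t lock;
    atomic_bool     locked;   /* hint: set while the mutex is held */
} hinted_mutex_t;

#define SPIN_LIMIT 1000

static void
hinted_mutex_lock(hinted_mutex_t *m) {
    for (unsigned i = 0; i < SPIN_LIMIT; i++) {
        /*
         * Spin on a plain read of the hint first, so that waiters do
         * not bounce the cacheline around with failed CAS attempts;
         * only try the real lock when the hint says it may be free.
         */
        if (!atomic_load_explicit(&m->locked, memory_order_relaxed) &&
            pthread_mutex_trylock(&m->lock) == 0) {
            atomic_store_explicit(&m->locked, true, memory_order_relaxed);
            return;
        }
        /* a pause/cpu_relax instruction could go here */
    }
    /* Give up spinning and block, like pthread_mutex_lock() would. */
    pthread_mutex_lock(&m->lock);
    atomic_store_explicit(&m->locked, true, memory_order_relaxed);
}

static void
hinted_mutex_unlock(hinted_mutex_t *m) {
    /* The flag is only a hint, so relaxed ordering is sufficient. */
    atomic_store_explicit(&m->locked, false, memory_order_relaxed);
    pthread_mutex_unlock(&m->lock);
}
```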
-
- Nov 16, 2018
-
-
Qi Wang authored
The setting has been tested in production for a while: no negative effects, and we were able to reduce the number of threads per process.
-
- Nov 14, 2018
-
-
Qi Wang authored
-
Dave Watson authored
Also adds a configure.ac check for __builtin_popcount, which is used in the new fastpath.
-
Dave Watson authored
-
Dave Watson authored
Refactor tcache_fill, introducing a new function arena_slab_reg_alloc_batch, which fills multiple pointers from a slab. There should be no functional changes here, but it allows future optimization of reg_alloc_batch.
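A toy version of the batch-fill idea over a single-word free bitmap; the slab fields and bitmap layout are assumptions, with __builtin_popcountll/__builtin_ctzll standing in for the builtins mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy slab with a single-word free bitmap (1 = region free); the real
 * slab metadata and bitmap layout in jemalloc are different. */
typedef struct slab_s {
    void    *base;       /* address of the first region */
    size_t   reg_size;   /* region size for this bin */
    uint64_t free_bits;  /* free-region bitmap */
} slab_t;

/* Fill up to `cnt` pointers from the slab in one pass instead of
 * calling a single-region allocator repeatedly. */
static unsigned
slab_reg_alloc_batch(slab_t *slab, unsigned cnt, void **out) {
    /* popcount tells us how many regions are actually available. */
    unsigned avail = (unsigned)__builtin_popcountll(slab->free_bits);
    if (cnt > avail) {
        cnt = avail;
    }
    for (unsigned n = 0; n < cnt; n++) {
        unsigned bit = (unsigned)__builtin_ctzll(slab->free_bits);
        slab->free_bits &= slab->free_bits - 1;  /* clear lowest set bit */
        out[n] = (char *)slab->base + (size_t)bit * slab->reg_size;
    }
    return cnt;  /* number of pointers written to out[] */
}
```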
-
- Nov 13, 2018
- Nov 12, 2018
-
-
Dave Watson authored
Add unsized and sized deallocation fastpaths. Similar to the malloc() fastpath, this removes all frame manipulation for the majority of free() calls. The performance advantage here is smaller than that of the malloc() fastpath, but prod tests still show roughly half a percent of improvement. Stats and sampling are both supported (sdallocx needs a sampling check; for rtree lookups, slab will only be set for unsampled objects). Flush is not supported on the fastpath; any flush requests go to the slowpath.
-
Dave Watson authored
Add a cache_bin_dalloc_easy (to match the alloc_easy function), and use it in tcache_dalloc_small. It will also be used in the new free fastpath.
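A simplified sketch of what an "easy" deallocation push looks like; this toy cache bin and its fields are assumptions, not jemalloc's cache_bin_t.

```c
#include <stdbool.h>

/* Toy cache bin: a bounded stack of cached regions. jemalloc's real
 * cache_bin_t layout differs; this only shows the "easy" push that a
 * free fastpath wants. */
typedef struct cache_bin_s {
    void   **avail;        /* stack of cached region pointers */
    unsigned ncached;
    unsigned ncached_max;
} cache_bin_t;

/*
 * Returns false when the bin is full; the caller then falls back to
 * the slowpath (flush the bin, or free to the arena directly).
 */
static inline bool
cache_bin_dalloc_easy(cache_bin_t *bin, void *ptr) {
    if (bin->ncached == bin->ncached_max) {
        return false;
    }
    bin->avail[bin->ncached++] = ptr;
    return true;
}
```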
-
Dave Watson authored
For a free fastpath, we want something that will not make additional calls. Assume most free() calls will hit the L1 cache, and use a custom rtree function for this. Additionally, roll the ptr=NULL check into the rtree cache check.
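A toy illustration of a cache-hit-only lookup where NULL can never match, assuming a small direct-mapped cache keyed by page address (the real per-thread rtree cache is organized differently).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy direct-mapped lookup cache keyed by page address; sizes, names,
 * and layout are assumptions, not the actual rtree cache. */
#define CACHE_SLOTS  16u
#define PAGE_SHIFT   12
#define KEY_SENTINEL ((uintptr_t)-1)  /* empty slots hold this key */

typedef struct lookup_cache_s {
    uintptr_t keys[CACHE_SLOTS];    /* page address of the cached entry */
    void     *leaves[CACHE_SLOTS];  /* cached metadata for that page */
} lookup_cache_t;

static inline void
lookup_cache_init(lookup_cache_t *cache) {
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        cache->keys[i] = KEY_SENTINEL;
    }
}

/*
 * Cache-hit-only lookup: returns false on a miss, in which case the
 * caller takes the slowpath. For ptr == NULL the key is 0, which never
 * matches a sentinel or a real page address, so the NULL check is
 * folded into the key comparison instead of being a separate branch.
 */
static inline bool
lookup_fast(const lookup_cache_t *cache, const void *ptr, void **leaf) {
    uintptr_t key  = (uintptr_t)ptr & ~(((uintptr_t)1 << PAGE_SHIFT) - 1);
    size_t    slot = ((uintptr_t)ptr >> PAGE_SHIFT) & (CACHE_SLOTS - 1);
    if (cache->keys[slot] != key) {
        return false;
    }
    *leaf = cache->leaves[slot];
    return true;
}
```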
-
- Nov 09, 2018
-
-
Edward Tomasz Napierala authored
It was removed in 0771ff2c. Add a comment explaining its purpose.
-