- Jan 25, 2019
- Jan 24, 2019
-
-
Qi Wang authored
This feature uses a dedicated arena to handle huge requests, which significantly reduces VM fragmentation. In the production workloads we tested, it often reduces VM size by more than 30%.
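As a rough sketch of the idea only: requests above some size cutoff go to a single dedicated arena instead of the normal per-thread one. The threshold, names, and toy structures below are assumptions for illustration, not jemalloc's actual implementation.

```c
#include <stddef.h>

/*
 * Illustrative only: send requests above a size threshold to one
 * dedicated "huge" arena so they do not interleave with (and fragment)
 * the VM ranges used by the normal arenas. Threshold and names are
 * hypothetical.
 */
#define HUGE_THRESHOLD ((size_t)4 << 20)   /* hypothetical cutoff */

typedef struct arena_s { unsigned ind; } arena_t;

static arena_t normal_arena = { 0 };  /* stand-in for the usual per-thread arena */
static arena_t huge_arena   = { 1 };  /* dedicated arena for huge requests */

static inline arena_t *
arena_choose_for_size(size_t size) {
    return (size >= HUGE_THRESHOLD) ? &huge_arena : &normal_arena;
}
```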
-
Qi Wang authored
This helps us avoid issues with size-based routing (i.e. the huge_threshold feature).
-
Edward Tomasz Napierala authored
-
Qi Wang authored
-
Qi Wang authored
The rate calculation for the total row was missing.
-
- Jan 16, 2019
- Jan 14, 2019
-
-
Jason Evans authored
This reverts commit 646af596.
-
Jason Evans authored
This reverts commit fc13a7f1.
-
- Jan 12, 2019
- Jan 11, 2019
-
-
Jason Evans authored
The --branch parameter is unnecessary, and removing it may avoid problems when testing directly on the dev branch.
-
Jason Evans authored
-
Li-Wen Hsu authored
-
- Jan 09, 2019
-
-
Faidon Liambotis authored
This automatically adds -latomic if and when needed, e.g. on riscv64 systems. Fixes #1401.
-
- Jan 08, 2019
-
-
Leonardo Santagada authored
-
- Dec 19, 2018
-
-
John Ericson authored
My distro offers a custom toolchain where it's not possible to make static libs, so it's insufficient to just delete the libs I don't want. I actually need to avoid building them in the first place.
-
Qi Wang authored
Add extent_arena_ind_get() to avoid loading the actual arena pointer when we only need to check arena matching.
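A toy sketch of why the index helps; extent_arena_matches and the struct layouts below are illustrative assumptions, not jemalloc's real definitions.

```c
#include <stdbool.h>

/* Toy stand-ins: the real extent_t packs the arena index into its
 * header bits, and the arena pointer is resolved through a global
 * arenas array rather than stored in a field like this. */
typedef struct arena_s  { unsigned ind; } arena_t;
typedef struct extent_s { unsigned arena_ind; } extent_t;

static inline unsigned
extent_arena_ind_get(const extent_t *extent) {
    return extent->arena_ind;
}

/*
 * When the caller only needs "does this extent belong to that arena?",
 * comparing indices avoids resolving and dereferencing the arena
 * pointer at all.
 */
static inline bool
extent_arena_matches(const extent_t *extent, const arena_t *arena) {
    return extent_arena_ind_get(extent) == arena->ind;
}
```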
-
Qi Wang authored
-
- Dec 18, 2018
-
-
Alexander Zinoviev authored
-
- Dec 08, 2018
-
-
Qi Wang authored
With sharded bins, we may not flush all items from the same arena in one run. Adjust the stats merging logic accordingly.
-
- Dec 04, 2018
-
-
Qi Wang authored
-
Qi Wang authored
This avoids having to choose a bin shard on the fly, and will also allow flexible bin binding for each thread.
-
Qi Wang authored
-
Qi Wang authored
The option uses the same format as "slab_sizes" to specify the number of shards for each bin size.
-
Qi Wang authored
This makes it possible to have multiple sets of bins in an arena, which improves arena scalability because the bins (especially the small ones) are always the limiting factor in production workloads. A bin shard is picked on allocation; each extent tracks its bin shard id for deallocation. The number of shards is determined via runtime options.
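A simplified sketch of the sharding idea; the shard count, the selection policy, and every name below are assumptions rather than the actual jemalloc code.

```c
/* Illustrative only: each small size class gets several independent
 * bins (shards), each with its own lock, so that threads contend on
 * different shards instead of a single bin per size class. */
#define N_SHARDS 4u   /* hypothetical; the real count comes from runtime options */

typedef struct bin_s {
    /* lock, slab lists, stats, ... */
    unsigned placeholder;
} bin_t;

typedef struct bins_s {
    bin_t shards[N_SHARDS];
} bins_t;

/*
 * Pick a shard at allocation time, e.g. keyed off a per-thread seed.
 * The chosen shard index is recorded in the extent so that a later
 * deallocation (possibly from another thread) returns the region to
 * the same shard.
 */
static inline unsigned
bin_shard_pick(unsigned thread_seed) {
    return thread_seed % N_SHARDS;
}
```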
-
- Nov 29, 2018
-
-
Dave Watson authored
If there are 3 or more threads spin-waiting on the same mutex, there will be excessive exclusive cacheline contention, because pthread_mutex_trylock() immediately tries to CAS in a new value instead of first checking whether the lock is already held. This diff adds a 'locked' hint flag, and we only spin-wait without trylock()ing while it is set. I don't know of any other portable way to get the same behavior as pthread_mutex_lock().

This is easy to test via ttest, e.g. ./ttest1 500 3 10000 1 100; throughput is nearly 3x as fast with the fix. The regression blames to the mutex profiling changes. However, we almost never have 3 or more threads contending in properly configured production workloads, so while the impact is limited, it is still worth fixing.
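A minimal sketch of the pattern (check the hint before trying the lock), assuming a plain pthread mutex paired with an atomic 'locked' flag; this is not jemalloc's malloc_mutex_t, only the shape of the fix.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative only: a mutex paired with a "locked" hint flag. */
typedef struct {
    pthread_mutex_t lock;
    atomic_bool     locked;   /* hint: set while the mutex is held */
} hinted_mutex_t;

#define SPIN_LIMIT 1000

static void
hinted_mutex_lock(hinted_mutex_t *m) {
    for (unsigned i = 0; i < SPIN_LIMIT; i++) {
        /*
         * Spin on a plain read of the hint first, so that waiters do
         * not bounce the cacheline around with failed CAS attempts;
         * only try the real lock when the hint says it may be free.
         */
        if (!atomic_load_explicit(&m->locked, memory_order_relaxed) &&
            pthread_mutex_trylock(&m->lock) == 0) {
            atomic_store_explicit(&m->locked, true, memory_order_relaxed);
            return;
        }
        /* a pause/cpu_relax instruction could go here */
    }
    /* Give up spinning and block, like pthread_mutex_lock() would. */
    pthread_mutex_lock(&m->lock);
    atomic_store_explicit(&m->locked, true, memory_order_relaxed);
}

static void
hinted_mutex_unlock(hinted_mutex_t *m) {
    /* The flag is only a hint, so relaxed ordering is sufficient. */
    atomic_store_explicit(&m->locked, false, memory_order_relaxed);
    pthread_mutex_unlock(&m->lock);
}
```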
-
- Nov 16, 2018
-
-
Qi Wang authored
The setting has been tested in production for a while: no negative effects, and we were able to reduce the number of threads per process.
-
- Nov 14, 2018
-
-
Qi Wang authored
-
Dave Watson authored
Also adds a configure.ac check for __builtin_popcount, which is used in the new fastpath.
-
Dave Watson authored
-
Dave Watson authored
Refactor tcache_fill, introducing a new function arena_slab_reg_alloc_batch, which fills multiple pointers from a slab. There should be no functional changes here, but it allows future optimization of reg_alloc_batch.
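A toy version of the batch-fill idea over a single-word free bitmap; the slab fields and bitmap layout are assumptions, with __builtin_popcountll/__builtin_ctzll standing in for the builtins mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy slab with a single-word free bitmap (1 = region free); the real
 * slab metadata and bitmap layout in jemalloc are different. */
typedef struct slab_s {
    void    *base;       /* address of the first region */
    size_t   reg_size;   /* region size for this bin */
    uint64_t free_bits;  /* free-region bitmap */
} slab_t;

/* Fill up to `cnt` pointers from the slab in one pass instead of
 * calling a single-region allocator repeatedly. */
static unsigned
slab_reg_alloc_batch(slab_t *slab, unsigned cnt, void **out) {
    /* popcount tells us how many regions are actually available. */
    unsigned avail = (unsigned)__builtin_popcountll(slab->free_bits);
    if (cnt > avail) {
        cnt = avail;
    }
    for (unsigned n = 0; n < cnt; n++) {
        unsigned bit = (unsigned)__builtin_ctzll(slab->free_bits);
        slab->free_bits &= slab->free_bits - 1;  /* clear lowest set bit */
        out[n] = (char *)slab->base + (size_t)bit * slab->reg_size;
    }
    return cnt;  /* number of pointers written to out[] */
}
```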
-
- Nov 13, 2018
- Nov 12, 2018
-
-
Dave Watson authored
Add unsized and sized deallocation fastpaths. Similar to the malloc() fastpath, this removes all frame manipulation for the majority of free() calls. The performance advantage here is smaller than that of the malloc() fastpath, but prod tests still show roughly half a percent of improvement. Stats and sampling are both supported (sdallocx needs a sampling check; for rtree lookups, slab will only be set for unsampled objects). Flush is not supported on the fastpath; any flush requests go to the slowpath.
-
Dave Watson authored
Add a cache_bin_dalloc_easy (to match the alloc_easy function), and use it in tcache_dalloc_small. It will also be used in the new free fastpath.
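A simplified sketch of what an "easy" deallocation push looks like; this toy cache bin and its fields are assumptions, not jemalloc's cache_bin_t.

```c
#include <stdbool.h>

/* Toy cache bin: a bounded stack of cached regions. jemalloc's real
 * cache_bin_t layout differs; this only shows the "easy" push that a
 * free fastpath wants. */
typedef struct cache_bin_s {
    void   **avail;        /* stack of cached region pointers */
    unsigned ncached;
    unsigned ncached_max;
} cache_bin_t;

/*
 * Returns false when the bin is full; the caller then falls back to
 * the slowpath (flush the bin, or free to the arena directly).
 */
static inline bool
cache_bin_dalloc_easy(cache_bin_t *bin, void *ptr) {
    if (bin->ncached == bin->ncached_max) {
        return false;
    }
    bin->avail[bin->ncached++] = ptr;
    return true;
}
```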
-
Dave Watson authored
For a free fastpath, we want something that will not make additional calls. Assume most free() calls will hit the L1 cache, and use a custom rtree function for this. Additionally, roll the ptr=NULL check into the rtree cache check.
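A toy illustration of a cache-hit-only lookup where NULL can never match, assuming a small direct-mapped cache keyed by page address (the real per-thread rtree cache is organized differently).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy direct-mapped lookup cache keyed by page address; sizes, names,
 * and layout are assumptions, not the actual rtree cache. */
#define CACHE_SLOTS  16u
#define PAGE_SHIFT   12
#define KEY_SENTINEL ((uintptr_t)-1)  /* empty slots hold this key */

typedef struct lookup_cache_s {
    uintptr_t keys[CACHE_SLOTS];    /* page address of the cached entry */
    void     *leaves[CACHE_SLOTS];  /* cached metadata for that page */
} lookup_cache_t;

static inline void
lookup_cache_init(lookup_cache_t *cache) {
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        cache->keys[i] = KEY_SENTINEL;
    }
}

/*
 * Cache-hit-only lookup: returns false on a miss, in which case the
 * caller takes the slowpath. For ptr == NULL the key is 0, which never
 * matches a sentinel or a real page address, so the NULL check is
 * folded into the key comparison instead of being a separate branch.
 */
static inline bool
lookup_fast(const lookup_cache_t *cache, const void *ptr, void **leaf) {
    uintptr_t key  = (uintptr_t)ptr & ~(((uintptr_t)1 << PAGE_SHIFT) - 1);
    size_t    slot = ((uintptr_t)ptr >> PAGE_SHIFT) & (CACHE_SLOTS - 1);
    if (cache->keys[slot] != key) {
        return false;
    }
    *leaf = cache->leaves[slot];
    return true;
}
```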
-
- Nov 09, 2018
-
-
Edward Tomasz Napierala authored
It was removed in 0771ff2c. Add a comment explaining its purpose.
-