| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
| |
- Replace the old throttle multiplier with a divisor
- Fix up the logic for saved_pages_to_scan
- Fix a couple memory leaks and assorted tiny bugs
- Add a cpu_scales tuner to better configure rung CPU usage
Signed-off-by: Joe Maples <joe@frap129.org>
Conflicts:
mm/uksm.c
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
uksm: cleanups and scheduling changes
- Fix deprecated function warnings
- Use nice 15 instead of 5
- Use fixed periods instead of adjusting
- Rewrite pages_to_scan logic
- Modify kthread behavior to freeze more aggressively
uksm: fixups
- enforce usage limit across all rungs
- mark uksm_do_scan hot
- frob a few tuning knobs
uksm: back off during load
Signed-off-by: Joe Maples <joe@frap129.org>
Conflicts:
mm/uksm.c
|
| |
|
|
|
|
|
|
|
|
|
|
| |
gcc 4.9 produces this error:
mm/uksm.c: In function 'is_full_zero':
mm/uksm.c:169:23: warning: initialization discards 'const' qualifier from pointer target type
error, forbidden warning: uksm.c:169
Casting *src as a constant fixes this. Could also remove the const from the parameter.
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
|
| |
|
|
|
|
| |
This is already defined in include/linux/kernel.h, causing a compilation error; UKSM has not been updated for the later versions of 3.10
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
According to Hugh's suggestion, alloc_stable_node() with GFP_KERNEL can
in rare cases cause a hung task warning.
At present, if alloc_stable_node() allocation fails, two break_cows may
want to allocate a couple of pages, and the issue will come up when free
memory is under pressure.
We fix it by adding __GFP_HIGH to GFP, to grant access to memory
reserves, increasing the likelihood of allocation success.
[akpm@linux-foundation.org: tweak comment]
Link: http://lkml.kernel.org/r/1474354484-58233-1-git-send-email-zhongjiang@huawei.com
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Suggested-by: Hugh Dickins <hughd@google.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I hit the following hung task when runing a OOM LTP test case with 4.1
kernel.
Call trace:
[<ffffffc000086a88>] __switch_to+0x74/0x8c
[<ffffffc000a1bae0>] __schedule+0x23c/0x7bc
[<ffffffc000a1c09c>] schedule+0x3c/0x94
[<ffffffc000a1eb84>] rwsem_down_write_failed+0x214/0x350
[<ffffffc000a1e32c>] down_write+0x64/0x80
[<ffffffc00021f794>] __ksm_exit+0x90/0x19c
[<ffffffc0000be650>] mmput+0x118/0x11c
[<ffffffc0000c3ec4>] do_exit+0x2dc/0xa74
[<ffffffc0000c46f8>] do_group_exit+0x4c/0xe4
[<ffffffc0000d0f34>] get_signal+0x444/0x5e0
[<ffffffc000089fcc>] do_signal+0x1d8/0x450
[<ffffffc00008a35c>] do_notify_resume+0x70/0x78
The oom victim cannot terminate because it needs to take mmap_sem for
write while the lock is held by ksmd for read which loops in the page
allocator
ksm_do_scan
scan_get_next_rmap_item
down_read
get_next_rmap_item
alloc_rmap_item #ksmd will loop permanently.
There is no way forward because the oom victim cannot release any memory
in 4.1 based kernel. Since 4.6 we have the oom reaper which would solve
this problem because it would release the memory asynchronously.
Nevertheless we can relax alloc_rmap_item requirements and use
__GFP_NORETRY because the allocation failure is acceptable as ksm_do_scan
would just retry later after the lock got dropped.
Such a patch would be also easy to backport to older stable kernels which
do not have oom_reaper.
While we are at it add GFP_NOWARN so the admin doesn't have to be alarmed
by the allocation failure.
Link: http://lkml.kernel.org/r/1474165570-44398-1-git-send-email-zhongjiang@huawei.com
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Suggested-by: Hugh Dickins <hughd@google.com>
Suggested-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A concurrency issue about KSM in the function scan_get_next_rmap_item.
task A (ksmd): |task B (the mm's task):
|
mm = slot->mm; |
down_read(&mm->mmap_sem); |
|
... |
|
spin_lock(&ksm_mmlist_lock); |
|
ksm_scan.mm_slot go to the next slot; |
|
spin_unlock(&ksm_mmlist_lock); |
|mmput() ->
| ksm_exit():
|
|spin_lock(&ksm_mmlist_lock);
|if (mm_slot && ksm_scan.mm_slot != mm_slot) {
| if (!mm_slot->rmap_list) {
| easy_to_free = 1;
| ...
|
|if (easy_to_free) {
| mmdrop(mm);
| ...
|
|So this mm_struct may be freed in the mmput().
|
up_read(&mm->mmap_sem); |
As we can see above, the ksmd thread may access a mm_struct that already
been freed to the kmem_cache. Suppose a fork will get this mm_struct from
the kmem_cache, the ksmd thread then call up_read(&mm->mmap_sem), will
cause mmap_sem.count to become -1.
As suggested by Andrea Arcangeli, unmerge_and_remove_all_rmap_items has
the same SMP race condition, so fix it too. My prev fix in function
scan_get_next_rmap_item will introduce a different SMP race condition, so
just invert the up_read/spin_unlock order as Andrea Arcangeli said.
Link: http://lkml.kernel.org/r/1462708815-31301-1-git-send-email-zhouchengming1@huawei.com
Signed-off-by: Zhou Chengming <zhouchengming1@huawei.com>
Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Geliang Tang <geliangtang@163.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Ding Tianhong <dingtianhong@huawei.com>
Cc: Li Bin <huawei.libin@huawei.com>
Cc: Zhen Lei <thunder.leizhen@huawei.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The MADV_FREE patchset changes page reclaim to simply free a clean
anonymous page with no dirty ptes, instead of swapping it out; but KSM
uses clean write-protected ptes to reference the stable ksm page. So be
sure to mark that page dirty, so it's never mistakenly discarded.
[hughd@google.com: adjusted comments]
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Shaohua Li <shli@kernel.org>
Cc: <yalin.wang2010@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Gang <gang.chen.5i5j@gmail.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Daniel Micay <danielmicay@gmail.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Helge Deller <deller@gmx.de>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Jason Evans <je@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mika Penttil <mika.penttila@nextfour.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Rik van Riel <riel@redhat.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Shaohua Li <shli@kernel.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
| |
Use list_for_each_entry_safe() instead of list_for_each_safe() to
simplify the code.
Signed-off-by: Geliang Tang <geliangtang@163.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
get_mergeable_page() can only return NULL (also in case of errors) or the
pinned mergeable page. It can't return an error different than NULL.
This optimizes away the unnecessary error check.
Add a return after the "out:" label in the callee to make it more
readable.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Petr Holasek <pholasek@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Doing the VM_MERGEABLE check after the page == kpage check won't provide
any meaningful benefit. The !vma->anon_vma check of find_mergeable_vma is
the only superfluous bit in using find_mergeable_vma because the !PageAnon
check of try_to_merge_one_page() implicitly checks for that, but it still
looks cleaner to share the same find_mergeable_vma().
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Petr Holasek <pholasek@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This just uses the helper function to cleanup the assumption on the
hlist_node internals.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Petr Holasek <pholasek@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Joe Maples <joe@frap129.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The stable_nodes can become stale at any time if the underlying pages gets
freed. The stable_node gets collected and removed from the stable rbtree
if that is detected during the rbtree lookups.
Don't fail the lookup if running into stale stable_nodes, just restart the
lookup after collecting the stale stable_nodes. Otherwise the CPU spent
in the preparation stage is wasted and the lookup must be repeated at the
next loop potentially failing a second time in a second stale stable_node.
If we don't prune aggressively we delay the merging of the unstable node
candidates and at the same time we delay the freeing of the stale
stable_nodes. Keeping stale stable_nodes around wastes memory and it
can't provide any benefit.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Petr Holasek <pholasek@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Joe Maples <joe@frap129.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While at it add it to the file and anon walks too.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Petr Holasek <pholasek@redhat.com>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Joe Maples <joe@frap129.org>
Conflicts:
mm/rmap.c
|
| |
|
|
| |
Signed-off-by: Joe Maples <joe@frap129.org>
|
| |
|
|
|
|
|
|
|
| |
WARNING: Prefer: pr_err(... to printk(KERN_ERR ...
[akpm@linux-foundation.org: remove KERN_ERR]
Signed-off-by: Paul McQuade <paulmcquad@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
[Detail] implement Adaptive KSM
[Solution] implement Adaptive KSM
[Feature] Adaptive KSM
MTK-Commit-Id: 39cab34ba6273f932a61698513284383dc6bb3bc
Change-Id: I844434ef2d1159fea0f0c265d05a0e118cf618f6
Signed-off-by: Casper Li <casper.li@mediatek.com>
CR-Id: ALPS02298405
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Subsystems that want to register CPU hotplug callbacks, as well as perform
initialization for the CPUs that are already online, often do it as shown
below:
get_online_cpus();
for_each_online_cpu(cpu)
init_cpu(cpu);
register_cpu_notifier(&foobar_cpu_notifier);
put_online_cpus();
This is wrong, since it is prone to ABBA deadlocks involving the
cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
with CPU hotplug operations).
Instead, the correct and race-free way of performing the callback
registration is:
cpu_notifier_register_begin();
for_each_online_cpu(cpu)
init_cpu(cpu);
/* Note the use of the double underscored version of the API */
__register_cpu_notifier(&foobar_cpu_notifier);
cpu_notifier_register_done();
Fix the zsmalloc code by using this latter form of callback registration.
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
| |
|
|
|
|
|
|
|
| |
Add my copyright to the zsmalloc source code which I maintain.
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch moves zsmalloc under mm directory.
Before that, description will explain why we have needed custom
allocator.
Zsmalloc is a new slab-based memory allocator for storing compressed
pages. It is designed for low fragmentation and high allocation success
rate on large object, but <= PAGE_SIZE allocations.
zsmalloc differs from the kernel slab allocator in two primary ways to
achieve these design goals.
zsmalloc never requires high order page allocations to back slabs, or
"size classes" in zsmalloc terms. Instead it allows multiple
single-order pages to be stitched together into a "zspage" which backs
the slab. This allows for higher allocation success rate under memory
pressure.
Also, zsmalloc allows objects to span page boundaries within the zspage.
This allows for lower fragmentation than could be had with the kernel
slab allocator for objects between PAGE_SIZE/2 and PAGE_SIZE. With the
kernel slab allocator, if a page compresses to 60% of it original size,
the memory savings gained through compression is lost in fragmentation
because another object of the same size can't be stored in the leftover
space.
This ability to span pages results in zsmalloc allocations not being
directly addressable by the user. The user is given an
non-dereferencable handle in response to an allocation request. That
handle must be mapped, using zs_map_object(), which returns a pointer to
the mapped region that can be used. The mapping is necessary since the
object data may reside in two different noncontigious pages.
The zsmalloc fulfills the allocation needs for zram perfectly
[sjenning@linux.vnet.ibm.com: borrow Seth's quote]
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Nitin Gupta <ngupta@vflare.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Conflicts:
mm/Kconfig
mm/Makefile
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A userspace call to mmap(MAP_LOCKED) may result in the successful locking
of memory while also producing a confusing audit log denial. can_do_mlock
checks capable and rlimit. If either of these return positive
can_do_mlock returns true. The capable check leads to an LSM hook used by
apparmour and selinux which produce the audit denial. Reordering so
rlimit is checked first eliminates the denial on success, only recording a
denial when the lock is unsuccessful as a result of the denial.
Change-Id: Ic6e724554a7d566768a594917f160ab5b732108e
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Acked-by: Nick Kralevich <nnk@google.com>
Cc: Jeff Vander Stoep <jeffv@google.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Paul Cassella <cassella@cray.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Every current KDE system has process named ksysguardd polling files
below once in several seconds:
$ strace -e trace=open -p $(pidof ksysguardd)
Process 1812 attached
open("/etc/mtab", O_RDONLY|O_CLOEXEC) = 8
open("/etc/mtab", O_RDONLY|O_CLOEXEC) = 8
open("/proc/net/dev", O_RDONLY) = 8
open("/proc/net/wireless", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/proc/stat", O_RDONLY) = 8
open("/proc/vmstat", O_RDONLY) = 8
Hell knows what it is doing but speed up reading /proc/vmstat by 33%!
Benchmark is open+read+close 1.000.000 times.
BEFORE
$ perf stat -r 10 taskset -c 3 ./proc-vmstat
Performance counter stats for 'taskset -c 3 ./proc-vmstat' (10 runs):
13146.768464 task-clock (msec) # 0.960 CPUs utilized ( +- 0.60% )
15 context-switches # 0.001 K/sec ( +- 1.41% )
1 cpu-migrations # 0.000 K/sec ( +- 11.11% )
104 page-faults # 0.008 K/sec ( +- 0.57% )
45,489,799,349 cycles # 3.460 GHz ( +- 0.03% )
9,970,175,743 stalled-cycles-frontend # 21.92% frontend cycles idle ( +- 0.10% )
2,800,298,015 stalled-cycles-backend # 6.16% backend cycles idle ( +- 0.32% )
79,241,190,850 instructions # 1.74 insn per cycle
# 0.13 stalled cycles per insn ( +- 0.00% )
17,616,096,146 branches # 1339.956 M/sec ( +- 0.00% )
176,106,232 branch-misses # 1.00% of all branches ( +- 0.18% )
13.691078109 seconds time elapsed ( +- 0.03% )
^^^^^^^^^^^^
AFTER
$ perf stat -r 10 taskset -c 3 ./proc-vmstat
Performance counter stats for 'taskset -c 3 ./proc-vmstat' (10 runs):
8688.353749 task-clock (msec) # 0.950 CPUs utilized ( +- 1.25% )
10 context-switches # 0.001 K/sec ( +- 2.13% )
1 cpu-migrations # 0.000 K/sec
104 page-faults # 0.012 K/sec ( +- 0.56% )
30,384,010,730 cycles # 3.497 GHz ( +- 0.07% )
12,296,259,407 stalled-cycles-frontend # 40.47% frontend cycles idle ( +- 0.13% )
3,370,668,651 stalled-cycles-backend # 11.09% backend cycles idle ( +- 0.69% )
28,969,052,879 instructions # 0.95 insn per cycle
# 0.42 stalled cycles per insn ( +- 0.01% )
6,308,245,891 branches # 726.058 M/sec ( +- 0.00% )
214,685,502 branch-misses # 3.40% of all branches ( +- 0.26% )
9.146081052 seconds time elapsed ( +- 0.07% )
^^^^^^^^^^^
vsnprintf() is slow because:
1. format_decode() is busy looking for format specifier: 2 branches
per character (not in this case, but in others)
2. approximately million branches while parsing format mini language
and everywhere
3. just look at what string() does /proc/vmstat is good case because
most of its content are strings
Link: http://lkml.kernel.org/r/20160806125455.GA1187@p183.telecom.by
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Joe Perches <joe@perches.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Francisco Franco <franciscofranco.1990@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 19be0eaffa3ac7d8eb6784ad9bdbc7d67ed8e619 upstream.
This is an ancient bug that was actually attempted to be fixed once
(badly) by me eleven years ago in commit 4ceb5db9757a ("Fix
get_user_pages() race for write access") but that was then undone due to
problems on s390 by commit f33ea7f404e5 ("fix get_user_pages bug").
In the meantime, the s390 situation has long been fixed, and we can now
fix it by checking the pte_dirty() bit properly (and do it better). The
s390 dirty bit was implemented in abf09bed3cce ("s390/mm: implement
software dirty bits") which made it into v3.9. Earlier kernels will
have to look at the page state itself.
Also, the VM has become more scalable, and what used a purely
theoretical race back then has become easier to trigger.
To fix it, we introduce a new internal FOLL_COW flag to mark the "yes,
we already did a COW" rather than play racy games with FOLL_WRITE that
is very fundamental, and then use the pte dirty flag to validate that
the FOLL_COW flag is still valid.
Reported-and-tested-by: Phil "not Paul" Oester <kernel@linuxace.com>
Acked-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Michal Hocko <mhocko@suse.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[wt: s/gup.c/memory.c; s/follow_page_pte/follow_page_mask;
s/faultin_page/__get_user_page]
Signed-off-by: Willy Tarreau <w@1wt.eu>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit ad33bb04b2a6cee6c1f99fabb15cddbf93ff0433 upstream.
pmd_trans_unstable()/pmd_none_or_trans_huge_or_clear_bad() were
introduced to locklessy (but atomically) detect when a pmd is a regular
(stable) pmd or when the pmd is unstable and can infinitely transition
from pmd_none() and pmd_trans_huge() from under us, while only holding
the mmap_sem for reading (for writing not).
While holding the mmap_sem only for reading, MADV_DONTNEED can run from
under us and so before we can assume the pmd to be a regular stable pmd
we need to compare it against pmd_none() and pmd_trans_huge() in an
atomic way, with pmd_trans_unstable(). The old pmd_trans_huge() left a
tiny window for a race.
Useful applications are unlikely to notice the difference as doing
MADV_DONTNEED concurrently with a page fault would lead to undefined
behavior.
[js] 3.12 backport: no pmd_devmap in 3.12 yet.
[akpm@linux-foundation.org: tidy up comment grammar/layout]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
vbq in vmap_block isn't used. So remove it.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
Signed-off-by: franciscofranco <franciscofranco.1990@gmail.com>
Signed-off-by: engstk <eng.stk@sapo.pt>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Our intention in here is to find last_bit within the region to flush.
There is well-defined function, find_last_bit() for this purpose and its
performance may be slightly better than current implementation. So change
it.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
Signed-off-by: franciscofranco <franciscofranco.1990@gmail.com>
Signed-off-by: engstk <eng.stk@sapo.pt>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Rename vm_is_stack() to task_of_stack() and change it to return
"struct task_struct *" rather than the global (and thus wrong in
general) pid_t.
- Add the new pid_of_stack() helper which calls task_of_stack() and
uses the right namespace to report the correct pid_t.
Unfortunately we need to define this helper twice, in task_mmu.c
and in task_nommu.c. perhaps it makes sense to add fs/proc/util.c
and move at least pid_of_stack/task_of_stack there to avoid the
code duplication.
- Change show_map_vma() and show_numa_map() to use the new helper.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
Conflicts:
fs/proc/task_nommu.c
mm/util.c
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This helps performance on moderately dense random reads on SSD.
Transaction-Per-Second numbers provided by Taobao:
QPS case
-------------------------------------------------------
7536 disable context readahead totally
w/ patch: 7129 slower size rampup and start RA on the 3rd read
6717 slower size rampup
w/o patch: 5581 unmodified context readahead
Before, readahead will be started whenever reading page N+1 when it happen
to read N recently. After patch, we'll only start readahead when *three*
random reads happen to access pages N, N+1, N+2. The probability of this
happening is extremely low for pure random reads, unless they are very
dense, which actually deserves some readahead.
Also start with a smaller readahead window. The impact to interleaved
sequential reads should be small, because for a long run stream, the the
small readahead window rampup phase is negletable.
The context readahead actually benefits clustered random reads on HDD
whose seek cost is pretty high. However as SSD is increasingly used for
random read workloads it's better for the context readahead to concentrate
on interleaved sequential reads.
Another SSD rand read test from Miao
# file size: 2GB
# read IO amount: 625MB
sysbench --test=fileio \
--max-requests=10000 \
--num-threads=1 \
--file-num=1 \
--file-block-size=64K \
--file-test-mode=rndrd \
--file-fsync-freq=0 \
--file-fsync-end=off run
shows the performance of btrfs grows up from 69MB/s to 121MB/s, ext4 from
104MB/s to 121MB/s.
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Tested-by: Tao Ma <tm@tao.ma>
Tested-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: flar2 <asegaert@gmail.com>
Signed-off-by: Brandon Berhent <bbedward@gmail.com>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Vladimir reported the following issue:
Commit c65c1877bd68 ("slub: use lockdep_assert_held") requires
remove_partial() to be called with n->list_lock held, but free_partial()
called from kmem_cache_close() on cache destruction does not follow this
rule, leading to a warning:
WARNING: CPU: 0 PID: 2787 at mm/slub.c:1536 __kmem_cache_shutdown+0x1b2/0x1f0()
Modules linked in:
CPU: 0 PID: 2787 Comm: modprobe Tainted: G W 3.14.0-rc1-mm1+ #1
Hardware name:
0000000000000600 ffff88003ae1dde8 ffffffff816d9583 0000000000000600
0000000000000000 ffff88003ae1de28 ffffffff8107c107 0000000000000000
ffff880037ab2b00 ffff88007c240d30 ffffea0001ee5280 ffffea0001ee52a0
Call Trace:
__kmem_cache_shutdown+0x1b2/0x1f0
kmem_cache_destroy+0x43/0xf0
xfs_destroy_zones+0x103/0x110 [xfs]
exit_xfs_fs+0x38/0x4e4 [xfs]
SyS_delete_module+0x19a/0x1f0
system_call_fastpath+0x16/0x1b
His solution was to add a spinlock in order to quiet lockdep. Although
there would be no contention to adding the lock, that lock also requires
disabling of interrupts which will have a larger impact on the system.
Instead of adding a spinlock to a location where it is not needed for
lockdep, make a __remove_partial() function that does not test if the
list_lock is held, as no one should have it due to it being freed.
Also added a __add_partial() function that does not do the lock
validation either, as it is not needed for the creation of the cache.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Reported-by: Vladimir Davydov <vdavydov@parallels.com>
Suggested-by: David Rientjes <rientjes@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The slub code does some setup during early boot in
early_kmem_cache_node_alloc() with some local data. There is no
possible way that another CPU can see this data, so the slub code
doesn't unnecessarily lock it. However, some new lockdep asserts
check to make sure that add_partial() _always_ has the list_lock
held.
Just add the locking, even though it is technically unnecessary.
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@arm.linux.org.uk>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
Signed-off-by: Anik1199 <anik9280@gmail.com>
Conflicts:
mm/slub.c
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is no point in flooding logs with warnings or especially crashing
the system if we fail to create a cache for a memcg. In this case we
will be accounting the memcg allocation to the root cgroup until we
succeed to create its own cache, but it isn't that critical.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Glauber Costa <glommer@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently kmem_cache_create_memcg() backoffs on failure inside
conditionals, without using gotos. This results in the rollback code
duplication, which makes the function look cumbersome even though on
error we should only free the allocated cache. Since in the next patch
I am going to add yet another rollback function call on error path
there, let's employ labels instead of conditionals for undoing any
changes on failure to keep things clean.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Glauber Costa <glommer@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
Conflicts:
mm/slab_common.c
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
| |
Give s_next and s_stop slab-specific names instead of exporting
"s_next" and "s_stop".
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Slab have some tunables like limit, batchcount, and sharedfactor can be
tuned through function slabinfo_write. Commit (b7454ad3: mm/sl[au]b: Move
slabinfo processing to slab_common.c) uncorrectly change /proc/slabinfo
unwriteable for slab, this patch fix it by revert to original mode.
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
| |
This patch shares s_next and s_stop between slab and slub.
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Anton noticed (http://www.spinics.net/lists/linux-mm/msg67489.html) that
on ppc LPARs with memoryless nodes, a large amount of memory was consumed
by slabs and was marked unreclaimable. He tracked it down to slab
deactivations in the SLUB core when we allocate remotely, leading to poor
efficiency always when memoryless nodes are present.
After much discussion, Joonsoo provided a few patches that help
significantly. They don't resolve the problem altogether:
- memory hotplug still needs testing, that is when a memoryless node
becomes memory-ful, we want to dtrt
- there are other reasons for going off-node than memoryless nodes,
e.g., fully exhausted local nodes
Neither case is resolved with this series, but I don't think that should
block their acceptance, as they can be explored/resolved with follow-on
patches.
The series consists of:
[1/3] topology: add support for node_to_mem_node() to determine the
fallback node
[2/3] slub: fallback to node_to_mem_node() node if allocating on
memoryless node
- Joonsoo's patches to cache the nearest node with memory for each
NUMA node
[3/3] Partial revert of 81c98869faa5 (""kthread: ensure locality of
task_struct allocations")
- At Tejun's request, keep the knowledge of memoryless node fallback
to the allocator core.
This patch (of 3):
We need to determine the fallback node in slub allocator if the allocation
target node is memoryless node. Without it, the SLUB wrongly select the
node which has no memory and can't use a partial slab, because of node
mismatch. Introduced function, node_to_mem_node(X), will return a node Y
with memory that has the nearest distance. If X is memoryless node, it
will return nearest distance node, but, if X is normal node, it will
return itself.
We will use this function in following patch to determine the fallback
node.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Han Pingtian <hanpt@linux.vnet.ibm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Anton Blanchard <anton@samba.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Update the SLUB code to search for partial slabs on the nearest node with
memory in the presence of memoryless nodes. Additionally, do not consider
it to be an ALLOC_NODE_MISMATCH (and deactivate the slab) when a
memoryless-node specified allocation goes off-node.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Han Pingtian <hanpt@linux.vnet.ibm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Anton Blanchard <anton@samba.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Tracing of mergeable slabs as well as uses of failslab are confusing since
the objects of multiple slab caches will be affected. Moreover this
creates a situation where a mergeable slab will become unmergeable.
If tracing or failslab testing is desired then it may be best to switch
merging off for starters.
Signed-off-by: Christoph Lameter <cl@linux.com>
Tested-by: WANG Chao <chaowang@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This function is never called for memcg caches, because they are
unmergeable, so remove the dead code.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We mark some slab caches (e.g. kmem_cache_node) as unmergeable by
setting refcount to -1, and their alias should be 0, not refcount-1, so
correct it here.
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a kmem_cache is created with ctor, each object in the kmem_cache
will be initialized before ready to use. While in slub implementation,
the first object will be initialized twice.
This patch reduces the duplication of initialization of the first
object.
Fix commit 7656c72b ("SLUB: add macros for scanning objects in a slab").
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
resiliency_test() is only called for bootstrap, so it may be moved to
init.text and freed after boot.
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
min_partial means minimum number of slab cached in node partial list.
So, if nr_partial is less than it, we keep newly empty slab on node
partial list rather than freeing it. But if nr_partial is equal or
greater than it, it means that we have enough partial slabs so should
free newly empty slab. Current implementation missed the equal case so
if we set min_partial is 0, then, at least one slab could be cached.
This is critical problem to kmemcg destroying logic because it doesn't
works properly if some slabs is cached. This patch fixes this problem.
Fixes 91cb69620284 ("slub: make dead memcg caches discard free slabs
immediately").
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, if allocation constraint to node is NUMA_NO_NODE, we search a
partial slab on numa_node_id() node. This doesn't work properly on a
system having memoryless nodes, since it can have no memory on that node
so there must be no partial slab on that node.
On that node, page allocation always falls back to numa_mem_id() first.
So searching a partial slab on numa_node_id() in that case is the proper
solution for the memoryless node case.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Han Pingtian <hanpt@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
|
| |
Size is usually below than KMALLOC_MAX_SIZE.
If we add a 'unlikely' macro, compiler can make better code.
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There used to be only one path out of __slab_alloc(), and ALLOC_SLOWPATH
got bumped in that exit path. Now there are two, and a bunch of gotos.
ALLOC_SLOWPATH can now get set more than once during a single call to
__slab_alloc() which is pretty bogus. Here's the sequence:
1. Enter __slab_alloc(), fall through all the way to the
stat(s, ALLOC_SLOWPATH);
2. hit 'if (!freelist)', and bump DEACTIVATE_BYPASS, jump to
new_slab (goto #1)
3. Hit 'if (c->partial)', bump CPU_PARTIAL_ALLOC, goto redo
(goto #2)
4. Fall through in the same path we did before all the way to
stat(s, ALLOC_SLOWPATH)
5. bump ALLOC_REFILL stat, then return
Doing this is obviously bogus. It keeps us from being able to
accurately compare ALLOC_SLOWPATH vs. ALLOC_FASTPATH. It also means
that the total number of allocs always exceeds the total number of
frees.
This patch moves stat(s, ALLOC_SLOWPATH) to be called from the same
place that __slab_alloc() is. This makes it much less likely that
ALLOC_SLOWPATH will get botched again in the spaghetti-code inside
__slab_alloc().
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the slab or slub allocators cannot allocate additional slab pages,
they emit diagnostic information to the kernel log such as current
number of slabs, number of objects, active objects, etc. This is always
coupled with a page allocation failure warning since it is controlled by
!__GFP_NOWARN.
Suppress this out of memory warning if the allocator is configured
without debug supported. The page allocation failure warning will
indicate it is a failed slab allocation, the order, and the gfp mask, so
this is only useful to diagnose allocator issues.
Since CONFIG_SLUB_DEBUG is already enabled by default for the slub
allocator, there is no functional change with this patch. If debug is
disabled, however, the warnings are now suppressed.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Inspired by Joe Perches suggestion in ntfs logging clean-up.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Joe Perches <joe@perches.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
All printk(KERN_foo converted to pr_foo()
Default printk converted to pr_warn()
Coalesce format fragments
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Joe Perches <joe@perches.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|