| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
| |
|
|
|
|
|
| |
BUG: 27577101
BUG: 27532522
Change-Id: I890831a72e5ad4485fdf30e51a146712b18052ed
Signed-off-by: Mohamad Ayyash <mkayyash@google.com
|
| |
|
|
| |
Signed-off-by: engstk <eng.stk@sapo.pt>
|
| |
|
|
|
|
|
|
|
|
| |
Occasionally, there's a question about the method we use to find the
start of physical memory. Add some documentation so we don't have to
keep repeating outselves on the mailing list.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Chet Kener <Cl3Kener@gmail.com>
Signed-off-by: engstk <eng.stk@sapo.pt>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is currently a hardcoded limit of 64KB for the DTB to live in and
be extended with ATAG info. Some DTBs have outgrown that limit:
$ du -b arch/arm/boot/dts/omap3-n900.dtb
70212 arch/arm/boot/dts/omap3-n900.dtb
Furthermore, the actual size passed to atags_to_fdt() included the stack
size which is obviously wrong.
The initial DTB size is known, so use it to size the allocated workspace
with a 50% growth assumption and relocate the temporary stack above that.
This is also clamped to 32KB min / 1MB max for robustness against bad
DTB data.
Reported-by: Pali Rohár <pali.rohar@gmail.com>
Tested-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Chet Kener <Cl3Kener@gmail.com>
Signed-off-by: engstk <eng.stk@sapo.pt>
|
| |
|
|
|
|
|
|
|
|
|
| |
addruart from the generic debug macro is doing exactly the same using
the common lowlevel uart definition, so there is no cause for this
special casing for s3c24xx.
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Signed-off-by: Chet Kener <Cl3Kener@gmail.com>
Signed-off-by: engstk <eng.stk@sapo.pt>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
…addr2 in futex_requeue(..., requeue_pi=1)
If uaddr == uaddr2, then we have broken the rule of only requeueing from
a non-pi futex to a pi futex with this call. If we attempt this, then
dangling pointers may be left for rt_waiter resulting in an exploitable
condition.
This change brings futex_requeue() in line with futex_wait_requeue_pi()
which performs the same check as per commit 6f7b0a2a5c0f ("futex: Forbid
uaddr == uaddr2 in futex_wait_requeue_pi()")
[ tglx: Compare the resulting keys as well, as uaddrs might be
different depending on the mapping ]
Fixes CVE-2014-3153.
Change-Id: I3d40911aca262eaefc3852327fa12bec416cd27d
Reported-by: Pinkie Pie
Signed-off-by: Will Drewry <wad@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Darren Hart <dvhart@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: franciscofranco <franciscofranco.1990@gmail.com>
Signed-off-by: engstk <eng.stk@sapo.pt>
|
| |
|
|
|
|
|
|
|
| |
"timer" was checked for null, but used later without being re-checked.
Change-Id: Ib4d08cd49860c9f157d1cac556705ba85cd44f4e
Reported-by: dan.carpenter@oracle.com
Signed-off-by: JP Abgrall <jpa@google.com>
Signed-off-by: engstk <eng.stk@sapo.pt>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
vbq in vmap_block isn't used. So remove it.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
Signed-off-by: franciscofranco <franciscofranco.1990@gmail.com>
Signed-off-by: engstk <eng.stk@sapo.pt>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Our intention in here is to find last_bit within the region to flush.
There is well-defined function, find_last_bit() for this purpose and its
performance may be slightly better than current implementation. So change
it.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
Signed-off-by: franciscofranco <franciscofranco.1990@gmail.com>
Signed-off-by: engstk <eng.stk@sapo.pt>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 88a7e37d265 (sched: provide per cpu-cgroup option to
notify on migrations) added a notifier call when a task is moved
to a different CPU. Unfortunately the two call sites in the RT
sched class where this occurs happens with a runqueue lock held.
This can result in a deadlock if the notifier call attempts to do
something like wake up a task.
Fortunately the benefit of 88a7e37d265 comes mainly from notifying
on migration of non-RT tasks, so we can simply ignore the movements
of RT tasks.
CRs-Fixed: 491370
Change-Id: I8849d826bf1eeaf85a6f6ad872acb475247c5926
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
Conflicts:
kernel/sched/rt.c
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On systems where CPUs may run asynchronously, task migrations
between CPUs running at grossly different speeds can cause
problems.
This change provides a mechanism to notify a subsystem
in the kernel if a task in a particular cgroup migrates to a
different CPU. Other subsystems (such as cpufreq) may then
register for this notifier to take appropriate action when
such a task is migrated.
The cgroup attribute to set for this behavior is
"notify_on_migrate" .
Change-Id: Ie1868249e53ef901b89c837fdc33b0ad0c0a4590
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
Conflicts:
kernel/sched/core.c
kernel/sched/rt.c
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 001eabfd54c0cbf9d7d16264ddc8cc0bee67e3ed ]
This updates the bit sliced AES module to the latest version in the
upstream OpenSSL repository (e620e5ae37bc). This is needed to fix a
bug in the XTS decryption path, where data chunked in a certain way
could trigger the ciphertext stealing code, which is not supposed to
be active in the kernel build (The kernel implementation of XTS only
supports round multiples of the AES block size of 16 bytes, whereas
the conformant OpenSSL implementation of XTS supports inputs of
arbitrary size by applying ciphertext stealing). This is fixed in
the upstream version by adding the missing #ifndef XTS_CHAIN_TWEAK
around the offending instructions.
The upstream code also contains the change applied by Russell to
build the code unconditionally, i.e., even if __LINUX_ARM_ARCH__ < 7,
but implemented slightly differently.
Cc: stable@vger.kernel.org
Fixes: e4e7f10bfc40 ("ARM: add support for bit sliced AES using NEON instructions")
Reported-by: Adrian Kotelba <adrian.kotelba@gmail.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Extract the mm_access() code from __mem_open() into the new helper,
proc_mem_open(), the next patch will add another caller.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
Conflicts:
fs/proc/base.c
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
9e7814404b77 "hold task->mempolicy while numa_maps scans." fixed the
race with the exiting task but this is not enough.
The current code assumes that get_vma_policy(task) should either see
task->mempolicy == NULL or it should be equal to ->task_mempolicy saved
by hold_task_mempolicy(), so we can never race with __mpol_put(). But
this can only work if we can't race with do_set_mempolicy(), and thus
we can't race with another do_set_mempolicy() or do_exit() after that.
However, do_set_mempolicy()->down_write(mmap_sem) can not prevent this
race. This task can exec, change it's ->mm, and call do_set_mempolicy()
after that; in this case they take 2 different locks.
Change hold_task_mempolicy() to use get_task_policy(), it never returns
NULL, and change show_numa_map() to use __get_vma_policy() or fall back
to proc_priv->task_mempolicy.
Note: this is the minimal fix, we will cleanup this code later. I think
hold_task_mempolicy() and release_task_mempolicy() should die, we can
move this logic into show_numa_map(). Or we can move get_task_policy()
outside of ->mmap_sem and !CONFIG_NUMA code at least.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Rename vm_is_stack() to task_of_stack() and change it to return
"struct task_struct *" rather than the global (and thus wrong in
general) pid_t.
- Add the new pid_of_stack() helper which calls task_of_stack() and
uses the right namespace to report the correct pid_t.
Unfortunately we need to define this helper twice, in task_mmu.c
and in task_nommu.c. perhaps it makes sense to add fs/proc/util.c
and move at least pid_of_stack/task_of_stack there to avoid the
code duplication.
- Change show_map_vma() and show_numa_map() to use the new helper.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
Conflicts:
fs/proc/task_nommu.c
mm/util.c
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
m_start() can use get_proc_task() instead, and "struct inode *"
provides more potentially useful info, see the next changes.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Change the main loop in m_start() to update m->version. Mostly for
consistency, but this can help to avoid the same loop if the very
1st ->show() fails due to seq_overflow().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add the "last_addr" optimization back. Like before, every ->show()
method checks !seq_overflow() and sets m->version = vma->vm_start.
However, it also checks that m_next_vma(vma) != NULL, otherwise it
sets m->version = -1 for the lockless "EOF" fast-path in m_start().
m_start() can simply do find_vma() + m_next_vma() if last_addr is
not zero, the code looks clear and simple and this case is clearly
separated from "scan vmas" path.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
Conflicts:
fs/proc/task_mmu.c
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Extract the tail_vma/vm_next calculation from m_next() into the new
trivial helper, m_next_vma().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Now that m->version is gone we can cleanup m_start(). In particular,
- Remove the "unsigned long" typecast, m->index can't be negative
or exceed ->map_count. But lets use "unsigned int pos" to make
it clear that "pos < map_count" is safe.
- Remove the unnecessary "vma != NULL" check in the main loop. It
can't be NULL unless we have a vm bug.
- This also means that "pos < map_count" case can simply return the
valid vma and avoid "goto" and subsequent checks.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
m_start() carefully documents, checks, and sets "m->version = -1" if
we are going to return NULL. The only problem is that we will be never
called again if m_start() returns NULL, so this is simply pointless
and misleading.
Otoh, ->show() methods m->version = 0 if vma == tail_vma and this is
just wrong, we want -1 in this case. And in fact we also want -1 if
->vm_next == NULL and ->tail_vma == NULL.
And it is not used consistently, the "scan vmas" loop in m_start()
should update last_addr too.
Finally, imo the whole "last_addr" logic in m_start() looks horrible.
find_vma(last_addr) is called unconditionally even if we are not going
to use the result. But the main problem is that this code participates
in tail_vma-or-NULL mess, and this looks simply unfixable.
Remove this optimization. We will add it back after some cleanups.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
Signed-off-by: Anik1199 <anik9280@gmail.com>
Conflicts:
fs/proc/task_mmu.c
Conflicts:
fs/proc/task_mmu.c
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. There is no reason to reset ->tail_vma in m_start(), if we return
IS_ERR_OR_NULL() it won't be used.
2. m_start() also clears priv->task to ensure that m_stop() won't use
the stale pointer if we fail before get_task_struct(). But this is
ugly and confusing, move this initialization in m_stop().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. Kill the first "vma != NULL" check. Firstly this is not possible,
m_next() won't be called if ->start() or the previous ->next()
returns NULL.
And if it was possible the 2nd "vma != tail_vma" check is buggy,
we should not wrongly return ->tail_vma.
2. Make this function readable. The logic is very simple, we should
return check "vma != tail" once and return "vm_next || tail_vma".
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
m_start() drops ->mmap_sem and does mmput() if it retuns vsyscall
vma. This is because in this case m_stop()->vma_stop() obviously
can't use gate_vma->vm_mm.
Now that we have proc_maps_private->mm we can simplify this logic:
- Change m_start() to return with ->mmap_sem held unless it returns
IS_ERR_OR_NULL().
- Change vma_stop() to use priv->mm and avoid the ugly vma checks,
this makes "vm_area_struct *vma" unnecessary.
- This also allows m_start() to use vm_stop().
- Cleanup m_next() to follow the new locking rule.
Note: m_stop() looks very ugly, and this temporary uglifies it
even more. Fixed by the next change.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A simple test-case from Kirill Shutemov
cat /proc/self/maps >/dev/null
chmod +x /proc/self/net/packet
exec /proc/self/net/packet
makes lockdep unhappy, cat/exec take seq_file->lock + cred_guard_mutex in
the opposite order.
It's a false positive and probably we should not allow "chmod +x" on proc
files. Still I think that we should avoid mm_access() and cred_guard_mutex
in sys_read() paths, security checking should happen at open time. Besides,
this doesn't even look right if the task changes its ->mm between m_stop()
and m_start().
Add the new "mm_struct *mm" member into struct proc_maps_private and change
proc_maps_open() to initialize it using proc_mem_open(). Change m_start() to
use priv->mm if atomic_inc_not_zero(mm_users) succeeds or return NULL (eof)
otherwise.
The only complication is that proc_maps_open() users should additionally do
mmdrop() in fop->release(), add the new proc_map_release() helper for that.
Note: this is the user-visible change, if the task execs after open("maps")
the new ->mm won't be visible via this file. I hope this is fine, and this
matches /proc/pid/mem bahaviour.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
Conflicts:
fs/proc/task_mmu.c
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
do_maps_open() and numa_maps_open() are overcomplicated, they could use
__seq_open_private(). Plus they do the same, just sizeof(*priv)
Change them to use a new simple helper, proc_maps_open(ops, psize). This
simplifies the code and allows us to do the next changes.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
get_gate_vma(priv->task->mm) looks ugly and wrong, task->mm can be NULL or
it can changed by exec right after mm_access().
And in theory this race is not harmless, the task can exec and then later
exit and free the new mm_struct. In this case get_task_mm(oldmm) can't
help, get_gate_vma(task->mm) can read the freed/unmapped memory.
I think that priv->task should simply die and hold_task_mempolicy() logic
can be simplified. tail_vma logic asks for cleanups too.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This helps performance on moderately dense random reads on SSD.
Transaction-Per-Second numbers provided by Taobao:
QPS case
-------------------------------------------------------
7536 disable context readahead totally
w/ patch: 7129 slower size rampup and start RA on the 3rd read
6717 slower size rampup
w/o patch: 5581 unmodified context readahead
Before, readahead will be started whenever reading page N+1 when it happen
to read N recently. After patch, we'll only start readahead when *three*
random reads happen to access pages N, N+1, N+2. The probability of this
happening is extremely low for pure random reads, unless they are very
dense, which actually deserves some readahead.
Also start with a smaller readahead window. The impact to interleaved
sequential reads should be small, because for a long run stream, the the
small readahead window rampup phase is negletable.
The context readahead actually benefits clustered random reads on HDD
whose seek cost is pretty high. However as SSD is increasingly used for
random read workloads it's better for the context readahead to concentrate
on interleaved sequential reads.
Another SSD rand read test from Miao
# file size: 2GB
# read IO amount: 625MB
sysbench --test=fileio \
--max-requests=10000 \
--num-threads=1 \
--file-num=1 \
--file-block-size=64K \
--file-test-mode=rndrd \
--file-fsync-freq=0 \
--file-fsync-end=off run
shows the performance of btrfs grows up from 69MB/s to 121MB/s, ext4 from
104MB/s to 121MB/s.
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Tested-by: Tao Ma <tm@tao.ma>
Tested-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: flar2 <asegaert@gmail.com>
Signed-off-by: Brandon Berhent <bbedward@gmail.com>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
In case card detect IRQ is triggered before mmc_add_host(), then
there could be a potential race between mmc_power_off() which gets
called from mmc_rescan() of card detect IRQ handler and
mmc_start_host() which gets called from mmc_add_host(). This may
turn off the clocks while mmc_start_host() is still running and thus
may result in an un-clocked register access.
Change-Id: I522df75ba0ad23f065127e015ad825075be94596
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Signed-off-by: franciscofranco <franciscofranco.1990@gmail.com>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Vladimir reported the following issue:
Commit c65c1877bd68 ("slub: use lockdep_assert_held") requires
remove_partial() to be called with n->list_lock held, but free_partial()
called from kmem_cache_close() on cache destruction does not follow this
rule, leading to a warning:
WARNING: CPU: 0 PID: 2787 at mm/slub.c:1536 __kmem_cache_shutdown+0x1b2/0x1f0()
Modules linked in:
CPU: 0 PID: 2787 Comm: modprobe Tainted: G W 3.14.0-rc1-mm1+ #1
Hardware name:
0000000000000600 ffff88003ae1dde8 ffffffff816d9583 0000000000000600
0000000000000000 ffff88003ae1de28 ffffffff8107c107 0000000000000000
ffff880037ab2b00 ffff88007c240d30 ffffea0001ee5280 ffffea0001ee52a0
Call Trace:
__kmem_cache_shutdown+0x1b2/0x1f0
kmem_cache_destroy+0x43/0xf0
xfs_destroy_zones+0x103/0x110 [xfs]
exit_xfs_fs+0x38/0x4e4 [xfs]
SyS_delete_module+0x19a/0x1f0
system_call_fastpath+0x16/0x1b
His solution was to add a spinlock in order to quiet lockdep. Although
there would be no contention to adding the lock, that lock also requires
disabling of interrupts which will have a larger impact on the system.
Instead of adding a spinlock to a location where it is not needed for
lockdep, make a __remove_partial() function that does not test if the
list_lock is held, as no one should have it due to it being freed.
Also added a __add_partial() function that does not do the lock
validation either, as it is not needed for the creation of the cache.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Reported-by: Vladimir Davydov <vdavydov@parallels.com>
Suggested-by: David Rientjes <rientjes@google.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The slub code does some setup during early boot in
early_kmem_cache_node_alloc() with some local data. There is no
possible way that another CPU can see this data, so the slub code
doesn't unnecessarily lock it. However, some new lockdep asserts
check to make sure that add_partial() _always_ has the list_lock
held.
Just add the locking, even though it is technically unnecessary.
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@arm.linux.org.uk>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
Signed-off-by: Anik1199 <anik9280@gmail.com>
Conflicts:
mm/slub.c
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To speed up decompression, the decompressor sets up a flat, cacheable
mapping of memory. However, when there is insufficient space to hold
the page tables for this mapping, we don't bother to enable the caches
and subsequently skip all the cache maintenance hooks.
Skipping the cache maintenance before jumping to the relocated code
allows the processor to predict the branch and populate the I-cache
with stale data before the relocation loop has completed (since a
bootloader may have SCTLR.I set, which permits normal, cacheable
instruction fetches regardless of SCTLR.M).
This patch moves the cache maintenance check into the maintenance
routines themselves, allowing the v6/v7 versions to invalidate the
I-cache regardless of the MMU state.
Cc: <stable@vger.kernel.org>
Reported-by: Marc Carino <marc.ceeeee@gmail.com>
Tested-by: Julien Grall <julien.grall@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Chet Kener <Cl3Kener@gmail.com>
Signed-off-by: engstk <eng.stk@sapo.pt>
|
| |
|
|
|
|
|
|
|
|
| |
cpu partial pages are used to avoid contention which does not exist in
the UP case. So let SLUB_CPU_PARTIAL depend on SMP.
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is no point in flooding logs with warnings or especially crashing
the system if we fail to create a cache for a memcg. In this case we
will be accounting the memcg allocation to the root cgroup until we
succeed to create its own cache, but it isn't that critical.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Glauber Costa <glommer@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently kmem_cache_create_memcg() backoffs on failure inside
conditionals, without using gotos. This results in the rollback code
duplication, which makes the function look cumbersome even though on
error we should only free the allocated cache. Since in the next patch
I am going to add yet another rollback function call on error path
there, let's employ labels instead of conditionals for undoing any
changes on failure to keep things clean.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Glauber Costa <glommer@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
Conflicts:
mm/slab_common.c
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
| |
Give s_next and s_stop slab-specific names instead of exporting
"s_next" and "s_stop".
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Slab have some tunables like limit, batchcount, and sharedfactor can be
tuned through function slabinfo_write. Commit (b7454ad3: mm/sl[au]b: Move
slabinfo processing to slab_common.c) uncorrectly change /proc/slabinfo
unwriteable for slab, this patch fix it by revert to original mode.
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
| |
This patch shares s_next and s_stop between slab and slub.
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Anton noticed (http://www.spinics.net/lists/linux-mm/msg67489.html) that
on ppc LPARs with memoryless nodes, a large amount of memory was consumed
by slabs and was marked unreclaimable. He tracked it down to slab
deactivations in the SLUB core when we allocate remotely, leading to poor
efficiency always when memoryless nodes are present.
After much discussion, Joonsoo provided a few patches that help
significantly. They don't resolve the problem altogether:
- memory hotplug still needs testing, that is when a memoryless node
becomes memory-ful, we want to dtrt
- there are other reasons for going off-node than memoryless nodes,
e.g., fully exhausted local nodes
Neither case is resolved with this series, but I don't think that should
block their acceptance, as they can be explored/resolved with follow-on
patches.
The series consists of:
[1/3] topology: add support for node_to_mem_node() to determine the
fallback node
[2/3] slub: fallback to node_to_mem_node() node if allocating on
memoryless node
- Joonsoo's patches to cache the nearest node with memory for each
NUMA node
[3/3] Partial revert of 81c98869faa5 (""kthread: ensure locality of
task_struct allocations")
- At Tejun's request, keep the knowledge of memoryless node fallback
to the allocator core.
This patch (of 3):
We need to determine the fallback node in slub allocator if the allocation
target node is memoryless node. Without it, the SLUB wrongly select the
node which has no memory and can't use a partial slab, because of node
mismatch. Introduced function, node_to_mem_node(X), will return a node Y
with memory that has the nearest distance. If X is memoryless node, it
will return nearest distance node, but, if X is normal node, it will
return itself.
We will use this function in following patch to determine the fallback
node.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Han Pingtian <hanpt@linux.vnet.ibm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Anton Blanchard <anton@samba.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Update the SLUB code to search for partial slabs on the nearest node with
memory in the presence of memoryless nodes. Additionally, do not consider
it to be an ALLOC_NODE_MISMATCH (and deactivate the slab) when a
memoryless-node specified allocation goes off-node.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Han Pingtian <hanpt@linux.vnet.ibm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Anton Blanchard <anton@samba.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Tracing of mergeable slabs as well as uses of failslab are confusing since
the objects of multiple slab caches will be affected. Moreover this
creates a situation where a mergeable slab will become unmergeable.
If tracing or failslab testing is desired then it may be best to switch
merging off for starters.
Signed-off-by: Christoph Lameter <cl@linux.com>
Tested-by: WANG Chao <chaowang@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This function is never called for memcg caches, because they are
unmergeable, so remove the dead code.
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We mark some slab caches (e.g. kmem_cache_node) as unmergeable by
setting refcount to -1, and their alias should be 0, not refcount-1, so
correct it here.
Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a kmem_cache is created with ctor, each object in the kmem_cache
will be initialized before ready to use. While in slub implementation,
the first object will be initialized twice.
This patch reduces the duplication of initialization of the first
object.
Fix commit 7656c72b ("SLUB: add macros for scanning objects in a slab").
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
resiliency_test() is only called for bootstrap, so it may be moved to
init.text and freed after boot.
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
min_partial means minimum number of slab cached in node partial list.
So, if nr_partial is less than it, we keep newly empty slab on node
partial list rather than freeing it. But if nr_partial is equal or
greater than it, it means that we have enough partial slabs so should
free newly empty slab. Current implementation missed the equal case so
if we set min_partial is 0, then, at least one slab could be cached.
This is critical problem to kmemcg destroying logic because it doesn't
works properly if some slabs is cached. This patch fixes this problem.
Fixes 91cb69620284 ("slub: make dead memcg caches discard free slabs
immediately").
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, if allocation constraint to node is NUMA_NO_NODE, we search a
partial slab on numa_node_id() node. This doesn't work properly on a
system having memoryless nodes, since it can have no memory on that node
so there must be no partial slab on that node.
On that node, page allocation always falls back to numa_mem_id() first.
So searching a partial slab on numa_node_id() in that case is the proper
solution for the memoryless node case.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Han Pingtian <hanpt@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
|
| |
Size is usually below than KMALLOC_MAX_SIZE.
If we add a 'unlikely' macro, compiler can make better code.
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There used to be only one path out of __slab_alloc(), and ALLOC_SLOWPATH
got bumped in that exit path. Now there are two, and a bunch of gotos.
ALLOC_SLOWPATH can now get set more than once during a single call to
__slab_alloc() which is pretty bogus. Here's the sequence:
1. Enter __slab_alloc(), fall through all the way to the
stat(s, ALLOC_SLOWPATH);
2. hit 'if (!freelist)', and bump DEACTIVATE_BYPASS, jump to
new_slab (goto #1)
3. Hit 'if (c->partial)', bump CPU_PARTIAL_ALLOC, goto redo
(goto #2)
4. Fall through in the same path we did before all the way to
stat(s, ALLOC_SLOWPATH)
5. bump ALLOC_REFILL stat, then return
Doing this is obviously bogus. It keeps us from being able to
accurately compare ALLOC_SLOWPATH vs. ALLOC_FASTPATH. It also means
that the total number of allocs always exceeds the total number of
frees.
This patch moves stat(s, ALLOC_SLOWPATH) to be called from the same
place that __slab_alloc() is. This makes it much less likely that
ALLOC_SLOWPATH will get botched again in the spaghetti-code inside
__slab_alloc().
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the slab or slub allocators cannot allocate additional slab pages,
they emit diagnostic information to the kernel log such as current
number of slabs, number of objects, active objects, etc. This is always
coupled with a page allocation failure warning since it is controlled by
!__GFP_NOWARN.
Suppress this out of memory warning if the allocator is configured
without debug supported. The page allocation failure warning will
indicate it is a failed slab allocation, the order, and the gfp mask, so
this is only useful to diagnose allocator issues.
Since CONFIG_SLUB_DEBUG is already enabled by default for the slub
allocator, there is no functional change with this patch. If debug is
disabled, however, the warnings are now suppressed.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
|