aboutsummaryrefslogtreecommitdiff
Commit message (Collapse)AuthorAgeFilesLines
...
* Don't show empty tag stats for unprivileged uidsMohamad Ayyash2016-09-281-1/+1
| | | | | | | BUG: 27577101 BUG: 27532522 Change-Id: I890831a72e5ad4485fdf30e51a146712b18052ed Signed-off-by: Mohamad Ayyash <mkayyash@google.com
* power: wakeup: add wakelock togglesengstk2016-09-281-0/+29
| | | | Signed-off-by: engstk <eng.stk@sapo.pt>
* ARM: add documentation for finding start of physical memoryRussell King2016-09-281-1/+18
| | | | | | | | | | Occasionally, there's a question about the method we use to find the start of physical memory. Add some documentation so we don't have to keep repeating outselves on the mailing list. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Chet Kener <Cl3Kener@gmail.com> Signed-off-by: engstk <eng.stk@sapo.pt>
* ARM: 8294/1: ATAG_DTB_COMPAT: remove the DT workspace's hardcoded 64KB sizeNicolas Pitre2016-09-281-9/+30
| | | | | | | | | | | | | | | | | | | | | | | There is currently a hardcoded limit of 64KB for the DTB to live in and be extended with ATAG info. Some DTBs have outgrown that limit: $ du -b arch/arm/boot/dts/omap3-n900.dtb 70212 arch/arm/boot/dts/omap3-n900.dtb Furthermore, the actual size passed to atags_to_fdt() included the stack size which is obviously wrong. The initial DTB size is known, so use it to size the allocated workspace with a 50% growth assumption and relocate the temporary stack above that. This is also clamped to 32KB min / 1MB max for robustness against bad DTB data. Reported-by: Pali Rohár <pali.rohar@gmail.com> Tested-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Chet Kener <Cl3Kener@gmail.com> Signed-off-by: engstk <eng.stk@sapo.pt>
* ARM: compressed/head.S: remove s3c24xx special caseHeiko Stuebner2016-09-281-5/+0
| | | | | | | | | | | addruart from the generic debug macro is doing exactly the same using the common lowlevel uart definition, so there is no cause for this special casing for s3c24xx. Signed-off-by: Heiko Stuebner <heiko@sntech.de> Signed-off-by: Kukjin Kim <kgene.kim@samsung.com> Signed-off-by: Chet Kener <Cl3Kener@gmail.com> Signed-off-by: engstk <eng.stk@sapo.pt>
* futex-prevent-requeue-pi-on-same-futex.patch futex: Forbid uaddr == u…Thomas Gleixner2016-09-281-0/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | …addr2 in futex_requeue(..., requeue_pi=1) If uaddr == uaddr2, then we have broken the rule of only requeueing from a non-pi futex to a pi futex with this call. If we attempt this, then dangling pointers may be left for rt_waiter resulting in an exploitable condition. This change brings futex_requeue() in line with futex_wait_requeue_pi() which performs the same check as per commit 6f7b0a2a5c0f ("futex: Forbid uaddr == uaddr2 in futex_wait_requeue_pi()") [ tglx: Compare the resulting keys as well, as uaddrs might be different depending on the mapping ] Fixes CVE-2014-3153. Change-Id: I3d40911aca262eaefc3852327fa12bec416cd27d Reported-by: Pinkie Pie Signed-off-by: Will Drewry <wad@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Darren Hart <dvhart@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: franciscofranco <franciscofranco.1990@gmail.com> Signed-off-by: engstk <eng.stk@sapo.pt>
* netfilter: IDLETIMER: fix invalid deference of timerJP Abgrall2016-09-281-1/+1
| | | | | | | | | "timer" was checked for null, but used later without being re-checked. Change-Id: Ib4d08cd49860c9f157d1cac556705ba85cd44f4e Reported-by: dan.carpenter@oracle.com Signed-off-by: JP Abgrall <jpa@google.com> Signed-off-by: engstk <eng.stk@sapo.pt>
* mm, vmalloc: remove useless variable in vmap_blockJoonsoo Kim2016-09-281-2/+0
| | | | | | | | | | | | | | vbq in vmap_block isn't used. So remove it. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Pranav Vashi <neobuddy89@gmail.com> Signed-off-by: franciscofranco <franciscofranco.1990@gmail.com> Signed-off-by: engstk <eng.stk@sapo.pt>
* mm, vmalloc: use well-defined find_last_bit() funcJoonsoo Kim2016-09-281-9/+6
| | | | | | | | | | | | | | | | | Our intention in here is to find last_bit within the region to flush. There is well-defined function, find_last_bit() for this purpose and its performance may be slightly better than current implementation. So change it. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Pranav Vashi <neobuddy89@gmail.com> Signed-off-by: franciscofranco <franciscofranco.1990@gmail.com> Signed-off-by: engstk <eng.stk@sapo.pt>
* sched: remove migration notification from RT classSteve Muckle2016-09-281-19/+1
| | | | | | | | | | | | | | | | | | | | | Commit 88a7e37d265 (sched: provide per cpu-cgroup option to notify on migrations) added a notifier call when a task is moved to a different CPU. Unfortunately the two call sites in the RT sched class where this occurs happens with a runqueue lock held. This can result in a deadlock if the notifier call attempts to do something like wake up a task. Fortunately the benefit of 88a7e37d265 comes mainly from notifying on migration of non-RT tasks, so we can simply ignore the movements of RT tasks. CRs-Fixed: 491370 Change-Id: I8849d826bf1eeaf85a6f6ad872acb475247c5926 Signed-off-by: Steve Muckle <smuckle@codeaurora.org> Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com> Conflicts: kernel/sched/rt.c
* sched: provide per cpu-cgroup option to notify on migrationsSteve Muckle2016-09-281-1/+19
| | | | | | | | | | | | | | | | | | | | | | | On systems where CPUs may run asynchronously, task migrations between CPUs running at grossly different speeds can cause problems. This change provides a mechanism to notify a subsystem in the kernel if a task in a particular cgroup migrates to a different CPU. Other subsystems (such as cpufreq) may then register for this notifier to take appropriate action when such a task is migrated. The cgroup attribute to set for this behavior is "notify_on_migrate" . Change-Id: Ie1868249e53ef901b89c837fdc33b0ad0c0a4590 Signed-off-by: Steve Muckle <smuckle@codeaurora.org> Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com> Conflicts: kernel/sched/core.c kernel/sched/rt.c
* crypto: arm/aes update NEON AES module to latest OpenSSL versionArd Biesheuvel2016-09-282-8/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 001eabfd54c0cbf9d7d16264ddc8cc0bee67e3ed ] This updates the bit sliced AES module to the latest version in the upstream OpenSSL repository (e620e5ae37bc). This is needed to fix a bug in the XTS decryption path, where data chunked in a certain way could trigger the ciphertext stealing code, which is not supposed to be active in the kernel build (The kernel implementation of XTS only supports round multiples of the AES block size of 16 bytes, whereas the conformant OpenSSL implementation of XTS supports inputs of arbitrary size by applying ciphertext stealing). This is fixed in the upstream version by adding the missing #ifndef XTS_CHAIN_TWEAK around the offending instructions. The upstream code also contains the change applied by Russell to build the code unconditionally, i.e., even if __LINUX_ARM_ARCH__ < 7, but implemented slightly differently. Cc: stable@vger.kernel.org Fixes: e4e7f10bfc40 ("ARM: add support for bit sliced AES using NEON instructions") Reported-by: Adrian Kotelba <adrian.kotelba@gmail.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
* proc: introduce proc_mem_open()Oleg Nesterov2016-09-282-15/+23
| | | | | | | | | | | | | | | | Extract the mm_access() code from __mem_open() into the new helper, proc_mem_open(), the next patch will add another caller. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com> Conflicts: fs/proc/base.c
* mempolicy: fix show_numa_map() vs exec() + do_set_mempolicy() raceOleg Nesterov2016-09-281-24/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 9e7814404b77 "hold task->mempolicy while numa_maps scans." fixed the race with the exiting task but this is not enough. The current code assumes that get_vma_policy(task) should either see task->mempolicy == NULL or it should be equal to ->task_mempolicy saved by hold_task_mempolicy(), so we can never race with __mpol_put(). But this can only work if we can't race with do_set_mempolicy(), and thus we can't race with another do_set_mempolicy() or do_exit() after that. However, do_set_mempolicy()->down_write(mmap_sem) can not prevent this race. This task can exec, change it's ->mm, and call do_set_mempolicy() after that; in this case they take 2 different locks. Change hold_task_mempolicy() to use get_task_policy(), it never returns NULL, and change show_numa_map() to use __get_vma_policy() or fall back to proc_priv->task_mempolicy. Note: this is the minimal fix, we will cleanup this code later. I think hold_task_mempolicy() and release_task_mempolicy() should die, we can move this logic into show_numa_map(). Or we can move get_task_policy() outside of ->mmap_sem and !CONFIG_NUMA code at least. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Hugh Dickins <hughd@google.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
* proc/maps: make vm_is_stack() logic namespace-friendlyOleg Nesterov2016-09-284-24/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Rename vm_is_stack() to task_of_stack() and change it to return "struct task_struct *" rather than the global (and thus wrong in general) pid_t. - Add the new pid_of_stack() helper which calls task_of_stack() and uses the right namespace to report the correct pid_t. Unfortunately we need to define this helper twice, in task_mmu.c and in task_nommu.c. perhaps it makes sense to add fs/proc/util.c and move at least pid_of_stack/task_of_stack there to avoid the code duplication. - Change show_map_vma() and show_numa_map() to use the new helper. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Greg Ungerer <gerg@uclinux.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com> Conflicts: fs/proc/task_nommu.c mm/util.c
* proc/maps: replace proc_maps_private->pid with "struct inode *inode"Oleg Nesterov2016-09-283-5/+5
| | | | | | | | | | | | | | | m_start() can use get_proc_task() instead, and "struct inode *" provides more potentially useful info, see the next changes. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Greg Ungerer <gerg@uclinux.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
* fs/proc/task_mmu.c: update m->version in the main loop in m_start()Oleg Nesterov2016-09-281-1/+4
| | | | | | | | | | | | | | Change the main loop in m_start() to update m->version. Mostly for consistency, but this can help to avoid the same loop if the very 1st ->show() fails due to seq_overflow(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
* fs/proc/task_mmu.c: reintroduce m->version logicOleg Nesterov2016-09-281-6/+19
| | | | | | | | | | | | | | | | | | | | | | | Add the "last_addr" optimization back. Like before, every ->show() method checks !seq_overflow() and sets m->version = vma->vm_start. However, it also checks that m_next_vma(vma) != NULL, otherwise it sets m->version = -1 for the lockless "EOF" fast-path in m_start(). m_start() can simply do find_vma() + m_next_vma() if last_addr is not zero, the code looks clear and simple and this case is clearly separated from "scan vmas" path. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com> Conflicts: fs/proc/task_mmu.c
* fs/proc/task_mmu.c: introduce m_next_vma() helperOleg Nesterov2016-09-281-5/+10
| | | | | | | | | | | | | Extract the tail_vma/vm_next calculation from m_next() into the new trivial helper, m_next_vma(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
* fs/proc/task_mmu.c: simplify m_start() to make it readableOleg Nesterov2016-09-281-24/+10
| | | | | | | | | | | | | | | | | | | | | | Now that m->version is gone we can cleanup m_start(). In particular, - Remove the "unsigned long" typecast, m->index can't be negative or exceed ->map_count. But lets use "unsigned int pos" to make it clear that "pos < map_count" is safe. - Remove the unnecessary "vma != NULL" check in the main loop. It can't be NULL unless we have a vm bug. - This also means that "pos < map_count" case can simply return the valid vma and avoid "goto" and subsequent checks. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
* fs/proc/task_mmu.c: kill the suboptimal and confusing m->version logicOleg Nesterov2016-09-281-24/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | m_start() carefully documents, checks, and sets "m->version = -1" if we are going to return NULL. The only problem is that we will be never called again if m_start() returns NULL, so this is simply pointless and misleading. Otoh, ->show() methods m->version = 0 if vma == tail_vma and this is just wrong, we want -1 in this case. And in fact we also want -1 if ->vm_next == NULL and ->tail_vma == NULL. And it is not used consistently, the "scan vmas" loop in m_start() should update last_addr too. Finally, imo the whole "last_addr" logic in m_start() looks horrible. find_vma(last_addr) is called unconditionally even if we are not going to use the result. But the main problem is that this code participates in tail_vma-or-NULL mess, and this looks simply unfixable. Remove this optimization. We will add it back after some cleanups. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com> Signed-off-by: Anik1199 <anik9280@gmail.com> Conflicts: fs/proc/task_mmu.c Conflicts: fs/proc/task_mmu.c
* fs/proc/task_mmu.c: shift "priv->task = NULL" from m_start() to m_stop()Oleg Nesterov2016-09-281-6/+3
| | | | | | | | | | | | | | | | | 1. There is no reason to reset ->tail_vma in m_start(), if we return IS_ERR_OR_NULL() it won't be used. 2. m_start() also clears priv->task to ensure that m_stop() won't use the stale pointer if we fail before get_task_struct(). But this is ugly and confusing, move this initialization in m_stop(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
* fs/proc/task_mmu.c: cleanup the "tail_vma" horror in m_next()Oleg Nesterov2016-09-281-5/+3
| | | | | | | | | | | | | | | | | | | | 1. Kill the first "vma != NULL" check. Firstly this is not possible, m_next() won't be called if ->start() or the previous ->next() returns NULL. And if it was possible the 2nd "vma != tail_vma" check is buggy, we should not wrongly return ->tail_vma. 2. Make this function readable. The logic is very simple, we should return check "vma != tail" once and return "vm_next || tail_vma". Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
* fs/proc/task_mmu.c: simplify the vma_stop() logicOleg Nesterov2016-09-281-16/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | m_start() drops ->mmap_sem and does mmput() if it retuns vsyscall vma. This is because in this case m_stop()->vma_stop() obviously can't use gate_vma->vm_mm. Now that we have proc_maps_private->mm we can simplify this logic: - Change m_start() to return with ->mmap_sem held unless it returns IS_ERR_OR_NULL(). - Change vma_stop() to use priv->mm and avoid the ugly vma checks, this makes "vm_area_struct *vma" unnecessary. - This also allows m_start() to use vm_stop(). - Cleanup m_next() to follow the new locking rule. Note: m_stop() looks very ugly, and this temporary uglifies it even more. Fixed by the next change. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
* fs/proc/task_mmu.c: shift mm_access() from m_start() to proc_maps_open()Oleg Nesterov2016-09-282-9/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A simple test-case from Kirill Shutemov cat /proc/self/maps >/dev/null chmod +x /proc/self/net/packet exec /proc/self/net/packet makes lockdep unhappy, cat/exec take seq_file->lock + cred_guard_mutex in the opposite order. It's a false positive and probably we should not allow "chmod +x" on proc files. Still I think that we should avoid mm_access() and cred_guard_mutex in sys_read() paths, security checking should happen at open time. Besides, this doesn't even look right if the task changes its ->mm between m_stop() and m_start(). Add the new "mm_struct *mm" member into struct proc_maps_private and change proc_maps_open() to initialize it using proc_mem_open(). Change m_start() to use priv->mm if atomic_inc_not_zero(mm_users) succeeds or return NULL (eof) otherwise. The only complication is that proc_maps_open() users should additionally do mmdrop() in fop->release(), add the new proc_map_release() helper for that. Note: this is the user-visible change, if the task execs after open("maps") the new ->mm won't be visible via this file. I hope this is fine, and this matches /proc/pid/mem bahaviour. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: "Kirill A. Shutemov" <kirill@shutemov.name> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com> Conflicts: fs/proc/task_mmu.c
* fs/proc/task_mmu.c: unify/simplify do_maps_open() and numa_maps_open()Oleg Nesterov2016-09-281-28/+16
| | | | | | | | | | | | | | | | do_maps_open() and numa_maps_open() are overcomplicated, they could use __seq_open_private(). Plus they do the same, just sizeof(*priv) Change them to use a new simple helper, proc_maps_open(ops, psize). This simplifies the code and allows us to do the next changes. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
* fs/proc/task_mmu.c: don't use task->mm in m_start() and show_*map()Oleg Nesterov2016-09-281-5/+3
| | | | | | | | | | | | | | | | | | | | get_gate_vma(priv->task->mm) looks ugly and wrong, task->mm can be NULL or it can changed by exec right after mm_access(). And in theory this race is not harmless, the task can exec and then later exit and free the new mm_struct. In this case get_task_mm(oldmm) can't help, get_gate_vma(task->mm) can read the freed/unmapped memory. I think that priv->task should simply die and hold_task_mempolicy() logic can be simplified. tail_vma logic asks for cleanups too. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
* readahead: make context readahead more conservativeFengguang Wu2016-09-281-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This helps performance on moderately dense random reads on SSD. Transaction-Per-Second numbers provided by Taobao: QPS case ------------------------------------------------------- 7536 disable context readahead totally w/ patch: 7129 slower size rampup and start RA on the 3rd read 6717 slower size rampup w/o patch: 5581 unmodified context readahead Before, readahead will be started whenever reading page N+1 when it happen to read N recently. After patch, we'll only start readahead when *three* random reads happen to access pages N, N+1, N+2. The probability of this happening is extremely low for pure random reads, unless they are very dense, which actually deserves some readahead. Also start with a smaller readahead window. The impact to interleaved sequential reads should be small, because for a long run stream, the the small readahead window rampup phase is negletable. The context readahead actually benefits clustered random reads on HDD whose seek cost is pretty high. However as SSD is increasingly used for random read workloads it's better for the context readahead to concentrate on interleaved sequential reads. Another SSD rand read test from Miao # file size: 2GB # read IO amount: 625MB sysbench --test=fileio \ --max-requests=10000 \ --num-threads=1 \ --file-num=1 \ --file-block-size=64K \ --file-test-mode=rndrd \ --file-fsync-freq=0 \ --file-fsync-end=off run shows the performance of btrfs grows up from 69MB/s to 121MB/s, ext4 from 104MB/s to 121MB/s. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Tested-by: Tao Ma <tm@tao.ma> Tested-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: flar2 <asegaert@gmail.com> Signed-off-by: Brandon Berhent <bbedward@gmail.com> Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
* mmc: core: fix race between mmc_power_off and mmc_power_upSahitya Tummala2016-09-281-1/+8
| | | | | | | | | | | | | | In case card detect IRQ is triggered before mmc_add_host(), then there could be a potential race between mmc_power_off() which gets called from mmc_rescan() of card detect IRQ handler and mmc_start_host() which gets called from mmc_add_host(). This may turn off the clocks while mmc_start_host() is still running and thus may result in an un-clocked register access. Change-Id: I522df75ba0ad23f065127e015ad825075be94596 Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Signed-off-by: franciscofranco <franciscofranco.1990@gmail.com> Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
* slub: do not assert not having lock in removing freed partialSteven Rostedt2016-09-281-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Vladimir reported the following issue: Commit c65c1877bd68 ("slub: use lockdep_assert_held") requires remove_partial() to be called with n->list_lock held, but free_partial() called from kmem_cache_close() on cache destruction does not follow this rule, leading to a warning: WARNING: CPU: 0 PID: 2787 at mm/slub.c:1536 __kmem_cache_shutdown+0x1b2/0x1f0() Modules linked in: CPU: 0 PID: 2787 Comm: modprobe Tainted: G W 3.14.0-rc1-mm1+ #1 Hardware name: 0000000000000600 ffff88003ae1dde8 ffffffff816d9583 0000000000000600 0000000000000000 ffff88003ae1de28 ffffffff8107c107 0000000000000000 ffff880037ab2b00 ffff88007c240d30 ffffea0001ee5280 ffffea0001ee52a0 Call Trace: __kmem_cache_shutdown+0x1b2/0x1f0 kmem_cache_destroy+0x43/0xf0 xfs_destroy_zones+0x103/0x110 [xfs] exit_xfs_fs+0x38/0x4e4 [xfs] SyS_delete_module+0x19a/0x1f0 system_call_fastpath+0x16/0x1b His solution was to add a spinlock in order to quiet lockdep. Although there would be no contention to adding the lock, that lock also requires disabling of interrupts which will have a larger impact on the system. Instead of adding a spinlock to a location where it is not needed for lockdep, make a __remove_partial() function that does not test if the list_lock is held, as no one should have it due to it being freed. Also added a __add_partial() function that does not do the lock validation either, as it is not needed for the creation of the cache. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Reported-by: Vladimir Davydov <vdavydov@parallels.com> Suggested-by: David Rientjes <rientjes@google.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Vladimir Davydov <vdavydov@parallels.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
* mm: slub: work around unneeded lockdep warningDave Hansen2016-09-281-3/+6
| | | | | | | | | | | | | | | | | | | | | | The slub code does some setup during early boot in early_kmem_cache_node_alloc() with some local data. There is no possible way that another CPU can see this data, so the slub code doesn't unnecessarily lock it. However, some new lockdep asserts check to make sure that add_partial() _always_ has the list_lock held. Just add the locking, even though it is technically unnecessary. Cc: Peter Zijlstra <peterz@infradead.org> Cc: Russell King <linux@arm.linux.org.uk> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com> Signed-off-by: Anik1199 <anik9280@gmail.com> Conflicts: mm/slub.c
* ARM: 8191/1: decompressor: ensure I-side picks up relocated codeWill Deacon2016-09-281-2/+15
| | | | | | | | | | | | | | | | | | | | | | | | | To speed up decompression, the decompressor sets up a flat, cacheable mapping of memory. However, when there is insufficient space to hold the page tables for this mapping, we don't bother to enable the caches and subsequently skip all the cache maintenance hooks. Skipping the cache maintenance before jumping to the relocated code allows the processor to predict the branch and populate the I-cache with stale data before the relocation loop has completed (since a bootloader may have SCTLR.I set, which permits normal, cacheable instruction fetches regardless of SCTLR.M). This patch moves the cache maintenance check into the maintenance routines themselves, allowing the v6/v7 versions to invalidate the I-cache regardless of the MMU state. Cc: <stable@vger.kernel.org> Reported-by: Marc Carino <marc.ceeeee@gmail.com> Tested-by: Julien Grall <julien.grall@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Chet Kener <Cl3Kener@gmail.com> Signed-off-by: engstk <eng.stk@sapo.pt>
* slub: don't use cpu partial pages on UPUwe Kleine-König2016-09-281-1/+1
| | | | | | | | | | cpu partial pages are used to avoid contention which does not exist in the UP case. So let SLUB_CPU_PARTIAL depend on SMP. Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
* slab: do not panic if we fail to create memcg cacheVladimir Davydov2016-09-281-1/+8
| | | | | | | | | | | | | | | | | There is no point in flooding logs with warnings or especially crashing the system if we fail to create a cache for a memcg. In this case we will be accounting the memcg allocation to the root cgroup until we succeed to create its own cache, but it isn't that critical. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Glauber Costa <glommer@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
* slab: clean up kmem_cache_create_memcg() error handlingVladimir Davydov2016-09-281-34/+31
| | | | | | | | | | | | | | | | | | | | | | | | | Currently kmem_cache_create_memcg() backoffs on failure inside conditionals, without using gotos. This results in the rollback code duplication, which makes the function look cumbersome even though on error we should only free the allocated cache. Since in the next patch I am going to add yet another rollback function call on error path there, let's employ labels instead of conditionals for undoing any changes on failure to keep things clean. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Reviewed-by: Pekka Enberg <penberg@kernel.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Glauber Costa <glommer@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com> Conflicts: mm/slab_common.c Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
* mm/slab: Give s_next and s_stop slab-specific namesWanpeng Li2016-09-283-8/+8
| | | | | | | | | Give s_next and s_stop slab-specific names instead of exporting "s_next" and "s_stop". Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Signed-off-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
* mm/slab: Fix /proc/slabinfo unwriteable for slabWanpeng Li2016-09-281-1/+9
| | | | | | | | | | | | Slab have some tunables like limit, batchcount, and sharedfactor can be tuned through function slabinfo_write. Commit (b7454ad3: mm/sl[au]b: Move slabinfo processing to slab_common.c) uncorrectly change /proc/slabinfo unwriteable for slab, this patch fix it by revert to original mode. Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Signed-off-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
* mm/slab: Sharing s_next and s_stop between slab and slubWanpeng Li2016-09-283-12/+5
| | | | | | | | | This patch shares s_next and s_stop between slab and slub. Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Signed-off-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
* topology: add support for node_to_mem_node() to determine the fallback nodeJoonsoo Kim2016-09-282-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Anton noticed (http://www.spinics.net/lists/linux-mm/msg67489.html) that on ppc LPARs with memoryless nodes, a large amount of memory was consumed by slabs and was marked unreclaimable. He tracked it down to slab deactivations in the SLUB core when we allocate remotely, leading to poor efficiency always when memoryless nodes are present. After much discussion, Joonsoo provided a few patches that help significantly. They don't resolve the problem altogether: - memory hotplug still needs testing, that is when a memoryless node becomes memory-ful, we want to dtrt - there are other reasons for going off-node than memoryless nodes, e.g., fully exhausted local nodes Neither case is resolved with this series, but I don't think that should block their acceptance, as they can be explored/resolved with follow-on patches. The series consists of: [1/3] topology: add support for node_to_mem_node() to determine the fallback node [2/3] slub: fallback to node_to_mem_node() node if allocating on memoryless node - Joonsoo's patches to cache the nearest node with memory for each NUMA node [3/3] Partial revert of 81c98869faa5 (""kthread: ensure locality of task_struct allocations") - At Tejun's request, keep the knowledge of memoryless node fallback to the allocator core. This patch (of 3): We need to determine the fallback node in slub allocator if the allocation target node is memoryless node. Without it, the SLUB wrongly select the node which has no memory and can't use a partial slab, because of node mismatch. Introduced function, node_to_mem_node(X), will return a node Y with memory that has the nearest distance. If X is memoryless node, it will return nearest distance node, but, if X is normal node, it will return itself. We will use this function in following patch to determine the fallback node. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Cc: David Rientjes <rientjes@google.com> Cc: Han Pingtian <hanpt@linux.vnet.ibm.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Anton Blanchard <anton@samba.org> Cc: Christoph Lameter <cl@linux.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
* slub: fall back to node_to_mem_node() node if allocating on memoryless nodeJoonsoo Kim2016-09-281-6/+18
| | | | | | | | | | | | | | | | | | | | | | | Update the SLUB code to search for partial slabs on the nearest node with memory in the presence of memoryless nodes. Additionally, do not consider it to be an ALLOC_NODE_MISMATCH (and deactivate the slab) when a memoryless-node specified allocation goes off-node. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Cc: David Rientjes <rientjes@google.com> Cc: Han Pingtian <hanpt@linux.vnet.ibm.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Anton Blanchard <anton@samba.org> Cc: Christoph Lameter <cl@linux.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
* slub: disable tracing and failslab for merged slabsChristoph Lameter2016-09-281-0/+11
| | | | | | | | | | | | | | | | | | Tracing of mergeable slabs as well as uses of failslab are confusing since the objects of multiple slab caches will be affected. Moreover this creates a situation where a mergeable slab will become unmergeable. If tracing or failslab testing is desired then it may be best to switch merging off for starters. Signed-off-by: Christoph Lameter <cl@linux.com> Tested-by: WANG Chao <chaowang@redhat.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
* slub: remove kmemcg id from create_unique_idVladimir Davydov2016-09-281-6/+0
| | | | | | | | | | | | | | | | This function is never called for memcg caches, because they are unmergeable, so remove the dead code. Signed-off-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Christoph Lameter <cl@linux.com> Reviewed-by: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
* slab: fix the alias count (via sysfs) of slab cacheGu Zheng2016-09-281-1/+1
| | | | | | | | | | | | | | | We mark some slab caches (e.g. kmem_cache_node) as unmergeable by setting refcount to -1, and their alias should be 0, not refcount-1, so correct it here. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
* slub: avoid duplicate creation on the first objectWei Yang2016-09-281-8/+11
| | | | | | | | | | | | | | | | | | | | When a kmem_cache is created with ctor, each object in the kmem_cache will be initialized before ready to use. While in slub implementation, the first object will be initialized twice. This patch reduces the duplication of initialization of the first object. Fix commit 7656c72b ("SLUB: add macros for scanning objects in a slab"). Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
* mm, slub: mark resiliency_test as init textDavid Rientjes2016-09-281-1/+1
| | | | | | | | | | | | | resiliency_test() is only called for bootstrap, so it may be moved to init.text and freed after boot. Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
* slub: fix off by one in number of slab testsJoonsoo Kim2016-09-281-3/+3
| | | | | | | | | | | | | | | | | | | | | | | min_partial means minimum number of slab cached in node partial list. So, if nr_partial is less than it, we keep newly empty slab on node partial list rather than freeing it. But if nr_partial is equal or greater than it, it means that we have enough partial slabs so should free newly empty slab. Current implementation missed the equal case so if we set min_partial is 0, then, at least one slab could be cached. This is critical problem to kmemcg destroying logic because it doesn't works properly if some slabs is cached. This patch fixes this problem. Fixes 91cb69620284 ("slub: make dead memcg caches discard free slabs immediately"). Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
* slub: search partial list on numa_mem_id(), instead of numa_node_id()Joonsoo Kim2016-09-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | Currently, if allocation constraint to node is NUMA_NO_NODE, we search a partial slab on numa_node_id() node. This doesn't work properly on a system having memoryless nodes, since it can have no memory on that node so there must be no partial slab on that node. On that node, page allocation always falls back to numa_mem_id() first. So searching a partial slab on numa_node_id() in that case is the proper solution for the memoryless node case. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Han Pingtian <hanpt@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
* mm, slab_common: add 'unlikely' to size check of kmalloc_slab()Joonsoo Kim2016-09-281-1/+1
| | | | | | | | | | Size is usually below than KMALLOC_MAX_SIZE. If we add a 'unlikely' macro, compiler can make better code. Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
* mm: slub: fix ALLOC_SLOWPATH statDave Hansen2016-09-281-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There used to be only one path out of __slab_alloc(), and ALLOC_SLOWPATH got bumped in that exit path. Now there are two, and a bunch of gotos. ALLOC_SLOWPATH can now get set more than once during a single call to __slab_alloc() which is pretty bogus. Here's the sequence: 1. Enter __slab_alloc(), fall through all the way to the stat(s, ALLOC_SLOWPATH); 2. hit 'if (!freelist)', and bump DEACTIVATE_BYPASS, jump to new_slab (goto #1) 3. Hit 'if (c->partial)', bump CPU_PARTIAL_ALLOC, goto redo (goto #2) 4. Fall through in the same path we did before all the way to stat(s, ALLOC_SLOWPATH) 5. bump ALLOC_REFILL stat, then return Doing this is obviously bogus. It keeps us from being able to accurately compare ALLOC_SLOWPATH vs. ALLOC_FASTPATH. It also means that the total number of allocs always exceeds the total number of frees. This patch moves stat(s, ALLOC_SLOWPATH) to be called from the same place that __slab_alloc() is. This makes it much less likely that ALLOC_SLOWPATH will get botched again in the spaghetti-code inside __slab_alloc(). Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Christoph Lameter <cl@linux.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>
* mm, slab: suppress out of memory warning unless debug is enabledDavid Rientjes2016-09-282-14/+25
| | | | | | | | | | | | | | | | | | | | | | | | When the slab or slub allocators cannot allocate additional slab pages, they emit diagnostic information to the kernel log such as current number of slabs, number of objects, active objects, etc. This is always coupled with a page allocation failure warning since it is controlled by !__GFP_NOWARN. Suppress this out of memory warning if the allocator is configured without debug supported. The page allocation failure warning will indicate it is a failed slab allocation, the order, and the gfp mask, so this is only useful to diagnose allocator issues. Since CONFIG_SLUB_DEBUG is already enabled by default for the slub allocator, there is no functional change with this patch. If debug is disabled, however, the warnings are now suppressed. Signed-off-by: David Rientjes <rientjes@google.com> Cc: Pekka Enberg <penberg@kernel.org> Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: W4TCH0UT <ateekujjawal@gmail.com>