aboutsummaryrefslogtreecommitdiff
path: root/include/linux
Commit message (Collapse)AuthorAgeFilesLines
* pmem: cleanup last bits of itMister Oyster2017-12-241-111/+0
|
* fsync: revert to a version with fbsuspend instead of earlysuspendMister Oyster2017-12-231-6/+0
| | | | | This reverts commit fdd3da71eb0fae04ef0e67cd02eab371a26449c6. This reverts commit 384cf00787041088f91a0604dec112317135a369.
* Provide a binary to hex conversion functionDavid Howells2017-12-221-0/+1
| | | | | | | | Provide a function to convert a buffer of binary data into an unterminated ascii hex string representation of that data. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Vivek Goyal <vgoyal@redhat.com>
* compiler.h: add support for malloc attributeRasmus Villemoes2017-12-182-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | gcc as far back as at least 3.04 documents the function attribute __malloc__. Add a shorthand for attaching that to a function declaration. This was also suggested by Andi Kleen way back in 2002 [1], but didn't get applied, perhaps because gcc at that time generated the exact same code with and without this attribute. This attribute tells the compiler that the return value (if non-NULL) can be assumed not to alias any other valid pointers at the time of the call. Please note that the documentation for a range of gcc versions (starting from around 4.7) contained a somewhat confusing and self-contradicting text: The malloc attribute is used to tell the compiler that a function may be treated as if any non-NULL pointer it returns cannot alias any other pointer valid when the function returns and *that the memory has undefined content*. [...] Standard functions with this property include malloc and *calloc*. (emphasis mine). The intended meaning has later been clarified [2]: This tells the compiler that a function is malloc-like, i.e., that the pointer P returned by the function cannot alias any other pointer valid when the function returns, and moreover no pointers to valid objects occur in any storage addressed by P. What this means is that we can apply the attribute to kmalloc and friends, and it is ok for the returned memory to have well-defined contents (__GFP_ZERO). But it is not ok to apply it to kmemdup(), nor to other functions which both allocate and possibly initialize the memory with existing pointers. So unless someone is doing something pretty perverted kstrdup() should also be a fine candidate. [1] http://thread.gmane.org/gmane.linux.kernel/57172 [2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56955 Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* compiler-gcc: disable -ftracer for __noclone functionsPaolo Bonzini2017-12-181-1/+1
| | | | | | | | | | | | | | | | | | | | -ftracer can duplicate asm blocks causing compilation to fail in noclone functions. For example, KVM declares a global variable in an asm like asm("2: ... \n .pushsection data \n .global vmx_return \n vmx_return: .long 2b"); and -ftracer causes a double declaration. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Marek <mmarek@suse.cz> Cc: stable@vger.kernel.org Cc: kvm@vger.kernel.org Reported-by: Linda Walsh <lkml@tlinx.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* mm: page_alloc: add kasan hooks on alloc and free pathsAndrey Ryabinin2017-12-181-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | Add kernel address sanitizer hooks to mark allocated page's addresses as accessible in corresponding shadow region. Mark freed pages as inaccessible. Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Konstantin Serebryany <kcc@google.com> Cc: Dmitry Chernenkov <dmitryc@google.com> Signed-off-by: Andrey Konovalov <adech.fo@gmail.com> Cc: Yuri Gribov <tetra2005@gmail.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Christoph Lameter <cl@linux.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* compiler-gcc.h: use "proved" instead of "proofed"Benjamin Peterson2017-12-181-1/+1
| | | | | | | Link: http://lkml.kernel.org/r/1477894241.1103202.772260161.1B0A5995@webmail.messagingengine.com Signed-off-by: Benjamin Peterson <bp@benjamin.pe> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* compiler-gcc.h: neateningJoe Perches2017-12-181-40/+45
| | | | | | | | | | | | | | | | | | | | | | - Move the inline and noinline blocks together - Comment neatening - Alignment of __attribute__ uses - Consistent naming of __must_be_array macro argument - Multiline macro neatening Signed-off-by: Joe Perches <joe@perches.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Michal Marek <mmarek@suse.cz> Cc: Segher Boessenkool <segher@kernel.crashing.org> Cc: Sasha Levin <levinsasha928@gmail.com> Cc: Anton Blanchard <anton@samba.org> Cc: Alan Modra <amodra@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* compiler: introduce __alias(symbol) shortcutAndrey Ryabinin2017-12-181-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | To be consistent with other compiler attributes introduce __alias(symbol) macro expanding into __attribute__((alias(#symbol))) Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Konstantin Serebryany <kcc@google.com> Cc: Dmitry Chernenkov <dmitryc@google.com> Signed-off-by: Andrey Konovalov <adech.fo@gmail.com> Cc: Yuri Gribov <tetra2005@gmail.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Christoph Lameter <cl@linux.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* lib: make memzero_explicit more robust against dead store eliminationDaniel Borkmann2017-12-183-1/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit 0b053c951829 ("lib: memzero_explicit: use barrier instead of OPTIMIZER_HIDE_VAR"), we made memzero_explicit() more robust in case LTO would decide to inline memzero_explicit() and eventually find out it could be elimiated as dead store. While using barrier() works well for the case of gcc, recent efforts from LLVMLinux people suggest to use llvm as an alternative to gcc, and there, Stephan found in a simple stand-alone user space example that llvm could nevertheless optimize and thus elimitate the memset(). A similar issue has been observed in the referenced llvm bug report, which is regarded as not-a-bug. Based on some experiments, icc is a bit special on its own, while it doesn't seem to eliminate the memset(), it could do so with an own implementation, and then result in similar findings as with llvm. The fix in this patch now works for all three compilers (also tested with more aggressive optimization levels). Arguably, in the current kernel tree it's more of a theoretical issue, but imho, it's better to be pedantic about it. It's clearly visible with gcc/llvm though, with the below code: if we would have used barrier() only here, llvm would have omitted clearing, not so with barrier_data() variant: static inline void memzero_explicit(void *s, size_t count) { memset(s, 0, count); barrier_data(s); } int main(void) { char buff[20]; memzero_explicit(buff, sizeof(buff)); return 0; } $ gcc -O2 test.c $ gdb a.out (gdb) disassemble main Dump of assembler code for function main: 0x0000000000400400 <+0>: lea -0x28(%rsp),%rax 0x0000000000400405 <+5>: movq $0x0,-0x28(%rsp) 0x000000000040040e <+14>: movq $0x0,-0x20(%rsp) 0x0000000000400417 <+23>: movl $0x0,-0x18(%rsp) 0x000000000040041f <+31>: xor %eax,%eax 0x0000000000400421 <+33>: retq End of assembler dump. $ clang -O2 test.c $ gdb a.out (gdb) disassemble main Dump of assembler code for function main: 0x00000000004004f0 <+0>: xorps %xmm0,%xmm0 0x00000000004004f3 <+3>: movaps %xmm0,-0x18(%rsp) 0x00000000004004f8 <+8>: movl $0x0,-0x8(%rsp) 0x0000000000400500 <+16>: lea -0x18(%rsp),%rax 0x0000000000400505 <+21>: xor %eax,%eax 0x0000000000400507 <+23>: retq End of assembler dump. As gcc, clang, but also icc defines __GNUC__, it's sufficient to define this in compiler-gcc.h only to be picked up. For a fallback or otherwise unsupported compiler, we define it as a barrier. Similarly, for ecc which does not support gcc inline asm. Reference: https://llvm.org/bugs/show_bug.cgi?id=15495 Reported-by: Stephan Mueller <smueller@chronox.de> Tested-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Stephan Mueller <smueller@chronox.de> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: mancha security <mancha1@zoho.com> Cc: Mark Charlebois <charlebm@gmail.com> Cc: Behan Webster <behanw@converseincode.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* UPSTREAM: include/linux/mm.h: add PAGE_ALIGNED() helperAndrew Morton2017-12-181-0/+3
| | | | | | | | | | | | | To test whether an address is aligned to PAGE_SIZE. Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com>, Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 0fa73b86ef0797ca4fde5334117ca0b330f08030) Bug: 36007193 Change-Id: I7e912bb0dbd8c9737fb13c5b48acb54ee39dd5fc
* include/linux/compiler-gcc.h: improve __visible documentationAndrew Morton2017-12-181-1/+4
| | | | | | | Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
* include/linux/compiler-gcc.h: hide assume_aligned attribute from sparseRasmus Villemoes2017-12-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | The patch "slab.h: sprinkle __assume_aligned attributes" causes *tons* of whinges if you do 'make C=2' with sparse 0.5.0: CHECK drivers/media/usb/pwc/pwc-if.c include/linux/slab.h:307:43: error: attribute '__assume_aligned__': unknown attribute include/linux/slab.h:308:58: error: attribute '__assume_aligned__': unknown attribute include/linux/slab.h:337:73: error: attribute '__assume_aligned__': unknown attribute include/linux/slab.h:375:74: error: attribute '__assume_aligned__': unknown attribute include/linux/slab.h:378:80: error: attribute '__assume_aligned__': unknown attribute sparse apparently pretends to be gcc >= 4.9, yet isn't prepared to handle all the function attributes supported by those gccs and complains loudly. So hide the definition of __assume_aligned from it (so that the generic one in compiler.h gets used). Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Tested-By: Valdis Kletnieks <valdis.kletnieks@vt.edu> Cc: Christopher Li <sparse@chrisli.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
* compiler.h: add support for function attribute assume_alignedRasmus Villemoes2017-12-182-0/+25
| | | | | | | | | | | | | | | | gcc 4.9 added the function attribute assume_aligned, indicating to the caller that the returned pointer may be assumed to have a certain minimal alignment. This is useful if, for example, the return value is passed to memset(). Add a shorthand macro for that. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
* compiler-gcc: integrate the various compiler-gcc[34567].h filesJoe Perches2017-12-182-65/+11
| | | | | | | | | | | | | | | | | | | | As gcc major version numbers are going to advance rather rapidly in the future, there's no real value in separate files for each compiler version. Deduplicate some of the macros #defined in each file too. Neaten comments using normal kernel commenting style. Signed-off-by: Joe Perches <joe@perches.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Michal Marek <mmarek@suse.cz> Cc: Segher Boessenkool <segher@kernel.crashing.org> Cc: Sasha Levin <levinsasha928@gmail.com> Cc: Anton Blanchard <anton@samba.org> Cc: Alan Modra <amodra@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* crypto: api - prevent helper ciphers from being usedStephan Mueller2017-12-141-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Several hardware related cipher implementations are implemented as follows: a "helper" cipher implementation is registered with the kernel crypto API. Such helper ciphers are never intended to be called by normal users. In some cases, calling them via the normal crypto API may even cause failures including kernel crashes. In a normal case, the "wrapping" ciphers that use the helpers ensure that these helpers are invoked such that they cannot cause any calamity. Considering the AF_ALG user space interface, unprivileged users can call all ciphers registered with the crypto API, including these helper ciphers that are not intended to be called directly. That means, with AF_ALG user space may invoke these helper ciphers and may cause undefined states or side effects. To avoid any potential side effects with such helpers, the patch prevents the helpers to be called directly. A new cipher type flag is added: CRYPTO_ALG_INTERNAL. This flag shall be used to mark helper ciphers. These ciphers can only be used if the caller invoke the cipher with CRYPTO_ALG_INTERNAL in the type and mask field. Change-Id: I5fd76b3e7c83d064cce9f61340d57bdaa14171ba Signed-off-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* BACKPORT: dentry name snapshotsAl Viro2017-12-082-31/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 49d31c2f389acfe83417083e1208422b4091cd9e upstream. take_dentry_name_snapshot() takes a safe snapshot of dentry name; if the name is a short one, it gets copied into caller-supplied structure, otherwise an extra reference to external name is grabbed (those are never modified). In either case the pointer to stable string is stored into the same structure. dentry must be held by the caller of take_dentry_name_snapshot(), but may be freely dropped afterwards - the snapshot will stay until destroyed by release_dentry_name_snapshot(). Intended use: struct name_snapshot s; take_dentry_name_snapshot(&s, dentry); ... access s.name ... release_dentry_name_snapshot(&s); Replaces fsnotify_oldname_...(), gets used in fsnotify to obtain the name to pass down with event. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> [carnil: backport 4.9: adjust context] [bwh: Backported to 3.16: - External names are not ref-counted, so copy them - Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> [ghackmann@google.com: backported to 3.10: adjust context] Signed-off-by: Greg Hackmann <ghackmann@google.com> Change-Id: I612e687cbffa1a03107331a6b3f00911ffbebd8e Bug: 63689921
* seccomp,x86,arm,mips,s390: Remove nr parameter from secure_computingAndy Lutomirski2017-12-051-10/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The secure_computing function took a syscall number parameter, but it only paid any attention to that parameter if seccomp mode 1 was enabled. Rather than coming up with a kludge to get the parameter to work in mode 2, just remove the parameter. To avoid churn in arches that don't have seccomp filters (and may not even support syscall_get_nr right now), this leaves the parameter in secure_computing_strict, which is now a real function. For ARM, this is a bit ugly due to the fact that ARM conditionally supports seccomp filters. Fixing that would probably only be a couple of lines of code, but it should be coordinated with the audit maintainers. This will be a slight slowdown on some arches. The right fix is to pass in all of seccomp_data instead of trying to make just the syscall nr part be fast. This is a prerequisite for making two-phase seccomp work cleanly. Cc: Russell King <linux@arm.linux.org.uk> Cc: linux-arm-kernel@lists.infradead.org Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: linux-s390@vger.kernel.org Cc: x86@kernel.org Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Kees Cook <keescook@chromium.org>
* workqueue: Remove deprecated flush[_delayed]_work_sync()Jingoo Han2017-12-051-12/+0
| | | | | | | | | | flush[_delayed]_work_sync() were deprecated by 4382973 ("workqueue: deprecate flush[_delayed]_work_sync()") and have been deprecated for a long time. In addition, these are not used anymore. So, let's remove these functions. Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* workqueue: Spelling s/instensive/intensive/Geert Uytterhoeven2017-12-051-1/+1
| | | | | Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Tejun Heo <tj@kernel.org>
* workqueue: Remove deprecated __cancel_delayed_work()Tan Xiaojun2017-12-051-15/+0
| | | | | | | | | | | | __cancel_delayed_work() was deprecated by 136b5721d75a ("workqueue: deprecate __cancel_delayed_work()") as cancel_delayed_work() was updated so that it could be used from all contexts. Enough time has passed since the deprecation. Let's remove it. tj: description update Signed-off-by: Tan Xiaojun <tanxiaojun@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* workqueue.h: remove loops of single statement macrosValentin Rothberg2017-12-051-6/+2
| | | | | | | | | | checkpatch.pl complained about two single statement macros in do while (0) loops. The loops and the trailing semicolons are now removed, which makes checkpatch happy and the two macros consistent with the rest of the file. Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* workqueue: remove unused WORK_CPU_ENDLai Jiangshan2017-12-051-2/+1
| | | | | | | | | | | | | | | WORK_CPU_END is totally unused since 4e8b22bd1a37 ("workqueue: fix pool ID allocation leakage and remove BUILD_BUG_ON() in init_workqueues"). It should be removed. After it is removed, the comment "special cpu IDs" is not precise due to there is only one special CPU ID (WORK_CPU_UNBOUND) left, so we also change this comment to the description for WORK_CPU_UNBOUND. tj: Minor description and comment tweaks. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* workqueue: move flush_scheduled_work() to workqueue.hLai Jiangshan2017-12-051-1/+29
| | | | | | | flush_scheduled_work() is just a simple call to flush_work(). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* workqueue: remove unused work_clear_pending()Lai Jiangshan2017-12-051-7/+0
| | | | | | | | | | | | In 8930caba3dbd ("workqueue: disable irq while manipulating PENDING"), setting last CPU and clearing PENDING got merged into a single operation (set_work_cpu_and_clear_pending()), which resulted that the internal routine work_clear_pending() is not used any more. tj: Minor description tweak. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* workqueue: Provide destroy_delayed_work_on_stack()Thomas Gleixner2017-12-051-0/+2
| | | | | | | | | | | | If a delayed or deferrable work is on stack we need to tell debug objects that we are destroying the timer and the work. Otherwise we leak the tracking object. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Acked-by: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/20140323141939.911487677@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* initramfs: Add skip_initramfs command line optionRom Lemarchand2017-12-051-0/+32
| | | | | | | | Add a skip_initramfs option to allow choosing whether to boot using the initramfs or not at runtime. Change-Id: If30428fa748c1d4d3d7b9d97c1f781de5e4558c3 Signed-off-by: Rom Lemarchand <romlem@google.com>
* update ion driverDerTeufel2017-11-201-127/+139
| | | | Signed-off-by: Mister Oyster <oysterized@gmail.com>
* ipv6: add option to drop unsolicited neighbor advertisementsJohannes Berg2017-11-181-1/+2
| | | | | | | | | | | | | In certain 802.11 wireless deployments, there will be NA proxies that use knowledge of the network to correctly answer requests. To prevent unsolicitd advertisements on the shared medium from being a problem, on such deployments wireless needs to drop them. Enable this by providing an option called "drop_unsolicited_na". Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: danielhk <daniel_hk>
* ipv4: add option to drop gratuitous ARP packetsJohannes Berg2017-11-181-0/+1
| | | | | | | | | | | | | In certain 802.11 wireless deployments, there will be ARP proxies that use knowledge of the network to correctly answer requests. To prevent gratuitous ARP frames on the shared medium from being a problem, on such deployments wireless needs to drop them. Enable this by providing an option called "drop_gratuitous_arp". Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: danielhk <daniel_hk>
* ipv6: add option to drop unicast encapsulated in L2 multicastJohannes Berg2017-11-181-0/+1
| | | | | | | | | | | | | | | In order to solve a problem with 802.11, the so-called hole-196 attack, add an option (sysctl) called "drop_unicast_in_l2_multicast" which, if enabled, causes the stack to drop IPv6 unicast packets encapsulated in link-layer multi- or broadcast frames. Such frames can (as an attack) be created by any member of the same wireless network and transmitted as valid encrypted frames since the symmetric key for broadcast frames is shared between all stations. Reviewed-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: danielhk <daniel_hk>
* ipv4: add option to drop unicast encapsulated in L2 multicastJohannes Berg2017-11-181-0/+1
| | | | | | | | | | | | | | | | | | In order to solve a problem with 802.11, the so-called hole-196 attack, add an option (sysctl) called "drop_unicast_in_l2_multicast" which, if enabled, causes the stack to drop IPv4 unicast packets encapsulated in link-layer multi- or broadcast frames. Such frames can (as an attack) be created by any member of the same wireless network and transmitted as valid encrypted frames since the symmetric key for broadcast frames is shared between all stations. Additionally, enabling this option provides compliance with a SHOULD clause of RFC 1122. Reviewed-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: danielhk <daniel_hk>
* workqueue: implicit ordered attribute should be overridableTejun Heo2017-11-061-1/+3
| | | | | | | | | | | | | | | | | | | | | commit 0a94efb5acbb6980d7c9ab604372d93cd507e4d8 upstream. 5c0338c68706 ("workqueue: restore WQ_UNBOUND/max_active==1 to be ordered") automatically enabled ordered attribute for unbound workqueues w/ max_active == 1. Because ordered workqueues reject max_active and some attribute changes, this implicit ordered mode broke cases where the user creates an unbound workqueue w/ max_active == 1 and later explicitly changes the related attributes. This patch distinguishes explicit and implicit ordered setting and overrides from attribute changes if implict. Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: 5c0338c68706 ("workqueue: restore WQ_UNBOUND/max_active==1 to be ordered") Cc: Holger Hoffstätte <holger@applied-asynchrony.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
* KEYS: prevent creating a different user's keyringsEric Biggers2017-11-061-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 237bbd29f7a049d310d907f4b2716a7feef9abf3 upstream. It was possible for an unprivileged user to create the user and user session keyrings for another user. For example: sudo -u '#3000' sh -c 'keyctl add keyring _uid.4000 "" @u keyctl add keyring _uid_ses.4000 "" @u sleep 15' & sleep 1 sudo -u '#4000' keyctl describe @u sudo -u '#4000' keyctl describe @us This is problematic because these "fake" keyrings won't have the right permissions. In particular, the user who created them first will own them and will have full access to them via the possessor permissions, which can be used to compromise the security of a user's keys: -4: alswrv-----v------------ 3000 0 keyring: _uid.4000 -5: alswrv-----v------------ 3000 0 keyring: _uid_ses.4000 Fix it by marking user and user session keyrings with a flag KEY_FLAG_UID_KEYRING. Then, when searching for a user or user session keyring by name, skip all keyrings that don't have the flag set. Fixes: 69664cf16af4 ("keys: don't generate user and user session keyrings unless they're accessed") Cc: <stable@vger.kernel.org> [v2.6.26+] Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David Howells <dhowells@redhat.com> [wt: adjust context] Signed-off-by: Willy Tarreau <w@1wt.eu>
* FIQ and sysrq default deauthorizedMark Salyzyn2017-10-211-1/+1
| | | | | | | | (cherry picked from commit c6389b11ae779d9f360d2c6555037e1d73c0fe1b) Signed-off-by: Mark Salyzyn <salyzyn@google.com> Bug: 36101220 Change-Id: I9f0ae9a9e3828dedf4b933bf1d752c98777cfa91
* f2fs: support journalled quotaChao Yu2017-10-041-0/+17
| | | | | | | | | | | | | This patch supports to enable f2fs to accept quota information through mount option: - {usr,grp,prj}jquota=<quota file path> - jqfmt=<quota type> Then, in ->mount flow, we can recover quota file during log replaying, by this, journelled quota can be supported. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: support inode checksumChao Yu2017-10-041-0/+1
| | | | | | | | This patch adds to support inode checksum in f2fs. Signed-off-by: Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: fix verification flow] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: support project quotaChao Yu2017-10-041-0/+3
| | | | | | | This patch adds to support plain project quota. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: enhance on-disk inode structure scalabilityChao Yu2017-10-041-2/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch add new flag F2FS_EXTRA_ATTR storing in inode.i_inline to indicate that on-disk structure of current inode is extended. In order to extend, we changed the inode structure a bit: Original one: struct f2fs_inode { ... struct f2fs_extent i_ext; __le32 i_addr[DEF_ADDRS_PER_INODE]; __le32 i_nid[DEF_NIDS_PER_INODE]; } Extended one: struct f2fs_inode { ... struct f2fs_extent i_ext; union { struct { __le16 i_extra_isize; __le16 i_padding; __le32 i_extra_end[0]; }; __le32 i_addr[DEF_ADDRS_PER_INODE]; }; __le32 i_nid[DEF_NIDS_PER_INODE]; } Once F2FS_EXTRA_ATTR is set, we will steal four bytes in the head of i_addr field for storing i_extra_isize and i_padding. with i_extra_isize, we can calculate actual size of reserved space in i_addr, available attribute fields included in total extra attribute fields for current inode can be described as below: +--------------------+ | .i_mode | | ... | | .i_ext | +--------------------+ | .i_extra_isize |-----+ | .i_padding | | | .i_prjid | | | .i_atime_extra | | | .i_ctime_extra | | | .i_mtime_extra |<----+ | .i_inode_cs |<----- store blkaddr/inline from here | .i_xattr_cs | | ... | +--------------------+ | | | block address | | | +--------------------+ | .i_nid | +--------------------+ | node_footer | | (nid, ino, offset) | +--------------------+ Hence, with this patch, we would enhance scalability of f2fs inode for storing more newly added attribute. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: make max inline size changeableChao Yu2017-10-041-22/+1
| | | | | | | | | | | | | | | | | This patch tries to make below macros calculating max inline size, inline dentry field size considerring reserving size-changeable space: - MAX_INLINE_DATA - NR_INLINE_DENTRY - INLINE_DENTRY_BITMAP_SIZE - INLINE_RESERVED_SIZE Then, when inline_{data,dentry} options is enabled, it allows us to reserve inline space with different size flexibly for adding newly introduced inode attribute. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* CHROMIUM: remove Android's cgroup generic permissions checksDmitry Torokhov2017-09-301-17/+0
| | | | | | | | | | | | | | | | | | | | The implementation is utterly broken, resulting in all processes being allows to move tasks between sets (as long as they have access to the "tasks" attribute), and upstream is heading towards checking only capability anyway, so let's get rid of this code. BUG=b:31790445,chromium:647994 TEST=Boot android container, examine logcat Change-Id: I2f780a5992c34e52a8f2d0b3557fc9d490da2779 Signed-off-by: Dmitry Torokhov <dtor@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/394967 Reviewed-by: Ricky Zhou <rickyz@chromium.org> Reviewed-by: John Stultz <john.stultz@linaro.org> (cherry picked from commit 6895149f8bf0719aa70487e285fa6a8ad3d2692d) Reviewed-on: https://chromium-review.googlesource.com/399858 Reviewed-by: Guenter Roeck <groeck@chromium.org> Signed-off-by: Mister Oyster <oysterized@gmail.com>
* include: linux: Remove unneeded zram_drv.halexlartsev192017-09-281-108/+0
|
* lz4k: Remove lz4k modulealexlartsev192017-09-261-27/+0
|
* debugfs: add get/set for atomic typesSeth Jennings2017-09-261-0/+2
| | | | | | | | | | | | | | | | | | | debugfs currently lack the ability to create attributes that set/get atomic_t values. This patch adds support for this through a new debugfs_create_atomic_t() function. Change-Id: I60cb007e9a67a410771e0ad78621f0875cb6d48c Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Git-commit: 3a76e5e09fbb51e756b4e732e3e65446f4984cf5 Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
* zram: delete custom lzo/lz4Sergey Senozhatsky2017-09-251-15/+0
| | | | | | | | | | | | | Remove lzo/lz4 backends, we use crypto API now. [sergey.senozhatsky@gmail.com: zram-delete-custom-lzo-lz4-v3] Link: http://lkml.kernel.org/r/20160604024902.11778-6-sergey.senozhatsky@gmail.com Link: http://lkml.kernel.org/r/20160531122017.2878-7-sergey.senozhatsky@gmail.com Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* zram: switch to crypto compress APISergey Senozhatsky2017-09-251-10/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We don't have an idle zstreams list anymore and our write path now works absolutely differently, preventing preemption during compression. This removes possibilities of read paths preempting writes at wrong places (which could badly affect the performance of both paths) and at the same time opens the door for a move from custom LZO/LZ4 compression backends implementation to a more generic one, using crypto compress API. Joonsoo Kim [1] attempted to do this a while ago, but faced with the need of introducing a new crypto API interface. The root cause was the fact that crypto API compression algorithms require a compression stream structure (in zram terminology) for both compression and decompression ops, while in reality only several of compression algorithms really need it. This resulted in a concept of context-less crypto API compression backends [2]. Both write and read paths, though, would have been executed with the preemption enabled, which in the worst case could have resulted in a decreased worst-case performance, e.g. consider the following case: CPU0 zram_write() spin_lock() take the last idle stream spin_unlock() << preempted >> zram_read() spin_lock() no idle streams spin_unlock() schedule() resuming zram_write compression() but it took me some time to realize that, and it took even longer to evolve zram and to make it ready for crypto API. The key turned out to be -- drop the idle streams list entirely. Without the idle streams list we are free to use compression algorithms that require compression stream for decompression (read), because streams are now placed in per-cpu data and each write path has to disable preemption for compression op, almost completely eliminating the aforementioned case (technically, we still have a small chance, because write path has a fast and a slow paths and the slow path is executed with the preemption enabled; but the frequency of failed fast path is too low). TEST ==== - 4 CPUs, x86_64 system - 3G zram, lzo - fio tests: read, randread, write, randwrite, rw, randrw test script [3] command: ZRAM_SIZE=3G LOG_SUFFIX=XXXX FIO_LOOPS=5 ./zram-fio-test.sh BASE PATCHED jobs1 READ: 2527.2MB/s 2482.7MB/s READ: 2102.7MB/s 2045.0MB/s WRITE: 1284.3MB/s 1324.3MB/s WRITE: 1080.7MB/s 1101.9MB/s READ: 430125KB/s 437498KB/s WRITE: 430538KB/s 437919KB/s READ: 399593KB/s 403987KB/s WRITE: 399910KB/s 404308KB/s jobs2 READ: 8133.5MB/s 7854.8MB/s READ: 7086.6MB/s 6912.8MB/s WRITE: 3177.2MB/s 3298.3MB/s WRITE: 2810.2MB/s 2871.4MB/s READ: 1017.6MB/s 1023.4MB/s WRITE: 1018.2MB/s 1023.1MB/s READ: 977836KB/s 984205KB/s WRITE: 979435KB/s 985814KB/s jobs3 READ: 13557MB/s 13391MB/s READ: 11876MB/s 11752MB/s WRITE: 4641.5MB/s 4682.1MB/s WRITE: 4164.9MB/s 4179.3MB/s READ: 1453.8MB/s 1455.1MB/s WRITE: 1455.1MB/s 1458.2MB/s READ: 1387.7MB/s 1395.7MB/s WRITE: 1386.1MB/s 1394.9MB/s jobs4 READ: 20271MB/s 20078MB/s READ: 18033MB/s 17928MB/s WRITE: 6176.8MB/s 6180.5MB/s WRITE: 5686.3MB/s 5705.3MB/s READ: 2009.4MB/s 2006.7MB/s WRITE: 2007.5MB/s 2004.9MB/s READ: 1929.7MB/s 1935.6MB/s WRITE: 1926.8MB/s 1932.6MB/s jobs5 READ: 18823MB/s 19024MB/s READ: 18968MB/s 19071MB/s WRITE: 6191.6MB/s 6372.1MB/s WRITE: 5818.7MB/s 5787.1MB/s READ: 2011.7MB/s 1981.3MB/s WRITE: 2011.4MB/s 1980.1MB/s READ: 1949.3MB/s 1935.7MB/s WRITE: 1940.4MB/s 1926.1MB/s jobs6 READ: 21870MB/s 21715MB/s READ: 19957MB/s 19879MB/s WRITE: 6528.4MB/s 6537.6MB/s WRITE: 6098.9MB/s 6073.6MB/s READ: 2048.6MB/s 2049.9MB/s WRITE: 2041.7MB/s 2042.9MB/s READ: 2013.4MB/s 1990.4MB/s WRITE: 2009.4MB/s 1986.5MB/s jobs7 READ: 21359MB/s 21124MB/s READ: 19746MB/s 19293MB/s WRITE: 6660.4MB/s 6518.8MB/s WRITE: 6211.6MB/s 6193.1MB/s READ: 2089.7MB/s 2080.6MB/s WRITE: 2085.8MB/s 2076.5MB/s READ: 2041.2MB/s 2052.5MB/s WRITE: 2037.5MB/s 2048.8MB/s jobs8 READ: 20477MB/s 19974MB/s READ: 18922MB/s 18576MB/s WRITE: 6851.9MB/s 6788.3MB/s WRITE: 6407.7MB/s 6347.5MB/s READ: 2134.8MB/s 2136.1MB/s WRITE: 2132.8MB/s 2134.4MB/s READ: 2074.2MB/s 2069.6MB/s WRITE: 2087.3MB/s 2082.4MB/s jobs9 READ: 19797MB/s 19994MB/s READ: 18806MB/s 18581MB/s WRITE: 6878.7MB/s 6822.7MB/s WRITE: 6456.8MB/s 6447.2MB/s READ: 2141.1MB/s 2154.7MB/s WRITE: 2144.4MB/s 2157.3MB/s READ: 2084.1MB/s 2085.1MB/s WRITE: 2091.5MB/s 2092.5MB/s jobs10 READ: 19794MB/s 19784MB/s READ: 18794MB/s 18745MB/s WRITE: 6984.4MB/s 6676.3MB/s WRITE: 6532.3MB/s 6342.7MB/s READ: 2150.6MB/s 2155.4MB/s WRITE: 2156.8MB/s 2161.5MB/s READ: 2106.4MB/s 2095.6MB/s WRITE: 2109.7MB/s 2098.4MB/s BASE PATCHED jobs1 perfstat stalled-cycles-frontend 102,480,595,419 ( 41.53%) 114,508,864,804 ( 46.92%) stalled-cycles-backend 51,941,417,832 ( 21.05%) 46,836,112,388 ( 19.19%) instructions 283,612,054,215 ( 1.15) 283,918,134,959 ( 1.16) branches 56,372,560,385 ( 724.923) 56,449,814,753 ( 733.766) branch-misses 374,826,000 ( 0.66%) 326,935,859 ( 0.58%) jobs2 perfstat stalled-cycles-frontend 155,142,745,777 ( 40.99%) 164,170,979,198 ( 43.82%) stalled-cycles-backend 70,813,866,387 ( 18.71%) 66,456,858,165 ( 17.74%) instructions 463,436,648,173 ( 1.22) 464,221,890,191 ( 1.24) branches 91,088,733,902 ( 760.088) 91,278,144,546 ( 769.133) branch-misses 504,460,363 ( 0.55%) 394,033,842 ( 0.43%) jobs3 perfstat stalled-cycles-frontend 201,300,397,212 ( 39.84%) 223,969,902,257 ( 44.44%) stalled-cycles-backend 87,712,593,974 ( 17.36%) 81,618,888,712 ( 16.19%) instructions 642,869,545,023 ( 1.27) 644,677,354,132 ( 1.28) branches 125,724,560,594 ( 690.682) 126,133,159,521 ( 694.542) branch-misses 527,941,798 ( 0.42%) 444,782,220 ( 0.35%) jobs4 perfstat stalled-cycles-frontend 246,701,197,429 ( 38.12%) 280,076,030,886 ( 43.29%) stalled-cycles-backend 119,050,341,112 ( 18.40%) 110,955,641,671 ( 17.15%) instructions 822,716,962,127 ( 1.27) 825,536,969,320 ( 1.28) branches 160,590,028,545 ( 688.614) 161,152,996,915 ( 691.068) branch-misses 650,295,287 ( 0.40%) 550,229,113 ( 0.34%) jobs5 perfstat stalled-cycles-frontend 298,958,462,516 ( 38.30%) 344,852,200,358 ( 44.16%) stalled-cycles-backend 137,558,742,122 ( 17.62%) 129,465,067,102 ( 16.58%) instructions 1,005,714,688,752 ( 1.29) 1,007,657,999,432 ( 1.29) branches 195,988,773,962 ( 697.730) 196,446,873,984 ( 700.319) branch-misses 695,818,940 ( 0.36%) 624,823,263 ( 0.32%) jobs6 perfstat stalled-cycles-frontend 334,497,602,856 ( 36.71%) 387,590,419,779 ( 42.38%) stalled-cycles-backend 163,539,365,335 ( 17.95%) 152,640,193,639 ( 16.69%) instructions 1,184,738,177,851 ( 1.30) 1,187,396,281,677 ( 1.30) branches 230,592,915,640 ( 702.902) 231,253,802,882 ( 702.356) branch-misses 747,934,786 ( 0.32%) 643,902,424 ( 0.28%) jobs7 perfstat stalled-cycles-frontend 396,724,684,187 ( 37.71%) 460,705,858,952 ( 43.84%) stalled-cycles-backend 188,096,616,496 ( 17.88%) 175,785,787,036 ( 16.73%) instructions 1,364,041,136,608 ( 1.30) 1,366,689,075,112 ( 1.30) branches 265,253,096,936 ( 700.078) 265,890,524,883 ( 702.839) branch-misses 784,991,589 ( 0.30%) 729,196,689 ( 0.27%) jobs8 perfstat stalled-cycles-frontend 440,248,299,870 ( 36.92%) 509,554,793,816 ( 42.46%) stalled-cycles-backend 222,575,930,616 ( 18.67%) 213,401,248,432 ( 17.78%) instructions 1,542,262,045,114 ( 1.29) 1,545,233,932,257 ( 1.29) branches 299,775,178,439 ( 697.666) 300,528,458,505 ( 694.769) branch-misses 847,496,084 ( 0.28%) 748,794,308 ( 0.25%) jobs9 perfstat stalled-cycles-frontend 506,269,882,480 ( 37.86%) 592,798,032,820 ( 44.43%) stalled-cycles-backend 253,192,498,861 ( 18.93%) 233,727,666,185 ( 17.52%) instructions 1,721,985,080,913 ( 1.29) 1,724,666,236,005 ( 1.29) branches 334,517,360,255 ( 694.134) 335,199,758,164 ( 697.131) branch-misses 873,496,730 ( 0.26%) 815,379,236 ( 0.24%) jobs10 perfstat stalled-cycles-frontend 549,063,363,749 ( 37.18%) 651,302,376,662 ( 43.61%) stalled-cycles-backend 281,680,986,810 ( 19.07%) 277,005,235,582 ( 18.55%) instructions 1,901,859,271,180 ( 1.29) 1,906,311,064,230 ( 1.28) branches 369,398,536,153 ( 694.004) 370,527,696,358 ( 688.409) branch-misses 967,929,335 ( 0.26%) 890,125,056 ( 0.24%) BASE PATCHED seconds elapsed 79.421641008 78.735285546 seconds elapsed 61.471246133 60.869085949 seconds elapsed 62.317058173 62.224188495 seconds elapsed 60.030739363 60.081102518 seconds elapsed 74.070398362 74.317582865 seconds elapsed 84.985953007 85.414364176 seconds elapsed 97.724553255 98.173311344 seconds elapsed 109.488066758 110.268399318 seconds elapsed 122.768189405 122.967164498 seconds elapsed 135.130035105 136.934770801 On my other system (8 x86_64 CPUs, short version of test results): BASE PATCHED seconds elapsed 19.518065994 19.806320662 seconds elapsed 15.172772749 15.594718291 seconds elapsed 13.820925970 13.821708564 seconds elapsed 13.293097816 14.585206405 seconds elapsed 16.207284118 16.064431606 seconds elapsed 17.958376158 17.771825767 seconds elapsed 19.478009164 19.602961508 seconds elapsed 21.347152811 21.352318709 seconds elapsed 24.478121126 24.171088735 seconds elapsed 26.865057442 26.767327618 So performance-wise the numbers are quite similar. Also update zcomp interface to be more aligned with the crypto API. [1] http://marc.info/?l=linux-kernel&m=144480832108927&w=2 [2] http://marc.info/?l=linux-kernel&m=145379613507518&w=2 [3] https://github.com/sergey-senozhatsky/zram-perf-test Link: http://lkml.kernel.org/r/20160531122017.2878-3-sergey.senozhatsky@gmail.com Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Suggested-by: Minchan Kim <minchan@kernel.org> Suggested-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* zram: rename zstrm find-release functionsSergey Senozhatsky2017-09-251-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This has started as a 'add zlib support' work, but after some thinking I saw no blockers for a bigger change -- a switch to crypto API. We don't have an idle zstreams list anymore and our write path now works absolutely differently, preventing preemption during compression. This removes possibilities of read paths preempting writes at wrong places and opens the door for a move from custom LZO/LZ4 compression backends implementation to a more generic one, using crypto compress API. This patch set also eliminates the need of a new context-less crypto API interface, which was quite hard to sell, so we can move along faster. benchmarks: (x86_64, 4GB, zram-perf script) perf reported run-time fio (max jobs=3). I performed fio test with the increasing number of parallel jobs (max to 3) on a 3G zram device, using `static' data and the following crypto comp algorithms: 842, deflate, lz4, lz4hc, lzo the output was: - test running time (which can tell us what algorithms performs faster) and - zram mm_stat (which tells the compressed memory size, max used memory, etc). It's just for information. for example, LZ4HC has twice the running time of LZO, but the compressed memory size is: 23592960 vs 34603008 bytes. test-fio-zram-842 197.907655282 seconds time elapsed 201.623142884 seconds time elapsed 226.854291345 seconds time elapsed test-fio-zram-DEFLATE 253.259516155 seconds time elapsed 258.148563401 seconds time elapsed 290.251909365 seconds time elapsed test-fio-zram-LZ4 27.022598717 seconds time elapsed 29.580522717 seconds time elapsed 33.293463430 seconds time elapsed test-fio-zram-LZ4HC 56.393954615 seconds time elapsed 74.904659747 seconds time elapsed 101.940998564 seconds time elapsed test-fio-zram-LZO 28.155948075 seconds time elapsed 30.390036330 seconds time elapsed 34.455773159 seconds time elapsed zram mm_stat-s (max fio jobs=3) test-fio-zram-842 mm_stat (jobs1): 3221225472 673185792 690266112 0 690266112 0 0 mm_stat (jobs2): 3221225472 673185792 690266112 0 690266112 0 0 mm_stat (jobs3): 3221225472 673185792 690266112 0 690266112 0 0 test-fio-zram-DEFLATE mm_stat (jobs1): 3221225472 24379392 37761024 0 37761024 0 0 mm_stat (jobs2): 3221225472 24379392 37761024 0 37761024 0 0 mm_stat (jobs3): 3221225472 24379392 37761024 0 37761024 0 0 test-fio-zram-LZ4 mm_stat (jobs1): 3221225472 23592960 37761024 0 37761024 0 0 mm_stat (jobs2): 3221225472 23592960 37761024 0 37761024 0 0 mm_stat (jobs3): 3221225472 23592960 37761024 0 37761024 0 0 test-fio-zram-LZ4HC mm_stat (jobs1): 3221225472 23592960 37761024 0 37761024 0 0 mm_stat (jobs2): 3221225472 23592960 37761024 0 37761024 0 0 mm_stat (jobs3): 3221225472 23592960 37761024 0 37761024 0 0 test-fio-zram-LZO mm_stat (jobs1): 3221225472 34603008 50335744 0 50335744 0 0 mm_stat (jobs2): 3221225472 34603008 50335744 0 50335744 0 0 mm_stat (jobs3): 3221225472 34603008 50335744 0 50339840 0 0 This patch (of 8): We don't perform any zstream idle list lookup anymore, so zcomp_strm_find()/zcomp_strm_release() names are not representative. Rename to zcomp_stream_get()/zcomp_stream_put(). Link: http://lkml.kernel.org/r/20160531122017.2878-2-sergey.senozhatsky@gmail.com Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* zram: user per-cpu compression streamsSergey Senozhatsky2017-09-251-11/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove idle streams list and keep compression streams in per-cpu data. This removes two contented spin_lock()/spin_unlock() calls from write path and also prevent write OP from being preempted while holding the compression stream, which can cause slow downs. For instance, let's assume that we have N cpus and N-2 max_comp_streams.TASK1 owns the last idle stream, TASK2-TASK3 come in with the write requests: TASK1 TASK2 TASK3 zram_bvec_write() spin_lock find stream spin_unlock compress <<preempted>> zram_bvec_write() spin_lock find stream spin_unlock no_stream schedule zram_bvec_write() spin_lock find_stream spin_unlock no_stream schedule spin_lock release stream spin_unlock wake up TASK2 not only TASK2 and TASK3 will not get the stream, TASK1 will be preempted in the middle of its operation; while we would prefer it to finish compression and release the stream. Test environment: x86_64, 4 CPU box, 3G zram, lzo The following fio tests were executed: read, randread, write, randwrite, rw, randrw with the increasing number of jobs from 1 to 10. 4 streams 8 streams per-cpu =========================================================== jobs1 READ: 2520.1MB/s 2566.5MB/s 2491.5MB/s READ: 2102.7MB/s 2104.2MB/s 2091.3MB/s WRITE: 1355.1MB/s 1320.2MB/s 1378.9MB/s WRITE: 1103.5MB/s 1097.2MB/s 1122.5MB/s READ: 434013KB/s 435153KB/s 439961KB/s WRITE: 433969KB/s 435109KB/s 439917KB/s READ: 403166KB/s 405139KB/s 403373KB/s WRITE: 403223KB/s 405197KB/s 403430KB/s jobs2 READ: 7958.6MB/s 8105.6MB/s 8073.7MB/s READ: 6864.9MB/s 6989.8MB/s 7021.8MB/s WRITE: 2438.1MB/s 2346.9MB/s 3400.2MB/s WRITE: 1994.2MB/s 1990.3MB/s 2941.2MB/s READ: 981504KB/s 973906KB/s 1018.8MB/s WRITE: 981659KB/s 974060KB/s 1018.1MB/s READ: 937021KB/s 938976KB/s 987250KB/s WRITE: 934878KB/s 936830KB/s 984993KB/s jobs3 READ: 13280MB/s 13553MB/s 13553MB/s READ: 11534MB/s 11785MB/s 11755MB/s WRITE: 3456.9MB/s 3469.9MB/s 4810.3MB/s WRITE: 3029.6MB/s 3031.6MB/s 4264.8MB/s READ: 1363.8MB/s 1362.6MB/s 1448.9MB/s WRITE: 1361.9MB/s 1360.7MB/s 1446.9MB/s READ: 1309.4MB/s 1310.6MB/s 1397.5MB/s WRITE: 1307.4MB/s 1308.5MB/s 1395.3MB/s jobs4 READ: 20244MB/s 20177MB/s 20344MB/s READ: 17886MB/s 17913MB/s 17835MB/s WRITE: 4071.6MB/s 4046.1MB/s 6370.2MB/s WRITE: 3608.9MB/s 3576.3MB/s 5785.4MB/s READ: 1824.3MB/s 1821.6MB/s 1997.5MB/s WRITE: 1819.8MB/s 1817.4MB/s 1992.5MB/s READ: 1765.7MB/s 1768.3MB/s 1937.3MB/s WRITE: 1767.5MB/s 1769.1MB/s 1939.2MB/s jobs5 READ: 18663MB/s 18986MB/s 18823MB/s READ: 16659MB/s 16605MB/s 16954MB/s WRITE: 3912.4MB/s 3888.7MB/s 6126.9MB/s WRITE: 3506.4MB/s 3442.5MB/s 5519.3MB/s READ: 1798.2MB/s 1746.5MB/s 1935.8MB/s WRITE: 1792.7MB/s 1740.7MB/s 1929.1MB/s READ: 1727.6MB/s 1658.2MB/s 1917.3MB/s WRITE: 1726.5MB/s 1657.2MB/s 1916.6MB/s jobs6 READ: 21017MB/s 20922MB/s 21162MB/s READ: 19022MB/s 19140MB/s 18770MB/s WRITE: 3968.2MB/s 4037.7MB/s 6620.8MB/s WRITE: 3643.5MB/s 3590.2MB/s 6027.5MB/s READ: 1871.8MB/s 1880.5MB/s 2049.9MB/s WRITE: 1867.8MB/s 1877.2MB/s 2046.2MB/s READ: 1755.8MB/s 1710.3MB/s 1964.7MB/s WRITE: 1750.5MB/s 1705.9MB/s 1958.8MB/s jobs7 READ: 21103MB/s 20677MB/s 21482MB/s READ: 18522MB/s 18379MB/s 19443MB/s WRITE: 4022.5MB/s 4067.4MB/s 6755.9MB/s WRITE: 3691.7MB/s 3695.5MB/s 5925.6MB/s READ: 1841.5MB/s 1933.9MB/s 2090.5MB/s WRITE: 1842.7MB/s 1935.3MB/s 2091.9MB/s READ: 1832.4MB/s 1856.4MB/s 1971.5MB/s WRITE: 1822.3MB/s 1846.2MB/s 1960.6MB/s jobs8 READ: 20463MB/s 20194MB/s 20862MB/s READ: 18178MB/s 17978MB/s 18299MB/s WRITE: 4085.9MB/s 4060.2MB/s 7023.8MB/s WRITE: 3776.3MB/s 3737.9MB/s 6278.2MB/s READ: 1957.6MB/s 1944.4MB/s 2109.5MB/s WRITE: 1959.2MB/s 1946.2MB/s 2111.4MB/s READ: 1900.6MB/s 1885.7MB/s 2082.1MB/s WRITE: 1896.2MB/s 1881.4MB/s 2078.3MB/s jobs9 READ: 19692MB/s 19734MB/s 19334MB/s READ: 17678MB/s 18249MB/s 17666MB/s WRITE: 4004.7MB/s 4064.8MB/s 6990.7MB/s WRITE: 3724.7MB/s 3772.1MB/s 6193.6MB/s READ: 1953.7MB/s 1967.3MB/s 2105.6MB/s WRITE: 1953.4MB/s 1966.7MB/s 2104.1MB/s READ: 1860.4MB/s 1897.4MB/s 2068.5MB/s WRITE: 1858.9MB/s 1895.9MB/s 2066.8MB/s jobs10 READ: 19730MB/s 19579MB/s 19492MB/s READ: 18028MB/s 18018MB/s 18221MB/s WRITE: 4027.3MB/s 4090.6MB/s 7020.1MB/s WRITE: 3810.5MB/s 3846.8MB/s 6426.8MB/s READ: 1956.1MB/s 1994.6MB/s 2145.2MB/s WRITE: 1955.9MB/s 1993.5MB/s 2144.8MB/s READ: 1852.8MB/s 1911.6MB/s 2075.8MB/s WRITE: 1855.7MB/s 1914.6MB/s 2078.1MB/s perf stat 4 streams 8 streams per-cpu ==================================================================================================================== jobs1 stalled-cycles-frontend 23,174,811,209 ( 38.21%) 23,220,254,188 ( 38.25%) 23,061,406,918 ( 38.34%) stalled-cycles-backend 11,514,174,638 ( 18.98%) 11,696,722,657 ( 19.27%) 11,370,852,810 ( 18.90%) instructions 73,925,005,782 ( 1.22) 73,903,177,632 ( 1.22) 73,507,201,037 ( 1.22) branches 14,455,124,835 ( 756.063) 14,455,184,779 ( 755.281) 14,378,599,509 ( 758.546) branch-misses 69,801,336 ( 0.48%) 80,225,529 ( 0.55%) 72,044,726 ( 0.50%) jobs2 stalled-cycles-frontend 49,912,741,782 ( 46.11%) 50,101,189,290 ( 45.95%) 32,874,195,633 ( 35.11%) stalled-cycles-backend 27,080,366,230 ( 25.02%) 27,949,970,232 ( 25.63%) 16,461,222,706 ( 17.58%) instructions 122,831,629,690 ( 1.13) 122,919,846,419 ( 1.13) 121,924,786,775 ( 1.30) branches 23,725,889,239 ( 692.663) 23,733,547,140 ( 688.062) 23,553,950,311 ( 794.794) branch-misses 90,733,041 ( 0.38%) 96,320,895 ( 0.41%) 84,561,092 ( 0.36%) jobs3 stalled-cycles-frontend 66,437,834,608 ( 45.58%) 63,534,923,344 ( 43.69%) 42,101,478,505 ( 33.19%) stalled-cycles-backend 34,940,799,661 ( 23.97%) 34,774,043,148 ( 23.91%) 21,163,324,388 ( 16.68%) instructions 171,692,121,862 ( 1.18) 171,775,373,044 ( 1.18) 170,353,542,261 ( 1.34) branches 32,968,962,622 ( 628.723) 32,987,739,894 ( 630.512) 32,729,463,918 ( 717.027) branch-misses 111,522,732 ( 0.34%) 110,472,894 ( 0.33%) 99,791,291 ( 0.30%) jobs4 stalled-cycles-frontend 98,741,701,675 ( 49.72%) 94,797,349,965 ( 47.59%) 54,535,655,381 ( 33.53%) stalled-cycles-backend 54,642,609,615 ( 27.51%) 55,233,554,408 ( 27.73%) 27,882,323,541 ( 17.14%) instructions 220,884,807,851 ( 1.11) 220,930,887,273 ( 1.11) 218,926,845,851 ( 1.35) branches 42,354,518,180 ( 592.105) 42,362,770,587 ( 590.452) 41,955,552,870 ( 716.154) branch-misses 138,093,449 ( 0.33%) 131,295,286 ( 0.31%) 121,794,771 ( 0.29%) jobs5 stalled-cycles-frontend 116,219,747,212 ( 48.14%) 110,310,397,012 ( 46.29%) 66,373,082,723 ( 33.70%) stalled-cycles-backend 66,325,434,776 ( 27.48%) 64,157,087,914 ( 26.92%) 32,999,097,299 ( 16.76%) instructions 270,615,008,466 ( 1.12) 270,546,409,525 ( 1.14) 268,439,910,948 ( 1.36) branches 51,834,046,557 ( 599.108) 51,811,867,722 ( 608.883) 51,412,576,077 ( 729.213) branch-misses 158,197,086 ( 0.31%) 142,639,805 ( 0.28%) 133,425,455 ( 0.26%) jobs6 stalled-cycles-frontend 138,009,414,492 ( 48.23%) 139,063,571,254 ( 48.80%) 75,278,568,278 ( 32.80%) stalled-cycles-backend 79,211,949,650 ( 27.68%) 79,077,241,028 ( 27.75%) 37,735,797,899 ( 16.44%) instructions 319,763,993,731 ( 1.12) 319,937,782,834 ( 1.12) 316,663,600,784 ( 1.38) branches 61,219,433,294 ( 595.056) 61,250,355,540 ( 598.215) 60,523,446,617 ( 733.706) branch-misses 169,257,123 ( 0.28%) 154,898,028 ( 0.25%) 141,180,587 ( 0.23%) jobs7 stalled-cycles-frontend 162,974,812,119 ( 49.20%) 159,290,061,987 ( 48.43%) 88,046,641,169 ( 33.21%) stalled-cycles-backend 92,223,151,661 ( 27.84%) 91,667,904,406 ( 27.87%) 44,068,454,971 ( 16.62%) instructions 369,516,432,430 ( 1.12) 369,361,799,063 ( 1.12) 365,290,380,661 ( 1.38) branches 70,795,673,950 ( 594.220) 70,743,136,124 ( 597.876) 69,803,996,038 ( 732.822) branch-misses 181,708,327 ( 0.26%) 165,767,821 ( 0.23%) 150,109,797 ( 0.22%) jobs8 stalled-cycles-frontend 185,000,017,027 ( 49.30%) 182,334,345,473 ( 48.37%) 99,980,147,041 ( 33.26%) stalled-cycles-backend 105,753,516,186 ( 28.18%) 107,937,830,322 ( 28.63%) 51,404,177,181 ( 17.10%) instructions 418,153,161,055 ( 1.11) 418,308,565,828 ( 1.11) 413,653,475,581 ( 1.38) branches 80,035,882,398 ( 592.296) 80,063,204,510 ( 589.843) 79,024,105,589 ( 730.530) branch-misses 199,764,528 ( 0.25%) 177,936,926 ( 0.22%) 160,525,449 ( 0.20%) jobs9 stalled-cycles-frontend 210,941,799,094 ( 49.63%) 204,714,679,254 ( 48.55%) 114,251,113,756 ( 33.96%) stalled-cycles-backend 122,640,849,067 ( 28.85%) 122,188,553,256 ( 28.98%) 58,360,041,127 ( 17.35%) instructions 468,151,025,415 ( 1.10) 467,354,869,323 ( 1.11) 462,665,165,216 ( 1.38) branches 89,657,067,510 ( 585.628) 89,411,550,407 ( 588.990) 88,360,523,943 ( 730.151) branch-misses 218,292,301 ( 0.24%) 191,701,247 ( 0.21%) 178,535,678 ( 0.20%) jobs10 stalled-cycles-frontend 233,595,958,008 ( 49.81%) 227,540,615,689 ( 49.11%) 160,341,979,938 ( 43.07%) stalled-cycles-backend 136,153,676,021 ( 29.03%) 133,635,240,742 ( 28.84%) 65,909,135,465 ( 17.70%) instructions 517,001,168,497 ( 1.10) 516,210,976,158 ( 1.11) 511,374,038,613 ( 1.37) branches 98,911,641,329 ( 585.796) 98,700,069,712 ( 591.583) 97,646,761,028 ( 728.712) branch-misses 232,341,823 ( 0.23%) 199,256,308 ( 0.20%) 183,135,268 ( 0.19%) per-cpu streams tend to cause significantly less stalled cycles; execute less branches and hit less branch-misses. perf stat reported execution time 4 streams 8 streams per-cpu ==================================================================== jobs1 seconds elapsed 20.909073870 20.875670495 20.817838540 jobs2 seconds elapsed 18.529488399 18.720566469 16.356103108 jobs3 seconds elapsed 18.991159531 18.991340812 16.766216066 jobs4 seconds elapsed 19.560643828 19.551323547 16.246621715 jobs5 seconds elapsed 24.746498464 25.221646740 20.696112444 jobs6 seconds elapsed 28.258181828 28.289765505 22.885688857 jobs7 seconds elapsed 32.632490241 31.909125381 26.272753738 jobs8 seconds elapsed 35.651403851 36.027596308 29.108024711 jobs9 seconds elapsed 40.569362365 40.024227989 32.898204012 jobs10 seconds elapsed 44.673112304 43.874898137 35.632952191 Please see Link: http://marc.info/?l=linux-kernel&m=146166970727530 Link: http://marc.info/?l=linux-kernel&m=146174716719650 for more test results (under low memory conditions). Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Suggested-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm/zpool: add name argument to create zpoolGanesh Mahendran2017-09-251-2/+3
| | | | | | | | | | | | | | | | | | | | | | | Currently the underlay of zpool: zsmalloc/zbud, do not know who creates them. There is not a method to let zsmalloc/zbud find which caller they belong to. Now we want to add statistics collection in zsmalloc. We need to name the debugfs dir for each pool created. The way suggested by Minchan Kim is to use a name passed by caller(such as zram) to create the zsmalloc pool. /sys/kernel/debug/zsmalloc/zram0 This patch adds an argument `name' to zs_create_pool() and other related functions. Signed-off-by: Ganesh Mahendran <opensource.ganesh@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Seth Jennings <sjennings@variantweb.net> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Dan Streetman <ddstreet@ieee.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* zbud: add to mm/Seth Jennings2017-09-251-0/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | zbud is an special purpose allocator for storing compressed pages. It is designed to store up to two compressed pages per physical page. While this design limits storage density, it has simple and deterministic reclaim properties that make it preferable to a higher density approach when reclaim will be used. zbud works by storing compressed pages, or "zpages", together in pairs in a single memory page called a "zbud page". The first buddy is "left justifed" at the beginning of the zbud page, and the last buddy is "right justified" at the end of the zbud page. The benefit is that if either buddy is freed, the freed buddy space, coalesced with whatever slack space that existed between the buddies, results in the largest possible free region within the zbud page. zbud also provides an attractive lower bound on density. The ratio of zpages to zbud pages can not be less than 1. This ensures that zbud can never "do harm" by using more pages to store zpages than the uncompressed zpages would have used on their own. This implementation is a rewrite of the zbud allocator internally used by zcache in the driver/staging tree. The rewrite was necessary to remove some of the zcache specific elements that were ingrained throughout and provide a generic allocation interface that can later be used by zsmalloc and others. This patch adds zbud to mm/ for later use by zswap. Change-Id: I5120b1acd22f15c5dc3d2a0e6f1a34a73f97be3a Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Nitin Gupta <ngupta@vflare.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Dan Magenheimer <dan.magenheimer@oracle.com> Cc: Robert Jennings <rcj@linux.vnet.ibm.com> Cc: Jenifer Hopper <jhopper@us.ibm.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Dave Hansen <dave@sr71.net> Cc: Joe Perches <joe@perches.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Cody P Schafer <cody@linux.vnet.ibm.com> Cc: Hugh Dickens <hughd@google.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Bob Liu <bob.liu@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Git-commit: 4e2e2770b1529edc5849c86b29a6febe27e2f083 Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git [venkatg@codeaurora.org: keep msm-3.10 changes, add zbud cfg] Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>