aboutsummaryrefslogtreecommitdiff
path: root/include/linux
Commit message (Collapse)AuthorAgeFilesLines
* sched/cpuset/pm: Fix cpuset vs. suspend-resume bugsPeter Zijlstra2019-05-021-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 50e76632339d4655859523a39249dd95ee5e93e7 upstream. Cpusets vs. suspend-resume is _completely_ broken. And it got noticed because it now resulted in non-cpuset usage breaking too. On suspend cpuset_cpu_inactive() doesn't call into cpuset_update_active_cpus() because it doesn't want to move tasks about, there is no need, all tasks are frozen and won't run again until after we've resumed everything. But this means that when we finally do call into cpuset_update_active_cpus() after resuming the last frozen cpu in cpuset_cpu_active(), the top_cpuset will not have any difference with the cpu_active_mask and this it will not in fact do _anything_. So the cpuset configuration will not be restored. This was largely hidden because we would unconditionally create identity domains and mobile users would not in fact use cpusets much. And servers what do use cpusets tend to not suspend-resume much. An addition problem is that we'd not in fact wait for the cpuset work to finish before resuming the tasks, allowing spurious migrations outside of the specified domains. Fix the rebuild by introducing cpuset_force_rebuild() and fix the ordering with cpuset_wait_for_hotplug(). Reported-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: deb7aa308ea2 ("cpuset: reorganize CPU / memory hotplug handling") Link: http://lkml.kernel.org/r/20170907091338.orwxrqkbfkki3c24@hirez.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Change-Id: Ia40ffcf49507af1d5493d7e534e9433ca346db02
* nospec: Include <asm/barrier.h> dependencyDan Williams2019-05-021-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The nospec.h header expects the per-architecture header file <asm/barrier.h> to optionally define array_index_mask_nospec(). Include that dependency to prevent inadvertent fallback to the default array_index_mask_nospec() implementation. The default implementation may not provide a full mitigation on architectures that perform data value speculation. Change-Id: Ib3bb49179541ca4f187f7ec6b91603a58db358f7 Reported-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arch@vger.kernel.org Link: http://lkml.kernel.org/r/151881605404.17395.1341935530792574707.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* nospec: Allow index argument to have const-qualified typeRasmus Villemoes2019-05-021-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The last expression in a statement expression need not be a bare variable, quoting gcc docs The last thing in the compound statement should be an expression followed by a semicolon; the value of this subexpression serves as the value of the entire construct. and we already use that in e.g. the min/max macros which end with a ternary expression. This way, we can allow index to have const-qualified type, which will in some cases avoid the need for introducing a local copy of index of non-const qualified type. That, in turn, can prevent readers not familiar with the internals of array_index_nospec from wondering about the seemingly redundant extra variable, and I think that's worthwhile considering how confusing the whole _nospec business is. The expression _i&_mask has type unsigned long (since that is the type of _mask, and the BUILD_BUG_ONs guarantee that _i will get promoted to that), so in order not to change the type of the whole expression, add a cast back to typeof(_i). Change-Id: I1fcaea6268a40e10c67d58e906d757b5bce7e8d0 Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arch@vger.kernel.org Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/151881604837.17395.10812767547837568328.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* nospec: Kill array_index_nospec_mask_check()Dan Williams2019-05-021-21/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are multiple problems with the dynamic sanity checking in array_index_nospec_mask_check(): * It causes unnecessary overhead in the 32-bit case since integer sized @index values will no longer cause the check to be compiled away like in the 64-bit case. * In the 32-bit case it may trigger with user controllable input when the expectation is that should only trigger during development of new kernel enabling. * The macro reuses the input parameter in multiple locations which is broken if someone passes an expression like 'index++' to array_index_nospec(). Change-Id: I845dbcb830b4d72cf7f0b835bfa64476d3763268 Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arch@vger.kernel.org Link: http://lkml.kernel.org/r/151881604278.17395.6605847763178076520.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* nospec: Move array_index_nospec() parameter checking into separate macroWill Deacon2019-05-021-15/+21
| | | | | | | | | | | | | | | | | | | For architectures providing their own implementation of array_index_mask_nospec() in asm/barrier.h, attempting to use WARN_ONCE() to complain about out-of-range parameters using WARN_ON() results in a mess of mutually-dependent include files. Rather than unpick the dependencies, simply have the core code in nospec.h perform the checking for us. Change-Id: I36857abc8226fcdfd9c9014e3ffee17f9dc2cbca Signed-off-by: Will Deacon <will.deacon@arm.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1517840166-15399-1-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* array_index_nospec: Sanitize speculative array de-referencesDan Williams2019-05-021-0/+72
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | array_index_nospec() is proposed as a generic mechanism to mitigate against Spectre-variant-1 attacks, i.e. an attack that bypasses boundary checks via speculative execution. The array_index_nospec() implementation is expected to be safe for current generation CPUs across multiple architectures (ARM, x86). Based on an original implementation by Linus Torvalds, tweaked to remove speculative flows by Alexei Starovoitov, and tweaked again by Linus to introduce an x86 assembly implementation for the mask generation. Change-Id: Id28afe5e117369ab52a9400c2cb7d639630f3a08 Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org> Co-developed-by: Alexei Starovoitov <ast@kernel.org> Suggested-by: Cyril Novikov <cnovikov@lynx.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: kernel-hardening@lists.openwall.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: gregkh@linuxfoundation.org Cc: torvalds@linux-foundation.org Cc: alan@linux.intel.com Link: https://lkml.kernel.org/r/151727414229.33451.18411580953862676575.stgit@dwillia2-desk3.amr.corp.intel.com
* Make file credentials available to the seqfile interfacesLinus Torvalds2019-05-021-9/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 34dbbcdbf63360661ff7bda6c5f52f99ac515f92 upstream. A lot of seqfile users seem to be using things like %pK that uses the credentials of the current process, but that is actually completely wrong for filesystem interfaces. The unix semantics for permission checking files is to check permissions at _open_ time, not at read or write time, and that is not just a small detail: passing off stdin/stdout/stderr to a suid application and making the actual IO happen in privileged context is a classic exploit technique. So if we want to be able to look at permissions at read time, we need to use the file open credentials, not the current ones. Normal file accesses can just use "f_cred" (or any of the helper functions that do that, like file_ns_capable()), but the seqfile interfaces do not have any such options. It turns out that seq_file _does_ save away the user_ns information of the file, though. Since user_ns is just part of the full credential information, replace that special case with saving off the cred pointer instead, and suddenly seq_file has all the permission information it needs. Change-Id: Ic6fc339d315f6aa5774f7f5356747aebf47cb45f Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jann Horn <jannh@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* allow the temp files created by open() to be linked toAl Viro2019-05-021-0/+1
| | | | | | | | | O_TMPFILE | O_CREAT => linkat() with AT_SYMLINK_FOLLOW and /proc/self/fd/<n> as oldpath (i.e. flink()) will create a link O_TMPFILE | O_CREAT | O_EXCL => ENOENT on attempt to link those guys Change-Id: I5e28485680c3320cd0fccc0ba1bea8b963fca7fe Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* it's still short a few helpers, but infrastructure should be OK now...Al Viro2019-05-022-0/+3
| | | | | Change-Id: I0adb8fe9c5029bad3ac52629003c3b78e9442936 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* iio:st sensors: remove custom sampling frequence attribute in favour of core ↵Jonathan Cameron2018-12-111-11/+1
| | | | | | | | | | | | | support. This allows in kernel client drivers to access this Signed-off-by: Jonathan Cameron <jic23@kernel.org> Cc: Denis Ciocca <denis.ciocca@st.com> Reviewed-by: Hartmut Knaack <knaack.h@gmx.de> Signed-off-by: Moyster <oysterized@gmail.com> omitted pressure driver changes (missing in this kernel)
* iio: Added ST-sensors platform data to select the DRDY interrupt pinDenis CIOCCA2018-12-111-3/+11
| | | | | | | | | | | This patch add support to redirect the DRDY interrupt on INT1 or INT2 on accelerometer and pressure sensors. Signed-off-by: Denis Ciocca <denis.ciocca@st.com> Signed-off-by: Jonathan Cameron <jic23@kernel.org> Signed-off-by: Moyster <oysterized@gmail.com> omitted pressure drivers changes (missing on this kernel)
* iio:common: Removed stuff macros, added num_data_channels on st_sensors ↵Denis CIOCCA2018-12-111-2/+2
| | | | | | | | | | | | | | struct and added support on one-shot sysfs reads to 3 byte channel This patch introduce num_data_channels variable on st_sensors struct to manage different type of channels (size or number) in st_sensors_get_buffer_element function. Removed ST_SENSORS_NUMBER_DATA_CHANNELS and ST_SENSORS_BYTE_FOR_CHANNEL and used struct iio_chan_spec const *ch to catch data. Added 3 byte channel data support on one-shot reads. Signed-off-by: Denis Ciocca <denis.ciocca@st.com> Signed-off-by: Jonathan Cameron <jic23@kernel.org>
* iio:common: ST_SENSORS_LSM_CHANNELS macro changedDenis CIOCCA2018-12-111-11/+9
| | | | | Signed-off-by: Denis Ciocca <denis.ciocca@st.com> Signed-off-by: Jonathan Cameron <jic23@kernel.org>
* mm: vmpressure: dynamic window sizing.Martijn Coenen2018-12-111-2/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The window size used for calculating vm pressure events was previously fixed at 512 pages. The window size has a big impact on the rate of notifications sent off to userspace, in particular when using the "low" level. On machines with a lot of memory, the current value is likely excessive. On the other hand, if the window size is too big, we might delay memory pressure events for too long, especially at critical levels of memory pressure. This patch attempts to address that problem with two changes. The first change is to calculate the window size based on the machine size, quite similar to how the vm watermarks are being calculated. This reduces the chance of false positives at any pressure level. Using the machine size only makes sense on the root cgroup though; for non-root cgroups, their hard memory limit is used to calculate the window size. If no hard memory limit is set, we fall back to the default window size that was previously used. The second change is based on an idea from Johannes Weiner, to only report medium and low pressure levels for every X windows that we scan. This reduces the frequency with which we report low/medium pressure levels, but at the same time will still report critical memory pressure immediately. Change-Id: Ieffca055b2fb6aa27ae0179e0a588e6fcb173a61 Signed-off-by: Martijn Coenen <maco@google.com>
* mm, oom: make dump_tasks publicLiam Mark2018-12-011-0/+3
| | | | | | | | Allow other functions to dump the list of tasks. Useful for when debugging memory leaks. Change-Id: I76c33a118a9765b4c2276e8c76de36399c78dbf6 Signed-off-by: Liam Mark <lmark@codeaurora.org>
* mm: add CMA page allocationfire8552018-11-303-16/+10
| | | | needed for new ion driver
* gcc6: avoid declaration of unused static struct in cpufeature.hAndreas Fenkart2018-11-301-2/+13
| | | | | | | The lookup table is generated but not used if the driver including this header is not compiled as a module. Signed-off-by: Andreas Fenkart <andreas.fenkart@digitalstrom.com>
* blk: rq_data_dir() should not return a booleanLinus Torvalds2018-11-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 10fbd36e362a0f367e34a7cd876a81295d8fc5ca upstream. rq_data_dir() returns either READ or WRITE (0 == READ, 1 == WRITE), not a boolean value. Now, admittedly the "!= 0" doesn't really change the value (0 stays as zero, 1 stays as one), but it's not only redundant, it confuses gcc, and causes gcc to warn about the construct switch (rq_data_dir(req)) { case READ: ... case WRITE: ... that we have in a few drivers. Now, the gcc warning is silly and stupid (it seems to warn not about the switch value having a different type from the case statements, but about _any_ boolean switch value), but in this case the code itself is silly and stupid too, so let's just change it, and get rid of warnings like this: drivers/block/hd.c: In function ‘hd_request’: drivers/block/hd.c:630:11: warning: switch condition has boolean value [-Wswitch-bool] switch (rq_data_dir(req)) { The odd '!= 0' came in when "cmd_flags" got turned into a "u64" in commit 5953316dbf90 ("block: make rq->cmd_flags be 64-bit") and is presumably because the old code (that just did a logical 'and' with 1) would then end up making the type of rq_data_dir() be u64 too. But if we want to retain the old regular integer type, let's just cast the result to 'int' rather than use that rather odd '!= 0'. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Revert "GCC: Fix up for gcc 5+"Moyster2018-11-301-3/+0
| | | | This reverts commit ff505baaf412985af758d5820cd620ed9f1a7e05.
* Replace <asm/uaccess.h> with <linux/uaccess.h> globallyLinus Torvalds2018-11-294-4/+4
| | | | | | | | | | | | | | This was entirely automated, using the script by Al: PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Moyster <oysterized@gmail.com>
* net: core: move mac_pton() to lib/net_utils.cAndy Shevchenko2018-11-292-1/+2
| | | | | | | | | | | | | Since we have at least one user of this function outside of CONFIG_NET scope, we have to provide this function independently. The proposed solution is to move it under lib/net_utils.c with corresponding configuration variable and select wherever it is needed. Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com> Reported-by: Arnd Bergmann <arnd@arndb.de> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* include: remove redefine in fb.h (CONFIG_DMA_SHARED_BUFFER)Moyster2018-11-291-2/+0
|
* GCC: Fix up for gcc 5+mydongistiny2018-11-291-0/+3
| | | | | Signed-off-by: mydongistiny <jaysonedson@gmail.com> Signed-off-by: Mister Oyster <oysterized@gmail.com>
* add vmalloc alloc_size attributesDaniel Micay2018-05-161-10/+11
|
* add kmalloc alloc_size attributesDaniel Micay2018-05-161-3/+3
|
* add slub free list XOR encryptionDaniel Micay2018-05-161-0/+2
| | | | Based on the grsecurity feature, but with a per-cache random value.
* add page sanitization / verificationDaniel Micay2018-05-161-0/+22
|
* sched: replace INIT_COMPLETION with reinit_completionWolfram Sang2018-05-161-3/+15
| | | | | | | | | | | | | | | | | For the casual device driver writer, it is hard to remember when to use init_completion (to init a completion structure) or INIT_COMPLETION (to *reinit* a completion structure). Furthermore, while all other completion functions exepct a pointer as a parameter, INIT_COMPLETION does not. To make it easier to remember which function to use and to make code more readable, introduce a new inline function with the proper name and consistent argument type. Update the kernel-doc for init_completion while we are here. Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Acked-by: Linus Walleij <linus.walleij@linaro.org> (personally at LCE13) Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* etherdevice: Use ether_addr_copy to copy an Ethernet addressJoe Perches2018-05-161-1/+23
| | | | | | | | | | Some systems can use the normally known u16 alignment of Ethernet addresses to save some code/text bytes and cycles. This does not change currently emitted code on x86 by gcc 4.8. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ether_addr_equal: Optimize implementation, remove unused compare_ether_addrJoe Perches2018-05-161-33/+18
| | | | | | | | | | | | | | | | | | | Add a new check for CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS to reduce the number of or's used in the ether_addr_equal comparison to very slightly improve function performance. Simplify the ether_addr_equal_64bits implementation. Integrate and remove the zap_last_2bytes helper as it's now used only once. Remove the now unused compare_ether_addr function. Update the unaligned-memory-access documentation to remove the compare_ether_addr description and show how unaligned accesses could occur with ether_addr_equal. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: etherdevice: add address inherit helperBjørn Mork2018-05-161-0/+15
| | | | | | | | | | Some etherdevices inherit their address from a parent or master device. The addr_assign_type should be updated along with the address in these cases. Adding a helper function to simplify this. Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
* BACKPORT: netlink: add a start callback for starting a netlink dumpTom Herbert2018-04-131-0/+2
| | | | | | | | | | | | | | | | commit fc9e50f5a5a4e1fa9ba2756f745a13e693cf6a06 upstream. The start callback allows the caller to set up a context for the dump callbacks. Presumably, the context can then be destroyed in the done callback. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 142afbc6b2f33832f332ce5b561aa817edfff0b4) Change-Id: Ibaaffde651e76be2defeaa081ae56ca9e8f93602
* Revert "crypto: api - prevent helper ciphers from being used"Mister Oyster2018-01-051-6/+0
| | | | This reverts commit 467b365068b0376fd670b1b97c22679e9a280bb1.
* fs/file.c: don't acquire files->file_lock in fd_install()Eric Dumazet2017-12-311-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Mateusz Guzik reported : Currently obtaining a new file descriptor results in locking fdtable twice - once in order to reserve a slot and second time to fill it. Holding the spinlock in __fd_install() is needed in case a resize is done, or to prevent a resize. Mateusz provided an RFC patch and a micro benchmark : http://people.redhat.com/~mguzik/pipebench.c A resize is an unlikely operation in a process lifetime, as table size is at least doubled at every resize. We can use RCU instead of the spinlock. __fd_install() must wait if a resize is in progress. The resize must block new __fd_install() callers from starting, and wait that ongoing install are finished (synchronize_sched()) resize should be attempted by a single thread to not waste resources. rcu_sched variant is used, as __fd_install() and expand_fdtable() run from process context. It gives us a ~30% speedup using pipebench on a dual Intel(R) Xeon(R) CPU E5-2696 v2 @ 2.50GHz Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Mateusz Guzik <mguzik@redhat.com> Acked-by: Mateusz Guzik <mguzik@redhat.com> Tested-by: Mateusz Guzik <mguzik@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> [@nathanchance: Leave out documentation, https://github.com/torvalds/linux/commit/8a81252b774b53e628a8a0fe18e2b8fc236d92cc] Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: Joe Maples <joe@frap129.org>
* BACKPORT: [UPSTREAM] mbcache2: reimplement mbcacheJan Kara2017-12-311-0/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (Cherry-pick from commit f9a61eb4e2471c56a63cd804c7474128138c38ac) Original mbcache was designed to have more features than what ext? filesystems ended up using. It supported entry being in more hashes, it had a home-grown rwlocking of each entry, and one cache could cache entries from multiple filesystems. This genericity also resulted in more complex locking, larger cache entries, and generally more code complexity. This is reimplementation of the mbcache functionality to exactly fit the purpose ext? filesystems use it for. Cache entries are now considerably smaller (7 instead of 13 longs), the code is considerably smaller as well (414 vs 913 lines of code), and IMO also simpler. The new code is also much more lightweight. I have measured the speed using artificial xattr-bench benchmark, which spawns P processes, each process sets xattr for F different files, and the value of xattr is randomly chosen from a pool of V values. Averages of runtimes for 5 runs for various combinations of parameters are below. The first value in each cell is old mbache, the second value is the new mbcache. V=10 F\P 1 2 4 8 16 32 64 10 0.158,0.157 0.208,0.196 0.500,0.277 0.798,0.400 3.258,0.584 13.807,1.047 61.339,2.803 100 0.172,0.167 0.279,0.222 0.520,0.275 0.825,0.341 2.981,0.505 12.022,1.202 44.641,2.943 1000 0.185,0.174 0.297,0.239 0.445,0.283 0.767,0.340 2.329,0.480 6.342,1.198 16.440,3.888 V=100 F\P 1 2 4 8 16 32 64 10 0.162,0.153 0.200,0.186 0.362,0.257 0.671,0.496 1.433,0.943 3.801,1.345 7.938,2.501 100 0.153,0.160 0.221,0.199 0.404,0.264 0.945,0.379 1.556,0.485 3.761,1.156 7.901,2.484 1000 0.215,0.191 0.303,0.246 0.471,0.288 0.960,0.347 1.647,0.479 3.916,1.176 8.058,3.160 V=1000 F\P 1 2 4 8 16 32 64 10 0.151,0.129 0.210,0.163 0.326,0.245 0.685,0.521 1.284,0.859 3.087,2.251 6.451,4.801 100 0.154,0.153 0.211,0.191 0.276,0.282 0.687,0.506 1.202,0.877 3.259,1.954 8.738,2.887 1000 0.145,0.179 0.202,0.222 0.449,0.319 0.899,0.333 1.577,0.524 4.221,1.240 9.782,3.579 V=10000 F\P 1 2 4 8 16 32 64 10 0.161,0.154 0.198,0.190 0.296,0.256 0.662,0.480 1.192,0.818 2.989,2.200 6.362,4.746 100 0.176,0.174 0.236,0.203 0.326,0.255 0.696,0.511 1.183,0.855 4.205,3.444 19.510,17.760 1000 0.199,0.183 0.240,0.227 1.159,1.014 2.286,2.154 6.023,6.039 ---,10.933 ---,36.620 V=100000 F\P 1 2 4 8 16 32 64 10 0.171,0.162 0.204,0.198 0.285,0.230 0.692,0.500 1.225,0.881 2.990,2.243 6.379,4.771 100 0.151,0.171 0.220,0.210 0.295,0.255 0.720,0.518 1.226,0.844 3.423,2.831 19.234,17.544 1000 0.192,0.189 0.249,0.225 1.162,1.043 2.257,2.093 5.853,4.997 ---,10.399 ---,32.198 We see that the new code is faster in pretty much all the cases and starting from 4 processes there are significant gains with the new code resulting in upto 20-times shorter runtimes. Also for large numbers of cached entries all values for the old code could not be measured as the kernel started hitting softlockups and died before the test completed. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* BACKPORT: [UPSTREAM] mm: new shrinker APIDave Chinner2017-12-311-9/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (Cherry-pick from commit 24f7c6b981fb70084757382da464ea85d72af300) The current shrinker callout API uses an a single shrinker call for multiple functions. To determine the function, a special magical value is passed in a parameter to change the behaviour. This complicates the implementation and return value specification for the different behaviours. Separate the two different behaviours into separate operations, one to return a count of freeable objects in the cache, and another to scan a certain number of objects in the cache for freeing. In defining these new operations, ensure the return values and resultant behaviours are clearly defined and documented. Modify shrink_slab() to use the new API and implement the callouts for all the existing shrinkers. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Glauber Costa <glommer@parallels.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Cc: Arve Hjønnevåg <arve@android.com> Cc: Carlos Maiolino <cmaiolino@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: David Rientjes <rientjes@google.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: J. Bruce Fields <bfields@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Stultz <john.stultz@linaro.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Kent Overstreet <koverstreet@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* CHROMIUM: dm: boot time specification of dm=Will Drewry2017-12-271-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a wrap-up of three patches pending upstream approval. I'm bundling them because they are interdependent, and it'll be easier to drop it on rebase later. 1. dm: allow a dm-fs-style device to be shared via dm-ioctl Integrates feedback from Alisdair, Mike, and Kiyoshi. Two main changes occur here: - One function is added which allows for a programmatically created mapped device to be inserted into the dm-ioctl hash table. This binds the device to a name and, optional, uuid which is needed by udev and allows for userspace management of the mapped device. - dm_table_complete() was extended to handle all of the final functional changes required for the table to be operational once called. 2. init: boot to device-mapper targets without an initr* Add a dm= kernel parameter modeled after the md= parameter from do_mounts_md. It allows for device-mapper targets to be configured at boot time for use early in the boot process (as the root device or otherwise). It also replaces /dev/XXX calls with major:minor opportunistically. The format is dm="name uuid ro,table line 1,table line 2,...". The parser expects the comma to be safe to use as a newline substitute but, otherwise, uses the normal separator of space. Some attempt has been made to make it forgiving of additional spaces (using skip_spaces()). A mapped device created during boot will be assigned a minor of 0 and may be access via /dev/dm-0. An example dm-linear root with no uuid may look like: root=/dev/dm-0 dm="lroot none ro, 0 4096 linear /dev/ubdb 0, 4096 4096 linear /dv/ubdc 0" Once udev is started, /dev/dm-0 will become /dev/mapper/lroot. Older upstream threads: http://marc.info/?l=dm-devel&m=127429492521964&w=2 http://marc.info/?l=dm-devel&m=127429499422096&w=2 http://marc.info/?l=dm-devel&m=127429493922000&w=2 Latest upstream threads: https://patchwork.kernel.org/patch/104859/ https://patchwork.kernel.org/patch/104860/ https://patchwork.kernel.org/patch/104861/ BUG: 27175947 Signed-off-by: Will Drewry <wad@chromium.org> Review URL: http://codereview.chromium.org/2020011 Change-Id: I92bd53432a11241228d2e5ac89a3b20d19b05a31
* vfs: Fix pathological performance case for __alloc_fd()Linus Torvalds2017-12-261-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | Al Viro points out that: > > * [Linux-specific aside] our __alloc_fd() can degrade quite badly > > with some use patterns. The cacheline pingpong in the bitmap is probably > > inevitable, unless we accept considerably heavier memory footprint, > > but we also have a case when alloc_fd() takes O(n) and it's _not_ hard > > to trigger - close(3);open(...); will have the next open() after that > > scanning the entire in-use bitmap. And Eric Dumazet has a somewhat realistic multithreaded microbenchmark that opens and closes a lot of sockets with minimal work per socket. This patch largely fixes it. We keep a 2nd-level bitmap of the open file bitmaps, showing which words are already full. So then we can traverse that second-level bitmap to efficiently skip already allocated file descriptors. On his benchmark, this improves performance by up to an order of magnitude, by avoiding the excessive open file bitmap scanning. Tested-and-acked-by: Eric Dumazet <edumazet@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: franciscofranco <franciscofranco.1990@gmail.com> Signed-off-by: Joe Maples <joe@frap129.org>
* perf: Use hrtimers for event multiplexingStephane Eranian2017-12-261-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current scheme of using the timer tick was fine for per-thread events. However, it was causing bias issues in system-wide mode (including for uncore PMUs). Event groups would not get their fair share of runtime on the PMU. With tickless kernels, if a core is idle there is no timer tick, and thus no event rotation (multiplexing). However, there are events (especially uncore events) which do count even though cores are asleep. This patch changes the timer source for multiplexing. It introduces a per-PMU per-cpu hrtimer. The advantage is that even when a core goes idle, it will come back to service the hrtimer, thus multiplexing on system-wide events works much better. The per-PMU implementation (suggested by PeterZ) enables adjusting the multiplexing interval per PMU. The preferred interval is stashed into the struct pmu. If not set, it will be forced to the default interval value. In order to minimize the impact of the hrtimer, it is turned on and off on demand. When the PMU on a CPU is overcommited, the hrtimer is activated. It is stopped when the PMU is not overcommitted. In order for this to work properly, we had to change the order of initialization in start_kernel() such that hrtimer_init() is run before perf_event_init(). The default interval in milliseconds is set to a timer tick just like with the old code. We will provide a sysctl to tune this in another patch. Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Link: http://lkml.kernel.org/r/1364991694-5876-2-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: mydongistiny <jaysonedson@gmail.com> Signed-off-by: Joe Maples <joe@frap129.org>
* cpu: Defer smpboot kthread unparking until CPU known to schedulerPaul E. McKenney2017-12-261-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, smpboot_unpark_threads() is invoked before the incoming CPU has been added to the scheduler's runqueue structures. This might potentially cause the unparked kthread to run on the wrong CPU, since the correct CPU isn't fully set up yet. That causes a sporadic, hard to debug boot crash triggering on some systems, reported by Borislav Petkov, and bisected down to: 2a442c9c6453 ("x86: Use common outgoing-CPU-notification code") This patch places smpboot_unpark_threads() in a CPU hotplug notifier with priority set so that these kthreads are unparked just after the CPU has been added to the runqueues. Change-Id: I8921987de9c2a2f475cc63dc82662d6ebf6e8725 Reported-and-tested-by: Borislav Petkov <bp@suse.de> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Git-commit: 00df35f991914db6b8bde8cf09808e19a9cffc3d Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git Signed-off-by: Matt Wagantall <mattw@codeaurora.org> Signed-off-by: Pranav Vashi <neobuddy89@gmail.com> Signed-off-by: Joe Maples <joe@frap129.org>
* arm64: use the new *_relaxed macros for lower power usagefranciscofranco2017-12-252-3/+4
| | | | | | Signed-off-by: franciscofranco <franciscofranco.1990@gmail.com> Signed-off-by: Joe Maples <joe@frap129.org> Signed-off-by: Mister Oyster <oysterized@gmail.com>
* pmem: cleanup last bits of itMister Oyster2017-12-241-111/+0
|
* fsync: revert to a version with fbsuspend instead of earlysuspendMister Oyster2017-12-231-6/+0
| | | | | This reverts commit fdd3da71eb0fae04ef0e67cd02eab371a26449c6. This reverts commit 384cf00787041088f91a0604dec112317135a369.
* Provide a binary to hex conversion functionDavid Howells2017-12-221-0/+1
| | | | | | | | Provide a function to convert a buffer of binary data into an unterminated ascii hex string representation of that data. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Vivek Goyal <vgoyal@redhat.com>
* compiler.h: add support for malloc attributeRasmus Villemoes2017-12-182-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | gcc as far back as at least 3.04 documents the function attribute __malloc__. Add a shorthand for attaching that to a function declaration. This was also suggested by Andi Kleen way back in 2002 [1], but didn't get applied, perhaps because gcc at that time generated the exact same code with and without this attribute. This attribute tells the compiler that the return value (if non-NULL) can be assumed not to alias any other valid pointers at the time of the call. Please note that the documentation for a range of gcc versions (starting from around 4.7) contained a somewhat confusing and self-contradicting text: The malloc attribute is used to tell the compiler that a function may be treated as if any non-NULL pointer it returns cannot alias any other pointer valid when the function returns and *that the memory has undefined content*. [...] Standard functions with this property include malloc and *calloc*. (emphasis mine). The intended meaning has later been clarified [2]: This tells the compiler that a function is malloc-like, i.e., that the pointer P returned by the function cannot alias any other pointer valid when the function returns, and moreover no pointers to valid objects occur in any storage addressed by P. What this means is that we can apply the attribute to kmalloc and friends, and it is ok for the returned memory to have well-defined contents (__GFP_ZERO). But it is not ok to apply it to kmemdup(), nor to other functions which both allocate and possibly initialize the memory with existing pointers. So unless someone is doing something pretty perverted kstrdup() should also be a fine candidate. [1] http://thread.gmane.org/gmane.linux.kernel/57172 [2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56955 Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* compiler-gcc: disable -ftracer for __noclone functionsPaolo Bonzini2017-12-181-1/+1
| | | | | | | | | | | | | | | | | | | | -ftracer can duplicate asm blocks causing compilation to fail in noclone functions. For example, KVM declares a global variable in an asm like asm("2: ... \n .pushsection data \n .global vmx_return \n vmx_return: .long 2b"); and -ftracer causes a double declaration. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Marek <mmarek@suse.cz> Cc: stable@vger.kernel.org Cc: kvm@vger.kernel.org Reported-by: Linda Walsh <lkml@tlinx.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* mm: page_alloc: add kasan hooks on alloc and free pathsAndrey Ryabinin2017-12-181-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | Add kernel address sanitizer hooks to mark allocated page's addresses as accessible in corresponding shadow region. Mark freed pages as inaccessible. Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Konstantin Serebryany <kcc@google.com> Cc: Dmitry Chernenkov <dmitryc@google.com> Signed-off-by: Andrey Konovalov <adech.fo@gmail.com> Cc: Yuri Gribov <tetra2005@gmail.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Christoph Lameter <cl@linux.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* compiler-gcc.h: use "proved" instead of "proofed"Benjamin Peterson2017-12-181-1/+1
| | | | | | | Link: http://lkml.kernel.org/r/1477894241.1103202.772260161.1B0A5995@webmail.messagingengine.com Signed-off-by: Benjamin Peterson <bp@benjamin.pe> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* compiler-gcc.h: neateningJoe Perches2017-12-181-40/+45
| | | | | | | | | | | | | | | | | | | | | | - Move the inline and noinline blocks together - Comment neatening - Alignment of __attribute__ uses - Consistent naming of __must_be_array macro argument - Multiline macro neatening Signed-off-by: Joe Perches <joe@perches.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Michal Marek <mmarek@suse.cz> Cc: Segher Boessenkool <segher@kernel.crashing.org> Cc: Sasha Levin <levinsasha928@gmail.com> Cc: Anton Blanchard <anton@samba.org> Cc: Alan Modra <amodra@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* compiler: introduce __alias(symbol) shortcutAndrey Ryabinin2017-12-181-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | To be consistent with other compiler attributes introduce __alias(symbol) macro expanding into __attribute__((alias(#symbol))) Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Konstantin Serebryany <kcc@google.com> Cc: Dmitry Chernenkov <dmitryc@google.com> Signed-off-by: Andrey Konovalov <adech.fo@gmail.com> Cc: Yuri Gribov <tetra2005@gmail.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Christoph Lameter <cl@linux.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>