aboutsummaryrefslogtreecommitdiff
path: root/kernel
Commit message (Collapse)AuthorAgeFilesLines
* perf/core: Fix event inheritance on fork()Peter Zijlstra2017-06-171-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit e7cc4865f0f31698ef2f7aac01a50e78968985b7 upstream. While hunting for clues to a use-after-free, Oleg spotted that perf_event_init_context() can loose an error value with the result that fork() can succeed even though we did not fully inherit the perf event context. Spotted-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: oleg@redhat.com Fixes: 889ff0150661 ("perf/core: Split context's event group list into pinned and non-pinned lists") Link: http://lkml.kernel.org/r/20170316125823.190342547@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
* locking/static_keys: Add static_key_{en,dis}able() helpersPeter Zijlstra2017-06-171-4/+2
| | | | | | | | | | | | | | | | | | | | | | | commit e33886b38cc82a9fc3b2d655dfc7f50467594138 upstream. Add two helpers to make it easier to treat the refcount as boolean. [js] do not involve WARN_ON_ONCE as it causes build failures Suggested-by: Jason Baron <jasonbaron0@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz> [wt: only backported for use in next fix ; s/static_key_count(key)/atomic_read(&key->enabled)/] Signed-off-by: Willy Tarreau <w@1wt.eu>
* hotplug: Make register and unregister notifier API symmetricMichal Hocko2017-06-171-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 777c6e0daebb3fcefbbd6f620410a946b07ef6d0 upstream. Yu Zhao has noticed that __unregister_cpu_notifier only unregisters its notifiers when HOTPLUG_CPU=y while the registration might succeed even when HOTPLUG_CPU=n if MODULE is enabled. This means that e.g. zswap might keep a stale notifier on the list on the manual clean up during the pool tear down and thus corrupt the list. Resulting in the following [ 144.964346] BUG: unable to handle kernel paging request at ffff880658a2be78 [ 144.971337] IP: [<ffffffffa290b00b>] raw_notifier_chain_register+0x1b/0x40 <snipped> [ 145.122628] Call Trace: [ 145.125086] [<ffffffffa28e5cf8>] __register_cpu_notifier+0x18/0x20 [ 145.131350] [<ffffffffa2a5dd73>] zswap_pool_create+0x273/0x400 [ 145.137268] [<ffffffffa2a5e0fc>] __zswap_param_set+0x1fc/0x300 [ 145.143188] [<ffffffffa2944c1d>] ? trace_hardirqs_on+0xd/0x10 [ 145.149018] [<ffffffffa2908798>] ? kernel_param_lock+0x28/0x30 [ 145.154940] [<ffffffffa2a3e8cf>] ? __might_fault+0x4f/0xa0 [ 145.160511] [<ffffffffa2a5e237>] zswap_compressor_param_set+0x17/0x20 [ 145.167035] [<ffffffffa2908d3c>] param_attr_store+0x5c/0xb0 [ 145.172694] [<ffffffffa290848d>] module_attr_store+0x1d/0x30 [ 145.178443] [<ffffffffa2b2b41f>] sysfs_kf_write+0x4f/0x70 [ 145.183925] [<ffffffffa2b2a5b9>] kernfs_fop_write+0x149/0x180 [ 145.189761] [<ffffffffa2a99248>] __vfs_write+0x18/0x40 [ 145.194982] [<ffffffffa2a9a412>] vfs_write+0xb2/0x1a0 [ 145.200122] [<ffffffffa2a9a732>] SyS_write+0x52/0xa0 [ 145.205177] [<ffffffffa2ff4d97>] entry_SYSCALL_64_fastpath+0x12/0x17 This can be even triggered manually by changing /sys/module/zswap/parameters/compressor multiple times. Fix this issue by making unregister APIs symmetric to the register so there are no surprises. [js] backport to 3.12 Fixes: 47e627bc8c9a ("[PATCH] hotplug: Allow modules to use the cpu hotplug notifiers even if !CONFIG_HOTPLUG_CPU") Reported-and-tested-by: Yu Zhao <yuzhao@google.com> Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: linux-mm@kvack.org Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dan Streetman <ddstreet@ieee.org> Link: http://lkml.kernel.org/r/20161207135438.4310-1-mhocko@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Willy Tarreau <w@1wt.eu>
* locking/rtmutex: Prevent dequeue vs. unlock raceThomas Gleixner2017-06-171-2/+66
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit dbb26055defd03d59f678cb5f2c992abe05b064a upstream. David reported a futex/rtmutex state corruption. It's caused by the following problem: CPU0 CPU1 CPU2 l->owner=T1 rt_mutex_lock(l) lock(l->wait_lock) l->owner = T1 | HAS_WAITERS; enqueue(T2) boost() unlock(l->wait_lock) schedule() rt_mutex_lock(l) lock(l->wait_lock) l->owner = T1 | HAS_WAITERS; enqueue(T3) boost() unlock(l->wait_lock) schedule() signal(->T2) signal(->T3) lock(l->wait_lock) dequeue(T2) deboost() unlock(l->wait_lock) lock(l->wait_lock) dequeue(T3) ===> wait list is now empty deboost() unlock(l->wait_lock) lock(l->wait_lock) fixup_rt_mutex_waiters() if (wait_list_empty(l)) { owner = l->owner & ~HAS_WAITERS; l->owner = owner ==> l->owner = T1 } lock(l->wait_lock) rt_mutex_unlock(l) fixup_rt_mutex_waiters() if (wait_list_empty(l)) { owner = l->owner & ~HAS_WAITERS; cmpxchg(l->owner, T1, NULL) ===> Success (l->owner = NULL) l->owner = owner ==> l->owner = T1 } That means the problem is caused by fixup_rt_mutex_waiters() which does the RMW to clear the waiters bit unconditionally when there are no waiters in the rtmutexes rbtree. This can be fatal: A concurrent unlock can release the rtmutex in the fastpath because the waiters bit is not set. If the cmpxchg() gets in the middle of the RMW operation then the previous owner, which just unlocked the rtmutex is set as the owner again when the write takes place after the successfull cmpxchg(). The solution is rather trivial: verify that the owner member of the rtmutex has the waiters bit set before clearing it. This does not require a cmpxchg() or other atomic operations because the waiters bit can only be set and cleared with the rtmutex wait_lock held. It's also safe against the fast path unlock attempt. The unlock attempt via cmpxchg() will either see the bit set and take the slowpath or see the bit cleared and release it atomically in the fastpath. It's remarkable that the test program provided by David triggers on ARM64 and MIPS64 really quick, but it refuses to reproduce on x86-64, while the problem exists there as well. That refusal might explain that this got not discovered earlier despite the bug existing from day one of the rtmutex implementation more than 10 years ago. Thanks to David for meticulously instrumenting the code and providing the information which allowed to decode this subtle problem. Reported-by: David Daney <ddaney@caviumnetworks.com> Tested-by: David Daney <david.daney@cavium.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Fixes: 23f78d4a03c5 ("[PATCH] pi-futex: rt mutex core") Link: http://lkml.kernel.org/r/20161130210030.351136722@linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org> [wt: s/{READ,WRITE}_ONCE/ACCESS_ONCE/] Signed-off-by: Willy Tarreau <w@1wt.eu>
* Revert "jiffies conversions: Use compile time constants when possible"Mister Oyster2017-06-021-12/+94
| | | | | Build breaks when any of this code is included This reverts commit 9722cd7360077819a5af4937c0f742149fcec82c.
* fs: add dirtytime_expire_seconds sysctlTheodore Ts'o2017-05-291-0/+8
| | | | | | | | Add a tuning knob so we can adjust the dirtytime expiration timeout, which is very useful for testing lazytime. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
* sched: add bit_wait_io for 3.18 ext4 backportTheodore Ts'o2017-05-271-0/+9
| | | | | | | Excerpted from commit 743162013: "sched: Remove proliferation of wait_on_bit() action functions" Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* nohz: Avoid tick's double reprogramming in highres modeViresh Kumar2017-05-241-0/+4
| | | | | | | | | | | | | | | | | | | | | In highres mode, the tick reschedules itself unconditionally to the next jiffies. However while this clock reprogramming is relevant when the tick is in periodic mode, it's not that interesting when we run in dynticks mode because irq exit is likely going to overwrite the next tick to some randomly deferred future. So lets just get rid of this tick self rescheduling in dynticks mode. This way we can avoid some clockevents double write in favourable scenarios like when we stop the tick completely in idle while no other hrtimer is pending. Change-Id: I4e73da9bffbcac49789e5f279b67873f414b22a8 Suggested-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
* nohz: Fix spurious periodic tick behaviour in low-res dynticks modeViresh Kumar2017-05-241-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we reach the end of the tick handler, we unconditionally reschedule the next tick to the next jiffy. Then on irq exit, the nohz code overrides that setting if needed and defers the next tick as far away in the future as possible. Now in the best dynticks case, when we actually don't need any tick in the future (ie: expires == KTIME_MAX), low-res and high-res behave differently. What we want in this case is to cancel the next tick programmed by the previous one. That's what we do in high-res mode. OTOH we lack a low-res mode equivalent of hrtimer_cancel() so we simply don't do anything in this case and the next tick remains scheduled to jiffies + 1. As a result, in low-res mode, when the dynticks code determines that no tick is needed in the future, we can recursively get a spurious tick every jiffy because then the next tick is always reprogrammed from the tick handler and is never cancelled. And this can happen indefinetly until some subsystem actually needs a precise tick in the future and only then we eventually overwrite the previous tick handler setting to defer the next tick. We are fixing this by introducing the ONESHOT_STOPPED mode which will let us pause a clockevent when no further interrupt is needed. Meanwhile we can't expect all drivers to support this new mode. So lets reduce much of the symptoms by skipping the nohz-blind tick rescheduling from the tick-handler when the CPU is in dynticks mode. That tick rescheduling wrongly assumed periodicity and the low-res dynticks code can't cancel such decision. This breaks the recursive (and thus the worst) part of the problem. In the worst case now, we'll get only one extra tick due to uncancelled tick scheduled before we entered dynticks mode. This also removes a needless clockevent write on idle ticks. Since those clock write are usually considered to be slow, it's a general win. Change-Id: Id416f55d0356c0f3d178e87d9821ea50d2e1ed69 Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
* nohz: Get timekeeping max deferment outside jiffies_lockFrederic Weisbecker2017-05-241-1/+2
| | | | | | | | | | | | | | | | | | | | | We don't need to fetch the timekeeping max deferment under the jiffies_lock seqlock. If the clocksource is updated concurrently while we stop the tick, stop machine is called and the tick will be reevaluated again along with uptodate jiffies and its related values. Change-Id: I2889f0b3da1be1b78dfc74e62986deaafd27b679 Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alex Shi <alex.shi@linaro.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Kevin Hilman <khilman@linaro.org> Link: http://lkml.kernel.org/r/1387320692-28460-9-git-send-email-fweisbec@gmail.com Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
* kernel: Only expose su when daemon is runningTom Marshall2017-05-213-0/+39
| | | | | | | | | | | | | | It has been claimed that the PG implementation of 'su' has security vulnerabilities even when disabled. Unfortunately, the people that find these vulnerabilities often like to keep them private so they can profit from exploits while leaving users exposed to malicious hackers. In order to reduce the attack surface for vulnerabilites, it is therefore necessary to make 'su' completely inaccessible when it is not in use (except by the root and system users). Change-Id: I79716c72f74d0b7af34ec3a8054896c6559a181d
* UPSTREAM: tracing: Fix trace_printk() to print when not using bprintk()Joe Maples2017-05-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The trace_printk() code will allocate extra buffers if the compile detects that a trace_printk() is used. To do this, the format of the trace_printk() is saved to the __trace_printk_fmt section, and if that section is bigger than zero, the buffers are allocated (along with a message that this has happened). If trace_printk() uses a format that is not a constant, and thus something not guaranteed to be around when the print happens, the compiler optimizes the fmt out, as it is not used, and the __trace_printk_fmt section is not filled. This means the kernel will not allocate the special buffers needed for the trace_printk() and the trace_printk() will not write anything to the tracing buffer. Adding a "__used" to the variable in the __trace_printk_fmt section will keep it around, even though it is set to NULL. This will keep the string from being printed in the debugfs/tracing/printk_formats section as it is not needed. Reported-by: Vlastimil Babka <vbabka@suse.cz> Fixes: 07d777fe8c398 "tracing: Add percpu buffers for trace_printk()" Cc: stable@vger.kernel.org # v3.5+ Bug: 34277115 Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Change-Id: I10ce56caa41c7644d9d290d9ed272a6d156c938c Signed-off-by: Joe Maples <joe@frap129.org>
* perf: don't leave group_entry on sibling list (use-after-free)John Dias2017-05-071-0/+7
| | | | | | | | | | | | | When perf_group_detach is called on a group leader, it should empty its sibling list. Otherwise, when a sibling is later deallocated, list_del_event() removes the sibling's group_entry from its current list, which can be the now-deallocated group leader's sibling list (use-after-free bug). Bug: 32402548 Change-Id: I99f6bc97c8518df1cb0035814368012ba72ab1f1 Signed-off-by: John Dias <joaodias@google.com>
* um: fix buildMister Oyster2017-05-041-0/+2
|
* BACKPORT: trace: resolve stack corruption due to string copyAmey Telawane2017-05-021-1/+1
| | | | | | | | | | | Strcpy has no limit on string being copied which causes stack corruption leading to kernel panic. Use strlcpy to resolve the issue by providing length of string to be copied. CRs-fixed: 1048480 Bug: 35399704 Change-Id: Ib290b25f7e0ff96927b8530e5c078869441d409f Signed-off-by: Amey Telawane <ameyt@codeaurora.org>
* tracing: do not leak kernel addressesNick Desaulniers2017-05-021-1/+1
| | | | | | | | This likely breaks tracing tools like trace-cmd. It logs in the same format but now addresses are all 0x0. Bug: 34277115 Change-Id: Ifb0d4d2a184bf0d95726de05b1acee0287a375d9
* sched: Disabled Gentle Fair Sleepers for better UI performanceCristoforo Cataldo2017-04-251-1/+1
|
* drivers:lmk: Fix double delete issueHong-Mei Li2017-04-171-0/+3
| | | | | | | | | | | | | | | | | someone may change a process's oom_score_adj by proc fs, even though the process has exited. In that case, the task was deleted from the rb tree already, and the redundant deleting would trigger rb_erase panic finally. In this patch, we make sure to clear the node after deteting and check its empty status before rb_erase. Change-Id: I26098ca3350f111e94567f9e65ec3dce413197aa Signed-off-by: Hong-Mei Li <a21834@motorola.com> Reviewed-on: http://gerrit.mot.com/727760 SME-Granted: SME Approvals Granted SLTApproved: Slta Waiver <sltawvr@motorola.com> Tested-by: Jira Key <jirakey@motorola.com> Reviewed-by: Sheng-Zhe Zhao <a18689@motorola.com> Submit-Approved: Jira Key <jirakey@motorola.com>
* alarmtimer: fix log and indentMister Oyster2017-04-161-3/+6
|
* mtk: alarmtimer: get rid of xlog and debugMister Oyster2017-04-131-29/+4
|
* Set the device to wake up 1 1/2 minutes beforeMarcos Marado2017-04-131-1/+2
| | | | | | | | | | | | | Setting RTC_PWRON_SEC to -90, so it will wake up 1 and 1/2 minutes before the requested time, giving time for the system to boot up and AlarmManager to be ready for when the time comes. Note: Original commit was for 120 seconds, reduced because of faster hardware. Change-Id: Ie579529f417ad2f6044e347a55e88d1b72503731 Ticket: PORRIDGE-12 Signed-off-by: Mister Oyster <oysterized@gmail.com>
* power: Remove HAS_WAKELOCK config and document WAKELOCKDylan Reid2017-04-131-5/+6
| | | | | | | | | | Remove the HAS_WAKELOCK config as it doesn't seem to have been used in the 3.10 or 3.14 kernels. Add some Documentation to CONFIG_WAKELOCK so that it is selectable and can be disabled is desired. Signed-off-by: Dylan Reid <dgreid@chromium.org>
* wakeup_reason: use vsnprintf instead of snsprintf for vargs.Ruchi Kandoi2017-04-131-1/+1
| | | | | Bug: 22368519 Signed-off-by: Ruchi Kandoi <kandoiruchi@google.com>
* power: wakeup_reason: fix suspend time reportingAmit Pundir2017-04-131-16/+25
| | | | | | | | | | | | | | | | | Suspend time reporting Change-Id: I2cb9a9408a5fd12166aaec11b935a0fd6a408c63 (Power: Report suspend times from last_suspend_time), is broken on 3.16+ kernels because get_xtime_and_monotonic_and_sleep_offset() hrtimer helper routine is removed from kernel timekeeping. The replacement helper routines ktime_get_update_offsets_{tick,now}() are private to core kernel timekeeping so we can't use them, hence using ktime_get() and ktime_get_boottime() instead and sampling the time twice. Idea is to use Monotonic boottime offset to calculate total time spent in last suspend state and CLOCK_MONOTONIC to calculate time spent in last suspend-resume process. Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
* alarmtimer: Export symbols of functions declared in linux/alarmtimer.hMarcus Gelderie2017-04-131-1/+9
| | | | | | | | | | | Export symbols so they can be used by drivers/staging/android/alarm-dev.c if it is built as a module. So far alarm-dev is built-in but module support is planned (see drivers/staging/android/TODO). Signed-off-by: Marcus Gelderie <redmnic@gmail.com> [jstultz: tweaked commit message, also export newly added functions] Signed-off-by: John Stultz <john.stultz@linaro.org>
* alarmtimer: Export symbols of alarmtimer_get_rtcdevPramod Gurav2017-04-131-1/+1
| | | | | | | | | | | Export symbol of alarmtimer_get_rtcdev so that it is used by any driver when built as module like, drivers/staging/android/alarm-dev.c. CC: John Stultz <john.stultz@linaro.org> CC: Marcus Gelderie <redmnic@gmail.com> Signed-off-by: Pramod Gurav <pramod.gurav.etc@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* UPSTREAM: seccomp: always propagate NO_NEW_PRIVS on tsyncJann Horn2017-04-131-11/+11
| | | | | | | | | | | | | | | | Before this patch, a process with some permissive seccomp filter that was applied by root without NO_NEW_PRIVS was able to add more filters to itself without setting NO_NEW_PRIVS by setting the new filter from a throwaway thread with NO_NEW_PRIVS. Signed-off-by: Jann Horn <jann@thejh.net> Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Bug: 36656103 (cherry-picked from commit 103502a35cfce0710909da874f092cb44823ca03) Signed-off-by: Paul Lawrence <paullawrence@google.com> Change-Id: I5abd7daab9172f1dfd53e11706b7c7f331f2f4f1
* f2fs: avoid hungtask problem caused by losing wake_upYunlei He2017-04-131-0/+1
| | | | | | | | | | | | | | | | | | | The D state of wait_on_all_pages_writeback should be waken by function f2fs_write_end_io when all writeback pages have been succesfully written to device. It's possible that wake_up comes between get_pages and io_schedule. Maybe in this case it will lost wake_up and still in D state even if all pages have been write back to device, and finally, the whole system will be into the hungtask state. if (!get_pages(sbi, F2FS_WRITEBACK)) break; <--------- wake_up io_schedule(); Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Biao He <hebiao6@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* perf: Cure event->pending_disable racePeter Zijlstra2017-04-131-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 28a967c3a2f99fa3b5f762f25cb2a319d933571b upstream. Because event_sched_out() checks event->pending_disable _before_ actually disabling the event, it can happen that the event fires after it checks but before it gets disabled. This would leave event->pending_disable set and the queued irq_work will try and process it. However, if the event trigger was during schedule(), the event might have been de-scheduled by the time the irq_work runs, and perf_event_disable_local() will fail. Fix this by checking event->pending_disable _after_ we call event->pmu->del(). This depends on the latter being a compiler barrier, such that the compiler does not lift the load and re-creates the problem. Tested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dvyukov@google.com Cc: eranian@google.com Cc: oleg@redhat.com Cc: panand@redhat.com Cc: sasha.levin@oracle.com Cc: vince@deater.net Link: http://lkml.kernel.org/r/20160224174948.040469884@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: mydongistiny <jaysonedson@gmail.com>
* sched/fair: Optimize find_idlest_cpu() when there is no choiceMorten Rasmussen2017-04-131-0/+4
| | | | | | | | | | | | | | | | | | | | | | In the current find_idlest_group()/find_idlest_cpu() search we end up calling find_idlest_cpu() in a sched_group containing only one CPU in the end. Checking idle-states becomes pointless when there is no alternative, so bail out instead. Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dietmar.eggemann@arm.com Cc: linux-kernel@vger.kernel.org Cc: mgalbraith@suse.de Cc: vincent.guittot@linaro.org Cc: yuyang.du@intel.com Link: http://lkml.kernel.org/r/1466615004-3503-4-git-send-email-morten.rasmussen@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: RyTek <rytek1128@outlook.com>
* idle: Implement a per-cpu idle-polling modeVikram Mulukutla2017-04-131-2/+25
| | | | | | | | | | | | | | | | cpu_idle_poll_ctrl provides a way of switching the idle thread to use cpu_idle_poll instead of the arch specific lower power mode callbacks (arch_cpu_idle). cpu_idle_poll spins on a flag in a tight loop with interrupts enabled. In some cases it may be useful to enter the tight loop polling mode only on a particular CPU. This allows other CPUs to continue using the arch specific low power mode callbacks. Provide an API that allows this. Change-Id: I7c47c3590eb63345996a1c780faa79dbd1d9fdb4 Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
* idle: exit the cpu_idle_poll loop if cpu_idle_force_poll is clearedVikram Mulukutla2017-04-131-1/+1
| | | | | | | | | | | | | | | | | | | | cpu_idle_poll_ctrl allows the enabling/disabling of the idle polling mode; this mode allows a CPU to spin waiting for a new task to be scheduled rather than having to execute the arch specific idle code. However, the loop that checks for a new task does not look at the flag that enables idle polling mode. So, the CPU may continue to spin even though the aforementioned flag has been cleared. Since the CPU is already in idle, it may be a while before a task is scheduled, precluding potential power savings. Modify the while loop conditional in question to also check if the cpu_idle_force_poll flag is set. Change-Id: Ia2e83af97890dc399b86e090459a41d31ce28b6c Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
* idle: Add a memory barrier after setting cpu_idle_force_pollVikram Mulukutla2017-04-131-0/+3
| | | | | | | | | To ensure that CPUs see cpu_idle_force_poll flag updates, add a memory barrier after writing to the flag. Change-Id: Ic3fdef7d17b673247bce5093530ce8aa08694632 Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
* kernel: trace: fix misleading-indentation warningNathan Chancellor2017-04-131-1/+1
| | | | | | | | | | | | kernel/trace/trace_output.c: In function 'trace_graph_ret_raw': kernel/trace/trace_output.c:1198:2: warning: this 'if' clause does not guard... [-Wmisleading-indentation] if (!trace_seq_printf(&iter->seq, "%lx %lld %lld %ld %d\n", ^~ kernel/trace/trace_output.c:1204:3: note: ...this statement, but the latter is misleadingly indented as if it is guarded by the 'if' return TRACE_TYPE_PARTIAL_LINE; ^~~~~~ Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
* sysctl: fix maybe-uninitialized warningsNathan Chancellor2017-04-131-2/+2
| | | | | | | | | | | | | | | | | | | | | kernel/sysctl.c: In function '__do_proc_dointvec.isra.3': kernel/sysctl.c:2030:8: warning: 'kbuf' may be used uninitialized in this function [-Wmaybe-uninitialized] char *tmp = skip_spaces(*buf); ^~~ kernel/sysctl.c:2183:8: note: 'kbuf' was declared here char *kbuf; ^~~~ kernel/sysctl.c: In function '__do_proc_doulongvec_minmax': kernel/sysctl.c:2030:8: warning: 'kbuf' may be used uninitialized in this function [-Wmaybe-uninitialized] char *tmp = skip_spaces(*buf); ^~~ kernel/sysctl.c:2433:8: note: 'kbuf' was declared here char *kbuf; ^~~~ This will be initialized to NULL normally. Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
* kernel/panic.c: add missing \nJiri Slaby2017-04-131-1/+1
| | | | | | | | | | When a system panics, the "Rebooting in X seconds.." message is never printed because it lacks a new line. Fix it. Link: http://lkml.kernel.org/r/20170119114751.2724-1-jslaby@suse.cz Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mtk: squashed security updatesMoyster2017-04-111-1/+1
|
* Get rid of __cpuinitMoyster2017-04-1119-42/+42
| | | | | | | | | | | | | | | | | | | | | This commit is the result of find . -name '*.c' | xargs sed -i 's/ __cpuinit / /g' find . -name '*.c' | xargs sed -i 's/ __cpuexit / /g' find . -name '*.c' | xargs sed -i 's/ __cpuinitdata / /g' find . -name '*.c' | xargs sed -i 's/ __cpuinit$//g' find ./arch/ -name '*.h' | xargs sed -i 's/ __cpuinit//g' find . -name '*.c' | xargs sed -i 's/^__cpuinit //g' find . -name '*.c' | xargs sed -i 's/^__cpuinitdata //g' find . -name '*.c' | xargs sed -i 's/\*__cpuinit /\*/g' find . -name '*.c' | xargs sed -i 's/ __cpuinitconst / /g' find . -name '*.h' | xargs sed -i 's/ __cpuinit / /g' find . -name '*.h' | xargs sed -i 's/ __cpuinitdata / /g' git add . git reset include/linux/init.h git checkout -- include/linux/init.h based off : https://github.com/jollaman999/jolla-kernel_bullhead/commit/bc15db84a622eed7d61d3ece579b577154d0ec29
* Power: Report suspend times from last_suspend_timejinqian2017-04-111-0/+36
| | | | | | | | | This node epxorts two values separated by space. From left to right: 1. time spent in suspend/resume process 2. time spent sleep in suspend state Change-Id: I2cb9a9408a5fd12166aaec11b935a0fd6a408c63
* perf: Do not double freePeter Zijlstra2017-04-111-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | In case of: err_file: fput(event_file), we'll end up calling perf_release() which in turn will free the event. Do not then free the event _again_. Change-Id: Ic1de33d0e29e577a1fc2e00c35bf44df26d96ab6 Tested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dvyukov@google.com Cc: eranian@google.com Cc: oleg@redhat.com Cc: panand@redhat.com Cc: sasha.levin@oracle.com Cc: vince@deater.net Link: http://lkml.kernel.org/r/20160224174947.697350349@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* kernel: support task's adj rbtreeYi-wei Zhao2017-04-112-0/+2
| | | | | | | | | | | | | Add (or del) a task to (or from) task's adj rbtree when a task is created or exit. Change-Id: Ic63e03355a1fed8c500097bad223c59c742a2346 Signed-off-by: Hong-Mei Li <a21834@motorola.com> Signed-off-by: Yi-wei Zhao <gbjc64@motorola.com> Reviewed-on: http://gerrit.mot.com/701207 SLTApproved: Slta Waiver <sltawvr@motorola.com> Tested-by: Jira Key <jirakey@motorola.com> Submit-Approved: Jira Key <jirakey@motorola.com>
* perf/core: Fix concurrent sys_perf_event_open() vs. 'move_group' racePeter Zijlstra2017-04-111-4/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 321027c1fe77f892f4ea07846aeae08cefbbb290 upstream. Di Shen reported a race between two concurrent sys_perf_event_open() calls where both try and move the same pre-existing software group into a hardware context. The problem is exactly that described in commit: f63a8daa5812 ("perf: Fix event->ctx locking") ... where, while we wait for a ctx->mutex acquisition, the event->ctx relation can have changed under us. That very same commit failed to recognise sys_perf_event_context() as an external access vector to the events and thereby didn't apply the established locking rules correctly. So while one sys_perf_event_open() call is stuck waiting on mutex_lock_double(), the other (which owns said locks) moves the group about. So by the time the former sys_perf_event_open() acquires the locks, the context we've acquired is stale (and possibly dead). Apply the established locking rules as per perf_event_ctx_lock_nested() to the mutex_lock_double() for the 'move_group' case. This obviously means we need to validate state after we acquire the locks. Change-Id: I83d360303e812232ae7aae492350813f0e79cc71 Reported-by: Di Shen (Keen Lab) Tested-by: John Dias <joaodias@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Min Chong <mchong@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: f63a8daa5812 ("perf: Fix event->ctx locking") Link: http://lkml.kernel.org/r/20170106131444.GZ3174@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org> [bwh: Backported to 3.2: - Use ACCESS_ONCE() instead of READ_ONCE() - Test perf_event::group_flags instead of group_caps - Add the err_locked cleanup block, which we didn't need before - Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* BACKPORT: perf: Fix event->ctx lockingJohn Dias2017-04-111-37/+207
| | | | | | | | | | | | | | | | | | | | | | | | | There have been a few reported issues wrt. the lack of locking around changing event->ctx. This patch tries to address those. It avoids the whole rwsem thing; and while it appears to work, please give it some thought in review. What I did fail at is sensible runtime checks on the use of event->ctx, the RCU use makes it very hard. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20150123125834.209535886@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org> (cherry picked from commit f63a8daa5812afef4f06c962351687e1ff9ccb2b) Bug: 30955111 Bug: 31095224 Change-Id: I5bab713034e960fad467637e98e914440de5666d
* perf: protect group_leader from races that cause ctx double-freeJohn Dias2017-04-111-0/+15
| | | | | | | | | | | | | | | | | | When moving a group_leader perf event from a software-context to a hardware-context, there's a race in checking and updating that context. The existing locking solution doesn't work; note that it tries to grab a lock inside the group_leader's context object, which you can only get at by going through a pointer that should be protected from these races. To avoid that problem, and to produce a simple solution, we can just use a lock per group_leader to protect all checks on the group_leader's context. The new lock is grabbed and released when no context locks are held. Bug: 30955111 Bug: 31095224 Change-Id: If37124c100ca6f4aa962559fba3bd5dbbec8e052
* rcu: Merge rcu_sched_force_quiescent_state() with rcu_force_quiescent_state()Andreea-Cristina Bernat2017-04-112-19/+9
| | | | | | | | | | | | | | | This patch merges the function rcu_force_quiescent_state() with rcu_sched_force_quiescent_state(), using the rcu_state pointer. Firstly, the rcu_sched_force_quiescent_state() function is deleted from the file kernel/rcu/tree.c. Also, the rcu_force_quiescent_state() function that was calling force_quiescent_state with the argument rcu_preempt_state pointer was deleted as well. The new function that combines the old ones uses the rcu_state pointer and is located after rcu_batches_completed_bh() in kernel/rcu/tree.c. Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
* rcu: Consolidate kfree_call_rcu() to use rcu_state pointerAndreea-Cristina Bernat2017-04-112-30/+14
| | | | | | | | | | | | kfree_call_rcu is defined two times. When defined under CONFIG_TREE_PREEMPT_RCU, it uses rcu_preempt_state. Otherwise, it uses rcu_sched_state. This patch uses the rcu_state_pointer to combine the two definitions into one. The resulting function is placed after the closing of the preprocessor conditional CONFIG_TREE_PREEMPT_RCU. Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
* rcu: Replace NR_CPUS with nr_cpu_idsHimangi Saraogi2017-04-111-2/+2
| | | | | | | | | This patch replaces NR_CPUS with nr_cpu_ids as NR_CPUS should consider cpumask_var_t. Signed-off-by: Himangi Saraogi <himangi774@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
* rcu: Print negatives for stall-warning counter wraparoundPaul E. McKenney2017-04-111-4/+5
| | | | | | | | | | | The print_other_cpu_stall() and print_cpu_stall() functions print grace-period numbers using an unsigned format, which means that the number one less than zero is a very large number. This commit therefore causes these numbers to be printed with a signed format in order to improve readability of the RCU CPU stall-warning output. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
* rcu: Stop tracking FSF's postal addressPaul E. McKenney2017-04-1110-20/+20
| | | | | | | | | | | | All of the RCU source files have the usual GPL header, which contains a long-obsolete postal address for FSF. To avoid the need to track the FSF office's movements, this commit substitutes the URL where GPL may be found. Reported-by: Greg KH <gregkh@linuxfoundation.org> Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
* rcu: Add ACCESS_ONCE() to ->n_force_qs_lh accessesPaul E. McKenney2017-04-112-3/+3
| | | | | | | | | | | The ->n_force_qs_lh field is accessed without the benefit of any synchronization, so this commit adds the needed ACCESS_ONCE() wrappers. Yes, increments to ->n_force_qs_lh can be lost, but contention should be low and the field is strictly statistical in nature, so this is not a problem. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>