aboutsummaryrefslogtreecommitdiff
Commit message (Collapse)AuthorAgeFilesLines
...
* sched: print_rq(): Don't use tasklist_lockOleg Nesterov2019-05-021-5/+2
| | | | | | | | | | | | | | | | | | read_lock_irqsave(tasklist_lock) in print_rq() looks strange. We do not need to disable irqs, and they are already disabled by the caller. And afaics this lock buys nothing, we can rely on rcu_read_lock(). In this case it makes sense to also move rcu_read_lock/unlock from the caller to print_rq(). Change-Id: Iadf0de148e27623af4535abc40c77c1dfd1f9c76 Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Kirill Tkhai <tkhai@yandex.ru> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140921193341.GA28628@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* sched: s/do_each_thread/for_each_process_thread/ in debug.cOleg Nesterov2019-05-021-4/+2
| | | | | | | | | | | | | | | | | | | Change kernel/sched/debug.c to use for_each_process_thread(). Change-Id: Idb9f4ffca0b60746a1109be17ce22cc06f3cc690 Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Frank Mayhar <fmayhar@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Sanjay Rao <srao@redhat.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140813191956.GA19324@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* memcg: optimize the "Search everything else" loop in mm_update_next_owner()Oleg Nesterov2019-05-021-3/+9
| | | | | | | | | | | | | | | | | for_each_process_thread() is sub-optimal. All threads share the same ->mm, we can swicth to the next process once we found a thread with ->mm != NULL and ->mm != mm. Change-Id: I44d09ed1c475e5640049d87d54e643bf2beca877 Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Peter Chiang <pchiang@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* memcg: mm_update_next_owner() should skip kthreadsOleg Nesterov2019-05-021-6/+4
| | | | | | | | | | | | | | | | | | | | "Search through everything else" in mm_update_next_owner() can hit a kthread which adopted this "mm" via use_mm(), it should not be used as mm->owner. Add the PF_KTHREAD check. While at it, change this code to use for_each_process_thread() instead of deprecated do_each_thread/while_each_thread. Change-Id: I1f3740bb5be19deb35bfeec0f958d5ed5ba3c7ed Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Peter Chiang <pchiang@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* freezer: set PF_SUSPEND_TASK flag on tasks that call freeze_processesMoyster2019-05-023-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Calling freeze_processes sets a global flag that will cause any process that calls try_to_freeze to enter the refrigerator. It skips sending a signal to the current task, but if the current task ever hits try_to_freeze, all threads will be frozen and the system will deadlock. Set a new flag, PF_SUSPEND_TASK, on the task that calls freeze_processes. The flag notifies the freezer that the thread is involved in suspend and should not be frozen. Also add a WARN_ON in thaw_processes if the caller does not have the PF_SUSPEND_TASK flag set to catch if a different task calls thaw_processes than the one that called freeze_processes, leaving a task with PF_SUSPEND_TASK permanently set on it. Threads that spawn off a task with PF_SUSPEND_TASK set (which swsusp does) will also have PF_SUSPEND_TASK set, preventing them from freezing while they are helping with suspend, but they need to be dead by the time suspend is triggered, otherwise they may run when userspace is expected to be frozen. Add a WARN_ON in thaw_processes if more than one thread has the PF_SUSPEND_TASK flag set. Change-Id: I621e70c26203a2470b3e60f3f3f5991a00762e09 Reported-and-tested-by: Michael Leun <lkml20130126@newton.leun.net> Signed-off-by: Colin Cross <ccross@android.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Moyster <oysterized@gmail.com>
* clean up process flagsCorinna Vinschen2019-05-024-3/+8
| | | | | | | | | | * Move PF_WAKE_UP_IDLE to 0x00000002 to make room for PF_SUSPEND_TASK * Drop PF_SU in favor of a bit 'task_is_su' in the task_struct bitfield which has still lots of room without changing the struct size. Change-Id: I2af053ebcbb3c41b7407560008da8150a73c8c05 Signed-off-by: Corinna Vinschen <xda@vinschen.de> Signed-off-by: Moyster <oysterized@gmail.com>
* tracing: syscall_regfunc() should not skip kernel threadsOleg Nesterov2019-05-021-3/+1
| | | | | | | | | | | | | | | | | | | | | | syscall_regfunc() ignores the kernel threads because "it has no effect", see cc3b13c1 "Don't trace kernel thread syscalls" which added this check. However, this means that a user-space task spawned by call_usermodehelper() will run without TIF_SYSCALL_TRACEPOINT if sys_tracepoint_refcount != 0. Remove this check. The unnecessary report from ret_from_fork path mentioned by cc3b13c1 is no longer possible, see See commit fb45550d76bb5 "make sure that kernel_thread() callbacks call do_exit() themselves". A kernel_thread() callback can only return and take the int_ret_from_sys_call path after do_execve() succeeds, otherwise the kernel will crash. But in this case it is no longer a kernel thread and thus is needs TIF_SYSCALL_TRACEPOINT. Link: http://lkml.kernel.org/p/20140413185938.GD20668@redhat.com Change-Id: Ic61b8b2d4f2cec27d7cf681a8b39020ed65e2e43 Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* tracing: Change syscall_*regfunc() to check PF_KTHREAD and use ↵Oleg Nesterov2019-05-021-13/+11
| | | | | | | | | | | | | | | | | | | | | | | | for_each_process_thread() 1. Remove _irqsafe from syscall_regfunc/syscall_unregfunc, read_lock(tasklist) doesn't need to disable irqs. 2. Change this code to avoid the deprecated do_each_thread() and use for_each_process_thread() (stolen from the patch from Frederic). 3. Change syscall_regfunc() to check PF_KTHREAD to skip the kernel threads, ->mm != NULL is the common mistake. Note: probably this check should be simply removed, needs another patch. [fweisbec@gmail.com: s/do_each_thread/for_each_process_thread/] Link: http://lkml.kernel.org/p/20140413185918.GC20668@redhat.com Change-Id: I878868747db7d5873f85faae52132631604fb678 Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* sched: s/do_each_thread/for_each_process_thread/ in core.cOleg Nesterov2019-05-021-7/+6
| | | | | | | | | | | | | | | | | | | | | Change kernel/sched/core.c to use for_each_process_thread(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Frank Mayhar <fmayhar@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Sanjay Rao <srao@redhat.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140813191953.GA19315@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org> (cherry picked from commit 5d07f4202c5d63b73ba1734ed38e08461a689313) Signed-off-by: Alex Shi <alex.shi@linaro.org> Change-Id: Iecb3aa3e69df0147d5c9402dcb8250bfec309ef4
* sched: Change thread_group_cputime() to use for_each_thread()Oleg Nesterov2019-05-021-8/+2
| | | | | | | | | | | | | | | | | | | | | | | Change thread_group_cputime() to use for_each_thread() instead of buggy while_each_thread(). This also makes the pid_alive() check unnecessary. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Frank Mayhar <fmayhar@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Sanjay Rao <srao@redhat.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140813192000.GA19327@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org> (cherry picked from commit 1e4dda08b4c39b3d8f4a3ee7269d49e0200c8af8) Signed-off-by: Alex Shi <alex.shi@linaro.org> Change-Id: I5aa603f3e31275e607f039b6f037ddc630755d95
* memcg: kill CONFIG_MM_OWNEROleg Nesterov2019-05-025-14/+7
| | | | | | | | | | | | | CONFIG_MM_OWNER makes no sense. It is not user-selectable, it is only selected by CONFIG_MEMCG automatically. So we can kill this option in init/Kconfig and do s/CONFIG_MM_OWNER/CONFIG_MEMCG/ globally. Change-Id: I07980d9557cef16a102ed293bc4a8ad1f9302777 Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Staging: android: binder: Ratelimit binder debug messagesChris Fries2019-05-021-1/+3
| | | | | | | | | | | | | Ratelimit the binder debug messages, since they can get spammy and flood the entire kernel log. In some cases, enabling serial console with a spammy binder error can cause a watchdog panic (and we don't have reports of this happening with serial console disabled). Bug: 17613664 Change-Id: Iecdb4c3c80ccf00c43459e93c17f5369fd55e6e7 Signed-off-by: Chris Fries <cfries@motorola.com>
* selinux: KASAN: slab-out-of-bounds in xattr_getsecuritySachin Grover2019-05-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Call trace: [<ffffff9203a8d7a8>] dump_backtrace+0x0/0x428 [<ffffff9203a8dbf8>] show_stack+0x28/0x38 [<ffffff920409bfb8>] dump_stack+0xd4/0x124 [<ffffff9203d187e8>] print_address_description+0x68/0x258 [<ffffff9203d18c00>] kasan_report.part.2+0x228/0x2f0 [<ffffff9203d1927c>] kasan_report+0x5c/0x70 [<ffffff9203d1776c>] check_memory_region+0x12c/0x1c0 [<ffffff9203d17cdc>] memcpy+0x34/0x68 [<ffffff9203d75348>] xattr_getsecurity+0xe0/0x160 [<ffffff9203d75490>] vfs_getxattr+0xc8/0x120 [<ffffff9203d75d68>] getxattr+0x100/0x2c8 [<ffffff9203d76fb4>] SyS_fgetxattr+0x64/0xa0 [<ffffff9203a83f70>] el0_svc_naked+0x24/0x28 If user get root access and calls security.selinux setxattr() with an embedded NUL on a file and then if some process performs a getxattr() on that file with a length greater than the actual length of the string, it would result in a panic. To fix this, add the actual length of the string to the security context instead of the length passed by the userspace process. Change-Id: Ie0b8bfc7c96bc12282b955fb3adf41b3c2d011cd Signed-off-by: Sachin Grover <sgrover@codeaurora.org>
* alarmtimer: Prevent overflow for relative nanosleepThomas Gleixner2019-05-021-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 5f936e19cc0ef97dbe3a56e9498922ad5ba1edef upstream. Air Icy reported: UBSAN: Undefined behaviour in kernel/time/alarmtimer.c:811:7 signed integer overflow: 1529859276030040771 + 9223372036854775807 cannot be represented in type 'long long int' Call Trace: alarm_timer_nsleep+0x44c/0x510 kernel/time/alarmtimer.c:811 __do_sys_clock_nanosleep kernel/time/posix-timers.c:1235 [inline] __se_sys_clock_nanosleep kernel/time/posix-timers.c:1213 [inline] __x64_sys_clock_nanosleep+0x326/0x4e0 kernel/time/posix-timers.c:1213 do_syscall_64+0xb8/0x3a0 arch/x86/entry/common.c:290 alarm_timer_nsleep() uses ktime_add() to add the current time and the relative expiry value. ktime_add() has no sanity checks so the addition can overflow when the relative timeout is large enough. Use ktime_add_safe() which has the necessary sanity checks in place and limits the result to the valid range. Fixes: 9a7adcf5c6de ("timers: Posix interface for alarm-timers") Change-Id: Iee4f4d2c4fed03833643bf7fee90ad1c54b6b065 Reported-by: Team OWL337 <icytxw@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1807020926360.1595@nanos.tec.linutronix.de Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* skbuff: Fix not waking applications when errors are enqueuedVinicius Costa Gomes2019-05-021-1/+1
| | | | | | | | | | | | | | | | | | | commit 6e5d58fdc9bedd0255a8781b258f10bbdc63e975 upstream. When errors are enqueued to the error queue via sock_queue_err_skb() function, it is possible that the waiting application is not notified. Calling 'sk->sk_data_ready()' would not notify applications that selected only POLLERR events in poll() (for example). Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Change-Id: Ibb67d26fefc757ff49c7e38d92ae23b6b48b9008 Reported-by: Randy E. Witt <randy.e.witt@intel.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Moyster <oysterized@gmail.com>
* l2tp: do not accept arbitrary socketsEric Dumazet2019-05-021-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 17cfe79a65f98abe535261856c5aef14f306dff7 upstream. syzkaller found an issue caused by lack of sufficient checks in l2tp_tunnel_create() RAW sockets can not be considered as UDP ones for instance. In another patch, we shall replace all pr_err() by less intrusive pr_debug() so that syzkaller can find other bugs faster. Acked-by: Guillaume Nault <g.nault@alphalink.fr> Acked-by: James Chapman <jchapman@katalix.com> ================================================================== BUG: KASAN: slab-out-of-bounds in setup_udp_tunnel_sock+0x3ee/0x5f0 net/ipv4/udp_tunnel.c:69 dst_release: dst:00000000d53d0d0f refcnt:-1 Write of size 1 at addr ffff8801d013b798 by task syz-executor3/6242 CPU: 1 PID: 6242 Comm: syz-executor3 Not tainted 4.16.0-rc2+ #253 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x24d lib/dump_stack.c:53 print_address_description+0x73/0x250 mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report+0x23b/0x360 mm/kasan/report.c:412 __asan_report_store1_noabort+0x17/0x20 mm/kasan/report.c:435 setup_udp_tunnel_sock+0x3ee/0x5f0 net/ipv4/udp_tunnel.c:69 l2tp_tunnel_create+0x1354/0x17f0 net/l2tp/l2tp_core.c:1596 pppol2tp_connect+0x14b1/0x1dd0 net/l2tp/l2tp_ppp.c:707 SYSC_connect+0x213/0x4a0 net/socket.c:1640 SyS_connect+0x24/0x30 net/socket.c:1621 do_syscall_64+0x280/0x940 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x42/0xb7 Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts") Change-Id: I2e4a9acf94cccee7ffd8aba1eaa878090e5780a0 Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* l2tp: remove l2tp_tunnel_count and l2tp_session_countGuillaume Nault2019-05-021-10/+0
| | | | | | | | | | | commit c7fa745d988812c4dea7dbc645f025c5bfa4917e upstream. These variables have never been used. Change-Id: I1a9e09c42ea02545eb753a03ba3b870eda81be32 Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* netfilter: ipv6: fix use-after-free Write in nf_nat_ipv6_manip_pktFlorian Westphal2019-05-021-0/+4
| | | | | | | | | | | | | | commit b078556aecd791b0e5cb3a59f4c3a14273b52121 upstream. l4proto->manip_pkt() can cause reallocation of skb head so pointer to the ipv6 header must be reloaded. Change-Id: Ib9d20d8a0c62e880ed2adc6ee666654c47ceb7f9 Reported-and-tested-by: <syzbot+10005f4292fc9cc89de7@syzkaller.appspotmail.com> Fixes: 58a317f1061c89 ("netfilter: ipv6: add IPv6 NAT support") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* netfilter: IDLETIMER: be syzkaller friendlyEric Dumazet2019-05-021-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit cfc2c740533368b96e2be5e0a4e8c3cace7d9814 upstream. We had one report from syzkaller [1] First issue is that INIT_WORK() should be done before mod_timer() or we risk timer being fired too soon, even with a 1 second timer. Second issue is that we need to reject too big info->timeout to avoid overflows in msecs_to_jiffies(info->timeout * 1000), or risk looping, if result after overflow is 0. [1] WARNING: CPU: 1 PID: 5129 at kernel/workqueue.c:1444 __queue_work+0xdf4/0x1230 kernel/workqueue.c:1444 Kernel panic - not syncing: panic_on_warn set ... CPU: 1 PID: 5129 Comm: syzkaller159866 Not tainted 4.16.0-rc1+ #230 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:53 panic+0x1e4/0x41c kernel/panic.c:183 __warn+0x1dc/0x200 kernel/panic.c:547 report_bug+0x211/0x2d0 lib/bug.c:184 fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178 fixup_bug arch/x86/kernel/traps.c:247 [inline] do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296 do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315 invalid_op+0x22/0x40 arch/x86/entry/entry_64.S:988 RIP: 0010:__queue_work+0xdf4/0x1230 kernel/workqueue.c:1444 RSP: 0018:ffff8801db507538 EFLAGS: 00010006 RAX: ffff8801aeb46080 RBX: ffff8801db530200 RCX: ffffffff81481404 RDX: 0000000000000100 RSI: ffffffff86b42640 RDI: 0000000000000082 RBP: ffff8801db507758 R08: 1ffff1003b6a0de5 R09: 000000000000000c R10: ffff8801db5073f0 R11: 0000000000000020 R12: 1ffff1003b6a0eb6 R13: ffff8801b1067ae0 R14: 00000000000001f8 R15: dffffc0000000000 queue_work_on+0x16a/0x1c0 kernel/workqueue.c:1488 queue_work include/linux/workqueue.h:488 [inline] schedule_work include/linux/workqueue.h:546 [inline] idletimer_tg_expired+0x44/0x60 net/netfilter/xt_IDLETIMER.c:116 call_timer_fn+0x228/0x820 kernel/time/timer.c:1326 expire_timers kernel/time/timer.c:1363 [inline] __run_timers+0x7ee/0xb70 kernel/time/timer.c:1666 run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692 __do_softirq+0x2d7/0xb85 kernel/softirq.c:285 invoke_softirq kernel/softirq.c:365 [inline] irq_exit+0x1cc/0x200 kernel/softirq.c:405 exiting_irq arch/x86/include/asm/apic.h:541 [inline] smp_apic_timer_interrupt+0x16b/0x700 arch/x86/kernel/apic/apic.c:1052 apic_timer_interrupt+0xa9/0xb0 arch/x86/entry/entry_64.S:829 </IRQ> RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:777 [inline] RIP: 0010:__raw_spin_unlock_irqrestore include/linux/spinlock_api_smp.h:160 [inline] RIP: 0010:_raw_spin_unlock_irqrestore+0x5e/0xba kernel/locking/spinlock.c:184 RSP: 0018:ffff8801c20173c8 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff12 RAX: dffffc0000000000 RBX: 0000000000000282 RCX: 0000000000000006 RDX: 1ffffffff0d592cd RSI: 1ffff10035d68d23 RDI: 0000000000000282 RBP: ffff8801c20173d8 R08: 1ffff10038402e47 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff8820e5c8 R13: ffff8801b1067ad8 R14: ffff8801aea7c268 R15: ffff8801aea7c278 __debug_object_init+0x235/0x1040 lib/debugobjects.c:378 debug_object_init+0x17/0x20 lib/debugobjects.c:391 __init_work+0x2b/0x60 kernel/workqueue.c:506 idletimer_tg_create net/netfilter/xt_IDLETIMER.c:152 [inline] idletimer_tg_checkentry+0x691/0xb00 net/netfilter/xt_IDLETIMER.c:213 xt_check_target+0x22c/0x7d0 net/netfilter/x_tables.c:850 check_target net/ipv6/netfilter/ip6_tables.c:533 [inline] find_check_entry.isra.7+0x935/0xcf0 net/ipv6/netfilter/ip6_tables.c:575 translate_table+0xf52/0x1690 net/ipv6/netfilter/ip6_tables.c:744 do_replace net/ipv6/netfilter/ip6_tables.c:1160 [inline] do_ip6t_set_ctl+0x370/0x5f0 net/ipv6/netfilter/ip6_tables.c:1686 nf_sockopt net/netfilter/nf_sockopt.c:106 [inline] nf_setsockopt+0x67/0xc0 net/netfilter/nf_sockopt.c:115 ipv6_setsockopt+0x10b/0x130 net/ipv6/ipv6_sockglue.c:927 udpv6_setsockopt+0x45/0x80 net/ipv6/udp.c:1422 sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2976 SYSC_setsockopt net/socket.c:1850 [inline] SyS_setsockopt+0x189/0x360 net/socket.c:1829 do_syscall_64+0x282/0x940 arch/x86/entry/common.c:287 Fixes: 0902b469bd25 ("netfilter: xtables: idletimer target implementation") Change-Id: If78a45d8ed4d2470f3a4eb427237727d032f8742 Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzkaller <syzkaller@googlegroups.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* cfg80211: fix station info handling bugsJohannes Berg2019-05-021-2/+1
| | | | | | | | | | | | | | | commit 5762d7d3eda25c03cc2d9d45227be3f5ab6bec9e upstream. Fix two places where the structure isn't initialized to zero, and thus can't be filled properly by the driver. Fixes: 4a4b8169501b ("cfg80211: Accept multiple RSSI thresholds for CQM") Fixes: 9930380f0bd8 ("cfg80211: implement IWRATE") Change-Id: I2cd090709e9349aa70a5196951e582a29679c4ab Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.16: drop change in cfg80211_cqm_rssi_update()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* sctp: verify size of a new chunk in _sctp_make_chunk()Alexey Kodanev2019-05-021-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 07f2c7ab6f8d0a7e7c5764c4e6cc9c52951b9d9c upstream. When SCTP makes INIT or INIT_ACK packet the total chunk length can exceed SCTP_MAX_CHUNK_LEN which leads to kernel panic when transmitting these packets, e.g. the crash on sending INIT_ACK: [ 597.804948] skbuff: skb_over_panic: text:00000000ffae06e4 len:120168 put:120156 head:000000007aa47635 data:00000000d991c2de tail:0x1d640 end:0xfec0 dev:<NULL> ... [ 597.976970] ------------[ cut here ]------------ [ 598.033408] kernel BUG at net/core/skbuff.c:104! [ 600.314841] Call Trace: [ 600.345829] <IRQ> [ 600.371639] ? sctp_packet_transmit+0x2095/0x26d0 [sctp] [ 600.436934] skb_put+0x16c/0x200 [ 600.477295] sctp_packet_transmit+0x2095/0x26d0 [sctp] [ 600.540630] ? sctp_packet_config+0x890/0x890 [sctp] [ 600.601781] ? __sctp_packet_append_chunk+0x3b4/0xd00 [sctp] [ 600.671356] ? sctp_cmp_addr_exact+0x3f/0x90 [sctp] [ 600.731482] sctp_outq_flush+0x663/0x30d0 [sctp] [ 600.788565] ? sctp_make_init+0xbf0/0xbf0 [sctp] [ 600.845555] ? sctp_check_transmitted+0x18f0/0x18f0 [sctp] [ 600.912945] ? sctp_outq_tail+0x631/0x9d0 [sctp] [ 600.969936] sctp_cmd_interpreter.isra.22+0x3be1/0x5cb0 [sctp] [ 601.041593] ? sctp_sf_do_5_1B_init+0x85f/0xc30 [sctp] [ 601.104837] ? sctp_generate_t1_cookie_event+0x20/0x20 [sctp] [ 601.175436] ? sctp_eat_data+0x1710/0x1710 [sctp] [ 601.233575] sctp_do_sm+0x182/0x560 [sctp] [ 601.284328] ? sctp_has_association+0x70/0x70 [sctp] [ 601.345586] ? sctp_rcv+0xef4/0x32f0 [sctp] [ 601.397478] ? sctp6_rcv+0xa/0x20 [sctp] ... Here the chunk size for INIT_ACK packet becomes too big, mostly because of the state cookie (INIT packet has large size with many address parameters), plus additional server parameters. Later this chunk causes the panic in skb_put_data(): skb_packet_transmit() sctp_packet_pack() skb_put_data(nskb, chunk->skb->data, chunk->skb->len); 'nskb' (head skb) was previously allocated with packet->size from u16 'chunk->chunk_hdr->length'. As suggested by Marcelo we should check the chunk's length in _sctp_make_chunk() before trying to allocate skb for it and discard a chunk if its size bigger than SCTP_MAX_CHUNK_LEN. Change-Id: Ic769f68f5dc0fee07f13b0a6415529cc2bbc4983 Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leinter@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.16: - Keep using WORD_ROUND() instead of SCTP_PAD4() - Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* netfilter: nat: cope with negative port rangePaolo Abeni2019-05-021-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit db57ccf0f2f4624b4c4758379f8165277504fbd7 upstream. syzbot reported a division by 0 bug in the netfilter nat code: divide error: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 1 PID: 4168 Comm: syzkaller034710 Not tainted 4.16.0-rc1+ #309 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:nf_nat_l4proto_unique_tuple+0x291/0x530 net/netfilter/nf_nat_proto_common.c:88 RSP: 0018:ffff8801b2466778 EFLAGS: 00010246 RAX: 000000000000f153 RBX: ffff8801b2466dd8 RCX: ffff8801b2466c7c RDX: 0000000000000000 RSI: ffff8801b2466c58 RDI: ffff8801db5293ac RBP: ffff8801b24667d8 R08: ffff8801b8ba6dc0 R09: ffffffff88af5900 R10: ffff8801b24666f0 R11: 0000000000000000 R12: 000000002990f153 R13: 0000000000000001 R14: 0000000000000000 R15: ffff8801b2466c7c FS: 00000000017e3880(0000) GS:ffff8801db500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000208fdfe4 CR3: 00000001b5340002 CR4: 00000000001606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: dccp_unique_tuple+0x40/0x50 net/netfilter/nf_nat_proto_dccp.c:30 get_unique_tuple+0xc28/0x1c10 net/netfilter/nf_nat_core.c:362 nf_nat_setup_info+0x1c2/0xe00 net/netfilter/nf_nat_core.c:406 nf_nat_redirect_ipv6+0x306/0x730 net/netfilter/nf_nat_redirect.c:124 redirect_tg6+0x7f/0xb0 net/netfilter/xt_REDIRECT.c:34 ip6t_do_table+0xc2a/0x1a30 net/ipv6/netfilter/ip6_tables.c:365 ip6table_nat_do_chain+0x65/0x80 net/ipv6/netfilter/ip6table_nat.c:41 nf_nat_ipv6_fn+0x594/0xa80 net/ipv6/netfilter/nf_nat_l3proto_ipv6.c:302 nf_nat_ipv6_local_fn+0x33/0x5d0 net/ipv6/netfilter/nf_nat_l3proto_ipv6.c:407 ip6table_nat_local_fn+0x2c/0x40 net/ipv6/netfilter/ip6table_nat.c:69 nf_hook_entry_hookfn include/linux/netfilter.h:120 [inline] nf_hook_slow+0xba/0x1a0 net/netfilter/core.c:483 nf_hook include/linux/netfilter.h:243 [inline] NF_HOOK include/linux/netfilter.h:286 [inline] ip6_xmit+0x10ec/0x2260 net/ipv6/ip6_output.c:277 inet6_csk_xmit+0x2fc/0x580 net/ipv6/inet6_connection_sock.c:139 dccp_transmit_skb+0x9ac/0x10f0 net/dccp/output.c:142 dccp_connect+0x369/0x670 net/dccp/output.c:564 dccp_v6_connect+0xe17/0x1bf0 net/dccp/ipv6.c:946 __inet_stream_connect+0x2d4/0xf00 net/ipv4/af_inet.c:620 inet_stream_connect+0x58/0xa0 net/ipv4/af_inet.c:684 SYSC_connect+0x213/0x4a0 net/socket.c:1639 SyS_connect+0x24/0x30 net/socket.c:1620 do_syscall_64+0x282/0x940 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x26/0x9b RIP: 0033:0x441c69 RSP: 002b:00007ffe50cc0be8 EFLAGS: 00000217 ORIG_RAX: 000000000000002a RAX: ffffffffffffffda RBX: ffffffffffffffff RCX: 0000000000441c69 RDX: 000000000000001c RSI: 00000000208fdfe4 RDI: 0000000000000003 RBP: 00000000006cc018 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000538 R11: 0000000000000217 R12: 0000000000403590 R13: 0000000000403620 R14: 0000000000000000 R15: 0000000000000000 Code: 48 89 f0 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 46 02 00 00 48 8b 45 c8 44 0f b7 20 e8 88 97 04 fd 31 d2 41 0f b7 c4 4c 89 f9 <41> f7 f6 48 c1 e9 03 48 b8 00 00 00 00 00 fc ff df 0f b6 0c 01 RIP: nf_nat_l4proto_unique_tuple+0x291/0x530 net/netfilter/nf_nat_proto_common.c:88 RSP: ffff8801b2466778 The problem is that currently we don't have any check on the configured port range. A port range == -1 triggers the bug, while other negative values may require a very long time to complete the following loop. This commit addresses the issue swapping the two ends on negative ranges. The check is performed in nf_nat_l4proto_unique_tuple() since the nft nat loads the port values from nft registers at runtime. v1 -> v2: use the correct 'Fixes' tag v2 -> v3: update commit message, drop unneeded READ_ONCE() Fixes: 5b1158e909ec ("[NETFILTER]: Add NAT support for nf_conntrack") Change-Id: I3c14b742d61a7b3a9e2fcabf40c33b9ea30e52d7 Reported-by: syzbot+8012e198bd037f4871e5@syzkaller.appspotmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* netfilter: x_tables: fix missing timer initialization in xt_LEDPaolo Abeni2019-05-021-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 10414014bc085aac9f787a5890b33b5605fbcfc4 upstream. syzbot reported that xt_LED may try to use the ledinternal->timer without previously initializing it: ------------[ cut here ]------------ kernel BUG at kernel/time/timer.c:958! invalid opcode: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 1 PID: 1826 Comm: kworker/1:2 Not tainted 4.15.0+ #306 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Workqueue: ipv6_addrconf addrconf_dad_work RIP: 0010:__mod_timer kernel/time/timer.c:958 [inline] RIP: 0010:mod_timer+0x7d6/0x13c0 kernel/time/timer.c:1102 RSP: 0018:ffff8801d24fe9f8 EFLAGS: 00010293 RAX: ffff8801d25246c0 RBX: ffff8801aec6cb50 RCX: ffffffff816052c6 RDX: 0000000000000000 RSI: 00000000fffbd14b RDI: ffff8801aec6cb68 RBP: ffff8801d24fec98 R08: 0000000000000000 R09: 1ffff1003a49fd6c R10: ffff8801d24feb28 R11: 0000000000000005 R12: dffffc0000000000 R13: ffff8801d24fec70 R14: 00000000fffbd14b R15: ffff8801af608f90 FS: 0000000000000000(0000) GS:ffff8801db500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000206d6fd0 CR3: 0000000006a22001 CR4: 00000000001606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: led_tg+0x1db/0x2e0 net/netfilter/xt_LED.c:75 ip6t_do_table+0xc2a/0x1a30 net/ipv6/netfilter/ip6_tables.c:365 ip6table_raw_hook+0x65/0x80 net/ipv6/netfilter/ip6table_raw.c:42 nf_hook_entry_hookfn include/linux/netfilter.h:120 [inline] nf_hook_slow+0xba/0x1a0 net/netfilter/core.c:483 nf_hook.constprop.27+0x3f6/0x830 include/linux/netfilter.h:243 NF_HOOK include/linux/netfilter.h:286 [inline] ndisc_send_skb+0xa51/0x1370 net/ipv6/ndisc.c:491 ndisc_send_ns+0x38a/0x870 net/ipv6/ndisc.c:633 addrconf_dad_work+0xb9e/0x1320 net/ipv6/addrconf.c:4008 process_one_work+0xbbf/0x1af0 kernel/workqueue.c:2113 worker_thread+0x223/0x1990 kernel/workqueue.c:2247 kthread+0x33c/0x400 kernel/kthread.c:238 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:429 Code: 85 2a 0b 00 00 4d 8b 3c 24 4d 85 ff 75 9f 4c 8b bd 60 fd ff ff e8 bb 57 10 00 65 ff 0d 94 9a a1 7e e9 d9 fc ff ff e8 aa 57 10 00 <0f> 0b e8 a3 57 10 00 e9 14 fb ff ff e8 99 57 10 00 4c 89 bd 70 RIP: __mod_timer kernel/time/timer.c:958 [inline] RSP: ffff8801d24fe9f8 RIP: mod_timer+0x7d6/0x13c0 kernel/time/timer.c:1102 RSP: ffff8801d24fe9f8 ---[ end trace f661ab06f5dd8b3d ]--- The ledinternal struct can be shared between several different xt_LED targets, but the related timer is currently initialized only if the first target requires it. Fix it by unconditionally initializing the timer struct. v1 -> v2: call del_timer_sync() unconditionally, too. Fixes: 268cb38e1802 ("netfilter: x_tables: add LED trigger target") Change-Id: Ib3384560e30dab47ea6a8df6a269a66586764b87 Reported-by: syzbot+10c98dc5725c6c8fc7fb@syzkaller.appspotmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> [bwh: Backported to 3.16: Keep using setup_timer()] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* netfilter: on sockopt() acquire sock lock only in the required scopePaolo Abeni2019-05-024-29/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 3f34cfae1238848fd53f25e5c8fd59da57901f4b upstream. Syzbot reported several deadlocks in the netfilter area caused by rtnl lock and socket lock being acquired with a different order on different code paths, leading to backtraces like the following one: ====================================================== WARNING: possible circular locking dependency detected 4.15.0-rc9+ #212 Not tainted ------------------------------------------------------ syzkaller041579/3682 is trying to acquire lock: (sk_lock-AF_INET6){+.+.}, at: [<000000008775e4dd>] lock_sock include/net/sock.h:1463 [inline] (sk_lock-AF_INET6){+.+.}, at: [<000000008775e4dd>] do_ipv6_setsockopt.isra.8+0x3c5/0x39d0 net/ipv6/ipv6_sockglue.c:167 but task is already holding lock: (rtnl_mutex){+.+.}, at: [<000000004342eaa9>] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (rtnl_mutex){+.+.}: __mutex_lock_common kernel/locking/mutex.c:756 [inline] __mutex_lock+0x16f/0x1a80 kernel/locking/mutex.c:893 mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908 rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74 register_netdevice_notifier+0xad/0x860 net/core/dev.c:1607 tee_tg_check+0x1a0/0x280 net/netfilter/xt_TEE.c:106 xt_check_target+0x22c/0x7d0 net/netfilter/x_tables.c:845 check_target net/ipv6/netfilter/ip6_tables.c:538 [inline] find_check_entry.isra.7+0x935/0xcf0 net/ipv6/netfilter/ip6_tables.c:580 translate_table+0xf52/0x1690 net/ipv6/netfilter/ip6_tables.c:749 do_replace net/ipv6/netfilter/ip6_tables.c:1165 [inline] do_ip6t_set_ctl+0x370/0x5f0 net/ipv6/netfilter/ip6_tables.c:1691 nf_sockopt net/netfilter/nf_sockopt.c:106 [inline] nf_setsockopt+0x67/0xc0 net/netfilter/nf_sockopt.c:115 ipv6_setsockopt+0x115/0x150 net/ipv6/ipv6_sockglue.c:928 udpv6_setsockopt+0x45/0x80 net/ipv6/udp.c:1422 sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2978 SYSC_setsockopt net/socket.c:1849 [inline] SyS_setsockopt+0x189/0x360 net/socket.c:1828 entry_SYSCALL_64_fastpath+0x29/0xa0 -> #0 (sk_lock-AF_INET6){+.+.}: lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3914 lock_sock_nested+0xc2/0x110 net/core/sock.c:2780 lock_sock include/net/sock.h:1463 [inline] do_ipv6_setsockopt.isra.8+0x3c5/0x39d0 net/ipv6/ipv6_sockglue.c:167 ipv6_setsockopt+0xd7/0x150 net/ipv6/ipv6_sockglue.c:922 udpv6_setsockopt+0x45/0x80 net/ipv6/udp.c:1422 sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2978 SYSC_setsockopt net/socket.c:1849 [inline] SyS_setsockopt+0x189/0x360 net/socket.c:1828 entry_SYSCALL_64_fastpath+0x29/0xa0 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(rtnl_mutex); lock(sk_lock-AF_INET6); lock(rtnl_mutex); lock(sk_lock-AF_INET6); *** DEADLOCK *** 1 lock held by syzkaller041579/3682: #0: (rtnl_mutex){+.+.}, at: [<000000004342eaa9>] rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74 The problem, as Florian noted, is that nf_setsockopt() is always called with the socket held, even if the lock itself is required only for very tight scopes and only for some operation. This patch addresses the issues moving the lock_sock() call only where really needed, namely in ipv*_getorigdst(), so that nf_setsockopt() does not need anymore to acquire both locks. Fixes: 22265a5c3c10 ("netfilter: xt_TEE: resolve oif using netdevice notifiers") Change-Id: I5aba6f18f9b2c7b4f353f6ca2948be6720b05845 Reported-by: syzbot+a4c2dc980ac1af699b36@syzkaller.appspotmail.com Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* sctp: do not allow the v4 socket to bind a v4mapped v6 addressXin Long2019-05-021-8/+6
| | | | | | | | | | | | | | | | | | | | | | | commit c5006b8aa74599ce19104b31d322d2ea9ff887cc upstream. The check in sctp_sockaddr_af is not robust enough to forbid binding a v4mapped v6 addr on a v4 socket. The worse thing is that v4 socket's bind_verify would not convert this v4mapped v6 addr to a v4 addr. syzbot even reported a crash as the v4 socket bound a v6 addr. This patch is to fix it by doing the common sa.sa_family check first, then AF_INET check for v4mapped v6 addrs. Fixes: 7dab83de50c7 ("sctp: Support ipv6only AF_INET6 sockets.") Change-Id: I86dc74ac4606693e19a60afefd10973e2d02e882 Reported-by: syzbot+7b7b518b1228d2743963@syzkaller.appspotmail.com Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* net/sctp: Always set scope_id in sctp_inet6_skb_msgnameEric W. Biederman2019-05-021-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 7c8a61d9ee1df0fb4747879fa67a99614eb62fec upstream. Alexandar Potapenko while testing the kernel with KMSAN and syzkaller discovered that in some configurations sctp would leak 4 bytes of kernel stack. Working with his reproducer I discovered that those 4 bytes that are leaked is the scope id of an ipv6 address returned by recvmsg. With a little code inspection and a shrewd guess I discovered that sctp_inet6_skb_msgname only initializes the scope_id field for link local ipv6 addresses to the interface index the link local address pertains to instead of initializing the scope_id field for all ipv6 addresses. That is almost reasonable as scope_id's are meaniningful only for link local addresses. Set the scope_id in all other cases to 0 which is not a valid interface index to make it clear there is nothing useful in the scope_id field. There should be no danger of breaking userspace as the stack leak guaranteed that previously meaningless random data was being returned. Fixes: 372f525b495c ("SCTP: Resync with LKSCTP tree.") Change-Id: I24fe3eb3a92fb0435dc162ef62ed3a5b44b612c3 History-tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git Reported-by: Alexander Potapenko <glider@google.com> Tested-by: Alexander Potapenko <glider@google.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.16: - Adjust context - Add braces] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* ip6_gre: skb_push ipv6hdr before packing the header in ip6gre_headerXin Long2019-05-021-6/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 76cc0d3282d4b933fa144fa41fbc5318e0fdca24 upstream. Now in ip6gre_header before packing the ipv6 header, it skb_push t->hlen which only includes encap_hlen + tun_hlen. It means greh and inner header would be over written by ipv6 stuff and ipv6h might have no chance to set up. Jianlin found this issue when using remote any on ip6_gre, the packets he captured on gre dev are truncated: 22:50:26.210866 Out ethertype IPv6 (0x86dd), length 120: truncated-ip6 -\ 8128 bytes missing!(flowlabel 0x92f40, hlim 0, next-header Options (0) \ payload length: 8192) ::1:2000:0 > ::1:0:86dd: HBH [trunc] ip-proto-128 \ 8184 It should also skb_push ipv6hdr so that ipv6h points to the right position to set ipv6 stuff up. This patch is to skb_push hlen + sizeof(*ipv6h) and also fix some indents in ip6gre_header. Fixes: c12b395a4664 ("gre: Support GRE over IPv6") Change-Id: I2955a306d3b4c7d0378eb1745f749dbed5b4008d Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* BACKPORT: tty: Use unbound workqueue for all input workersPeter Hurley2019-05-022-2/+2
| | | | | | | | | | | | | | | | | | The commonly accepted wisdom that scheduling work on the same cpu that handled interrupt i/o benefits from cache-locality is only true if the cpu is idle (since bound kworkers are often the highest vruntime and thus the lowest priority). Measurements of scheduling via the unbound queue show lowered worst-case latency responses of up to 5x over bound workqueue, without increase in average latency or throughput. pty i/o test measurements show >3x (!) reduced total running time; tests previously taking ~8s now complete in <2.5s. Signed-off-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Change-Id: I949ba6dfa56a802bba7c03e7ddbc2bde37b868ba
* path_openat(): fix double fput()Al Viro2019-05-021-1/+2
| | | | | | | | | | | | | [ Upstream commit f15133df088ecadd141ea1907f2c96df67c729f0 ] path_openat() jumps to the wrong place after do_tmpfile() - it has already done path_cleanup() (as part of path_lookupat() called by do_tmpfile()), so doing that again can lead to double fput(). Change-Id: Ia74c130ae5e379b512532c0feebea871b5f73668 Cc: stable@vger.kernel.org # v3.11+ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
* sched/cpuset/pm: Fix cpuset vs. suspend-resume bugsPeter Zijlstra2019-05-024-6/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 50e76632339d4655859523a39249dd95ee5e93e7 upstream. Cpusets vs. suspend-resume is _completely_ broken. And it got noticed because it now resulted in non-cpuset usage breaking too. On suspend cpuset_cpu_inactive() doesn't call into cpuset_update_active_cpus() because it doesn't want to move tasks about, there is no need, all tasks are frozen and won't run again until after we've resumed everything. But this means that when we finally do call into cpuset_update_active_cpus() after resuming the last frozen cpu in cpuset_cpu_active(), the top_cpuset will not have any difference with the cpu_active_mask and this it will not in fact do _anything_. So the cpuset configuration will not be restored. This was largely hidden because we would unconditionally create identity domains and mobile users would not in fact use cpusets much. And servers what do use cpusets tend to not suspend-resume much. An addition problem is that we'd not in fact wait for the cpuset work to finish before resuming the tasks, allowing spurious migrations outside of the specified domains. Fix the rebuild by introducing cpuset_force_rebuild() and fix the ordering with cpuset_wait_for_hotplug(). Reported-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: deb7aa308ea2 ("cpuset: reorganize CPU / memory hotplug handling") Link: http://lkml.kernel.org/r/20170907091338.orwxrqkbfkki3c24@hirez.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Change-Id: Ia40ffcf49507af1d5493d7e534e9433ca346db02
* nospec: Include <asm/barrier.h> dependencyDan Williams2019-05-021-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The nospec.h header expects the per-architecture header file <asm/barrier.h> to optionally define array_index_mask_nospec(). Include that dependency to prevent inadvertent fallback to the default array_index_mask_nospec() implementation. The default implementation may not provide a full mitigation on architectures that perform data value speculation. Change-Id: Ib3bb49179541ca4f187f7ec6b91603a58db358f7 Reported-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arch@vger.kernel.org Link: http://lkml.kernel.org/r/151881605404.17395.1341935530792574707.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* nospec: Allow index argument to have const-qualified typeRasmus Villemoes2019-05-021-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The last expression in a statement expression need not be a bare variable, quoting gcc docs The last thing in the compound statement should be an expression followed by a semicolon; the value of this subexpression serves as the value of the entire construct. and we already use that in e.g. the min/max macros which end with a ternary expression. This way, we can allow index to have const-qualified type, which will in some cases avoid the need for introducing a local copy of index of non-const qualified type. That, in turn, can prevent readers not familiar with the internals of array_index_nospec from wondering about the seemingly redundant extra variable, and I think that's worthwhile considering how confusing the whole _nospec business is. The expression _i&_mask has type unsigned long (since that is the type of _mask, and the BUILD_BUG_ONs guarantee that _i will get promoted to that), so in order not to change the type of the whole expression, add a cast back to typeof(_i). Change-Id: I1fcaea6268a40e10c67d58e906d757b5bce7e8d0 Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arch@vger.kernel.org Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/151881604837.17395.10812767547837568328.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* nospec: Kill array_index_nospec_mask_check()Dan Williams2019-05-021-21/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are multiple problems with the dynamic sanity checking in array_index_nospec_mask_check(): * It causes unnecessary overhead in the 32-bit case since integer sized @index values will no longer cause the check to be compiled away like in the 64-bit case. * In the 32-bit case it may trigger with user controllable input when the expectation is that should only trigger during development of new kernel enabling. * The macro reuses the input parameter in multiple locations which is broken if someone passes an expression like 'index++' to array_index_nospec(). Change-Id: I845dbcb830b4d72cf7f0b835bfa64476d3763268 Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Cc: linux-arch@vger.kernel.org Link: http://lkml.kernel.org/r/151881604278.17395.6605847763178076520.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* nospec: Move array_index_nospec() parameter checking into separate macroWill Deacon2019-05-021-15/+21
| | | | | | | | | | | | | | | | | | | For architectures providing their own implementation of array_index_mask_nospec() in asm/barrier.h, attempting to use WARN_ONCE() to complain about out-of-range parameters using WARN_ON() results in a mess of mutually-dependent include files. Rather than unpick the dependencies, simply have the core code in nospec.h perform the checking for us. Change-Id: I36857abc8226fcdfd9c9014e3ffee17f9dc2cbca Signed-off-by: Will Deacon <will.deacon@arm.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1517840166-15399-1-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* array_index_nospec: Sanitize speculative array de-referencesDan Williams2019-05-021-0/+72
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | array_index_nospec() is proposed as a generic mechanism to mitigate against Spectre-variant-1 attacks, i.e. an attack that bypasses boundary checks via speculative execution. The array_index_nospec() implementation is expected to be safe for current generation CPUs across multiple architectures (ARM, x86). Based on an original implementation by Linus Torvalds, tweaked to remove speculative flows by Alexei Starovoitov, and tweaked again by Linus to introduce an x86 assembly implementation for the mask generation. Change-Id: Id28afe5e117369ab52a9400c2cb7d639630f3a08 Co-developed-by: Linus Torvalds <torvalds@linux-foundation.org> Co-developed-by: Alexei Starovoitov <ast@kernel.org> Suggested-by: Cyril Novikov <cnovikov@lynx.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: linux-arch@vger.kernel.org Cc: kernel-hardening@lists.openwall.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: gregkh@linuxfoundation.org Cc: torvalds@linux-foundation.org Cc: alan@linux.intel.com Link: https://lkml.kernel.org/r/151727414229.33451.18411580953862676575.stgit@dwillia2-desk3.amr.corp.intel.com
* shmem: fix buildMoyster2019-05-021-1/+1
|
* tmpfs: fix uninitialized return value in shmem_linkDarrick J. Wong2019-05-021-1/+1
| | | | | | | | | | | | | | | | | When we made the shmem_reserve_inode call in shmem_link conditional, we forgot to update the declaration for ret so that it always has a known value. Dan Carpenter pointed out this deficiency in the original patch. Fixes: 1062af920c07 ("tmpfs: fix link accounting when a tmpfile is linked in") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Hugh Dickins <hughd@google.com> Cc: Matej Kupljen <matej.kupljen@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 29b00e609960ae0fcff382f4c7079dd0874a5311) Change-Id: I648253008c7977fabb9d04655c11f99a536f43ca
* tmpfs: fix link accounting when a tmpfile is linked inDarrick J. Wong2019-05-021-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | tmpfs has a peculiarity of accounting hard links as if they were separate inodes: so that when the number of inodes is limited, as it is by default, a user cannot soak up an unlimited amount of unreclaimable dcache memory just by repeatedly linking a file. But when v3.11 added O_TMPFILE, and the ability to use linkat() on the fd, we missed accommodating this new case in tmpfs: "df -i" shows that an extra "inode" remains accounted after the file is unlinked and the fd closed and the actual inode evicted. If a user repeatedly links tmpfiles into a tmpfs, the limit will be hit (ENOSPC) even after they are deleted. Just skip the extra reservation from shmem_link() in this case: there's a sense in which this first link of a tmpfile is then cheaper than a hard link of another file, but the accounting works out, and there's still good limiting, so no need to do anything more complicated. Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1902182134370.7035@eggly.anvils Fixes: f4e0c30c191 ("allow the temp files created by open() to be linked to") Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Hugh Dickins <hughd@google.com> Reported-by: Matej Kupljen <matej.kupljen@gmail.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 1062af920c07f5b54cf5060fde3339da6df0cf6b) Change-Id: I194ccdf709b1a4a385e00e07f97085521bcabdc1
* /proc/iomem: only expose physical resource addresses to privileged usersLinus Torvalds2019-05-021-2/+11
| | | | | | | | | | | | | | | | | commit 51d7b120418e99d6b3bf8df9eb3cc31e8171dee4 upstream. In commit c4004b02f8e5b ("x86: remove the kernel code/data/bss resources from /proc/iomem") I was hoping to remove the phyiscal kernel address data from /proc/iomem entirely, but that had to be reverted because some system programs actually use it. This limits all the detailed resource information to properly credentialed users instead. Change-Id: I3b308d197beb828de6807ef53aaeaeee2fab8527 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mark Salyzyn <salyzyn@android.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Make file credentials available to the seqfile interfacesLinus Torvalds2019-05-022-12/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 34dbbcdbf63360661ff7bda6c5f52f99ac515f92 upstream. A lot of seqfile users seem to be using things like %pK that uses the credentials of the current process, but that is actually completely wrong for filesystem interfaces. The unix semantics for permission checking files is to check permissions at _open_ time, not at read or write time, and that is not just a small detail: passing off stdin/stdout/stderr to a suid application and making the actual IO happen in privileged context is a classic exploit technique. So if we want to be able to look at permissions at read time, we need to use the file open credentials, not the current ones. Normal file accesses can just use "f_cred" (or any of the helper functions that do that, like file_ns_capable()), but the seqfile interfaces do not have any such options. It turns out that seq_file _does_ save away the user_ns information of the file, though. Since user_ns is just part of the full credential information, replace that special case with saving off the cred pointer instead, and suddenly seq_file has all the permission information it needs. Change-Id: Ic6fc339d315f6aa5774f7f5356747aebf47cb45f Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jann Horn <jannh@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ip6mr: Fix potential Spectre v1 vulnerabilityGustavo A. R. Silva2019-05-021-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 69d2c86766da2ded2b70281f1bf242cb0d58a778 ] vr.mifi is indirectly controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability. This issue was detected with the help of Smatch: net/ipv6/ip6mr.c:1845 ip6mr_ioctl() warn: potential spectre issue 'mrt->vif_table' [r] (local cap) net/ipv6/ip6mr.c:1919 ip6mr_compat_ioctl() warn: potential spectre issue 'mrt->vif_table' [r] (local cap) Fix this by sanitizing vr.mifi before using it to index mrt->vif_table' Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1]. [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2 Change-Id: Iac12abbdf84e1cba5c2a61e8385505c94e5f02b0 Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ipv4: Fix potential Spectre v1 vulnerabilityGustavo A. R. Silva2019-05-021-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 5648451e30a0d13d11796574919a359025d52cce ] vr.vifi is indirectly controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability. This issue was detected with the help of Smatch: net/ipv4/ipmr.c:1616 ipmr_ioctl() warn: potential spectre issue 'mrt->vif_table' [r] (local cap) net/ipv4/ipmr.c:1690 ipmr_compat_ioctl() warn: potential spectre issue 'mrt->vif_table' [r] (local cap) Fix this by sanitizing vr.vifi before using it to index mrt->vif_table' Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1]. [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2 Change-Id: I24d37ed62547451a12b9764119122fb6f97aa549 Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* mremap: properly flush TLB before releasing the pageLinus Torvalds2019-05-022-6/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit eb66ae030829605d61fbef1909ce310e29f78821 upstream. This is a backport to stable 3.18.y, based on Will Deacon's 4.4.y backport. Jann Horn points out that our TLB flushing was subtly wrong for the mremap() case. What makes mremap() special is that we don't follow the usual "add page to list of pages to be freed, then flush tlb, and then free pages". No, mremap() obviously just _moves_ the page from one page table location to another. That matters, because mremap() thus doesn't directly control the lifetime of the moved page with a freelist: instead, the lifetime of the page is controlled by the page table locking, that serializes access to the entry. As a result, we need to flush the TLB not just before releasing the lock for the source location (to avoid any concurrent accesses to the entry), but also before we release the destination page table lock (to avoid the TLB being flushed after somebody else has already done something to that page). This also makes the whole "need_flush" logic unnecessary, since we now always end up flushing the TLB for every valid entry. Bug: 118836219 Reported-and-tested-by: Jann Horn <jannh@google.com> Acked-by: Will Deacon <will.deacon@arm.com> Tested-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [will: backport to 4.4 stable] Signed-off-by: Will Deacon <will.deacon@arm.com> [ghackmann@google.com: adjust context] Signed-off-by: Greg Hackmann <ghackmann@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Change-Id: I653b28b6c2fd6ec00e4b0be2b3289dcab1dcc4b1 Signed-off-by: Greg Hackmann <ghackmann@google.com>
* ext4: only look at the bg_flags field if it is validTheodore Ts'o2019-05-021-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 8844618d8aa7a9973e7b527d038a2a589665002c upstream. The bg_flags field in the block group descripts is only valid if the uninit_bg or metadata_csum feature is enabled. We were not consistently looking at this field; fix this. Also block group #0 must never have uninitialized allocation bitmaps, or need to be zeroed, since that's where the root inode, and other special inodes are set up. Check for these conditions and mark the file system as corrupted if they are detected. This addresses CVE-2018-10876. https://bugzilla.kernel.org/show_bug.cgi?id=199403 Bug: 116406122 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org [bwh: Backported to 3.16: - ext4_read_block_bitmap_nowait() and ext4_read_inode_bitmap() return a pointer (NULL on error) instead of an error code - Open-code sb_rdonly() - Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> [ghackmann@google.com: forward-port to 3.18: adjust context] Signed-off-by: Greg Hackmann <ghackmann@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Change-Id: I11ed41d9fa916662f7eb010854ce4aaaf23ad99a
* Fix up non-directory creation in SGID directoriesLinus Torvalds2019-05-021-0/+6
| | | | | | | | | | | | | | | | | | | | | | | commit 0fa3ecd87848c9c93c2c828ef4c3a8ca36ce46c7 upstream. sgid directories have special semantics, making newly created files in the directory belong to the group of the directory, and newly created subdirectories will also become sgid. This is historically used for group-shared directories. But group directories writable by non-group members should not imply that such non-group members can magically join the group, so make sure to clear the sgid bit on non-directories for non-members (but remember that sgid without group execute means "mandatory locking", just to confuse things even more). Reported-by: Jann Horn <jannh@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Bug: 113452403 Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Change-Id: Ib8065c0ab05ead9e9d38e23b95b6891ece4d99e5
* f2fs: fix to do sanity check with reserved blkaddr of inline inodeChao Yu2019-05-021-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As Wen Xu reported in bugzilla, after image was injected with random data by fuzzing, inline inode would contain invalid reserved blkaddr, then during inline conversion, we will encounter illegal memory accessing reported by KASAN, the root cause of this is when writing out converted inline page, we will use invalid reserved blkaddr to update sit bitmap, result in accessing memory beyond sit bitmap boundary. In order to fix this issue, let's do sanity check with reserved block address of inline inode to avoid above condition. https://bugzilla.kernel.org/show_bug.cgi?id=200179 [ 1428.846352] BUG: KASAN: use-after-free in update_sit_entry+0x80/0x7f0 [ 1428.846618] Read of size 4 at addr ffff880194483540 by task a.out/2741 [ 1428.846855] CPU: 0 PID: 2741 Comm: a.out Tainted: G W 4.17.0+ #1 [ 1428.846858] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [ 1428.846860] Call Trace: [ 1428.846868] dump_stack+0x71/0xab [ 1428.846875] print_address_description+0x6b/0x290 [ 1428.846881] kasan_report+0x28e/0x390 [ 1428.846888] ? update_sit_entry+0x80/0x7f0 [ 1428.846898] update_sit_entry+0x80/0x7f0 [ 1428.846906] f2fs_allocate_data_block+0x6db/0xc70 [ 1428.846914] ? f2fs_get_node_info+0x14f/0x590 [ 1428.846920] do_write_page+0xc8/0x150 [ 1428.846928] f2fs_outplace_write_data+0xfe/0x210 [ 1428.846935] ? f2fs_do_write_node_page+0x170/0x170 [ 1428.846941] ? radix_tree_tag_clear+0xff/0x130 [ 1428.846946] ? __mod_node_page_state+0x22/0xa0 [ 1428.846951] ? inc_zone_page_state+0x54/0x100 [ 1428.846956] ? __test_set_page_writeback+0x336/0x5d0 [ 1428.846964] f2fs_convert_inline_page+0x407/0x6d0 [ 1428.846971] ? f2fs_read_inline_data+0x3b0/0x3b0 [ 1428.846978] ? __get_node_page+0x335/0x6b0 [ 1428.846987] f2fs_convert_inline_inode+0x41b/0x500 [ 1428.846994] ? f2fs_convert_inline_page+0x6d0/0x6d0 [ 1428.847000] ? kasan_unpoison_shadow+0x31/0x40 [ 1428.847005] ? kasan_kmalloc+0xa6/0xd0 [ 1428.847024] f2fs_file_mmap+0x79/0xc0 [ 1428.847029] mmap_region+0x58b/0x880 [ 1428.847037] ? arch_get_unmapped_area+0x370/0x370 [ 1428.847042] do_mmap+0x55b/0x7a0 [ 1428.847048] vm_mmap_pgoff+0x16f/0x1c0 [ 1428.847055] ? vma_is_stack_for_current+0x50/0x50 [ 1428.847062] ? __fsnotify_update_child_dentry_flags.part.1+0x160/0x160 [ 1428.847068] ? do_sys_open+0x206/0x2a0 [ 1428.847073] ? __fget+0xb4/0x100 [ 1428.847079] ksys_mmap_pgoff+0x278/0x360 [ 1428.847085] ? find_mergeable_anon_vma+0x50/0x50 [ 1428.847091] do_syscall_64+0x73/0x160 [ 1428.847098] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 1428.847102] RIP: 0033:0x7fb1430766ba [ 1428.847103] Code: 89 f5 41 54 49 89 fc 55 53 74 35 49 63 e8 48 63 da 4d 89 f9 49 89 e8 4d 63 d6 48 89 da 4c 89 ee 4c 89 e7 b8 09 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 56 5b 5d 41 5c 41 5d 41 5e 41 5f c3 0f 1f 00 [ 1428.847162] RSP: 002b:00007ffc651d9388 EFLAGS: 00000246 ORIG_RAX: 0000000000000009 [ 1428.847167] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fb1430766ba [ 1428.847170] RDX: 0000000000000001 RSI: 0000000000001000 RDI: 0000000000000000 [ 1428.847173] RBP: 0000000000000003 R08: 0000000000000003 R09: 0000000000000000 [ 1428.847176] R10: 0000000000008002 R11: 0000000000000246 R12: 0000000000000000 [ 1428.847179] R13: 0000000000001000 R14: 0000000000008002 R15: 0000000000000000 [ 1428.847252] Allocated by task 2683: [ 1428.847372] kasan_kmalloc+0xa6/0xd0 [ 1428.847380] kmem_cache_alloc+0xc8/0x1e0 [ 1428.847385] getname_flags+0x73/0x2b0 [ 1428.847390] user_path_at_empty+0x1d/0x40 [ 1428.847395] vfs_statx+0xc1/0x150 [ 1428.847401] __do_sys_newlstat+0x7e/0xd0 [ 1428.847405] do_syscall_64+0x73/0x160 [ 1428.847411] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 1428.847466] Freed by task 2683: [ 1428.847566] __kasan_slab_free+0x137/0x190 [ 1428.847571] kmem_cache_free+0x85/0x1e0 [ 1428.847575] filename_lookup+0x191/0x280 [ 1428.847580] vfs_statx+0xc1/0x150 [ 1428.847585] __do_sys_newlstat+0x7e/0xd0 [ 1428.847590] do_syscall_64+0x73/0x160 [ 1428.847596] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 1428.847648] The buggy address belongs to the object at ffff880194483300 which belongs to the cache names_cache of size 4096 [ 1428.847946] The buggy address is located 576 bytes inside of 4096-byte region [ffff880194483300, ffff880194484300) [ 1428.848234] The buggy address belongs to the page: [ 1428.848366] page:ffffea0006512000 count:1 mapcount:0 mapping:ffff8801f3586380 index:0x0 compound_mapcount: 0 [ 1428.848606] flags: 0x17fff8000008100(slab|head) [ 1428.848737] raw: 017fff8000008100 dead000000000100 dead000000000200 ffff8801f3586380 [ 1428.848931] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000 [ 1428.849122] page dumped because: kasan: bad access detected [ 1428.849305] Memory state around the buggy address: [ 1428.849436] ffff880194483400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 1428.849620] ffff880194483480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 1428.849804] >ffff880194483500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 1428.849985] ^ [ 1428.850120] ffff880194483580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 1428.850303] ffff880194483600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 1428.850498] ================================================================== Bug: 113148515 Change-Id: Ie782ce3c5b469101c9e70998d3a73d3dfe1041d5 Reported-by: Wen Xu <wen.xu@gatech.edu> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* allow build_open_flags() to return an errorAl Viro2019-05-024-31/+41
| | | | | Change-Id: Ief5582ec8c4aeb4fb7bf9b2abd51b3d65ae9e81c Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* do_last(): fix missing checks for LAST_BIND caseAl Viro2019-05-021-21/+3
| | | | | | | | /proc/self/cwd with O_CREAT should fail with EISDIR. /proc/self/exe, OTOH, should fail with ENOTDIR when opened with O_DIRECTORY. Change-Id: Id0e52bc3afee67f2af1277d6bea65b073cf440d6 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* media: uvcvideo: Prevent heap overflow when accessing mapped controlsGuenter Roeck2019-05-021-0/+7
| | | | | | | | | | | | | | | | | | commit 7e09f7d5c790278ab98e5f2c22307ebe8ad6e8ba upstream. The size of uvc_control_mapping is user controlled leading to a potential heap overflow in the uvc driver. This adds a check to verify the user provided size fits within the bounds of the defined buffer size. Originally-from: Richard Simmons <rssimmo@amazon.com> Change-Id: Iecff8a3172e2096379d647ba4373cbd0b35d7479 Signed-off-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* HID: hiddev: fix potential Spectre v1Gustavo A. R. Silva2019-05-021-1/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 4f65245f2d178b9cba48350620d76faa4a098841 upstream. uref->field_index, uref->usage_index, finfo.field_index and cinfo.index can be indirectly controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability. This issue was detected with the help of Smatch: drivers/hid/usbhid/hiddev.c:473 hiddev_ioctl_usage() warn: potential spectre issue 'report->field' (local cap) drivers/hid/usbhid/hiddev.c:477 hiddev_ioctl_usage() warn: potential spectre issue 'field->usage' (local cap) drivers/hid/usbhid/hiddev.c:757 hiddev_ioctl() warn: potential spectre issue 'report->field' (local cap) drivers/hid/usbhid/hiddev.c:801 hiddev_ioctl() warn: potential spectre issue 'hid->collection' (local cap) Fix this by sanitizing such structure fields before using them to index report->field, field->usage and hid->collection Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1]. [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2 Change-Id: Iaf59ae8d06c17da1240eec8e3b558ba41592cc05 Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>