Commit message  Author  Age  Files  Lines
...
* lowmemorykiller: account for unevictable pagesTim Murray2017-09-181-0/+1
| | | | | | | | | | | lowmemorykiller was not taking into account unevictable pages when deciding what level to kill. If significant amounts of memory were pinned, this caused lowmemorykiller to effectively stop at a much higher level than it should. bug 31255977 Change-Id: I763ecbfef8c56d65bb8f6147ae810692bd81b6e2
* staging: android: lowmemorykiller: set TIF_MEMDIE before send kill sigWeijie Yang2017-09-181-1/+1
| | | | | | | | | | | | | Set TIF_MEMDIE tsk_thread flag before send kill signal to the selected thread. This is to fit a usual code sequence and avoid potential race issue. Signed-off-by: Weijie Yang <weijie.yang@samsung.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 6bc2b856bb7c49f238914d965c0b1057ec78226e) Change-Id: I3c4869d525ce80d339ec3742382beae2ee45f76e
* lmk: remove unused codeMister Oyster2017-09-181-9/+0
|
* staging: android: lowmemorykiller: neglect swap cached pages in other_fileVinayak Menon2017-09-181-5/+2
| | | | | | | | | | | | | | | | | | | With ZRAM enabled it is observed that lowmemory killer doesn't trigger properly. swap cached pages are accounted in NR_FILE, and lowmemorykiller considers this as reclaimable and adds to other_file. But these pages can't be reclaimed unless lowmemorykiller triggers. So subtract swap pages from other_file. Signed-off-by: Vinayak Menon <vinayakm.list@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 058dbde928597e7a8bd04e28e77e5cfc4270591d) Change-Id: I217e831bbe1db830e6d61c7943e442a32a7548a1 Reverts some Mediatek customisation to lmk Signed-off-by: Mister Oyster <oysterized@gmail.com>
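The accounting change described above can be sketched in plain C. This is a minimal userspace model, not the driver code: the struct, its field names, and the clamp to zero are assumptions made for illustration. It only shows that swap-cache pages, which are counted among file pages, are subtracted before the result is treated as reclaimable.

```c
#include <stdio.h>

/* Hypothetical snapshot of the counters named in the commit message. */
struct vm_counters {
    long nr_file_pages;   /* file pages; swap-cache pages are counted here too */
    long nr_swapcache;    /* swap-cache pages */
};

/*
 * Pages the low-memory killer may treat as reclaimable file cache.
 * Swap-cache pages are subtracted because they cannot be reclaimed
 * until the killer itself has run. Clamping at zero is an assumption
 * of this sketch.
 */
static long lmk_other_file(const struct vm_counters *c)
{
    long other_file = c->nr_file_pages - c->nr_swapcache;

    return other_file < 0 ? 0 : other_file;
}
```

With 1000 file pages of which 300 are swap cache, the model reports 700 reclaimable pages instead of 1000, so a kill threshold set between those two values is now reached.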
* lowmemorykiller: trace kill events.Martijn Coenen2017-09-182-3/+50
| | | | | | | Allows for capturing lmk kill events and their rationale. Change-Id: Ibe215db5bb9806fc550c72c0b9832c85cbd56bf6 Signed-off-by: Martijn Coenen <maco@google.com>
* drivers: mtk: remove mlog driverMister Oyster2017-09-186-984/+0
| | | | best it can do is crash the whole kernel when zram is used
* battery: mtk: remove meizu fuelgauge_dump_info log, taking stupid amount of space in /data, doing stupid kernel file manipulation  Mister Oyster  2017-09-17  1  -54/+0
|
* defconfig: regenMister Oyster2017-09-161-0/+1
|
* ANDROID: uid_sys_stats: Fix implicit declaration of get_cmdline()Amit Pundir2017-09-161-0/+1
| | | | | | | | | Include linux/mm.h for get_cmdline() declaration. Change-Id: Icad6ef7deef4d93d92d423c96bfa61fb5d66d0b7 Fixes: Change-Id: I30083b757eaef8c61e55a213a883ce8d0c9cf2b1 ("uid_sys_stats: log task io with a debug flag") Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
* uid_sys_stats: log task io with a debug flagYang Jin2017-09-162-60/+267
| | | | | | | | | Add a hashmap inside each uid_entry to keep track of task name and io. Task full name is a combination of thread and process name. Bug: 63739275 Change-Id: I30083b757eaef8c61e55a213a883ce8d0c9cf2b1 Signed-off-by: Yang Jin <yajin@google.com>
* binder: fix redefine of READ/WRITE_ONCE in compiler.hMister Oyster2017-09-161-2/+0
|
* FROMLIST: binder: fix an ret value overrideXu YiPing2017-09-161-1/+0
| | | | | | | | | | | | (from https://patchwork.kernel.org/patch/9939409/) commit 372e3147df70 ("binder: guarantee txn complete / errors delivered in-order") incorrectly defined a local ret value. This ret value will be invalid when out of the if block Change-Id: If7bd963ac7e67d135aa949133263aac27bf15d1a Signed-off-by: Xu YiPing <xuyiping@hislicon.com> Signed-off-by: Todd Kjos <tkjos@google.com>
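The class of bug described here, a second `ret` declared inside an `if` block, can be illustrated with a small standalone sketch. The function names and the failing helper are hypothetical and are not taken from the binder driver; only the shadowing pattern is the point.

```c
#include <errno.h>

/* Hypothetical step that can fail. */
static int queue_work_item(int fail)
{
    return fail ? -EINVAL : 0;
}

/* Buggy shape: the inner 'ret' shadows the outer one, so the error is lost. */
static int submit_shadowed(int need_queue, int fail)
{
    int ret = 0;

    if (need_queue) {
        int ret = queue_work_item(fail);  /* new variable, dies at the brace */
        (void)ret;
    }
    return ret;  /* still the outer value: always 0 */
}

/* Fixed shape: assign to the single function-scope 'ret'. */
static int submit_fixed(int need_queue, int fail)
{
    int ret = 0;

    if (need_queue)
        ret = queue_work_item(fail);
    return ret;
}
```

Building with `-Wshadow` makes the compiler report the inner declaration, which is one way to catch this pattern early.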
* FROMLIST: binder: fix memory corruption in binder_transaction binderXu YiPing2017-09-161-0/+1
| | | | | | | | | | | | | | | | | (from https://patchwork.kernel.org/patch/9939405/) commit 7a4408c6bd3e ("binder: make sure accesses to proc/thread are safe") made a change to enqueue tcomplete to thread->todo before enqueuing the transaction. However, in err_dead_proc_or_thread case, the tcomplete is directly freed, without dequeued. It may cause the thread->todo list to be corrupted. So, dequeue it before freeing. Bug: 65333488 Change-Id: Id063a4db18deaa634f4d44aa6ebca47bea32537a Signed-off-by: Xu YiPing <xuyiping@hisilicon.com> Signed-off-by: Todd Kjos <tkjos@google.com>
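Why freeing a still-queued entry corrupts the list can be seen with a tiny doubly linked list. The sketch below is a userspace model with hypothetical names; it mimics the kernel's circular list layout but is not the binder code. The error path unlinks the entry first and only then frees it.

```c
#include <stdlib.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct node {
    struct node *prev, *next;
};

static void list_init(struct node *head)
{
    head->prev = head->next = head;
}

static void list_add_tail(struct node *n, struct node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

static void list_del(struct node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = n;
}

static int list_empty(const struct node *head)
{
    return head->next == head;
}

/*
 * Error path: dequeue, then free. Calling free() alone would leave the
 * head pointing at released memory, which is the corruption described.
 */
static void drop_queued(struct node *n)
{
    list_del(n);
    free(n);
}
```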
* binder: make FIFO inheritance a per-context optionTim Murray2017-09-162-1/+37
| | | | | | | | | | | | | Add a new ioctl to binder to control whether FIFO inheritance should happen. In particular, hwbinder should inherit FIFO priority from callers, but standard binder threads should not. Test: boots bug 36516194 Signed-off-by: Tim Murray <timmurray@google.com> Change-Id: I8100c4364b7d15d1bf00a8ca5c286e4d4b23ce85
* drivers: merged Android Binder from 4.9Lukas06102017-09-169-1596/+3826
| | | | | Change-Id: I857ef86b2d502293fb8c37398383dceaa21dd29f Signed-off-by: Mister Oyster <oysterized@gmail.com>
* sensor: fix memory leak issuehongxu.zhao2017-09-161-1/+1
| | | | | | | | | | | | [Detail] dat initialize to 0 [Solution] Change-Id: Ib539e9624b1b8153eda8dd8f7ce55cb67052be59 CR-Id: ALPS03288635 Feature: Others Signed-off-by: hongxu.zhao <hongxu.zhao@mediatek.com> (cherry picked from commit ba50a5f9d3254520dda3a70db87a35401e4e14ac)
* display: fbconfig use after freeQinglong Chai2017-09-161-0/+6
| | | | | | | | | | | [Detail] add mutex protect list_add and list_del to avoid use after free Change-Id: Ic7d02a5b97955eaee4d3684e13a4a67f3349b42b Signed-off-by: Qinglong Chai <qinglong.chai@mediatek.com> CR-Id: ALPS03275524 Feature: disp
* Revert to dynamic sync control v1.5Mister Oyster2017-09-154-144/+182
| | | | | This reverts commit 6e4b043748ce83d6c7ba6dbf9ba50bd857d659d6. This reverts commit 1abef3b2cf1b835192ba2484e42f4a1dbba26807.
* ANDROID: fiq_debugger: Fix minor bug in codeGreg Kaiser2017-09-141-1/+1
| | | | | | | | | | | We fix a typo in the code which had us comparing a pointer instead of the value which was being pointed to. This turns out to be a relatively benign bug, as we'd incorrectly pass in the empty string instead of NULL to the function, and the function can handle both. But we fix it so the code is clearly doing what we intend. Signed-off-by: Greg Kaiser <gkaiser@google.com> Change-Id: Ib059819775a3bebca357d4ce684be779853156e3
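A pointer-versus-pointee mix-up of this kind might look like the sketch below. The command format ("reboot" followed by an optional argument) and both function names are assumptions for illustration; the commit message does not show the affected code. The buggy variant tests the pointer, which is never NULL at that point, so an empty argument is forwarded as an empty string rather than NULL.

```c
#include <stddef.h>
#include <string.h>

/* Skip the hypothetical "reboot" keyword and any following spaces. */
static const char *skip_keyword(const char *cmd)
{
    cmd += strlen("reboot");
    while (*cmd == ' ')
        cmd++;
    return cmd;
}

/* Buggy shape: compares the pointer itself, which is always non-NULL. */
static const char *reboot_arg_buggy(const char *cmd)
{
    cmd = skip_keyword(cmd);
    if (cmd != NULL)
        return cmd;       /* may be "", never NULL */
    return NULL;
}

/* Fixed shape: compares the character being pointed to. */
static const char *reboot_arg_fixed(const char *cmd)
{
    cmd = skip_keyword(cmd);
    if (*cmd != '\0')
        return cmd;
    return NULL;          /* no argument given */
}
```

Both variants behave the same whenever an argument is present; they differ only for a bare command, which matches the commit's description of the bug as relatively benign.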
* drivers: cpufreq: checks to avoid kernel crash in cpufreq_interactivegaurav jindal2017-09-141-1/+2
| | | | | | | | | | | | In cpufreq_governor_interactive, driver throws warning with WARN_ON for !tunables and event != CPUFREQ_GOV_POLICY_INIT. In case when tunables is NULL for event other than CPUFREQ_GOV_POLICY_INIT, kernel will crash as there is no safe check available before accessing tunables. So to handle such case and avoid the kernel crash, return -EINVAL if WARN_ON returns TRUE. Change-Id: I7a3a22d58e3c8a315a1cc1d31143649dc8807dee Signed-off-by: gaurav jindal <gauravjindal1104@gmail.com>
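The guard described above, bailing out with -EINVAL when the warning condition holds, can be modelled in userspace as follows. The `WARN_ON` stand-in, the event codes, and the `tunables` struct are simplified assumptions; the real macro also dumps a stack trace and the real governor has many more events.

```c
#include <errno.h>
#include <stdio.h>

/* Userspace stand-in for WARN_ON(): log and hand back the condition. */
static int warn_on(int cond, const char *expr)
{
    if (cond)
        fprintf(stderr, "WARNING: %s\n", expr);
    return cond;
}
#define WARN_ON(cond) warn_on(!!(cond), #cond)

/* Hypothetical event codes and tunables for this sketch. */
enum { GOV_POLICY_INIT = 1, GOV_START = 2 };

struct tunables {
    int hispeed_freq;
};

static int governor_event(struct tunables *tunables, int event)
{
    /* Warn and refuse, rather than warn and then dereference NULL. */
    if (WARN_ON(!tunables && event != GOV_POLICY_INIT))
        return -EINVAL;

    if (event == GOV_POLICY_INIT)
        return 0;

    return tunables->hispeed_freq;  /* safe: tunables is non-NULL here */
}
```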
* lib: vsprintf: whitelist stack tracesDave Weinstein2017-09-143-4/+4
| | | | | | | | | Use the %pP functionality to explicitly allow kernel pointers to be logged for stack traces BUG: 30368199 Change-Id: I495915465565293e9e4da5aa28fbd1d14538d99b Signed-off-by: Dave Weinstein <olorin@google.com>
* ion: Fix permissions on source fileAlex Naidis2017-09-141-0/+0
| | | | | | | | | | | The source file ion.c should not be executable. This patch resets the permissions to "644". Signed-off-by: Alex Naidis <alex.naidis@linux.com> Signed-off-by: Francisco Franco <franciscofranco.1990@gmail.com>
* usb: gadget: f_fs: Increase EP_ALLOC ioctl numberJerry Zhang2017-09-141-1/+1
| | | | | | | | | Prevent conflict with possible new upstream ioctls before it itself is upstreamed. Test: None Change-Id: I10cbc01c25f920a626ea7559e8ca80ee08865333 Signed-off-by: Jerry Zhang <zhangjerry@google.com>
* usb: gadget: f_fs: Add ioctl for allocating endpoint buffers.Jerry Zhang2017-09-142-6/+46
| | | | | | | | | | | This creates an ioctl named FUNCTIONFS_ENDPOINT_ALLOC which will preallocate buffers for a given size. Any reads/writes on that endpoint below that size will use those buffers instead of allocating their own. If the endpoint is not active, the buffer will not be allocated until it becomes active. Change-Id: I4da517620ed913161ea9e21a31f6b92c9a012b44 Signed-off-by: Jerry Zhang <zhangjerry@google.com>
* usb: gadget: f_fs: add ioctl returning ep descriptorRobert Baldyga2017-09-142-0/+29
| | | | | | | | | | | | This patch introduces ioctl named FUNCTIONFS_ENDPOINT_DESC, which returns endpoint descriptor to userspace. It works only if function is active. Signed-off-by: Robert Baldyga <r.baldyga@samsung.com> Acked-by: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Jerry Zhang <zhangjerry@google.com> Change-Id: I55987bf0c6744327f7763b567b5a2b39c50d18e6
* mm: Fix incorrect type conversion for size during dma allocationRohit Vaswani2017-09-141-1/+1
| This was found during userspace fuzzing test when a large size allocation is made from ion
|
|   [<ffffffc00008a098>] show_stack+0x10/0x1c
|   [<ffffffc00119c390>] dump_stack+0x74/0xc8
|   [<ffffffc00020d9a0>] kasan_report_error+0x2b0/0x408
|   [<ffffffc00020dbd4>] kasan_report+0x34/0x40
|   [<ffffffc00020cfec>] __asan_storeN+0x15c/0x168
|   [<ffffffc00020d228>] memset+0x20/0x44
|   [<ffffffc00009b730>] __dma_alloc_coherent+0x114/0x18c
|   [<ffffffc00009c6e8>] __dma_alloc_noncoherent+0xbc/0x19c
|   [<ffffffc000c2b3e0>] ion_cma_allocate+0x178/0x2f0
|   [<ffffffc000c2b750>] ion_secure_cma_allocate+0xdc/0x190
|   [<ffffffc000c250dc>] ion_alloc+0x264/0xb88
|   [<ffffffc000c25e94>] ion_ioctl+0x1f4/0x480
|   [<ffffffc00022f650>] do_vfs_ioctl+0x67c/0x764
|   [<ffffffc00022f790>] SyS_ioctl+0x58/0x8c
|
| Change-Id: Idc9c19977a8cc62c7d092f689d30368704b400bc
| Signed-off-by: Rohit Vaswani <rvaswani@codeaurora.org>
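The commit message does not say which types were involved, so the following is only a generic illustration of how a size narrowed to 32 bits can make a page count disagree with the real request. Both function names and the chosen widths are assumptions.

```c
#include <stdint.h>

/* Buggy shape: the size is narrowed to 32 bits before the page count. */
static uint64_t pages_narrowed(uint64_t size, uint64_t page_size)
{
    uint32_t len = (uint32_t)size;   /* high bits are silently dropped */

    return (len + page_size - 1) / page_size;
}

/* Fixed shape: the full-width size is used throughout. */
static uint64_t pages_full(uint64_t size, uint64_t page_size)
{
    return (size + page_size - 1) / page_size;
}
```

For a request of 4 GiB plus one 4096-byte page, the narrowed version computes a single page while the full-width version computes 1048577. A later memset over the full size would then run past a one-page buffer, the kind of out-of-bounds store that KASAN reports in the trace above.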
* fs: f2fs: remove duplicate lib llist helper from f2fs since it was merged https://github.com/Moyster/android_kernel_m2note/commit/cb0f38652f775a21e09939cac3031ffe7417c563  Mister Oyster  2017-09-13  1  -22/+0
|
* ANDROID: sdcardfs: Add missing breakDaniel Rosenberg2017-09-131-0/+1
| | | | | | Signed-off-by: Daniel Rosenberg <drosen@google.com> Bug: 63245673 Change-Id: I5fc596420301045895e5a9a7e297fd05434babf9
* ANDROID: Sdcardfs: Move gid derivation under flagDaniel Rosenberg2017-09-135-4/+20
| | | | | | | | | This moves the code to adjust the gid/uid of lower filesystem files under the mount flag derive_gid. Signed-off-by: Daniel Rosenberg <drosen@google.com> Change-Id: I44eaad4ef67c7fcfda3b6ea3502afab94442610c Bug: 63245673
* ANDROID: mnt: Fix freeing of mount dataDaniel Rosenberg2017-09-131-3/+1
| | | | | | | | | Fix double free on error paths Signed-off-by: Daniel Rosenberg <drosen@google.com> Change-Id: I1c25a175e87e5dd5cafcdcf9d78bf4c0dc3f88ef Bug: 65386954 Fixes: aa6d3ace42f9 ("mnt: Add filesystem private data to mount points")
* Bluetooth: Properly check L2CAP config option output buffer lengthBen Seri2017-09-131-37/+43
| | | | | | | | | | Validate the output buffer length for L2CAP config requests and responses to avoid overflowing the stack buffer used for building the option blocks. Cc: stable@vger.kernel.org Signed-off-by: Ben Seri <ben@armis.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
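The idea of validating the output buffer length before appending an option can be sketched like this. The two-byte header layout, the function name, and its signature are assumptions for illustration and do not reproduce the Bluetooth stack's own helper.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OPT_HDR_SIZE 2   /* hypothetical header: one type byte, one length byte */

/*
 * Append one option to buf if it fits in the remaining space.
 * Returns the number of bytes written, or 0 if the option does not fit,
 * in which case buf is left untouched.
 */
static size_t add_conf_opt(uint8_t *buf, size_t remaining,
                           uint8_t type, const void *val, uint8_t len)
{
    if (remaining < (size_t)OPT_HDR_SIZE + len)
        return 0;

    buf[0] = type;
    buf[1] = len;
    memcpy(buf + OPT_HDR_SIZE, val, len);
    return (size_t)OPT_HDR_SIZE + len;
}
```

The caller subtracts each return value from its remaining-space counter, so a sequence of options can never write past the end of a fixed-size stack buffer.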
* llists: move llist_reverse_order from raid5 to llist.cChristoph Hellwig2017-09-112-0/+24
| | | | | | | | | | | | | | | | | Make this useful helper available for other users. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Jens Axboe <axboe@kernel.dk> Cc: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Conflicts: drivers/md/raid5.c Change-Id: Ibfc31e7289ffe9bda511c88543bc2deb70a4691b Signed-off-by: Francisco Franco <franciscofranco.1990@gmail.com>
* perf/core: Fix group {cpu,task} validationMark Rutland2017-09-111-20/+19
| commit 64aee2a965cf2954a038b5522f11d2cd2f0f8f3e upstream.
|
| Regardless of which events form a group, it does not make sense for the events to target different tasks and/or CPUs, as this leaves the group inconsistent and impossible to schedule. The core perf code assumes that these are consistent across (successfully initialised) groups.
|
| Core perf code only verifies this when moving SW events into a HW context. Thus, we can violate this requirement for pure SW groups and pure HW groups, unless the relevant PMU driver happens to perform this verification itself. These mismatched groups subsequently wreak havoc elsewhere.
|
| For example, we handle watchpoints as SW events, and reserve watchpoint HW on a per-CPU basis at pmu::event_init() time to ensure that any event that is initialised is guaranteed to have a slot at pmu::add() time. However, the core code only checks the group leader's cpu filter (via event_filter_match()), and can thus install follower events onto CPUs violating their (mismatched) CPU filters, potentially installing them into a CPU without sufficient reserved slots.
|
| This can be triggered with the below test case, resulting in warnings from arch backends.
|
|   #define _GNU_SOURCE
|   #include <linux/hw_breakpoint.h>
|   #include <linux/perf_event.h>
|   #include <sched.h>
|   #include <stdio.h>
|   #include <sys/prctl.h>
|   #include <sys/syscall.h>
|   #include <unistd.h>
|
|   static int perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
|                              int group_fd, unsigned long flags)
|   {
|       return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
|   }
|
|   char watched_char;
|
|   struct perf_event_attr wp_attr = {
|       .type = PERF_TYPE_BREAKPOINT,
|       .bp_type = HW_BREAKPOINT_RW,
|       .bp_addr = (unsigned long)&watched_char,
|       .bp_len = 1,
|       .size = sizeof(wp_attr),
|   };
|
|   int main(int argc, char *argv[])
|   {
|       int leader, ret;
|       cpu_set_t cpus;
|
|       /*
|        * Force use of CPU0 to ensure our CPU0-bound events get scheduled.
|        */
|       CPU_ZERO(&cpus);
|       CPU_SET(0, &cpus);
|       ret = sched_setaffinity(0, sizeof(cpus), &cpus);
|       if (ret) {
|           printf("Unable to set cpu affinity\n");
|           return 1;
|       }
|
|       /* open leader event, bound to this task, CPU0 only */
|       leader = perf_event_open(&wp_attr, 0, 0, -1, 0);
|       if (leader < 0) {
|           printf("Couldn't open leader: %d\n", leader);
|           return 1;
|       }
|
|       /*
|        * Open a follower event that is bound to the same task, but a
|        * different CPU. This means that the group should never be possible to
|        * schedule.
|        */
|       ret = perf_event_open(&wp_attr, 0, 1, leader, 0);
|       if (ret < 0) {
|           printf("Couldn't open mismatched follower: %d\n", ret);
|           return 1;
|       } else {
|           printf("Opened leader/follower with mismatched CPUs\n");
|       }
|
|       /*
|        * Open as many independent events as we can, all bound to the same
|        * task, CPU0 only.
|        */
|       do {
|           ret = perf_event_open(&wp_attr, 0, 0, -1, 0);
|       } while (ret >= 0);
|
|       /*
|        * Force enable/disable all events to trigger the erroneous
|        * installation of the follower event.
|        */
|       printf("Opened all events. Toggling..\n");
|       for (;;) {
|           prctl(PR_TASK_PERF_EVENTS_DISABLE, 0, 0, 0, 0);
|           prctl(PR_TASK_PERF_EVENTS_ENABLE, 0, 0, 0, 0);
|       }
|
|       return 0;
|   }
|
| Fix this by validating this requirement regardless of whether we're moving events.
|
| Signed-off-by: Mark Rutland <mark.rutland@arm.com>
| Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
| Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
| Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
| Cc: Linus Torvalds <torvalds@linux-foundation.org>
| Cc: Peter Zijlstra <peterz@infradead.org>
| Cc: Thomas Gleixner <tglx@linutronix.de>
| Cc: Zhou Chengming <zhouchengming1@huawei.com>
| Link: http://lkml.kernel.org/r/1498142498-15758-1-git-send-email-mark.rutland@arm.com
| Signed-off-by: Ingo Molnar <mingo@kernel.org>
| Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| Signed-off-by: Francisco Franco <franciscofranco.1990@gmail.com>
* mm: Fix incorrect type conversion for size during dma allocationMaggie White2017-09-112-5/+5
| This was found during userspace fuzzing test when a large size allocation is made from ion
|
|   [<ffffffc00008a098>] show_stack+0x10/0x1c
|   [<ffffffc00119c390>] dump_stack+0x74/0xc8
|   [<ffffffc00020d9a0>] kasan_report_error+0x2b0/0x408
|   [<ffffffc00020dbd4>] kasan_report+0x34/0x40
|   [<ffffffc00020cfec>] __asan_storeN+0x15c/0x168
|   [<ffffffc00020d228>] memset+0x20/0x44
|   [<ffffffc00009b730>] __dma_alloc_coherent+0x114/0x18c
|   [<ffffffc00009c6e8>] __dma_alloc_noncoherent+0xbc/0x19c
|   [<ffffffc000c2b3e0>] ion_cma_allocate+0x178/0x2f0
|   [<ffffffc000c2b750>] ion_secure_cma_allocate+0xdc/0x190
|   [<ffffffc000c250dc>] ion_alloc+0x264/0xb88
|   [<ffffffc000c25e94>] ion_ioctl+0x1f4/0x480
|   [<ffffffc00022f650>] do_vfs_ioctl+0x67c/0x764
|   [<ffffffc00022f790>] SyS_ioctl+0x58/0x8c
|
| Bug: 38195738
| Signed-off-by: Rohit Vaswani <rvaswani@codeaurora.org>
| Signed-off-by: Maggie White <maggiewhite@google.com>
| Change-Id: I6b1a0a3eaec10500cd4e73290efad4023bc83da5
* ALSA: core: Fix unexpected error at replacing user TLVTakashi Iwai2017-09-111-1/+1
| | | | | | | | | | | | | | | | | | commit 88c54cdf61f508ebcf8da2d819f5dfc03e954d1d upstream. When user tries to replace the user-defined control TLV, the kernel checks the change of its content via memcmp(). The problem is that the kernel passes the return value from memcmp() as is. memcmp() gives a non-zero negative value depending on the comparison result, and this shall be recognized as an error code. The patch covers that corner-case, return 1 properly for the changed TLV. Fixes: 8aa9b586e420 ("[ALSA] Control API - more robust TLV implementation") Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Francisco Franco <franciscofranco.1990@gmail.com>
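The corner case described above comes from returning `memcmp()`'s result directly, since a negative result looks like an error code to the caller. A minimal sketch of the normalisation, with a hypothetical function name, is:

```c
#include <string.h>

/*
 * Returns 1 if the new TLV differs from the stored one, 0 if identical.
 * memcmp() may return any negative or positive value for "different",
 * so the result is collapsed to 0/1 and negative values stay reserved
 * for real error codes.
 */
static int tlv_changed(const void *old_tlv, const void *new_tlv, size_t size)
{
    return memcmp(old_tlv, new_tlv, size) != 0;
}
```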
* cifs: Fix df output for users with quota limitsSachin Prabhu2017-09-111-2/+2
| commit 42bec214d8bd432be6d32a1acb0a9079ecd4d142 upstream.
|
| The df for a SMB2 share triggers a GetInfo call for FS_FULL_SIZE_INFORMATION. The values returned are used to populate struct statfs.
|
| The problem is that none of the information returned by the call contains the total blocks available on the filesystem. Instead we use the blocks available to the user, i.e. the quota limitation, when filling out statfs.f_blocks. The information returned does contain the actual free units on the filesystem, and this is used to populate statfs.f_bfree. For users with quota enabled, this can lead to situations where the total free space reported is more than the total blocks on the system, ending up with df reports like the following:
|
|   # df -h /mnt/a
|   Filesystem         Size  Used Avail Use% Mounted on
|   //192.168.22.10/a  2.5G -2.3G  2.5G    - /mnt/a
|
| To fix this problem, we instead populate statfs.f_bfree with the same value as statfs.f_bavail, i.e. CallerAvailableAllocationUnits. This is similar to what is done already in the code for cifs, and df now reports the quota information for the user used to mount the share.
|
|   # df --si /mnt/a
|   Filesystem         Size  Used Avail Use% Mounted on
|   //192.168.22.10/a  2.7G  101M  2.6G   4% /mnt/a
|
| Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
| Signed-off-by: Pierguido Lambri <plambri@redhat.com>
| Signed-off-by: Steve French <smfrench@gmail.com>
| Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| Signed-off-by: Francisco Franco <franciscofranco.1990@gmail.com>
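The negative "Used" column follows from simple arithmetic: df derives used space as total minus free, and here the total was quota-limited while the free count was filesystem-wide. The sketch below models that with hypothetical struct and field names; only `f_blocks`, `f_bfree`, `f_bavail` and the idea of caller-available units come from the commit message.

```c
#include <stdint.h>

/* Hypothetical subset of the server's size reply, in allocation units. */
struct fs_size_info {
    uint64_t total_units;             /* quota-limited total for this user */
    uint64_t caller_available_units;  /* free units available to this user */
    uint64_t actual_available_units;  /* free units on the whole filesystem */
};

struct statfs_out {
    uint64_t f_blocks, f_bfree, f_bavail;
};

/* Fixed mapping: f_bfree and f_bavail both use the caller-available count. */
static void fill_statfs(const struct fs_size_info *in, struct statfs_out *out)
{
    out->f_blocks = in->total_units;
    out->f_bfree  = in->caller_available_units;  /* previously the filesystem-wide count */
    out->f_bavail = in->caller_available_units;
}

/* "Used" the way df computes it: total minus free. */
static int64_t df_used(const struct statfs_out *s)
{
    return (int64_t)s->f_blocks - (int64_t)s->f_bfree;
}
```

With a quota of 1000 units, 900 of them free for the user, and 5000 free on the whole filesystem, the old mapping would give 1000 - 5000 = -4000 used units, while the fixed mapping gives 100.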
* hrtimer: Prevent enqueue of hrtimer on dead CPUMichael Bohan2017-09-111-1/+2
| Date: Wed, 10 Apr 2013 14:07:48 -0700
|
| When switching the hrtimer cpu_base, we briefly allow for preemption to become enabled by unlocking the cpu_base lock. During this time, the CPU corresponding to the new cpu_base that was selected may in fact go offline. In this scenario, the hrtimer is enqueued to a CPU that's not online, and therefore it never fires.
|
| Consider this example:
|
|   CPU #0                          CPU #1
|   ----                            ----
|   ...                             hrtimer_start()
|                                    lock_hrtimer_base()
|                                    switch_hrtimer_base()
|                                     cpu = hrtimer_get_target() -> 1
|                                     spin_unlock(&cpu_base->lock)
|   <migrate thread to CPU #0>
|                                   <offline>
|   spin_lock(&new_base->lock)
|   this_cpu = 0
|   cpu != this_cpu
|   enqueue_hrtimer(cpu_base #1)
|
| To prevent this scenario, verify that the CPU corresponding to the new cpu_base is indeed online before selecting it in switch_hrtimer_base(). If it's not online, fall back to using the base of the current CPU.
|
| Change-Id: I3aaf5b806a25d5a8b96d6ccea7a042d2718091f7
| Signed-off-by: Michael Bohan <mbohan@codeaurora.org>
* hrtimer: Consider preemption when migrating hrtimer cpu_basesMichael Bohan2017-09-111-1/+3
| Date: Wed, 10 Apr 2013 14:07:47 -0700
|
| When switching to a new cpu_base in switch_hrtimer_base(), we briefly enable preemption by unlocking the cpu_base lock in two places. During this interval it's possible for the running thread to be swapped to a different CPU.
|
| Consider the following example:
|
|   CPU #0                              CPU #1
|   ----                                ----
|   hrtimer_start()                     ...
|    lock_hrtimer_base()
|    switch_hrtimer_base()
|     this_cpu = 0;
|     target_cpu_base = 0;
|     raw_spin_unlock(&cpu_base->lock)
|   <migrate to CPU 1>
|   ...                                 this_cpu == 0
|                                       cpu == this_cpu
|                                       timer->base = CPU #0
|                                       timer->base != LOCAL_CPU
|
| Since the cached this_cpu is no longer accurate, we'll skip the hrtimer_check_target() check. Once we eventually go to program the hardware, we'll decide not to do so since it knows the real CPU that we're running on is not the same as the chosen base. As a consequence, we may end up missing the hrtimer's deadline.
|
| Fix this by updating the local CPU number each time we retake a cpu_base lock in switch_hrtimer_base().
|
| Another possibility is to disable preemption across the whole of switch_hrtimer_base. This looks suboptimal since preemption would be disabled while waiting for lock(s).
|
| Change-Id: I46be5cf3069f8e6683ad8fff0841949bdb801684
| Signed-off-by: Michael Bohan <mbohan@codeaurora.org>
* sched: Micro-optimize the smart wake-affine logicPeter Zijlstra2017-09-113-2/+8
| | | | | | | | | | | | | | | | | | | | | Smart wake-affine is using node-size as the factor currently, but the overhead of the mask operation is high. Thus, this patch introduce the 'sd_llc_size' percpu variable, which will record the highest cache-share domain size, and make it to be the new factor, in order to reduce the overhead and make it more reasonable. Tested-by: Davidlohr Bueso <davidlohr.bueso@hp.com> Tested-by: Michael Wang <wangyun@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Michael Wang <wangyun@linux.vnet.ibm.com> Cc: Mike Galbraith <efault@gmx.de> Link: http://lkml.kernel.org/r/51D5008E.6030102@linux.vnet.ibm.com [ Tidied up the changelog. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Change-Id: I8fb3ceac1e6944db078932172c99d544a4e304e4 Signed-off-by: Paul Reioux <reioux@gmail.com>
* sched: Implement smarter wake-affine logicMichael Wang2017-09-112-0/+50
| The wake-affine scheduler feature is currently always trying to pull the wakee close to the waker. In theory this should be beneficial if the waker's CPU caches hot data for the wakee, and it's also beneficial in the extreme ping-pong high context switch rate case.
|
| Testing shows it can benefit hackbench up to 15%.
|
| However, the feature is somewhat blind, from which some workloads such as pgbench suffer. It's also time-consuming algorithmically.
|
| Testing shows it can damage pgbench up to 50% - far more than the benefit it brings in the best case.
|
| So wake-affine should be smarter and it should realize when to stop its thankless effort at trying to find a suitable CPU to wake on.
|
| This patch introduces 'wakee_flips', which will be increased each time the task flips (switches) its wakee target.
|
| So a high 'wakee_flips' value means the task has more than one wakee, and the bigger the number, the higher the wakeup frequency.
|
| Now when making the decision on whether to pull or not, pay attention to the wakee with a high 'wakee_flips': pulling such a task may benefit the wakee. It also implies that the waker will face cruel competition later; it could be very cruel or very fast depending on the story behind 'wakee_flips', and the waker therefore suffers.
|
| Furthermore, if the waker also has a high 'wakee_flips', that implies that multiple tasks rely on it, so the waker's higher latency will damage all of them, and pulling the wakee seems to be a bad deal.
|
| Thus, when 'waker->wakee_flips / wakee->wakee_flips' becomes higher and higher, the cost of pulling seems to be worse and worse.
|
| The patch therefore helps the wake-affine feature to stop its pulling work when:
|
|   wakee->wakee_flips > factor &&
|   waker->wakee_flips > (factor * wakee->wakee_flips)
|
| The 'factor' here is the number of CPUs in the current CPU's NUMA node, so a bigger node will lead to more pulling since the trial becomes more severe.
|
| After applying the patch, pgbench shows up to 40% improvements and no regressions.
|
| Tested with 12 cpu x86 server and tip 3.10.0-rc7.
|
| The percentages in the final column highlight the areas with the biggest wins, all other areas improved as well:
|
|   pgbench                base            smart
|
|   | db_size | clients |  tps  |       |  tps  |
|   +---------+---------+-------+       +-------+
|   | 22 MB   |       1 | 10598 |       | 10796 |
|   | 22 MB   |       2 | 21257 |       | 21336 |
|   | 22 MB   |       4 | 41386 |       | 41622 |
|   | 22 MB   |       8 | 51253 |       | 57932 |
|   | 22 MB   |      12 | 48570 |       | 54000 |
|   | 22 MB   |      16 | 46748 |       | 55982 | +19.75%
|   | 22 MB   |      24 | 44346 |       | 55847 | +25.93%
|   | 22 MB   |      32 | 43460 |       | 54614 | +25.66%
|   | 7484 MB |       1 |  8951 |       |  9193 |
|   | 7484 MB |       2 | 19233 |       | 19240 |
|   | 7484 MB |       4 | 37239 |       | 37302 |
|   | 7484 MB |       8 | 46087 |       | 50018 |
|   | 7484 MB |      12 | 42054 |       | 48763 |
|   | 7484 MB |      16 | 40765 |       | 51633 | +26.66%
|   | 7484 MB |      24 | 37651 |       | 52377 | +39.11%
|   | 7484 MB |      32 | 37056 |       | 51108 | +37.92%
|   | 15 GB   |       1 |  8845 |       |  9104 |
|   | 15 GB   |       2 | 19094 |       | 19162 |
|   | 15 GB   |       4 | 36979 |       | 36983 |
|   | 15 GB   |       8 | 46087 |       | 49977 |
|   | 15 GB   |      12 | 41901 |       | 48591 |
|   | 15 GB   |      16 | 40147 |       | 50651 | +26.16%
|   | 15 GB   |      24 | 37250 |       | 52365 | +40.58%
|   | 15 GB   |      32 | 36470 |       | 50015 | +37.14%
|
| Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
| Cc: Mike Galbraith <efault@gmx.de>
| Signed-off-by: Peter Zijlstra <peterz@infradead.org>
| Link: http://lkml.kernel.org/r/51D50057.9000809@linux.vnet.ibm.com
| [ Improved the changelog. ]
| Signed-off-by: Ingo Molnar <mingo@kernel.org>
| Change-Id: I70018a00435ea795121b70a576d74bbbd00b7464
| Signed-off-by: Paul Reioux <reioux@gmail.com>
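The stop condition quoted in this commit message translates directly into a small predicate. The sketch below is a userspace model: the `task` struct is reduced to the one counter that matters, and the function name is chosen here for illustration rather than taken from the message.

```c
/* Reduced task model: only the flip counter from the commit message. */
struct task {
    unsigned int wakee_flips;
};

/*
 * Returns 1 when wake-affine should stop pulling the wakee toward the
 * waker, 0 when pulling is still worth trying. 'factor' is the number
 * of CPUs in the current CPU's NUMA node, as the message describes.
 */
static int should_stop_pulling(const struct task *waker,
                               const struct task *wakee,
                               unsigned int factor)
{
    return wakee->wakee_flips > factor &&
           waker->wakee_flips > factor * wakee->wakee_flips;
}
```

With a factor of 4, a waker at 100 flips and a wakee at 5 flips trips the condition (5 > 4 and 100 > 20), while a wakee at 3 flips does not, and neither does a waker at only 10 flips.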
* sched: reinitialize rq->next_balance when a CPU is hot-addedPaul Walmsley2017-09-111-0/+1
| Reinitialize rq->next_balance when a CPU is hot-added. Otherwise, scheduler domain rebalancing may be skipped if rq->next_balance was set to a future time when the CPU was last active, and the newly-re-added CPU is in idle_balance(). As a result, the newly-re-added CPU will remain idle with no tasks scheduled until the softlockup watchdog runs - potentially 4 seconds later. This can waste energy and reduce performance.
|
| This behavior can be observed in some SoC kernels, which use CPU hotplug to dynamically remove and add CPUs in response to load. In one case that triggered this behavior,
|
|   0. the system started with all cores enabled, running multi-threaded CPU-bound code;
|   1. the system entered some single-threaded code;
|   2. a CPU went idle and was hot-removed;
|   3. the system started executing a multi-threaded CPU-bound task;
|   4. the CPU from event 2 was re-added, to respond to the load.
|
| The time interval between events 2 and 4 was approximately 300 milliseconds.
|
| Of course, ideally CPU hotplug would not be used in this manner, but this patch does appear to fix a real bug.
|
| Nvidia folks: this patch is submitted as at least a partial fix for bug 1243368 ("[sched] Load-balancing not happening correctly after cores brought online")
|
| Change-Id: Iabac21e110402bb581b7db40c42babc951d378d0
| Signed-off-by: Paul Walmsley <pwalmsley@nvidia.com>
| Cc: Peter Boonstoppel <pboonstoppel@nvidia.com>
| Reviewed-on: http://git-master/r/208927
| Reviewed-by: Automatic_Commit_Validation_User
| Reviewed-by: Peter Zu <pzu@nvidia.com>
| Reviewed-by: Peter Boonstoppel <pboonstoppel@nvidia.com>
| GVS: Gerrit_Virtual_Submit
| Tested-by: Peter Zu <pzu@nvidia.com>
| Reviewed-by: Yu-Huan Hsu <yhsu@nvidia.com>
| Signed-off-by: franciscofranco <franciscofranco.1990@gmail.com>
* PM / suspend: Remove unnecessary !!Viresh Kumar2017-09-111-1/+1
| Double ! (!!) is normally required to get 0 or 1 out of an expression. A comparison always returns 0 or 1, so there is no need to apply !! to it again.
|
| Change-Id: Ia705389240381dd5cb04b81566021ea29af6b1d0
| Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
| Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
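The point about `!!` can be checked with a few lines of C. The function names below are made up for this sketch; it shows that `!!` changes nothing on a comparison but still matters for an arbitrary integer.

```c
/* A comparison already has type int and value 0 or 1 in C. */
static int is_state_plain(int state, int wanted)
{
    return state == wanted;
}

/* Redundant form: !! applied on top of a comparison. */
static int is_state_double_bang(int state, int wanted)
{
    return !!(state == wanted);
}

/* Where !! is still useful: collapsing any non-zero value to 1. */
static int to_bool(int flags)
{
    return !!flags;
}
```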
* fbdev: core: Initialise structure to prevent kernel information leakKrishna Manikandan2017-09-081-0/+7
| | | | | | | | The structure fix is initialised before its usage to prevent kernel information leak during copy_to_user. Change-Id: Ice4da4c9bd6095a4387e1d16cb20ca474accb9dc Signed-off-by: Krishna Manikandan <mkrishn@codeaurora.org>
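The usual way to avoid leaking stale stack bytes through a struct copied to userspace is to clear the whole struct before filling it. The sketch below is a userspace model with a hypothetical struct layout and function name; in the kernel the filled struct would then be handed to copy_to_user().

```c
#include <string.h>

/* Hypothetical struct; most ABIs insert padding after 'type'. */
struct fix_info {
    char id[16];
    char type;
    unsigned long smem_start;
    unsigned int line_length;
};

static void fill_fix_info(struct fix_info *fix, unsigned long start,
                          unsigned int pitch)
{
    /* Clear every byte first, including padding and fields not set below. */
    memset(fix, 0, sizeof(*fix));

    memcpy(fix->id, "demo-fb", sizeof("demo-fb"));
    fix->smem_start = start;
    fix->line_length = pitch;
}
```

Fields that the function never assigns, such as `type` here, read back as zero rather than as whatever happened to be on the stack.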
* FROMLIST: f2fs: introduce discard_granularity sysfs entryChao Yu2017-09-084-15/+123
| (url: https://patchwork.kernel.org/patch/9876921/)
|
| Commit d618ebaf0aa8 ("f2fs: enable small discard by default") enables f2fs to issue 4K size discard in real-time discard mode. However, issuing smaller discards may cost more lifetime while releasing less free space in the flash device. Since f2fs has the ability to separate hot/cold data and to garbage collect, we can expect that a small-sized invalid region would expand soon with OPU, deletion or garbage collection on valid data, so it's better to delay or skip issuing smaller size discards; this could help to reduce excessive consumption of IO bandwidth and lifetime of flash storage.
|
| This patch makes f2fs select 64K as its default minimal granularity, and issue discards with a size that is not smaller than the minimal granularity. It also exposes discard granularity as a sysfs entry for configuration in different scenarios.
|
| Jaegeuk Kim: We must issue all the accumulated discard commands when fstrim is called. So, I've added pend_list_tag[] to indicate whether we should issue the commands or not. If tag sets P_ACTIVE or P_TRIM, we have to issue them. P_TRIM is set once at a time, given fstrim trigger. In addition, issue_discard_thread is calling too much due to the number of discard commands remaining in the pending list. I added a timer to control it, like gc_thread.
|
| Change-Id: Ia90dd686c25cb27f144137ea3c9bcc1c943a9aea
| Signed-off-by: Chao Yu <yuchao0@huawei.com>
| Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
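The granularity rule can be reduced to a single predicate: issue a discard only if it is at least as large as the configured granularity, unless an fstrim request forces everything out. The constants and the function below are a hypothetical model of that rule, not the f2fs data structures.

```c
#include <stdbool.h>

#define BLOCK_SIZE_BYTES           4096u  /* 4K block, as in the message */
#define DEFAULT_GRANULARITY_BLOCKS 16u    /* 64K default = 16 blocks of 4K */

/*
 * Decide whether a pending discard of len_blocks should be issued now.
 * 'fstrim' models the forced case where all accumulated commands must go.
 */
static bool should_issue_discard(unsigned int len_blocks,
                                 unsigned int granularity_blocks,
                                 bool fstrim)
{
    if (fstrim)
        return true;
    return len_blocks >= granularity_blocks;
}
```

Under the default, a single 4K discard is held back in the hope that the invalid region grows, while a 64K region is issued immediately.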
* FROMLIST: f2fs: give a try to do atomic write in -ENOMEM caseJaegeuk Kim2017-09-081-2/+6
| | | | | | | | | | It'd be better to retry writing atomic pages when we get -ENOMEM. (url https://www.spinics.net/lists/linux-fsdevel/msg113844.html) Bug: 63260873 Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
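Retrying on -ENOMEM can be sketched as a bounded loop around the write. Everything below is a hypothetical userspace model: the callback type, the retry limit, and the demo writer are assumptions, and the commit message does not say how many retries the real code makes or how it waits between them.

```c
#include <errno.h>

typedef int (*write_fn)(void *ctx);

/*
 * Call do_write until it returns something other than -ENOMEM, or until
 * max_tries attempts have been made. Returns the last result.
 */
static int write_with_retry(write_fn do_write, void *ctx, int max_tries)
{
    int err = -ENOMEM;

    for (int i = 0; i < max_tries; i++) {
        err = do_write(ctx);
        if (err != -ENOMEM)
            break;
        /* A real implementation would back off or yield here. */
    }
    return err;
}

/* Demo writer: fails with -ENOMEM until the counter in ctx reaches zero. */
static int flaky_write(void *ctx)
{
    int *failures_left = ctx;

    if (*failures_left > 0) {
        (*failures_left)--;
        return -ENOMEM;
    }
    return 0;
}
```

Any other error is returned on the first occurrence, so only transient memory pressure is retried.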
* FROMLIST: f2fs: use IPU for cold filesJaegeuk Kim2017-09-081-0/+4
| (url: https://patchwork.kernel.org/patch/9886419/)
|
| We expect cold files to be written sequentially, but sometimes small pieces of data are updated, which incurs fragmentation. Let's avoid that.
|
| Change-Id: Ib4a8db92e05bc88b1c7809707078efd249421045
| Reviewed-by: Chao Yu <yuchao0@huawei.com>
| Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* FROMLIST: f2fs: make background threads of f2fs being aware of freezing [Chao Yu, 2017-09-08, 2 files, -2/+15]
|
| (url: https://patchwork.kernel.org/patch/9857935/)
|
| When ->freeze_fs is called from lvm for doing snapshot, it needs to
| make sure there will be no more changes in filesystem's data, however,
| previously, background threads like GC thread wasn't aware of freezing,
| so in environment with active background threads, data of snapshot
| becomes unstable.
|
| This patch fixes this issue by adding sb_{start,end}_intwrite in
| below background threads:
| - GC thread
| - flush thread
| - discard thread
|
| Note that, don't use sb_start_intwrite() in gc_thread_func() due to:
|
| generic/241 reports below bug:
|
|  ======================================================
|  WARNING: possible circular locking dependency detected
|  4.13.0-rc1+ #32 Tainted: G O
|  ------------------------------------------------------
|  f2fs_gc-250:0/22186 is trying to acquire lock:
|   (&sbi->gc_mutex){+.+...}, at: [<f8fa7f0b>] f2fs_sync_fs+0x7b/0x1b0 [f2fs]
|
|  but task is already holding lock:
|   (sb_internal#2){++++.-}, at: [<f8fb5609>] gc_thread_func+0x159/0x4a0 [f2fs]
|
|  which lock already depends on the new lock.
|
|  the existing dependency chain (in reverse order) is:
|
|  -> #2 (sb_internal#2){++++.-}:
|         __lock_acquire+0x405/0x7b0
|         lock_acquire+0xae/0x220
|         __sb_start_write+0x11d/0x1f0
|         f2fs_evict_inode+0x2d6/0x4e0 [f2fs]
|         evict+0xa8/0x170
|         iput+0x1fb/0x2c0
|         f2fs_sync_inode_meta+0x3f/0xf0 [f2fs]
|         write_checkpoint+0x1b1/0x750 [f2fs]
|         f2fs_sync_fs+0x85/0x1b0 [f2fs]
|         f2fs_do_sync_file.isra.24+0x137/0xa30 [f2fs]
|         f2fs_sync_file+0x34/0x40 [f2fs]
|         vfs_fsync_range+0x4a/0xa0
|         do_fsync+0x3c/0x60
|         SyS_fdatasync+0x15/0x20
|         do_fast_syscall_32+0xa1/0x1b0
|         entry_SYSENTER_32+0x4c/0x7b
|
|  -> #1 (&sbi->cp_mutex){+.+...}:
|         __lock_acquire+0x405/0x7b0
|         lock_acquire+0xae/0x220
|         __mutex_lock+0x4f/0x830
|         mutex_lock_nested+0x25/0x30
|         write_checkpoint+0x2f/0x750 [f2fs]
|         f2fs_sync_fs+0x85/0x1b0 [f2fs]
|         sync_filesystem+0x67/0x80
|         generic_shutdown_super+0x27/0x100
|         kill_block_super+0x22/0x50
|         kill_f2fs_super+0x3a/0x40 [f2fs]
|         deactivate_locked_super+0x3d/0x70
|         deactivate_super+0x40/0x60
|         cleanup_mnt+0x39/0x70
|         __cleanup_mnt+0x10/0x20
|         task_work_run+0x69/0x80
|         exit_to_usermode_loop+0x57/0x92
|         do_fast_syscall_32+0x18c/0x1b0
|         entry_SYSENTER_32+0x4c/0x7b
|
|  -> #0 (&sbi->gc_mutex){+.+...}:
|         validate_chain.isra.36+0xc50/0xdb0
|         __lock_acquire+0x405/0x7b0
|         lock_acquire+0xae/0x220
|         __mutex_lock+0x4f/0x830
|         mutex_lock_nested+0x25/0x30
|         f2fs_sync_fs+0x7b/0x1b0 [f2fs]
|         f2fs_balance_fs_bg+0xb9/0x200 [f2fs]
|         gc_thread_func+0x302/0x4a0 [f2fs]
|         kthread+0xe9/0x120
|         ret_from_fork+0x19/0x24
|
|  other info that might help us debug this:
|
|  Chain exists of:
|    &sbi->gc_mutex --> &sbi->cp_mutex --> sb_internal#2
|
|   Possible unsafe locking scenario:
|
|         CPU0                    CPU1
|         ----                    ----
|    lock(sb_internal#2);
|                                 lock(&sbi->cp_mutex);
|                                 lock(sb_internal#2);
|    lock(&sbi->gc_mutex);
|
|   *** DEADLOCK ***
|
|  1 lock held by f2fs_gc-250:0/22186:
|   #0: (sb_internal#2){++++.-}, at: [<f8fb5609>] gc_thread_func+0x159/0x4a0 [f2fs]
|
|  stack backtrace:
|  CPU: 2 PID: 22186 Comm: f2fs_gc-250:0 Tainted: G O 4.13.0-rc1+ #32
|  Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
|  Call Trace:
|   dump_stack+0x5f/0x92
|   print_circular_bug+0x1b3/0x1bd
|   validate_chain.isra.36+0xc50/0xdb0
|   ? __this_cpu_preempt_check+0xf/0x20
|   __lock_acquire+0x405/0x7b0
|   lock_acquire+0xae/0x220
|   ? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
|   __mutex_lock+0x4f/0x830
|   ? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
|   mutex_lock_nested+0x25/0x30
|   ? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
|   f2fs_sync_fs+0x7b/0x1b0 [f2fs]
|   f2fs_balance_fs_bg+0xb9/0x200 [f2fs]
|   gc_thread_func+0x302/0x4a0 [f2fs]
|   ? preempt_schedule_common+0x2f/0x4d
|   ? f2fs_gc+0x540/0x540 [f2fs]
|   kthread+0xe9/0x120
|   ? f2fs_gc+0x540/0x540 [f2fs]
|   ? kthread_create_on_node+0x30/0x30
|   ret_from_fork+0x19/0x24
|
| The deadlock occurs in below condition:
|
| GC Thread                       Thread B
| - sb_start_intwrite
|                                 - f2fs_sync_file
|                                  - f2fs_sync_fs
|                                   - mutex_lock(&sbi->gc_mutex)
|                                    - write_checkpoint
|                                     - block_operations
|                                      - f2fs_sync_inode_meta
|                                       - iput
|                                        - sb_start_intwrite
| - mutex_lock(&sbi->gc_mutex)
|
| Fix this by altering sb_start_intwrite to sb_start_write_trylock.
|
| Change-Id: Ibea8cff73d684e5aebc950f29ef4d611fc10df76
| Signed-off-by: Chao Yu <yuchao0@huawei.com>
| Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* defconfig: disable config_have_xlog_feature [Mister Oyster, 2017-09-05, 1 file, -1/+1]
|
* defconfig: disable CONFIG_MTK_COMBO_DISABLE_5G_FOR_P2P [Mister Oyster, 2017-09-05, 1 file, -1/+1]
|
* Fix CVE-2012-6703 (integer overflow in ALSA subsystem) [Tobias Tefke, 2017-09-05, 1 file, -0/+5]
|
| Change-Id: I995b152a3766ebb8faec244849d90d7d2bd5c672