| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
| |
|
|
|
|
|
|
|
|
|
| |
lowmemorykiller was not taking into account unevictable pages when
deciding what level to kill. If significant amounts of memory were
pinned, this caused lowmemorykiller to effectively stop at a much higher
level than it should.
bug 31255977
Change-Id: I763ecbfef8c56d65bb8f6147ae810692bd81b6e2
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Set TIF_MEMDIE tsk_thread flag before send kill signal to the
selected thread. This is to fit a usual code sequence and avoid
potential race issue.
Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 6bc2b856bb7c49f238914d965c0b1057ec78226e)
Change-Id: I3c4869d525ce80d339ec3742382beae2ee45f76e
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With ZRAM enabled it is observed that lowmemory killer
doesn't trigger properly. swap cached pages are
accounted in NR_FILE, and lowmemorykiller considers
this as reclaimable and adds to other_file. But these
pages can't be reclaimed unless lowmemorykiller triggers.
So subtract swap pages from other_file.
Signed-off-by: Vinayak Menon <vinayakm.list@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 058dbde928597e7a8bd04e28e77e5cfc4270591d)
Change-Id: I217e831bbe1db830e6d61c7943e442a32a7548a1
Reverts some Mediatek customisation to lmk
Signed-off-by: Mister Oyster <oysterized@gmail.com>
|
| |
|
|
|
|
|
| |
Allows for capturing lmk kill events and their rationale.
Change-Id: Ibe215db5bb9806fc550c72c0b9832c85cbd56bf6
Signed-off-by: Martijn Coenen <maco@google.com>
|
| |
|
|
| |
best it can do is crash the whole kernel when zram is used
|
| |
|
|
| |
space in /data, doing stupid kernel file manipulation
|
| | |
|
| |
|
|
|
|
|
|
|
| |
Include linux/mm.h for get_cmdline() declaration.
Change-Id: Icad6ef7deef4d93d92d423c96bfa61fb5d66d0b7
Fixes: Change-Id: I30083b757eaef8c61e55a213a883ce8d0c9cf2b1
("uid_sys_stats: log task io with a debug flag")
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
|
| |
|
|
|
|
|
|
|
| |
Add a hashmap inside each uid_entry to keep track of task name and io.
Task full name is a combination of thread and process name.
Bug: 63739275
Change-Id: I30083b757eaef8c61e55a213a883ce8d0c9cf2b1
Signed-off-by: Yang Jin <yajin@google.com>
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
| |
(from https://patchwork.kernel.org/patch/9939409/)
commit 372e3147df70 ("binder: guarantee txn complete / errors delivered
in-order") incorrectly defined a local ret value. This ret value will
be invalid when out of the if block
Change-Id: If7bd963ac7e67d135aa949133263aac27bf15d1a
Signed-off-by: Xu YiPing <xuyiping@hislicon.com>
Signed-off-by: Todd Kjos <tkjos@google.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(from https://patchwork.kernel.org/patch/9939405/)
commit 7a4408c6bd3e ("binder: make sure accesses to proc/thread are
safe") made a change to enqueue tcomplete to thread->todo before
enqueuing the transaction. However, in err_dead_proc_or_thread case,
the tcomplete is directly freed, without dequeued. It may cause the
thread->todo list to be corrupted.
So, dequeue it before freeing.
Bug: 65333488
Change-Id: Id063a4db18deaa634f4d44aa6ebca47bea32537a
Signed-off-by: Xu YiPing <xuyiping@hisilicon.com>
Signed-off-by: Todd Kjos <tkjos@google.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Add a new ioctl to binder to control whether FIFO inheritance should happen.
In particular, hwbinder should inherit FIFO priority from callers, but standard
binder threads should not.
Test: boots
bug 36516194
Signed-off-by: Tim Murray <timmurray@google.com>
Change-Id: I8100c4364b7d15d1bf00a8ca5c286e4d4b23ce85
|
| |
|
|
|
| |
Change-Id: I857ef86b2d502293fb8c37398383dceaa21dd29f
Signed-off-by: Mister Oyster <oysterized@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
| |
[Detail]
dat initialize to 0
[Solution]
Change-Id: Ib539e9624b1b8153eda8dd8f7ce55cb67052be59
CR-Id: ALPS03288635
Feature: Others
Signed-off-by: hongxu.zhao <hongxu.zhao@mediatek.com>
(cherry picked from commit ba50a5f9d3254520dda3a70db87a35401e4e14ac)
|
| |
|
|
|
|
|
|
|
|
|
| |
[Detail]
add mutex protect list_add and list_del
to avoid use after free
Change-Id: Ic7d02a5b97955eaee4d3684e13a4a67f3349b42b
Signed-off-by: Qinglong Chai <qinglong.chai@mediatek.com>
CR-Id: ALPS03275524
Feature: disp
|
| |
|
|
|
| |
This reverts commit 6e4b043748ce83d6c7ba6dbf9ba50bd857d659d6.
This reverts commit 1abef3b2cf1b835192ba2484e42f4a1dbba26807.
|
| |
|
|
|
|
|
|
|
|
|
| |
We fix a typo in the code which had us comparing a pointer instead
of the value which was being pointed to. This turns out to be
a relatively benign bug, as we'd incorrectly pass in the empty
string instead of NULL to the function, and the function can handle
both. But we fix it so the code is clearly doing what we intend.
Signed-off-by: Greg Kaiser <gkaiser@google.com>
Change-Id: Ib059819775a3bebca357d4ce684be779853156e3
|
| |
|
|
|
|
|
|
|
|
|
|
| |
In cpufreq_governor_interactive, driver throws warning with WARN_ON
for !tunables and event != CPUFREQ_GOV_POLICY_INIT.
In case when tunables is NULL for event other than
CPUFREQ_GOV_POLICY_INIT, kernel will crash as there is no safe check
available before accessing tunables. So to handle such case and avoid
the kernel crash, return -EINVAL if WARN_ON returns TRUE.
Change-Id: I7a3a22d58e3c8a315a1cc1d31143649dc8807dee
Signed-off-by: gaurav jindal <gauravjindal1104@gmail.com>
|
| |
|
|
|
|
|
|
|
| |
Use the %pP functionality to explicitly allow kernel
pointers to be logged for stack traces
BUG: 30368199
Change-Id: I495915465565293e9e4da5aa28fbd1d14538d99b
Signed-off-by: Dave Weinstein <olorin@google.com>
|
| |
|
|
|
|
|
|
|
|
|
| |
The source file ion.c should not be
executable.
This patch resets the permissions
to "644".
Signed-off-by: Alex Naidis <alex.naidis@linux.com>
Signed-off-by: Francisco Franco <franciscofranco.1990@gmail.com>
|
| |
|
|
|
|
|
|
|
| |
Prevent conflict with possible new upstream ioctls
before it itself is upstreamed.
Test: None
Change-Id: I10cbc01c25f920a626ea7559e8ca80ee08865333
Signed-off-by: Jerry Zhang <zhangjerry@google.com>
|
| |
|
|
|
|
|
|
|
|
|
| |
This creates an ioctl named FUNCTIONFS_ENDPOINT_ALLOC which will
preallocate buffers for a given size. Any reads/writes on that
endpoint below that size will use those buffers instead of allocating
their own. If the endpoint is not active, the buffer will not be
allocated until it becomes active.
Change-Id: I4da517620ed913161ea9e21a31f6b92c9a012b44
Signed-off-by: Jerry Zhang <zhangjerry@google.com>
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This patch introduces ioctl named FUNCTIONFS_ENDPOINT_DESC, which
returns endpoint descriptor to userspace. It works only if function
is active.
Signed-off-by: Robert Baldyga <r.baldyga@samsung.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Jerry Zhang <zhangjerry@google.com>
Change-Id: I55987bf0c6744327f7763b567b5a2b39c50d18e6
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was found during userspace fuzzing test when a large size
allocation is made from ion
[<ffffffc00008a098>] show_stack+0x10/0x1c
[<ffffffc00119c390>] dump_stack+0x74/0xc8
[<ffffffc00020d9a0>] kasan_report_error+0x2b0/0x408
[<ffffffc00020dbd4>] kasan_report+0x34/0x40
[<ffffffc00020cfec>] __asan_storeN+0x15c/0x168
[<ffffffc00020d228>] memset+0x20/0x44
[<ffffffc00009b730>] __dma_alloc_coherent+0x114/0x18c
[<ffffffc00009c6e8>] __dma_alloc_noncoherent+0xbc/0x19c
[<ffffffc000c2b3e0>] ion_cma_allocate+0x178/0x2f0
[<ffffffc000c2b750>] ion_secure_cma_allocate+0xdc/0x190
[<ffffffc000c250dc>] ion_alloc+0x264/0xb88
[<ffffffc000c25e94>] ion_ioctl+0x1f4/0x480
[<ffffffc00022f650>] do_vfs_ioctl+0x67c/0x764
[<ffffffc00022f790>] SyS_ioctl+0x58/0x8c
Change-Id: Idc9c19977a8cc62c7d092f689d30368704b400bc
Signed-off-by: Rohit Vaswani <rvaswani@codeaurora.org>
|
| |
|
|
| |
https://github.com/Moyster/android_kernel_m2note/commit/cb0f38652f775a21e09939cac3031ffe7417c563
|
| |
|
|
|
|
| |
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 63245673
Change-Id: I5fc596420301045895e5a9a7e297fd05434babf9
|
| |
|
|
|
|
|
|
|
| |
This moves the code to adjust the gid/uid of lower filesystem
files under the mount flag derive_gid.
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Change-Id: I44eaad4ef67c7fcfda3b6ea3502afab94442610c
Bug: 63245673
|
| |
|
|
|
|
|
|
|
| |
Fix double free on error paths
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Change-Id: I1c25a175e87e5dd5cafcdcf9d78bf4c0dc3f88ef
Bug: 65386954
Fixes: aa6d3ace42f9 ("mnt: Add filesystem private data to mount points")
|
| |
|
|
|
|
|
|
|
|
| |
Validate the output buffer length for L2CAP config requests and responses
to avoid overflowing the stack buffer used for building the option blocks.
Cc: stable@vger.kernel.org
Signed-off-by: Ben Seri <ben@armis.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Make this useful helper available for other users.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Conflicts:
drivers/md/raid5.c
Change-Id: Ibfc31e7289ffe9bda511c88543bc2deb70a4691b
Signed-off-by: Francisco Franco <franciscofranco.1990@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 64aee2a965cf2954a038b5522f11d2cd2f0f8f3e upstream.
Regardless of which events form a group, it does not make sense for the
events to target different tasks and/or CPUs, as this leaves the group
inconsistent and impossible to schedule. The core perf code assumes that
these are consistent across (successfully intialised) groups.
Core perf code only verifies this when moving SW events into a HW
context. Thus, we can violate this requirement for pure SW groups and
pure HW groups, unless the relevant PMU driver happens to perform this
verification itself. These mismatched groups subsequently wreak havoc
elsewhere.
For example, we handle watchpoints as SW events, and reserve watchpoint
HW on a per-CPU basis at pmu::event_init() time to ensure that any event
that is initialised is guaranteed to have a slot at pmu::add() time.
However, the core code only checks the group leader's cpu filter (via
event_filter_match()), and can thus install follower events onto CPUs
violating thier (mismatched) CPU filters, potentially installing them
into a CPU without sufficient reserved slots.
This can be triggered with the below test case, resulting in warnings
from arch backends.
#define _GNU_SOURCE
#include <linux/hw_breakpoint.h>
#include <linux/perf_event.h>
#include <sched.h>
#include <stdio.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>
static int perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
int group_fd, unsigned long flags)
{
return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}
char watched_char;
struct perf_event_attr wp_attr = {
.type = PERF_TYPE_BREAKPOINT,
.bp_type = HW_BREAKPOINT_RW,
.bp_addr = (unsigned long)&watched_char,
.bp_len = 1,
.size = sizeof(wp_attr),
};
int main(int argc, char *argv[])
{
int leader, ret;
cpu_set_t cpus;
/*
* Force use of CPU0 to ensure our CPU0-bound events get scheduled.
*/
CPU_ZERO(&cpus);
CPU_SET(0, &cpus);
ret = sched_setaffinity(0, sizeof(cpus), &cpus);
if (ret) {
printf("Unable to set cpu affinity\n");
return 1;
}
/* open leader event, bound to this task, CPU0 only */
leader = perf_event_open(&wp_attr, 0, 0, -1, 0);
if (leader < 0) {
printf("Couldn't open leader: %d\n", leader);
return 1;
}
/*
* Open a follower event that is bound to the same task, but a
* different CPU. This means that the group should never be possible to
* schedule.
*/
ret = perf_event_open(&wp_attr, 0, 1, leader, 0);
if (ret < 0) {
printf("Couldn't open mismatched follower: %d\n", ret);
return 1;
} else {
printf("Opened leader/follower with mismastched CPUs\n");
}
/*
* Open as many independent events as we can, all bound to the same
* task, CPU0 only.
*/
do {
ret = perf_event_open(&wp_attr, 0, 0, -1, 0);
} while (ret >= 0);
/*
* Force enable/disble all events to trigger the erronoeous
* installation of the follower event.
*/
printf("Opened all events. Toggling..\n");
for (;;) {
prctl(PR_TASK_PERF_EVENTS_DISABLE, 0, 0, 0, 0);
prctl(PR_TASK_PERF_EVENTS_ENABLE, 0, 0, 0, 0);
}
return 0;
}
Fix this by validating this requirement regardless of whether we're
moving events.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Zhou Chengming <zhouchengming1@huawei.com>
Link: http://lkml.kernel.org/r/1498142498-15758-1-git-send-email-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Francisco Franco <franciscofranco.1990@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was found during userspace fuzzing test when a large size
allocation is made from ion
[<ffffffc00008a098>] show_stack+0x10/0x1c
[<ffffffc00119c390>] dump_stack+0x74/0xc8
[<ffffffc00020d9a0>] kasan_report_error+0x2b0/0x408
[<ffffffc00020dbd4>] kasan_report+0x34/0x40
[<ffffffc00020cfec>] __asan_storeN+0x15c/0x168
[<ffffffc00020d228>] memset+0x20/0x44
[<ffffffc00009b730>] __dma_alloc_coherent+0x114/0x18c
[<ffffffc00009c6e8>] __dma_alloc_noncoherent+0xbc/0x19c
[<ffffffc000c2b3e0>] ion_cma_allocate+0x178/0x2f0
[<ffffffc000c2b750>] ion_secure_cma_allocate+0xdc/0x190
[<ffffffc000c250dc>] ion_alloc+0x264/0xb88
[<ffffffc000c25e94>] ion_ioctl+0x1f4/0x480
[<ffffffc00022f650>] do_vfs_ioctl+0x67c/0x764
[<ffffffc00022f790>] SyS_ioctl+0x58/0x8c
Bug: 38195738
Signed-off-by: Rohit Vaswani <rvaswani@codeaurora.org>
Signed-off-by: Maggie White <maggiewhite@google.com>
Change-Id: I6b1a0a3eaec10500cd4e73290efad4023bc83da5
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 88c54cdf61f508ebcf8da2d819f5dfc03e954d1d upstream.
When user tries to replace the user-defined control TLV, the kernel
checks the change of its content via memcmp(). The problem is that
the kernel passes the return value from memcmp() as is. memcmp()
gives a non-zero negative value depending on the comparison result,
and this shall be recognized as an error code.
The patch covers that corner-case, return 1 properly for the changed
TLV.
Fixes: 8aa9b586e420 ("[ALSA] Control API - more robust TLV implementation")
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Francisco Franco <franciscofranco.1990@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 42bec214d8bd432be6d32a1acb0a9079ecd4d142 upstream.
The df for a SMB2 share triggers a GetInfo call for
FS_FULL_SIZE_INFORMATION. The values returned are used to populate
struct statfs.
The problem is that none of the information returned by the call
contains the total blocks available on the filesystem. Instead we use
the blocks available to the user ie. quota limitation when filling out
statfs.f_blocks. The information returned does contain Actual free units
on the filesystem and is used to populate statfs.f_bfree. For users with
quota enabled, it can lead to situations where the total free space
reported is more than the total blocks on the system ending up with df
reports like the following
# df -h /mnt/a
Filesystem Size Used Avail Use% Mounted on
//192.168.22.10/a 2.5G -2.3G 2.5G - /mnt/a
To fix this problem, we instead populate both statfs.f_bfree with the
same value as statfs.f_bavail ie. CallerAvailableAllocationUnits. This
is similar to what is done already in the code for cifs and df now
reports the quota information for the user used to mount the share.
# df --si /mnt/a
Filesystem Size Used Avail Use% Mounted on
//192.168.22.10/a 2.7G 101M 2.6G 4% /mnt/a
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Pierguido Lambri <plambri@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Francisco Franco <franciscofranco.1990@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Date Wed, 10 Apr 2013 14:07:48 -0700
When switching the hrtimer cpu_base, we briefly allow for
preemption to become enabled by unlocking the cpu_base lock.
During this time, the CPU corresponding to the new cpu_base
that was selected may in fact go offline. In this scenario, the
hrtimer is enqueued to a CPU that's not online, and therefore
it never fires.
As an example, consider this example:
CPU #0 CPU #1
---- ----
... hrtimer_start()
lock_hrtimer_base()
switch_hrtimer_base()
cpu = hrtimer_get_target() -> 1
spin_unlock(&cpu_base->lock)
<migrate thread to CPU #0>
<offline>
spin_lock(&new_base->lock)
this_cpu = 0
cpu != this_cpu
enqueue_hrtimer(cpu_base #1)
To prevent this scenario, verify that the CPU corresponding to
the new cpu_base is indeed online before selecting it in
hrtimer_switch_base(). If it's not online, fallback to using the
base of the current CPU.
Change-Id: I3aaf5b806a25d5a8b96d6ccea7a042d2718091f7
Signed-off-by: Michael Bohan <mbohan@codeaurora.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Date Wed, 10 Apr 2013 14:07:47 -0700
When switching to a new cpu_base in switch_hrtimer_base(), we
briefly enable preemption by unlocking the cpu_base lock in two
places. During this interval it's possible for the running thread
to be swapped to a different CPU.
Consider the following example:
CPU #0 CPU #1
---- ----
hrtimer_start() ...
lock_hrtimer_base()
switch_hrtimer_base()
this_cpu = 0;
target_cpu_base = 0;
raw_spin_unlock(&cpu_base->lock)
<migrate to CPU 1>
... this_cpu == 0
cpu == this_cpu
timer->base = CPU #0
timer->base != LOCAL_CPU
Since the cached this_cpu is no longer accurate, we'll skip the
hrtimer_check_target() check. Once we eventually go to program
the hardware, we'll decide not to do so since it knows the real
CPU that we're running on is not the same as the chosen base. As
a consequence, we may end up missing the hrtimer's deadline.
Fix this by updating the local CPU number each time we retake a
cpu_base lock in switch_hrtimer_base().
Another possibility is to disable preemption across the whole of
switch_hrtimer_base. This looks suboptimal since preemption
would be disabled while waiting for lock(s).
Change-Id: I46be5cf3069f8e6683ad8fff0841949bdb801684
Signed-off-by: Michael Bohan <mbohan@codeaurora.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Smart wake-affine is using node-size as the factor currently, but the overhead
of the mask operation is high.
Thus, this patch introduce the 'sd_llc_size' percpu variable, which will record
the highest cache-share domain size, and make it to be the new factor, in order
to reduce the overhead and make it more reasonable.
Tested-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Tested-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Link: http://lkml.kernel.org/r/51D5008E.6030102@linux.vnet.ibm.com
[ Tidied up the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Change-Id: I8fb3ceac1e6944db078932172c99d544a4e304e4
Signed-off-by: Paul Reioux <reioux@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The wake-affine scheduler feature is currently always trying to pull
the wakee close to the waker. In theory this should be beneficial if
the waker's CPU caches hot data for the wakee, and it's also beneficial
in the extreme ping-pong high context switch rate case.
Testing shows it can benefit hackbench up to 15%.
However, the feature is somewhat blind, from which some workloads
such as pgbench suffer. It's also time-consuming algorithmically.
Testing shows it can damage pgbench up to 50% - far more than the
benefit it brings in the best case.
So wake-affine should be smarter and it should realize when to
stop its thankless effort at trying to find a suitable CPU to wake on.
This patch introduces 'wakee_flips', which will be increased each
time the task flips (switches) its wakee target.
So a high 'wakee_flips' value means the task has more than one
wakee, and the bigger the number, the higher the wakeup frequency.
Now when making the decision on whether to pull or not, pay attention to
the wakee with a high 'wakee_flips', pulling such a task may benefit
the wakee. Also imply that the waker will face cruel competition later,
it could be very cruel or very fast depends on the story behind
'wakee_flips', waker therefore suffers.
Furthermore, if waker also has a high 'wakee_flips', that implies that
multiple tasks rely on it, then waker's higher latency will damage all
of them, so pulling wakee seems to be a bad deal.
Thus, when 'waker->wakee_flips / wakee->wakee_flips' becomes
higher and higher, the cost of pulling seems to be worse and worse.
The patch therefore helps the wake-affine feature to stop its pulling
work when:
wakee->wakee_flips > factor &&
waker->wakee_flips > (factor * wakee->wakee_flips)
The 'factor' here is the number of CPUs in the current CPU's NUMA node,
so a bigger node will lead to more pulling since the trial becomes more
severe.
After applying the patch, pgbench shows up to 40% improvements and no regressions.
Tested with 12 cpu x86 server and tip 3.10.0-rc7.
The percentages in the final column highlight the areas with the biggest wins,
all other areas improved as well:
pgbench base smart
| db_size | clients | tps | | tps |
+---------+---------+-------+ +-------+
| 22 MB | 1 | 10598 | | 10796 |
| 22 MB | 2 | 21257 | | 21336 |
| 22 MB | 4 | 41386 | | 41622 |
| 22 MB | 8 | 51253 | | 57932 |
| 22 MB | 12 | 48570 | | 54000 |
| 22 MB | 16 | 46748 | | 55982 | +19.75%
| 22 MB | 24 | 44346 | | 55847 | +25.93%
| 22 MB | 32 | 43460 | | 54614 | +25.66%
| 7484 MB | 1 | 8951 | | 9193 |
| 7484 MB | 2 | 19233 | | 19240 |
| 7484 MB | 4 | 37239 | | 37302 |
| 7484 MB | 8 | 46087 | | 50018 |
| 7484 MB | 12 | 42054 | | 48763 |
| 7484 MB | 16 | 40765 | | 51633 | +26.66%
| 7484 MB | 24 | 37651 | | 52377 | +39.11%
| 7484 MB | 32 | 37056 | | 51108 | +37.92%
| 15 GB | 1 | 8845 | | 9104 |
| 15 GB | 2 | 19094 | | 19162 |
| 15 GB | 4 | 36979 | | 36983 |
| 15 GB | 8 | 46087 | | 49977 |
| 15 GB | 12 | 41901 | | 48591 |
| 15 GB | 16 | 40147 | | 50651 | +26.16%
| 15 GB | 24 | 37250 | | 52365 | +40.58%
| 15 GB | 32 | 36470 | | 50015 | +37.14%
Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/51D50057.9000809@linux.vnet.ibm.com
[ Improved the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Change-Id: I70018a00435ea795121b70a576d74bbbd00b7464
Signed-off-by: Paul Reioux <reioux@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reinitialize rq->next_balance when a CPU is hot-added. Otherwise,
scheduler domain rebalancing may be skipped if rq->next_balance was
set to a future time when the CPU was last active, and the
newly-re-added CPU is in idle_balance(). As a result, the
newly-re-added CPU will remain idle with no tasks scheduled until the
softlockup watchdog runs - potentially 4 seconds later. This can
waste energy and reduce performance.
This behavior can be observed in some SoC kernels, which use CPU
hotplug to dynamically remove and add CPUs in response to load. In
one case that triggered this behavior,
0. the system started with all cores enabled, running multi-threaded
CPU-bound code;
1. the system entered some single-threaded code;
2. a CPU went idle and was hot-removed;
3. the system started executing a multi-threaded CPU-bound task;
4. the CPU from event 2 was re-added, to respond to the load.
The time interval between events 2 and 4 was approximately 300
milliseconds.
Of course, ideally CPU hotplug would not be used in this manner,
but this patch does appear to fix a real bug.
Nvidia folks: this patch is submitted as at least a partial fix for
bug 1243368 ("[sched] Load-balancing not happening correctly after
cores brought online")
Change-Id: Iabac21e110402bb581b7db40c42babc951d378d0
Signed-off-by: Paul Walmsley <pwalmsley@nvidia.com>
Cc: Peter Boonstoppel <pboonstoppel@nvidia.com>
Reviewed-on: http://git-master/r/208927
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Peter Zu <pzu@nvidia.com>
Reviewed-by: Peter Boonstoppel <pboonstoppel@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: Peter Zu <pzu@nvidia.com>
Reviewed-by: Yu-Huan Hsu <yhsu@nvidia.com>
Signed-off-by: franciscofranco <franciscofranco.1990@gmail.com>
|
| |
|
|
|
|
|
|
|
|
| |
Double ! or !! are normally required to get 0 or 1 out of a expression. A
comparision always returns 0 or 1 and hence there is no need to apply double !
over it again.
Change-Id: Ia705389240381dd5cb04b81566021ea29af6b1d0
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
| |
|
|
|
|
|
|
| |
The structure fix is initialised before its usage to prevent
kernel information leak during copy_to_user.
Change-Id: Ice4da4c9bd6095a4387e1d16cb20ca474accb9dc
Signed-off-by: Krishna Manikandan <mkrishn@codeaurora.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(url: https://patchwork.kernel.org/patch/9876921/)
Commit d618ebaf0aa8 ("f2fs: enable small discard by default") enables
f2fs to issue 4K size discard in real-time discard mode. However, issuing
smaller discard may cost more lifetime but releasing less free space in
flash device. Since f2fs has ability of separating hot/cold data and
garbage collection, we can expect that small-sized invalid region would
expand soon with OPU, deletion or garbage collection on valid datas, so
it's better to delay or skip issuing smaller size discards, it could help
to reduce overmuch consumption of IO bandwidth and lifetime of flash
storage.
This patch makes f2fs selectng 64K size as its default minimal
granularity, and issue discard with the size which is not smaller than
minimal granularity. Also it exposes discard granularity as sysfs entry
for configuration in different scenario.
Jaegeuk Kim:
We must issue all the accumulated discard commands when fstrim is called.
So, I've added pend_list_tag[] to indicate whether we should issue the
commands or not. If tag sets P_ACTIVE or P_TRIM, we have to issue them.
P_TRIM is set once at a time, given fstrim trigger.
In addition, issue_discard_thread is calling too much due to the number of
discard commands remaining in the pending list. I added a timer to control
it likewise gc_thread.
Change-Id: Ia90dd686c25cb27f144137ea3c9bcc1c943a9aea
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
|
|
|
|
|
|
|
|
| |
It'd be better to retry writing atomic pages when we get -ENOMEM.
(url https://www.spinics.net/lists/linux-fsdevel/msg113844.html)
Bug: 63260873
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
|
|
|
|
|
|
|
|
|
|
| |
(url: https://patchwork.kernel.org/patch/9886419/)
We expect cold files write data sequentially, but sometimes some of small data
can be updated, which incurs fragmentation.
Let's avoid that.
Change-Id: Ib4a8db92e05bc88b1c7809707078efd249421045
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(url: https://patchwork.kernel.org/patch/9857935/)
When ->freeze_fs is called from lvm for doing snapshot, it needs to
make sure there will be no more changes in filesystem's data, however,
previously, background threads like GC thread wasn't aware of freezing,
so in environment with active background threads, data of snapshot
becomes unstable.
This patch fixes this issue by adding sb_{start,end}_intwrite in
below background threads:
- GC thread
- flush thread
- discard thread
Note that, don't use sb_start_intwrite() in gc_thread_func() due to:
generic/241 reports below bug:
======================================================
WARNING: possible circular locking dependency detected
4.13.0-rc1+ #32 Tainted: G O
------------------------------------------------------
f2fs_gc-250:0/22186 is trying to acquire lock:
(&sbi->gc_mutex){+.+...}, at: [<f8fa7f0b>] f2fs_sync_fs+0x7b/0x1b0 [f2fs]
but task is already holding lock:
(sb_internal#2){++++.-}, at: [<f8fb5609>] gc_thread_func+0x159/0x4a0 [f2fs]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (sb_internal#2){++++.-}:
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
__sb_start_write+0x11d/0x1f0
f2fs_evict_inode+0x2d6/0x4e0 [f2fs]
evict+0xa8/0x170
iput+0x1fb/0x2c0
f2fs_sync_inode_meta+0x3f/0xf0 [f2fs]
write_checkpoint+0x1b1/0x750 [f2fs]
f2fs_sync_fs+0x85/0x1b0 [f2fs]
f2fs_do_sync_file.isra.24+0x137/0xa30 [f2fs]
f2fs_sync_file+0x34/0x40 [f2fs]
vfs_fsync_range+0x4a/0xa0
do_fsync+0x3c/0x60
SyS_fdatasync+0x15/0x20
do_fast_syscall_32+0xa1/0x1b0
entry_SYSENTER_32+0x4c/0x7b
-> #1 (&sbi->cp_mutex){+.+...}:
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
__mutex_lock+0x4f/0x830
mutex_lock_nested+0x25/0x30
write_checkpoint+0x2f/0x750 [f2fs]
f2fs_sync_fs+0x85/0x1b0 [f2fs]
sync_filesystem+0x67/0x80
generic_shutdown_super+0x27/0x100
kill_block_super+0x22/0x50
kill_f2fs_super+0x3a/0x40 [f2fs]
deactivate_locked_super+0x3d/0x70
deactivate_super+0x40/0x60
cleanup_mnt+0x39/0x70
__cleanup_mnt+0x10/0x20
task_work_run+0x69/0x80
exit_to_usermode_loop+0x57/0x92
do_fast_syscall_32+0x18c/0x1b0
entry_SYSENTER_32+0x4c/0x7b
-> #0 (&sbi->gc_mutex){+.+...}:
validate_chain.isra.36+0xc50/0xdb0
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
__mutex_lock+0x4f/0x830
mutex_lock_nested+0x25/0x30
f2fs_sync_fs+0x7b/0x1b0 [f2fs]
f2fs_balance_fs_bg+0xb9/0x200 [f2fs]
gc_thread_func+0x302/0x4a0 [f2fs]
kthread+0xe9/0x120
ret_from_fork+0x19/0x24
other info that might help us debug this:
Chain exists of:
&sbi->gc_mutex --> &sbi->cp_mutex --> sb_internal#2
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(sb_internal#2);
lock(&sbi->cp_mutex);
lock(sb_internal#2);
lock(&sbi->gc_mutex);
*** DEADLOCK ***
1 lock held by f2fs_gc-250:0/22186:
#0: (sb_internal#2){++++.-}, at: [<f8fb5609>] gc_thread_func+0x159/0x4a0 [f2fs]
stack backtrace:
CPU: 2 PID: 22186 Comm: f2fs_gc-250:0 Tainted: G O 4.13.0-rc1+ #32
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
Call Trace:
dump_stack+0x5f/0x92
print_circular_bug+0x1b3/0x1bd
validate_chain.isra.36+0xc50/0xdb0
? __this_cpu_preempt_check+0xf/0x20
__lock_acquire+0x405/0x7b0
lock_acquire+0xae/0x220
? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
__mutex_lock+0x4f/0x830
? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
mutex_lock_nested+0x25/0x30
? f2fs_sync_fs+0x7b/0x1b0 [f2fs]
f2fs_sync_fs+0x7b/0x1b0 [f2fs]
f2fs_balance_fs_bg+0xb9/0x200 [f2fs]
gc_thread_func+0x302/0x4a0 [f2fs]
? preempt_schedule_common+0x2f/0x4d
? f2fs_gc+0x540/0x540 [f2fs]
kthread+0xe9/0x120
? f2fs_gc+0x540/0x540 [f2fs]
? kthread_create_on_node+0x30/0x30
ret_from_fork+0x19/0x24
The deadlock occurs in below condition:
GC Thread Thread B
- sb_start_intwrite
- f2fs_sync_file
- f2fs_sync_fs
- mutex_lock(&sbi->gc_mutex)
- write_checkpoint
- block_operations
- f2fs_sync_inode_meta
- iput
- sb_start_intwrite
- mutex_lock(&sbi->gc_mutex)
Fix this by altering sb_start_intwrite to sb_start_write_trylock.
Change-Id: Ibea8cff73d684e5aebc950f29ef4d611fc10df76
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
| | |
|
| | |
|
| |
|
|
| |
Change-Id: I995b152a3766ebb8faec244849d90d7d2bd5c672
|