| Commit message | Author | Age | Files | Lines |
| ... | |
| |
|
|
|
|
|
|
|
|
|
| |
The accessory detect debounce work is not performance sensitive, so let
the scheduler run it wherever it is most efficient rather than in a per-CPU
workqueue, by using the system power-efficient workqueue.
Signed-off-by: Mark Brown <broonie@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
(cherry picked from commit e6058aaadcd473e5827720dc143af56aabbeecc7)
Signed-off-by: Stefan Guendhoer <stefan@guendhoer.com>
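As a rough illustration of the change this family of patches makes (the driver
structure and timing below are invented for the example, not taken from the
actual accessory-detect code), only the queueing call changes:

    #include <linux/workqueue.h>
    #include <linux/jiffies.h>

    /* Hypothetical driver context, for illustration only. */
    struct acc_detect {
        struct delayed_work det_work;
    };

    static void acc_detect_work(struct work_struct *work)
    {
        /* debounce the pin and report the accessory state here */
    }

    static void acc_detect_init(struct acc_detect *acc)
    {
        INIT_DELAYED_WORK(&acc->det_work, acc_detect_work);
    }

    static void acc_detect_irq(struct acc_detect *acc)
    {
        /*
         * schedule_delayed_work() would pin the work to the CPU that took
         * the interrupt; queueing on system_power_efficient_wq lets the
         * scheduler pick the cheapest CPU to run it on.
         */
        queue_delayed_work(system_power_efficient_wq, &acc->det_work,
                           msecs_to_jiffies(200));
    }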
|
| |
|
|
|
|
|
|
|
|
|
|
| |
There is no need to use a normal per-CPU workqueue for delayed power downs
as they're not timing or performance critical and waking up a core for them
would defeat some of the point.
Signed-off-by: Mark Brown <broonie@linaro.org>
Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Liam Girdwood <liam.r.girdwood@intel.com>
(cherry picked from commit 070260f07c7daec311f2466eb9d1df475d5a46f8)
Signed-off-by: Stefan Guendhoer <stefan@guendhoer.com>
|
| |
|
|
|
|
|
|
|
|
|
| |
There is no need to use a normal per-CPU workqueue for delayed power downs
as they're not timing or performance critical and waking up a core for them
would defeat some of the point.
Signed-off-by: Mark Brown <broonie@linaro.org>
Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
(cherry picked from commit d4e1a73acd4e894f8332f2093bceaef585cfab67)
Signed-off-by: Stefan Guendhoer <stefan@guendhoer.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
fbcon uses workqueues and it has no real dependency on scheduling these on the
cpu which scheduled them.
On an idle system, it is observed that an idle cpu wakes up many times just to
service this work. It would be better if we can schedule it on a cpu which the
scheduler believes to be the most appropriate one.
This patch replaces system_wq with system_power_efficient_wq.
Cc: Dave Airlie <airlied@redhat.com>
Cc: linux-fbdev@vger.kernel.org
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
(cherry picked from commit a85f1a41f020bc2c97611060bcfae6f48a1db28d)
Signed-off-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Stefan Guendhoer <stefan@guendhoer.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Block layer uses workqueues for multiple purposes. There is no real dependency
on scheduling these on the cpu which scheduled them.
On an idle system, it is observed that an idle cpu wakes up many times just to
service this work. It would be better if we can schedule it on a cpu which the
scheduler believes to be the most appropriate one.
This patch replaces normal workqueues with power efficient versions.
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
(cherry picked from commit 695588f9454bdbc7c1a2fbb8a6bfdcfba6183348)
Signed-off-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Stefan Guendhoer <stefan@guendhoer.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Phylib uses workqueues for multiple purposes. There is no real dependency on
scheduling these on the cpu which scheduled them.
On an idle system, it is observed that an idle cpu wakes up many times just to
service this work. It would be better if we can schedule it on a cpu which the
scheduler believes to be the most appropriate one.
This patch replaces system_wq with system_power_efficient_wq for PHYLIB.
Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Tejun Heo <tj@kernel.org>
(cherry picked from commit bbb47bdeae756f04b896b55b51f230f3eb21f207)
Signed-off-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Stefan Guendhoer <stefan@guendhoer.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds system wide workqueues aligned towards power saving. This is
done by allocating them with WQ_UNBOUND flag if 'wq_power_efficient' is set to
'true'.
tj: updated comments a bit.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
(cherry picked from commit 0668106ca3865ba945e155097fb042bf66d364d3)
Signed-off-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Stefan Guendhoer <stefan@guendhoer.com>
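For orientation, the two queues this patch introduces are created roughly as in
the sketch below; the real code lives in kernel/workqueue.c, so treat this as an
approximation rather than the exact hunk:

    #include <linux/workqueue.h>
    #include <linux/errno.h>
    #include <linux/init.h>

    /* Approximate shape of the allocation done during workqueue init. */
    static int __init power_efficient_wq_sketch(void)
    {
        system_power_efficient_wq =
            alloc_workqueue("events_power_efficient",
                            WQ_POWER_EFFICIENT, 0);
        system_freezable_power_efficient_wq =
            alloc_workqueue("events_freezable_power_efficient",
                            WQ_FREEZABLE | WQ_POWER_EFFICIENT, 0);

        return (system_power_efficient_wq &&
                system_freezable_power_efficient_wq) ? 0 : -ENOMEM;
    }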
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Workqueues can be performance or power-oriented. Currently, most workqueues are
bound to the CPU they were created on. This gives good performance (due to cache
effects) at the cost of potentially waking up otherwise idle cores (idle from
the scheduler's perspective, which may or may not be physically idle) just to
process some work. To save power, we can allow the work to be rescheduled on a
core that is already awake.
Workqueues created with the WQ_UNBOUND flag will allow some power savings.
However, we don't change the default behaviour of the system. To enable
power-saving behaviour, a new config option CONFIG_WQ_POWER_EFFICIENT needs to
be turned on. This option can also be overridden by the
workqueue.power_efficient boot parameter.
tj: Updated config description and comments. Renamed
CONFIG_WQ_POWER_EFFICIENT to CONFIG_WQ_POWER_EFFICIENT_DEFAULT.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
(cherry picked from commit cee22a15052faa817e3ec8985a28154d3fabc7aa)
Signed-off-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Stefan Guendhoer <stefan@guendhoer.com>
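A minimal sketch of how such an opt-in could be honoured at workqueue creation
time; the real check lives in kernel/workqueue.c and the wrapper name below is
made up for illustration:

    #include <linux/workqueue.h>
    #include <linux/moduleparam.h>

    static bool wq_power_efficient = IS_ENABLED(CONFIG_WQ_POWER_EFFICIENT_DEFAULT);
    module_param_named(power_efficient, wq_power_efficient, bool, 0444);

    /* Illustrative wrapper: power-efficient queues become unbound only
     * when the system-wide switch is on; otherwise nothing changes. */
    static struct workqueue_struct *alloc_wq_sketch(const char *name,
                                                    unsigned int flags)
    {
        if ((flags & WQ_POWER_EFFICIENT) && wq_power_efficient)
            flags |= WQ_UNBOUND;

        return alloc_workqueue("%s", flags, 0, name);
    }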
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On embedded devices with built-in batteries, it is not so
important to sync the file systems before suspend. The chance
of losing power during suspend is no greater than it is
when the system is awake. The sync operations can greatly
increase suspend latency when the system has accrued many dirty
pages and/or the target storage devices are not particularly
fast.
This commit adds a kernel config option to allow file system
sync in the suspend path to be disabled.
It is enabled by default.
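A sketch of what such an option could look like in the suspend path; the config
symbol and helper below are hypothetical and the actual patch may differ:

    #include <linux/syscalls.h>
    #include <linux/printk.h>

    #ifdef CONFIG_PM_SLEEP_SYNC_FS          /* hypothetical symbol, default y */
    static void suspend_sync_filesystems(void)
    {
        pr_info("PM: Syncing filesystems ... ");
        sys_sync();
        pr_cont("done.\n");
    }
    #else
    static inline void suspend_sync_filesystems(void) {}
    #endif

    /* called early in the suspend sequence, before devices are frozen */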
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
compound_head() is implemented with the assumption that there could be a race
condition when checking the tail flag. This assumption is only true when we
try to access an arbitrarily positioned struct page.
The situation in which virt_to_head_page() is called is a different case: we
call virt_to_head_page() only in the range of allocated pages, so there
is no race condition on the tail flag. In this case, we don't need to
handle the race condition and we can reduce overhead slightly. This patch
implements compound_head_fast(), which is similar to compound_head()
except for the tail-flag race handling. virt_to_head_page() then uses this
optimized function to improve performance.
I saw a 1.8% win in a fast-path loop over kmem_cache_alloc/free (14.063
ns -> 13.810 ns) if the target object is on a tail page.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Stefan Guendhoer <stefan@guendhoer.com>
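A sketch of the fast helper described above, written against the struct page
layout of that era (tail pages carry a first_page pointer); treat it as
illustrative rather than the verbatim hunk from include/linux/mm.h:

    /*
     * Safe only when the caller knows the page cannot concurrently be
     * split or freed, e.g. inside virt_to_head_page() on an address that
     * lies within an allocated object.
     */
    static inline struct page *compound_head_fast(struct page *page)
    {
        if (unlikely(PageTail(page)))
            return page->first_page;
        return page;
    }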
|
| |
|
|
| |
target
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
target
This patch adds checks that prevent futile attempts to move rt tasks
to a CPU with active tasks of equal or higher priority.
This reduces run queue lock contention and improves the performance of
a well known OLTP benchmark by 0.7%.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Shawn Bohrer <sbohrer@rgmadvisors.com>
Cc: Suruchi Kadu <suruchi.a.kadu@intel.com>
Cc: Doug Nelson<doug.nelson@intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1421430374.2399.27.camel@schen9-desk2.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Stefan Guendhoer <stefan@guendhoer.com>
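The idea reduces to a cheap feasibility test made before taking the remote
runqueue lock; a self-contained illustration follows (the struct and function
names are invented for the example, not the scheduler's own):

    #include <stdbool.h>

    struct rq_view {
        int highest_prio;   /* lower value == higher priority (RT convention) */
    };

    /* Pushing an RT task to a target CPU can only succeed if everything
     * running there has strictly lower priority, so skip the remote lock
     * (and the retry dance) when that is not the case. */
    static bool rt_push_is_feasible(const struct rq_view *target, int task_prio)
    {
        return target->highest_prio > task_prio;
    }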
|
| |
|
|
|
|
|
|
|
|
|
|
| |
In this case, it is basically polling. Let's not involve the timer at all
because that would hurt performance for application event loops.
In an arbitrary test I've done, io_getevents syscall elapsed time
reduces from 50000+ nanoseconds to a few hundred.
Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Stefan Guendhoer <stefan@guendhoer.com>
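From user space the pattern looks like the sketch below (libaio, error handling
trimmed); with a zero timeout the call returns whatever has already completed
without arming a kernel timer:

    /* build with: gcc poll_aio.c -laio */
    #include <libaio.h>
    #include <time.h>
    #include <stdio.h>

    int main(void)
    {
        io_context_t ctx = 0;
        struct io_event events[8];
        struct timespec zero = { 0, 0 };
        int n;

        if (io_setup(8, &ctx) < 0)
            return 1;

        /* ... io_submit() requests from the event loop would go here ... */

        /* min_nr == 0 and a zero timeout: a pure poll of the ring. */
        n = io_getevents(ctx, 0, 8, events, &zero);
        printf("%d events ready\n", n);

        io_destroy(ctx);
        return 0;
    }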
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When there is serious memory pressure, all workers in a pool could be
blocked, and a new thread cannot be created because it requires memory
allocation.
In this situation a WQ_MEM_RECLAIM workqueue will wake up the
rescuer thread to do some work.
The rescuer will only handle requests that are already on ->worklist.
If max_requests is 1, that means it will handle a single request.
The rescuer will be woken again in 100ms to handle another max_requests
requests.
I've seen a machine (running a 3.0 based "enterprise" kernel) with
thousands of requests queued for xfslogd, which has a max_requests of
1, and is needed for retiring all 'xfs' write requests. When one of
the worker pools gets into this state, it progresses extremely slowly
and possibly never recovers (only waited an hour or two).
With this patch we leave a pool_workqueue on the mayday list
until it is clearly no longer in need of assistance. This allows
all requests to be handled in a timely fashion.
We keep each pool_workqueue on the mayday list until
need_to_create_worker() is false, and no work for this workqueue is
found in the pool.
I have tested this in combination with a (hackish) patch which forces
all work items to be handled by the rescuer thread. In that context
it significantly improves performance. A similar patch for a 3.0
kernel significantly improved performance on a heavy work load.
Thanks to Jan Kara for some design ideas, and to Dongsu Park for
some comments and testing.
tj: Inverted the lock order between wq_mayday_lock and pool->lock with
a preceding patch and simplified this patch. Added comment and
updated changelog accordingly. Dongsu spotted missing get_pwq()
in the simplified code.
Cc: Dongsu Park <dongsu.park@profitbricks.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Stefan Guendhoer <stefan@guendhoer.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We had to insert a preempt enable/disable in the fastpath a while ago in
order to guarantee that tid and kmem_cache_cpu are retrieved on the same
cpu. This is only a problem for CONFIG_PREEMPT, where the scheduler can
move the process to another cpu while the data is being retrieved.
Now, I have reached a solution that removes the preempt enable/disable from the
fastpath. If tid matches kmem_cache_cpu's tid after tid and
kmem_cache_cpu are retrieved by separate this_cpu operations, it means
that they were retrieved on the same cpu. If they don't match, we just have
to retry.
With this guarantee, preemption enable/disable isn't needed at all even with
CONFIG_PREEMPT, so this patch removes it.
I saw a roughly 5% win in a fast-path loop over kmem_cache_alloc/free in
CONFIG_PREEMPT (14.821 ns -> 14.049 ns).
Below is the result of Christoph's slab_test reported by Jesper Dangaard
Brouer.
* Before
Single thread testing
=====================
1. Kmalloc: Repeatedly allocate then free test
10000 times kmalloc(8) -> 49 cycles kfree -> 62 cycles
10000 times kmalloc(16) -> 48 cycles kfree -> 64 cycles
10000 times kmalloc(32) -> 53 cycles kfree -> 70 cycles
10000 times kmalloc(64) -> 64 cycles kfree -> 77 cycles
10000 times kmalloc(128) -> 74 cycles kfree -> 84 cycles
10000 times kmalloc(256) -> 84 cycles kfree -> 114 cycles
10000 times kmalloc(512) -> 83 cycles kfree -> 116 cycles
10000 times kmalloc(1024) -> 81 cycles kfree -> 120 cycles
10000 times kmalloc(2048) -> 104 cycles kfree -> 136 cycles
10000 times kmalloc(4096) -> 142 cycles kfree -> 165 cycles
10000 times kmalloc(8192) -> 238 cycles kfree -> 226 cycles
10000 times kmalloc(16384) -> 403 cycles kfree -> 264 cycles
2. Kmalloc: alloc/free test
10000 times kmalloc(8)/kfree -> 68 cycles
10000 times kmalloc(16)/kfree -> 68 cycles
10000 times kmalloc(32)/kfree -> 69 cycles
10000 times kmalloc(64)/kfree -> 68 cycles
10000 times kmalloc(128)/kfree -> 68 cycles
10000 times kmalloc(256)/kfree -> 68 cycles
10000 times kmalloc(512)/kfree -> 74 cycles
10000 times kmalloc(1024)/kfree -> 75 cycles
10000 times kmalloc(2048)/kfree -> 74 cycles
10000 times kmalloc(4096)/kfree -> 74 cycles
10000 times kmalloc(8192)/kfree -> 75 cycles
10000 times kmalloc(16384)/kfree -> 510 cycles
* After
Single thread testing
=====================
1. Kmalloc: Repeatedly allocate then free test
10000 times kmalloc(8) -> 46 cycles kfree -> 61 cycles
10000 times kmalloc(16) -> 46 cycles kfree -> 63 cycles
10000 times kmalloc(32) -> 49 cycles kfree -> 69 cycles
10000 times kmalloc(64) -> 57 cycles kfree -> 76 cycles
10000 times kmalloc(128) -> 66 cycles kfree -> 83 cycles
10000 times kmalloc(256) -> 84 cycles kfree -> 110 cycles
10000 times kmalloc(512) -> 77 cycles kfree -> 114 cycles
10000 times kmalloc(1024) -> 80 cycles kfree -> 116 cycles
10000 times kmalloc(2048) -> 102 cycles kfree -> 131 cycles
10000 times kmalloc(4096) -> 135 cycles kfree -> 163 cycles
10000 times kmalloc(8192) -> 238 cycles kfree -> 218 cycles
10000 times kmalloc(16384) -> 399 cycles kfree -> 262 cycles
2. Kmalloc: alloc/free test
10000 times kmalloc(8)/kfree -> 65 cycles
10000 times kmalloc(16)/kfree -> 66 cycles
10000 times kmalloc(32)/kfree -> 65 cycles
10000 times kmalloc(64)/kfree -> 66 cycles
10000 times kmalloc(128)/kfree -> 66 cycles
10000 times kmalloc(256)/kfree -> 71 cycles
10000 times kmalloc(512)/kfree -> 72 cycles
10000 times kmalloc(1024)/kfree -> 71 cycles
10000 times kmalloc(2048)/kfree -> 71 cycles
10000 times kmalloc(4096)/kfree -> 71 cycles
10000 times kmalloc(8192)/kfree -> 65 cycles
10000 times kmalloc(16384)/kfree -> 511 cycles
Most of the results are better than before.
Note that this change slightly worsens performance in !CONFIG_PREEMPT,
roughly 0.3%. Implementing each case separately would help performance,
but, since it's so marginal, I didn't do that. This also helps
maintenance since we have the same code for all cases.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Christoph Lameter <cl@linux.com>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Stefan Guendhoer <stefan@guendhoer.com>
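The retry idea boils down to the loop below, shown as a hedged sketch in the
shape of the slub fastpath; the wrapper function name is invented and this is
not the verbatim upstream hunk:

    static struct kmem_cache_cpu *get_cpu_slab_stable(struct kmem_cache *s,
                                                      unsigned long *tidp)
    {
        struct kmem_cache_cpu *c;
        unsigned long tid;

        /* Re-read both values until they demonstrably came from one CPU:
         * each CPU owns a disjoint tid range, so a mismatch means we were
         * migrated between the two reads and must try again. */
        do {
            tid = this_cpu_read(s->cpu_slab->tid);
            c = raw_cpu_ptr(s->cpu_slab);
        } while (IS_ENABLED(CONFIG_PREEMPT) &&
                 unlikely(tid != READ_ONCE(c->tid)));

        *tidp = tid;
        return c;
    }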
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Replace places where __get_cpu_var() is used for an address calculation
with this_cpu_ptr().
Signed-off-by: Christoph Lameter <cl@linux.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Stefan Guendhoer <stefan@guendhoer.com>
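The conversion is mechanical; a minimal before/after illustration with an
invented per-cpu variable:

    #include <linux/percpu.h>
    #include <linux/llist.h>

    static DEFINE_PER_CPU(struct llist_head, demo_list);   /* example only */

    static struct llist_head *demo_list_old_way(void)
    {
        return &__get_cpu_var(demo_list);   /* old idiom, being phased out */
    }

    static struct llist_head *demo_list_new_way(void)
    {
        return this_cpu_ptr(&demo_list);    /* preferred address calculation */
    }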
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We explicitly mark the task running after returning from
a __rt_mutex_slowlock() call, which does the actual sleeping
via wait-wake-trylocking. As such, this patch does two things:
(1) refactors the code so that setting current to TASK_RUNNING
is done by __rt_mutex_slowlock(), and not by the callers. The
downside to this is that it becomes a bit unclear when at what
point we block. As such I've added a comment that the task
blocks when calling __rt_mutex_slowlock() so readers can figure
out when it is running again.
(2) relaxes setting current's state through __set_current_state(),
instead of its more expensive barrier alternative. There was no
need for the implied barrier as we're obviously not planning on
blocking.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1422857784.18096.1.camel@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Stefan Guendhoer <stefan@guendhoer.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The "thread would block" case can be checked without grabbing ->wait.lock.
[ If the check does not return early then grab the lock and recheck.
A memory barrier is not needed as complete() and complete_all() imply
a barrier.
The ACCESS_ONCE() is needed for calls in a loop that, if inlined, could
optimize out the re-fetching of x->done. ]
Signed-off-by: Nicholas Mc Guire <der.herr@hofr.at>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1422013307-13200-1-git-send-email-der.herr@hofr.at
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Stefan Guendhoer <stefan@guendhoer.com>
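A sketch of the shape of the change in try_wait_for_completion(): check x->done
locklessly first and only fall back to the spinlock when there is something to
consume (approximate, not the exact upstream code):

    static bool try_wait_for_completion_sketch(struct completion *x)
    {
        unsigned long flags;
        int ret = 1;

        /* If the completion is not done we would block; that can be seen
         * without the lock.  ACCESS_ONCE() keeps the compiler from caching
         * x->done when this is called in a loop. */
        if (!ACCESS_ONCE(x->done))
            return false;

        spin_lock_irqsave(&x->wait.lock, flags);
        if (!x->done)
            ret = 0;
        else
            x->done--;
        spin_unlock_irqrestore(&x->wait.lock, flags);
        return ret;
    }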
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
To apply proper slack, the original algorithm used to prepare a mask and
then apply the mask to obtain the appropriately rounded-off absolute
time the timer expires.
This patch modifies the masking logic to bit-shift logic, thereby
reducing the complexity and number of operations, thus obtaining a minor
speed-up.
Signed-off-by: Chinmay V S <chinmay.v.s@pathpartnertech.com>
Signed-off-by: Stefan Guendhoer <stefan@guendhoer.com>
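The arithmetic being swapped is just two equivalent ways of rounding an expiry
down to a power-of-two boundary; a small standalone illustration, not the
kernel's apply_slack() itself:

    #include <stdio.h>

    static unsigned long round_with_mask(unsigned long expires, int bit)
    {
        unsigned long mask = (1UL << bit) - 1;
        return expires & ~mask;             /* build a mask, then clear bits */
    }

    static unsigned long round_with_shift(unsigned long expires, int bit)
    {
        return (expires >> bit) << bit;     /* drop the low 'bit' bits directly */
    }

    int main(void)
    {
        unsigned long e = 100043;
        printf("%lu %lu\n", round_with_mask(e, 5), round_with_shift(e, 5));
        return 0;
    }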
|
| |
|
|
|
|
|
|
|
|
|
| |
Introduce CPUFREQ_RELATION_C for frequency selection.
It selects the frequency with the minimum euclidean distance to target.
In case of equal distance between 2 frequencies, it will select the
greater frequency.
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Stefan Guendhoer <stefan@guendhoer.com>
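In other words, the selection is "closest frequency wins, higher frequency
breaks ties"; a standalone illustration of that rule (not the cpufreq frequency
table walker itself):

    #include <limits.h>

    static unsigned int pick_closest_khz(const unsigned int *freqs, int n,
                                         unsigned int target)
    {
        unsigned int best = 0, best_diff = UINT_MAX;
        int i;

        for (i = 0; i < n; i++) {
            unsigned int diff = freqs[i] > target ? freqs[i] - target
                                                  : target - freqs[i];
            /* strictly closer wins; on a tie prefer the greater frequency */
            if (diff < best_diff || (diff == best_diff && freqs[i] > best)) {
                best_diff = diff;
                best = freqs[i];
            }
        }
        return best;
    }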
|
| |
|
|
|
|
| |
Signed-off-by: Francisco Franco <franciscofranco.1990@gmail.com>
Signed-off-by: engstk <eng.stk@sapo.pt>
Signed-off-by: Stefan Guendhoer <stefan@guendhoer.com>
|
| | |
|
| |
|
|
|
| |
Signed-off-by: franciscofranco <franciscofranco.1990@gmail.com>
Signed-off-by: Stefan Guendhoer <stefan@guendhoer.com>
|
| |
|
|
|
|
|
| |
Import ROW scheduler from lenok source with additional
changes due to past MediaTek commits to ensure compatibility.
Signed-off-by: Stefan Guendhoer <stefan@guendhoer.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some i/o schedulers (i.e. row-iosched, cfq-iosched) deploy an idling
algorithm in order to be better synced with the readahead algorithm.
Idling is a prediction algorithm for incoming read requests.
In this patch we mark pages which are part of a readahead window, by
setting a newly introduced flag. With this flag, the i/o scheduler can
identify a request which is associated with a readahead page. This
enables the i/o scheduler's idling mechanism to be in sync with the
readahead mechanism and, in turn, can increase read throughput.
Change-Id: I0654f23315b6d19d71bcc9cc029c6b281a44b196
Signed-off-by: Lee Susman <lsusman@codeaurora.org>
Signed-off-by: Stefan Guendhoer <stefan@guendhoer.com>
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This patch adds a new flag to be used in the cmd_flags field of struct request
for marking a request as urgent.
An urgent request is one that should be given priority over the currently handled
(regular) request by the device driver. The decision about a request's urgency
is taken by the scheduler.
Change-Id: Ic20470987ef23410f1d0324f96f00578f7df8717
Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>
Signed-off-by: Stefan Guendhoer <stefan@guendhoer.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds support in block & elevator layers for handling
urgent requests. The decision if a request is urgent or not is taken
by the scheduler. Urgent request notification is passed to the underlying
block device driver (eMMC for example). Block device driver may decide to
interrupt the currently running low priority request to serve the new
urgent request. By doing so READ latency is greatly reduced in read&write
collision scenarios.
Note that if the current scheduler doesn't implement the urgent request
mechanism, this code path is never activated.
Change-Id: I8aa74b9b45c0d3a2221bd4e82ea76eb4103e7cfa
Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>
Signed-off-by: Stefan Guendhoer <stefan@guendhoer.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add support for reinserting a dispatched request back to the
scheduler's internal data structures.
This capability is used by the device driver when it chooses to
interrupt the current request transmission and execute another (more
urgent) pending request. For example: interrupting long write in order
to handle pending read. The device driver re-inserts the
remaining write request back to the scheduler, to be rescheduled
for transmission later on.
Add API for verifying whether the current scheduler
supports reinserting requests mechanism. If reinsert mechanism isn't
supported by the scheduler, this code path will never be activated.
Change-Id: I5c982a66b651ebf544aae60063ac8a340d79e67f
Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>
Signed-off-by: Stefan Guendhoer <stefan@guendhoer.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A set of processes may happen to perform interleaved reads, i.e., requests
whose union would give rise to a sequential read pattern. There are two
typical cases: in the first case, processes read fixed-size chunks of
data at a fixed distance from each other, while in the second case processes
may read variable-size chunks at variable distances. The latter case occurs
for example with QEMU, which splits the I/O generated by the guest into
multiple chunks, and lets these chunks be served by a pool of cooperating
processes, iteratively assigning the next chunk of I/O to the first
available process. CFQ uses actual queue merging for the first type of
processes, whereas it uses preemption to get a sequential read pattern out
of the read requests performed by the second type of processes. In the end
it uses two different mechanisms to achieve the same goal: boosting the
throughput with interleaved I/O.
This patch introduces Early Queue Merge (EQM), a unified mechanism to get a
sequential read pattern with both types of processes. The main idea is
checking newly arrived requests against the next request of the active queue
both in case of actual request insert and in case of request merge. By doing
so, both the types of processes can be handled by just merging their queues.
EQM is then simpler and more compact than the pair of mechanisms used in
CFQ.
Finally, EQM also preserves the typical low-latency properties of BFQ, by
properly restoring the weight-raising state of a queue when it gets back to
a non-merged state.
Change-Id: I464d77705cfd2a382150264c52405887e3903358
Signed-off-by: Mauro Andreolini <mauro.andreolini@unimore.it>
Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Signed-off-by: Stefan Guendhoer <stefan@guendhoer.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add the BFQ-v7r8 I/O scheduler to 3.10.8+.
The general structure is borrowed from CFQ, as much of the code for
handling I/O contexts. Over time, several useful features have been
ported from CFQ as well (details in the changelog in README.BFQ). A
(bfq_)queue is associated to each task doing I/O on a device, and each
time a scheduling decision has to be made a queue is selected and served
until it expires.
- Slices are given in the service domain: tasks are assigned
budgets, measured in number of sectors. Once it gets the disk, a task
must however consume its assigned budget within a configurable
maximum time (by default, the maximum possible value of the
budgets is automatically computed to comply with this timeout).
This allows the desired latency vs "throughput boosting" tradeoff
to be set.
- Budgets are scheduled according to a variant of WF2Q+, implemented
using an augmented rb-tree to take eligibility into account while
preserving an O(log N) overall complexity.
- A low-latency tunable is provided; if enabled, both interactive
and soft real-time applications are guaranteed a very low latency.
- Latency guarantees are preserved also in the presence of NCQ.
- Also with flash-based devices, a high throughput is achieved
while still preserving latency guarantees.
- BFQ features Early Queue Merge (EQM), a sort of fusion of the
cooperating-queue-merging and the preemption mechanisms present
in CFQ. EQM is in fact a unified mechanism that tries to get a
sequential read pattern, and hence a high throughput, with any
set of processes performing interleaved I/O over a contiguous
sequence of sectors.
- BFQ supports full hierarchical scheduling, exporting a cgroups
interface. Since each node has a full scheduler, each group can
be assigned its own weight.
- If the cgroups interface is not used, only I/O priorities can be
assigned to processes, with ioprio values mapped to weights
with the relation weight = IOPRIO_BE_NR - ioprio.
- ioprio classes are served in strict priority order, i.e., lower
priority queues are not served as long as there are higher
priority queues. Among queues in the same class the bandwidth is
distributed in proportion to the weight of each queue. A very
thin extra bandwidth is however guaranteed to the Idle class, to
prevent it from starving.
Change-Id: Ib51ef300946b21e19cbaad97eb415c05bddd3ead
Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com>
Signed-off-by: Stefan Guendhoer <stefan@guendhoer.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Because mwait_idle_with_hints() gets called from !idle context it must
call current_clr_polling(). This however means that resched_task() is
very likely to send an IPI even when we were polling:
CPU0						CPU1

if (current_set_polling_and_test())
  goto out;

__monitor(&ti->flags);
if (!need_resched())
  __mwait(eax, ecx);
						set_tsk_need_resched(p);
						smp_mb();
out:
current_clr_polling();

						if (!tsk_is_polling(p))
						  smp_send_reschedule(cpu);
So while it is correct (extra IPIs aren't a problem, whereas a missed
IPI would be) it is a performance problem (for some).
Avoid this issue by using fetch_or() to atomically set NEED_RESCHED
and test if POLLING_NRFLAG is set.
Since a CPU stuck in mwait is unlikely to modify the flags word,
contention on the cmpxchg is unlikely and thus we should mostly
succeed in a single go.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-kf5suce6njh5xf5d3od13rr0@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Stefan Guendhoer <stefan@guendhoer.com>
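The helper this introduces looks roughly like the sketch below; names follow
the description above, so consider it approximate rather than the exact hunk:

    /* Atomically set NEED_RESCHED and report whether the target CPU was
     * polling on its flags word; the caller only sends the IPI when it
     * was not (a polling CPU will notice the flag on its own). */
    static bool set_nr_and_not_polling(struct task_struct *p)
    {
        struct thread_info *ti = task_thread_info(p);

        return !(fetch_or(&ti->flags, _TIF_NEED_RESCHED) & _TIF_POLLING_NRFLAG);
    }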
|
| |
|
|
|
|
|
|
|
| |
The general versions of memcpy and memmove perform very poorly; this patch
improves them.
Signed-off-by: Miao Xie <miaox*******>
Signed-off-by: faux123 <reioux@gmail.com>
Signed-off-by: Stefan Guendhoer <stefan@guendhoer.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The kernel's memcpy and memmove are very inefficient. But the glibc version is
quite fast; in some cases it is 10 times faster than the kernel version. So I
introduce some memory copy macros and functions of the glibc to improve the
kernel version's performance.
The strategy of the memory functions is:
1. Copy bytes until the destination pointer is aligned.
2. Copy words in unrolled loops. If the source and destination are not
aligned in the same way, use word memory operations, but shift and merge
two read words before writing.
3. Copy the few remaining bytes.
Signed-off-by: Miao Xie <miaox*******>
Signed-off-by: faux123 <reioux@gmail.com>
Signed-off-by: Stefan Guendhoer <stefan@guendhoer.com>
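A much-simplified user-space rendering of steps 1-3 above; the shift-and-merge
path for mismatched alignment is left out, and this is not the patch's actual
code:

    #include <stddef.h>
    #include <stdint.h>

    static void *memcpy_sketch(void *dst, const void *src, size_t n)
    {
        unsigned char *d = dst;
        const unsigned char *s = src;

        /* 1. byte copies until the destination is word aligned */
        while (n && ((uintptr_t)d & (sizeof(long) - 1))) {
            *d++ = *s++;
            n--;
        }

        /* 2. word copies while both sides are word aligned */
        if (((uintptr_t)s & (sizeof(long) - 1)) == 0) {
            while (n >= sizeof(long)) {
                *(long *)d = *(const long *)s;
                d += sizeof(long);
                s += sizeof(long);
                n -= sizeof(long);
            }
        }

        /* 3. the few remaining tail bytes */
        while (n--)
            *d++ = *s++;

        return dst;
    }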
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a new latency requirement is set, it's not necessary to immediately wake up
another cpu by sending a cross-cpu IPI; we can consider the new latency to take
effect after the next wakeup from idle. This saves the unnecessary wakeup cost
and reduces the risk that drivers may request latency in irq-disabled context.
The following trace illustrates the cross-cpu notification path:
[<c08e0cc0>] (__irq_svc+0x40/0x70) from [<c00e801c>] (smp_call_function_single+0x16c/0x240)
[<c00e801c>] (smp_call_function_single+0x16c/0x240) from [<c00e855c>] (smp_call_function+0x40/0x6c)
[<c00e855c>] (smp_call_function+0x40/0x6c) from [<c0601c9c>] (cpuidle_latency_notify+0x18/0x20)
[<c0601c9c>] (cpuidle_latency_notify+0x18/0x20) from [<c00b7c28>] (blocking_notifier_call_chain+0x74/0x94)
[<c00b7c28>] (blocking_notifier_call_chain+0x74/0x94) from [<c00d563c>] (pm_qos_update_target+0xe0/0x128)
[<c00d563c>] (pm_qos_update_target+0xe0/0x128) from [<c0620d3c>] (msmsdcc_enable+0xac/0x158)
[<c0620d3c>] (msmsdcc_enable+0xac/0x158) from [<c06050e0>] (mmc_try_claim_host+0xb0/0xb8)
[<c06050e0>] (mmc_try_claim_host+0xb0/0xb8) from [<c0605318>] (mmc_start_bkops.part.15+0x50/0x2f4)
[<c0605318>] (mmc_start_bkops.part.15+0x50/0x2f4) from [<c00ab768>] (process_one_work+0x124/0x55c)
[<c00ab768>] (process_one_work+0x124/0x55c) from [<c00abfc8>] (worker_thread+0x178/0x45c)
[<c00abfc8>] (worker_thread+0x178/0x45c) from [<c00b0b24>] (kthread+0x84/0x90)
[<c00b0b24>] (kthread+0x84/0x90) from [<c000fdd4>] (kernel_thread_exit+0x0/0x8)
Disabling lock debugging due to kernel taint
coresight-etb coresight-etb.0: ETB aborted
Kernel panic - not syncing: softlockup: hung tasks
Signed-off-by: Guojian Chen <a21757@motorola.com>
Reviewed-on: http://gerrit.pcs.mot.com/532702
SLT-Approved: Slta Waiver <sltawvr@motorola.com>
Tested-by: Jira Key <jirakey@motorola.com>
Reviewed-by: Klocwork kwcheck <klocwork-kwcheck@sourceforge.mot.com>
Reviewed-by: Christopher Fries <qcf001@motorola.com>
Reviewed-by: David Ding <dding@motorola.com>
Submit-Approved: Jira Key <jirakey@motorola.com>
Signed-off-by: Stefan Guendhoer <stefan@guendhoer.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Do the multiplications and divisions at compile time
instead of runtime when the converted value is a constant.
Make the calculation functions static __always_inline to jiffies.h.
Add #defines with __builtin_constant_p to test and use the
static inline or the runtime functions as appropriate.
Prefix the old exported symbols/functions with __
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Stefan Guendhoer <stefan@guendhoer.com>
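The mechanism reduces to the pattern below; the helper names and the simplified
HZ math are illustrative, not the exact ones added by the patch:

    extern unsigned long __msecs_to_jiffies(const unsigned int m); /* runtime path */

    /* Constant-foldable path: valid for the common case where HZ divides
     * MSEC_PER_SEC evenly (e.g. HZ=100 or HZ=250). */
    static __always_inline unsigned long _msecs_to_jiffies(const unsigned int m)
    {
        return (m + (MSEC_PER_SEC / HZ) - 1) / (MSEC_PER_SEC / HZ);
    }

    #define msecs_to_jiffies(m)                         \
        (__builtin_constant_p(m) ? _msecs_to_jiffies(m) \
                                 : __msecs_to_jiffies(m))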
|
| | |
|
| |
|
|
| |
Signed-off-by: Stefan Guendhoer <stefan@guendhoer.com>
|
| |
|
|
| |
Signed-off-by: Stefan Guendhoer <stefan@guendhoer.com>
|
| |
|
|
|
|
| |
This reverts commit d0550a3f2d313a5310ace03299d45ef63edfb750.
Signed-off-by: Stefan Guendhoer <stefan@guendhoer.com>
|
| |
|
|
| |
Signed-off-by: Stefan Guendhoer <stefan@guendhoer.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 35c45e8bce3c92fb1ff94d376f1d4bfaae079d66 which was
commit 06d2f6ca5a38abe92f1f3a132b331eee773868c3 upstream as it should
not have been applied.
Reported-by: Luis Henriques <luis.henriques@canonical.com>
Cc: Markus Pargmann <mpa@pengutronix.de>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Stefan Guendhoer <stefan@guendhoer.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 93e3bce6287e1fb3e60d3324ed08555b5bbafa89 upstream.
The warning message in prepend_path is unclear and outdated. It was
added as a warning that the mechanism for generating names of pseudo
files had been removed from prepend_path and d_dname should be used
instead. Unfortunately the warning reads like a general warning,
making it unclear what to do with it.
Remove the warning. The transition it was added to warn about is long
over, and I added code several years ago which in rare cases causes
the warning to fire on legitimate code, and the warning is now firing
and scaring people for no good reason.
Reported-by: Ivan Delalande <colona@arista.com>
Reported-by: Omar Sandoval <osandov@osandov.com>
Fixes: f48cfddc6729e ("vfs: In d_path don't call d_dname on a mount point")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
[ vlee: Backported to 3.10. Adjusted context. ]
Signed-off-by: Vinson Lee <vlee@twitter.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Stefan Guendhoer <stefan@guendhoer.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 41fc014332d91ee90c32840bf161f9685b7fbf2b ]
dump_rules returns skb length and not error.
But when family == AF_UNSPEC, the caller of dump_rules
assumes that it returns an error. Hence, when family == AF_UNSPEC,
we continue trying to dump on -EMSGSIZE errors resulting in
incorrect dump idx carried between skbs belonging to the same dump.
This results in fib rule dump always only dumping rules that fit
into the first skb.
This patch fixes dump_rules to return error so that we exit correctly
and idx is correctly maintained between skbs that are part of the
same dump.
Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Stefan Guendhoer <stefan@guendhoer.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 8e2d61e0aed2b7c4ecb35844fe07e0b2b762dee4 ]
Consider sctp module is unloaded and is being requested because an user
is creating a sctp socket.
During initialization, sctp will add the new protocol type and then
initialize pernet subsys:
    status = sctp_v4_protosw_init();
    if (status)
        goto err_protosw_init;
    status = sctp_v6_protosw_init();
    if (status)
        goto err_v6_protosw_init;
    status = register_pernet_subsys(&sctp_net_ops);
The problem is that after those calls to sctp_v{4,6}_protosw_init(), it
is possible for userspace to create SCTP sockets as if the module were
already fully loaded. If that happens, one of the possible effects is
that we will have readers for net->sctp.local_addr_list list earlier
than expected and sctp_net_init() does not take precautions while
dealing with that list, leading to a potential panic but not limited to
that, as sctp_sock_init() will copy a bunch of blank/partially
initialized values from net->sctp.
The race happens like this:
CPU 0 | CPU 1
socket() |
__sock_create | socket()
inet_create | __sock_create
list_for_each_entry_rcu( |
answer, &inetsw[sock->type], |
list) { | inet_create
/* no hits */ |
if (unlikely(err)) { |
... |
request_module() |
/* socket creation is blocked |
* the module is fully loaded |
*/ |
sctp_init |
sctp_v4_protosw_init |
inet_register_protosw |
list_add_rcu(&p->list, |
last_perm); |
| list_for_each_entry_rcu(
| answer, &inetsw[sock->type],
sctp_v6_protosw_init | list) {
| /* hit, so assumes protocol
| * is already loaded
| */
| /* socket creation continues
| * before netns is initialized
| */
register_pernet_subsys |
Simply inverting the initialization order between
register_pernet_subsys() and sctp_v4_protosw_init() is not possible
because register_pernet_subsys() will create a control sctp socket, so
the protocol must be already visible by then. Deferring the socket
creation to a work-queue is not good, especially because we lose the
ability to handle its errors.
So, as suggested by Vlad, the fix is to split netns initialization in
two moments: defaults and control socket, so that the defaults are
already loaded by when we register the protocol, while control socket
initialization is kept at the same moment it is today.
Fixes: 4db67e808640 ("sctp: Make the address lists per network namespace")
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Stefan Guendhoer <stefan@guendhoer.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 25b4a44c19c83d98e8c0807a7ede07c1f28eab8b ]
In the IPv6 multicast routing code the mrt_lock was not being released
correctly in the MFC iterator, as a result adding or deleting a MIF would
cause a hang because the mrt_lock could not be acquired.
This fix is a copy of the code for the IPv4 case and ensures that the lock
is released correctly.
Signed-off-by: Richard Laing <richard.laing@alliedtelesis.co.nz>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Stefan Guendhoer <stefan@guendhoer.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit e41b0bedba0293b9e1e8d1e8ed553104b9693656 ]
We previously register IPPROTO_ROUTING offload under inet6_add_offload(),
but in error path, we try to unregister it with inet_del_offload(). This
doesn't seem correct, it should actually be inet6_del_offload(), also
ipv6_exthdrs_offload_exit() from that commit seems rather incorrect (it
also uses rthdr_offload twice), but it got removed entirely later on.
Fixes: 3336288a9fea ("ipv6: Switch to using new offload infrastructure.")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Stefan Guendhoer <stefan@guendhoer.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit f50791ac1aca1ac1b0370d62397b43e9f831421a ]
It is needed to check EVENT_NO_RUNTIME_PM bit of dev->flags in
usbnet_stop(), but its value should be read before it is cleared
when dev->flags is set to 0.
The problem was spotted and the fix was provided by
Oliver Neukum <oneukum@suse.de>.
Signed-off-by: Eugene Shatokhin <eugene.shatokhin@rosalab.ru>
Acked-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Stefan Guendhoer <stefan@guendhoer.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit d4257295ba1b389c693b79de857a96e4b7cd8ac0 ]
When a tunnel is deleted, the cached dst entry should be released.
This problem may prevent the removal of a netns (seen with a x-netns IPv6
gre tunnel):
unregister_netdevice: waiting for lo to become free. Usage count = 3
CC: Dmitry Kozlov <xeb@mail.ru>
Fixes: c12b395a4664 ("gre: Support GRE over IPv6")
Signed-off-by: huaibin Wang <huaibin.wang@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Stefan Guendhoer <stefan@guendhoer.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 468b732b6f76b138c0926eadf38ac88467dcd271 ]
"len" is a signed integer. We check that len is not negative, so it
goes from zero to INT_MAX. PAGE_SIZE is unsigned long so the comparison
is type promoted to unsigned long. ULONG_MAX - 4095 is higher than
INT_MAX so the condition can never be true.
I don't know if this is harmful but it seems safe to limit "len" to
INT_MAX - 4095.
Fixes: a8c879a7ee98 ('RDS: Info and stats')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Stefan Guendhoer <stefan@guendhoer.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Upstream commit 0470eb99b4721586ccac954faac3fa4472da0845 ]
Kirill A. Shutemov says:
This simple test-case triggers a few locking asserts in the kernel:

    int main(int argc, char **argv)
    {
        unsigned int block_size = 16 * 4096;
        struct nl_mmap_req req = {
            .nm_block_size = block_size,
            .nm_block_nr = 64,
            .nm_frame_size = 16384,
            .nm_frame_nr = 64 * block_size / 16384,
        };
        unsigned int ring_size;
        int fd;

        fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
        if (setsockopt(fd, SOL_NETLINK, NETLINK_RX_RING, &req, sizeof(req)) < 0)
            exit(1);
        if (setsockopt(fd, SOL_NETLINK, NETLINK_TX_RING, &req, sizeof(req)) < 0)
            exit(1);
        ring_size = req.nm_block_nr * req.nm_block_size;
        mmap(NULL, 2 * ring_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
        return 0;
    }
+++ exited with 0 +++
BUG: sleeping function called from invalid context at /home/kas/git/public/linux-mm/kernel/locking/mutex.c:616
in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: init
3 locks held by init/1:
#0: (reboot_mutex){+.+...}, at: [<ffffffff81080959>] SyS_reboot+0xa9/0x220
#1: ((reboot_notifier_list).rwsem){.+.+..}, at: [<ffffffff8107f379>] __blocking_notifier_call_chain+0x39/0x70
#2: (rcu_callback){......}, at: [<ffffffff810d32e0>] rcu_do_batch.isra.49+0x160/0x10c0
Preemption disabled at:[<ffffffff8145365f>] __delay+0xf/0x20
CPU: 1 PID: 1 Comm: init Not tainted 4.1.0-00009-gbddf4c4818e0 #253
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Debian-1.8.2-1 04/01/2014
ffff88017b3d8000 ffff88027bc03c38 ffffffff81929ceb 0000000000000102
0000000000000000 ffff88027bc03c68 ffffffff81085a9d 0000000000000002
ffffffff81ca2a20 0000000000000268 0000000000000000 ffff88027bc03c98
Call Trace:
<IRQ> [<ffffffff81929ceb>] dump_stack+0x4f/0x7b
[<ffffffff81085a9d>] ___might_sleep+0x16d/0x270
[<ffffffff81085bed>] __might_sleep+0x4d/0x90
[<ffffffff8192e96f>] mutex_lock_nested+0x2f/0x430
[<ffffffff81932fed>] ? _raw_spin_unlock_irqrestore+0x5d/0x80
[<ffffffff81464143>] ? __this_cpu_preempt_check+0x13/0x20
[<ffffffff8182fc3d>] netlink_set_ring+0x1ed/0x350
[<ffffffff8182e000>] ? netlink_undo_bind+0x70/0x70
[<ffffffff8182fe20>] netlink_sock_destruct+0x80/0x150
[<ffffffff817e484d>] __sk_free+0x1d/0x160
[<ffffffff817e49a9>] sk_free+0x19/0x20
[..]
Cong Wang says:
We can't hold mutex lock in a rcu callback, [..]
Thomas Graf says:
The socket should be dead at this point. It might be simpler to
add a netlink_release_ring() function which doesn't require
locking at all.
Reported-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Diagnosed-by: Cong Wang <cwang@twopensource.com>
Suggested-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Stefan Guendhoer <stefan@guendhoer.com>
|