| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
| |
Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
|
| |
|
|
|
|
|
|
|
| |
Remove two unneeded `else's.
Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
The bitmap_equal function has optimized code for small bitmaps with less
than BITS_PER_LONG bits. For larger bitmaps the out-of-line function
__bitmap_equal is called.
For a constant number of bits divisible by BITS_PER_LONG the memcmp
function can be used. For s390 gcc knows how to optimize this function,
memcmp calls with up to 256 bytes / 2048 bits are translated into a
single instruction.
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With this config:
http://busybox.net/~vda/kernel_config_OPTIMIZE_INLINING_and_Os
gcc-4.7.2 generates many copies of these tiny functions:
bitmap_weight (55 copies):
55 push %rbp
48 89 e5 mov %rsp,%rbp
e8 3f 3a 8b 00 callq __bitmap_weight
5d pop %rbp
c3 retq
hweight_long (23 copies):
55 push %rbp
e8 b5 65 8e 00 callq __sw_hweight64
48 89 e5 mov %rsp,%rbp
5d pop %rbp
c3 retq
See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122
This patch fixes this via s/inline/__always_inline/
While at it, replaced two "__inline__" with usual "inline"
(the rest of the source file uses the latter).
text data bss dec filename
86971357 17195880 36659200 140826437 vmlinux.before
86971120 17195912 36659200 140826232 vmlinux
Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1438697716-28121-1-git-send-email-dvlasenk@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The macro BITMAP_LAST_WORD_MASK can be implemented without a conditional,
which will generally lead to slightly better generated code (221 bytes
saved for allmodconfig-GCOV_KERNEL, ~2k with GCOV_KERNEL). As a small
bonus, this also ensures that the nbits parameter is expanded exactly
once.
In BITMAP_FIRST_WORD_MASK, if start is signed gcc is technically allowed
to assume it is positive (or divisible by BITS_PER_LONG), and hence just
do the simple mask. It doesn't seem to use this, and even on an
architecture like x86 where the shift only depends on the lower 5 or 6
bits, and these bits are not affected by the signedness of the expression,
gcc still generates code to compute the C99 mandated value of start %
BITS_PER_LONG. So just use a mask explicitly, also for consistency with
BITMAP_LAST_WORD_MASK.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Tejun Heo <tj@kernel.org>
Reviewed-by: George Spelvin <linux@horizon.com>
Cc: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
bitmap_empty() has its own implementation. But it's clearly as simple as:
find_first_bit(src, nbits) == nbits
The same is true for 'bitmap_full'.
Signed-off-by: Yury Norov <yury.norov@gmail.com>
Cc: George Spelvin <linux@horizon.com>
Cc: Alexey Klimov <klimov.linux@gmail.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
gcc can generate slightly better code for stuff like "nbits %
BITS_PER_LONG" when it knows nbits is not negative. Since negative size
bitmaps or shift amounts don't make sense, change these parameters of
bitmap_shift_right to unsigned.
If off >= lim (which requires shift >= nbits), k is initialized with a
large positive value, but since I've let k continue to be signed, the loop
will never run and dst will be zeroed as expected. Inside the loop, k is
guaranteed to be non-negative, so the fact that it is promoted to unsigned
in the various expressions it appears in is harmless.
Also use "shift" and "nbits" consistently for the parameter names.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
I've previously changed the nbits parameter of most bitmap_* functions to
unsigned; now it is bitmap_shift_{left,right}'s turn. This alone saves
some .text, but while at it I found that there were a few other things one
could do. The end result of these seven patches is
$ scripts/bloat-o-meter /tmp/bitmap.o.{old,new}
add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-328 (-328)
function old new delta
__bitmap_shift_right 384 226 -158
__bitmap_shift_left 306 136 -170
and less importantly also a smaller stack footprint
$ stack-o-meter.pl master bitmap
file function old new delta
lib/bitmap.o __bitmap_shift_right 24 8 -16
lib/bitmap.o __bitmap_shift_left 24 0 -24
For each pair of 0 <= shift <= nbits <= 256 I've tested the end result
with a few randomly filled src buffers (including garbage beyond nbits),
in each case verifying that the shift {left,right}-most bits of dst are
zero and the remaining nbits-shift bits correspond to src, so I'm fairly
confident I didn't screw up. That hasn't stopped me from being wrong
before, though.
This patch (of 7):
gcc can generate slightly better code for stuff like "nbits %
BITS_PER_LONG" when it knows nbits is not negative. Since negative size
bitmaps or shift amounts don't make sense, change these parameters of
bitmap_shift_right to unsigned.
The expressions involving "lim - 1" are still ok, since if lim is 0 the
loop is never executed.
Also use "shift" and "nbits" consistently for the parameter names.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
| |
There is no guarantee that *src does not contain garbage bits outside
the lower nbits, so we need to mask it before the shift-and-assign.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
| |
On little-endian, there's no reason to have an extra, presumably less
efficient, way of copying a bitmap.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Make the prototype of bitmap_copy_le the same as bitmap_copy's. All other
bitmap_* functions take unsigned long* parameters; there's no reason this
should be special.
The only current user is the static inline uwb_mas_bm_copy_le, which
already does the void* laundering, so the end users can pass their u8 or
__le32 buffers without a cast.
Furthermore, this allows us to simply let bitmap_copy_le be an alias for
bitmap_copy on little-endian; see next patch.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
| |
Also, rename bits to nbits. Both changes for consistency with other
bitmap_* functions.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Make the return value and the ord and nbits parameters of
bitmap_ord_to_pos unsigned.
Also, simplify the implementation and as a side effect make the result
fully defined, returning nbits for ord >= weight, in analogy with what
find_{first,next}_bit does. This is a better sentinel than the former
("unofficial") 0. No current users are affected by this change.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
| |
For consistency with the other bitmap_* functions, also make the nbits
parameter of bitmap_zero, bitmap_fill and bitmap_copy unsigned.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Change the sz and nbits parameters of bitmap_fold to unsigned int for
consistency with other bitmap_* functions, and to save another few bytes
in the generated code.
[akpm@linux-foundation.org: fix kerneldoc]
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
| |
Change the nbits parameter of bitmap_onto to unsigned int for consistency
with other bitmap_* functions.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
| |
Apparently, bitmap_andnot is supposed to return whether the new bitmap
is empty. But it didn't take potential garbage bits in the last word
into account.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
| |
Apparently, bitmap_and is supposed to return whether the new bitmap is
empty. But it didn't take potential garbage bits in the last word into
account.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Changing the pos parameter of __reg_op to unsigned allows the compiler
to generate slightly smaller and simpler code. Also update its callers
bitmap_*_region to receive and pass unsigned int. The return types of
bitmap_find_free_region and bitmap_allocate_region are still int to
allow a negative error code to be returned. An int is certainly capable
of representing any realistic return value.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
| |
The compiler can generate slightly smaller and simpler code when it
knows that "start" is non-negative.
Also, use the names "start" and "len" for the two parameters for
consistency with bitmap_set.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
This commit adds a bitmap_find_next_zero_area_off() function which
works like bitmap_find_next_zero_area() function expect it allows an
offset to be specified when alignment is checked. This lets caller
request a bit such that its number plus the offset is aligned
according to the mask.
Change-Id: Ib0593cf578ed69ba4c51b1e102a1f8ea1aeb93e8
Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Larry Bassel <lbassel@codeaurora.org>
(cherry picked from commit 5f2929128ae4db1a6577748c72437f102ed400a5)
|
| |
|
|
|
|
|
|
|
|
|
|
| |
The compiler can generate slightly smaller and simpler code when it
knows that "start" is non-negative.
Also, use the names "start" and "len" for the two parameters in both
header file and implementation, instead of the previous mix.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
The compiler can generate slightly smaller and simpler code when it
knows that "nbits" is non-negative. Since no-one passes a negative
bit-count, this shouldn't affect the semantics.
I didn't change the return type, since that might change the semantics
of some expression containing a call to bitmap_weight(). Certainly an
int is capable of holding the result.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
| |
The compiler can generate slightly smaller and simpler code when it
knows that "nbits" is non-negative. Since no-one passes a negative
bit-count, this shouldn't affect the semantics.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
| |
The compiler can generate slightly smaller and simpler code when it
knows that "nbits" is non-negative. Since no-one passes a negative
bit-count, this shouldn't affect the semantics.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
| |
This change is only for consistency with the changes to the other
bitmap_* functions; it doesn't change the size of the generated code:
inside BITS_TO_LONGS there is a sizeof(long), which causes bits to be
interpreted as unsigned anyway.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
| |
Since the extra bits are "don't care", there is no reason to mask the
last word to the used bits when complementing. This shaves off yet a
few bytes.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
| |
The compiler can generate slightly smaller and simpler code when it
knows that "nbits" is non-negative. Since no-one passes a negative
bit-count, this shouldn't affect the semantics.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
| |
The compiler can generate slightly smaller and simpler code when it
knows that "nbits" is non-negative. Since no-one passes a negative
bit-count, this shouldn't affect the semantics.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
| |
The compiler can generate slightly smaller and simpler code when it
knows that "nbits" is non-negative. Since no-one passes a negative
bit-count, this shouldn't affect the semantics.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Many functions in lib/bitmap.c start with an expression such as lim =
bits/BITS_PER_LONG. Since bits has type (signed) int, and since gcc
cannot know that it is in fact non-negative, it generates worse code
than it could. These patches, mostly consisting of changing various
parameters to unsigned, gives a slight overall code reduction:
add/remove: 1/1 grow/shrink: 8/16 up/down: 251/-414 (-163)
function old new delta
tick_device_uses_broadcast 335 425 +90
__irq_alloc_descs 498 554 +56
__bitmap_andnot 73 115 +42
__bitmap_and 70 101 +31
bitmap_weight - 11 +11
copy_hugetlb_page_range 752 762 +10
follow_hugetlb_page 846 854 +8
hugetlb_init 1415 1417 +2
hugetlb_nrpages_setup 130 131 +1
hugetlb_add_hstate 377 376 -1
bitmap_allocate_region 82 80 -2
select_task_rq_fair 2202 2191 -11
hweight_long 66 55 -11
__reg_op 230 219 -11
dm_stats_message 2849 2833 -16
bitmap_parselist 92 74 -18
__bitmap_weight 115 97 -18
__bitmap_subset 153 129 -24
__bitmap_full 128 104 -24
__bitmap_empty 120 96 -24
bitmap_set 179 149 -30
bitmap_clear 185 155 -30
__bitmap_equal 136 105 -31
__bitmap_intersects 148 108 -40
__bitmap_complement 109 67 -42
tick_device_setup_broadcast_func.isra 81 - -81
[The increases in __bitmap_and{,not} are due to bug fixes 17/18,18/18.
No idea why bitmap_weight suddenly appears.] While 163 bytes treewide is
insignificant, I believe the bitmap functions are often called with
locks held, so saving even a few cycles might be worth it.
While making these changes, I found a few other things that might be
worth including. 16,17,18 are actual bug fixes. The rest shouldn't
change the behaviour of any of the functions, provided no-one passed
negative nbits values. If something should come up, it should be fairly
bisectable.
A few issues I thought about, but didn't know what to do with:
* Many of the functions misbehave if nbits is compile-time 0; the
out-of-line functions generally handle 0 correctly. bitmap_fill() is
particularly bad, whether the 0 is known at compile time or not. It
would probably be nice to add detection of at least compile-time 0 and
handle that appropriately.
* I didn't change __bitmap_shift_{left,right} to use unsigned because I
want to fully understand why the algorithm works before making that
change. However, AFAICT, they behave correctly for all (positive) shift
amounts. This is not the case for the small_const_nbits versions. If
for example nbits = n = BITS_PER_LONG, the shift operators turn into
no-ops (at least on x86), so one get *dst = *src, whereas one would
expect to get *dst=0. That difference in behaviour is somewhat
annoying.
This patch (of 18):
The compiler can generate slightly smaller and simpler code when it
knows that "nbits" is non-negative. Since no-one passes a negative
bit-count, this shouldn't affect the semantics.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|