The existing implementation of copy_page for ARM appears to be
optimized for older platforms. Benchmark testing in a sandbox
environment shows suboptimal performance on modern platforms
like armv6 and armv7, with achievable speed-ups ranging from 10%
(Cortex-A8) to 80% (the armv6 core used in the Raspberry Pi).

This commit optimizes copy_page and introduces a new compile-time
constant, PREFETCH_DISTANCE, defined in cache.h. Multiplied by
L1_CACHE_BYTES, it gives the offset used for prefetches performed
with the PLD instruction. On platforms where L1_CACHE_BYTES is 32
(armv5 and armv6), copy_page processes 32 bytes at a time with one
prefetch per iteration; on armv7 (where L1_CACHE_BYTES is 64), it
processes 64 bytes at a time, again with one prefetch per iteration.
When no preload instruction is available (platforms earlier than
armv5), no preload instructions are generated and 32 bytes are
processed at a time.
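
The loop structure described above can be illustrated with a minimal
C sketch. This is not the assembly from this commit: the function name
copy_page_sketch, the PAGE_SZ constant, and the value chosen for
PREFETCH_DISTANCE are assumptions made for illustration, and the
GCC/Clang builtin __builtin_prefetch stands in for the PLD instruction.
The sketch also skips prefetches that would point past the end of the
source page, which is a choice of the sketch and not necessarily what
the assembly does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed values for illustration; in the kernel these come from cache.h. */
#define L1_CACHE_BYTES    32    /* 32 on armv5/armv6, 64 on armv7 */
#define PREFETCH_DISTANCE 3     /* hypothetical value, counted in cache lines */
#define PAGE_SZ           4096  /* assumed page size */

/* Copy one page, one cache line per iteration, prefetching ahead. */
static void copy_page_sketch(void *to, const void *from)
{
    unsigned char *d = to;
    const unsigned char *s = from;
    /* Prefetch offset = PREFETCH_DISTANCE * L1_CACHE_BYTES, as in the text. */
    const size_t ahead = (size_t)PREFETCH_DISTANCE * L1_CACHE_BYTES;

    assert(PAGE_SZ % L1_CACHE_BYTES == 0);

    for (size_t off = 0; off < PAGE_SZ; off += L1_CACHE_BYTES) {
        /* One prefetch per iteration (stand-in for PLD). */
        if (off + ahead < PAGE_SZ)
            __builtin_prefetch(s + off + ahead);
        /* Copy L1_CACHE_BYTES bytes per iteration. */
        memcpy(d + off, s + off, L1_CACHE_BYTES);
    }
}
```

Setting L1_CACHE_BYTES to 64 in the sketch models the armv7 case, and
removing the prefetch models platforms earlier than armv5.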

To make it easier to specify instructions for architectures with no
preload instruction, the NO_PLD macro is added to assembler.h,
complementing the existing PLD macro.

Signed-off-by: Harm Hanemaaijer <fgenfb@yahoo.com>
Signed-off-by: RyTek <rytek1128@outlook.com>