| Commit message (Collapse) | Author | Age | Files | Lines | |
|---|---|---|---|---|---|
| * | arm64: support __int128 on gcc 5+ | Jason A. Donenfeld | 2018-11-30 | 2 | -1/+81 |
| | | | | | | | | | | | | | | | | | | | | | | | | [Upstream: fb8722735f50cd51204bfbeefa2e5e7e9ff5b2be] [Upstream: 9bfe7553fadb269e45a6e10f68b727957dff5676] Versions of gcc prior to gcc 5 emitted a __multi3 function call when dealing with TI types, resulting in failures when trying to link to libgcc, and more generally, bad performance. However, since gcc 5, the compiler supports actually emitting fast instructions, which means we can at long last enable this option and receive the speedups. The gcc commit that added proper Aarch64 support is: https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=d1ae7bb994f49316f6f63e6173f2931e837a351d This commit appears to be part of the gcc 5 release. There are still a few instructions, __lshrti3, __ashlti3, and __ashrti3, which require libgcc, which is fine. Rather than linking to libgcc, we simply provide them ourselves, since they're not that complicated. Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> | ||||
| * | arm64: lib: improve copy_page to deal with 128 bytes at a time | Will Deacon | 2017-04-11 | 1 | -8/+38 |
| | | | | | | | | | | | | | | | | | | | | | | | | | | We want to avoid lots of different copy_page implementations, settling for something that is "good enough" everywhere and hopefully easy to understand and maintain whilst we're at it. This patch reworks our copy_page implementation based on discussions with Cavium on the list and benchmarking on Cortex-A processors so that: - The loop is unrolled to copy 128 bytes per iteration - The reads are offset so that we read from the next 128-byte block in the same iteration that we store the previous block - Explicit prefetch instructions are removed for now, since they hurt performance on CPUs with hardware prefetching - The loop exit condition is calculated at the start of the loop Change-Id: I0d9f3bbe4efa2751f41432a3b4b299fbb0e494be Signed-off-by: Will Deacon <will.deacon@arm.com> Tested-by: Andrew Pinski <apinski@cavium.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: franciscofranco <franciscofranco.1990@gmail.com> | ||||
| * | arm64: lib: use pair accessors for copy_*_user routines | Will Deacon | 2016-09-10 | 3 | -18/+33 |
| | | | | | | | | | | | The AArch64 instruction set contains load/store pair memory accessors, so use these in our copy_*_user routines to transfer 16 bytes per iteration. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: franciscofranco <franciscofranco.1990@gmail.com> | ||||
| * | first commit | Meizu OpenSource | 2016-08-15 | 18 | -0/+1031 |
