xavi/android_kernel_m2note

	Commit message	Author	Age	Files	Lines
*	(CR) ALPS03877842(For_mt6737m_35_n1_alps-mp-n1.mp1-V1_P113)	lingsen1	2019-07-20	1	-1/+2
	Patch Type: Customer Request
	CR ID: ALPS03877842
	Severity:
	Description: [Patch Request] [PMS] mt, Project: mt6737M_35_N1, SW Version: alps-mp-n1.mp1-V1
	N/A
	Associated Files:
	  device/mt/mt6737m_35_n1/ProjectConfig.mk
	  vendor/mt/libs/libmtk-art-runtime/arm/libmtk-art-runtime.a

	Patch Type: Customer Request
	CR ID: ALPS03683903
	Severity: Critical
	Description: [Buganizer]Security Vulnerability Issue 70515752 - [And GO Pening] Mediatek Preloader Allows Arbitrary Peripheral Memory Reads and Writes
	Potential Impact of the solution: No
	Modules to be verified after taking patch: No
	N/A
	Associated Files:
	  vendor/mediatek/proprietary/bootable/bootloader/preloader/platform/mt6735/src/core/download.c
	  vendor/mediatek/proprietary/bootable/bootloader/preloader/platform/mt6735/src/core/inc/download.h
	  vendor/mediatek/proprietary/bootable/bootloader/preloader/platform/mt6735/src/drivers/inc/mt6735.h
	  vendor/mediatek/proprietary/bootable/bootloader/preloader/platform/mt6735/src/security/inc/sec_region.h
	  vendor/mediatek/proprietary/bootable/bootloader/preloader/platform/mt6735/src/security/sec_region.c

	Patch Type: Customer Request
	CR ID: ALPS03693488
	Severity: Critical
	Description: [Buganizer]Security Vulnerability Issue 70515281 - [And GO Pening] Mediatek Preloader "Download Mode" Memory Corruption
	Potential Impact of the solution: no
	Modules to be verified after taking patch: boot
	N/A
	Associated Files:
	  vendor/mediatek/proprietary/bootable/bootloader/preloader/platform/mt6735/link_descriptor.ld
	  vendor/mediatek/proprietary/bootable/bootloader/preloader/platform/mt6735/src/core/partition.c

	Patch Type: Customer Request
	CR ID: ALPS03740330
	Severity: Critical
	Description: [Buganizer]Security Vulnerability Issue 71867247 - [And GO Pening] - Remote Memory Corruption in Mediatek WiFi TDLS Frame Parser
	Potential Impact of the solution: None
	Modules to be verified after taking patch: None
	N/A
	Associated Files:
	  kernel-3.18/drivers/misc/mediatek/connectivity/wlan/gen2/mgmt/tdls.c

	Patch Type: Customer Request
	CR ID: ALPS03862169
	Severity: Critical
	Description: [Google Security Patch][CVE_2017_13311]EoP Vulnerability in ProcessStats
	Potential Impact of the solution: None
	Modules to be verified after taking patch: None
	N/A
	Associated Files:
	  frameworks/base/core/java/com/android/internal/app/procstats/SparseMappingTable.java

	Patch Type: Customer Request
	CR ID: ALPS03862180
	Severity: Critical
	Description: [Google Security Patch][CVE_2017_13316]ID Vulnerability in Speech recognizer
	Potential Impact of the solution: None
	Modules to be verified after taking patch: None
	N/A
	Associated Files:
	  frameworks/base/core/java/android/content/PermissionChecker.java
	  frameworks/base/core/java/android/speech/RecognitionService.java

	Patch Type: Customer Request
	CR ID: ALPS03862195
	Severity: Critical
	Description: [Google Security Patch][CVE_2017_13319]ID/DoS Vulnerability in MP3 codec
	Potential Impact of the solution: None
	Modules to be verified after taking patch: None
	N/A
	Associated Files:
	  frameworks/av/media/libstagefright/codecs/mp3dec/src/pvmp3_decode_header.cpp

	Patch Type: Customer Request
	CR ID: ALPS03862206
	Severity: Critical
	Description: [Google Security Patch][CVE_2017_16643]ID Vulnerability in USB driver (Device Specific)
	Potential Impact of the solution: None
	Modules to be verified after taking patch: None
	N/A
	Associated Files:
	  kernel-3.18/drivers/input/tablet/gtco.c

	Change-Id: I584cb0ab7b367a80b61730adea475093ca98f3f4
*	ANDROID: sdcardfs: Protect set_top	Daniel Rosenberg	2019-07-20	4	-28/+36
	If the top is changed while we're attempting to use it, it's possible that the reference will be put while we are in the process of grabbing a reference. Now we grab a spinlock to protect grabbing our reference count. Additionally, we now set the inode_info's top value to point to its own data when initializing, which makes tracking changes easier.
	Change-Id: If15748c786ce4c0480ab8c5051a92523aff284d2
	Signed-off-by: Daniel Rosenberg <drosen@google.com>
*	Revert "ANDROID: sdcardfs: notify lower file of opens"	Daniel Rosenberg	2019-07-20	1	-2/+0
	This reverts commit fd825dd8ffd9c4873f80438c3030dd21c204512d. Instead of calling notify within sdcardfs, which reverses the order of notifications during an open with truncate, we'll make fs_notify worry about it.
	Change-Id: Ic634401c0f223500066300a4df8b1453a0b35b60
	Bug: 70706497
	Signed-off-by: Daniel Rosenberg <drosen@google.com>
*	ANDROID: sdcardfs: Use lower getattr times/size	Daniel Rosenberg	2019-07-20	1	-10/+9
	We now use the lower filesystem's getattr for time and size related information.
	Change-Id: I3dd05614a0c2837a13eeb033444fbdf070ddce2a
	Signed-off-by: Daniel Rosenberg <drosen@google.com>
	Bug: 72007585
*	mm/oom_kill: squashed reverts to a stable state	Corinna Vinschen	2019-07-19	10	-202/+215
	Revert "mm, oom: fix use-after-free in oom_kill_process"
	This reverts commit e1bebdeedb497f03d426c85a89c3807c7e75268d.
	Signed-off-by: Corinna Vinschen <xda@vinschen.de>

	Revert "mm,oom: make oom_killer_disable() killable"
	This reverts commit 65a7400a432639aa8d5e572f30687fbca204b6f8.
	Signed-off-by: Corinna Vinschen <xda@vinschen.de>

	Revert "mm: oom_kill: don't ignore oom score on exiting tasks"
	This reverts commit d60dae46b27a8f381e4a7ad9dde870faa49fa5f1.
	Signed-off-by: Corinna Vinschen <xda@vinschen.de>

	Revert "mm/oom_kill.c: avoid attempting to kill init sharing same memory"
	This reverts commit 10773c0325259d6640b93c0694b5598ddf84939f.
	Signed-off-by: Corinna Vinschen <xda@vinschen.de>

	Revert "CHROMIUM: DROP: mm/oom_kill: Double-check before killing a child in our place"
	This reverts commit 2bdd9a2042a0e12d96c545773d9d8038c920f813.

	Revert "mm/oom_kill: fix the wrong task->mm == mm checks in oom_kill_process()"
	This reverts commit 419a313435b31821e4d045ca4b7ea1cc5fa02035.
	Signed-off-by: Corinna Vinschen <xda@vinschen.de>

	Revert "mm/oom_kill: cleanup the "kill sharing same memory" loop"
	This reverts commit afda78c6de38f9f66eba0955153b380d540d8276.

	Revert "mm/oom_kill: remove the wrong fatal_signal_pending() check in oom_kill_process()"
	This reverts commit acde9c2ace298b249c06ec5b0b971c333449dc09.
	Signed-off-by: Corinna Vinschen <xda@vinschen.de>

	Revert "mm, oom: remove task_lock protecting comm printing"
	This reverts commit 9a9ca142d250ec9de1215284857f4528c6ddb080.
	Signed-off-by: Corinna Vinschen <xda@vinschen.de>

	Revert "mm/oom_kill.c: suppress unnecessary "sharing same memory" message"
	This reverts commit 1aa2960f7c70d65b1481f805ac73b988faff6747.
	Signed-off-by: Corinna Vinschen <xda@vinschen.de>

	Revert "mm/oom_kill.c: reverse the order of setting TIF_MEMDIE and sending SIGKILL"
	This reverts commit f028aedfcfd2e2bb98921b98d3ae183387ab8fed.

	Revert "mm, oom: remove unnecessary variable"
	This reverts commit 54b0b58224146d68a11bccb5e64683ab3029373a.

	Revert "mm/oom_kill.c: print points as unsigned int"
	This reverts commit 603f975a6d4f0b56c7f6df7889ef2a704eca94a3.
	Signed-off-by: Corinna Vinschen <xda@vinschen.de>

	Revert "mm: oom_kill: simplify OOM killer locking"
	This reverts commit 7951a52ed35d162063fa08b27894e302fd716ccd.

	Revert "mm: oom_kill: remove unnecessary locking in exit_oom_victim()"
	This reverts commit f0739b25ac884682865d6aae7485e79489107bfb.

	Revert "mm: oom_kill: generalize OOM progress waitqueue"
	This reverts commit eb4b1243c72ba0b392bbe05dbf9f91959f70eb18.

	Revert "mm: oom_kill: switch test-and-clear of known TIF_MEMDIE to clear"
	This reverts commit e611f16275c3642cb8a6345ff2470926fef52110.

	Revert "mm: oom_kill: clean up victim marking and exiting interfaces"
	This reverts commit c6fada01b9370e3d7603b4ad8c26b56759174667.

	Revert "mm: oom_kill: remove unnecessary locking in oom_enable()"
	This reverts commit 5dd152d7351b3805f59b2b1f624722ab2f3c5fd8.

	Revert "oom, PM: make OOM detection in the freezer path raceless"
	This reverts commit 5fc5b1ddee5404a7629dd7045f54eaf8941bc11c.
*	mm: Add notifier framework for showing memory	Laura Abbott	2019-07-19	3	-1/+76
	There are many drivers in the kernel which can hold on to lots of memory. It can be useful to dump out all those drivers at key points in the kernel. Introduce a notifier framework for dumping this information. When the notifiers are called, drivers can dump out the state of any memory they may be using.
	Change-Id: Ifb2946964bf5d072552dd56d8d6dfdd794af6d84
	Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
*	memcg: Allow non-root users permission to control memory	Chintan Pandya	2019-07-19	1	-0/+18
	In a system like Android, a process with SYS_ADMIN rights controls the system for things like moving a process from one cgroup to another. The native cgroup capabilities may only be exercised by the root user, not by system. When adding a new cgroup subsystem, one may override and relax the permission so that 'system' can also control the cgroup. Here, memcg is one such cgroup subsystem which requires system-level control.
	Allow non-root processes to add arbitrary processes to 'memory' cgroups if they have the 'CAP_SYS_ADMIN' capability set.
	Change-Id: I43d4468186f142c176cb5b5f060751bb1b160344
	Signed-off-by: Chintan Pandya <cpandya@codeaurora.org>
*	unifdef.c: use memcpy() instead of the dodgy strncpy().	Tony Finch	2019-07-18	1	-2/+2
	This makes it clearer that I do not want a '\0' terminator.
	Submitted by: Carsten Hey <carsten@debian.org>
	Change-Id: I7b14346e2c32604afdbfd0e6b08baabe8a0ec54b
*	DISP: Printk too much	Elvin Zhang	2019-07-18	1	-1/+1
	[Detail] Replace DISPMSG() with DISPDBG() to reduce printk log
	MTK-Commit-Id: d9613f32bb286cea1ce1f4cd87a2af91557643fb
	Change-Id: I2d072885b6c83113490dc27823c822860ec201a5
	Signed-off-by: Elvin Zhang <elvin.zhang@mediatek.com>
	CR-Id: ALPS03499038
	Feature: Display Driver
*	GPU DVFS: fix procfs write KE	Brian-SY Yang	2019-07-18	1	-6/+16
	[Detail] KE always happens when writing /proc/gpufreq/gpufreq_fixed_freq in the IoFuzz test
	[Solution] Add an input freq check
	MTK-Commit-Id: 74092efbcddc8d1584e56bb81df4722affa0b512
	Change-Id: I10525c42e946088d63b8adeb29594f754710747f
	Signed-off-by: Brian-SY Yang <brian-sy.yang@mediatek.com>
	CR-Id: ALPS03519258
	Feature: Others
	(cherry picked from commit bcbce651ad5b50bc7add53f65c0c355a3b932c33)
	(cherry picked from commit fa8f434d44293d39b89b3b1585ae114fa1f1d549)
*	masp: fix ioctl: SEC_GET_RANDOM_ID memory check range	Chin-Ting Kuo	2019-07-18	1	-1/+5
	[Detail] The size of the RID is 16 bytes, not 4 bytes. Instead of using "unsigned int" as the input type of _IOR(), a new struct "sec_rid", which is 16 bytes in size, is declared and used so that the memory access permission check covers the correct range.
	MTK-Commit-Id: 4e1c03ca23666da29bbcd024839de5ad8a3fa143
	Change-Id: I892b71fb082b5b2335d29436fee1bc61cf14fc15
	Signed-off-by: Chin-Ting Kuo <chin-ting.kuo@mediatek.com>
	CR-Id: ALPS03523553
	Feature: Vulnerability Scan
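The size mismatch described in this commit can be illustrated with a little arithmetic. The sketch below is not the driver code: it assumes that the permission check covers exactly the size of the type passed to _IOR(), and the helper name `unchecked_bytes` is hypothetical.

```python
import struct

# Sizes in bytes. "=I" is a standard-size unsigned int; the 16-byte RID size
# comes from the commit message. All names here are illustrative only.
UINT_SIZE = struct.calcsize("=I")
RID_SIZE = 16


def unchecked_bytes(checked_size: int, copied_size: int) -> int:
    """Return how many copied bytes fall outside the range that was checked."""
    return max(0, copied_size - checked_size)


# With "unsigned int" as the ioctl type, only part of the RID is covered.
print(unchecked_bytes(UINT_SIZE, RID_SIZE))
# With a 16-byte struct as the ioctl type, the whole RID is covered.
print(unchecked_bytes(RID_SIZE, RID_SIZE))
```

Under these assumptions, the old declaration leaves the tail of the RID outside the checked range, which is the gap the new struct closes.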
*	smi: log only for wrong ioctl	Jacky Chen	2019-07-18	1	-1/+1
	[Detail] log only without aee for wrong ioctl
	MTK-Commit-Id: bb7c3da5b777f7841bf42f3d2b3e2b5f82bc135e
	Change-Id: I39b4360faeb297f9febefce9c3b3b9885e0c097b
	Signed-off-by: Jacky Chen <ming-fan.chen@mediatek.com>
	CR-Id: ALPS03592077
	Feature: smi
*	vibrator: delete more log	Shangbing Hu	2019-07-18	1	-6/+7
	[Detail] delete more log
	[Solution] delete more log
	MTK-Commit-Id: 1f1494edf8bb600dfede431d102a5fbbaa04816a
	Change-Id: I6309eb44c76b588ff44dd7f2a937b3e4c5d5e7bb
	Signed-off-by: Shangbing Hu <shangbing.hu@mediatek.com>
	CR-Id: ALPS02571387
	Feature: WiFi Calling Service
	(cherry picked from commit 93f355c2b37d923cd463bb71e20dc8c7e7596cca)
	(cherry picked from commit b0e147971dcd0d178a1ee6043dcd49dec5f434e7)
*	msdc: mt6735: fix code defect	Edison Liu	2019-07-18	1	-0/+2
	[Detail] A malicious userspace application can corrupt kernel memory. The offset is not limited, so it becomes a powerful arbitrary memory read/write primitive.
	[Solution] Limit the offset to the range 0 to 0xFFFF
	MTK-Commit-Id: 91446a30b6123dd3391074062dc9833d09dbcc54
	Change-Id: Icf733233133bd8ed734ec69a3567e06281d982ff
	Signed-off-by: Edison Liu <Edison.Liu@mediatek.com>
	CR-Id: ALPS03684210
	Feature: Others
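The fix amounts to a range check on a user-supplied offset before it is used. A minimal sketch of such a check follows; the function name is hypothetical, and treating 0xFFFF as an inclusive upper bound is an assumption, since the commit message only says "from 0 to 0xFFFF".

```python
MAX_OFFSET = 0xFFFF  # upper bound named in the commit message (assumed inclusive)


def offset_is_valid(offset: int) -> bool:
    """Accept only offsets inside the allowed window; reject everything else."""
    return 0 <= offset <= MAX_OFFSET


for candidate in (0, 0xFFFF, 0x10000, -1):
    print(hex(candidate), offset_is_valid(candidate))
```

Rejecting out-of-range values up front means the offset can no longer be steered to arbitrary addresses.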
*	Bluetooth: Fix regression with minimum encryption key size alignment	Marcel Holtmann	2019-07-18	2	-14/+37
	commit 693cd8ce3f882524a5d06f7800dd8492411877b3 upstream.
	When trying to align the minimum encryption key size requirement for Bluetooth connections, it turns out doing this in a central location in the HCI connection handling code is not possible.
	Original Bluetooth version up to 2.0 used a security model where the L2CAP service would enforce authentication and encryption. Starting with Bluetooth 2.1 and Secure Simple Pairing that model has changed into that the connection initiator is responsible for providing an encrypted ACL link before any L2CAP communication can happen.
	Now connecting Bluetooth 2.1 or later devices with Bluetooth 2.0 and before devices are causing a regression. The encryption key size check needs to be moved out of the HCI connection handling into the L2CAP channel setup.
	To achieve this, the current check inside hci_conn_security() has been moved into l2cap_check_enc_key_size() helper function and then called from four decision points inside L2CAP to cover all combinations of Secure Simple Pairing enabled devices and device using legacy pairing and legacy service security model.
	Fixes: d5bb334a8e17 ("Bluetooth: Align minimum encryption key size for LE and BR/EDR connections")
	Change-Id: I7bccd0e917f183affd7cce670203ed92dc79a4e2
	Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203643
	Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
	Cc: stable@vger.kernel.org
	Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
	Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
*	Bluetooth: Align minimum encryption key size for LE and BR/EDR connections	Marcel Holtmann	2019-07-18	2	-0/+11
	commit d5bb334a8e171b262e48f378bd2096c0ea458265 upstream.
	The minimum encryption key size for LE connections is 56 bits and to align LE with BR/EDR, enforce 56 bits of minimum encryption key size for BR/EDR connections as well.
	Change-Id: Iaa1e00cab1ca82f42098c461f91fe370e501d826
	Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
	Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
	Cc: stable@vger.kernel.org
	Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
*	Bluetooth: Fix L2CAP information request handling for fixed channels	Johan Hedberg	2019-07-18	1	-20/+33
	Even if we have no connection-oriented channels we should perform the L2CAP Information Request procedures before notifying L2CAP channels of the connection. This is so that the L2CAP channel implementations can perform checks on what the remote side supports (e.g. does it support the fixed channel in question).
	So far the code has relied on the l2cap_do_start() function to initiate the Information Request, however l2cap_do_start() is used on a per-channel basis and only for connection-oriented channels. This means that if there are no connection-oriented channels on the system we would never start the Information Request procedure.
	This patch creates a new l2cap_request_info() helper function to initiate the Information Request procedure, and ensures that it is called whenever a BR/EDR connection has been established. The patch also updates fixed channels to be notified of connection readiness only once the Information Request procedure has completed.
	Change-Id: I36a482189bf4735c4dc81b2668f08aa032edfdc7
	Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
	Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
*	Bluetooth: Convert hci_conn->link_mode into flags	Johan Hedberg	2019-07-18	5	-32/+55
	Since the link_mode member of the hci_conn struct is a bit field and we already have a flags member as well it makes sense to merge these two together. This patch moves all used link_mode bits into corresponding flags. To keep backwards compatibility with user space we still need to provide a get_link_mode() helper function for the ioctl's that expect a link_mode style value.
	Change-Id: Ia885bce68ab454ad47230a6a577e7ddd9319d73c
	Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
	Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
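The compatibility helper described above translates the merged flags back into the legacy bitmask that user space expects. The sketch below shows the idea only: the flag bit positions and legacy mask values are hypothetical placeholders, not the kernel's definitions, and only the name get_link_mode() comes from the commit message.

```python
# Hypothetical internal flag bit positions (not the kernel's actual values).
FLAG_AUTH = 5
FLAG_ENCRYPT = 6
FLAG_MASTER = 9

# Hypothetical legacy link_mode mask values exposed to user space.
LM_MASTER = 0x0001
LM_AUTH = 0x0002
LM_ENCRYPT = 0x0004

_FLAG_TO_LINK_MODE = {
    FLAG_MASTER: LM_MASTER,
    FLAG_AUTH: LM_AUTH,
    FLAG_ENCRYPT: LM_ENCRYPT,
}


def get_link_mode(flags: int) -> int:
    """Rebuild a legacy link_mode value from the merged flags word."""
    mode = 0
    for bit, legacy_mask in _FLAG_TO_LINK_MODE.items():
        if flags & (1 << bit):
            mode |= legacy_mask
    return mode


flags = (1 << FLAG_AUTH) | (1 << FLAG_ENCRYPT)
print(hex(get_link_mode(flags)))
```

Keeping the translation in one helper lets the internal representation change freely while the value seen through the ioctl stays stable.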
*	Bluetooth: use l2cap_chan_ready() instead of duplicate code	Gustavo Padovan	2019-07-18	1	-6/+1
	In this case the replacement by l2cap_chan_ready() doesn't change the code flow; the same operations will be executed plus two others that have no effect: the use of the parent socket, which a non-oriented channel doesn't have, and the reset of conf_state, which is also fine since the connection is ready at this point.
	Change-Id: I96a54cf02cfefa546949f71d2f44ffaee1c2108c
	Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
	Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
*	tcp: refine memory limit test in tcp_fragment()	Eric Dumazet	2019-07-18	1	-1/+1
	commit b6653b3629e5b88202be3c9abc44713973f5c4b4 upstream.
	tcp_fragment() might be called for skbs in the write queue.
	Memory limits might have been exceeded because tcp_sendmsg() only checks limits at full skb (64KB) boundaries.
	Therefore, we need to make sure tcp_fragment() won't punish applications that might have set up very low SO_SNDBUF values.
	Fixes: f070ef2ac667 ("tcp: tcp_fragment() should apply sane memory limits")
	Change-Id: If9ae777f0ccfdde732f94350aa943274ccb1d541
	Signed-off-by: Eric Dumazet <edumazet@google.com>
	Reported-by: Christoph Paasch <cpaasch@apple.com>
	Tested-by: Christoph Paasch <cpaasch@apple.com>
	Signed-off-by: David S. Miller <davem@davemloft.net>
	Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
*	BACKPORT: tcp: enforce tcp_min_snd_mss in tcp_mtu_probing()	Eric Dumazet	2019-07-18	1	-0/+1
	commit 967c05aee439e6e5d7d805e195b3a20ef5c433d6 upstream.
	If mtu probing is enabled tcp_mtu_probing() could very well end up with a too small MSS.
	Use the new sysctl tcp_min_snd_mss to make sure MSS search is performed in an acceptable range.
	CVE-2019-11479 -- tcp mss hardcoded to 48
	Signed-off-by: Eric Dumazet <edumazet@google.com>
	Reported-by: Jonathan Lemon <jonathan.lemon@gmail.com>
	Cc: Jonathan Looney <jtl@netflix.com>
	Acked-by: Neal Cardwell <ncardwell@google.com>
	Cc: Yuchung Cheng <ycheng@google.com>
	Cc: Tyler Hicks <tyhicks@canonical.com>
	Cc: Bruce Curtis <brucec@netflix.com>
	Signed-off-by: David S. Miller <davem@davemloft.net>
	Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
	[BACKPORT to 3.10: use previous sysctrl method]
	Signed-off-by: syphyr@gmail.com
	Change-Id: I02c8330a38992461b89081196b1b0ad0add0e6ad
*	BACKPORT: tcp: add tcp_min_snd_mss sysctl	Eric Dumazet	2019-07-18	4	-2/+22
	commit 5f3e2bf008c2221478101ee72f5cb4654b9fc363 upstream.
	Some TCP peers announce a very small MSS option in their SYN and/or SYN/ACK messages.
	This forces the stack to send packets with a very high network/cpu overhead.
	Linux has enforced a minimal value of 48. Since this value includes the size of TCP options, and that the options can consume up to 40 bytes, this means that each segment can include only 8 bytes of payload.
	In some cases, it can be useful to increase the minimal value to a saner value.
	We still let the default to 48 (TCP_MIN_SND_MSS), for compatibility reasons.
	Note that TCP_MAXSEG socket option enforces a minimal value of (TCP_MIN_MSS). David Miller increased this minimal value in commit c39508d6f118 ("tcp: Make TCP_MAXSEG minimum more correct.") from 64 to 88.
	We might in the future merge TCP_MIN_SND_MSS and TCP_MIN_MSS.
	CVE-2019-11479 -- tcp mss hardcoded to 48
	Signed-off-by: Eric Dumazet <edumazet@google.com>
	Suggested-by: Jonathan Looney <jtl@netflix.com>
	Acked-by: Neal Cardwell <ncardwell@google.com>
	Cc: Yuchung Cheng <ycheng@google.com>
	Cc: Tyler Hicks <tyhicks@canonical.com>
	Cc: Bruce Curtis <brucec@netflix.com>
	Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
	Signed-off-by: David S. Miller <davem@davemloft.net>
	Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
	[BACKPORT to 3.10: use previous sysctrl method]
	Signed-off-by: syphyr@gmail.com
	Change-Id: Ib5e91a60fe4f4c00afc27ed92b1bd8dfe39fb7c9
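The overhead argument in this commit is simple arithmetic, sketched below. The constants 48 and 40 come from the commit message; the helper names and the clamping model (taking the larger of the peer's MSS and the configured floor) are assumptions for illustration, not the kernel implementation.

```python
TCP_MIN_SND_MSS = 48       # default floor named in the commit message
MAX_TCP_OPTION_BYTES = 40  # maximum option space named in the commit message


def payload_per_segment(mss: int, option_bytes: int = MAX_TCP_OPTION_BYTES) -> int:
    """Payload bytes left in one segment once options are subtracted."""
    return max(0, mss - option_bytes)


def clamp_mss(peer_mss: int, min_snd_mss: int = TCP_MIN_SND_MSS) -> int:
    """Hypothetical model of the sysctl: never send with less than the floor."""
    return max(peer_mss, min_snd_mss)


def segments_needed(total_bytes: int, payload: int) -> int:
    """Segments required to carry total_bytes (ceiling division)."""
    return (total_bytes + payload - 1) // payload


worst_case = payload_per_segment(clamp_mss(10))
print(worst_case, segments_needed(64 * 1024, worst_case))
```

Raising the floor shrinks the segment count directly: with a hypothetical floor of 536, the same 64 KB would need far fewer segments than with the default of 48.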
*	tcp: limit payload size of sacked skbs	Eric Dumazet	2019-07-18	5	-8/+30
	commit 3b4929f65b0d8249f19a50245cd88ed1a2f78cff upstream.
	Jonathan Looney reported that TCP can trigger the following crash in tcp_shifted_skb() :
	BUG_ON(tcp_skb_pcount(skb) < pcount);
	This can happen if the remote peer has advertized the smallest MSS that linux TCP accepts : 48
	An skb can hold 17 fragments, and each fragment can hold 32KB on x86, or 64KB on PowerPC.
	This means that the 16bit width of TCP_SKB_CB(skb)->tcp_gso_segs can overflow.
	Note that tcp_sendmsg() builds skbs with less than 64KB of payload, so this problem needs SACK to be enabled. SACK blocks allow TCP to coalesce multiple skbs in the retransmit queue, thus filling the 17 fragments to maximal capacity.
	CVE-2019-11477 -- u16 overflow of TCP_SKB_CB(skb)->tcp_gso_segs
	Backport notes, provided by Joao Martins <joao.m.martins@oracle.com>
	v4.15 or since commit 737ff314563 ("tcp: use sequence distance to detect reordering") had switched from the packet-based FACK tracking and switched to sequence-based.
	v4.14 and older still have the old logic and hence on tcp_skb_shift_data() needs to retain its original logic and have @fack_count in sync. In other words, we keep the increment of pcount with tcp_skb_pcount(skb) to later use that to update fack_count. To make it more explicit we track the new skb that gets incremented to pcount in @next_pcount, and we get to avoid the constant invocation of tcp_skb_pcount(skb) altogether.
	Fixes: 832d11c5cd07 ("tcp: Try to restore large SKBs while SACK processing")
	Change-Id: Ia549e9b12cd033edd93f90e13c6c0e255f74c399
	Signed-off-by: Eric Dumazet <edumazet@google.com>
	Reported-by: Jonathan Looney <jtl@netflix.com>
	Acked-by: Neal Cardwell <ncardwell@google.com>
	Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
	Cc: Yuchung Cheng <ycheng@google.com>
	Cc: Bruce Curtis <brucec@netflix.com>
	Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
	Signed-off-by: David S. Miller <davem@davemloft.net>
	Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
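The overflow can be checked with the numbers given in this log. The sketch below assumes 8 bytes of payload per segment (a 48-byte MSS minus 40 bytes of options, as stated in the tcp_min_snd_mss commit message); the helper names are hypothetical and the code only models the counter arithmetic, not the kernel data structures.

```python
FRAGS_PER_SKB = 17            # fragments per skb, from the commit message
FRAG_BYTES_X86 = 32 * 1024    # bytes per fragment on x86, from the commit message
PAYLOAD_PER_SEGMENT = 8       # assumption: 48-byte MSS minus 40 bytes of options
U16_MAX = 0xFFFF              # largest value a 16-bit counter can hold


def segment_count(frags: int, frag_bytes: int, payload: int) -> int:
    """Segments needed to cover a fully populated skb (ceiling division)."""
    total = frags * frag_bytes
    return (total + payload - 1) // payload


def wrap_u16(value: int) -> int:
    """Value a 16-bit unsigned field would hold after truncation."""
    return value & U16_MAX


needed = segment_count(FRAGS_PER_SKB, FRAG_BYTES_X86, PAYLOAD_PER_SEGMENT)
print(needed, needed > U16_MAX, wrap_u16(needed))
```

Under these assumptions the required segment count exceeds what a 16-bit field can represent, so the stored value wraps and no longer matches the real count.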
*	tcp: tcp_fragment() should apply sane memory limits	Eric Dumazet	2019-07-18	3	-0/+7
	commit f070ef2ac66716357066b683fb0baf55f8191a2e upstream.
	Jonathan Looney reported that a malicious peer can force a sender to fragment its retransmit queue into tiny skbs, inflating memory usage and/or overflow 32bit counters.
	TCP allows an application to queue up to sk_sndbuf bytes, so we need to give some allowance for non malicious splitting of retransmit queue.
	A new SNMP counter is added to monitor how many times TCP did not allow to split an skb if the allowance was exceeded.
	Note that this counter might increase in the case applications use SO_SNDBUF socket option to lower sk_sndbuf.
	CVE-2019-11478 : tcp_fragment, prevent fragmenting a packet when the socket is already using more than half the allowed space
	Change-Id: I594a9f68263f774fa6f0824042bc287bba6dc927
	Signed-off-by: Eric Dumazet <edumazet@google.com>
	Reported-by: Jonathan Looney <jtl@netflix.com>
	Acked-by: Neal Cardwell <ncardwell@google.com>
	Acked-by: Yuchung Cheng <ycheng@google.com>
	Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
	Cc: Bruce Curtis <brucec@netflix.com>
	Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
	Signed-off-by: David S. Miller <davem@davemloft.net>
	Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
*	ext4: zero out the unused memory region in the extent tree block	Sriram Rajagopalan	2019-07-18	1	-2/+15
	commit 592acbf16821288ecdc4192c47e3774a4c48bb64 upstream.
	This commit zeroes out the unused memory region in the buffer_head corresponding to the extent metablock after writing the extent header and the corresponding extent node entries.
	This is done to prevent random uninitialized data from getting into the filesystem when the extent block is synced.
	This fixes CVE-2019-11833.
	Signed-off-by: Sriram Rajagopalan <sriramr@arista.com>
	Signed-off-by: Theodore Ts'o <tytso@mit.edu>
	Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
	Change-Id: I5d74c1731ed4806c8ddc748c08f4d325eedb5317
*	mm/mincore.c: make mincore() more conservative	Jiri Kosina	2019-07-18	1	-0/+21
	commit 134fca9063ad4851de767d1768180e5dede9a881 upstream. The semantics of what mincore() considers to be resident is not completely clear, but Linux has always (since 2.3.52, which is when mincore() was initially done) treated it as "page is available in page cache". That's potentially a problem, as that [in]directly exposes meta-information about pagecache / memory mapping state even about memory not strictly belonging to the process executing the syscall, opening possibilities for sidechannel attacks. Change the semantics of mincore() so that it only reveals pagecache information for non-anonymous mappings that belong to files that the calling process could (if it tried to) successfully open for writing; otherwise we'd be including shared non-exclusive mappings, which (a) is the sidechannel and (b) is not the use case for mincore(), as that's primarily used for data, not (shared) text. [jkosina@suse.cz: v2] Link: http://lkml.kernel.org/r/20190312141708.6652-2-vbabka@suse.cz [mhocko@suse.com: restructure can_do_mincore() conditions] Link: http://lkml.kernel.org/r/nycvar.YFH.7.76.1903062342020.19912@cbobk.fhfr.pm Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Josh Snyder <joshs@netflix.com> Acked-by: Michal Hocko <mhocko@suse.com> Originally-by: Linus Torvalds <torvalds@linux-foundation.org> Originally-by: Dominique Martinet <asmadeus@codewreck.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Chinner <david@fromorbit.com> Cc: Kevin Easton <kevin@guarana.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Cyril Hrubis <chrubis@suse.cz> Cc: Tejun Heo <tj@kernel.org> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Daniel Gruss <daniel@gruss.cc> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Change-Id: I683073478cd809cdbc21f852b959eba070ce0141
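The policy in this message can be written as a small predicate. This is a sketch under stated assumptions, not the kernel's can_do_mincore(): `struct mapping_info` and `can_reveal_residency()` are hypothetical names, and the struct flattens facts that the kernel would derive from the vma and the inode. Treating anonymous mappings as always allowed is an assumption based on the message restricting only non-anonymous mappings.

```c
#include <stdbool.h>

/* Hypothetical, flattened view of what the check needs to know. */
struct mapping_info {
    bool anonymous;        /* mapping has no backing file */
    bool caller_may_write; /* caller could open the backing file for writing */
};

/*
 * Decide whether residency information may be reported for a mapping.
 * Anonymous memory is private to the caller (assumed always allowed);
 * file-backed memory is only reported when the caller could write the file.
 */
bool can_reveal_residency(const struct mapping_info *m)
{
    if (m->anonymous)
        return true;
    return m->caller_may_write;
}
```

A read-only shared library mapping is the typical case this rules out: the caller cannot open it for writing, so its page cache state is not revealed.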
*	mm: introduce vma_is_anonymous(vma) helper	Oleg Nesterov	2019-07-18	2	-3/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	commit b5330628546616af14ff23075fbf8d4ad91f6e25 upstream. special_mapping_fault() is absolutely broken. It seems it was always wrong, but this didn't matter until vdso/vvar started to use more than one page. And after this change vma_is_anonymous() becomes really trivial, it simply checks vm_ops == NULL. However, I do think the helper makes sense. There are a lot of ->vm_ops != NULL checks, the helper makes the caller's code more understandable (self-documented) and this is more grep-friendly. This patch (of 3): Preparation. Add the new simple helper, vma_is_anonymous(vma), and change handle_pte_fault() to use it. It will have more users. The name is not accurate, say a hpet_mmap()'ed vma is not anonymous. Perhaps it should be named vma_has_fault() instead. But it matches the logic in mmap.c/memory.c (see next changes). "True" just means that a page fault will use do_anonymous_page(). Change-Id: I024c69016c5125b6f40e990a2f63c6630f641b28 Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [bwh: Backported to 3.16 as dependency of "mm/mincore.c: make mincore() more conservative"; adjusted context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit e3bcb8e29b639d822175be5cb1b8e6b124edf98e)
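The helper this message describes can be sketched as follows. The structures here are simplified stand-ins (the real `vm_area_struct` and `vm_operations_struct` have many more fields), and the body follows the message's statement that the helper "simply checks vm_ops == NULL"; the exact form in this backport may differ.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures. */
struct vm_operations_struct { int unused; };
struct vm_area_struct {
    const struct vm_operations_struct *vm_ops;
};

/* A mapping without vm_ops takes the anonymous page-fault path. */
static inline bool vma_is_anonymous(const struct vm_area_struct *vma)
{
    return vma->vm_ops == NULL;
}
```

As the message notes, "anonymous" here only means that a page fault will be handled by do_anonymous_page(); it says nothing else about how the mapping was created.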
*	neigh: fix use-after-free read in pneigh_get_next	Eric Dumazet	2019-07-18	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	[ Upstream commit f3e92cb8e2eb8c27d109e6fd73d3a69a8c09e288 ] Nine years ago, I added RCU handling to neighbours, not pneighbours. (pneigh are not commonly used) Unfortunately I missed that /proc dump operations would use a common entry and exit point : neigh_seq_start() and neigh_seq_stop() We need to read_lock(tbl->lock) or risk use-after-free while iterating the pneigh structures. We might later convert pneigh to RCU and revert this patch. sysbot reported : BUG: KASAN: use-after-free in pneigh_get_next.isra.0+0x24b/0x280 net/core/neighbour.c:3158 Read of size 8 at addr ffff888097f2a700 by task syz-executor.0/9825 CPU: 1 PID: 9825 Comm: syz-executor.0 Not tainted 5.2.0-rc4+ #32 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188 __kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317 kasan_report+0x12/0x20 mm/kasan/common.c:614 __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132 pneigh_get_next.isra.0+0x24b/0x280 net/core/neighbour.c:3158 neigh_seq_next+0xdb/0x210 net/core/neighbour.c:3240 seq_read+0x9cf/0x1110 fs/seq_file.c:258 proc_reg_read+0x1fc/0x2c0 fs/proc/inode.c:221 do_loop_readv_writev fs/read_write.c:714 [inline] do_loop_readv_writev fs/read_write.c:701 [inline] do_iter_read+0x4a4/0x660 fs/read_write.c:935 vfs_readv+0xf0/0x160 fs/read_write.c:997 kernel_readv fs/splice.c:359 [inline] 
default_file_splice_read+0x475/0x890 fs/splice.c:414 do_splice_to+0x127/0x180 fs/splice.c:877 splice_direct_to_actor+0x2d2/0x970 fs/splice.c:954 do_splice_direct+0x1da/0x2a0 fs/splice.c:1063 do_sendfile+0x597/0xd00 fs/read_write.c:1464 __do_sys_sendfile64 fs/read_write.c:1525 [inline] __se_sys_sendfile64 fs/read_write.c:1511 [inline] __x64_sys_sendfile64+0x1dd/0x220 fs/read_write.c:1511 do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x4592c9 Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f4aab51dc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000028 RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00000000004592c9 RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000005 RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000080000000 R11: 0000000000000246 R12: 00007f4aab51e6d4 R13: 00000000004c689d R14: 00000000004db828 R15: 00000000ffffffff Allocated by task 9827: save_stack+0x23/0x90 mm/kasan/common.c:71 set_track mm/kasan/common.c:79 [inline] __kasan_kmalloc mm/kasan/common.c:489 [inline] __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462 kasan_kmalloc+0x9/0x10 mm/kasan/common.c:503 __do_kmalloc mm/slab.c:3660 [inline] __kmalloc+0x15c/0x740 mm/slab.c:3669 kmalloc include/linux/slab.h:552 [inline] pneigh_lookup+0x19c/0x4a0 net/core/neighbour.c:731 arp_req_set_public net/ipv4/arp.c:1010 [inline] arp_req_set+0x613/0x720 net/ipv4/arp.c:1026 arp_ioctl+0x652/0x7f0 net/ipv4/arp.c:1226 inet_ioctl+0x2a0/0x340 net/ipv4/af_inet.c:926 sock_do_ioctl+0xd8/0x2f0 net/socket.c:1043 sock_ioctl+0x3ed/0x780 net/socket.c:1194 vfs_ioctl fs/ioctl.c:46 [inline] file_ioctl fs/ioctl.c:509 [inline] do_vfs_ioctl+0xd5f/0x1380 fs/ioctl.c:696 ksys_ioctl+0xab/0xd0 fs/ioctl.c:713 __do_sys_ioctl fs/ioctl.c:720 [inline] __se_sys_ioctl 
fs/ioctl.c:718 [inline] __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718 do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 9824: save_stack+0x23/0x90 mm/kasan/common.c:71 set_track mm/kasan/common.c:79 [inline] __kasan_slab_free+0x102/0x150 mm/kasan/common.c:451 kasan_slab_free+0xe/0x10 mm/kasan/common.c:459 __cache_free mm/slab.c:3432 [inline] kfree+0xcf/0x220 mm/slab.c:3755 pneigh_ifdown_and_unlock net/core/neighbour.c:812 [inline] __neigh_ifdown+0x236/0x2f0 net/core/neighbour.c:356 neigh_ifdown+0x20/0x30 net/core/neighbour.c:372 arp_ifdown+0x1d/0x21 net/ipv4/arp.c:1274 inetdev_destroy net/ipv4/devinet.c:319 [inline] inetdev_event+0xa14/0x11f0 net/ipv4/devinet.c:1544 notifier_call_chain+0xc2/0x230 kernel/notifier.c:95 __raw_notifier_call_chain kernel/notifier.c:396 [inline] raw_notifier_call_chain+0x2e/0x40 kernel/notifier.c:403 call_netdevice_notifiers_info+0x3f/0x90 net/core/dev.c:1749 call_netdevice_notifiers_extack net/core/dev.c:1761 [inline] call_netdevice_notifiers net/core/dev.c:1775 [inline] rollback_registered_many+0x9b9/0xfc0 net/core/dev.c:8178 rollback_registered+0x109/0x1d0 net/core/dev.c:8220 unregister_netdevice_queue net/core/dev.c:9267 [inline] unregister_netdevice_queue+0x1ee/0x2c0 net/core/dev.c:9260 unregister_netdevice include/linux/netdevice.h:2631 [inline] __tun_detach+0xd8a/0x1040 drivers/net/tun.c:724 tun_detach drivers/net/tun.c:741 [inline] tun_chr_close+0xe0/0x180 drivers/net/tun.c:3451 __fput+0x2ff/0x890 fs/file_table.c:280 ____fput+0x16/0x20 fs/file_table.c:313 task_work_run+0x145/0x1c0 kernel/task_work.c:113 tracehook_notify_resume include/linux/tracehook.h:185 [inline] exit_to_usermode_loop+0x273/0x2c0 arch/x86/entry/common.c:168 prepare_exit_to_usermode arch/x86/entry/common.c:199 [inline] syscall_return_slowpath arch/x86/entry/common.c:279 [inline] do_syscall_64+0x58e/0x680 arch/x86/entry/common.c:304 entry_SYSCALL_64_after_hwframe+0x49/0xbe The buggy address belongs to the 
object at ffff888097f2a700 which belongs to the cache kmalloc-64 of size 64 The buggy address is located 0 bytes inside of 64-byte region [ffff888097f2a700, ffff888097f2a740) The buggy address belongs to the page: page:ffffea00025fca80 refcount:1 mapcount:0 mapping:ffff8880aa400340 index:0x0 flags: 0x1fffc0000000200(slab) raw: 01fffc0000000200 ffffea000250d548 ffffea00025726c8 ffff8880aa400340 raw: 0000000000000000 ffff888097f2a000 0000000100000020 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888097f2a600: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc ffff888097f2a680: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc >ffff888097f2a700: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ^ ffff888097f2a780: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ffff888097f2a800: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc Fixes: 767e97e1e0db ("neigh: RCU conversion of struct neighbour") Change-Id: I2e2d47ab5ba1c740515c3a0ed93c96f43bc1696d Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
*	kernel/sysctl.c: add missing range check in do_proc_dointvec_minmax_conv	Zev Weiss	2019-07-18	1	-1/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	commit 8cf7630b29701d364f8df4a50e4f1f5e752b2778 upstream. This bug has apparently existed since the introduction of this function in the pre-git era (4500e91754d3 in Thomas Gleixner's history.git, "[NET]: Add proc_dointvec_userhz_jiffies, use it for proper handling of neighbour sysctls."). As a minimal fix we can simply duplicate the corresponding check in do_proc_dointvec_conv(). Change-Id: Ibf26281f3b0c0d35aacafa006341b6ff8e7e002f Link: http://lkml.kernel.org/r/20190207123426.9202-3-zev@bewilderbeest.net Signed-off-by: Zev Weiss <zev@bewilderbeest.net> Cc: Brendan Higgins <brendanhiggins@google.com> Cc: Iurii Zaikin <yzaikin@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
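The kind of range check this message refers to can be sketched in user space. This is an illustration, not the kernel function: `conv_minmax()` and its parameters are hypothetical, and the assumption is that the input arrives as a sign flag plus an unsigned long magnitude that must fit in an int before it is compared against the bounds.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Convert a parsed magnitude plus sign flag into an int, rejecting values
 * that do not fit, and only then apply the [min, max] bounds.
 * Returns 0 on success or -EINVAL.
 */
int conv_minmax(bool neg, unsigned long lval, int min, int max, int *out)
{
    int val;

    if (neg) {
        /* The most negative int has magnitude INT_MAX + 1. */
        if (lval > (unsigned long)INT_MAX + 1)
            return -EINVAL;
        val = (lval == (unsigned long)INT_MAX + 1) ? INT_MIN : -(int)lval;
    } else {
        if (lval > (unsigned long)INT_MAX)
            return -EINVAL;
        val = (int)lval;
    }

    if (val < min || val > max)
        return -EINVAL;

    *out = val;
    return 0;
}
```

Without the first pair of checks, an oversized magnitude would be narrowed to an int before the bounds test, so a wrapped value could be accepted as if it were in range.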
*	net-sysfs: Fix mem leak in netdev_register_kobject	YueHaibing	2019-07-18	1	-0/+3
	commit 895a5e96dbd6386c8e78e5b78e067dcc67b7f0ab upstream. syzkaller reported this: BUG: memory leak unreferenced object 0xffff88837a71a500 (size 256): comm "syz-executor.2", pid 9770, jiffies 4297825125 (age 17.843s) hex dump (first 32 bytes): 00 00 00 00 ad 4e ad de ff ff ff ff 00 00 00 00 .....N.......... ff ff ff ff ff ff ff ff 20 c0 ef 86 ff ff ff ff ........ ....... backtrace: [<00000000db12624b>] netdev_register_kobject+0x124/0x2e0 net/core/net-sysfs.c:1751 [<00000000dc49a994>] register_netdevice+0xcc1/0x1270 net/core/dev.c:8516 [<00000000e5f3fea0>] tun_set_iff drivers/net/tun.c:2649 [inline] [<00000000e5f3fea0>] __tun_chr_ioctl+0x2218/0x3d20 drivers/net/tun.c:2883 [<000000001b8ac127>] vfs_ioctl fs/ioctl.c:46 [inline] [<000000001b8ac127>] do_vfs_ioctl+0x1a5/0x10e0 fs/ioctl.c:690 [<0000000079b269f8>] ksys_ioctl+0x89/0xa0 fs/ioctl.c:705 [<00000000de649beb>] __do_sys_ioctl fs/ioctl.c:712 [inline] [<00000000de649beb>] __se_sys_ioctl fs/ioctl.c:710 [inline] [<00000000de649beb>] __x64_sys_ioctl+0x74/0xb0 fs/ioctl.c:710 [<000000007ebded1e>] do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290 [<00000000db315d36>] entry_SYSCALL_64_after_hwframe+0x49/0xbe [<00000000115be9bb>] 0xffffffffffffffff It should call kset_unregister() to free 'dev->queues_kset' in the error path of register_queue_kobjects(); otherwise it will cause a memory leak. Change-Id: I92df8236ce1a8d5d3e541a20f0247dc4d8e6e5ef Reported-by: Hulk Robot <hulkci@huawei.com> Fixes: 1d24eb4815d1 ("xps: Transmit Packet Steering") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Backported to 3.16: net_device pointer is called "net", confusingly] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
*	perf header: Fix wrong node write in NUMA_TOPOLOGY feature	Jiri Olsa	2019-07-18	1	-1/+1
	commit b00ccb27f97367d89e2d7b419ed198b0985be55d upstream. We are currently passing the node index instead of the real node number. Change-Id: I1c41c0d83666f26a56debdf9436fb1090a9fb8db Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: fbe96f29ce4b ("perf tools: Make perf.data more self-descriptive (v8)") Link: http://lkml.kernel.org/r/20190219095815.15931-2-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> [bwh: Backported to 3.16: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
*	jbd2: clear dirty flag when revoking a buffer from an older transaction	zhangyi (F)	2019-07-18	1	-5/+12
	commit 904cdbd41d749a476863a0ca41f6f396774f26e4 upstream. We hit a data corruption problem on ext4 while truncating an extent index block. Suppose we revoke a buffer that has been journaled by the committing transaction: the buffer's jbddirty flag is not cleared in jbd2_journal_forget(), so the commit code sets the buffer dirty flag again after refiling the buffer. The interleaving is as follows. fsx: jbd2_journal_revoke() calls jbd2_journal_forget() on a buffer that belongs to the older transaction, so jbddirty is not cleared. kjournald2: jbd2_journal_commit_transaction() runs commit phases 1~5 and then, in commit phase 6, calls __jbd2_journal_refile_buffer() -> __jbd2_journal_unfile_buffer() -> test_clear_buffer_jbddirty() -> mark_buffer_dirty(). Finally, if the freed extent index block is allocated again as a data block by some other file, it may corrupt the file data when cached pages are written later, such as at unmount time. (In general, clean_bdev_aliases() related helpers should be invoked after re-allocation to prevent the above corruption, but unfortunately we missed it when zeroing out the head of extra extent blocks in ext4_ext_handle_unwritten_extents()). This patch marks the buffer as freed and sets j_next_transaction to the new transaction when it already belongs to the committing transaction in jbd2_journal_forget(), so that the commit code knows it should clear the dirty bits when it is done with the buffer. This problem can be reproduced easily by xfstests generic/455 with seeds (3246 3247 3248 3249). Change-Id: I7ecfbfb8504e213fc3325517268e9c288c443840 Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
*	ANDROID: sdcardfs: Wait for file flush to complete	syphyr	2019-07-18	1	-1/+3
\| \| \| \| \| \| \| \| \|	Sdcardfs needs to wait for the file to finish writing before returning an error. Backport from 3.18 to 3.10 kernel. Change-Id: I0fbdfd9a4c46ad34b8826099d9e3b255289d4794
*	mm, oom: fix use-after-free in oom_kill_process	Shakeel Butt	2019-07-08	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	commit cefc7ef3c87d02fc9307835868ff721ea12cc597 upstream. Syzbot instance running on upstream kernel found a use-after-free bug in oom_kill_process. On further inspection it seems like the process selected to be oom-killed has exited even before reaching read_lock(&tasklist_lock) in oom_kill_process(). More specifically the tsk->usage is 1 which is due to get_task_struct() in oom_evaluate_task() and the put_task_struct within for_each_thread() frees the tsk and for_each_thread() tries to access the tsk. The easiest fix is to do get/put across the for_each_thread() on the selected task. Now the next question is should we continue with the oom-kill as the previously selected task has exited? However before adding more complexity and heuristics, let's answer why we even look at the children of oom-kill selected task? The select_bad_process() has already selected the worst process in the system/memcg. Due to race, the selected process might not be the worst at the kill time but does that matter? The userspace can use the oom_score_adj interface to prefer children to be killed before the parent. I looked at the history but it seems like this is there before git history. 
Change-Id: Ie6b01d64139c7ff44709569168ef868f372c2b6d Link: http://lkml.kernel.org/r/20190121215850.221745-1-shakeelb@google.com Reported-by: syzbot+7fbbfa368521945f0e3d@syzkaller.appspotmail.com Fixes: 6b0c81b3be11 ("mm, oom: reduce dependency on tasklist_lock") Signed-off-by: Shakeel Butt <shakeelb@google.com> Reviewed-by: Roman Gushchin <guro@fb.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: David Rientjes <rientjes@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
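The "get/put across the loop" fix can be modelled with a toy reference count. This is a sketch only: `struct task`, `get_task()`, `put_task()`, and `walk_children()` are hypothetical, nothing is actually freed, and the `freed` flag merely records the point at which the kernel would have released the object.

```c
#include <stdbool.h>

/* Toy refcounted object standing in for a task. */
struct task { int usage; bool freed; };

void get_task(struct task *t) { t->usage++; }
void put_task(struct task *t)
{
    if (--t->usage == 0)
        t->freed = true;   /* the real code would free the task here */
}

/*
 * Model a loop over the victim's threads in which one iteration drops a
 * reference. Returns true if the victim was still valid when the loop
 * touched it again afterwards.
 */
bool walk_children(struct task *victim, bool pin)
{
    bool valid;

    if (pin)
        get_task(victim);    /* hold a reference across the whole walk */

    put_task(victim);        /* a put that happens inside the loop body */
    valid = !victim->freed;  /* the loop accesses the victim again here */

    if (pin)
        put_task(victim);    /* drop the pin once the walk is finished */
    return valid;
}
```

With a starting count of 1, the unpinned walk reaches zero inside the loop, which corresponds to the use-after-free; the pinned walk only reaches zero after the loop has finished with the object.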
*	mm,oom: make oom_killer_disable() killable	Tetsuo Handa	2019-07-08	1	-7/+3
	While oom_killer_disable() is called by freeze_processes() after all user threads except the current thread are frozen, it is possible that kernel threads invoke the OOM killer and send SIGKILL to the current thread because it shares the thawed victim's memory. Therefore, checking for SIGKILL is preferable to checking for TIF_MEMDIE. Change-Id: I0ff3858a7ed4a808b8b21bd3382847d3150735e3 Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: David Rientjes <rientjes@google.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	mm/oom_kill.c: avoid attempting to kill init sharing same memory	Chen Jie	2019-07-08	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It's possible that an oom killed victim shares an ->mm with the init process and thus oom_kill_process() would end up trying to kill init as well. This has been shown in practice: Out of memory: Kill process 9134 (init) score 3 or sacrifice child Killed process 9134 (init) total-vm:1868kB, anon-rss:84kB, file-rss:572kB Kill process 1 (init) sharing same memory ... Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009 And this will result in a kernel panic. If a process is forked by init and selected for oom kill while still sharing init_mm, then it's likely this system is in a recoverable state. However, it's better not to try to kill init and allow the machine to panic due to unkillable processes. [rientjes@google.com: rewrote changelog] [akpm@linux-foundation.org: fix inverted test, per Ben] Signed-off-by: Chen Jie <chenjie6@huawei.com> Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Li Zefan <lizefan@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Change-Id: I5b573781c077173b3a472ba8282357a31b193557
*	mm/oom_kill: fix the wrong task->mm == mm checks in oom_kill_process()	Oleg Nesterov	2019-07-08	1	-2/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Both "child->mm == mm" and "p->mm != mm" checks in oom_kill_process() are wrong. task->mm can be NULL if the task is the exited group leader. This means in particular that "kill sharing same memory" loop can miss a process with a zombie leader which uses the same ->mm. Note: the process_has_mm(child, p->mm) check is still not 100% correct, p->mm can be NULL too. This is minor, but probably deserves a fix or a comment anyway. [akpm@linux-foundation.org: document process_shares_mm() a bit] Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Kyle Walker <kwalker@redhat.com> Cc: Stanislav Kozina <skozina@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Change-Id: I88d95c6ea31359de6cb50834e6ddce87d3afd1d8
*	mm/oom_kill: cleanup the "kill sharing same memory" loop	Oleg Nesterov	2019-07-08	1	-8/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Purely cosmetic, but the complex "if" condition looks annoying to me. Especially because it is not consistent with OOM_SCORE_ADJ_MIN check which adds another if/continue. Change-Id: I72998fd97f3562849fae56d151e867d7cde1326c Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Kyle Walker <kwalker@redhat.com> Cc: Stanislav Kozina <skozina@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	mm/oom_kill: remove the wrong fatal_signal_pending() check in oom_kill_process()	Oleg Nesterov	2019-07-08	1	-4/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The fatal_signal_pending() was added to suppress unnecessary "sharing same memory" message, but it can't 100% help anyway because it can be false-negative; SIGKILL can be already dequeued. And worse, it can be false-positive due to exec or coredump. exec is mostly fine, but coredump is not. It is possible that the group leader has the pending SIGKILL because its sub-thread originated the coredump, in this case we must not skip this process. We could probably add the additional ->group_exit_task check but this patch just removes the wrong check along with pr_info(). Change-Id: Icbf79bac26785838980325a418924c5d44c97d9d Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Kyle Walker <kwalker@redhat.com> Cc: Stanislav Kozina <skozina@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	mm, oom: remove task_lock protecting comm printing	David Rientjes	2019-07-08	3	-19/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The oom killer takes task_lock() in a couple of places solely to protect printing the task's comm. A process's comm, including current's comm, may change due to /proc/pid/comm or PR_SET_NAME. The comm will always be NULL-terminated, so the worst race scenario would only be during update. We can tolerate a comm being printed that is in the middle of an update to avoid taking the lock. Other locations in the kernel have already dropped task_lock() when printing comm, so this is consistent. Change-Id: I89f64666a1db5d414aa53862fd6b665bbb8125bc Signed-off-by: David Rientjes <rientjes@google.com> Suggested-by: Oleg Nesterov <oleg@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov@parallels.com> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	mm/oom_kill.c: suppress unnecessary "sharing same memory" message	Tetsuo Handa	2019-07-08	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	oom_kill_process() sends SIGKILL to other thread groups sharing victim's mm. But printing "Kill process %d (%s) sharing same memory\n" lines makes no sense if they already have pending SIGKILL. This patch reduces the "Kill process" lines by printing that line with info level only if SIGKILL is not pending. Change-Id: I5eeffd256929781863cf4ac0691e22fb24be46f3 Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Michal Hocko <mhocko@suse.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	mm/oom_kill.c: reverse the order of setting TIF_MEMDIE and sending SIGKILL	Tetsuo Handa	2019-07-08	1	-1/+6
	It was confirmed that a local unprivileged user can consume all memory reserves and hang up that system using time lag between the OOM killer sets TIF_MEMDIE on an OOM victim and sends SIGKILL to that victim, for printk() inside for_each_process() loop at oom_kill_process() can consume many seconds when there are many thread groups sharing the same memory. Before starting oom-depleter process: Node 0 DMA: 3*4kB (UM) 6*8kB (U) 4*16kB (UEM) 0*32kB 0*64kB 1*128kB (M) 2*256kB (EM) 2*512kB (UE) 2*1024kB (EM) 1*2048kB (E) 1*4096kB (M) = 9980kB Node 0 DMA32: 31*4kB (UEM) 27*8kB (UE) 32*16kB (UE) 13*32kB (UE) 14*64kB (UM) 7*128kB (UM) 8*256kB (UM) 8*512kB (UM) 3*1024kB (U) 4*2048kB (UM) 362*4096kB (UM) = 1503220kB As of invoking the OOM killer: Node 0 DMA: 11*4kB (UE) 8*8kB (UEM) 6*16kB (UE) 2*32kB (EM) 0*64kB 1*128kB (U) 3*256kB (UEM) 2*512kB (UE) 3*1024kB (UEM) 1*2048kB (U) 0*4096kB = 7308kB Node 0 DMA32: 1049*4kB (UEM) 507*8kB (UE) 151*16kB (UE) 53*32kB (UEM) 83*64kB (UEM) 52*128kB (EM) 25*256kB (UEM) 11*512kB (M) 6*1024kB (UM) 1*2048kB (M) 0*4096kB = 44556kB Between the thread group leader got TIF_MEMDIE and receives SIGKILL: Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB Node 0 DMA32: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 0kB The oom-depleter's thread group leader which got TIF_MEMDIE started memset() in user space after the OOM killer set TIF_MEMDIE, and it was free to abuse ALLOC_NO_WATERMARKS by TIF_MEMDIE for memset() in user space until SIGKILL is delivered. If SIGKILL is delivered before TIF_MEMDIE is set, the oom-depleter can terminate without touching memory reserves. 
Although the possibility of hitting this time lag is very small for 3.19 and earlier kernels because TIF_MEMDIE is set immediately before sending SIGKILL, preemption or long interrupts (an extreme example is SysRq-t) can step between and allow memory allocations which are not needed for terminating the OOM victim. Fixes: 83363b917a29 ("oom: make sure that TIF_MEMDIE is set under task_lock") Change-Id: I4887754c2f1d9d193cc776069698546927a24cf5 Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Michal Hocko <mhocko@suse.com> Cc: David Rientjes <rientjes@google.com> Cc: <stable@vger.kernel.org> [4.0+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	mm, oom: remove unnecessary variable	David Rientjes	2019-07-08	1	-13/+8
\| \| \| \| \| \| \| \| \| \| \| \| \|	The "killed" variable in out_of_memory() can be removed since the call to oom_kill_process() where we should block to allow the process time to exit is obvious. Change-Id: Ic00ea1247542ce9c93a5ab18affd6f5b0c305aa9 Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	mm/oom_kill.c: print points as unsigned int	Wang Long	2019-07-08	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	In oom_kill_process(), the variable 'points' is unsigned int. Print it as such. Change-Id: Idfd50d95fe49d51d08005c1dfc249c9801c05a45 Signed-off-by: Wang Long <long.wanglong@huawei.com> Acked-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	mm: oom_kill: simplify OOM killer locking	Johannes Weiner	2019-07-08	5	-114/+46
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The zonelist locking and the oom_sem are two overlapping locks that are used to serialize global OOM killing against different things. The historical zonelist locking serializes OOM kills from allocations with overlapping zonelists against each other to prevent killing more tasks than necessary in the same memory domain. Only when neither tasklists nor zonelists from two concurrent OOM kills overlap (tasks in separate memcgs bound to separate nodes) are OOM kills allowed to execute in parallel. The younger oom_sem is a read-write lock to serialize OOM killing against the PM code trying to disable the OOM killer altogether. However, the OOM killer is a fairly cold error path, there is really no reason to optimize for highly performant and concurrent OOM kills. And the oom_sem is just flat-out redundant. Replace both locking schemes with a single global mutex serializing OOM kills regardless of context. Change-Id: Ieb0b621bc3a391cc0a826a3ae53bf28ea4a8dbe5 Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: David Rientjes <rientjes@google.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	mm: oom_kill: remove unnecessary locking in exit_oom_victim()	Johannes Weiner	2019-07-08	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Disabling the OOM killer needs to exclude allocators from entering, not existing victims from exiting. Right now the only waiter is suspend code, which achieves quiescence by disabling the OOM killer. But later on we want to add waits that hold the lock instead to stop new victims from showing up. Change-Id: Icc7e5f3f30ebff2538501e8d0a4c9d03aacc6538 Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: David Rientjes <rientjes@google.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	mm: oom_kill: generalize OOM progress waitqueue	Johannes Weiner	2019-07-08	1	-5/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It turns out that the mechanism to wait for exiting OOM victims is less generic than it looks: it won't issue wakeups unless the OOM killer is disabled. The reason this check was added was the thought that, since only the OOM disabling code would wait on this queue, wakeup operations could be saved when that specific consumer is known to be absent. However, this is quite the handgrenade. Later attempts to reuse the waitqueue for other purposes will lead to completely unexpected bugs and the failure mode will appear seemingly illogical. Generally, providers shouldn't make unnecessary assumptions about consumers. This could have been replaced with waitqueue_active(), but it only saves a few instructions in one of the coldest paths in the kernel. Simply remove it. Change-Id: I5543005539c795ce4d5c67cc67781481750cc1e0 Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: David Rientjes <rientjes@google.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	mm: oom_kill: switch test-and-clear of known TIF_MEMDIE to clear	Johannes Weiner	2019-07-08	1	-2/+1
	exit_oom_victim() already knows that TIF_MEMDIE is set, and nobody else can clear it concurrently. Use clear_thread_flag() directly. Change-Id: Ic87613e60502357339905068c9a7b6d69ba0008f Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
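	The difference between a test-and-clear and a plain clear can be sketched in userspace with C11 atomics. The helpers `flag_test_and_clear()` and `flag_clear()` and the `FLAG_MEMDIE` bit are hypothetical stand-ins chosen for illustration; they are not the kernel's thread-flag helpers.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define FLAG_MEMDIE 0x1u  /* hypothetical bit, stands in for TIF_MEMDIE */

/* Clears the bit and reports whether it was set beforehand. */
bool flag_test_and_clear(atomic_uint *flags, unsigned bit)
{
	return (atomic_fetch_and(flags, ~bit) & bit) != 0;
}

/*
 * Clears the bit without reporting its previous state. This is enough
 * when the caller already knows the bit is set and no one else can
 * clear it concurrently.
 */
void flag_clear(atomic_uint *flags, unsigned bit)
{
	atomic_fetch_and(flags, ~bit);
}
```

	Both helpers leave the flag word in the same state; the plain clear simply drops a return value the caller has no use for.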
*	mm: oom_kill: clean up victim marking and exiting interfaces	Johannes Weiner	2019-07-08	5	-15/+14
	Rename unmark_oom_victim() to exit_oom_victim(). Marking and unmarking are related in functionality, but the interface is not symmetrical at all: one is an internal OOM killer function used during the killing, the other is for an OOM victim to signal its own death on exit later on. This has locking implications, see follow-up changes. While at it, rename mark_tsk_oom_victim() to mark_oom_victim(), which is easier on the eye. Change-Id: I8956f6357e98f17e0ae6096c6a2c7027886a4fda Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	mm: oom_kill: remove unnecessary locking in oom_enable()	Johannes Weiner	2019-07-08	1	-2/+0
	Setting oom_killer_disabled to false is atomic, so there is no need for further synchronization with ongoing allocations trying to OOM-kill. Change-Id: I0d11c89d0949d9e9fbf870d50ef4eb398a78518f Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: David Rientjes <rientjes@google.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
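	As a rough userspace analogy for the reasoning above, re-enabling can be modeled as a single store that readers observe either before or after, with no lock around it. The names `killer_disabled`, `killer_enable()` and `killer_may_run()` are hypothetical, and the model uses a C11 `atomic_bool`, which is an assumption of this sketch rather than how the kernel declares its flag.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag; starts out disabled for the sake of the example. */
static atomic_bool killer_disabled = true;

/* Re-enable with one store; no lock is taken around it. */
void killer_enable(void)
{
	atomic_store(&killer_disabled, false);
}

/* An allocator-side check sees either the old or the new value. */
bool killer_may_run(void)
{
	return !atomic_load(&killer_disabled);
}
```

	A reader racing with `killer_enable()` either still sees the killer as disabled or sees it enabled; neither outcome needs extra synchronization in this model.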