aboutsummaryrefslogtreecommitdiff
path: root/net/sched/Kconfig
Commit message (Collapse)AuthorAgeFilesLines
* net: pkt_sched: PIE AQM schemeVijay Subramanian2016-09-181-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Proportional Integral controller Enhanced (PIE) is a scheduler to address the bufferbloat problem. >From the IETF draft below: " Bufferbloat is a phenomenon where excess buffers in the network cause high latency and jitter. As more and more interactive applications (e.g. voice over IP, real time video streaming and financial transactions) run in the Internet, high latency and jitter degrade application performance. There is a pressing need to design intelligent queue management schemes that can control latency and jitter; and hence provide desirable quality of service to users. We present here a lightweight design, PIE(Proportional Integral controller Enhanced) that can effectively control the average queueing latency to a target value. Simulation results, theoretical analysis and Linux testbed results have shown that PIE can ensure low latency and achieve high link utilization under various congestion situations. The design does not require per-packet timestamp, so it incurs very small overhead and is simple enough to implement in both hardware and software. " Many thanks to Dave Taht for extensive feedback, reviews, testing and suggestions. Thanks also to Stephen Hemminger and Eric Dumazet for reviews and suggestions. Naeem Khademi and Dave Taht independently contributed to ECN support. For more information, please see technical paper about PIE in the IEEE Conference on High Performance Switching and Routing 2013. A copy of the paper can be found at ftp://ftpeng.cisco.com/pie/. Please also refer to the IETF draft submission at http://tools.ietf.org/html/draft-pan-tsvwg-pie-00 All relevant code, documents and test scripts and results can be found at ftp://ftpeng.cisco.com/pie/. For problems with the iproute2/tc or Linux kernel code, please contact Vijay Subramanian (vijaynsu@cisco.com or subramanian.vijay@gmail.com) Mythili Prabhu (mysuryan@cisco.com) Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com> Signed-off-by: Mythili Prabhu <mysuryan@cisco.com> CC: Dave Taht <dave.taht@bufferbloat.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* net-qdisc-hhf: Heavy-Hitter Filter (HHF) qdiscTerry Lam2016-09-181-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements the first size-based qdisc that attempts to differentiate between small flows and heavy-hitters. The goal is to catch the heavy-hitters and move them to a separate queue with less priority so that bulk traffic does not affect the latency of critical traffic. Currently "less priority" means less weight (2:1 in particular) in a Weighted Deficit Round Robin (WDRR) scheduler. In essence, this patch addresses the "delay-bloat" problem due to bloated buffers. In some systems, large queues may be necessary for obtaining CPU efficiency, or due to the presence of unresponsive traffic like UDP, or just a large number of connections with each having a small amount of outstanding traffic. In these circumstances, HHF aims to reduce the HoL blocking for latency sensitive traffic, while not impacting the queues built up by bulk traffic. HHF can also be used in conjunction with other AQM mechanisms such as CoDel. To capture heavy-hitters, we implement the "multi-stage filter" design in the following paper: C. Estan and G. Varghese, "New Directions in Traffic Measurement and Accounting", in ACM SIGCOMM, 2002. Some configurable qdisc settings through 'tc': - hhf_reset_timeout: period to reset counter values in the multi-stage filter (default 40ms) - hhf_admit_bytes: threshold to classify heavy-hitters (default 128KB) - hhf_evict_timeout: threshold to evict idle heavy-hitters (default 1s) - hhf_non_hh_weight: Weighted Deficit Round Robin (WDRR) weight for non-heavy-hitters (default 2) - hh_flows_limit: max number of heavy-hitter flow entries (default 2048) Note that the ratio between hhf_admit_bytes and hhf_reset_timeout reflects the bandwidth of heavy-hitters that we attempt to capture (25Mbps with the above default settings). The false negative rate (heavy-hitter flows getting away unclassified) is zero by the design of the multi-stage filter algorithm. With 100 heavy-hitter flows, using four hashes and 4000 counters yields a false positive rate (non-heavy-hitters mistakenly classified as heavy-hitters) of less than 1e-4. Signed-off-by: Terry Lam <vtlam@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* pkt_sched: fq: Fair Queue packet schedulerEric Dumazet2016-09-181-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Uses perfect flow match (not stochastic hash like SFQ/FQ_codel) - Uses the new_flow/old_flow separation from FQ_codel - New flows get an initial credit allowing IW10 without added delay. - Special FIFO queue for high prio packets (no need for PRIO + FQ) - Uses a hash table of RB trees to locate the flows at enqueue() time - Smart on demand gc (at enqueue() time, RB tree lookup evicts old unused flows) - Dynamic memory allocations. - Designed to allow millions of concurrent flows per Qdisc. - Small memory footprint : ~8K per Qdisc, and 104 bytes per flow. - Single high resolution timer for throttled flows (if any). - One RB tree to link throttled flows. - Ability to have a max rate per flow. We might add a socket option to add per socket limitation. Attempts have been made to add TCP pacing in TCP stack, but this seems to add complex code to an already complex stack. TCP pacing is welcomed for flows having idle times, as the cwnd permits TCP stack to queue a possibly large number of packets. This removes the 'slow start after idle' choice, hitting badly large BDP flows, and applications delivering chunks of data as video streams. Nicely spaced packets : Here interface is 10Gbit, but flow bottleneck is ~20Mbit cwin is big, yet FQ avoids the typical bursts generated by TCP (as in netperf TCP_RR -- -r 100000,100000) 15:01:23.545279 IP A > B: . 78193:81089(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805> 15:01:23.545394 IP B > A: . ack 81089 win 3668 <nop,nop,timestamp 11597985 1115> 15:01:23.546488 IP A > B: . 81089:83985(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805> 15:01:23.546565 IP B > A: . ack 83985 win 3668 <nop,nop,timestamp 11597986 1115> 15:01:23.547713 IP A > B: . 83985:86881(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805> 15:01:23.547778 IP B > A: . ack 86881 win 3668 <nop,nop,timestamp 11597987 1115> 15:01:23.548911 IP A > B: . 86881:89777(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805> 15:01:23.548949 IP B > A: . ack 89777 win 3668 <nop,nop,timestamp 11597988 1115> 15:01:23.550116 IP A > B: . 89777:92673(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805> 15:01:23.550182 IP B > A: . ack 92673 win 3668 <nop,nop,timestamp 11597989 1115> 15:01:23.551333 IP A > B: . 92673:95569(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805> 15:01:23.551406 IP B > A: . ack 95569 win 3668 <nop,nop,timestamp 11597991 1115> 15:01:23.552539 IP A > B: . 95569:98465(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805> 15:01:23.552576 IP B > A: . ack 98465 win 3668 <nop,nop,timestamp 11597992 1115> 15:01:23.553756 IP A > B: . 98465:99913(1448) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805> 15:01:23.554138 IP A > B: P 99913:100001(88) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805> 15:01:23.554204 IP B > A: . ack 100001 win 3668 <nop,nop,timestamp 11597993 1115> 15:01:23.554234 IP B > A: . 65248:68144(2896) ack 100001 win 3668 <nop,nop,timestamp 11597993 1115> 15:01:23.555620 IP B > A: . 68144:71040(2896) ack 100001 win 3668 <nop,nop,timestamp 11597993 1115> 15:01:23.557005 IP B > A: . 71040:73936(2896) ack 100001 win 3668 <nop,nop,timestamp 11597993 1115> 15:01:23.558390 IP B > A: . 73936:76832(2896) ack 100001 win 3668 <nop,nop,timestamp 11597993 1115> 15:01:23.559773 IP B > A: . 76832:79728(2896) ack 100001 win 3668 <nop,nop,timestamp 11597993 1115> 15:01:23.561158 IP B > A: . 79728:82624(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.562543 IP B > A: . 82624:85520(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.563928 IP B > A: . 85520:88416(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.565313 IP B > A: . 88416:91312(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.566698 IP B > A: . 91312:94208(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.568083 IP B > A: . 94208:97104(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.569467 IP B > A: . 97104:100000(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.570852 IP B > A: . 100000:102896(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.572237 IP B > A: . 102896:105792(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.573639 IP B > A: . 105792:108688(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.575024 IP B > A: . 108688:111584(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.576408 IP B > A: . 111584:114480(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> 15:01:23.577793 IP B > A: . 114480:117376(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115> TCP timestamps show that most packets from B were queued in the same ms timeframe (TSval 1159799{3,4}), but FQ managed to send them right in time to avoid a big burst. In slow start or steady state, very few packets are throttled [1] FQ gets a bunch of tunables as : limit : max number of packets on whole Qdisc (default 10000) flow_limit : max number of packets per flow (default 100) quantum : the credit per RR round (default is 2 MTU) initial_quantum : initial credit for new flows (default is 10 MTU) maxrate : max per flow rate (default : unlimited) buckets : number of RB trees (default : 1024) in hash table. (consumes 8 bytes per bucket) [no]pacing : disable/enable pacing (default is enable) All of them can be changed on a live qdisc. $ tc qd add dev eth0 root fq help Usage: ... fq [ limit PACKETS ] [ flow_limit PACKETS ] [ quantum BYTES ] [ initial_quantum BYTES ] [ maxrate RATE ] [ buckets NUMBER ] [ [no]pacing ] $ tc -s -d qd qdisc fq 8002: dev eth0 root refcnt 32 limit 10000p flow_limit 100p buckets 256 quantum 3028 initial_quantum 15140 Sent 216532416 bytes 148395 pkt (dropped 0, overlimits 0 requeues 14) backlog 0b 0p requeues 14 511 flows, 511 inactive, 0 throttled 110 gc, 0 highprio, 0 retrans, 1143 throttled, 0 flows_plimit [1] Except if initial srtt is overestimated, as if using cached srtt in tcp metrics. We'll provide a fix for this issue. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* first commitMeizu OpenSource2016-08-151-0/+653