The java.util.concurrent Synchronizer Framework 翻译
所属分类 java
浏览量 1695
ABSTRACT
Most synchronizers (locks, barriers, etc.) in the J2SE1.5
java.util.concurrent package are constructed using a small framework based on class AbstractQueuedSynchronizer.
This framework provides common mechanics for atomically managing synchronization state, blocking and unblocking threads, and queuing.
The paper describes the rationale, design, implementation, usage, and performance of this framework.
摘要
J2SE1.5 并发包 中的大多数同步器(锁、屏障等) 基于类AbstractQueuedSynchronizer的小框架构建。
这个框架为原子管理同步状态、阻塞和解除阻塞线程以及排队提供了通用机制。
本文介绍了该框架的原理、设计、实现、使用和性能。
1. INTRODUCTION
Java release J2SE-1.5 introduces package java.util.concurrent,
a collection of medium-level concurrency support classes created via Java Community Process (JCP) Java Specification Request (JSR) 166.
Among these components are a set of synchronizers – abstract data type (ADT) classes
that maintain an internal synchronization state (for example, representing whether a lock is locked or unlocked),
operations to update and inspect that state, and at least one method that will cause a calling thread to block if the state requires it,
resuming when some other thread changes the synchronization state to permit it.
Examples include various forms of mutual exclusion locks, read-write locks, semaphores, barriers, futures, event indicators, and handoff queues.
jdk1.5引入并发包, jcp jsr166 创建的 一组中级并发支持类
在这些组件中有一组同步器 抽象数据类型(ADT)类
维护内部同步状态(例如,表示锁是锁定的还是未锁定的),
更新和检查状态的操作,至少有一个方法会导致调用线程阻塞根据状态需要,
当其他线程更改同步状态以允许它时恢复。
示例包括各种形式的互斥锁、读写锁、信号量、屏障、futures、事件指示器和切换队列。
As is well-known nearly any synchronizer can be used to implement nearly any other.
For example, it is possible to build semaphores from reentrant locks, and vice versa.
However, doing so often entails enough complexity, overhead, and inflexibility to be at best a second-rate engineering option.
Further, it is conceptually unattractive. If none of these constructs are intrinsically more primitive than the others,
developers should not be compelled to arbitrarily choose one of them as a basis for building others.
Instead, JSR166 establishes a small framework centered on class AbstractQueuedSynchronizer,
that provides common mechanics that are used by most of the provided synchronizers in the package,
as well as other classes that users may define themselves.
The remainder of this paper discusses the requirements for this framework,
the main ideas behind its design and implementation, sample usages,
and some measurements showing its performance characteristics.
众所周知,几乎任何同步器都可以用来实现几乎任何其他同步器。例如,可以从可重入锁构建信号量,反之亦然。
然而,这样做通常需要足够的复杂性、开销和灵活性,充其量只能成为二流的工程选择。
此外,它在概念上没有吸引力。如果这些构念本质上没有一个比其他构念更原始,
开发人员不应该被迫任意选择其中一个作为构建其他组件的基础。
相反,JSR166建立了一个以AbstractQueuedSynchronizer为中心的小框架,
它提供了包中提供的大多数同步器所使用的通用机制,以及用户可能定义自己的其他类。
本文的其余部分将讨论此框架的需求,其设计和实现背后的主要思想、示例用法、并对其性能进行了测试。
2. REQUIREMENTS
2.1 Functionality
Synchronizers possess two kinds of methods:
at least one acquire operation that blocks the calling thread unless/until the synchronization state allows it to proceed,
and at least one release operation that changes synchronization state in a way that may allow one or more blocked threads to unblock.
The java.util.concurrent package does not define a single unified API for synchronizers.
Some are defined via common interfaces (e.g., Lock), but others contain only specialized versions.
So, acquire and release operations take a range of names and forms across different classes.
For example, methods Lock.lock, Semaphore.acquire, CountDownLatch.await, and FutureTask.get all map to acquire operations in the framework.
However, the package does maintain consistent conventions across classes to support a range of common usage options.
When meaningful, each synchronizer supports:
• Nonblocking synchronization attempts (for example, tryLock) as well as blocking versions.
• Optional timeouts, so applications can give up waiting.
• Cancellability via interruption, usually separated into one version of acquire that is cancellable, and one that isn't.
2.1功能
同步器有两种方法:
至少有一个获取操作阻塞调用线程,除非/直到同步状态允许它继续,
以及至少一个释放操作,该操作更改同步状态的方式可能允许一个或多个阻塞的线程解除阻塞。
并发包没有为同步器定义一个统一的API。
有些是通过公共接口(例如锁)定义的,但是其他的只包含专门的版本。
因此,不同类的获取和释放操作会有不同的名字和形式
获取操作的各种方法
Lock.lock, Semaphore.acquire, CountDownLatch.await, and FutureTask.get
但是,包在类之间保持一致的约定,以支持一系列通用使用选项。
每个同步器支持以下操作:
• 非阻塞同步尝试(例如tryLock)以及阻塞版本。
• 可选的超时,因此应用程序可以放弃等待。
• 通过中断实现的可取消性,通常分为可取消的和不可取消的获取版本
Synchronizers may vary according to whether they manage only exclusive states
– those in which only one thread at a time may continue past a possible blocking point
– versus possible shared states in which multiple threads can at least sometimes proceed.
Regular lock classes of course maintain only exclusive state, but counting semaphores,
for example, may be acquired by as many threads as the count permits.
To be widely useful, the framework must support both modes of operation.
The java.util.concurrent package also defines interface Condition,
supporting monitor-style await/signal operations that may be associated with exclusive Lock classes,
and whose implementations are intrinsically intertwined with their associated Lock classes.
同步器可以根据是否只管理独占状态而变化
-每次只有一个线程可以在可能的阻塞点之后继续运行的线程
-相对于可能的共享状态,其中多个线程有时至少可以继续。
常规锁类当然只维护独占状态,但是计数信号量,例如,可以由计数允许的尽可能多的线程获取。
要广泛使用,框架必须支持这两种操作模式。
并发包还定义了 Condition 接口,支持监视器式的 await/signal 操作,这些操作可能与独占锁类相关联,
其实现与相关的锁类在本质上是交织在一起的。
2.2 Performance Goals
Java built-in locks (accessed using synchronized methods and blocks) have long been a performance concern, and there is a sizable literature on their construction .
However, the main focus of such work has been on minimizing space overhead (because any Java object can serve as a lock)
and on minimizing time overhead when used in mostly-single-threaded contexts on uniprocessors.
Neither of these are especially important concerns for synchronizers:
Programmers construct synchronizers only when needed, so there is no need to compact space that would otherwise be wasted,
and synchronizers are used almost exclusively in multithreaded designs (increasingly often on multiprocessors) under which at least occasional contention is to be expected.
So the usual JVM strategy of optimizing locks primarily for the zero-contention case,
leaving other cases to less predictable "slow paths" is not the right tactic for typical multithreaded server applications that rely heavily on java.util.concurrent.
2.2 性能目标
Java内置锁(使用同步方法和块访问)长期以来一直是性能问题,关于它们的构造有大量的文献。
然而,这类工作的主要焦点是最小化空间开销(因为任何Java对象都可以充当锁)以及如何在单处理器上的单线程上下文中最小化时间开销。
这些对于同步器来说都不是特别重要的问题:程序员只在需要的时候构造同步器,所以没有必要压缩空间 ,
同步器几乎只用于多线程设计(越来越多地用于多处理器),在这种设计中,至少偶尔会出现争用。
所以JVM通常的优化锁的策略主要是针对零争用的情况,
对于严重依赖并发包的典型多线程服务器应用程序,将其他情况留给不那么可预测的“慢路径”不是正确的策略。
Instead, the primary performance goal here is scalability: to predictably maintain efficiency even,
or especially, when synchronizers are contended.
Ideally, the overhead required to pass a synchronization point should be constant no matter how many threads are trying to do so.
Among the main goals is to minimize the total amount of time during which some thread is permitted to pass a synchronization point but has not done so.
However, this must be balanced against resource considerations, including total CPU time requirements, memory traffic, and thread scheduling overhead.
For example, spinlocks usually provide shorter acquisition times than blocking locks,
but usually waste cycles and generate memory contention, so are not often applicable.
相反,这里的主要性能目标是可伸缩性:甚至可以预见地保持效率,特别是当同步器发生争用时。
理想情况下,无论有多少线程尝试通过同步点,所需的开销都应该是常量。
主要目标之一是最小化允许某些线程通过同步点但没有通过同步点的总时间。
但是,这必须与资源考虑因素相平衡,包括总CPU时间需求、内存流量和线程调度开销。
例如,自旋锁通常比阻塞锁提供更短的获取时间,但是通常会浪费周期并产生内存争用,所以通常不适用。
These goals carry across two general styles of use.
Most applications should maximize aggregate throughput, tolerating, at best, probabilistic guarantees about lack of starvation.
However in applications such as resource control,
it is far more important to maintain fairness of access across threads, tolerating poor aggregate throughput.
No framework can decide between these conflicting goals on behalf of users;
instead different fairness policies must be accommodated.
这些目标包含两种一般的使用风格。
大多数应用程序应该最大限度地提高总吞吐量,最多也只能容忍对缺乏饥饿的概率保证。
但在资源控制等应用中,更重要的是保持线程间访问的公平性,容忍较差的聚合吞吐量。
没有一个框架能够代表用户在这些冲突的目标之间做出决定;相反,必须适应不同的公平政策。
No matter how well-crafted they are internally, synchronizers will create performance bottlenecks in some applications.
Thus, the framework must make it possible to monitor and inspect basic operations to allow users to discover and alleviate bottlenecks.
This minimally (and most usefully) entails providing a way to determine how many threads are blocked.
不管同步器在内部设计得有多好,它都会在某些应用程序中造成性能瓶颈。
因此,框架必须能够监视和检查基本操作,以允许用户发现和缓解瓶颈。
这至少(也是最有用的)需要提供一种方法来确定有多少线程被阻塞。
3. DESIGN AND IMPLEMENTATION
The basic ideas behind a synchronizer are quite straightforward.
An acquire operation proceeds as:
while (synchronization state does not allow acquire) {
enqueue current thread if not already queued;
possibly block current thread;
}
dequeue current thread if it was queued;
And a release operation is:
update synchronization state;
if (state may permit a blocked thread to acquire)
unblock one or more queued threads;
Support for these operations requires the coordination of three basic components:
• Atomically managing synchronization state
• Blocking and unblocking threads
• Maintaining queues
支持这些操作需要协调好3种基本的操作
原子的管理同步状态
阻塞和解除租售线程
维护队列
It might be possible to create a framework that allows each of these three pieces to vary independently.
However, this would neither be very efficient nor usable.
For example, the information kept in queue nodes must mesh with that needed for unblocking,
and the signatures of exported methods depend on the nature of synchronization state.
也许可以创建一个框架,允许这三个部分中的每一个独立变化。
然而,这既不是很有效也不是很有用。
例如,队列节点中保存的信息必须与解除阻塞所需的信息相匹配,暴露方法的签名依赖于同步状态的性质。
The central design decision in the synchronizer framework was to choose a concrete implementation of each of these three components,
while still permitting a wide range of options in how they are used.
This intentionally limits the range of applicability,
but provides efficient enough support that there is practically never a reason not to use the framework
(and instead build synchronizers from scratch) in those cases where it does apply.
同步器框架的核心设计决策是选择这三个组件的具体实现,同时仍然允许在如何使用它们方面有广泛的选择。
这故意限制了适用性的范围,但是提供了足够有效的支持,实际上没有理由不使用该框架(相反,从头开始构建同步器)。
3.1 Synchronization State
Class AbstractQueuedSynchronizer maintains synchronization state using only a single (32bit) int,
and exports getState, setState, and compareAndSetState operations to access and update this state.
These methods in turn rely on java.util.concurrent.atomic support providing JSR133 (Java Memory Model) compliant volatile semantics on reads and writes,
and access to native compare-and-swap or load-linked/store-conditional instructions to implement compare-AndSet State,
that atomically sets state to a given new value only if it holds a given expected value.
3.1同步状态
AbstractQueuedSynchronizer仅使用一个(32位)int来维护同步状态,
并暴露 getState、setState和compareAndSetState操作来访问和更新此状态。
这些方法依次依赖 java.util.concurrent.atomic 支持在读写上提供符合JSR133 (Java内存模型)的 volatile 语义,
访问本地的 CAS 或 load-linked/store-conditional 指令来实现 CAS state
只有当状态具有给定的期望值时,才原子地将状态设置为给定的新值。
Restricting synchronization state to a 32bit int was a pragmatic decision.
While JSR166 also provides atomic operations on 64bit long fields,
these must still be emulated using internal locks on enough platforms that the resulting synchronizers would not perform well.
In the future, it seems likely that a second base class, specialized for use with 64bit state (i.e., with long control arguments), will be added.
However, there is not now a compelling reason to include it in the package.
Currently, 32 bits suffice for most applications.
Only one java.util.concurrent synchronizer class, CyclicBarrier,
would require more bits to maintain state, so instead uses locks (as do most higher-level utilities in the package).
将同步状态限制为32位int是一个实用的决定。
JSR166还提供64位长整形字段的原子操作,必须在足够多的平台上使用内部锁来模拟这些锁,这样产生的同步器就不能很好地执行。
在将来,很可能会出现第二个基类,专门用于64位状态 ,将被添加。
然而,现在没有一个令人信服的理由把它包括在这个包中。
目前,32位对大多数应用程序来说已经足够了。
只有一个 并发同步器类 CyclicBarrier,需要更多的位来维护状态,所以使用锁(就像包中的大多数高级实用程序一样)。
Concrete classes based on AbstractQueuedSynchronizer must define methods tryAcquire and tryRelease in terms of these exported state methods in order to control the acquire and release operations.
The tryAcquire method must return true if synchronization was acquired,
and the tryRelease method must return true if the new synchronization state may allow future acquires.
These methods accept a single int argument that can be used to communicate desired state;
for example in a reentrant lock, to re-establish the recursion count when re-acquiring the lock after returning from a condition wait.
Many synchronizers do not need such an argument, and so just ignore it.
基于AQS的具体类必须根据这些暴露的状态方法 定义 tryAcquire 和 tryRelease方法,以便控制获取和释放操作。
如果获得同步,tryAcquire 方法必须返回true,
如果新的同步状态允许以后获取,tryRelease方法必须返回true。
这些方法接受一个int参数,该参数可用于通信所需的状态;
例如,在重入锁中,要在从条件wait返回后重新获取锁时重新建立递归计数。
许多同步器不需要这样的参数,所以忽略它。
3.2 Blocking
Until JSR166, there was no Java API available to block and unblock threads for purposes of creating synchronizers that are not based on built-in monitors.
The only candidates were Thread.suspend and Thread.resume, which are unusable because they encounter an unsolvable race problem:
If an unblocking thread invokes resume before the blocking thread has executed suspend, the resume operation will have no effect.
3.2 阻塞
在JSR166之前,还没有可用的Java API来阻塞和取消阻塞线程,以便创建不基于内置监视器的同步器。
唯一的候选人是Thread.suspend 和 Thread.resume ,由于遇到无法解决的竞争问题而无法使用:
如果取消阻塞的线程在阻塞线程执行挂起之前调用resume,那么恢复操作将不起作用。
The java.util.concurrent.locks package includes a LockSupport class with methods that address this problem.
Method LockSupport.park blocks the current thread unless or until a LockSupport.unpark has been issued.
(Spurious wakeups are also permitted.) Calls to unpark are not "counted", so multiple unparks before a park only unblock a single park.
Additionally, this applies per-thread, not per-synchronizer. A thread invoking park on a new synchronizer might return immediately because of a "leftover" unpark from a previous usage.
However, in the absence of an unpark, its next invocation will block.
While it would be possible to explicitly clear this state, it is not worth doing so.
It is more efficient to invoke park multiple times when it happens to be necessary.
LockSupport 解决了这个问题 。
LockSupport.park 方法阻塞当前线程 , 直到 LockSupport.unpark被调用
(虚假的唤醒也是允许的)对unpark的调用不被“计数”,因此多个 unparks 只会解锁一个 park 阻塞。
此外,这适用于每线程,而不是每同步器。调用新同步器上的park的线程可能会立即返回,因为在以前的使用中有一个“剩余的”unpark。
然而,在没有unpark的情况下,它的下一个调用将被阻塞。
虽然可以显式地清除这种状态,但是不值得这样做。当需要多次调用park时,调用它会更有效。
This simple mechanism is similar to those used, at some level, in the Solaris-9 thread library ,
in WIN32 "consumable events", and in the Linux NPTL thread library,
and so maps efficiently to each of these on the most common platforms Java runs on.
(However, the current Sun Hotspot JVM reference implementation on Solaris and Linux actually uses a pthread condvar
in order to fit into the existing runtime design.)
The park method also supports optional relative and absolute timeouts, and is integrated with JVM Thread.
interrupt support — interrupting a thread unparks it.
这个简单的机制在某种程度上类似于Solaris-9线程库中使用的机制,在WIN32“可消费事件”和Linux NPTL线程库中,
因此,在Java运行的最常见平台上,可以高效地映射到这些平台。
(然而,目前Sun Hotspot JVM在Solaris和Linux上的参考实现实际上使用了pthread condvar 以适应现有的运行时设计。)
park方法还支持可选的相对超时和绝对超时,并与JVM线程集成。中断支持 ,中断一个线程并将其解除阻塞。
3.3 Queues
The heart of the framework is maintenance of queues of blocked threads,
which are restricted here to FIFO queues. Thus, the framework does not support priority-based synchronization.
3.3 队列
框架的核心是维护阻塞线程的队列,这里限制为FIFO队列。因此,该框架不支持基于优先级的同步。
These days, there is little controversy that the most appropriate choices for synchronization queues are non-blocking data structures
that do not themselves need to be constructed using lower-level locks.
And of these, there are two main candidates:
variants of Mellor-Crummey and Scott (MCS) locks, and variants of Craig, Landin, and Hagersten (CLH) locks .
Historically, CLH locks have been used only in spinlocks.
However, they appeared more amenable than MCS for use in the synchronizer framework
because they are more easily adapted to handle cancellation and timeouts, so were chosen as a basis.
The resulting design is far enough removed from the original CLH structure to require explanation.
目前,对于同步队列最合适的选择是非阻塞数据结构,这几乎没有争议,它们本身不需要使用较低级别的锁来构造。
其中,有两个主要的候选:
Mellor-Crummey and Scott (MCS)锁的变体,以及Craig Landin and Hagersten (CLH)锁的变体。
历史上,CLH锁只在自旋锁中使用。
然而,在同步器框架中,它们看起来比MCS更易于使用 ,
因为它们更容易用于处理取消和超时,所以选择它们作为基础。
最终的设计与原来的CLH结构相差甚远,需要解释。
A CLH queue is not very queue-like, because its enqueuing and dequeuing operations are intimately tied to its uses as a lock.
It is a linked queue accessed via two atomically updatable fields, head and tail, both initially pointing to a dummy node.
CLH队列不是很像队列,因为它的入队列和出队列操作与它作为锁的使用密切相关。
它是一个链接队列,通过两个原子可更新字段head和tail访问,它们最初都指向一个虚拟节点。
A new node, node, is enqueued using an atomic operation:
do { pred = tail;
} while(!tail.compareAndSet(pred, node));
The release status for each node is kept in its predecessor node.
So, the "spin" of a spinlock looks like:
while (pred.status != RELEASED) ; // spin
A dequeue operation after this spin simply entails setting the head field to the node that just got the lock:
head = node;
每个节点的释放状态都保存在其前置节点中。
Among the advantages of CLH locks are that enqueuing and dequeuing are fast, lock-free,
and obstruction free (even under contention, one thread will always win an insertion race so will make progress);
that detecting whether any threads are waiting is also fast (just check if head is the same as tail);
and that release status is decentralized, avoiding some memory contention.
CLH锁的优点之一是,进入和退出队列速度快,没有锁,无阻塞(即使在竞争中,总有一个线程插入成功,从而取得进展);
检测是否有线程正在等待也是快速的(只要检查头部是否与尾部相同);
而且发布状态是分散的,避免了一些内存竞争。
In the original versions of CLH locks, there were not even links connecting nodes.
In a spinlock, the pred variable can be held as a local.
However, Scott and Scherer showed that by explicitly maintaining predecessor fields within nodes,
CLH locks can deal with timeouts and other forms of cancellation:
If a node's predecessor cancels, the node can slide up to use the previous node's status field.
在CLH锁的最初版本中,甚至没有连接节点的链接。在自旋锁中,pred变量可以作为局部变量保存。
然而,Scott和Scherer表明,通过显式地维护节点中的前置节点字段,CLH锁可以处理超时和其他形式的取消:
如果节点的前一个节点取消,则该节点可以向上滑动以使用前一个节点的状态字段。
The main additional modification needed to use CLH queues for blocking synchronizers
is to provide an efficient way for one node to locate its successor.
In spinlocks, a node need only change its status, which will be noticed on next spin by its successor,
so links are unnecessary. But in a blocking synchronizer, a node needs to explicitly wake up (unpark) its successor.
使用CLH队列阻塞同步器所需的主要修改,是为一个节点找到它的后继节点提供一种有效的方法。
在自旋锁中,一个节点只需要改变它的状态,它的下一个自旋将会被它的继任者注意到,所以链接是不必要的。
但是在阻塞同步器中,节点需要显式地唤醒(取消阻塞)它的后继者。
An AbstractQueuedSynchronizer queue node contains a next link to its successor.
But because there are no applicable techniques for lock-free atomic insertion of double-linked list nodes using compareAndSet,
this link is not atomically set as part of insertion; it is simply assigned:
pred.next = node;after the insertion. This is reflected in all usages.
The next link is treated only as an optimized path.
If a node's successor does not appear to exist (or appears to be cancelled) via its next field,
it is always possible to start at the tail of the list
and traverse backwards using the pred field to accurately check if there really is one.
AbstractQueuedSynchronizer队列节点包含到其后续节点的下一个链接。
但是,由于没有使用compareAndSet对双链列表节点进行无锁原子插入的适用技术,
该链接不是原子性地设置为插入的一部分;它只是在插入之后简单地分配:
pred.next = node ;
这反映在所有的用法中,下一个链接只作为优化路径处理。
如果节点的后继节点似乎不存在(或似乎被取消),总是可以从列表的末尾开始
然后使用pred字段向后遍历,以准确地检查是否真的有一个。
A second set of modifications is to use the status field kept in each node
for purposes of controlling blocking, not spinning.
In the synchronizer framework, a queued thread can only return from an acquire operation
if it passes the tryAcquire method defined in a concrete subclass;
a single "released" bit does not suffice. But control is still needed to
ensure that an active thread is only allowed to invoke tryAcquire when it is at the head of the queue;
in which case it may fail to acquire, and (re)block.
This does not require a per-node status flag because permission
can be determined by checking that the current node's predecessor is the head.
And unlike the case of spinlocks, there is not enough memory contention reading head to warrant replication.
However, cancellation status must still be present in the status field.
第二个修改是使用每个节点中保存的status字段,用于控制是堵塞还是自转。
在同步器框架中,排队的线程只能从获取操作返回,通过在具体子类中定义的tryAcquisition方法;
一个单独的“释放”位是不够的。但是仍然需要控制,确保活动线程只允许在位于队列头部时调用tryAcquisition;
在这种情况下,它可能无法获得和(重新)阻塞。这并不需要每个节点的状态标志,因为权限可以通过检查当前节点的前置节点是否为head来确定。
与自旋锁的情况不同,没有足够的内存争用读取头来保证复制。但是,取消状态必须仍然存在于状态字段中。
The queue node status field is also used to avoid needless calls to park and unpark.
While these methods are relatively fast as blocking primitives go,
they encounter avoidable overhead in the boundary crossing between Java and the JVM runtime and/or OS.
Before invoking park, a thread sets a "signal me" bit,
and then rechecks synchronization and node status once more before invoking park.
A releasing thread clears status.
This saves threads from needlessly attempting to block often enough to be worthwhile,
especially for lock classes in which lost time waiting for the next eligible thread to acquire a lock accentuates other contention effects.
This also avoids requiring a releasing thread to determine its successor unless the successor has set the signal bit,
which in turn eliminates those cases where it must traverse multiple nodes
to cope with an apparently null next field unless signalling occurs in conjunction with cancellation.
队列节点状态字段还用于避免不必要的对park和unpark的调用。
虽然这些方法与阻塞原语相比速度相对较快,它们在Java 与 JVM运行时 OS之间 的边界上避免了开销。
在调用park之前,线程设置一个“signal me”位,然后在调用park之前再次检查同步和节点状态。
释放线程清除状态。这样就避免了不必要地频繁阻塞线程,
特别是对于锁类,其中等待下一个符合条件的线程获得锁所花费的时间会加重其他争用效果。
这也避免了需要释放线程来确定它的后继线程,除非后继线程已经设置了信号位,这反过来又消除了必须遍历多个节点的情况,
处理明显为空的下一个字段,除非在取消时同时发出信号。
Perhaps the main difference between the variant of CLH locks used in the synchronizer framework
and those employed in other languages is that garbage collection is relied on for managing storage reclamation of nodes,
which avoids complexity and overhead.
However, reliance on GC does still entail nulling of link fields
when they are sure to never to be needed. This can normally be done when dequeuing.
Otherwise, unused nodes would still be reachable, causing them to be uncollectable.
同步器框架中使用的CLH锁的变体与其他语言中使用的CLH锁的主要区别可能在于,依赖垃圾收集来管理节点的存储回收,从而避免了复杂性和开销。
然而,依赖GC仍然会导致链接字段为空,而这些字段肯定永远不需要。这通常可以在退出队列时完成。否则,未使用的节点仍然是可访问的,导致它们不可收集。
Some further minor tunings,
including lazy initialization of the initial dummy node required by CLH queues upon first contention,
are described in the source code documentation in the J2SE1.5 release.
一些更小的调优,包括CLH队列虚拟节点延迟初始化 , 在第一次争用时创建,
在J2SE1.5版本的源代码文档中进行了描述。
Omitting such details, the general form of the resulting implementation of the basic acquire operation
(exclusive, noninterruptible, untimed case only) is:
忽略这些细节 , 仅限排他的、不可中断的、不定时的情况 , 获取操作的一般实现
if (!tryAcquire(arg)) {
node = create and enqueue new node;
pred = node's effective predecessor;
while (pred is not head node || !tryAcquire(arg)) {
if (pred's signal bit is set) {park();}
else {compareAndSet pred's signal bit to true;}
pred = node's effective predecessor;
}
head = node;
}
And the release operation is:
释放操作
if (tryRelease(arg) && head node's signal bit is set) {
compareAndSet head's signal bit to false;
unpark head's successor, if one exists
}
The number of iterations of the main acquire loop depends, of course, on the nature of tryAcquire.
Otherwise, in the absence of cancellation, each component of acquire and release is a constant-time O(1) operation,
amortized across threads, disregarding any OS thread scheduling occuring within park.
主获取循环的迭代次数取决于tryAcquisition的性质。
否则,在没有取消的情况下,获取和释放的每个组件都是一个常量时间O(1)操作,
跨线程平摊,不考虑阻塞时任何操作系统线程调度
Cancellation support mainly entails checking for interrupt or timeout upon each return from park inside the acquire loop.
A cancelled thread due to timeout or interrupt sets its node status and unparks its successor so it may reset links.
With cancellation, determining predecessors and successors
and resetting status may include O(n) traversals (where n is the length of the queue).
Because a thread never again blocks for a cancelled operation, links and status fields tend to restabilize quickly.
取消支持主要是在获取循环中每次从park返回时检查中断或超时。
由于超时或中断而被取消的线程将设置其节点状态并 唤醒 其后续线程,以便重新设置链接。
随着撤销,确定前任和继任者 重置状态可以包括O(n)遍历(其中n是队列的长度)。
因为线程再也不会阻塞已取消的操作,所以链接和状态字段趋向于快速恢复。
3.4 Condition Queues
The synchronizer framework provides a ConditionObject class for use by synchronizers
that maintain exclusive synchronization and conform to the Lock interface.
Any number of condition objects may be attached to a lock object,
providing classic monitor-style await, signal, and signalAll operations, including those with timeouts,
along with some inspection and monitoring methods.
3.4 条件队列
同步器框架提供一个条件对象类供同步器使用,维护独占同步并符合锁接口。
任何数量的条件对象都可以附加到锁对象上,提供经典的监视器式等待、通知和通知所有 操作,包括超时操作,以及一些检查和监测方法。
The ConditionObject class enables conditions to be efficiently integrated with other synchronization operations, again by fixing some design decisions.
This class supports only Java-style monitor access rules in which condition operations are legal only when the lock owning the condition is held by the current thread .
Thus, a ConditionObject attached to a ReentrantLock acts in the same way as do built-in monitors (via Object.wait etc),
differing only in method names, extra functionality, and the fact that users can declare multiple conditions per lock.
通过修复一些设计决策,条件对象类使条件能够有效地与其他同步操作集成。
该类只支持java风格的monitor访问规则,其中只有当拥有条件的锁由当前线程持有时,条件操作才是合法的。
因此,附加到ReentrantLock的条件对象的行为与内置监视器(通过 Object.wait 等)的行为相同),
不同之处只在于方法名称、额外功能以及用户可以为每个锁声明多个条件。
A ConditionObject uses the same internal queue nodes as synchronizers,
but maintains them on a separate condition queue.
The signal operation is implemented as a queue transfer from the condition queue to the lock queue,
without necessarily waking up the signalled thread before it has re-acquired its lock.
条件对象使用与同步器相同的内部队列节点,但是在一个单独的条件队列中维护它们。
signal 操作实现为从条件队列到锁队列的队列传输,而不必在发出信号的线程重新获得锁之前唤醒它。
The basic await operation is:
create and add new node to condition queue;
release lock;
block until node is on lock queue;
re-acquire lock;
And the signal operation is:
transfer the first node from condition queue to lock queue;
Because these operations are performed only when the lock is held,
they can use sequential linked queue operations (using a nextWaiter field in nodes) to maintain the condition queue.
The transfer operation simply unlinks the first node from the condition queue,
and then uses CLH insertion to attach it to the lock queue.
因为这些操作只有在锁被锁住时才会执行,
它们可以使用顺序链接的队列操作(在节点中使用nextWaiter字段)来维护条件队列。
传输操作只是从条件队列中断开第一个节点,然后使用CLH插入将其附加到锁队列。
The main complication in implementing these operations is dealing with cancellation of condition waits due to timeouts or Thread.interrupt.
A cancellation and signal occuring at approximately the same time encounter a race whose outcome conforms to the specifications for built-in monitors.
As revised in JSR133, these require that if an interrupt occurs before a signal, then the await method must, after re-acquiring the lock, throw InterruptedException.
But if it is interrupted after a signal, then the method must return without throwing an exception, but with its thread interrupt status set.
实现这些操作的主要复杂性是处理由于超时或Thread.interrupt而取消条件等待。
在同一时间发生的取消和 signal 遇到竞争,其结果符合内置监视器的规范。
正如JSR133中修改过的,这些规则要求,如果中断发生在 signal 之前,那么wait方法必须在重新获取锁之后抛出InterruptedException。
但是如果它在一个信号之后被中断,那么这个方法必须返回而不抛出异常,但是要设置它的线程中断状态。
To maintain proper ordering, a bit in the queue node status records whether the node has been (or is in the process of being) transferred.
Both the signalling code and the cancelling code try to compareAndSet this status.
If a signal operation loses this race, it instead transfers the next node on the queue, if one exists.
If a cancellation loses, it must abort the transfer, and then await lock re-acquisition.
This latter case introduces a potentially unbounded spin.
A cancelled wait cannot commence lock re-acquisition until the node has been successfully inserted on the lock queue,
so must spin waiting for the CLH queue insertion compareAndSet being performed by the signalling thread to succeed.
The need to spin here is rare, and employs a Thread.yield to provide a scheduling hint that some other thread,
ideally the one doing the signal, should instead run.
While it would be possible to implement here a helping strategy for the cancellation to insert the node,
the case is much too rare to justify the added overhead that this would entail.
In all other cases, the basic mechanics here and elsewhere use no spins or yields,
which maintains reasonable performance on uniprocessors.
为了保持适当的顺序,队列节点状态中的一个位记录节点是否已经(或正在)被传输。
通知代码和取消代码都试图比较和设置此状态。
如果一个信号操作丢失了这个竞争,那么它将传输队列上的下一个节点(如果存在下一个节点)。
如果取消操作失败,则必须中止传输,然后等待锁的重新获取。
后一种情况引入了潜在的无界自旋。
在节点被成功插入锁队列之前,取消的等待不能开始重新获取锁,
因此必须自旋,等待发出信号的线程执行CLH队列插入compareAndSet才能成功。
这里很少需要自旋,需要使用 Thread.yield 提供一个调度提示,提示其他线程,
理想情况下,发出信号的应该运行。
虽然可以在这里实现一个帮助策略来插入节点,这种情况非常罕见,无法证明这将导致额外的开销。
在所有其他情况下,这里不使用自旋或yield ,在单处理器上保持合理的性能。
4. USAGE
Class AbstractQueuedSynchronizer ties together the above functionality and serves as a "template method pattern" base class for synchronizers.
Subclasses define only the methods that implement the state inspections and updates that control acquire and release.
However, subclasses of AbstractQueuedSynchronizer are not themselves usable as synchronizer ADTs,
because the class necessarily exports the methods needed to internally control acquire and release policies,
which should not be made visible to users of these classes.
All java.util.concurrent synchronizer classes declare a private inner AbstractQueuedSynchronizer subclass and delegate all synchronization methods to it.
This also allows public methods to be given names appropriate to the synchronizer.
4. 使用
AbstractQueuedSynchronizer将上述功能绑定在一起,作为同步器的“模板方法模式”基类。
子类只定义状态检查和更新的方法 用来实现控制获取和释放。
然而,AbstractQueuedSynchronizer的子类本身不能作为同步器ADTs使用,
因为类必须暴露内部控制获取和释放策略所需的方法,不应该对类的使用者可见。
所有 并发包 同步器类声明一个私有的内部AbstractQueuedSynchronizer子类,并将所有同步方法委托给它。
这还允许为公共方法指定适合于同步器的名称。
For example, here is a minimal Mutex class,
that uses synchronization state zero to mean unlocked, and one to mean locked.
This class does not need the value arguments supported for synchronization methods,
so uses zero, and otherwise ignores them.
例如,这里有一个最小的互斥类,它使用同步状态0表示解锁,使用状态1表示锁定。
该类不需要同步方法支持的值参数,所以用0,否则就忽略它们。
class Mutex {
class Sync
extends AbstractQueuedSynchronizer {
public boolean tryAcquire(int ignore) {
return compareAndSetState(0, 1);
}
public boolean tryRelease(int ignore) {
setState(0); return true; }
}
private final Sync sync = new Sync();
public void lock() { sync.acquire(0); }
public void unlock() { sync.release(0); }
}
A fuller version of this example, along with other usage guidance may be found in the J2SE documentation.
Many variants are of course possible.
For example, tryAcquire could employ "test-and-test-and-set" by checking the state value before trying to change it.
该示例的完整版本以及其他使用指南可以在J2SE文档中找到。
当然,有许多变种。例如,tryAcquisition可以使用“test-and-test-and-set”,在尝试更改状态值之前检查状态值。
It may be surprising that a construct as performance-sensitive as a mutual exclusion lock
is intended to be defined using a combination of delegation and virtual methods.
However, these are the sorts of OO design constructions that modern dynamic compilers have long focussed on.
They tend to be good at optimizing away this overhead, at least in code in which synchronizers are invoked frequently.
像互斥锁这样对性能敏感的构造可能会令人惊讶,使用委托和虚拟方法的组合来定义。
然而这些是现代动态编译器长期关注的面向对象设计结构。
至少在频繁调用同步器的代码中,它们往往能够很好地优化掉这种开销。
Class AbstractQueuedSynchronizer also supplies a number of methods that assist synchronizer classes in policy control.
For example, it includes timeout and interruptible versions of the basic acquire method.
And while discussion so far has focussed on exclusive-mode synchronizers such as locks,
the AbstractQueuedSynchronizer class also contains a parallel set of methods (such as acquireShared)
that differ in that the tryAcquireShared and tryReleaseShared methods
can inform the framework (via their return values) that further acquires may be possible,
ultimately causing it to wake up multiple threads by cascading signals.
AbstractQueuedSynchronizer提供了一些方法辅助同步器类来做策略控制。
它包含基本获取方法的超时和可中断版本。
虽然目前的讨论主要集中在独占模式同步器(如锁)上,
AbstractQueuedSynchronizer类还包含一组并行方法(如 acquireShared)
不同之处在于 tryAcquireShared 和 tryReleaseShared 方法可以通知框架(通过它们的返回值),进一步获取是可能的,最终通过级联信号唤醒多个线程。
Although it is not usually sensible to serialize (persistently store or transmit) a synchronizer,
these classes are often used in turn to construct other classes,
such as thread-safe collections, that are commonly serialized.
The AbstractQueuedSynchronizer and ConditionObject classes provide methods to serialize synchronization state,
but not the underlying blocked threads or other intrinsically transient bookkeeping.
Even so, most synchronizer classes merely reset synchronization state to initial values on deserialization,
in keeping with the implicit policy of built-in locks of always deserializing to an unlocked state.
This amounts to a noop, but must still be explicitly supported to enable deserialization of final fields.
虽然串行化(持久存储或传输)同步器通常是不明智的,这些类通常用于构造其他类,例如通常可序列化的线程安全集合。
AbstractQueuedSynchronizer和条件对象类提供了序列化同步状态的方法,但不是底层被阻塞的线程或其他本质上短暂的簿记。
即便如此,大多数同步器类只是在反序列化时将同步状态重置为初始值,与内置锁的隐式策略保持一致,总是反序列化到解锁状态。
这相当于一个noop,但仍然必须显式地支持它,以启用最终字段的反序列化。
4.1 Controlling Fairness
Even though they are based on FIFO queues, synchronizers are not necessarily fair.
Notice that in the basic acquire algorithm , tryAcquire checks are performed before queuing.
Thus a newly acquiring thread can “steal” access that is "intended" for the first thread at the head of the queue.
4.1 公平性控制
即使同步器基于FIFO队列,它们也不一定是公平的。
注意,在基本的获取算法中,tryAcquisition检查在排队之前执行。
因此,一个新获取的线程可以“窃取”队首的第一个线程的访问权限。
This barging FIFO strategy generally provides higher aggregate throughput than other techniques.
It reduces the time during which a contended lock is available
but no thread has it because the intended next thread is in the process of unblocking.
At the same time, it avoids excessive, unproductive contention
by only allowing one (the first) queued thread to wake up and try to acquire upon any release.
Developers creating synchronizers may further accentuate barging effects in cases
where synchronizers are expected to be held only briefly by defining tryAcquire to itself retry a few times before passing back control.
这种快速FIFO策略通常比其他技术提供更高的总吞吐量。
它减少了争用锁可用的时间,但是没有线程拥有它,因为预期的下一个线程正在解除阻塞的过程中。
与此同时,它避免了过多的、无益的争抢,通过只允许一个(第一个)排队的线程醒来,并尝试在释放时获取。
在某些情况下,创建同步器的开发人员可能会进一步强调阻塞效果,同步器在传递回控制之前,只会被短暂地持有,通过将tryAcquisition定义为 重试几次
barging 冲撞; 乱闯;
barging FIFO strategy
tryAcquire 先于 acquireQueued,后来的线程可能在等待中的线程之前acquire成功。 它能提供更高的吞吐量。
Barging FIFO synchronizers have only probablistic fairness properties.
An unparked thread at the head of the lock queue has an unbiased chance of winning a race with any incoming barging thread,
reblocking and retrying if it loses. However, if incoming threads arrive faster than it takes an unparked thread to unblock,
the first thread in the queue will only rarely win the race,
so will almost always reblock, and its successors will remain blocked.
With briefly-held synchronizers, it is common for multiple bargings and releases to occur on multiprocessors
during the time the first thread takes to unblock.
As seen below, the net effect is to maintain high rates of progress of one or more threads
while still at least probabilistically avoiding starvation.
阻塞FIFO同步器只有概率公平属性。在锁队列的最前面有一个为唤醒的线程,它与任何进入的barging线程竞争获胜的几率是无偏的,
如果失败,则重新阻塞并重试。但是,如果传入的线程比未唤醒的线程更快到达,队列中的第一个线程很少会赢得比赛,
因此,几乎总是会重新阻塞,其继任者也将继续被阻塞。
对于短暂持有的同步器,在第一个线程解除阻塞的时间内,在多处理器上发生多个bargings和释放是很常见的 。
维持一个或多个线程的高速处理 ,尽量避免饥饿
When greater fairness is required, it is a relatively simple matter to arrange it.
Programmers requiring strict fairness can define tryAcquire to fail (return false)
if the current thread is not at the head of the queue,
checking for this using method getFirstQueuedThread, one of a handful of supplied inspection methods.
当需要更大的公平性时,安排它是一件相对简单的事情。
需要严格公平的程序可以定义tryAcquire to fail(返回false),如果当前线程不在队列的最前面,
使用getFirstQueuedThread方法进行检查,这是提供的几种检查方法之一。
A faster, less strict variant is to also allow tryAcquire to succeed if the the queue is (momentarily) empty.
In this case, multiple threads encountering an empty queue may race to be the first to acquire,
normally without enqueuing at least one of them.
This strategy is adopted in all java.util.concurrent synchronizers supporting a "fair" mode.
一个更快、更不严格的变体是,如果队列(暂时)为空,还允许tryAcquire成功。
在这种情况下,遇到空队列的多个线程可能竞相成为第一个获取成功的线程,
支持“公平”模式的并发同步器 采用了这种策略
While they tend to be useful in practice, fairness settings have no guarantees,
because the Java Language Specification does not provide scheduling guarantees.
For example, even with a strictly fair synchronizer,
a JVM could decide to run a set of threads purely sequentially if they never otherwise need to block waiting for each other.
In practice, on a uniprocessor, such threads are likely to each run for a time quantum before being pre-emptively context-switched.
If such a thread is holding an exclusive lock, it will soon be momentarily switched back,
only to release the lock and block now that it is known that another thread needs the lock,
thus further increasing the periods during which a synchronizer is available but not acquired.
Synchronizer fairness settings tend to have even greater impact on multiprocessors,
which generate more interleavings, and hence more opportunities for one thread to discover that a lock is needed by another thread.
虽然它们在实践中往往很有用,但公平设置并不能保证,因为Java语言规范没有提供调度保证。
例如,即使使用严格公平的同步器,如果一组线程不需要相互阻塞,JVM可以决定完全按顺序运行一组线程。
实际上,在单处理器上,这样的线程很可能每次运行一段时间,然后预先切换上下文。
如果一个线程持有一个独占锁,它很快就会被切换回来。
同步器的公平设置会增加 同步器可用但没有被获取的时间
同步器公平性设置对多处理器的影响更大,这会产生更多的交错,从而使一个线程有更多的机会发现另一个线程需要锁。
Even though they may perform poorly under high contention when protecting briefly-held code bodies,
fair locks work well, for example, when they protect relatively long code bodies and/or with relatively long inter-lock intervals,
in which case barging provides little performance advantage and but greater risk of indefinite postponement.
The synchronizer framework leaves such engineering decisions to its users.
即使在保护短暂持有的代码体时,它们在高争用下可能表现得很差,
当它们保护相对较长的代码体 或具有相对较长的互锁间隔时,公平锁工作得很好
在这种情况下,争抢提供的性能优势很小,但无限期延迟的风险更大。
同步器框架将这样的工程决策留给用户。
4.2 Synchronizers
Here are sketches of how java.util.concurrent synchronizer classes are defined using this framework:
The ReentrantLock class uses synchronization state to hold the (recursive) lock count.
When a lock is acquired, it also records the identity of the current thread to check recursions
and detect illegal state exceptions when the wrong thread tries to unlock.
The class also uses the provided ConditionObject, and exports other monitoring and inspection methods.
The class supports an optional "fair" mode by internally declaring two different
AbstractQueuedSynchronizer subclasses (the fair one disabling barging)
and setting each ReentrantLock instance to use the appropriate one upon construction.
4.2 同步器
并发同步器类使用这个框架定义的概述:
ReentrantLock类使用同步状态来保存(递归)锁计数。
当获得锁时,它还记录当前线程的标识以检查递归,并检测非法状态异常时,错误的线程试图解锁。
该类还使用提供的condition对象,并暴露其他监视和检查方法。
该类通过在内部声明两个不同的模式来支持可选的“公平”模式
AbstractQueuedSynchronizer子类(一个禁用争抢的类)
并在构造时将每个ReentrantLock实例设置为适当的实例。
The ReentrantReadWriteLock class uses 16 bits of the synchronization state to hold the write lock count,
and the remaining 16 bits to hold the read lock count. The WriteLock is otherwise structured in the same way as ReentrantLock.
The ReadLock uses the acquireShared methods to enable multiple readers.
ReentrantReadWriteLock类使用16位同步状态来保存写锁计数,
以及剩余的16位来保存读锁计数。WriteLock的其他结构与ReentrantLock相同。
ReadLock使用 acquireShared 方法来启用多个读。
The Semaphore class (a counting semaphore) uses the synchronization state to hold the current count.
It defines acquireShared to decrement the count or block if nonpositive,
and tryRelease to increment the count, possibly unblocking threads if it is now positive.
信号量类(计数信号量)使用同步状态来保存当前计数。
它定义 acquireShared 递减计数 ,非正数时阻塞
tryRelease 增加计数,如果为正数,则解除阻塞线程。
The CountDownLatch class uses the synchronization state to represent the count. All acquires pass when it reaches zero.
CountDownLatch类使用同步状态来表示计数。当它达到零时,所有的 获取线程 通过。
The FutureTask class uses the synchronization state to represent the run-state of a future (initial, running, cancelled, done).
Setting or cancelling a future invokes release, unblocking threads waiting for its computed value via acquire.
FutureTask使用同步状态来表示未来的运行状态(初始、运行、取消、完成)。
设置或取消 调用,释放 阻塞等待其计算值的线程。
The SynchronousQueue class (a CSP-style handoff) uses internal wait-nodes that match up producers and consumers.
It uses the synchronization state to allow a producer to proceed when a consumer takes the item, and vice-versa.
SynchronousQueue(一种csp样式的切换)使用内部的等待节点来匹配生产者和消费者。它使用同步状态来允许生产者在消费者接受该项目时继续进行,反之亦然。
CSP-style ???
Users of the java.util.concurrent package may of course define their own synchronizers for custom applications.
For example, among those that were considered but not adopted in the package are classes
providing the semantics of various flavors of WIN32 events, binary latches, centrally managed locks, and tree-based barriers.
当然,并发包用户可以自定义应用程序自己的同步器。
例如,包中考虑但没有采用的类提供各种WIN32事件、二进制锁、集中管理锁和基于树的屏障的语义。
5. PERFORMANCE
While the synchronizer framework supports many other styles of synchronization in addition to mutual exclusion locks,
lock performance is simplest to measure and compare. Even so, there are many different approaches to measurement.
The experiments here are designed to reveal overhead and throughput.
5. 性能
虽然同步器框架支持除互斥锁之外的许多其他类型的同步,锁性能是最容易测量和比较的。
即便如此,还是有许多不同的测量方法。这里的实验旨在揭示开销和吞吐量。
In each test, each thread repeatedly updates a pseudo-random number computed using function nextRandom(int seed):
int t = (seed % 127773) * 16807 – (seed / 127773) * 2836;
return (t > 0)? t : t + 0x7fffffff;
On each iteration a thread updates, with probability S, a shared generator under a mutual exclusion lock,
else it updates its own local generator, without a lock.
This results in short-duration locked regions, minimizing extraneous effects when threads are preempted while holding locks.
The randomness of the function serves two purposes: it is used in deciding whether to lock
or not (it is a good enough generator for current purposes), and also makes code within loops impossible to trivially optimize away.
在每一次迭代中,一个线程以概率S更新一个互斥锁下的共享生成器,否则,它会在没有锁的情况下更新自己的本地生成器。
这将导致短时间的锁定区域,当线程在持有锁时被抢占时,会将无关的影响降到最低。
函数的随机性有两个目的:用于决定是否锁定或者不是(对于当前的目的,它是一个足够好的生成器),并且也使得循环中的代码不可能被简单地优化掉。
Four kinds of locks were compared: Builtin, using synchronized blocks; Mutex, using a simple Mutex class like that illustrated in section 4;
Reentrant, using ReentrantLock; and Fair, using ReentrantLock set in its "fair" mode.
All tests used build 46 (approximately the same as beta2) of the Sun J2SE1.5 JDK in "server" mode.
Test programs performed 20 uncontended runs before collecting measurements, to eliminate warm-up effects.
Tests ran for ten million iterations per thread, except Fair mode tests were run only one million iterations.
四种锁的测试 synchronized内置锁 简单的互斥类 重入锁非公平模式 重入锁公平模式
jdk1.5 46 server 模式
测试程序在收集测量值之前执行20次非竞争运行,以消除热身效果。
每个线程运行1000万个迭代的测试,公平模式测试只运行100万个迭代。
Tests were performed on four x86-based machines and four UltraSparc-based machines.
All x86 machines were running Linux using a RedHat NPTL-based 2.4 kernel and libraries.
All UltraSparc machines were running Solaris-9. All systems were at most lightly loaded while testing.
The nature of the tests did not demand that they be otherwise completely idle.
The "4P" name reflects the fact a dual hyperthreaded (HT) Xeon acts more like a 4-way than a 2-way machine.
No attempt was made to normalize across the differences here.
As seen below, the relative costs of synchronization do not bear a simple relationship to numbers of processors, their types, or speeds.
在4台基于x86的机器和4台基于 UltraSparc 的机器上进行了测试。所有x86机器都使用基于RedHat nptl的2.4内核和库运行Linux。
所有 UltraSparc 机器都在运行Solaris-9。在测试时,所有系统的负载都很轻。测试的性质并不要求它们完全空闲。
“4P”名称反映了一个事实,即双超线程(HT) Xeon的行为更像一个4路而不是2路机器。
这里没有试图使 差异 标准化 。如下所示,同步的相对成本与处理器的数量、类型或速度没有简单的关系。
UltraSPARC SUN公司处理器型号
Table 1 Test Platforms
Name Processors Type Speed (Mhz)
1P 1 Pentium3 900
2P 2 Pentium3 1400
2A 2 Athlon 2000
4P 2HT Pentium4/Xeon 2400
1U 1 UltraSparc2 650
4U 4 UltraSparc2 450
8U 8 UltraSparc3 750
24U 24 UltraSparc3 750
5.1 Overhead
Uncontended overhead was measured by running only one thread,
subtracting the time per iteration taken with a version setting S=0 (zero probability of accessing shared random) from a run with S=1.
Table 2 displays these estimates of the per-lock overhead of synchronized code over unsynchronized code, in nanoseconds.
The Mutex class comes closest to testing the basic cost of the framework.
The additional overhead for Reentrant locks indicates the cost of recording the current owner thread and of error-checking,
and for Fair locks the additional cost of first checking whether the queue is empty.
5.1开销
非竞争开销是通过只运行一个线程来测量的,
从S=1的运行中减去版本设置为S=0(访问共享随机的概率为0)时每次迭代所花费的时间。
表2显示了同步代码对非同步代码的每次锁开销的估计,单位为纳秒。
互斥类最接近于测试框架的基本成本。
重入锁的额外开销表明记录当前所有者线程和错误检查的成本,
对于公平锁,首先检查队列是否为空的额外开销。
Table 2 also shows the cost of tryAcquire versus the "fast path" of a built-in lock.
Differences here mostly reflect the costs of using different atomic instructions and memory barriers across locks and machines.
On multiprocessors, these instructions tend to completely overwhelm all others.
The main differences between Builtin and synchronizer classes are apparently
due to Hotspot locks using a compareAndSet for both locking and unlocking,
while these synchronizers use a compareAndSet for acquire and a volatile write
(i.e., with a memory barrier on multiprocessors, and reordering constraints on all processors) on release.
The absolute and relative costs of each vary across machines.
表2还显示了tryAcquisition的成本与内置锁的“快速路径”的比较。
这里的差异主要反映了跨锁和机器使用不同原子指令和内存屏障的成本。
在多处理器上,这些指令往往完全压倒其他所有指令。
内建类和同步器类之间的主要区别很明显
由于热点锁使用compareAndSet同时用于锁定和解锁,
而这些同步器使用compareAndSet进行获取和volatile写入(在多处理器上设置内存屏障,并在所有处理器上重新排序约束)。
每种机器的绝对成本和相对成本各不相同。
Table 2 Uncontended Per-Lock Overhead in Nanoseconds
Machine Builtin Mutex Reentrant Fair
1P 18 9 31 37
2P 58 71 77 81
2A 13 21 31 30
4P 116 95 109 117
1U 90 40 58 67
4U 122 82 100 115
8U 160 83 103 123
24U 161 84 108 119
Table 3 Saturated Per-Lock Overhead in Nanoseconds
Machine Builtin Mutex Reentrant Fair
1P 521 46 67 8327
2P 930 108 132 14967
2A 748 79 84 33910
4P 1146 188 247 15328
1U 879 153 177 41394
4U 2590 347 368 30004
8U 1274 157 174 31084
24U 1983 160 182 32291
Table 3 also illustrates that even with low internal overhead,
context switching time completely determines performance for Fair locks.
The listed times are roughly proportional to those for blocking and unblocking threads on the various platforms.
表3还说明,即使内部开销很低,上下文切换时间完全决定了公平锁的性能。
列出的时间大致与在各种平台上阻塞和解除阻塞线程的时间成正比。
Additionally, a follow-up experiment (using machine 4P only) shows that with the very briefly held locks used here,
fairness settings had only a small impact on overall variance.
Differences in termination times of threads were recorded as a coarse-grained measure of variability.
Times on machine 4P had standard deviation of 0.7% of mean for Fair, and 6.0% for Reentrant.
As a contrast, to simulate long-held locks,
a version of the test was run in which each thread computed 16K random numbers while holding each lock.
Here, total run times were nearly identical (9.79s for Fair, 9.72s for Reentrant).
Fair mode variability remained small, with standard deviation of 0.1% of mean, while Reentrant rose to 29.5% of mean.
此外,后续的实验(仅使用machine 4P)表明,这里使用的锁非常短暂,公平设置对总体方差的影响很小。
线程终止时间的差异被记录为粗粒度的可变性度量。
在机器4P上运行的时间的标准差为平均值的0.7%,而在可重入性上为6.0%。
相比之下,为了模拟长锁,测试的一个版本是,每个线程在持有每个锁的同时计算16K个随机数。
在这里,总运行时间几乎相同(Fair为9.79秒,Reentrant为9.72秒)。
公平模式变异性仍然很小,标准差为均值的0.1%,而可重入性上升到均值的29.5%。
5.2 Throughput
Usage of most synchronizers will range between the extremes of no contention and saturation.
This can be experimentally examined along two dimensions,
by altering the contention probability of a fixed set of threads,
and/or by adding more threads to a set with a fixed contention probability.
To illustrate these effects, tests were run with different contention probablilities and numbers of threads, all using Reentrant locks.
5.2 吞吐量
大多数同步器的使用将介于无争用和饱和的极端之间。
这可以从两个维度进行实验检验,通过改变一组固定线程的争用概率,以及/或通过向具有固定争用概率的集合添加更多线程。
为了说明这些效果,我们使用不同的争用概率和线程数运行测试,所有这些测试都使用可重入锁。
On uniprocessors (1P and 1U) performance degrades with increasing contention,
but generally not with increasing numbers of threads.
Multiprocessors generally encounter much worse slowdowns under contention.
The graphs for multiprocessors show an early peak in which contention involving only a few threads
usually produces the worst relative performance.
This reflects a transitional region of performance,
in which barging and signalled threads are about equally likely to obtain locks,
thus frequently forcing each other to block. In most cases,
this is followed by a smoother region, as the locks are almost never available,
causing access to resemble the near-sequential pattern of uniprocessors;
approaching this sooner on machines with more processors.
Notice for example that the values for full contention (labelled "1.000")
exhibit relatively worse slowdowns on machines with fewer processors.
在单处理器上(1P和1U),性能随着争用的增加而下降,但通常不会随着线程数量的增加而下降。
多处理器在争用下通常会遇到更糟糕的 减缓 。多处理器的图表显示了一个早期的峰值,在这个峰值中,只涉及几个线程的争用通常会产生最差的相对性能。
这反映了性能的一个过渡区域,在这个过渡区域中,barging和发出信号的线程获得锁的可能性大致相同,因此常常相互阻塞。
在大多数情况下,这之后是一个更平滑的区域,因为锁几乎永远不可用,导致访问类似于单处理器的近顺序模式;
在拥有更多处理器的机器上更快地实现这一点。例如,请注意,在处理器较少的机器上,全争用的值(标记为“1.000”)表现出相对较差的 减缓。
On the basis of these results, it appears likely that further tuning of blocking (park/unpark) support
to reduce context switching and related overhead could provide small but noticeable improvements in this framework.
Additionally, it may pay off for synchronizer classes to employ some form of adaptive spinning
for briefly-held highly-contended locks on multiprocessors, to avoid some of the flailing seen here.
While adaptive spins tend to be very difficult to make work well across different contexts,
it is possible to build custom forms of locks using this framework,
targetted for specific applications that encounter these kinds of usage profiles.
根据这些结果,似乎可以进一步调优阻塞(park/unpark)来减少上下文切换和相关的开销,可以在这个框架中提供小而明显的改进。
此外,同步器类采用某种形式的自适应旋转可能会带来回报,对于 在多处理器上 短暂持有持有 高度争用锁,以避免这里所看到的一些抖动。
虽然自适应自旋很难在不同的环境下很好地工作,使用这个框架可以构建自定义形式的锁,
6. CONCLUSIONS
As of this writing, the java.util.concurrent synchronizer framework is too new to evaluate in practice.
It is unlikely to see widespread usage until well after final release of J2SE1.5,
and there will surely be unexpected consequences of its design, API,implementation, and performance.
However, at this point, the framework appears successful in meeting the goals of providing an efficient basis for creating new synchronizers.
6. 结论
在撰写本文时,并发同步器框架是一个新的框架,在实际应用中很难对其进行评估。
J2SE1.5的最终版本发布很久之后,才可能看到广泛的使用,而且它的设计、API、实现和性能肯定会带来意想不到的结果。
然而,在这一点上,该框架似乎成功地实现了为创建新的同步器提供有效基础的目标。
7. ACKNOWLEDGMENTS
Thanks to Dave Dice for countless ideas and advice during the development of this framework,
to Mark Moir and Michael Scott for urging consideration of CLH queues,
to David Holmes for critiquing early versions of the code and API,
to Victor Luchangco and Bill Scherer for reviewing previous incarnations of the source code,
and to the other members of the JSR166 Expert Group (Joe Bowbeer, Josh Bloch, Brian Goetz, David Holmes, and Tim Peierls)
as well as Bill Pugh, for helping with design and specifications and commenting on drafts of this paper.
Portions of this work were made possible by a DARPA PCES grant,
NSF grant EIA-0080206 (for access to the 24way Sparc) and a Sun Collaborative Research Grant.
7. 致谢
感谢Dave Dice在开发这个框架过程中提出的无数想法和建议,
为了纪念莫尔和迈克尔·斯科特敦促考虑CLH队列,
感谢David Holmes批评早期版本的代码和API,
感谢Victor Luchangco和Bill Scherer回顾了以前的源代码版本,
以及JSR166专家组的其他成员(Joe Bowbeer、Josh Bloch、Brian Goetz、David Holmes和Tim Peierls)
以及Bill Pugh,感谢他帮助设计和规范以及对本文草稿的评论。
这项工作的一部分是由DARPA PCES资助的,
国家科学基金会资助ea -0080206(用于访问24路Sparc)和Sun合作研究资助。
上一篇
下一篇
科创板要点
aerospike增加节点
aerospike删除节点
java8 Lambda 实例
富爸爸穷爸爸 摘录
会计学习指南