A Brief Look at How HashMap Works Internally in JDK 1.8

2022-05-06

How HashMap works internally

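HashMap is a collection that stores key-value pairs. Each pair is called an entry (in JDK 1.8 the class is Node<K,V>, which implements Map.Entry), and the entries are spread across an array, table, which is the backbone of the HashMap. Every slot of that array starts out as null. With the default constructor the array has 16 slots, although JDK 1.8 does not allocate it until the first put (which calls resize()). When put is called, HashMap computes a hash from the key and uses it to choose the slot ((n - 1) & hash, where n is the table length) in which to store the entry.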

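In JDK 1.8, HashMap is built from an array of bins, where each bin is a linked list or, once it grows long enough, a red-black tree. A list bin is converted into a tree when it grows beyond 8 nodes (TREEIFY_THRESHOLD = 8), but only if the table already has at least 64 slots (MIN_TREEIFY_CAPACITY); a smaller table is resized instead. A tree bin is turned back into a plain list when it becomes small again (UNTREEIFY_THRESHOLD = 6 is the cutoff used when a resize splits a tree bin).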
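The decision made when a node is appended to a list bin can be sketched as below. TreeifyRule and onAppend are hypothetical names written for illustration only; the two constants have the same values as in the JDK 8 source, and the logic mirrors just the binCount check in putVal and the table-length check at the start of treeifyBin, not the treeification itself.

```java
// Hypothetical helper mirroring two JDK 8 checks; it does not build a tree.
class TreeifyRule {

    static final int TREEIFY_THRESHOLD = 8;     // same value as in JDK 8 HashMap
    static final int MIN_TREEIFY_CAPACITY = 64; // same value as in JDK 8 HashMap

    // What happens when a new node is appended to a list bin that already
    // holds `existingNodes` nodes, in a table of length `tableLength`.
    static String onAppend(int existingNodes, int tableLength) {
        // In putVal, binCount is the index of the last node visited
        // before the append, i.e. existingNodes - 1
        int binCount = existingNodes - 1;
        if (binCount < TREEIFY_THRESHOLD - 1) {
            return "append";   // the bin simply stays a linked list
        }
        // treeifyBin resizes instead of treeifying while the table is small
        return tableLength < MIN_TREEIFY_CAPACITY ? "resize" : "treeify";
    }

    public static void main(String[] args) {
        System.out.println(onAppend(7, 64));  // append: the list grows to 8 nodes
        System.out.println(onAppend(8, 64));  // treeify: appending the 9th node triggers treeifyBin
        System.out.println(onAppend(8, 16));  // resize: the table is shorter than 64
    }
}
```

So "more than 8" means that treeifyBin runs right after a 9th node has been linked onto the list.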

The default initial capacity is 16, and every resize doubles the capacity.

/**
 * The default initial capacity - MUST be a power of two.
 */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

In a red-black tree, insertion, deletion and lookup all take O(log n) time in the worst case, so when hashCode() returns poorly distributed values, whether by accident or through malicious input, performance degrades gracefully instead of falling to the O(n) scan of a long list. However, a TreeNode is roughly twice the size of a regular Node, so a bin is converted into a tree only when it holds enough nodes to justify the cost. Why is the threshold 8? The comment in the source code explains:

Because TreeNodes are about twice the size of regular nodes, we use them only when bins contain enough nodes to warrant use (see TREEIFY_THRESHOLD). And when they become too small (due to removal or resizing) they are converted back to plain bins. In usages with well-distributed user hashCodes, tree bins are rarely used. Ideally, under random hashCodes, the frequency of nodes in bins follows a Poisson distribution (http://en.wikipedia.org/wiki/Poisson_distribution) with a parameter of about 0.5 on average for the default resizing threshold of 0.75, although with a large variance because of resizing granularity. Ignoring variance, the expected occurrences of list size k are (exp(-0.5) * pow(0.5, k) / factorial(k)). The first values are:

0: 0.60653066
1: 0.30326533
2: 0.07581633
3: 0.01263606
4: 0.00157952
5: 0.00015795
6: 0.00001316
7: 0.00000094
8: 0.00000006
more: less than 1 in ten million

In other words, with random hash codes the number of nodes per bin follows a Poisson distribution, and the comment lists how often a bin of length k should occur. A bin longer than 8 is extremely unlikely, which is presumably why the authors chose 8 as the threshold based on these probabilities.
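The table in the quoted comment can be recomputed from the Poisson formula with λ = 0.5. This is a small sketch (the class name PoissonBinSizes is hypothetical); its output agrees with the quoted values up to rounding in the last printed digit.

```java
import java.util.Locale;

// Recomputes the frequency table from the HashMap source comment:
// P(k) = exp(-0.5) * 0.5^k / k!  (Poisson distribution with lambda = 0.5)
class PoissonBinSizes {

    static final double LAMBDA = 0.5;

    // Expected fraction of bins that hold exactly k nodes
    static double probability(int k) {
        double factorial = 1.0;
        for (int i = 2; i <= k; i++) {
            factorial *= i;
        }
        return Math.exp(-LAMBDA) * Math.pow(LAMBDA, k) / factorial;
    }

    public static void main(String[] args) {
        for (int k = 0; k <= 8; k++) {
            // Locale.ROOT keeps '.' as the decimal separator
            System.out.printf(Locale.ROOT, "%d: %.8f%n", k, probability(k));
        }
    }
}
```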

Basic properties of HashMap

/**
 * The default initial capacity - MUST be a power of two. (initial capacity)
 */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

/**
 * The load factor used when none specified in constructor. (load factor)
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;

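Resizing a HashMap is relatively expensive, because every entry has to be redistributed into the new array. If you can estimate how many entries the map will hold, it is best to pass a suitable initial capacity to the constructor.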
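A minimal sketch of choosing that initial capacity, assuming the default load factor of 0.75. The helpers capacityFor and roundUpToPowerOfTwo are hypothetical; the second one only imitates what HashMap itself does with the requested capacity (JDK 8 rounds it up to a power of two in tableSizeFor). On JDK 19 and later, HashMap.newHashMap(int) performs this calculation for you.

```java
import java.util.HashMap;
import java.util.Map;

class PresizeDemo {

    // Hypothetical helper: smallest initial capacity that lets `expected`
    // entries fit without a resize under the default load factor 0.75
    static int capacityFor(int expected) {
        return (int) Math.ceil(expected / 0.75);
    }

    // Imitates HashMap rounding a requested capacity up to a power of two
    static int roundUpToPowerOfTwo(int cap) {
        return cap <= 1 ? 1 : Integer.highestOneBit(cap - 1) << 1;
    }

    public static void main(String[] args) {
        int expected = 100;
        int requested = capacityFor(expected);       // 134
        int table = roundUpToPowerOfTwo(requested);  // 256, threshold 192
        System.out.println(requested + " -> " + table);

        // No resize happens while these 100 entries are inserted
        Map<String, Integer> map = new HashMap<>(requested);
        for (int i = 0; i < expected; i++) {
            map.put("key" + i, i);
        }
        System.out.println(map.size());
    }
}
```

Passing 100 directly would give a 128-slot table whose threshold is 96, so the 97th entry would still trigger a resize.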

Computing the hash

/**
 * Computes key.hashCode() and spreads (XORs) higher bits of hash
 * to lower.  Because the table uses power-of-two masking, sets of
 * hashes that vary only in bits above the current mask will
 * always collide. (Among known examples are sets of Float keys
 * holding consecutive whole numbers in small tables.)  So we
 * apply a transform that spreads the impact of higher bits
 * downward. There is a tradeoff between speed, utility, and
 * quality of bit-spreading. Because many common sets of hashes
 * are already reasonably distributed (so don't benefit from
 * spreading), and because we use trees to handle large sets of
 * collisions in bins, we just XOR some shifted bits in the
 * cheapest possible way to reduce systematic lossage, as well as
 * to incorporate impact of the highest bits that would otherwise
 * never be used in index calculations because of table bounds.
 */
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
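To see why the high bits are folded down, here is a small sketch (the class HashSpreadDemo and the method indexFor are hypothetical; spread copies the body of hash above). Two Integer keys whose hash codes differ only in the upper 16 bits land in the same bucket of a 16-slot table when the raw hashCode is masked, but in different buckets after spreading.

```java
class HashSpreadDemo {

    // Same transformation as HashMap.hash(Object) in JDK 8
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Bucket index for a table of length n (n must be a power of two);
    // equal to Math.floorMod(hash, n), but computed with a single AND
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        // Integer.hashCode() is the value itself; these differ only in the high 16 bits
        Integer a = 0x00010001;
        Integer b = 0x00020001;
        System.out.println(indexFor(a.hashCode(), n) + " vs " + indexFor(b.hashCode(), n)); // 1 vs 1
        System.out.println(indexFor(spread(a), n) + " vs " + indexFor(spread(b), n));       // 0 vs 3
    }
}
```

The mask (n - 1) & hash is also why the capacity must be a power of two: only then does it select exactly the low bits of the hash.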

How HashMap stores its data

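HashMap stores its key-value pairs in an array. Each pair becomes a node (Entry in JDK 1.7, Node<K,V> in JDK 1.8), and each node has a next pointer to the following node in the same bucket, forming a singly linked list; this is how hash collisions are resolved.

The two building blocks have complementary strengths. An array occupies one contiguous block of memory and supports O(1) access by index, so lookups are fast, but inserting or deleting in the middle is expensive because every later element has to be shifted. A linked list is the opposite: finding an element is slow, because a search always starts at the first node and follows the pointers one node at a time until it finds a match (singly or doubly linked, each node stores only its data plus pointers), but inserting and deleting are cheap because only pointers change.

HashMap combines the two: the hash function maps each key to an array slot, and the keys that share a slot are chained together in a list.

It works much like the Xinhua Dictionary many of us used as children. The characters "王", "旺" and "望" are all spelled "wang" in pinyin, and that mapping from character to pinyin plays the role of the hash function. To find "王", you first look up the page for "wang", where all the characters pronounced "wang" are grouped together, and then go through them one by one until you find the one you want (the linked list).

So what is the load factor? The array starts out with 16 slots, and once the number of stored entries exceeds capacity × load factor (16 × 0.75 = 12 with the defaults), the map resizes.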
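The "array + linked list" idea can be sketched in a few dozen lines. SimpleChainedMap is a hypothetical, deliberately simplified class: it has a fixed table of 16 slots and no resizing, no treeification and no thread safety. It uses the same hash spreading and index mask as HashMap and appends new nodes at the tail of a bin.

```java
import java.util.Objects;

// Simplified "array + linked list" hash map: fixed 16 slots,
// no resizing, no treeification, not thread-safe.
class SimpleChainedMap<K, V> {

    // One entry in a bucket's singly linked list
    static final class Node<K, V> {
        final int hash;
        final K key;
        V value;
        Node<K, V> next;

        Node(int hash, K key, V value, Node<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Node<K, V>[] table = (Node<K, V>[]) new Node[16];

    // Same spreading as HashMap.hash(Object)
    private static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public V put(K key, V value) {
        int h = hash(key);
        int i = (table.length - 1) & h;
        Node<K, V> p = table[i];
        if (p == null) {                           // empty bucket: store in the array slot
            table[i] = new Node<>(h, key, value, null);
            return null;
        }
        while (true) {                             // walk the chain
            if (p.hash == h && Objects.equals(p.key, key)) {
                V old = p.value;                   // same key: replace the value
                p.value = value;
                return old;
            }
            if (p.next == null) {                  // end of chain: append at the tail
                p.next = new Node<>(h, key, value, null);
                return null;
            }
            p = p.next;
        }
    }

    public V get(Object key) {
        int h = hash(key);
        for (Node<K, V> e = table[(table.length - 1) & h]; e != null; e = e.next) {
            if (e.hash == h && Objects.equals(e.key, key)) {
                return e.value;
            }
        }
        return null;
    }
}
```

With Integer keys 1 and 17, both hash to bucket 1 of the 16-slot table, so the second key is chained behind the first and get has to walk the list to find it.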

/**
 * The next size value at which to resize (capacity * load factor).
 *
 * @serial
 */
// (The javadoc description is true upon serialization.
// Additionally, if the table array has not been allocated, this
// field holds the initial array capacity, or zero signifying
// DEFAULT_INITIAL_CAPACITY.)
int threshold;

threshold is the size at which the next resize happens (capacity × load factor): as soon as the number of entries exceeds it, putVal calls resize().
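A small sketch of how capacity and threshold evolve for a map created with the default constructor. ThresholdDemo, thresholdFor and resizesFor are hypothetical names; resizesFor simulates the check "if (++size > threshold) resize();" from putVal and does not count the initial allocation of the table.

```java
class ThresholdDemo {

    static final int DEFAULT_INITIAL_CAPACITY = 16; // same default as HashMap
    static final float DEFAULT_LOAD_FACTOR = 0.75f; // same default as HashMap

    // threshold = capacity * load factor, truncated to an int
    static int thresholdFor(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    // Simulated number of doublings while `entries` distinct keys are inserted
    static int resizesFor(int entries) {
        int capacity = DEFAULT_INITIAL_CAPACITY;
        int threshold = thresholdFor(capacity, DEFAULT_LOAD_FACTOR);
        int resizes = 0;
        for (int size = 1; size <= entries; size++) {
            if (size > threshold) {           // size has just exceeded the threshold
                capacity <<= 1;               // resize() doubles the table
                threshold = thresholdFor(capacity, DEFAULT_LOAD_FACTOR);
                resizes++;
            }
        }
        return resizes;
    }

    public static void main(String[] args) {
        for (int cap = 16; cap <= 128; cap <<= 1) {
            System.out.println("capacity=" + cap + " threshold=" + thresholdFor(cap, DEFAULT_LOAD_FACTOR));
        }
        System.out.println("resizes for 100 entries: " + resizesFor(100)); // 4
    }
}
```

The 13th entry triggers the first doubling (16 to 32), and 100 entries cause four doublings in total, ending with a 256-slot table.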

The resize method

/**
 * Initializes or doubles table size.  If null, allocates in
 * accord with initial capacity target held in field threshold.
 * Otherwise, because we are using power-of-two expansion, the
 * elements from each bin must either stay at same index, or move
 * with a power of two offset in the new table.
 *
 * @return the table
 */
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}
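The lo/hi split in resize() works because doubling the table adds exactly one bit (oldCap) to the index mask: a node either keeps index j or moves to j + oldCap, depending on that single bit of its hash. The sketch below (ResizeSplitDemo and splitIndex are hypothetical names) checks this against recomputing the index from scratch.

```java
import java.util.Random;

class ResizeSplitDemo {

    // Index after resizing, chosen the way resize() does it: keep index j
    // if (hash & oldCap) == 0 (the "lo" list), else move to j + oldCap (the "hi" list)
    static int splitIndex(int hash, int oldCap) {
        int j = hash & (oldCap - 1);
        return ((hash & oldCap) == 0) ? j : j + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int newCap = oldCap << 1;
        Random random = new Random(42);
        for (int i = 0; i < 1_000_000; i++) {
            int hash = random.nextInt();
            // Must agree with the plain mask over the doubled table
            if (splitIndex(hash, oldCap) != (hash & (newCap - 1))) {
                throw new AssertionError("mismatch for hash " + hash);
            }
        }
        System.out.println("lo/hi split agrees with hash & (newCap - 1)");
    }
}
```

This is also why JDK 1.8 never has to recompute hashes during a resize: one AND with oldCap per node is enough.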

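A frequently repeated claim is that the array slot always holds the most recently inserted element (the commonly cited rationale being that recently inserted elements are used more often). That describes JDK 1.7, which inserted new entries at the head of the list. JDK 1.8 appends new nodes at the tail instead (p.next = newNode(hash, key, value, null) in putVal), so the array slot holds the oldest node of its bin and each list keeps insertion order. Here is put():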
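Tail insertion can be observed from outside the class. In this sketch, CollidingKey is a hypothetical key type with a constant hashCode, so every key lands in the same bin; with fewer than 8 keys the bin stays a linked list, and iterating the map walks that list. Iteration order is not part of the HashMap contract, so this shows the behavior of the JDK 8 (and later) implementation, not a guarantee.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TailInsertionDemo {

    // Hypothetical key type: constant hashCode forces all keys into one bin
    static final class CollidingKey {
        final String name;

        CollidingKey(String name) {
            this.name = name;
        }

        @Override
        public int hashCode() {
            return 42;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof CollidingKey && ((CollidingKey) o).name.equals(name);
        }
    }

    // Inserts the names in order and reports the order in which the map iterates them
    static List<String> iterationOrder(String... names) {
        Map<CollidingKey, Integer> map = new HashMap<>();
        for (int i = 0; i < names.length; i++) {
            map.put(new CollidingKey(names[i]), i);
        }
        List<String> order = new ArrayList<>();
        for (CollidingKey k : map.keySet()) {
            order.add(k.name);
        }
        return order;
    }

    public static void main(String[] args) {
        // Fewer than 8 keys, so the bin is never treeified
        System.out.println(iterationOrder("a", "b", "c", "d", "e"));
    }
}
```

Under JDK 1.8 the keys come back in insertion order; head insertion, as in JDK 1.7, would have placed the newest key first in the bin.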

/**
 * Associates the specified value with the specified key in this map.
 * If the map previously contained a mapping for the key, the old
 * value is replaced.
 *
 * @param key key with which the specified value is to be associated
 * @param value value to be associated with the specified key
 * @return the previous value associated with <tt>key</tt>, or
 *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
 *         (A <tt>null</tt> return can also indicate that the map
 *         previously associated <tt>null</tt> with <tt>key</tt>.)
 */
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

The putVal method

/**
 * Implements Map.put and related methods
 *
 * @param hash hash for key
 * @param key the key
 * @param value the value to put
 * @param onlyIfAbsent if true, don't change existing value
 * @param evict if false, the table is in creation mode.
 * @return previous value, or null if none
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        Node<K,V> e; K k;
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        resize();
    afterNodeInsertion(evict);
    return null;
}
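The "existing mapping for key" branch decides what put returns and whether the value is overwritten. In the JDK 8 source, HashMap.putIfAbsent delegates to putVal with onlyIfAbsent set to true, so the condition (!onlyIfAbsent || oldValue == null) can be observed through the public API. PutSemanticsDemo and demo are hypothetical names; the calls are the standard Map methods.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PutSemanticsDemo {

    // Records the return values of a sequence of put/putIfAbsent calls
    static List<String> demo() {
        Map<String, String> map = new HashMap<>();
        List<String> results = new ArrayList<>();
        results.add(String.valueOf(map.put("k", "v1")));         // "null": no previous mapping
        results.add(String.valueOf(map.put("k", "v2")));         // "v1": old value returned, then replaced
        results.add(String.valueOf(map.putIfAbsent("k", "v3"))); // "v2": existing value kept
        results.add(map.get("k"));                               // "v2"
        map.put("n", null);
        results.add(String.valueOf(map.putIfAbsent("n", "x")));  // "null": a null value counts as absent
        results.add(map.get("n"));                               // "x": so it was overwritten
        return results;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The last two calls show why the Javadoc of put warns that a null return is ambiguous: it can mean "no mapping" or "mapped to null".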

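Finally: HashMap is not thread-safe. If you need thread safety, one option is Hashtable, whose methods are all declared synchronized; because HashMap skips that locking, it is comparatively faster. To use a HashMap in a multithreaded environment, you can call Collections.synchronizedMap() to obtain a thread-safe map: the returned wrapper simply performs each operation on the underlying HashMap inside a synchronized block.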

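import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class SyncMapExample {
    public static void main(String[] args) {
        Map<String, Object> map = new HashMap<>();
        // map1 is a thread-safe wrapper: every call locks, then delegates to map
        Map<String, Object> map1 = Collections.synchronizedMap(map);
    }
}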

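In other words, you pass a map to the utility method, it wraps it, and you get a map back. The returned map is a thread-safe view backed by the original HashMap, so from then on all access should go through the returned map rather than the original.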
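A minimal sketch, assuming Java 8 or later, of several threads sharing one wrapped map. SynchronizedMapDemo and countWithThreads are hypothetical names. Two details matter: each individual call on the wrapper is atomic (in the OpenJDK implementation, default methods such as merge are also overridden to run under the wrapper's lock), whereas a separate get() followed by put() is not; and the Collections.synchronizedMap Javadoc requires callers to hold the wrapper's lock themselves while iterating its views.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class SynchronizedMapDemo {

    static int countWithThreads(int threads, int incrementsPerThread) {
        Map<String, Integer> counts = Collections.synchronizedMap(new HashMap<>());
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    // One call on the wrapper = one atomic read-modify-write
                    counts.merge("hits", 1, Integer::sum);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        // Iteration over a view must be guarded by the wrapper's own lock
        int total = 0;
        synchronized (counts) {
            for (int v : counts.values()) {
                total += v;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(countWithThreads(4, 10_000)); // 40000
    }
}
```

With a plain HashMap, or with get() and put() issued as two separate calls, concurrent increments could be lost and the total could come out below 40000.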

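I used to keep these notes in a note-taking app, but I decided I shouldn't keep working in isolation, so I'm posting them on my blog. If you spot any problems or mistakes, corrections are welcome.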

