JDK 等源码 hash
jdk7 HashMap
JDK7 hashMap 的 hash 源码如下:
/**
* Retrieve object hash code and applies a supplemental hash function to the
* result hash, which defends against poor quality hash functions. This is
* critical because HashMap uses power-of-two length hash tables, that
* otherwise encounter collisions for hashCodes that do not differ
* in lower bits. Note: Null keys always map to hash 0, thus index 0.
*/
final int hash(Object k) {
int h = hashSeed;
if (0 != h && k instanceof String) {
return sun.misc.Hashing.stringHash32((String) k);
}
h ^= k.hashCode();
// This function ensures that hashCodes that differ only by
// constant multiples at each bit position have a bounded
// number of collisions (approximately 8 at default load factor).
h ^= (h >>> 20) ^ (h >>> 12);
return h ^ (h >>> 7) ^ (h >>> 4);
}
jdk7 ConcurrentHashMap
下面是 jdk7 源码中的 hash 算法。
/**
* Applies a supplemental hash function to a given hashCode, which
* defends against poor quality hash functions. This is critical
* because ConcurrentHashMap uses power-of-two length hash tables,
* that otherwise encounter collisions for hashCodes that do not
* differ in lower or upper bits.
*/
private int hash(Object k) {
int h = hashSeed;
if ((0 != h) && (k instanceof String)) {
return sun.misc.Hashing.stringHash32((String) k);
}
h ^= k.hashCode();
// Spread bits to regularize both segment and index locations,
// using variant of single-word Wang/Jenkins hash.
h += (h << 15) ^ 0xffffcd7d;
h ^= (h >>> 10);
h += (h << 3);
h ^= (h >>> 6);
h += (h << 2) + (h << 14);
return h ^ (h >>> 16);
}
- Wang/Jenkins hash算法
无符号64位整数版本
uint64_t hash(uint64_t key) {
key = (~key) + (key << 21); // key = (key << 21) - key - 1;
key = key ^ (key >> 24);
key = (key + (key << 3)) + (key << 8); // key * 265
key = key ^ (key >> 14);
key = (key + (key << 2)) + (key << 4); // key * 21
key = key ^ (key >> 28);
key = key + (key << 31);
return key;
}
String 经典 hash 算法
String
java中的字符串hash算法(BKDRHash)
Java中每个对象都有一个hashcode方法,主要作用是使得这些对象与容器配合使用,提供一个可供hash的整形数值给容器使用。
很显然不同对象的hashcode应该不一样,这样才能保证容器hash处理这个hashcode的时候不会出现大量冲突。
下面是java中String对象的hashcode实现方式。
JDK7 String 源码如下:
/**
* Returns a hash code for this string. The hash code for a
* <code>String</code> object is computed as
* <blockquote><pre>
* s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
* </pre></blockquote>
* using <code>int</code> arithmetic, where <code>s[i]</code> is the
* <i>i</i>th character of the string, <code>n</code> is the length of
* the string, and <code>^</code> indicates exponentiation.
* (The hash value of the empty string is zero.)
*
* @return a hash code value for this object.
*/
public int hashCode() {
int h = hash;
if (h == 0 && value.length > 0) {
char val[] = value;
for (int i = 0; i < value.length; i++) {
h = 31 * h + val[i];
}
hash = h;
}
return h;
}
为什么使用31作为计算的因子呢?
-
选择质数作为乘子,会大大降低hash冲突的概率。质数的值越大,hash冲突率越低
-
31参与乘法运算,可以被 JVM 优化,
31 * i = (i << 5) - i
-
使用 101 计算 hash code 容易导致整型溢出,导致计算精度丢失
times33 hash 算法
time33算法就是不停的将每个字符乘以33(Apache,PHP,Perl都使用了该算法),形式如下:
unsigned long hash = 0;
for (int i=0; i<len; i++)
{
hash = 33*hash + str[i];
}
PHP中hash初值选择的是5381
google提供的字符串hash算法
该算法的下载地址:http://code.google.com/p/cityhash/
这个算法集合功能包括将字符串散列成无符号32位数,无符号64位数,无符号128位数。
算法部分接口如下:
uint64 CityHash64(const char *buf, size_t len)
uint128 CityHash128(const char *s, size_t len)
uint32 CityHash32(const char *buf, size_t len)
STL中的字符串hash算法
STL中的hash_map底层使用的是hashtable,而不是一个红黑树。
hash_map对key的默认hash方式是:
struct hash<basic_string<_CharT,_Traits,_Alloc> > {
size_t operator()(const basic_string<_CharT,_Traits,_Alloc>& __s) const
{
return __stl_string_hash(__s);
}
}
__stl_string_hash
函数内容如下:
template <class _CharT, class _Traits, class _Alloc>
size_t __stl_string_hash(const basic_string<_CharT,_Traits,_Alloc>& __s) {
unsigned long __h = 0;
for (basic_string<_CharT,_Traits,_Alloc>::const_iterator __i = __s.begin();
__i != __s.end();
++__i)
__h = 5*__h + *__i;
return size_t(__h);
}
从上面可以看出,STL中默认计算hash的方式是:
hash[i] = 5*hash[i-1] + str[i]
FNV哈希算法
该算法对于非常相近的字符串效果很好(比如URL,IP地址等),可以保持较小的冲突率。
它有两个版本FNV1和FNV1a,下面是各自算法的hash过程。
FNV1
hash = offset_basis
for each octet_of_data to be hashed
hash = hash * FNV_prime
hash = hash xor octet_of_data
return hash
FNV1a
hash = offset_basis
for each octet_of_data to be hashed
hash = hash xor octet_of_data
hash = hash * FNV_prime
return hash
offset_basis,FNV_prime这个两个参数对于生成不同位数的hash有不同的取值,下面是它们不同情况下的取值:
FNV_prime的取值如下:
32 bit FNV_prime = 224 + 28 + 0x93 = 16777619
64 bit FNV_prime = 240 + 28 + 0xb3 = 1099511628211
128 bit FNV_prime = 288 + 28 + 0x3b = 309485009821345068724781371
offset_basis的取值如下:
32 bit offset_basis = 2166136261
64 bit offset_basis = 14695981039346656037
128 bit offset_basis = 144066263297769815596495629667062367629
更多参考 http://www.isthe.com/chongo/tech/comp/fnv/
常见 hash 算法
32 bit MixFunction
这个函数都是由我们上面讨论的一些基本的式子,之所以让采用自身赋值(即 f(key) = key )的原因是为了传递可逆性与雪崩效应。
/* Thomas Wang's32 bit Mix Function */
unsigned intdictIntHashFunction(unsigned int key)
{
key += ~(key << 15);
key ^= (key >> 10);
key += (key << 3);
key ^= (key >> 6);
key += ~(key << 11);
key ^= (key >> 16);
return key;
}
64 bit Mix Functions
这是 redis 源码中的一个指纹生成函数,它会根据 dict 结构体的当前状态生成一个标识, 如果之后该结构体的内容发生变化,其指纹也会发生变化。
/* A fingerprintis a 64 bit number that represents the state of the dictionary
* at a given time, it's just a few dictproperties xored together.
* When an unsafe iterator is initialized, weget the dict fingerprint, and check
* the fingerprint again when the iterator isreleased.
* If the two fingerprints are different itmeans that the user of the iterator
* performed forbidden operations against thedictionary while iterating. */
long longdictFingerprint(dict *d) {
long long integers[6], hash = 0;
int j;
//将dict结构体中的几个状态放入到数组中,以便后面应用到64 bit MixFunctions中。
//dict结构体其实就是一个hash表的实现,而这些状态其实就是第一、第二哈希表的表地址、表大小与
//已用条目的数量
integers[0] = (long) d->ht[0].table;
integers[1] = d->ht[0].size;
integers[2] = d->ht[0].used;
integers[3] = (long) d->ht[1].table;
integers[4] = d->ht[1].size;
integers[5] = d->ht[1].used;
/* We hash N integers by summing everysuccessive integer with the integer
* hashing of the previous sum. Basically:
*
* Result =hash(hash(hash(int1)+int2)+int3) ...
*
* This way the same set of integers in adifferent order will (likely) hash
* to a different number. */
//利用64 bit Mix Functions,将这些状态信息混合到hash中,组成最后的指纹,如果这些状态中有一个
//出现变化,可以通过一个算法逆推出该状态变化之前的值。例如,d->ht[0].size发生变化,则我们可
//以通过hash和其他的几个状态,逆推出d->ht[0].size的最初值。
for (j = 0; j < 6; j++) {
hash += integers[j];
/* For the hashing step we use TomasWang's 64 bit integer hash. */
hash = (~hash) + (hash << 21); //hash = (hash << 21) - hash - 1;
hash = hash ^ (hash >> 24);
hash = (hash + (hash << 3)) +(hash << 8); // hash * 265
hash = hash ^ (hash >> 14);
hash = (hash + (hash << 2)) +(hash << 4); // hash * 21
hash = hash ^ (hash >> 28);
hash = hash + (hash << 31);
}
return hash;
}
MurmurHash2
MurmurHash是一种很出名的非加密型哈希函数,适用于一般的哈希检索操作。
目前有三个版本(MurmurHash1、MurmurHash2、MurmurHash3)。
- 算法思想
该算法的基本思想就是把 key 分成 n 组,每组 4 个字符,把这 4 个字符看成是一个 uint_32,进行 n 次运算, 得到一个 h,然会在对 h 进行处理,得到一个相对离散的哈希结果。
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import java.io.UnsupportedEncodingException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
/**
* This is a very fast, non-cryptographic hash suitable for general hash-based
* lookup. See http://murmurhash.googlepages.com/ for more details.
* <p/>
* <p>
* The C version of MurmurHash 2.0 found at that site was ported to Java by
* Andrzej Bialecki (ab at getopt org).
* </p>
*/
public class MurmurHash {
private static final String UTF_8 = "UTF-8";
public static long hash64A(byte[] data, int seed) {
return hash64A(ByteBuffer.wrap(data), seed);
}
public static long hash64A(ByteBuffer buf, int seed) {
ByteOrder byteOrder = buf.order();
buf.order(ByteOrder.LITTLE_ENDIAN);
long m = 0xc6a4a7935bd1e995L;
int r = 47;
long h = seed ^ (buf.remaining() * m);
long k;
while (buf.remaining() >= 8) {
k = buf.getLong();
k *= m;
k ^= k >>> r;
k *= m;
h ^= k;
h *= m;
}
if (buf.remaining() > 0) {
ByteBuffer finish = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN);
// for big-endian version, do this first:
// finish.position(8-buf.remaining());
finish.put(buf).rewind();
h ^= finish.getLong();
h *= m;
}
h ^= h >>> r;
h *= m;
h ^= h >>> r;
buf.order(byteOrder);
return h;
}
public long hash(byte[] key) {
return hash64A(key, 0x1234ABCD);
}
public long hash(String key) {
return hash(encode(key));
}
private byte[] encode(String data) {
try {
return data.getBytes(UTF_8);
} catch (UnsupportedEncodingException e) {
throw new IllegalArgumentException(e);
}
}
}
djb hash
djb 哈希算法很简单,只有几行代码,其功能与 MurmurHash 类似,都是将字符串转换为 hash 值,下面是它的代码:
/* And a caseinsensitive hash function (based on djb hash) */
unsigned int dictGenCaseHashFunction(constunsigned char *buf, int len) {
unsigned int hash = (unsignedint)dict_hash_function_seed;
while (len--)
hash = ((hash << 5) + hash) +(tolower(*buf++)); /* hash * 33 + c */
return hash;
}
String hash 的详细讲解
大家都知道,HashMap中定位到桶的位置 是根据Key的hash值与数组的长度取模来计算的。
jdk 源码
JDK8中的hash 算法:
static final int hash(Object key) {
int h;
return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
取模算法:
hash(key)&(length-1)
取模算法为什么用的是位与运算?
由于位运算直接对内存数据进行操作,不需要转成十进制,因此处理速度非常快。
对2的倍数取模,只要将数与2的倍数-1做按位与运算即可。
对原理感兴趣的可以参考 java 位运算
为什么不直接使用key.hashcode()进行取模运算?
我们知道hash的目的是为了尽量分布均匀。
取模做位与运算的时候,实际上刚刚开始数组的长度一般比较小,只利用了低16位,高16位是用不到的。
这种情况下,产生hash冲突的概率会大大增加。
这样设计保证了对象的hashCode的高16位的变化能反应到低16位中,相比较而言减少了hash冲突的情况 。
选用亦或的方式是因为&和 | 都会使得结果偏向0或者1 ,并不是均匀的概念。 |
java hash 实现大全
前言
对于下面的多种 hash 算法,感兴趣可以去查看详细的原理。
平时整理为工具类,直接使用即可。
实现
/**
* Hash算法大全<br>
* 推荐使用FNV1算法
* @algorithm None
* @author Goodzzp 2006-11-20
* @lastEdit Goodzzp 2006-11-20
* @editDetail Create
*/
public class HashAlgorithms
{
/**
* 加法hash
* @param key 字符串
* @param prime 一个质数
* @return hash结果
*/
public static int additiveHash(String key, int prime)
{
int hash, i;
for (hash = key.length(), i = 0; i < key.length(); i++)
hash += key.charAt(i);
return (hash % prime);
}
/**
* 旋转hash
* @param key 输入字符串
* @param prime 质数
* @return hash值
*/
public static int rotatingHash(String key, int prime)
{
int hash, i;
for (hash=key.length(), i=0; i<key.length(); ++i)
hash = (hash<<4)^(hash>>28)^key.charAt(i);
return (hash % prime);
// return (hash ^ (hash>>10) ^ (hash>>20));
}
// 替代:
// 使用:hash = (hash ^ (hash>>10) ^ (hash>>20)) & mask;
// 替代:hash %= prime;
/**
* MASK值,随便找一个值,最好是质数
*/
static int M_MASK = 0x8765fed1;
/**
* 一次一个hash
* @param key 输入字符串
* @return 输出hash值
*/
public static int oneByOneHash(String key)
{
int hash, i;
for (hash=0, i=0; i<key.length(); ++i)
{
hash += key.charAt(i);
hash += (hash << 10);
hash ^= (hash >> 6);
}
hash += (hash << 3);
hash ^= (hash >> 11);
hash += (hash << 15);
// return (hash & M_MASK);
return hash;
}
/**
* Bernstein's hash
* @param key 输入字节数组
* @param level 初始hash常量
* @return 结果hash
*/
public static int bernstein(String key)
{
int hash = 0;
int i;
for (i=0; i<key.length(); ++i) hash = 33*hash + key.charAt(i);
return hash;
}
//
//// Pearson's Hash
// char pearson(char[]key, ub4 len, char tab[256])
// {
// char hash;
// ub4 i;
// for (hash=len, i=0; i<len; ++i)
// hash=tab[hash^key[i]];
// return (hash);
// }
//// CRC Hashing,计算crc,具体代码见其他
// ub4 crc(char *key, ub4 len, ub4 mask, ub4 tab[256])
// {
// ub4 hash, i;
// for (hash=len, i=0; i<len; ++i)
// hash = (hash >> 8) ^ tab[(hash & 0xff) ^ key[i]];
// return (hash & mask);
// }
/**
* Universal Hashing
*/
public static int universal(char[]key, int mask, int[] tab)
{
int hash = key.length, i, len = key.length;
for (i=0; i<(len<<3); i+=8)
{
char k = key[i>>3];
if ((k&0x01) == 0) hash ^= tab[i+0];
if ((k&0x02) == 0) hash ^= tab[i+1];
if ((k&0x04) == 0) hash ^= tab[i+2];
if ((k&0x08) == 0) hash ^= tab[i+3];
if ((k&0x10) == 0) hash ^= tab[i+4];
if ((k&0x20) == 0) hash ^= tab[i+5];
if ((k&0x40) == 0) hash ^= tab[i+6];
if ((k&0x80) == 0) hash ^= tab[i+7];
}
return (hash & mask);
}
/**
* Zobrist Hashing
*/
public static int zobrist( char[] key,int mask, int[][] tab)
{
int hash, i;
for (hash=key.length, i=0; i<key.length; ++i)
hash ^= tab[i][key[i]];
return (hash & mask);
}
// LOOKUP3
// 见Bob Jenkins(3).c文件
// 32位FNV算法
static int M_SHIFT = 0;
/**
* 32位的FNV算法
* @param data 数组
* @return int值
*/
public static int FNVHash(byte[] data)
{
int hash = (int)2166136261L;
for(byte b : data)
hash = (hash * 16777619) ^ b;
if (M_SHIFT == 0)
return hash;
return (hash ^ (hash >> M_SHIFT)) & M_MASK;
}
/**
* 改进的32位FNV算法1
* @param data 数组
* @return int值
*/
public static int FNVHash1(byte[] data)
{
final int p = 16777619;
int hash = (int)2166136261L;
for(byte b:data)
hash = (hash ^ b) * p;
hash += hash << 13;
hash ^= hash >> 7;
hash += hash << 3;
hash ^= hash >> 17;
hash += hash << 5;
return hash;
}
/**
* 改进的32位FNV算法1
* @param data 字符串
* @return int值
*/
public static int FNVHash1(String data)
{
final int p = 16777619;
int hash = (int)2166136261L;
for(int i=0;i<data.length();i++)
hash = (hash ^ data.charAt(i)) * p;
hash += hash << 13;
hash ^= hash >> 7;
hash += hash << 3;
hash ^= hash >> 17;
hash += hash << 5;
return hash;
}
/**
* Thomas Wang的算法,整数hash
*/
public static int intHash(int key)
{
key += ~(key << 15);
key ^= (key >>> 10);
key += (key << 3);
key ^= (key >>> 6);
key += ~(key << 11);
key ^= (key >>> 16);
return key;
}
/**
* RS算法hash
* @param str 字符串
*/
public static int RSHash(String str)
{
int b = 378551;
int a = 63689;
int hash = 0;
for(int i = 0; i < str.length(); i++)
{
hash = hash * a + str.charAt(i);
a = a * b;
}
return (hash & 0x7FFFFFFF);
}
/* End Of RS Hash Function */
/**
* JS算法
*/
public static int JSHash(String str)
{
int hash = 1315423911;
for(int i = 0; i < str.length(); i++)
{
hash ^= ((hash << 5) + str.charAt(i) + (hash >> 2));
}
return (hash & 0x7FFFFFFF);
}
/* End Of JS Hash Function */
/**
* PJW算法
*/
public static int PJWHash(String str)
{
int BitsInUnsignedInt = 32;
int ThreeQuarters = (BitsInUnsignedInt * 3) / 4;
int OneEighth = BitsInUnsignedInt / 8;
int HighBits = 0xFFFFFFFF << (BitsInUnsignedInt - OneEighth);
int hash = 0;
int test = 0;
for(int i = 0; i < str.length();i++)
{
hash = (hash << OneEighth) + str.charAt(i);
if((test = hash & HighBits) != 0)
{
hash = (( hash ^ (test >> ThreeQuarters)) & (~HighBits));
}
}
return (hash & 0x7FFFFFFF);
}
/* End Of P. J. Weinberger Hash Function */
/**
* ELF算法
*/
public static int ELFHash(String str)
{
int hash = 0;
int x = 0;
for(int i = 0; i < str.length(); i++)
{
hash = (hash << 4) + str.charAt(i);
if((x = (int)(hash & 0xF0000000L)) != 0)
{
hash ^= (x >> 24);
hash &= ~x;
}
}
return (hash & 0x7FFFFFFF);
}
/* End Of ELF Hash Function */
/**
* BKDR算法
*/
public static int BKDRHash(String str)
{
int seed = 131; // 31 131 1313 13131 131313 etc..
int hash = 0;
for(int i = 0; i < str.length(); i++)
{
hash = (hash * seed) + str.charAt(i);
}
return (hash & 0x7FFFFFFF);
}
/* End Of BKDR Hash Function */
/**
* SDBM算法
*/
public static int SDBMHash(String str)
{
int hash = 0;
for(int i = 0; i < str.length(); i++)
{
hash = str.charAt(i) + (hash << 6) + (hash << 16) - hash;
}
return (hash & 0x7FFFFFFF);
}
/* End Of SDBM Hash Function */
/**
* DJB算法
*/
public static int DJBHash(String str)
{
int hash = 5381;
for(int i = 0; i < str.length(); i++)
{
hash = ((hash << 5) + hash) + str.charAt(i);
}
return (hash & 0x7FFFFFFF);
}
/* End Of DJB Hash Function */
/**
* DEK算法
*/
public static int DEKHash(String str)
{
int hash = str.length();
for(int i = 0; i < str.length(); i++)
{
hash = ((hash << 5) ^ (hash >> 27)) ^ str.charAt(i);
}
return (hash & 0x7FFFFFFF);
}
/* End Of DEK Hash Function */
/**
* AP算法
*/
public static int APHash(String str)
{
int hash = 0;
for(int i = 0; i < str.length(); i++)
{
hash ^= ((i & 1) == 0) ? ( (hash << 7) ^ str.charAt(i) ^ (hash >> 3)) :
(~((hash << 11) ^ str.charAt(i) ^ (hash >> 5)));
}
// return (hash & 0x7FFFFFFF);
return hash;
}
/* End Of AP Hash Function */
/**
* JAVA自己带的算法
*/
public static int java(String str)
{
int h = 0;
int off = 0;
int len = str.length();
for (int i = 0; i < len; i++)
{
h = 31 * h + str.charAt(off++);
}
return h;
}
/**
* 混合hash算法,输出64位的值
*/
public static long mixHash(String str)
{
long hash = str.hashCode();
hash <<= 32;
hash |= FNVHash1(str);
return hash;
}
拓展阅读
参考资料
- hash 算法
http://web.archive.org/web/20071223173210/http://www.concentric.net/~Ttwang/tech/inthash.htm
https://segmentfault.com/a/1190000010990136
https://blog.csdn.net/jasper_xulei/article/details/18364313
https://www.cnblogs.com/hzorac/p/5399042.html
https://www.jianshu.com/p/32faad2d711f
http://blog.jobbole.com/106733/
https://blog.csdn.net/tanggao1314/article/details/51457585
- 相关算法