What is a hash function in Java

Contents

Hash : How does it work internally?
5 Answers
What is a hash function in java?
8 Answers

Hash : How does it work internally?

I know my question list is big but I would really appreciate if I can get some clear answers to these questions as I really want to understand the topic.

Try Hash table on Wikipedia instead. A hash function is used as part of the process, but does not explain "how" a hash table works.

There is no such thing as a HashList in Java or any other language I am aware of. Don’t use code formatting for text that isn’t code.

5 Answers

Here is a good explanation about hashing. Suppose you want to store the string "Rachel": you apply a hash function to that string to get a memory location, for example myHashFunction(key: "Rachel") -> 10 . The function may return 10 for the input "Rachel", so assuming you have an array of size 100 you store "Rachel" at index 10. To retrieve that element you call myHashFunction("Rachel") again and it returns 10. Note that in this example the key is "Rachel" and the value is "Rachel", but you could store another value under that key, for example a birth date or an object. Your hash function may return the same memory location for two different inputs. In that case you have a collision, and if you are implementing your own hash table you have to handle it, for example with a linked list or other techniques.
Here are some common hash functions used. A good hash function satisfies this condition: each key is equally likely to hash to any of the n memory slots, independently of where any other key has hashed to. One of the methods is called the division method. We map a key k into one of n slots by taking the remainder of k divided by n: h(k) = k mod n . For example, if your array size is n = 100 and your key is an integer k = 15, then h(k) = 15 mod 100 = 15 .
Hashtable is synchronized and HashMap is not. HashMap allows one null key and null values, but Hashtable allows neither.
The purpose of a hash table is to offer constant-time, O(1), complexity for adding and getting elements. In a linked list of size N, getting the last element means traversing the whole list, so the complexity is O(N). With a hash table you pass the key, the hash function gives you the slot, and the element is read from that slot. If the hash function is well implemented this takes constant time on average, which means you don’t have to traverse all the elements stored in the hash table. You get the element "instantly".
Of course a programmer, developer, or computer scientist needs to know about data structures and complexity =)
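The division method and the linked-list approach to collisions mentioned in the points above can be sketched roughly as follows. This is only an illustration: the class ChainedTable, its method names, and the birth date are made up for the example, and this is not how java.util.HashMap is actually implemented.

```java
import java.util.LinkedList;

// Hypothetical teaching class: a tiny hash table with separate chaining.
class ChainedTable {
    // One entry in a bucket: a key and the value stored under it.
    private static final class Entry {
        final String key;
        String value;
        Entry(String key, String value) { this.key = key; this.value = value; }
    }

    private final LinkedList<Entry>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedTable(int slots) {
        buckets = new LinkedList[slots];
        for (int i = 0; i < slots; i++) buckets[i] = new LinkedList<>();
    }

    // Division method: h(k) = k mod n. floorMod keeps the result in 0..n-1
    // even when k is negative (Java hash codes are signed ints).
    static int indexFor(int k, int n) {
        return Math.floorMod(k, n);
    }

    void put(String key, String value) {
        LinkedList<Entry> bucket = buckets[indexFor(key.hashCode(), buckets.length)];
        for (Entry e : bucket) {
            if (e.key.equals(key)) { e.value = value; return; } // key already present: overwrite
        }
        bucket.add(new Entry(key, value)); // empty slot or collision: append to the chain
    }

    String get(String key) {
        for (Entry e : buckets[indexFor(key.hashCode(), buckets.length)]) {
            if (e.key.equals(key)) return e.value;
        }
        return null; // key not present
    }

    public static void main(String[] args) {
        ChainedTable table = new ChainedTable(100);
        table.put("Rachel", "1990-01-01"); // made-up birth date
        System.out.println(table.get("Rachel"));
        System.out.println(indexFor(15, 100));
    }
}
```

Lookups only walk the one chain the key hashes to, which is why a well-distributed hash function keeps them close to constant time.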

Both the links you have provided take me to the wiki page, which I have already visited, and I mentioned in the question that I have gone through them. Can you update your first 2 points?

Hashing means generating a (hopefully) unique number that represents a value.
Different types of values ( Integer , String , etc) use different algorithms to compute a hashcode.
HashMap and Hashtable are maps; they are a collection of unique keys, each of which is associated with a value.
Java doesn’t have a HashList class. A HashSet is a set of unique values.
Getting an item from a hashtable is constant-time with regard to the size of the table.
Computing a hash is not necessarily constant-time with regard to the value being hashed.
For example, computing the hash of a string involves iterating the string, and isn’t constant-time with regard to the size of the string.
These are things that people ought to know.
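To make the point about strings concrete, here is a small sketch that recomputes a string hash character by character, using the polynomial formula given in the String.hashCode() documentation. The class and method names are invented for the example.

```java
// Hypothetical demo class: shows why hashing a string costs time
// proportional to its length.
class StringHashDemo {
    // Recomputes s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] in int arithmetic,
    // visiting every character once.
    static int polynomialHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String key = "Rachel";
        // Both values should match, since the formula is the documented one.
        System.out.println(polynomialHash(key) == key.hashCode());
    }
}
```

The loop runs once per character, so the cost grows with the length of the string even though the table lookup that follows does not depend on the number of entries.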

No, it won’t. It is not possible to generate a unique 32-bit number for every possible string. That’s why collisions exist.
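A short check of this, assuming the standard String.hashCode() formula: the two different strings "Aa" and "BB" produce the same 32-bit hash code. The class and method names below are made up for the illustration.

```java
// Hypothetical demo class: two different strings with the same hash code.
class CollisionDemo {
    // True when a and b are different values that still hash identically.
    static boolean collide(String a, String b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        // 'A'*31 + 'a' = 65*31 + 97 = 2112 and 'B'*31 + 'B' = 66*31 + 66 = 2112
        System.out.println("Aa".hashCode());
        System.out.println("BB".hashCode());
        System.out.println(collide("Aa", "BB"));
    }
}
```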

@Rachel: A hash doesn’t try to generate a unique number. It tries to create a homogeneous distribution of output, so that each output value has a probability of roughly 1/nr_of_possible_hash_values.

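Often you can use the number directly as a hash code. A value with a limited range, such as an int, maps 1:1 onto the hash code, which makes it a perfect hash function with no collisions. A 64-bit long has to be folded into 32 bits for Java’s hashCode(), so collisions become possible there.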

@Rachel, what @ruslik means is this: a hash outputs a number, and that number is within a specific range of numbers. As an example, that number might be between 0 and 2^32-1. When you hash some data, a number in that range is returned, and ideally every single possible number is equally likely to be returned. The reason you want this for hash tables/maps is that the table can be thought of as an array. When you hash a value, you use that number as the index in the array. You want to avoid using the same index twice. If the numbers are uniformly distributed, a collision is less likely.
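As a rough illustration of using the hash as an array index, the sketch below reduces a hash code to a slot number and counts how many keys land in each slot. Java hash codes are signed ints, so Math.floorMod is used to keep the index non-negative. The class name, method names, and the sample names other than "Rachel" are assumptions made for the example.

```java
import java.util.Arrays;

// Hypothetical demo class: maps hash codes to array slots and counts hits per slot.
class BucketDemo {
    // Reduces any hash code, including a negative one, to an index in 0..tableSize-1.
    static int bucketIndex(Object key, int tableSize) {
        return Math.floorMod(key.hashCode(), tableSize);
    }

    // Counts how many of the given keys fall into each slot.
    static int[] histogram(String[] keys, int tableSize) {
        int[] counts = new int[tableSize];
        for (String key : keys) {
            counts[bucketIndex(key, tableSize)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] names = {"Rachel", "Ross", "Monica", "Joey", "Phoebe", "Chandler"};
        // A slot with a count above 1 holds colliding keys.
        System.out.println(Arrays.toString(histogram(names, 4)));
    }
}
```

The more evenly the counts are spread, the fewer keys share a slot and the fewer collisions have to be resolved.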

Hashing is transforming a given entity (in Java terms, an object) to some number (or sequence). The hash function is not reversible, i.e. you can’t obtain the original object from the hash. Internally, for java.lang.Object, it is typically implemented by the JVM from some memory address.
The JVM address thing is an unimportant detail. Each class can override the hashCode() method with its own algorithm. Modern Java IDEs can generate good hashCode methods.
Hashtable and HashMap are conceptually the same thing. They store key-value pairs, where the keys are hashed. Hash lists and hash sets don’t store values, only keys.
Constant-time means that no matter how many entries there are in the hashtable (or any other collection), the number of operations needed to find a given object by its key is constant, that is, 1 or close to 1.
This is basic computer-science material, and it is assumed that everyone is familiar with it. I think Google has said that the hashtable is the most important data structure in computer science.
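For the point about overriding hashCode(), here is a minimal sketch of the kind of method an IDE might generate, written with java.util.Objects.hash. The class GridPoint and its fields are hypothetical.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical value class that overrides equals and hashCode consistently.
final class GridPoint {
    private final int x;
    private final int y;

    GridPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof GridPoint)) return false;
        GridPoint other = (GridPoint) o;
        return x == other.x && y == other.y;
    }

    // Built from exactly the fields that equals compares,
    // so equal points always produce equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }

    public static void main(String[] args) {
        Set<GridPoint> seen = new HashSet<>();
        seen.add(new GridPoint(1, 2));
        // A distinct but equal instance is found because both methods agree.
        System.out.println(seen.contains(new GridPoint(1, 2)));
    }
}
```

If hashCode() were left at the java.lang.Object default, the second GridPoint(1, 2) would most likely land in a different bucket and the set would not find it.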


Can you give an example algorithm for generating a hashcode from a long? I was also asked in an interview what the algorithm for generating a hashcode is. I was not sure how it currently works internally, so I wanted to understand how it is done.

You can look at the java.lang.Long documentation for that. The code is one line: return (int)(value ^ (value >>> 32));
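The one-liner above can be tried out as follows. The class and method names are invented for the sketch; Long.hashCode(long) is the static JDK method (Java 8 and later) used for comparison.

```java
// Hypothetical demo class: folds a 64-bit long into a 32-bit hash code.
class LongHashDemo {
    // XORs the upper 32 bits into the lower 32 bits, then truncates to int.
    static int foldLong(long value) {
        return (int) (value ^ (value >>> 32));
    }

    public static void main(String[] args) {
        long big = (1L << 32) | 1L; // upper half = 1, lower half = 1
        System.out.println(foldLong(42L));
        System.out.println(foldLong(big) == foldLong(0L)); // two different longs, same hash
        System.out.println(foldLong(big) == Long.hashCode(big));
    }
}
```

Because 64 bits are squeezed into 32, different long values can share a hash code, as the second line of output shows.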

I’ll try to give simple explanations of hashing and of its purpose.

First, consider a simple list. Each operation (insert, find, delete) on such a list would have O(n) complexity, meaning that you have to traverse the whole list (or half of it, on average) to perform such an operation.

Hashing is a very simple and effective way of speeding it up: consider that we split the whole list into a set of small lists. Items in one such small list would have something in common, and this something can be deduced from the key. For example, given a list of names, we could use the first letter as the quality that chooses which small list to look in. In this way, by partitioning the data by the first letter of the key, we obtain a simple hash that splits the whole list into ~30 smaller lists, so that each operation would take O(n)/30 time.

However, we could note that the results are not that perfect. First, there are only 30 of them, and we can’t change it. Second, some letters are used more often than others, so that the set with Y or Z will be much smaller than the set with A . For better results, it’s better to find a way to partition the items into sets of roughly the same size. How could we solve that? This is where you use hash functions. A hash function is able to create an arbitrary number of partitions with roughly the same number of items in each. In our example with names, we could use something like

This would assure a quite even distribution and configurable number of sets (also called buckets).
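A function of the kind described above could be sketched as follows. The class name, the multiplier 37, and the method signature are illustrative assumptions, not the original answer’s code.

```java
// Hypothetical sketch: assigns a name to one of bucketCount partitions.
class NamePartitioner {
    // Mixes every character of the name into the hash, then reduces it
    // to a bucket number in 0..bucketCount-1.
    static int bucketFor(String name, int bucketCount) {
        int h = 0;
        for (int i = 0; i < name.length(); i++) {
            h = h * 37 + name.charAt(i); // 37 is an arbitrary odd multiplier
        }
        return Math.floorMod(h, bucketCount); // floorMod guards against int overflow going negative
    }

    public static void main(String[] args) {
        // The number of buckets is a parameter, unlike the fixed ~30 first letters.
        System.out.println(bucketFor("Rachel", 16));
        System.out.println(bucketFor("Rachel", 1024));
    }
}
```

Unlike the first-letter scheme, every character contributes to the result, so names that start with the same letter can still end up in different buckets.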


What is a hash function in java?

I have checked out this Wikipedia page on it, but I still don’t understand it. Can someone please help my dim-witted mind to understand the concepts of hashing, hashtable/hashmap, and hash functions? Some examples would really help.

What about the wikipedia article don’t you understand? Otherwise we’d just be repeating the same info.

The article seems quite clear to me, so I would find it difficult to come up with an alternative explanation in general. Could you be more specific about what it is you don’t understand in that article?

8 Answers

The Wikipedia article will have a lot of technical information, but a simplistic view of hashing is something like the following.

Imagine that there’s a magical function that can give a number to any object. Given the same object, it always returns the same number.

Immediately now you have a quick way to test if two objects are the same: ask this function for their numbers and compare. If they’re different, then they’re not the same.

But what if they have the same number? Could two different objects have the same number?

Yes, this is possible in most scenarios. Let’s say that the function can only give numbers between 1..10, for example, and there are 100 different objects. Then of course some different objects must have the same number. This is what is called a "collision". A collision makes our quick equality test less useful, so we want to minimize how often it happens. A good magical function is one that tries to minimize the number of collisions.

So what else can you do with this number? Well, you can use it to index an array. Given an object, you can put it at the index given by the number from this magical function. This array is essentially what a hashtable is; this magical function is a hash function.

A hash function is a way to create a compact representation of an arbitrarily large amount of data. In Java, with the hashCode method, this means somehow describing the state of your object (no matter how large) in an int (4 bytes). It is usually written to be fairly fast, as explained below.

To simplify, in hashtables/hashmaps the hashcode serves as a sort of cheap equals. Take two objects a and b of type Foo. Let’s say that figuring out whether a.equals(b) takes 500 ms, whereas calculating an (efficient) hashcode takes only 10 ms. So if we want to know whether a.equals(b), instead of doing that directly we first look at the hashcodes and ask: does a.hashCode() == b.hashCode()? Note that this takes only 20 ms in our example.

Because of the API definition of hashCode, we know that if the hashcode of a is not equal to that of b then a.equals(b) should never be true. So in the test above, if we see that the hashcodes are unequal, we never need to do the longer .equals() test. This is why you should always override hashCode and equals together.
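The "cheap check first" idea can be sketched like this. The helper fastEquals is a made-up name and only illustrates the reasoning; it assumes both objects honor the equals/hashCode contract.

```java
// Hypothetical helper: uses hash codes as a cheap pre-check before equals.
class QuickCompare {
    static boolean fastEquals(Object a, Object b) {
        if (a.hashCode() != b.hashCode()) {
            return false; // different hash codes: by contract the objects cannot be equal
        }
        return a.equals(b); // same hash code: could still be a collision, so confirm
    }

    public static void main(String[] args) {
        System.out.println(fastEquals("Rachel", "Monica"));
        System.out.println(fastEquals("Aa", "BB")); // same hash code, yet not equal
        System.out.println(fastEquals("Rachel", new String("Rachel")));
    }
}
```

The full equals() call is still needed when the hash codes match, because equal hash codes do not guarantee equal objects.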

You may also see references to writing "good" or "well distributed" hashcodes. This has to do with the fact that the converse of the previous statement about hashcode and equals is not true. More specifically, a.hashCode() == b.hashCode() does not necessarily imply a.equals(b). So the idea of a good hashcode is that you reduce the likelihood of a.hashCode() == b.hashCode() when a.equals(b) is false. You may have seen this referred to as a collision of a hash function.

Back to hashmaps/tables. These are based on key/value pairs, so when you add or retrieve a value you supply a key. The first thing the map has to do is look for the key, which means finding something that .equals() the key you provide. But as we discussed above, .equals() can be incredibly slow, which means comparisons can be greatly sped up by checking hashcodes first: when the hashcodes are well distributed, you know quickly when x is definitely != y.

Now in addition to the comparison hashmaps/tables actually use the hashcodes to organize their internal storage of the data, however I think that is beyond the scope of what you are looking to understand at this point.
