http://stackoverflow.com/questions/11955409/non-blocking-async-dns-resolving-in-java
Is there a clean way to resolve a DNS query (get an IP by host name) in Java asynchronously, in a non-blocking way (i.e. a state machine, not 1 query = 1 thread; I'd like to run tens of thousands of queries simultaneously, but not run tens of thousands of threads)?
What I've found so far:
- The standard InetAddress.getByName() implementation is blocking, and the standard Java libraries seem to lack any non-blocking alternative.
- The "Resolving DNS in bulk" question discusses a similar problem, but the only solution found there is a multi-threaded approach (i.e. each thread works on only one query at any given moment), which is not really scalable.
- The dnsjava library is also blocking only.
- There are ancient non-blocking extensions to dnsjava dating from 2006, which lack modern Java concurrency features such as the Future paradigm and, alas, offer only a very limited queue-based implementation.
- The dnsjnio project is also an extension to dnsjava, but it also works in a threaded model (i.e. 1 query = 1 thread).
- asyncorg seems to be the best solution to this issue I've found so far, but:
  - it's also from 2007 and looks abandoned
  - it lacks almost any documentation/javadoc
  - it uses lots of non-standard techniques such as the Fun class
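For comparison, here is a minimal sketch of the multi-threaded workaround mentioned in the list above: a fixed-size pool that wraps the blocking InetAddress.getByName() in a CompletableFuture. The class name PooledResolver and the pool size are illustrative assumptions, not taken from any of the projects listed. Every in-flight lookup pins one pool thread until the answer (or a failure) comes back, so concurrency is capped at the pool size, which is exactly the scaling limit the question wants to avoid.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: the thread-pool workaround, NOT a non-blocking resolver.
public class PooledResolver {
    private final ExecutorService pool;

    public PooledResolver(int threads) {
        // At most `threads` lookups can be in flight at the same time.
        this.pool = Executors.newFixedThreadPool(threads);
    }

    public CompletableFuture<String> resolve(String host) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                // Blocks this pool thread until the resolver answers or fails.
                return InetAddress.getByName(host).getHostAddress();
            } catch (UnknownHostException e) {
                throw new CompletionException(e);
            }
        }, pool);
    }

    public void shutdown() {
        pool.shutdown();
    }

    public static void main(String[] args) {
        PooledResolver resolver = new PooledResolver(64);
        List<CompletableFuture<String>> results = Stream.of("127.0.0.1", "localhost")
                .map(resolver::resolve)
                .collect(Collectors.toList());
        results.forEach(f -> System.out.println(f.join()));
        resolver.shutdown();
    }
}
```

Raising the pool size to tens of thousands is what the question rules out: each of those threads needs its own stack, while it spends nearly all its time just waiting for a UDP reply.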
Clarification: I have a fairly large volume of logs (several TB per day). Every log line has a host name that can come from pretty much anywhere on the internet, and I need the IP address for that host name for my further statistics calculations. The order of lines doesn't really matter, so, basically, my idea is to start two threads. The first one iterates over the lines:
- Read a line, parse it, get the host name
- Send a query to the DNS server to resolve that host name, without blocking for the answer
- Store the line and the DNS query socket handle in an in-memory buffer
- Go to the next line
The second thread processes the answers:
- Wait for the DNS server to answer any query (using an epoll/kqueue-like technique)
- Read the answer and find which line in the buffer it belongs to
- Write the line with the resolved IP to the output
- Go back to waiting for the next answer
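The steps above can be sketched with nothing but java.nio, assuming a plain UDP DNS server at an address the caller supplies. Everything here is hypothetical: AsyncDnsPipeline and its methods submit, drain, encodeQuery and firstARecord are illustrative names, not an existing library's API. One deliberate substitution: instead of a socket handle per query, all queries share one non-blocking DatagramChannel, and answers are matched to log lines by the 16-bit DNS transaction ID. The first thread calls submit for each line; the second calls drain, whose Selector is backed by epoll on Linux and kqueue on macOS. Note that it hand-encodes the RFC 1035 wire format, which is the kind of DNS code the question hoped not to write.

```java
import java.io.IOException;
import java.io.PrintStream;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.DatagramChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: one shared non-blocking UDP socket; answers matched by DNS transaction ID.
public class AsyncDnsPipeline implements AutoCloseable {
    private final DatagramChannel channel;
    private final InetSocketAddress server; // DNS server to query (caller's choice)
    private final Map<Integer, String> pending = new ConcurrentHashMap<>(); // transaction ID -> log line
    private int nextId = 0; // touched only by the submitting thread

    public AsyncDnsPipeline(InetSocketAddress server) throws IOException {
        this.server = server;
        this.channel = DatagramChannel.open();
        channel.configureBlocking(false);
    }

    // Thread 1: remember the line and send the query without waiting for the reply.
    public boolean submit(String line, String host) throws IOException {
        int id = nextId;
        nextId = (nextId + 1) & 0xFFFF; // IDs wrap: keep fewer than 65536 queries in flight
        pending.put(id, line);
        if (channel.send(encodeQuery(id, host), server) == 0) { // socket send buffer is full
            pending.remove(id);
            return false; // the caller can retry this line later
        }
        return true;
    }

    // Thread 2: wait in select() and match every queued answer to its pending line.
    public void drain(PrintStream out, long idleMillis) throws IOException {
        try (Selector selector = Selector.open()) {
            channel.register(selector, SelectionKey.OP_READ);
            ByteBuffer buf = ByteBuffer.allocate(512);
            while (selector.select(idleMillis) > 0) { // 0 once idleMillis pass with no answer
                selector.selectedKeys().clear();
                for (buf.clear(); channel.receive(buf) != null; buf.clear()) { // null: queue empty
                    buf.flip();
                    if (buf.remaining() < 12) continue; // too short to be a DNS message
                    String line = pending.remove(buf.getShort(0) & 0xFFFF);
                    if (line != null) out.println(line + "\t" + firstARecord(buf));
                }
            }
        }
    }

    @Override public void close() throws IOException { channel.close(); }

    // RFC 1035 recursive query for an A record; assumes a valid ASCII host name.
    public static ByteBuffer encodeQuery(int id, String host) {
        ByteBuffer b = ByteBuffer.allocate(512);
        b.putShort((short) id).putShort((short) 0x0100)  // ID; flags = RD (recursion desired)
         .putShort((short) 1).putShort((short) 0)        // QDCOUNT = 1, ANCOUNT = 0
         .putShort((short) 0).putShort((short) 0);       // NSCOUNT = 0, ARCOUNT = 0
        for (String label : host.split("\\.")) {
            byte[] bytes = label.getBytes(StandardCharsets.US_ASCII);
            b.put((byte) bytes.length).put(bytes);
        }
        b.put((byte) 0).putShort((short) 1).putShort((short) 1); // root label, QTYPE = A, QCLASS = IN
        b.flip();
        return b;
    }

    // Returns the first IPv4 address in the answer section, or null if there is none.
    public static String firstARecord(ByteBuffer msg) {
        try {
            if ((msg.getShort(2) & 0x000F) != 0) return null; // RCODE != 0, e.g. NXDOMAIN
            int qdCount = msg.getShort(4) & 0xFFFF, anCount = msg.getShort(6) & 0xFFFF;
            int pos = 12;
            for (int i = 0; i < qdCount; i++) pos = skipName(msg, pos) + 4; // + QTYPE, QCLASS
            for (int i = 0; i < anCount; i++) {
                pos = skipName(msg, pos);
                int type = msg.getShort(pos) & 0xFFFF;
                int rdLength = msg.getShort(pos + 8) & 0xFFFF; // after TYPE, CLASS, TTL
                int rdata = pos + 10;
                if (type == 1 && rdLength == 4) {
                    return (msg.get(rdata) & 0xFF) + "." + (msg.get(rdata + 1) & 0xFF) + "."
                            + (msg.get(rdata + 2) & 0xFF) + "." + (msg.get(rdata + 3) & 0xFF);
                }
                pos = rdata + rdLength; // skip non-A records such as a CNAME
            }
        } catch (IndexOutOfBoundsException e) {
            // truncated or malformed packet: treat as unresolved
        }
        return null;
    }

    // Skips an encoded name: labels ending in a zero byte or in a 2-byte compression pointer.
    private static int skipName(ByteBuffer msg, int pos) {
        while (true) {
            int len = msg.get(pos) & 0xFF;
            if (len == 0) return pos + 1;
            if ((len & 0xC0) == 0xC0) return pos + 2;
            pos += 1 + len;
        }
    }
}
```

In use, the reading thread would call pipeline.submit(line, host) per log line while the other runs pipeline.drain(System.out, 5000). The sketch ignores lost UDP packets (their lines simply stay in pending), truncated answers that would need a TCP retry, and ID reuse beyond 65,536 in-flight queries; a real version needs timeouts and retries for all of these.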
A test with Perl's AnyEvent shows me that my idea is generally correct and that I can easily achieve speeds of 15-20K queries per second this way (a naive blocking implementation gets about 2-3 queries per second, just for the sake of comparison, so that's a difference of about 4 orders of magnitude). Now I need to implement the same in Java, and I'd like to skip rolling my own DNS implementation ;)
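A rough check of why one thread per query cannot reach these rates, using Little's law (average number of requests in flight L = throughput λ × average latency W) and only the figures quoted above. The assumption, not stated in the original, is that the naive blocking run resolves one host at a time, so its throughput gives the average lookup latency, and that this latency stays the same under load.

```latex
\begin{aligned}
W &\approx \frac{1}{\lambda_{\text{naive}}}, \quad \lambda_{\text{naive}} \in [2,\ 3]\ \text{s}^{-1}
  \;\Rightarrow\; W \in \left[\tfrac{1}{3},\ \tfrac{1}{2}\right]\ \text{s} \\
L &= \lambda W, \quad \lambda \in [15000,\ 20000]\ \text{s}^{-1}
  \;\Rightarrow\; L \in \left[15000 \cdot \tfrac{1}{3},\ 20000 \cdot \tfrac{1}{2}\right] = [5000,\ 10000]
\end{aligned}
```

So roughly 5,000 to 10,000 lookups must be outstanding at any moment. With one blocked thread per lookup, that is thousands of threads; in the event-driven design, each outstanding lookup is just an entry in the in-memory buffer. The same numbers give a speed-up of 5,000 to 10,000 (about 10^3.7 to 10^4), consistent with the "4 orders of magnitude" estimate.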