Thursday, July 13, 2006

P11N, not good. :-(

I bought the P11N from Daum Oncat (http://oncat.daum.net). The P11N is a PMP plus navigation system. Its OS is Windows Embedded OS 5 core. It supports a media player, music player, document viewer, radio, TV in/out, and so on. I like the software side of the P11N. The bad thing is that its touch screen doesn't work. Yesterday I spent almost three hours setting it up. I installed the latest firmware and patch. I googled to find someone else who had had the same problem. I really did my best.

The only thing left that I can do is complain.

Today I called the company's AS (after-sales service) center. They said there could be a problem. What????? I just bought it yesterday! Not a week or a month ago!

I need this navigation system this weekend. I can't go anywhere without it. OTL (frustration)

Refund?

Volunteer Computing in Enterprise

These days I've been working on Volunteer Computing in Enterprise (VC for short). When I first started this project, I thought VC would be the ultimate solution for parallel computing. You know how many computers waste their CPU time. As far as I know, most of the computing power in an enterprise is available after the employees go home, because they don't power off their computers. I think it is about fifty-fifty whether a given computer is left on or turned off. Suppose there are 100 machines in an enterprise and each machine delivers about 10 FLOPS. Then VC can give the enterprise up to about 1000 FLOPS without any investment.
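To make that arithmetic concrete, here is a minimal sketch in Python. The function name and the availability parameter are my own illustration, not part of BOINC. It shows the 1000 FLOPS upper bound and what is left if only half of the machines are actually on.

```python
def expected_capacity(machines, flops_per_machine, availability=1.0):
    """Rough capacity estimate for a pool of volunteer machines.

    availability is the assumed fraction of machines that are powered on
    (a hypothetical parameter, used only for illustration).
    """
    return machines * flops_per_machine * availability


# Upper bound: every machine is on.
print(expected_capacity(100, 10))        # 1000.0
# Fifty-fifty case: only half of the machines are on.
print(expected_capacity(100, 10, 0.5))   # 500.0
```

So with the fifty-fifty guess, the expected capacity is closer to 500 FLOPS than 1000.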

BUT there are many problems with VC: security, heterogeneity, instability, unpredictability, scheduling, and so on. Some of them are inherent. I mean, because VC is not a dedicated system, heterogeneity and instability are problems that, in a sense, can't be helped. But when I tried to apply this project (BOINC) in the enterprise (1noon.com), the most important problem for me was unpredictability. Unpredictability means that the system operator can't know when a job will be complete. The reason an enterprise uses VC is to get better and faster results without any investment. But time is not infinite; there is a time limit. You need to know that the job will be complete by a certain time. If it can't be completed by then, it is as if the job failed.

So what have I done to predict the completion time of a job? First of all, I log the server's state every single minute. Then I use that data to predict the completion time with the following simple algorithms:
  1. CurrentTime + (((CurrentTime - StartTime) / JobsDone) * JobsRemained)
  2. CurrentTime + (((CurrentTime - StartTimeOfLast10Job) / 10) * JobsRemained)
  3. CurrentTime + ((OneHour / JobsDoneInLastHour) * JobsRemained)
After the research, I found that the most accurate algorithm is #1 (the one based on all jobs done so far), the next is #3, and the last is #2. You know, there are many factors (e.g., CPU, available memory, available disk space, network bandwidth, and so on) that can affect the result, so no single algorithm will always be better than the others.
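The three estimates can be sketched in Python like this. It is only a sketch, not the code I ran against the BOINC server: the function and parameter names are made up for illustration, and #3 is read here as "one hour divided by the number of jobs finished in the last hour", which is an interpretation of the formula.

```python
from datetime import datetime, timedelta


def estimate_overall(now, start_time, jobs_done, jobs_remaining):
    """#1: average time per job since the whole run started."""
    if jobs_done == 0:
        return None  # no data yet, no prediction
    per_job = (now - start_time) / jobs_done
    return now + per_job * jobs_remaining


def estimate_last_n(now, start_of_last_n, jobs_remaining, n=10):
    """#2: average time per job over the last n finished jobs."""
    per_job = (now - start_of_last_n) / n
    return now + per_job * jobs_remaining


def estimate_last_hour(now, jobs_done_last_hour, jobs_remaining):
    """#3: average time per job over the last hour (assumed reading)."""
    if jobs_done_last_hour == 0:
        return None  # nothing finished in the last hour
    per_job = timedelta(hours=1) / jobs_done_last_hour
    return now + per_job * jobs_remaining


now = datetime(2006, 7, 13, 12, 0)
start = datetime(2006, 7, 13, 10, 0)
# 120 jobs in 2 hours is 1 minute per job, so 60 more jobs take 1 hour.
print(estimate_overall(now, start, jobs_done=120, jobs_remaining=60))
```

Each function returns a predicted completion time, or None when there is not enough data to divide by. #2 reacts quickly to recent changes but is noisy, and #1 is smooth but slow to react, which may be part of why #1 came out best for me.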

Is prediction everything? I don't think so. It's just the tip of the iceberg. Once you have a good prediction, you still have to control the system to meet the deadline. After some attempts, I figured out that this is not easy, and to some extent even impossible. :-(

First of all, BOINC does not support dynamic scheduling, and even if I implemented it myself, I'm not sure I could meet the deadline. It's almost uncontrollable.

So, what should I do to demonstrate my contribution to 1noon (now NHN)? There are other problems that I should solve.

The best candidate is image indexing. Originally, the problem that I wanted to solve was similar-image search. There are more than 100 million images on the internet, and I think more than 10% of them are similar to some other image. In particular, if the image is a picture of a very famous actor or athlete, most of the images will be similar or identical. That's why an image search engine should filter out identical images and not display them to users.
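For the easiest case, images that are byte-for-byte identical, the filtering can be sketched in Python with a content hash. This is only an illustration under that assumption: the function name and input format are made up, and it does nothing for images that are merely similar (resized, re-encoded, cropped).

```python
import hashlib


def filter_exact_duplicates(images):
    """Keep the first image of each distinct content.

    images is an iterable of (name, data) pairs, where data is the raw
    bytes of the image file. Returns the names that survive filtering.
    """
    seen = set()
    unique = []
    for name, data in images:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:  # first time this exact content appears
            seen.add(digest)
            unique.append(name)
    return unique


images = [("a.jpg", b"abc"), ("b.jpg", b"abc"), ("c.jpg", b"xyz")]
print(filter_exact_duplicates(images))  # ['a.jpg', 'c.jpg']
```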

In the next article, I will explain how to find similar images.