Monday, October 31, 2011

How to create a cost sharing app using Google Spreadsheets

I want to document a solution to a practical problem: a few people want to share expenses of various kinds in a fair fashion.

The straightforward way is for everyone to just announce their expenses and collect 1/n of the amount from everyone else. However, this is a very tedious way of doing things: say, 5 people spend 10 euros each on some things; each of them then has to give 2 euros 4 times - and receive 2 euros 4 times - and after all this activity everyone ends up exactly where they started!

Thus, it's a clear area for optimization. At first I thought about a web app - Google Apps could do that. However, I decided to be even lazier and implement the whole thing using Google Spreadsheets. The result is a working "app" whose main "screen" (the main sheet of the spreadsheet) displays, for every person, whether they owe money to someone or need to collect money. A person who needs to collect money can chase some of the debtors and then, once the money is collected, put a note about it. The system will recalculate the debts.

First, the "theory" (if I may call this so:).

We start with a per-person value of how much expense they incur. "Fairness" means that everyone ends up with equal expenses - so the target value is the average of all the expenses. When one person gives money to another in order to "balance", that event needs to be recorded so that the imbalance decreases for both - in opposite directions.

That's all that is there, really. The rest is the mechanics.
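
Before diving into the spreadsheet mechanics, here is the same logic as a minimal Lua sketch - purely an illustration, with made-up names and amounts (a positive balance means "still needs to pay", a negative one means "needs to collect"):

local people = { "Amy", "Bob", "Carol", "Dave" }

-- each record: who, what ("expense" or "repayment"), amount, and - for
-- repayments - to whom the money went
local records = {
  { who = "Amy",   what = "expense",   amount = 40 },
  { who = "Bob",   what = "expense",   amount = 10 },
  { who = "Carol", what = "repayment", amount = 5, to = "Amy" },
}

local expense, repaid, received = {}, {}, {}
for _, p in ipairs(people) do expense[p], repaid[p], received[p] = 0, 0, 0 end

local total = 0
for _, r in ipairs(records) do
  if r.what == "expense" then
    expense[r.who] = expense[r.who] + r.amount
    total = total + r.amount
  else
    repaid[r.who] = repaid[r.who] + r.amount
    received[r.to] = received[r.to] + r.amount
  end
end

local average = total / #people
for _, p in ipairs(people) do
  -- positive: p still needs to pay; negative: p needs to collect
  local balance = (average - expense[p]) - repaid[p] + received[p]
  print(p, balance)
end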

First, let's assume that the users are "normal" people and do not want to enter the data into the spreadsheet manually (not only is it boring, it is also very error prone).

So, we create a so-called "web form". Unfortunately, only one form per spreadsheet seems to be possible. Anyway, we make a web form with 5 fields:
"My name is": dropdown list with all the names.
"I have made an": dropdown with 2 choices: "expense" and "repayment"
"How many euros": the text input to put the $$$ value in.
"What for?": freeform text field to document the purpose of transaction.
"If you repaid to someone, to whom?": dropdown with the default choice of "---" and then the same names as in the first field.

Having to enter the names manually is somewhat annoying; I haven't figured out a way around it. Anyway, as you try entering data into this form, you will notice it starts to fill the sheet in. The first line of the sheet will read: "A>Timestamp |B> I am: | C>I have made a: | D>How many euros ? | E>What for ? | F>If you repaid someone, to whom did you repay ?" (the "A>" is the name of the column, for reference).

Starting from row 2 you will see the actual data. We need to shift the data down to row 12 - rows 2..11 we will use for various aggregate calculations.

The column "K" will hold the labels for them - just for us to remember. The contents of this column, starting from "K2":

Who
Got/Made
Type
Total
DeltaAvg(+/-)
Repaid
Received
Balance

Now, in L2, M2, N2... we need to put in the names - exactly as we typed them earlier into the web form. Let's say we have four people: Amy, Bob, Carol, Dave. L2 will read Amy, M2 - Bob, N2 - Carol, O2 - Dave. Repeat the names two more times in the same order, so P2 and T2 will also read Amy, etc.

Why are the names repeated three times ? This will become obvious shortly. Go to the third row and leave L3..O3 empty, then set P3..S3 to "made" and T3..W3 to "got". Now go to row 4: cells L4..O4 need to contain "expense", and cells P4..W4 need to contain "repayment".

Now this clarifies the purpose of the three groups - this is a mechanism to untangle the data from the dense web form: each column will only get a value if that particular person was involved in that particular activity.

To calculate the total expenses per person, enter the following formula into L5: =SUM(L$12:L), and copy-paste it into M5..W5 - notice that the column reference auto-changes. This is precisely what we need. Row 5 will hold the calculated totals per activity per person.

Now let's calculate the average of the total expenses per person. Enter into J6 the formula =AVERAGE(L5:O5), and put a label into J5 saying "Average total expense" or something similar.

L6 will hold the signed delta between the average and the expenses for Amy: =$J$6-L$5; copy-paste this same formula to M6..O6.

L7 will hold the amount Amy has repaid to others, so the formula is naturally the sum of her repayments: =SUM(P12:P); as before, copy-paste this formula into M7..O7.

L8 will hold the amount Amy has received from others in repayments. So the formula is: =SUM(T12:T); again, copy-paste this formula into M8..O8 - the column reference will auto-update to the correct value.

L9 will hold the final "balance" for Amy - if she needs to pay, the value is positive; if she needs to collect money from others, it is negative. The formula is =L6-L7+L8 (i.e. balance = (average - expenses) - repaid + received), and it similarly needs to be copy-pasted into M9..O9.

We are almost there - now we need to actually populate the rows with the monetary values... One problem: the rows with the actual data are added dynamically, so how do we add the formulae there ? The answer is ARRAYFORMULA - it auto-expands downward as needed.

So, put the following formula into L12: =ARRAYFORMULA(IF($B12:$B=L$2,IF($C12:$C=L$4,$D12:$D,0),0))

This says: if the name of the participant matches the name for this column, and the activity type matches the type for this column, then the entered amount appears here; otherwise it is zero.

Copy this formula to M12..S12. This way we make sparse tables for two activities - entering expenses, and entering the repayments to others. If you move to L13, you will see the following formula autopopulated: =CONTINUE(L12, 2, 1). Neat, huh ?

Now we need to populate the third group - the synthetic one that captures the amounts a person *receives* in repayments. For this we just need to slightly modify the above arrayformula and put it into T12: =ARRAYFORMULA(IF($F12:$F=T$2, IF($C12:$C=T$4,$D12:$D,0), 0))

You can see that the only thing that changed is the form column against which we compare the name.

Copy-paste this formula to U12..W12. We are almost there.

This completes the "brains" of the application, and row 9 is already usable, but we can make it more user-friendly. Rename the current (and only) sheet to "Expenses" and lock it. Create a new sheet called "Dashboard".

Go to cell B3 in this new sheet and enter the formula =Expenses!L2 - this will auto-populate the first name. Copy-paste this formula to C3..E3 - this will populate the other names.

Now we need to tell the status of the balances of these people.

Go to B5, and enter this formula:

=IF(Expenses!L9>0; CONCATENATE("Needs to pay "; Expenses!L9; " euros"); IF(Expenses!L9<0; CONCATENATE("Needs to take "; -Expenses!L9; " euros"); ""))

This will translate the sign into the appropriate action for the person.

Last touch: we can liven things up by coloring the "needs to pay" actions with one color, and the "needs to take" ones with another.

Right-click the B5, and select "Conditional formatting".

Add a new rule with "text contains" "pay" and select a red background with green text, and another rule with "text contains" "take" and select a green background with red text.

Copy-paste the cell from B5 to C5..E5.

Lock this sheet too. You are done and now have an expense tracking application fully embedded in Google Spreadsheets.

Use the web form to enter the expenses and the notes of repayment, and use the "Dashboard" as an indicator of ongoing balances - you can set your own rules for when to repay others, e.g. when the amount owed goes over a certain threshold.

In theory there is also a possible loss of precision as the total expense grows. If you think this is a problem - write up your solution. I think I know how to solve it, but I don't want to spoil the fun for you!

Oh - and if you find this writeup useful - please leave a comment!

Wednesday, August 10, 2011

Want my attention ? Send me something useful.

[this is an expansion of my tweet earlier today.]

Getting the email address of your users and keeping in touch with them is a classic technique. So classic that it's been mentioned in umpteen different places - I won't even bother to find the references. So classic that quite a lot of companies do pick it up, eventually.

However, there is one detail. The emails that you send need to be wanted by the user. This detail frequently gets overlooked. As a result, I am getting a lot of updates from various companies - "hey, we are doing this!" or "please take a look at our new new model of humbambillistic hyperbanana!" - while, if they had paid attention, they would have noticed that I do not really use their service. I tried it and did not find it compelling, or it did not solve my problem.

The more impulsive of you will exclaim - "hey, but unwanted email is precisely the definition of spam!"

Well, not quite. The thing is, I might have agreed to be mailed at a particular point in time, when I thought this source might be interesting. However, the first mails turned out to have less substance than I wanted. Should I unsubscribe ? Maybe. However, sometimes, with the subcontractors acting on behalf of representatives of the newly formed subsidiary of the ... well, you get the point. It might not be trivial.
On the other hand, I do not want to send this straight to GMail's "spam" folder - because that would harm the users who may actually be interested.

So, I found a way that does not harm anyone and requires minimal involvement from my side - simply create filters that route those "not interesting anymore" messages into a special folder to settle.

Maybe I'll read them one day when I don't have anything to do, maybe I won't. But I won't read them now.

I do this when I consciously notice that I have received more than two messages whose first three sentences held nothing informative or useful for me.
This reveals the other side of the coin - it's somewhat OK to poke me if I can't remember when you last poked me. Unless the background pokes from different places happen to burst together, so that I get sufficiently annoyed by the random flood of poke-like messages on a particular day.

I wonder if I am unique in seeing this problem (and maybe this is just my own laziness), or whether there is indeed something to it.

Thursday, August 4, 2011

Uploading the files to the VMWare ESXi server

Every now and then I need to upload something like an ISO image to my ESXi server.
Since I do not use Windows, my only remaining option is the CLI - which is big and clunky. However, today I dug around a bit, and after reading the CLI tools' source I realized that uploading a file onto an ESXi host is trivially simple:
  1. Browse the datastores till you get to the correct place, take that URL and append the target file name.
  2. Perform an HTTP PUT request to that URL, supplying the data.
This way, the only tool you need to upload the files to VMWare ESXi really is curl:
curl -k -X PUT --data-binary @IMAGE.ISO 'https://user:pass@host/folder/FOLDERNAME/IMAGE.ISO?dcPath=DCNAME&dsName=DATASTORE'
EDIT:
The above command is a problem if the files you are uploading are large (say, a VMDK or a DVD image) and the machine you are working on is not memory-rich: you will see the error message from curl "curl: option --data-binary: out of memory". A better approach seems to be the "-T" option, which lets you specify the file name to upload:

curl -k -X PUT -T IMAGE.ISO 'https://user:pass@host/folder/FOLDERNAME/IMAGE.ISO?dcPath=DCNAME&dsName=DATASTORE'

Thursday, July 28, 2011

Microbenchmarking of luajit-based server (again: this time against lighttpd)

The other day, in between meetings, I ported my toy event loop experiment to use the ljsyscall library.

Here are the results of running ab -n 100000 -c 1000. First, let's establish a baseline:

lighttpd



# ab -n 100000 -c 1000 http://localhost:80/
This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright 2006 The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
Completed 50000 requests
Completed 60000 requests
Completed 70000 requests
Completed 80000 requests
Completed 90000 requests
Finished 100000 requests


Server Software: lighttpd/1.4.19
Server Hostname: localhost
Server Port: 80

Document Path: /
Document Length: 10 bytes

Concurrency Level: 1000
Time taken for tests: 7.854757 seconds
Complete requests: 100000
Failed requests: 0
Write errors: 0
Total transferred: 24331590 bytes
HTML transferred: 1001300 bytes
Requests per second: 12731.14 [#/sec] (mean)
Time per request: 78.548 [ms] (mean)
Time per request: 0.079 [ms] (mean, across all concurrent requests)
Transfer rate: 3025.05 [Kbytes/sec] received

Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 39 281.0 15 3023
Processing: 5 19 16.5 21 740
Waiting: 3 14 15.9 14 734
Total: 9 58 287.6 38 3746

Percentage of the requests served within a certain time (ms)
50% 38
66% 40
75% 42
80% 43
90% 45
95% 47
98% 50
99% 94
100% 3746 (longest request)
#

luajit


Now let's try it on the primitive event loop.

# ab -n 100000 -c 1000 http://localhost:12345/
This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright 2006 The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
Completed 50000 requests
Completed 60000 requests
Completed 70000 requests
Completed 80000 requests
Completed 90000 requests
Finished 100000 requests


Server Software:
Server Hostname: localhost
Server Port: 12345

Document Path: /
Document Length: 13 bytes

Concurrency Level: 1000
Time taken for tests: 8.232656 seconds
Complete requests: 100000
Failed requests: 0
Write errors: 0
Total transferred: 5503355 bytes
HTML transferred: 1300793 bytes
Requests per second: 12146.75 [#/sec] (mean)
Time per request: 82.327 [ms] (mean)
Time per request: 0.082 [ms] (mean, across all concurrent requests)
Transfer rate: 652.77 [Kbytes/sec] received

Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 44 336.4 6 3016
Processing: 4 10 13.9 10 766
Waiting: 3 8 13.9 7 762
Total: 10 55 341.0 16 3767

Percentage of the requests served within a certain time (ms)
50% 16
66% 17
75% 17
80% 18
90% 20
95% 22
98% 28
99% 3018
100% 3767 (longest request)


Upon multiple runs the numbers vary slightly, of course, but they stay within the same ballpark. I think this shows luajit is a very viable platform for server development.

----

Update:

Curious cat that I am, I've added an HTTP parser generated by a yet-unpublished-and-inefficient-and-incomplete patch for ragel that produces Lua state machines (plus the ragel code from an earlier post about the HTTP parser).


ab -n 100000 -c 100 http://localhost:12345/
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
Completed 50000 requests
Completed 60000 requests
Completed 70000 requests
Completed 80000 requests
Completed 90000 requests
Completed 100000 requests
Finished 100000 requests


Server Software:
Server Hostname: localhost
Server Port: 12345

Document Path: /
Document Length: 19 bytes

Concurrency Level: 100
Time taken for tests: 21.980 seconds
Complete requests: 100000
Failed requests: 0
Write errors: 0
Total transferred: 6100000 bytes
HTML transferred: 1900000 bytes
Requests per second: 4549.62 [#/sec] (mean)
Time per request: 21.980 [ms] (mean)
Time per request: 0.220 [ms] (mean, across all concurrent requests)
Transfer rate: 271.02 [Kbytes/sec] received

Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.1 0 3
Processing: 1 22 3.1 21 30
Waiting: 1 22 3.1 21 30
Total: 3 22 3.1 21 30

Percentage of the requests served within a certain time (ms)
50% 21
66% 25
75% 25
80% 25
90% 25
95% 26
98% 27
99% 27
100% 30 (longest request)


This is not stellar, but I have not yet looked at where the bottlenecks are and whether I can speed it up. Either way, 4.5K requests per second is still not too bad.

Update2:

With a concurrency of 1000, I notice connection resets... After fixing the missing SIGPIPE handler (setting it to SIG_IGN), they apparently still happen. Further debugging pending...

Saturday, June 25, 2011

Your IETF-writing will never be the same again

I've been reading up on SPDY today, and I noticed something interesting -
the spec at Mike Belshe's github repo has the extension ".xml".

So what, you say ? Given that it was rendered as the HTML output of xml2rfc, I suspected there was XSLT in play. And indeed, one more search found the origin of this: Julian Reschke's work.

This is definitely very cool - it means you have to store one less format, and the results are immediately visible as you change the xml.

I will definitely give it a shot for the next draft to see if it lives up to my (inflated) expectations.

Friday, June 24, 2011

I came up with a new word today: faceboob

(v) to faceboob (someone): to earn money off humans' tendency to be curious, as well as their natural tendency to trust their friends - specifically when using social networks. Example use: "John Doe was faceboobed today."

Now, enough words, let's get to the colorful pictures!

Prologue



You see this:

active_window_screenshot2011-06-23,23:44:06

Oh, boy! Aha! Free Boobz! Especially since one of your friends (or more!) shared this. Gotta be good!

Let's get another browser. But first let's investigate a bit - the link looks like a URL shortener.

active_window_screenshot2011-06-24,00:08:14

Yeah, it is a Chinese URL shortener, FTW! Better yet, if you go to its home page, it talks about the link being spam and having been deleted. This already rings the alarm bells, but let's see further - let's open the link in a browser that is logged out of Facebook and see what happens.

active_window_screenshot2011-06-23,23:55:27

Whoa!!! Free boobz are just one click away! They need to verify the age - of course. Gonna be something really juicy. Better yet, the button is also helpfully localized. Thank you, my friends! When we click this link, we get a popup to log in to Facebook.
Cool - so this is how it appeared in my friend feed. I'm the one with the OCD - the others simply tried to verify their age... oops. Anyway, let's close this annoying popup, I am not going to log in anyway. Let's pretend that all happened in the background and see what happens further.

active_window_screenshot2011-06-23,23:56:05

Whoa! To verify my age I am going to get a chance to win an iPad ? Unbelievable! I am a lucky man today! iPads and iBoobz are falling from the sky! Let's click! I want a MacBook Air though - because I already have an iPad.

active_window_screenshot2011-06-23,23:47:30

The question asks something like "is it possible to play videos on the MacBook Pro?" First, what a stupid question - of course it is possible, even a kid knows that. Second, you are supposed to ask about the MacBook Air - I don't want a MacBook Pro!
Anyway, let's answer "Ja" - and go forward, towards the free gear and free boobz!
(the eye notices something in the top-right corner, but quickly dismisses it as uninteresting).

active_window_screenshot2011-06-23,23:47:36

Ahhah. Now the top-right thing starts to make sense. 1 euro for an SMS sent, 1 euro for an SMS received. This is an IQ test - they want me to enter my phone number so that they can send me an SMS and charge 1 euro. Brilliant; let's stick in some random number that is not a valid phone number and move on. I guess this is the time for the real age verification.

active_window_screenshot2011-06-23,23:53:36

Congratulations! You've been charged 1 euro for receiving an SMS! Now please also help us charge you another euro for sending an SMS! I start to be a bit unhappy about the whole situation - 2 euros for boobz ain't really free anymore! I start to suspect an evil plot in this. They want to rip me off - let's go back to the first tab where they at least had the background pic.

An unpleasant surprise awaits me back there: there's no flirting at all on that page anymore. Instead, it coldly notifies me:

active_window_screenshot2011-06-23,23:47:07

(Note that the text "Not completed" periodically changes into "checking..." - which makes a great illusion that the code *actually* checks whether you have completed the darn survey or not). The boobz are in the background - go and complete this survey, ASAP!

Well, there is a slight chance of a bug. And the page admits it. It offers to complete another survey if this survey does not bring the keys to the kingdom. Let's go and see about the iPad - two is better than one, after all... This one starts in a rather cheerful manner:

active_window_screenshot2011-06-23,23:44:58

It prompts me to select my favourite color! Yay! I prefer black.

active_window_screenshot2011-06-23,23:45:07

Wow, now comes the real quiz: it asks which icon corresponds to FaceTime. That was easy. *click*

active_window_screenshot2011-06-23,23:45:17


Meeeeeh. This smiling lady promises me a charge of another 1 euro, and still no boobz!

I think I am going to give up here - no free boobz today. I'd better go read slashdot.

The practical takeaway from this little exercise: to avoid being faceboobed, remember these three simple rules:

  1. There is no free lunch boobz.

  2. Be bear aware! Always check where the links lead, even if it is your friend posting them.
  3. Use condoms. Always check them in a separate browser that is fully patched and is NOT logged into the site that you got the link from.



Remember, you are the target too.

And, unlike the computers, your antivirus is not auto-updated.

Keep this in mind.

update: oh, I forgot. When closing the browser, I get this popup:

active_window_screenshot2011-06-23,23:54:09

Now, this is called plain greed, my friends.

Don't leave your mobile numbers with these folks. They are only too eager to rip the money off you.

And if you really really wanna boobz - here they are. (warning: age restricted, may be NSFW for you): not a faceboobed link.

Monday, June 20, 2011

Your account has been compromised. Will you finally stop reusing the passwords ?

Logging into Gmail in the morning, I got a prompt to change my password, which mentioned suspicious activity on my account. Once logged in, I saw the possible reason why.

The MtGox bitcoin "exchange", where I had created an account to see what it all looks like, has been hacked, and the database has been leaked. Bummer. I even found my email and user id on pastebin.com, along with the hash of my password. This can't be good, can it ?

Well, except that a while ago I became more pedantic about having separate passwords for different sites, no exceptions. My online passwords are 40-character random hex strings, different for each site. This meant a sigh of relief - the data the miscreants obtained has minimal impact on the other sites that I use.

So, the unique passwords saved my bacon today. If you reuse your passwords on more than one site, you should think twice about what would happen to your accounts in an occurrence like this. If you are a Windows user, you could get the Password Safe program by Bruce Schneier - it's open source and free. In this day and age, you should avoid reusing passwords, period.

I will also use this chance to describe what my "separate password" strategy looks like.

There are two components to it:


  1. A program called "sha1sum" - which exists on most *nix systems, and essentially just calculates a cryptographic one-way hash function.


  2. A text file which stores the material for my passwords, in plain text. Each line of the file contains two strings: the site name and a random string. I choose the random string to be long enough to be difficult to guess / brute-force (some 30 characters or so).


When I need to get the password for a website, I open up the file and find the line corresponding to that website.

Then I start the "sha1sum" program, which waits for my input. I type in the site name and a memorable "master password" that I do not write down anywhere. This is what allows me to not worry too much about the safety of the file with the key material (on its own it does not make an easy target - even though, of course, I do not leave it out in plain view).

After I have entered both strings on one line, I press Enter and then Ctrl-D - this signals end of input to sha1sum, and it spits out a 40-character hex number at me.

Great, now it is time to launch the sha1sum again - this time I copy-paste the hex number from the previous run, and then the "random text string" from the file.
After that I press Enter and then Ctrl-D - getting another long hex string as a result.

This is the password that I now copy-paste into the website I need to log in to.
Here's how it looks:


ayourtch@ayourtch-lnx:~$ grep gmail p-material
gmail 1243pyupqwe,jl23hl23khjkh23khpw'@
ayourtch@ayourtch-lnx:~$ sha1sum
gmail this is my secret phrase
f3c446b01b24022c136bde50d32a1f9d4e9cd7fb -
ayourtch@ayourtch-lnx:~$ sha1sum
f3c446b01b24022c136bde50d32a1f9d4e9cd7fb 1243pyupqwe,jl23hl23khjkh23khpw'@
8e485f361bfe1834d281209121eb5d4a8b52bcb9 -
ayourtch@ayourtch-lnx:~$


The final hex string is the password to copy-paste into the login screen. (NB: of course the data in the example above is not real; it's a mockup just to illustrate the principle ;-)
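
If the two manual sha1sum runs ever feel too tedious, the same derivation can be scripted. Here is a minimal Lua sketch, assuming some sha1(str) -> hex-string helper (the helper name is hypothetical - any Lua SHA-1 library would do); note that the interactive sha1sum session above also hashes the trailing newline typed before Ctrl-D, so this sketch will not reproduce those exact strings:

-- sha1() is an assumed helper returning the hex digest of its argument
local function derive_password(site, master, site_random)
  -- step 1: hash "<site> <master password>"
  local intermediate = sha1(site .. " " .. master)
  -- step 2: hash "<intermediate hex> <per-site random string>"
  return sha1(intermediate .. " " .. site_random)
end

-- made-up example values, mirroring the mockup above in spirit only
print(derive_password("gmail", "this is my secret phrase",
                      "1243pyupqwe,jl23hl23khjkh23khpw'@"))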

While this scheme is certainly not pixel-perfect cryptography, it has certain advantages:

  • Even if the service is dumb enough to store the passwords in plain text - the attacker does not gain much when they hack it. They will just get the password to that particular site - but they will not learn much about how you derived this essentially random string.

  • Even if the attacker gets the file where the strings are written down - they do not gain much, as they would need to know the "master password" in order to create correct passwords for any of the web sites.

  • It is simple, light, and vendor-independent (browser, etc.) - sha1sum is available practically everywhere - so I only need to carry around a (tiny) file with site names/strings. The smaller the data to secure, the easier it is to secure.


Of course, this also has a couple of disadvantages:

  • entering the passwords is a bit annoying. You need to do some manual operation.

  • entering the passwords is a bit annoying. This is the iPad version.



However, I value my peaceful sleep much more than the annoyance of the process.

Do you value your sleep ? If you do - stop reusing the passwords before you lose it.

Thursday, June 16, 2011

Coursemare incorporated promotional video

I created this one a while ago, but did not move it from xtranormal to YouTube. The happy spectators of my little "NAT vs. IPv6" fun were asking if I did anything else - so here it is. This time our target is the applications that go out of their way to alienate their users.

Friday, June 10, 2011

Indexing wikipedia data dump with ZF

I have decided to try indexing the wikipedia HTML data dump with my zettair fork.

The specs of the machine: 8GB RAM, 8-core Xeon X3440 at 2.53GHz.

The multicore nature of the machine did not really matter, since the code is single-threaded.

Some info on the dataset:

14257665 documents in total (just above 14 million, that is)

233GiB of data as per "du -kh" output.

The index size is 17GB.

Indexing this set took about 1.5 days:

real 1935m54.592s
user 138m42.620s
sys 23m49.060s

The search times obviously depend on the word; below are some sample queries. "Cold" is the first time you search, "warm" is a subsequent time.

"wikipedia"

cold: 20 results of 11194872 shown (took 3.886843 seconds)
warm: 20 results of 11194872 shown (took 1.542165 seconds)

"integral":

cold: 20 results of 26981 shown (took 1.168747 seconds)
warm: 20 results of 26981 shown (took 0.065107 seconds)

"schematic":

cold: 20 results of 5056 shown (took 1.040630 seconds)
warm: 20 results of 5056 shown (took 0.022333 seconds)

"surfing":

cold: 20 results of 15340 shown (took 0.840945 seconds)
warm: 20 results of 15340 shown (took 0.039736 seconds)

"asd":
cold: 20 results of 1198 shown (took 0.868518 seconds)
warm: 20 results of 1198 shown (took 0.055091 seconds)

An interesting task would probably be to see how well the indexing scales (i.e. reindex partial datasets and graph the results). But I figured I'd write up what I have for now.

Wednesday, May 25, 2011

The ads that make your page beautiful

Maybe it's my humble page with links about the situation in Japan, or my periodic scouting of Japanese-language sites, but I've got myself a Persistent Sakura House syndrome. What is it ? It's an English-language advertisement for nice rentals in Tokyo that persistently follows me all around the web.

I am seriously starting to check it out.

And I am getting closer to understanding how half of the world buys the junk they buy.

Anyway, this is not about Tokyo per se, nor is it about the persistent advertisements that program the minds of unsuspecting crowds.

It's about something else.

When I told this story to my friends at Google, we laughed, and the consensus was that with my chaotic behavior I made the algorithm believe I am very interested in going to Tokyo. If I clear the cookies, the phenomenon will probably disappear (it does not show up in the Firefox that I mostly do not use and which has a different set of cookies).

But I will never do it myself.

Because the ads I see make the sites look better.



Don't you agree that it is a pretty tasteful banner ?

A banner that does not sleazily try to sell something to me. A banner that (despite my general distaste for pink) is nice and calm to stare at. Maybe because of that, or maybe because of its calm layout, it makes the sites look better, in my opinion.

So, here's a million dollar idea.

Besides monetizing the additional value that the contextual advertising brings to the merchant, create and monetize the value that the website owners would get out of showing only nicer-looking advertisements on their websites.

How ? I don't know. That's why ideas like this are a dime a dozen.

Is this even something valid ? No clue. A serious amount of A/B testing should show.

I just noticed it because this ad is persistently frequent - yet it does not get annoying. More than that, I've even gone to the site a couple of times and read the reviews - the folks seem to be pretty positive.

I am thinking of visiting Tokyo one day.

I can't say anything about the rest, but they make great-looking ads for sure.

Monday, May 23, 2011

Strange.

I am at St. Pancras, in the Eurostar lounge. Got a special friendly patdown treatment - maybe because I forgot some coins in the back pocket of my jeans. Never mind - I got used to "random checks" in my previous life.

But wait. I'm traveling. Without my habitual travel drink. Rewind the time. I need to go to Starbucks. I want my "Caramel Macchiato".

Strange. They changed the look of the menu, or maybe they had it differently to begin with. No pseudo-Italian size denominations. Seems like now you can build your own drinks from components - or it is just an avant-garde design. I can see "macchiato" separately, and "caramel" separately.

Too complicated. I just asked for the same thing as usual. And, lo and behold, the person on the other side replied with the usual correction - "Venti ?", to which I replied "large" - as usual, in a futile attempt to abandon adjectives that do not make much sense. So they still have it, and it will keep me alive till I get home. The world's still going round.

There's a newspaper on the table - the front page tells how some poor orphan kid hit the jackpot.

He is nineteen and he got a million pounds from the charity that is run by the newspaper. Good for him. If he is clever, he can make good use of this money. I hope the other hundreds or thousands of orphans, both in London and elsewhere in the universe, will wish him well. Or maybe they would not. The readers who gave money to him certainly wished him well enough to give it - but what about the other orphans ? No, they don't exist. They're just part of what we call 'the society'; they are not visible against the background of the crowd.

Try running in the middle of the crowd that is standing still. Or better, try stopping in the middle of the crowd that rushes ahead. You will be run over. Not because of anyone's evilness, just that for the others you are only a part of the crowd.

They all run towards their dreams. Those who have no dreams to run after, rush even faster to not think about that. They skip days, thinking this will get them faster to the destination. They exhaust themselves to blood while running.

Even if they do not know what the destination is, they will still have a shade of suspicion. An exhausted mind is a harbor for the suspicions, and there are enemies hiding in everything that is suspicious.

"Something strange ? Report to the police!" - says the plaque in big font.

By the way, I have something strange to report. There is free WiFi but I could not find a garbage bin to put the empty paper cup into.

Maybe the garbage bins are dangerous (I heard they can put cigarettes into them - and the smoke looks very scary! Be very afraid!).

Maybe they were just forgotten. Tiny details are easy to forget when you're making grand things.

Maybe I am just not seeing them - much like I did not see the cars when crossing the streets - looking wrong when I should be looking right.

I want to ask the policemen, but this can easily be considered strange behavior - if there are no bins, then it means no one needs them.

I quietly leave the cup on the table and rush with the rest of the crowd to the gate. They have finally announced boarding, and we are all in a hurry to grab our places - the places that no one else can take. It would be strange if someone did - how would they not be afraid to be reported to the police for the strange behavior ?

I look back and see my cup still there, already welcoming newly arrived burger wraps.

Hopefully it's not too strange of an idea to leave an empty cup on the table when I can't find a garbage bin. I say "thank you" in advance to the person who will have to get this cup to the secret location.

Soon I will return and try again. Maybe one day I will crack this strange puzzle.

Sunday, April 24, 2011

A new name for you

The world is steadily moving away from exchanging phone numbers towards exchanging online handles - for example, friending people on Facebook.

This is quite annoying for me, because, even if my first name is pretty easy to spell,
the last name is as far from something easy like "Smith" as it can be.

So I inevitably end up writing it down. Boring. Lately I've used a different trick: telling people to search for the stuff I am involved in - e.g. "Happy Eyeballs" makes for a pretty good search to fish my name and contact info from.

Now, I thought, why not do this in a bit more dedicated fashion - just grab a domain that is easy for normal people to remember, and hook it up to a key-value store.

Here comes the secondna.me.

Go to secondna.me to search for someone's "second name", or create your own - the "second name" can really be anything that is easy to remember and does not need to be spelled out letter by letter.

There is no login, no password, everything is write-once-read-many. So, if you have made a mistake, too bad.

The value is escaped to translate the < into &lt; - so, no fancy HTML either.

Just good old plain text.

If you think it looks fugly, I agree. But I think the utility trumps the looks.

Good enough for me, for now at least.

And tell me what you think.

(yes, it's like about.me, after an intensive diet course).

Tuesday, April 19, 2011

A toast to differences

Once upon a time, there lived a gardener. And he grew oranges and lemons. He could not eat all the lemons himself, so he sold them in the neighboring village - there they grew bananas, but they were too lazy to grow enough of them to feed the whole village. So, every day they bought lemons from him, and picked the poor souls who were to be fed with the lemons.

Needless to say, the poor souls hated the nasty fruit - it gave them pains in their bellies. And soon they started to hate the gardener himself - yet they continued to buy the lemons from him.

This was a very saddening fact for the gardener - after all, he was honestly growing his best fruit, and the oranges were not bitter at all - sweet and refreshing. So, one day he decided to make a pleasant surprise for his faithful customers, one that should make them happier - he brought them oranges instead of lemons, but did not tell them.

The customers were enraged - "Not only do you exploit us by selling us these bitter fruits, but now you also bring them in a rotten color and a swollen shape!". They beat the poor gardener and trashed his cargo.

So, let's have a drink to gentle attention to differences - if you're a gardener, try to sell more than just lemons; and if you're at the market buying - remember that not everything that is slightly round is bitter.

Monday, April 11, 2011

A story to remember when you are afraid to fail...

Found a great anecdote here, translating:

The stage: Autumn. A lake. Wild geese are warming up to start flying south. At the front, a well-built goose is stretching his muscles, thoroughly enjoying himself and looking forward to the flight, when a little gray duck approaches him and starts a dialog:

- So-o-o-o... You're fly-i-i-ng south ?
- Yeah. Gotta fly south. It's warm there.
- So-o-o-o... A-a-and... I will sta-a-ay here... To Fre-e-e-eze...
- Come on, let's fly with us. Southbound.
- So-o-o-o... You've got big wings... I do not... I will fall and die...
- No problem - we'll catch you and keep you up. Air streams and all that.
- So-o-o-o... But I will get hungry on the way... And die-e-e-e-e...
- We're gonna pick up the bugs. Big, fat and tasty bugs.
- So-o-o-o... The bugs are bi-i-i-ig... Your beaks are bi-i-i-ig... Mine is small... I will choke and die...
- No problem, we'll chew them up for you. You're gonna make it.
- So-o-o-o...

(the goose straightens up and looks at the duck):

- Okay. Fuck off.
- So-o-o-o... As usu-u-u-ual... Exactly as I tho-o-o-ought... (while walking away).

Tuesday, April 5, 2011

A functional (almost) prelude on the topic of mergesort

I know about table.sort, yes. But I thought this was fun nonetheless.


function apply(x, f)
  for i, v in ipairs(x) do
    f(v, i)
  end
end

function filter(x, f)
  local out = {}
  local fa = function(v, i)
    if f(v, i) then table.insert(out, v) end
  end
  apply(x, fa)
  return out
end

function merge(...)
  local args = { ... }
  local out = {}
  local fa = function(v)
    table.insert(out, v)
  end
  apply(args, function(v) apply(v, fa) end)
  return out
end

function sort(x)
  if #x < 2 then
    return x
  else
    local ix = math.floor(#x/2)
    local smallx = sort(filter(x, function(v) return v < x[ix] end))
    local samex = filter(x, function(v) return v == x[ix] end)
    local bigx = sort(filter(x, function(v) return v > x[ix] end))
    return merge(smallx, samex, bigx)
  end
end


function show(x)
  apply(x, print)
end

a = { 1,3,2,4,2,1,1,23,12,22 }
show(sort(a))


NB: of course this is far from a purely functional style, since apply() assumes that f() will have side effects.

Tuesday, March 15, 2011

Links about the situation in Japan

---
This page is to keep the links about the events in Japan. The links to the "original" websites go through CoralCDN - hence the ".nyud.net" appended to the domain names. Read more about CoralCDN at http://www.coralcdn.org/. Chances are that not all of the websites can bear the load they will get - so use CoralCDN so the sites do not die. I'll update this post as I find more links.
---

Countermeasures for 2011 Tohoku - Pacific Ocean Earthquake





Some baseline on the radiation:

Smoking: "Based on careful assessments of the concentrations of 210Po in the lung tissues, it was estimated that the "hot spots" received an annual dose of about 160 millisievert (about 16,000 millirem), two of the more common units for expressing doses from ionizing radiation." [Health Physics Society]. Divided by 8760 (24*365), this gives 18.26 microsievert/hour delta atop the background radiation levels. Another source with radiation in cigarettes. So if you are smoking you probably should take a note of this.

Here's another image, a scan from a book; I found it in a blog entry in Japanese about radiation impacts. 1 rad = 10 mSv = 10 mGy, the author writes:

radiation-normal-tissues

And one more reference about radiation effects. And a diagram for comparing the various sources.

So, after setting up this baseline, you can go and look at the data.

Facts:

Geiger counter Chiba
Geiger counter in Tokyo
Video of a geiger counter in Tokyo
One more video of a geiger counter in Tokyo.
Google doc with info of the three counters above and the radiation from http://www.bousai.ne.jp/eng/index.html.
Japan radiation open data (from the maintainer of the above google doc)
Graphical dashboard based on these values
A geiger counter on a transatlantic flight
crowdsourcing data on radiation


Saitama prefecture readings

List of 5.0+ earthquakes for the past 7 days


Articles:

Graphic showing the radiation levels at the power plants vs. the various reference points
http://www.simon-cozens.org/content/radiation-tokyo-how-read-geiger-counter - has good explanation on how to read the counters and what the numbers relate to. This is where I got the first geiger counter link above.
Articles by MIT Department of Nuclear Science and Engineering about the japanese nuclear reactors.
Some Perspective On The Japan Earthquake

TV/Video:

MIT technical briefing recording
http://www.ustream.tv/channel/yokosonews
http://www.ustream.tv/channel/nhk-world-tv
http://www.ustream.tv/channel/nhk-gtv
http://www.ustream.tv/channel/tbstv
http://www.youtube.com/tbsnewsi
http://www.earthcam.com/japan/tokyo/
Fukushima Daiichi Nuclear Power Station camera
One other Fukushima webcam

Twitter:
A person who was translating the TBS
Reuters. Level-headed reporting without the hysteria.
periodic reports on radiation levels

TEPCO press releases:

http://www.tepco.co.jp/en/press/corp-com/release/index-e.html

NASA:

the Japanese earthquake should have caused Earth to rotate a bit faster, shortening the length of the day by about 1.8 microseconds

Japan Atomic Industrial Forum

http://www.jaif.or.jp/english/

INES levels:

http://www.iaea.org/Publications/Factsheets/English/ines.pdf

IAEA briefings:

Briefing videos
IAEA updates page

Networking-related:

http://www.jpnap.net/english/jpnap-tokyo-i/traffic.html
http://gigaom.com/broadband/in-japan-many-under-sea-cables-are-damaged/


Tuesday, March 8, 2011

Autoextraction of Abstracts from RFCs and drafts

An idée fixe (uh, I mean *one more*) of mine is to somehow organize a collection of IETF docs - RFCs/drafts that touch on IPv6 in some way (thanks to Fred Baker for this nice puzzle).

So, what I have is 140 megabytes of data, sitting in just under 2000 files that represent the RFCs and various drafts.

The first step in doing anything at all with this pile is to be able to chop it into chunks - put the congruent parts side by side, move the ASCII pictures aside, and similar mundane tasks.

The first part of that is to extract the section that is present in almost every IETF doc - the "Abstract". In general, section titles start at column 0, while the paragraph text typically has an indent of 2+ columns. However, this is only a general rule - there are zillions of exceptions over the years: variations of spelling, wrong indents, MS-DOS carriage returns, all sorts of nasty mess. Anyway, the first try at this is done.
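
To give a flavour of the indent heuristic just described, here is a minimal Lua sketch (an illustration only - the real extractor has to deal with far more of the exceptions mentioned above; the file name is just an example):

local function extract_abstract(filename)
  local lines, in_abstract = {}, false
  for line in io.lines(filename) do
    line = line:gsub("\r$", "")              -- tolerate MS-DOS line endings
    if line:match("^Abstract%s*$") then      -- section title at column 0
      in_abstract = true
    elseif in_abstract and line:match("^%S") then
      break                                  -- the next column-0 title ends it
    elseif in_abstract then
      table.insert(lines, line)
    end
  end
  return table.concat(lines, "\n")
end

print(extract_abstract("rfc2460.txt"))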

I extract the document titles from the page-break headers, and it shows - in some of them the month and the year end up glued to the right-hand side. This is something that should get fixed eventually, if I figure out a suitable heuristic.

Here's a result in case you find it useful at all:

Abstracts from some RFCs and drafts.

Sunday, March 6, 2011

Interesting bits from HTTP/1.1 RFC

(Originally under the title "Is your server HTTP/1.1 compliant ?" - but I realised that it's not really a relevant one.)

Today, after watching this excellent wireshark kung-fu video with Hansang Bae, I decided to comb through the HTTP/1.1 spec and see what other interesting bits I could fish out of it - ones that are less frequently played with or are otherwise noteworthy.
Here they are, for your entertainment.

To allow for transition to absoluteURIs in all requests in future
versions of HTTP, all HTTP/1.1 servers MUST accept the absoluteURI
form in requests, even though HTTP/1.1 clients will only generate
them in requests to proxies.

In layman's terms: your compliant server must understand not only the classic "GET / HTTP/1.1", but also "GET http://www.yourhost.com/ HTTP/1.1" - in case the clients upgrade. All but one of the servers that I quickly tested seem to have never seen this part of the spec. Or optimized it out.

An origin server that does not allow resources to differ by the
requested host MAY ignore the Host header field value when
determining the resource identified by an HTTP/1.1 request. (But see
section 19.6.1.1 for other requirements on Host support in HTTP/1.1.)

Immediately after that follows a big blurb about how the server is supposed to derive the host name from the absolute URI - i.e. the form that almost no one seems to support. So either the implementations deliberately ignore the spec, or they are not reading it attentively.
The in-progress work from the HTTPBis working group in the IETF also specifies the absolute URIs. So, some housecleaning will be in order.

A very interesting bit about the pipelining:

Clients which assume persistent connections and pipeline immediately
after connection establishment SHOULD be prepared to retry their
connection if the first pipelined attempt fails. If a client does
such a retry, it MUST NOT pipeline before it knows the connection is
persistent. Clients MUST also be prepared to resend their requests if
the server closes the connection before sending all of the
corresponding responses.


This general 'must be prepared to act robustly' statement makes me think of all sorts of interesting failure modes (and yes, I've seen some of those in real life) - however, this 'retry' also provides a potential L7 hook for the Happy Eyeballs logic. In some form, maybe, later. Having an L7 hook would be a good thing - the application may have a better idea about failures than layers 3/4 do. Anyway, I digress.

Another interesting piece:


This means that clients, servers, and proxies MUST be able to recover
from asynchronous close events. Client software SHOULD reopen the
transport connection and retransmit the aborted sequence of requests
without user interaction so long as the request sequence is
idempotent (see section 9.1.2).


This is also a HUGE RED FLAG for application developers: never EVER use "GET" for anything that is not idempotent. Theoretically the server should not see the request twice, but under some conditions it might (say, with a proxy in between). All of this is still true for the specs being prepared in the HTTPBis WG now.

Here's the well-known humorous piece:


Clients that use persistent connections SHOULD limit the number of
simultaneous connections that they maintain to a given server. A
single-user client SHOULD NOT maintain more than 2 connections with
any server or proxy.


Yeah, right. Web 2.0 apps do exactly that. Not.


The Max-Forwards request-header field MAY be used to target a
specific proxy in the request chain. When a proxy receives an OPTIONS
request on an absoluteURI for which request forwarding is permitted,
the proxy MUST check for a Max-Forwards field. If the Max-Forwards
field-value is zero ("0"), the proxy MUST NOT forward the message;
instead, the proxy SHOULD respond with its own communication options.


Is there already an HTTP-level "traceroute" to poke at the caches along the way ?
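
For the curious, here is a rough Lua sketch of what such a probe might look like (assuming the LuaSocket library; the target URL is hypothetical, and the output is only meaningful when the request actually traverses proxies/caches that honour Max-Forwards):

local http = require("socket.http")

for hops = 0, 5 do
  local ok, code, headers, status = http.request{
    url = "http://example.com/",                  -- hypothetical target
    method = "OPTIONS",
    headers = { ["Max-Forwards"] = tostring(hops) },
  }
  if ok then
    -- the Via/Server response headers hint at who answered this hop
    print(hops, status, headers and (headers["via"] or headers["server"]))
  else
    print(hops, "request failed: " .. tostring(code))
  end
end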

Kind of obvious, but interesting clarification nonetheless:


The fundamental difference between the POST and PUT requests is
reflected in the different meaning of the Request-URI. The URI in a
POST request identifies the resource that will handle the enclosed
entity. That resource might be a data-accepting process, a gateway to
some other protocol, or a separate entity that accepts annotations.
In contrast, the URI in a PUT request identifies the entity enclosed
with the request -- the user agent knows what URI is intended and the
server MUST NOT attempt to apply the request to some other resource.


The "correct" code for post-POST redirections should be 303, not the 302, but 302 was for "older clients":


Note: Many pre-HTTP/1.1 user agents do not understand the 303
status. When interoperability with such clients is a concern, the
302 status code may be used instead, since most user agents react
to a 302 response as described here for 303.


A fun fact from testing this: both Firefox and Chromium send exactly 21 requests before giving up and declaring "it's a redirect loop" - even if the target URIs in the "Location:" headers of the replies are all different. Buyer, beware. It's more than the "old 5" that the spec warns about - but there are no ultra-clever heuristics to detect the redirect loop, either.
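
A quick-and-dirty way to reproduce this yourself: a toy server that answers every request with a 302 to a fresh URI. A Lua sketch assuming the LuaSocket library (point a browser at http://localhost:8080/ and count how many redirects it follows):

local socket = require("socket")
local server = assert(socket.bind("*", 8080))
local n = 0
while true do
  local client = server:accept()
  client:receive("*l")           -- read the request line; ignore the rest
  n = n + 1
  client:send("HTTP/1.1 302 Found\r\nLocation: http://localhost:8080/hop" ..
              n .. "\r\nContent-Length: 0\r\nConnection: close\r\n\r\n")
  client:close()
  print("served redirect #" .. n)
end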

This stops at section 12; maybe I'll go through the rest tomorrow and see if I can gather some other interesting pieces.

Saturday, March 5, 2011

Hardware hacking: Peltier TEG experiment - parts list.

During a discussion over a beer at Betagroup yesterday, the topic of human power for devices came up. Today I stumbled upon this one and could not help but grab a couple of the "energy harvesters". They supposedly can generate up to 2V at a temperature delta of 75C. A 75C difference between the 36C of my body (hot side) and a "cold side" of -39C would make it a pretty useful device for a winter in Siberia.

So, let's try a different approach with an LTC3108EGN - a step-up converter.
The schematic suggests using the CQ200 from Honeywell, but with such a form factor I am not sure I would be much interested in experimenting with it :-)

Let's also get the coil that is needed. The capacitors I already have in my pile, so that should be no problem.

You wonder what this will be used for ? Well, of course my beloved ATtiny45, which is supposed to have pretty low power consumption - the "V" models boast 300 microamps at 1.8V when running at 1MHz. This should be perfect, if I have estimated correctly how much electricity the TEG will generate at a low delta.

All right, now I just wait till all the parts get here, then a bit of soldering, and we'll see if this idea works.

Friday, February 25, 2011

ActionController::InvalidAuthenticityToken

There are two types of answers on the web: wrong and correct.

Copying the correct answer here:

You need to add <%= token_tag %> to your form to make this work.

Do not comment out the code that throws out the error - it exists for a very good reason: Cross-Site Request Forgery protection.

Which choice ?

One of the books I've been reading talked about choices, and I think I came up with a thought experiment that has some interesting complexity in it.

Imagine you meet a wealthy man who wants to donate some of his wealth to help the poor children of the Humbrian Republic. He has $1000000 he wants to donate. There are 10000 children that this money could feed for a year. Without this money, their chances of dying of hunger are tripled. The same odds await them after that year - even if the money is donated.

However, this man is of a rather evil kind, and he gives you only four choices:

1. You and he part ways, without any decision made - and he does not donate anything whatsoever.

2. You get to keep $100000, and the remaining money goes to the poor children of the Humbrian Republic - so only 9/10 of the children get the food. As a condition for spending this money, you have to mention - each time you spend something from it - that the money you're spending might have been better donated to feed 1000 children.

3. You get to keep $500000 - so only half of the children get the food for a year. There are no strings attached to this split.

4. You get to keep $900000 - so 1/10 of the money gets donated and you keep 9/10. As a condition, you have to talk on the phone to 10 children picked randomly from the entire set - whether or not they got the money. All of the children will know that you kept 9/10 of the money.

Which of the four would be your choice ? No need to answer - I guess this would be a pretty confidential matter. (And please do not blame me - as I said, it is a deliberately and purely theoretical construct ;-)

EDIT: when I say you get to "keep" the money, it means that by contractual agreement you cannot give it away - you have to spend it on your own needs.

Saturday, February 19, 2011

More fun with luajit and ffi: >300Kpps of UDP traffic

Today I added the UDP interface (rudimentary so far) to my experimental event loop.

In the github repo there is code that simply echoes the UDP packet back.

For fun, I decided to wrap the sendto call in a loop that repeats it 100000 times.

Here is the output:


23:26:52.988762 IP6 ::1.12345 > ::1.59283: UDP, length 4
23:26:52.988765 IP6 ::1.12345 > ::1.59283: UDP, length 4
23:26:52.988767 IP6 ::1.12345 > ::1.59283: UDP, length 4
23:26:52.988770 IP6 ::1.12345 > ::1.59283: UDP, length 4

Based on the inter-packet timestamps, this means that LuaJIT 2 + FFI can generate between 300Kpps and 500Kpps of UDP traffic. Yay.

Friday, February 18, 2011

Fun with LuaJIT and FFI library - httpd microbenchmarking

When I saw the new FFI library in the LuaJIT project, I immediately thought "yummy". I figured I'd give it a shot in the evening, but... curiosity killed the cat, so...

The first thing that got annoying quite quickly was the need to prefix every C function call with ffi.C. Can we do better ? Yes, we can. We add this:

-- try to use FFI when no lua symbol found...
setmetatable(_G, { __index = ffi.C } )


And life immediately becomes great - you get the power of the declared C functions plus the flexibility of the Lua language, so you can write stuff like "local s = socket(AF_INET6, SOCK_STREAM, 0)". Très cool.
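
As a small, self-contained illustration of the trick (the cdef list is trimmed down, and the numeric constants are assumptions of this sketch - on Linux AF_INET6 is 10 and SOCK_STREAM is 1):

local ffi = require("ffi")

ffi.cdef[[
int socket(int domain, int type, int protocol);
int close(int fd);
]]

-- fall back to ffi.C for any global that Lua itself does not know about
setmetatable(_G, { __index = ffi.C })

-- the usual POSIX constants are not exported by FFI, so define what we need
AF_INET6, SOCK_STREAM = 10, 1

local s = socket(AF_INET6, SOCK_STREAM, 0)   -- resolves to ffi.C.socket(...)
print("got socket fd:", s)
close(s)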

Now, let's see how much all of this is worth performance-wise.

Let's write a fun high-level wrapper that, at its "application" core, looks like this:

local HTTP_REPLY = [[HTTP/1.0 200 OK
Content-Type: text/plain

This is test
]]

local MAX_FD = 2560

local ss = socket_set(MAX_FD)

local my_accept_cb = function(fds, i)
  local cb = {}
  cb.read = function(fds, i, data, len)
    fds.send(i, HTTP_REPLY, #HTTP_REPLY)
    fds.close(i)
  end
  cb.close = function(fds, i)
    -- print("Closed socket")
  end
  return cb
end

while not ss.add_listener(12345, my_accept_cb) do
  sleep(1)
end
print("Added listener, please run the test")
while true do
  local n = ss.poll(1000)
end


What do we do here ? We create a high-level abstraction, which I call a "socket set", that encapsulates all the boring event-loop code that usually exists in such programs; then we add a listener socket on port 12345 to this set, with a callback that is called upon accept and can return either a table of callbacks for the newly accepted socket, or nil - in which case the connection is closed immediately.

The accepted socket callbacks are such that they implement a very naive "HTTP server" - just for microbenchmarking.

Now, let's give it a whirl with ab.

The results are quite entertaining; here is how they look:

# while (true); do ab -n 1000 -c 1000 http://localhost:12345/ 2>&1 | grep "Requests per"; done
Requests per second: 8433.91 [#/sec] (mean)
Requests per second: 12374.55 [#/sec] (mean)
Requests per second: 12720.06 [#/sec] (mean)
Requests per second: 12688.58 [#/sec] (mean)
Requests per second: 13144.58 [#/sec] (mean)
Requests per second: 10742.29 [#/sec] (mean)
Requests per second: 8651.27 [#/sec] (mean)
Requests per second: 12748.76 [#/sec] (mean)
Requests per second: 8145.58 [#/sec] (mean)
Requests per second: 12946.32 [#/sec] (mean)
Requests per second: 8648.65 [#/sec] (mean)
Requests per second: 13047.85 [#/sec] (mean)
Requests per second: 11550.14 [#/sec] (mean)
Requests per second: 12904.22 [#/sec] (mean)
Requests per second: 12968.32 [#/sec] (mean)
Requests per second: 13219.47 [#/sec] (mean)
Requests per second: 8244.09 [#/sec] (mean)
Requests per second: 12056.47 [#/sec] (mean)
Requests per second: 12834.83 [#/sec] (mean)
Requests per second: 13288.86 [#/sec] (mean)
Requests per second: 11455.02 [#/sec] (mean)
Requests per second: 11130.03 [#/sec] (mean)
Requests per second: 8034.32 [#/sec] (mean)
Requests per second: 12566.76 [#/sec] (mean)


This is not too bad at all.

Here is a similar test with lighttpd running on the same machine, serving the default static page:


Requests per second: 12080.50 [#/sec] (mean)
Requests per second: 9397.88 [#/sec] (mean)
Requests per second: 9948.47 [#/sec] (mean)
Requests per second: 12906.39 [#/sec] (mean)
Requests per second: 9284.53 [#/sec] (mean)
Requests per second: 4281.34 [#/sec] (mean)
Requests per second: 9143.44 [#/sec] (mean)
Requests per second: 12422.21 [#/sec] (mean)
Requests per second: 9170.19 [#/sec] (mean)
Requests per second: 12603.03 [#/sec] (mean)
Requests per second: 9413.54 [#/sec] (mean)
Requests per second: 12981.62 [#/sec] (mean)
Requests per second: 8615.56 [#/sec] (mean)
Requests per second: 9849.98 [#/sec] (mean)
Requests per second: 9869.43 [#/sec] (mean)
Requests per second: 9818.84 [#/sec] (mean)
Requests per second: 3384.56 [#/sec] (mean)
Requests per second: 9127.59 [#/sec] (mean)
Requests per second: 1528.56 [#/sec] (mean)
Requests per second: 9157.01 [#/sec] (mean)


Not bad at all for high-level code, what do you think?

If you want to toy with it yourself - it's on github.

Thursday, February 17, 2011

Fun links: 17 Feb 2011


  • Android virtualized - really cool stuff, with a whole lot of potential. Imagine being able to save your mobile's state somewhere other than on the mobile itself. Yummy business opportunities, I think.
  • Making presentations in TeX - I made the first step towards using TeX - installed it. The results so far are not terribly cool, but only my own hands are to blame.
  • A fractional horsepower news network - I think this is the beginning of the pendulum swinging back from the "mainframe", centralized model to a distributed one. Of course, while history repeats itself, it does so in a whimsical way, so how exactly it will look is an open question. Maybe in the form of a Freedom Box. I've got some of that.

Wednesday, February 16, 2011

Me talking...

In the past couple of weeks I got "on stage" a few times, which was quite a lot of fun. Even more fun was that these moments were shot on video as well. So, here we go, a link collection: here's me talking at Cisco Live 2011 in London about Advanced Firewalls - and the weekend after, my FOSDEM talk about Lighting up IPv6 in Mongrel2.

Elevators: the curiosity

This blog post is intended to serve as a reminder for myself as well as a teaser for everyone else.

First, a bit of history of the trigger. The old elevators in the buildings in Belgium are pretty peculiar - they have no doors. Yes, if you are surprised - I was too when I first arrived; back in Russia (and in Soviet Russia) the elevators had doors, either sliding or hinged ones. So this optimization was weird and a bit frightening.

However, over time I got used to it, and even found it kind of cool to watch the wall as the elevator pulled me up to the 14th floor where I live in the evenings, and down to the ground in the mornings.

All was good - but then the regulators came. Apparently these door-less elevators were considered unsafe by someone; I can imagine someone got squeezed in some unpleasant way, hence the reaction. Net result: a seemingly IR-laser-based emergency stop mechanism (if you reach your hand towards the wall as the elevator is moving and cross the surface, it stops) - which is pretty cool; and a decrease in the speed of the elevator.

It's the latter which is utterly uncool and is the trigger for this post - at least perceptually, it has noticeably increased my waiting time on enough mornings to make me start wondering "why".

And I started to ponder - what is the best mathematical model for expressing my annoyance with this situation in numbers? And it seems it is a curious one, much more involved than I'd expect at first glance.

Some reading links on the topic:

So, in short it seems to be a pretty fun modeling topic - even if we do not get into the mechanical problems and keep ourselves busy only with traffic handling problems.

The analytical question, which after this pre-investigation I am afraid is not so trivial, is the function T(v,p): its value is an upper bound on my waiting time when I need to go down from my 14th floor in the morning, "v" is the elevator velocity and "p" is the probability level. E.g. T(1,0.95) == 40 would mean that with 95% probability I would have to wait less than 40 seconds, assuming an elevator speed of 1 m/s.
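Just to put a floor under the problem, here is a deliberately crude model of my own (an assumption for illustration, not something taken from the reading): a single elevator idling at a uniformly random distance d from my floor over the building height H, no other passengers, and no door or acceleration delays. Then the wait is just the travel time:

$$ W = \frac{d}{v}, \qquad d \sim \mathrm{Uniform}(0, H) \;\Rightarrow\; T(v,p) = \frac{p\,H}{v} $$

With H of roughly 40 m (14 floors) and v = 1 m/s this gives T(1, 0.95) of about 38 seconds, and halving the speed doubles every quantile. The real question is how much worse the queueing caused by the other tenants makes it compared to this linear picture.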

Saturday, February 12, 2011

Google app engine + facebook apps links

Started looking at the Google App Engine + Facebook combo. Some links that look useful to keep around for myself:

GAE

FB+GAE

Misc


Update: Looks like all of these are obsolete; the way to go is the Python Graph API SDK...

Fun links: 12 Feb 2011

Going to start the "links" series, even if for myself only. Too much to post on Facebook - I don't want to spam there. And it's easier to find here if it is indexed...

Monday, January 24, 2011

First practical joke with gncci - clamping the MSS

Today I thought that a great hack might be to try out per-host MSS clamping in userland.

Sounds like a fun idea, but ideas are not worth much without execution - so this diff adds the hook for setsockopt() and, as a side effect, exposes o.setsockopt() to the Lua side. (The latter was actually what I wanted, but it would have been silly not to add the whole hook.)
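The actual script lives in the repo; as a purely hypothetical sketch of the idea (the on_connect hook name and the o.setsockopt signature below are my assumptions for illustration, not gncci's real API), per-host clamping could look roughly like this:

-- Linux values from <netinet/in.h> and <netinet/tcp.h>
local IPPROTO_TCP = 6
local TCP_MAXSEG  = 2

-- hosts behind a PMTUD blackhole, with the MSS we want to force
local clamp = { ["192.0.2.10"] = 1200 }

-- hypothetical hook called from the C shim after connect()
function on_connect(o, fd, host, port)
  local mss = clamp[host]
  if mss then
    -- the equivalent of setsockopt(fd, IPPROTO_TCP, TCP_MAXSEG, &mss, sizeof(int))
    o.setsockopt(fd, IPPROTO_TCP, TCP_MAXSEG, mss)
  end
end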

The result: it does wonders with PMTUD-blackholed websites - the world becomes wonderful again. All of this without even needing root to tweak the MTU on the physical interface.

Look, ma, no hands!

Sunday, January 23, 2011

gncci - general network connections contortion interface

Ok, the name is a bit wacky, because I wanted a bit of wordplay - the idea came up during the happy eyeballs discussion on the v6ops mailing list, and I coded it up yesterday in one marathon run.

But I think the result is fun. You get an easy-to-tweak middle layer between the application making the socket calls and the socket calls proper. And it's not the C that you have to sweat in - it's all Lua baby.

So, by loading this with LD_PRELOAD and a little bit of scripting you can do a lot of interesting things - check what connections the application is making, "sniff" the content that the app is sending to the socket and receiving from the network, even deny some of the connection attempts, or mangle the DNS packets so the application connects to the hosts that you have defined.

The possibilities are endless.

Bugs: it seems that the constructor code made it shaky. In retrospect, it should probably be protected with a spinlock (similar to the existing code, which was a spinlock at first - but then I discovered that multithreaded apps like Firefox might get into a deadlock, so I had to switch to thread-local variables).

Anyway, it works at least for some values of "works". Have fun.

Saturday, January 22, 2011

Export Mercurial repository into git repository

Wanna export a Mercurial or SVN repository to git?

fast-export helps with that.

Thanks to Akhil for the tip.

The most interesting commands (to be run in the empty git repository that will be the target) are:


/path/to/hg-fast-export.sh -r /path/to/hg_repo
git-repack -a -d -f

Friday, January 21, 2011

Sending the file descriptors between processes

Here is what seems like a very nice library: http://www.normalesup.org/~george/comp/libancillary/

This hides the intricacies of passing the file descriptors between the processes.

If it is portable enough, that'll be nice.

Wednesday, January 19, 2011

On ssl caching

tl;dr note for myself:

SSL-protected docs can be cached if you send the header "Cache-Control: public, max-age=31536000".

An elegant solution to avoid the hassle of explicitly specifying http/https references, assuming it does work (RFC says it should).
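For instance, in the spirit of the toy HTTP reply from the LuaJIT post above, the only change is one extra header line (this is my own illustration, using the one-year max-age from the note, not code from any of the repos):

-- a toy cacheable reply; everything except the Cache-Control line is as before
local CACHEABLE_REPLY = [[HTTP/1.0 200 OK
Content-Type: text/plain
Cache-Control: public, max-age=31536000

This is a cacheable test
]]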

Tuesday, January 18, 2011

Politics as a catalyst of wealth

The world of politics, at least as far as I comprehend it, is very complicated. Politics is, well, politics.

Must be especially tough close to the elections - I'd imagine this is the period with the most impact. The tooth that was pulled a year ago does not hurt - but the one that was pulled yesterday hurts all right. It must be really tough to operate in such an environment. And then there are those lobbyist folks with their ideas...

So here's a thought experiment that I think might make life easier both for the politicians and for society as a whole.

Abandon the elections.

Instead, mandate direct democracy by law. Yes, by direct democracy I mean entities like Demoex in Sweden - but make such a technology country-wide and mandatory.

"But laymen people are not great at taking the decisions" - you will say - "and, by the way, who has the time for this ?".

Precisely - this is where the catch is. Unless things go utterly haywire, no one *really* wants to deal with the mundane details. But, alas, you have to - it's the law.

But the law would also not prohibit hiring someone to do this job for you. So we would have someone who is paid to read the to-be-ratified papers and make decisions on behalf of others.

What is such a person called? You're right - a politician.

What's the difference compared to now? The difference is the money flow. Right now it's faceless - the electorate pays the taxes, then goes to say "this guy will represent me for X years", and then the taxes pay that guy's salary. As a result we have a fairly non-responsive system.

In the proposed scheme, you can hire the guy per month, per week, even per hour if you wish.

Responsiveness to the customers' needs would immediately be reflected in the rewards - the customers being those who no longer have to spend time on elections but simply outsource their responsibility to decide democratically. And the percentage of taxes that was going into the common pool of government running expenses can be dramatically reduced.

Hey, but all of that would be a very bad idea! No-one would then take any unpopular decisions, because they are risky and unpopular!

The reality is:

  1. In business this works, with a suitable business plan.
  2. Besides, no one takes any unpopular decisions anyway - most of the time.


So it is not much worse than the status quo.

But, hey, what happens to all the lobbying that supposedly goes on?

Simple. Convince the masses that Idea X is a good idea. Build a decent school in the neighborhood - and those who go there will become your live advertisement, much better than on TV. And they can then spend more time watching the actual film, anyway.

This all seems too simple not to have been tried before, and too simple not to contain a catch.

If you know some good historic examples on where this has been tried, I'd be curious to know the occasion and the outcome.

Compiling Lua on the fly with tcc.

tcc is quite fast.

This is in the "src" directory of Lua 5.1.4 (I added a small file with the definitions for the missing string functions).


$ time tcc *[^ca].c lgc.c lfunc.c -run lua.c -e 'print "hello!"'
hello!

real 0m0.165s
user 0m0.140s
sys 0m0.030s



Fun, even if fairly useless.

The performance of the Lua interpreter that I compiled statically this way was about half of that compiled with gcc, so it is not a racer by any means - if you need speed, you will look at LuaJIT.

Monday, January 17, 2011

Exploring the dimensions of lisp


On an orthogonal axis, a couple of interesting exhibits:

  • LuaLisp - lisp implemented entirely in Lua.
  • LispmFPGA - a lisp machine in FPGA project.

If I find something else worth adding - I'll update this post.

Windows 7 in kvm: some observations and tips

I spent the better part of this weekend reversing the status quo from running Ubuntu in VirtualBox on Windows 7 to running Windows 7 in kvm on top of Ubuntu. The result is catastrophic success, and makes me quite happy. This post is to share some little hacks/experiences that I've accumulated in the process.


  1. giving "-usbdevice tablet" to kvm saves you from needing to click into the kvm window (the guest gets absolute pointer coordinates, so the pointer is never grabbed).

  2. using VNC to access the Windows VM is neat. This is achieved by adding "-vnc 127.0.0.1:1 -daemonize" to the kvm command line. Handy when Windows is running long updates - you can simply close the VNC session and let it chug along.

  3. typing "sendkey ctrl-alt-delete" into the monitor console (you get to it by pressing ctrl-alt-2, and back to the GUI by pressing ctrl-alt-1) [and, FWIW, via the "F8" VNC popup, too] is a bit cumbersome. Therefore, since I'm the only one using this machine, I redirect the monitor with "-monitor tcp:127.0.0.1:31337,server,nowait". Then I can create a shell script "send-ctrl-alt-del" that contains a simple netcat call - "echo 'sendkey ctrl-alt-delete' | nc localhost 31337" - and bind it to some key combo in fluxbox.

  4. do not be afraid to undershoot with disk space for win7. You can create a sparse 1GB file with "dd if=/dev/zero of=1gb.img bs=1024 count=1 seek=1048576" and then append this file to the image: "cat 1gb.img >>win7.img". After that, in the disk administration tool you can simply grow the volume by 1GB.

  5. for accessing the files on the host, if you do not have anything SMB-talking, you can get away with using SSHFS. http://dokan-dev.net/en/ is precisely that - it allows userland filesystems on Windows, and includes an SSHFS filesystem. It's a bit rough around the edges, but it works. Assuming you are using the kvm built-in (user-mode) networking, the host is always reachable at 10.0.2.2.

    If this is your personal client machine, there is no need to keep ssh open to the world - so edit /etc/ssh/sshd_config and put "ListenAddress 127.0.0.1" there. Do not forget to restart sshd afterwards.

    When you boot up your Windows, you can connect the drives back to the host. Be sure to turn off the cache - otherwise the changes are not reflected immediately, and such a setup becomes tough to work with.


Hopefully this will be useful to you.

Sunday, January 16, 2011

How to disable touchpad on thinkpad laptops in X11

Thanks to Nico Schottelius for this hint.

In short, I needed to do three things:


  • xinput list - to find which ID the touchpad has (11 for me)
  • xinput list-props 11 - to find which ID the "enabled" property has (125)
  • xinput set-prop 11 125 0 - to disable the annoying peripheral.

Saturday, January 15, 2011

How to quit smoking ?

I have long wanted to share my experience of getting rid of the tobacco addiction, and I've noticed one of my friends abandoning the habit of smoking, so this is a good trigger to do so.

So - how to quit smoking? The first trick - don't. No, I do not mean continue smoking. Do not put up this hard and fast barrier, this non-negotiable resolution - "never, never again". Phrase it differently for yourself - "I will stop smoking. Just for some time. I can always get back to it if I want."

Why do I say so? Because smoking is an activity with immediate short-term positive feedback and big long-term negative feedback. But since human intuition is pretty bad at assessing long-term events, the short-term positive has a much more dramatic impact. So, you put up this prohibitive barrier for yourself. Moreover, it's not one but two hardships that you create for yourself:


  1. The difficulty of having to survive the lack of the short-term positive feel of smoking
  2. The fear of guilt that you will experience when you violate your "never again" promise to yourself. This guilt will in itself make your future 'quitting' attempts harder - because you will be afraid of the negative feeling that comes when you do not manage to make it.


The second one has a huge ripple effect, in my opinion - not only do you feel guilty for violating your own promise, you also lower your own self-esteem. Subsequently you try to create an escape out of this labyrinth by saying either that it was not really 'for real' or that you have a unique condition which subjects you to a strong physical addiction. Well, save yourself from this nonsense. Therefore:

Don't quit - but stop.

"quit" implies a one-time action (and 'never again') - while "stop" does not carry such a pathetic charge. It's simply a declaration of fact that you are transitioning from the smoking state to a non-smoking state. It leaves the freedom of the decision to restart smoking later, if you wish, up to you - without losing your dignity with yourself.

Isn't it good ?

Now that we have sorted out the self-guilt part, we need to figure out what to do with the urge. It's damn hard to avoid going for that awesome morning cigarette after you've grabbed your first cup of coffee, isn't it? Do you feel that itchy feeling that pushes you to run and get yourself a pack for a puff?

Well, that's the second lesson I learned. Never try to stop smoking without significantly changing your environment/lifestyle - if you do, you are unnecessarily increasing the difficulty of your task. You don't do that with other activities - so why would you do it with something as important as your future health?

Also - when you change the environment and your schedule, ensure there is no "dead spot" that would tempt you to fill it with smoking.

These two lessons make business trips and vacations ideal occasions to stop smoking: your environment is usually unfamiliar, so your mind is concentrated on reacting to the unknown factors rather than following the routine. Also, during these periods your schedule is usually busier than normal, which gives even more distractions from smoking.

Another important thing that a change in environment brings: it takes away the "social routine" aspect of smoking. By and large, the social part of smoking is probably its biggest pull - and by eliminating it, at least temporarily, you can make your life easier. (Later, when you resume contact with your smoker friends, you can simply mention that you have not been smoking for a few weeks. This will gain you a whole lot of respect and awe from them :-)


The first key period is roughly a week - that's when the real "physical" effects of nicotine wear off. (NB: This is totally unscientific, and is just my observation of myself.) During this period you may still feel some "physical" urge to smoke.


You thought it would get easier after that? Well, the physical part is gone. But now we've got a much tougher nut to crack - it's in your head. It lasts about a month starting from your "stop point". At this point, while you are managing to abstain from smoking, the balance is very fragile - and a lot of factors can tip it.

The result of such a tipping will be an avalanche-like wave of desire to get a cigarette - which feels almost physical! Don't believe it - it is not. It's just your brain getting stuck. Whenever you get into such a situation, whatever happens, try not to think about the large green monkey with a banana. :-) Or, more seriously - just try to dissect what seems to be a physical need for a smoke by reasoning about *why* it drags you there and what triggered it.

Some triggers that will probably set you off: a smoking scene in a movie; an argument with other people that makes you stressed, so you want to relieve the stress the same way as you did before - with a smoke; a time when you have to wait for something just long enough to have a cigarette (programmers: "make all", anyone?).

Fighting these urges is probably the toughest part of the whole exercise. I don't have a good recipe for it. Keep strong and remember that you can distract yourself with other activities. Like watching this video about marshmallows:



So here we go: a few weeks have successfully passed and you're so proud of yourself - you haven't smoked a single cig since you started. Time to pat yourself on the back! Also, note that you are now feeling much better. You are less tired when you wake up in the morning. You have more energy throughout the day.

Note this good feeling well. Try hard to remember it - and try to compare it with how you felt when you were smoking regularly. This is one of the ways you build up your barrier against falling into puffing again - strengthening the current state of non-smoking.

But then it comes - somewhere along the way, friends get you to a party. Modest amounts of alcohol, and there will always be a smoker at the party. You feel a desire that is irresistible. You think I'd say "be strong and resist the temptation"? Wrong. Go for it - full speed. Get a good dose of alcohol and feel free to smoke as much as you did before stopping. Why? Because in the morning you're gonna get a terrible hangover! The body, adapted to the absence of nicotine, will have to fight two poisons at once - alcohol and nicotine. A splitting headache is guaranteed. So, if you are wise enough, reading this will be enough for you not to smoke that cigarette at the party. If you wanna try it out - go ahead. Just remember not to go overboard - I am not a doctor and I am not responsible if your body decides to go fubar on you.

Sometimes you may feel the circumstances warrant a cigarette - a good talk, whatnot. Again - if you are sufficiently far into the process (say, 3+ weeks) - there is nothing to be afraid of; you can use this to your advantage. Light up a cigarette, smoke it, and notice for yourself that it feels like you've put some shit in your mouth. So what was the point?

If you repeat these exploratory actions not too close to each other (so as not to get your tastebuds numbed down by the smoke), you can build a pretty good psychological cause-effect link: "I puff a cig => it tastes like shit". Eventually it gets to the point where you just ask yourself "so what's the point? I'll just stand nearby if all I wanna do is talk".

Eventually, you realise you have a much stronger psychological incentive not to smoke - you have built up enough short-term and near-short-term negatives to prevent your brain from tricking itself into wanting a smoke: headache, tiredness, bad taste while smoking, headache in the morning - these are all immediate enough to be quantifiably annoying (as opposed to some very distant death from some weird disease. Which will never happen anyway. 'Cos it's pretty hard for an individual to assess their own death. We overcome the fear of death back in childhood and never re-learn to feel it again).

Why do I say this can work? I don't have peer-reviewed scientific evidence for it; it's only based on my own experience. I managed to stop smoking once for half a year, when I was on a 1-month business trip a few years back. Since then I had been looking for a good occasion to repeat this - since I liked the feeling.

Last year, starting in March, I completely stopped smoking - till September. In September I got a carton of cigarettes as a present, which lay there for a few weeks, but then the physical presence of the cigarettes and a couple of "favorable" moments did make me pick up some of the smoking habit again. Which I promptly regretted - because I had got used to the extra energy that I previously had. In mid-December I stopped again, and I have had hardly a couple of cigarettes since then - but I don't feel the "urge". It feels great not to have this dependency. And I do not feel guilty about the temporary "restart" - I never promised "never ever" in the first place, so I did not violate any promises to myself.

It's up to you to decide how much I practice what I preach here, whether to believe it or not, and whether it is better to stop smoking than to quit smoking. I'd be delighted to hear your results in the comments.