Saturday, June 25, 2011

Your IETF-writing will never be the same again

I've been reading up on SPDY today, and I noticed something interesting -
the spec in Mike Belshe's GitHub repo has the extension ".xml".

So what, you say? Given that it rendered like the HTML output of xml2rfc, I suspected there was XSLT in play. And indeed, one more search led me to the origin of this: Julian Reschke's work.

This is definitely very cool - it means you have one fewer format to store, and the results are immediately visible as you change the XML.

I will definitely give it a shot for the next draft to see if it lives up to my (inflated) expectations.

Friday, June 24, 2011

I came up with a new word today: faceboob

(v) to faceboob (someone): to earn money off humans' tendency to be curious, as well as their natural tendency to trust their friends, specifically when using social networks. Example use: "John Doe was faceboobed today."

Now, enough words, let's get to the colorful pictures!


You see this:


Oh, boy! Aha! Free Boobz! Especially since one of your friends (or more!) shared this. Gotta be good!

Let's get another browser. But first let's investigate a bit - the link looks like a URL shortener.


Yeah, it is a Chinese URL shortener FTW! Better yet, if you go to its home page, it talks about a link being spam and being deleted. This already rings alarm bells, but let's look further - let's open the link in a browser that is logged out of Facebook and see what happens.


Whoa!!! Free boobz are just one click away! They need to verify the age - of course. Gonna be something really juicy. Better yet, the button is also helpfully localized. Thank you my friends! When we click this link, we get a popup to log in to Facebook.
Cool - so this is how it appeared in my friend feed. I'm the one with the OCD - the others simply tried to verify their age... oops. Anyway, let's close this annoying popup, I am not going to log in anyway. Let's pretend that all happened in the background and see what happens next.


Whoa! To verify my age I am going to get a chance to win an iPad? Unbelievable! I am a lucky man today! iPads and iBoobz are falling from the sky! Let's click! I want a MacBook Air though - because I already have an iPad.


The question asks something like "is it possible to play videos on the MacBook Pro?" First, what a stupid question. Of course even a kid knows it is possible. Second, you are supposed to ask about the MacBook Air - I don't want a MacBook Pro!
Anyway, let's answer "Ja" - and go forward, towards the free gear and free boobz!
(the eye notices something in the top-right corner, but quickly dismisses it as uninteresting).


Ahhah. Now the top-right thing starts to make sense. 1 euro for SMS sent, 1 euro for SMS received. This is an IQ test - so I enter my phone number so that they can send me an SMS and charge me 1 euro. Brilliant, let's stick in some random number that is not a valid phone number and move on. I guess this is the time for the real age verification.


Congratulations! You've been charged 1 euro for receiving an SMS! Now please also help us charge you another euro for sending an SMS! I start to be a bit unhappy about the whole situation - 2 euros for boobz ain't really free anymore! I start to suspect an evil plot in this. They want to rip me off - let's go back to the first tab where they had at least the background pic.

An unpleasant surprise awaits me back there: there's no flirting at all on that page anymore. Instead, it coldly notifies me:


(Note that the text "Not completed" periodically changes into "checking..." - which makes a great illusion that the code *actually* checks whether you have completed the darn survey or not). The boobz are in the background - go and complete this survey, ASAP!

Well, there is a slight chance of a bug. And the page admits it. It offers to complete another survey if this survey does not bring the keys to the kingdom. Let's go and see about the iPad - two is better than one, after all... This one starts in a rather cheerful manner:


It prompts me to select my favourite color! Yay! I prefer black.


Wow, now comes the real quiz: it asks which icon corresponds to FaceTime. That was easy. *click*


Meeeeeh. This smiling lady promises me a charge of another 1 euro, and still no boobz!

I think I am going to give up here - no free boobz today. I'd better go read Slashdot.

The practical takeaway from this little exercise: to avoid being faceboobed, remember these three simple rules:

  1. There is no free lunch boobz.

  2. Be bear aware! Always check where the links lead, even if it is your friend posting them.
  3. Use condoms. Always check them in a separate browser that is fully patched and is NOT logged into the site that you got the link from.

Remember, you are the target too.

And, unlike a computer's, your own antivirus is not auto-updated.

Keep this in mind.

update: oh, I forgot. When closing the browser, I get this popup:


Now, this is called plain greed, my friends.

Don't leave your mobile number with these folks. They are all too eager to rip you off.

And if you really really wanna boobz - here they are. (warning: age restricted, may be NSFW for you): not a faceboobed link.

Monday, June 20, 2011

Your account has been compromised. Will you finally stop reusing passwords?

Logging into Gmail in the morning, I got a prompt to change my password, which mentioned suspicious activity on my account. Once logged in, I saw the possible reason why.

The MtGox bitcoin "exchange", where I created an account to see what it all looks like, has been hacked, and the database has been leaked. Bummer. I even found my email and user id in it, alongside the hash of my password. This can't be good, can it?

Well, except that some time ago I became more pedantic about having separate passwords for different sites, no exceptions. My online passwords are 40-character random hex strings, different for each site. This meant a sigh of relief - the data that the miscreants obtained has minimal impact on the other sites that I use.

So, the unique passwords saved my bacon today. If you reuse your passwords on more than one site, you should think twice about what would happen to your accounts in an occurrence like this. If you are a Windows user, you could get the Password Safe program by Bruce Schneier - it's open source and free. In this day and age, you should avoid reusing passwords, period.

I would also use this chance to describe what my "separate password" strategy looks like.

It has two components:

  1. A program called "sha1sum" - which exists on most *nix systems, and essentially just calculates a cryptographic one-way hash (SHA-1) of its input.

  2. A text file (plus a text editor) that stores the material for my passwords in plain text. Each line of the text file contains two strings: the site name and a random string. I choose the random string to be sufficiently long to be difficult to guess / brute force (some 30 characters or so).

When I need to get the password for a website, I open up the file and find the line that corresponds to that website.

Then I start the "sha1sum" program, which waits for my input. I copy-paste the site name and add a memorable "master password" that I do not write down anywhere. This is what allows me not to worry too much about the safety of the file with the key material: on its own, the file is not an easy target (even though, of course, I do not leave it in plain view).

After entering both strings on one line, I press Enter and then Ctrl-D - this signals the end of input to the sha1sum program, and it spits out a 40-character hex number at me.

Great, now it is time to launch sha1sum again - this time I copy-paste the hex number from the previous run, and then the "random text string" from the file.
After that I press Enter and then Ctrl-D - getting another long hex string as a result.

This is the password that I now copy-paste into the web site that I need to log in to.
Here's what it looks like:

ayourtch@ayourtch-lnx:~$ grep gmail p-material
gmail 1243pyupqwe,jl23hl23khjkh23khpw'@
ayourtch@ayourtch-lnx:~$ sha1sum
gmail this is my secret phrase
f3c446b01b24022c136bde50d32a1f9d4e9cd7fb -
ayourtch@ayourtch-lnx:~$ sha1sum
f3c446b01b24022c136bde50d32a1f9d4e9cd7fb 1243pyupqwe,jl23hl23khjkh23khpw'@
8e485f361bfe1834d281209121eb5d4a8b52bcb9 -

The final hex string is the password to copy-paste into the login screen. (NB: of course the data in the example above is not real, it's my mockup just to illustrate the principle ;-)
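For those who prefer a script to typing into sha1sum twice, here is a minimal Python sketch of the same two steps. The function names (sha1_line, derive_password) are made up for this illustration, and the sketch assumes that each typed line reaches sha1sum with its trailing newline, which is what happens when you press Enter before Ctrl-D. The inputs are the mock values from the session above; since the hashes shown there are mockups too, do not expect the output to match them.

```python
import hashlib


def sha1_line(text: str) -> str:
    """Hash one line of text the way sha1sum sees it when typed on stdin.

    Pressing Enter sends the trailing newline as well, so the newline
    is part of the hashed bytes.
    """
    return hashlib.sha1((text + "\n").encode("utf-8")).hexdigest()


def derive_password(site: str, master: str, random_string: str) -> str:
    """Two-stage derivation mirroring the manual steps (hypothetical helper)."""
    # Stage 1: hash "<site name> <master password>".
    stage1 = sha1_line(f"{site} {master}")
    # Stage 2: hash "<stage-1 hex digest> <random string from the file>".
    return sha1_line(f"{stage1} {random_string}")


# Mock inputs copied from the example session; not real credentials.
password = derive_password(
    "gmail",
    "this is my secret phrase",
    "1243pyupqwe,jl23hl23khjkh23khpw'@",
)
print(password)  # a 40-character hex string
```

In real use the master password should not be hard-coded or passed on the command line; reading it interactively (for example with Python's getpass module) keeps it out of the shell history.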

While this scheme is certainly not pixel-perfect cryptography, it has certain advantages:

  • Even if the service is dumb enough to store the passwords in plain text - the attacker does not gain much when they hack it. They will just get the password to a particular site - but they would learn little about how you derived this essentially random string.

  • Even if the attacker gets the file where the strings are written down - they do not gain much, as they would need to know the "master password" in order to create correct passwords for any of the web sites.

  • It is simple, light, and independent from the vendor (browser, etc.) - there is a sha1sum practically anywhere - so I only need a (tiny) file with site names/strings to carry around. The smaller the data to secure, the easier it is to secure it.

Of course, this also has a couple of disadvantages:

  • entering the passwords is a bit annoying. You need to do some manual operation.

  • entering the passwords is a bit annoying. This is the iPad version.

However, I value my peaceful sleep much more than the annoyance of the process.

Do you value your sleep? If you do - stop reusing passwords before you lose it.

Thursday, June 16, 2011

Coursemare incorporated promotional video

I created this one a while ago, but did not move it from xtranormal to YouTube. The happy spectators of my little "NAT vs. IPv6" fun were asking if I did anything else - so here it is. This time our target is the applications that go out of their way to alienate their users.

Friday, June 10, 2011

Indexing wikipedia data dump with ZF

I decided to try indexing the Wikipedia HTML data dump with my zettair fork.

The specs of the machine: 8GB RAM, 8-core Xeon X3440 at 2.53GHz.

The multicore nature of the machine did not really matter, since the code is single-threaded.

Some info on the dataset:

14257665 documents in total (just above 14 million, that is)

233GiB of data as per "du -kh" output.

The index size is 17GB.

Indexing this set took about 32 hours of wall-clock time (roughly 1.3 days):

real 1935m54.592s
user 138m42.620s
sys 23m49.060s
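As a rough sanity check, the figures above can be turned into throughput numbers. The small Python sketch below only does arithmetic on the values quoted in this post; the helper name to_seconds is made up for the illustration, and the 233GiB figure is taken at face value from the du output.

```python
def to_seconds(minutes: int, seconds: float) -> float:
    """Convert the "XmY.YYYs" format printed by `time` into seconds."""
    return minutes * 60 + seconds


# Timings from the `time` output above.
real = to_seconds(1935, 54.592)
user = to_seconds(138, 42.620)
sys_time = to_seconds(23, 49.060)

# Dataset figures from the post.
documents = 14257665
data_mib = 233 * 1024  # 233 GiB expressed in MiB

hours = real / 3600
cpu_share = (user + sys_time) / real
docs_per_second = documents / real
mib_per_second = data_mib / real

print(f"wall clock: {hours:.1f} hours ({hours / 24:.2f} days)")
print(f"CPU busy:   {cpu_share:.1%} of wall-clock time")
print(f"throughput: {docs_per_second:.0f} documents/s, {mib_per_second:.2f} MiB/s")
```

The CPU (user + sys) was busy for well under a tenth of the wall-clock time, which suggests, though does not prove, that the run spent most of its time waiting on I/O rather than computing.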

Search times obviously depend on the word. Below are some sample queries: "cold" is the first time you search for a word, "warm" is a subsequent search for the same word.


cold: 20 results of 11194872 shown (took 3.886843 seconds)
warm: 20 results of 11194872 shown (took 1.542165 seconds)


cold: 20 results of 26981 shown (took 1.168747 seconds)
warm: 20 results of 26981 shown (took 0.065107 seconds)


cold: 20 results of 5056 shown (took 1.040630 seconds)
warm: 20 results of 5056 shown (took 0.022333 seconds)


cold: 20 results of 15340 shown (took 0.840945 seconds)
warm: 20 results of 15340 shown (took 0.039736 seconds)

cold: 20 results of 1198 shown (took 0.868518 seconds)
warm: 20 results of 1198 shown (took 0.055091 seconds)
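To put the cold/warm difference into one number per query, here is a small Python sketch that computes the speedup from the timings listed above. The queries are identified only by their hit counts, and the speedup helper is a name invented for this illustration.

```python
# (total hits, cold seconds, warm seconds), copied from the samples above.
samples = [
    (11194872, 3.886843, 1.542165),
    (26981, 1.168747, 0.065107),
    (5056, 1.040630, 0.022333),
    (15340, 0.840945, 0.039736),
    (1198, 0.868518, 0.055091),
]


def speedup(cold: float, warm: float) -> float:
    """How many times faster the warm search is than the cold one."""
    return cold / warm


speedups = [speedup(cold, warm) for _, cold, warm in samples]

for (hits, cold, warm), factor in zip(samples, speedups):
    print(f"{hits:>9} hits: cold {cold:.3f}s, warm {warm:.3f}s, "
          f"warm is {factor:.1f}x faster")
```

The query with about 11 million hits gains the least from a warm cache, which would be consistent with its time being dominated by processing the results rather than by reading from disk; that is a guess from five samples, not a measurement.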

An interesting task would probably be to see how well the indexing scales (i.e. reindex only partial datasets and graph the results). But I figured I'd write up what I have for now.