Monday, October 31, 2011

How to create a cost sharing app using Google Spreadsheets

I want to document a solution to a practical problem: a few people may want to share the expenses of various kind in a fair fashion.

The straightforward way is for everyone to just announce their expenses, and collect 1/n parts of them from everyone else. However, it is obvious that this is a very tedious way of doing things: say, 5 people spend 10 euros each on some things, thus each of them will need to give 4 times 2 euro - and receive 4 times 2 euro - the result of all this activity being exactly the same when it has started!

Thus, it's a clear area of optimization. At first I thought about a web app. Google apps could do that. However, I decided to try to be even lazier and to implement the whole thing using Google spreadsheets. The result is a successful "app" which in the main "screen" (the main sheet of the spreadsheet) displays for every person whether they owe money to someone or whether they need to collect money. A person who needs to collect money can chase some of the debtors, and then, once collected, put a note about that. The system will recalculate the debts.

First, the "theory" (if I may call this so:).

We start with a per-person value of how much expense do they incur. The "fairness" means that all of them would have equal expenses - so the target value is the average of the expenses. As one person gives the money to the other one in order to "balance" - this event needs to be recorded in order to decrease the imbalance for both - in different directions.

That's all that is there, really. The rest is the mechanics.

First, let's assume that the users are "normal" people and they do not want to enter the data into the spreadsheet manually (not only it is boring but is also very error prone).

So, we create so called "web form". Pitifully only one instance of this seems to be possible. Anyway, we make a web form with 5 fields:
"My name is": dropdown list with all the names.
"I have made an": dropdown with 2 choices: "expense" and "repayment"
"How many euros": the text input to put the $$$ value in.
"What for?": freeform text field to document the purpose of transaction.
"If you repaid to someone, to whom?": dropdown with the default choice of "---" and then the same names as in the first field.

Having to manually enter the names is somewhat annoying, I haven't figured out the way around it. Anyway, as you try entering the data into this form, you notice it starts to fill the sheet in. You will see the first line of the sheet being: "A>Timestamp |B> I am: | C>I have made a: | D>How many euros ? | E>What for ? | F>If you repaid someone, to whom did you repay ?" (the "A>" is the name of the column, for reference).

Starting from row 2 you will see the actual data. We need to shift the data to row 12 - the rows 2..11 we will need for various aggregate calculations.

The column "K" will hold the labels for them - just for us to remember. The contents of this column, starting from "K2":

Who
Got/Made
Type
Total
DeltaAvg(+/-)
Repaid
Received
Balance

Now, in L2, M2, N2... we need to put in the names - exactly the same way as we have typed them earlier in the web form. Let's say we have four people: Amy, Bob, Carol, Dave. L2 will read Amy, M2 - Bob, N2 - Carol, O2 - Dave; Repeat two more times in the same order. So P2 and T2 will read Amy, etc.

Why the names are repeated three times ? This will become obvious shortly. Go to the third line, and leave L3..O3 empty, then P3..S3 will become "made" and T3..W3 will become "got". Now go to the line 4, and the cells L4..O4 will need to have "expense", and the cells P4..W3 will need to have "repayment".

Now this clarifies the purpose of the three groups - this is a mechanism to untangle the data from the dense web form: each column will only get a value if that particular person was involved in that particular activity.

To calculate the total expenses per person, enter the following formula into L5: =SUM(L$12:L), and copypaste it into M5..W5 - notice that the column will auto-change. This is precisely as we need it. The line 5 will hold the calculated totals per activity per person.

Now let's calculate the average of total expenses per person. Enter the formula into the J6: =average(L5:O5), and put the label into J5 saying "Average total expense" or something similar.

L6 will hold the signed delta between the average and the expenses for Amy: =$J$6-L$5; copy-paste this same formula to M6..O6.

L7 will hold the amount of how much Amy has repaid to others, so the formula will be naturally the sum of her repayments: =SUM(P12:P); similarly to previous value - copy-paste this formula into M7..O7.

L8 will hold the amount of how much Amy has received from others in repayments. So, the formula is: =SUM(T12:T); again copy-paste this formula into M8..O8 - the column will auto-update to correct value.

L9 will hold the final "balance" for Amy - if she needs to pay, the value is positive, if she needs to get money from others, it will be negative. The formula is: =L6-L7+L8
and similarly needs to be copy-pasted into M9..O9.

We are almost there - now we need to actually populate the rows with the monetary values... One problem: the rows with the actual data are being dynamically added, how do we add the formulae there ? The answer is with the "arrayformula" - it autoexpands down as needed.

So, put the following formula into L12: =ARRAYFORMULA(IF($B12:$B=L$2,IF($C12:$C=L$4,$D12:$D,0),0))

This says that if the name of the participant matches the name for this column and the name of the activity matches - then the number entered will appear here, else it will be zero.

Copy this formula to M12..S12. This way we make sparse tables for two activities - entering expenses, and entering the repayments to others. If you move to L13, you will see the following formula autopopulated: =CONTINUE(L12, 2, 1). Neat, huh ?

Now we need to populate the third group - the synthetic one that captures the amounts the person *receives* in repayments. For this we need just to slightly modify the above arrayformula, and put it into T12: =ARRAYFORMULA(IF($F12:$F=T$2, IF($C12:$C=T$4,$D12:$D,0), 0))

You see that the only thing changed is the column from the form entry with which we compare.

Copy-paste this formula to U12..W12. We are almost there.

This completes the "brains" of the application, and the line 9 is already usable,
but we can make it more user friendly. Rename the current only sheet into "Expenses" and lock it. Create a new sheet called "Dashboard".

Go to cell B3 in this spreadsheet, and enter the formula: =Expenses!L2 - this will autopopulate the first name. Copy-paste this formula to C3..E3 - this will populate the other names.

Now we need to tell the status of the balances of these people.

Go to B5, and enter this formula:

=IF(Expenses!L9>0;concatenate("Needs to pay " , Expenses!L9, " euros"); IF(Expenses!L9<0; Concatenate("Needs to take ", -Expenses!L9, " euros"); ""))

This will translate the sign into the appropriate action for the person.

Last touch: we can liven the actions up, if we color the "needs to pay" actions with one color, and "needs to take" with another color.

Right-click the B5, and select "Conditional formatting".

Add a new rule with "text contains" "pay" and select red background and green color, and another rule with "text contains" "take" and select green background and red color.

Copy-paste the cell from B5 to C5..E5.

Lock this sheet too. You are done and now have an expense tracking application fully embedded into google spreadsheets.

Use the Web form to enter the expenses and the notes of repayment - and use the "Dashboard" as an indicator of ongoing balances - you can set your own rules when you need to repay to others, e.g. when the amount of money reaches over certain threshold.

There is also in theory a possible loss of precision that might happen as the total expense grows. If you think this is a problem - write your solution. I think I know how to solve it, but I don't want to spoil the fun for you!

Oh - and if you find this writeup useful - please leave a comment!

Wednesday, August 10, 2011

Want my attention ? Send me something useful.

[this is an expansion of my tweet earlier today.]

Getting the email address of your users and keeping in touch with them is a classic technique. So classic that it's been mentioned in umpteen different places - I won't even bother to find the references. So classic that quite a lot of companies do pick it up, eventually.

However, there is one detail. The emails that you send need to be wanted by the user. This is a detail that escapes frequently. As a result, I am getting a lot of updates from various companies "hey, we are doing this!" or "please take a look at our new new model of humbambillistic hyperbanana!" - while, if they noticed, I did not really use their service. I tried and did not find it too compelling, or it did not solve my problem.

More impulsive of you will exclaim - "hey but unwanted email is precisely the definition of spam!"

Well, not quite. The thing is, I might have agreed to be mailed at a particular point in time. When I thought this source might be interesting. However, the first mails proved not as much substance as I wanted. Should I unsubscribe ? Maybe. However, sometimes with the subcontractors acting on behalf of representatives of the newly formed subsidirary of the ... well, you get the point. It might not be trivial.
On the other hand, I do not want to send this straight away to GMail's "spam" folder - because it will harm the users who maybe are interested.

So, I found a way that would not harm anyone and requires minimal involvement from my side - simply create filters and transit those "not interesting anymore" messages to a special folder to settle.

Maybe I read it one day when I don't have anything to do, maybe I won't. But I won't read them now.

I do this if I consciously remember that I get more than two messages with the first three sentences not holding the content that is informative and useful for me.
This reveals the other side of the coin - it's somewhat ok to poke me if I do not remember when was the last time you poked me. Unless the background pokes from different places burst together such that I get sufficiently annoyed with random flood of poke-like messages on a particular day.

I wonder if I am unique in seeing this problem (and this maybe just my own laziness), or there is indeed something to it.

Thursday, August 4, 2011

Uploading the files to the VMWare ESXi server

Every now and then I need to upload the stuff like an ISO image to my ESXi server.

Since I do not use windows, my only remaining option is the CLI - which is big and clunky. However, today I digged a bit and after reading the CLI tools source realized that uploading the file onto an ESXi host is trivially simple:


  1. Browse the datastores till you get to the correct place, take that URL and append the target file name.
  2. perform the HTTP PUT request for that URL, supplying the data.


This way, the only tool you need to upload the files to VMWare ESXi really is curl:

curl -k -X PUT --data-binary @IMAGE.ISO 'https://user:pass@host/folder/FOLDERNAME/IMAGE.ISO?dcPath=DCNAME&dsName=DATASTORE'

Thursday, July 28, 2011

Microbenchmarking of luajit-based server (again: this time against lighttpd)

The other day inbetween the meetings I've ported my toy event loop experiment to use ljsyscall library.

Here are the results of running of ab -n 100000 -c 1000. First let's make a baseline:

lighttpd



# ab -n 100000 -c 1000 http://localhost:80/
This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright 2006 The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
Completed 50000 requests
Completed 60000 requests
Completed 70000 requests
Completed 80000 requests
Completed 90000 requests
Finished 100000 requests


Server Software: lighttpd/1.4.19
Server Hostname: localhost
Server Port: 80

Document Path: /
Document Length: 10 bytes

Concurrency Level: 1000
Time taken for tests: 7.854757 seconds
Complete requests: 100000
Failed requests: 0
Write errors: 0
Total transferred: 24331590 bytes
HTML transferred: 1001300 bytes
Requests per second: 12731.14 [#/sec] (mean)
Time per request: 78.548 [ms] (mean)
Time per request: 0.079 [ms] (mean, across all concurrent requests)
Transfer rate: 3025.05 [Kbytes/sec] received

Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 39 281.0 15 3023
Processing: 5 19 16.5 21 740
Waiting: 3 14 15.9 14 734
Total: 9 58 287.6 38 3746

Percentage of the requests served within a certain time (ms)
50% 38
66% 40
75% 42
80% 43
90% 45
95% 47
98% 50
99% 94
100% 3746 (longest request)
#

luajit


Now let's try it on the primitive event loop.

# ab -n 100000 -c 1000 http://localhost:12345/
This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright 2006 The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
Completed 50000 requests
Completed 60000 requests
Completed 70000 requests
Completed 80000 requests
Completed 90000 requests
Finished 100000 requests


Server Software:
Server Hostname: localhost
Server Port: 12345

Document Path: /
Document Length: 13 bytes

Concurrency Level: 1000
Time taken for tests: 8.232656 seconds
Complete requests: 100000
Failed requests: 0
Write errors: 0
Total transferred: 5503355 bytes
HTML transferred: 1300793 bytes
Requests per second: 12146.75 [#/sec] (mean)
Time per request: 82.327 [ms] (mean)
Time per request: 0.082 [ms] (mean, across all concurrent requests)
Transfer rate: 652.77 [Kbytes/sec] received

Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 44 336.4 6 3016
Processing: 4 10 13.9 10 766
Waiting: 3 8 13.9 7 762
Total: 10 55 341.0 16 3767

Percentage of the requests served within a certain time (ms)
50% 16
66% 17
75% 17
80% 18
90% 20
95% 22
98% 28
99% 3018
100% 3767 (longest request)


Upon multiple runs the numbers vary slightly, of course, but they stay within the same ballpark. I think this shows luajit is a very viable platform for server development.

----

Update:

curious cat as I am, I've added a HTTP parser that is made by a yet-unpublished-and-inefficient-and-incomplete patch for ragel that generates Lua state machines (and a ragel code from an earlier post about http parser.


ab -n 100000 -c 100 http://localhost:12345/
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 10000 requests
Completed 20000 requests
Completed 30000 requests
Completed 40000 requests
Completed 50000 requests
Completed 60000 requests
Completed 70000 requests
Completed 80000 requests
Completed 90000 requests
Completed 100000 requests
Finished 100000 requests


Server Software:
Server Hostname: localhost
Server Port: 12345

Document Path: /
Document Length: 19 bytes

Concurrency Level: 100
Time taken for tests: 21.980 seconds
Complete requests: 100000
Failed requests: 0
Write errors: 0
Total transferred: 6100000 bytes
HTML transferred: 1900000 bytes
Requests per second: 4549.62 [#/sec] (mean)
Time per request: 21.980 [ms] (mean)
Time per request: 0.220 [ms] (mean, across all concurrent requests)
Transfer rate: 271.02 [Kbytes/sec] received

Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.1 0 3
Processing: 1 22 3.1 21 30
Waiting: 1 22 3.1 21 30
Total: 3 22 3.1 21 30

Percentage of the requests served within a certain time (ms)
50% 21
66% 25
75% 25
80% 25
90% 25
95% 26
98% 27
99% 27
100% 30 (longest request)


This is not stellar, but I did not see yet where the bottlenecks are and if I can speed it up. Either way the 4.5K is still not too bad.

Update2:

with concurrency of 1000, I notice connection resets... After fixing up the mixing SIGPIPE handler of SIG_IGN, they still happen, apparently. Further debugging pending...

Saturday, June 25, 2011

Your IETF-writing will never be the same again

I've been reading up on SPDY today, and I noticed something interesting -
the spec at Mike Belshe's github repo has the extension ".xml".

So what, you say ? Given that it was rendered as the html output of the xml2rfc, I suspected there was XSLT in the game. And indeed, with one more search, I found the origin of this: Julian Reschke's work.

This is definitely very cool - it means you have to store one less format, and the results are immediately visible as you change the xml.

I will definitely give it a shot for the next draft to see if it lives up to my (inflated) expectations.

Friday, June 24, 2011

I came up with a new word today: faceboob

(v) to faceboob (someone): to earn money off the the humans' tendency to be curious, as well as to their natural tendency to trust their friends - specifically when using the social networks. Example use: "John Doe was faceboobed today."

Now, enough words, lets get to colorful pictures!

Prologue



You see this:

active_window_screenshot2011-06-23,23:44:06

Oh, boy! Aha! Free Boobz! Especially since one of your friends (or more!) shared this. Gotta be good!

Let's get another browser. But first lets investigate a bit - the link looks like an URL shortener.

active_window_screenshot2011-06-24,00:08:14

Yeah, it is chinese url shortener FTW! Better yet, if you go to home page it talks about a link being a spam and being deleted. This already rings the alarm bells, but let's see further - let's open up the link in the browser that is logged out of Facebook and see what happens.

active_window_screenshot2011-06-23,23:55:27

Whoa!!! Free boobz are just one click away! They need to verify the age - of course. Gonna be something really juicy. Better yet, the button is also helpfully localized. Thank you my friends! When we click this link, we get a popup to login to facebook.
Cool - so this is how it appeared in my friend feed. I'm the one with the OCD - the others simply tried to verify their age... oops. Anyway, lets close this annoying popup, I am not going to login anyway. Let's pretend that all happened in the background and see what happens further.

active_window_screenshot2011-06-23,23:56:05

Whoa! To verify my age I am going to get a chance to win an iPad ? Unbelievable! I am a lucky man today! iPads and iBoobz are falling from the sky! Let's click! I want a MacBook Air though - because I already have an iPad.

active_window_screenshot2011-06-23,23:47:30

The question asks something like "is it possible to play the videos on the MacBook Pro?" First, what a stupid question. Of course even the kid knows it is possible. Second, you are supposed to ask about the MacBook Air - I don't want a MacBook Pro!
Anyway, let's answer "Ja" - and go forward, towards the free gear and free boobz!
(the eye notices something in the right-top corner, but quickly dismisses it as uninteresting).

active_window_screenshot2011-06-23,23:47:36

Ahhah. Now the top-right thing starts to make sense. 1 euro for SMS sent, 1 euro for SMS received. This is an IQ test - so I enter my phone number, such that they send me an SMS and charge 1 euro. Brilliant, lets stick in some random number that is not a valid phone# and move on. I guess this is the time for the real age verification.

active_window_screenshot2011-06-23,23:53:36

Congratulations! You've been charged 1 euro for receiving an SMS! Now please also help us charge you another euro for sending an SMS! I start to be a bit unhappy about the whole situation - 2 euros for boobz aint really free anymore! I start to suspect the evil plot in this. They want to rip me off - let's go back to the first tab where they had at least the background pic.

An unpleasant surprise awaits me back there, there's no flirt at all on that page anymore. Instead, it coldly notifies:

active_window_screenshot2011-06-23,23:47:07

(Note that the text "Not completed" periodically changes into "checking..." - which makes a great illusion that the code *actually* checks whether you have completed the darn survey or not). The boobz are in the background - go and complete this survey, ASAP!

Well, there is a slight chance of a bug. And the page admits it. It offers to complete another survey if this survey does not bring the keys to the kingdom. Let's go and see about the iPad - two is better than one, after all... This one starts in a rather cheerful manner:

active_window_screenshot2011-06-23,23:44:58

It prompts me to select my favourite color! Yay! I prefer black.

active_window_screenshot2011-06-23,23:45:07

Wow, now is the real quiz: it asks which icon corresponds to the Facetime. That was easy. *click*

active_window_screenshot2011-06-23,23:45:17


Meeeeeh. This smiling lady promises me a charge of another 1 euro, and still no boobz!

I think I am going to give up here - no free boobz today. I'll better go read slashdot.

The practical takeaways from this little exercise, to avoid being faceboobed, remember these three simple rules:

  1. There is no free lunch boobz.

  2. Be bear aware! Always check where the links lead, even if it is your friend posting them.
  3. Use condoms.Always check them in a separate browser that is fully patched and is NOT logged into the site that you got the link from.



Remember, you are the target too.

And, unlike the computers, your antivirus is not auto-updated.

Keep this in mind.

update: oh, I forgot. When closing the browser, I get this popup:

active_window_screenshot2011-06-23,23:54:09

Now, this is called a plain greed, my friends.

Don't leave your mobile numbers to these folks. They are too greedy to rip the money off you.

And if you really really wanna boobz - here they are. (warning: age restricted, may be NSFW for you): not a faceboobed link.

Monday, June 20, 2011

Your account has been compromised. Will you finally stop reusing the passwords ?

Logging into gmail in the morning, I've got a prompt to change my password, which mentioned suspicious activity with my account. When logging in, I saw the possible reason why.

The MtGox bitcoin "exchange", where I created account to see what it is all look like, has been hacked, and the database has been leaked. Bummer. I even found my email and the user id on pastebin.com, alongside with the hash of my password. This can't be good, can it ?

Well, except since some while I started to be more pedantic about having separate passwords for different sites, no exceptions. My online passwords are 40-characters random hex strings, different for each site. This meant a sigh of relief - the data that the miscreants obtained is of minimal impact to other sites that I use.

So, the unique passwords saved my bacon today. If you reuse your passwords on more than one site, you should think a second time what would happen with your accounts in the occurence like this. If you are a windows user, you could get the Password Safe program by Bruce Schneier - it's open source and free. In this day and age, you should avoid reusing the passwords, period.

I would also use this chance to describe how my "separate password" strategy looks like.

There are two components of it:


  1. A program called "sha1sum" - which exists on most of the *nix versions, and essentially just calculates a cryptographic one-time hash function.


  2. A text file with the editor which stores the material for my passwords, in plain text.Each line of the text file contains two strings: the site name and a random string. I choose the random string to be sufficiently long to be difficult to guess / brute force (some 30 characters or so).


When I need to get the password for a website, I open up the file and find the line that is corresponding to that website.

Then I start up the "sha1sum" program, it starts to await for my input. I copypaste the site name and a memorable "master password" - that I do not write down anywhere. This is the thing that allows me to not worry extremely much about the safety of the file with the key material (alone it would not give out an easy target. Even though of course I do not put it out in plain view)

After I entered both strings on one line, I press enter and then Ctrl-D - this signals an end of input for the sha1sum program, and it spits out a 40-character hex number at me.

Great, now it is time to launch the sha1sum again - this time I copy-paste the hex number from the previous run, and then the "random text string" from the file.
After that I press Enter and then Ctrl-D - getting another long hex string as a result.

This is my password that I now copy-paste into the web site that I need to login to.
Here's how it looks like:


ayourtch@ayourtch-lnx:~$ grep gmail p-material
gmail 1243pyupqwe,jl23hl23khjkh23khpw'@
ayourtch@ayourtch-lnx:~$ sha1sum
gmail this is my secret phrase
f3c446b01b24022c136bde50d32a1f9d4e9cd7fb -
ayourtch@ayourtch-lnx:~$ sha1sum
f3c446b01b24022c136bde50d32a1f9d4e9cd7fb 1243pyupqwe,jl23hl23khjkh23khpw'@
8e485f361bfe1834d281209121eb5d4a8b52bcb9 -
ayourtch@ayourtch-lnx:~$


The bold string is the password to copy-paste into the login screen. (NB: of course the data in the example above is not real, it's my mockup just to illustrate the principle ;-)

While this scheme is certainly not pixel-perfect cryptography, it gives certain advantages:

  • Even if the service is dumb enough to store the passwords in plain text - the attacker does not gain much when they hack it. They will just get the password to a particular site - but they would not learn much how did you derive this essentially random string.

  • Even if the attacker gets the file where the strings are written down - they do not gain much, as they would need to know the "master password" in order to create correct passwords for any of the web sites.

  • It is simple, light, and independent from the vendor (browser, etc.) - practically anywhere there is a sha1sum - so I only need a (tiny) file with site names/strings to carry around. The smaller the data to secure, the easier it is to secure it.


Of course, this also has a couple of disadvantages:

  • entering the passwords is a bit annoying. You need to do some manual operation.

  • entering the passwords is a bit annoying. This is the iPad version.



However, I value my peaceful sleep much more than the annoyance of the process.

Do you value your sleep ? If you do - stop reusing the passwords before you lose it.