## Sunday, June 28, 2009

### EM and etc.

OK, i admit that it's the Nth time that I studied EM algorithm all over again. But I don't quite blame myself when N<=10. I follows wikipedia page but didn't finish reading that. So were some other materials like this. Two most unbearable things related: different set of notations, and theory without concrete and suitable examples.

So what is EM? Well, i know its Expectation and Maximization... Well, firstly it's a way to do MLE. Actually it is said to be able to do MAP as well. EM promises to find a local minimum but cannot always find global minimum if the distribution is multi-modal. I just realized the derivation from our course advisor Carlos Guestrin very useful... I must be misunderstanding him for a long tim.

In short, we want to:
arg max(theta) (Data Likelihood), or

arg max(theta) (Log P(data))

And assume we have some observable variable notated by X and some hidden variable notated by Z, then a two-step EM procedure is defined as:

First, set initial theta randomly. Then, for each iteration:

E-step: Calculate
Q(t+1)(z|xj) = P(Z|Xj, theta(t))

M-step:
arg max(theta) [sigma(j) sigma(z) Q(t+1)(z|xj) logP(z,xj|theta)]

Several points here. First, after E-step, the Q(t+1)(z|xj) is settled, and we can consider it as a constant in M-step; Second, i think we really need some examples to get a clearer idea of what those formula means. I'll give two examples here, but i'll just explain the variable, rather than go through the whole calculations.

Example1 Gaussian Mixture Model (GMM)
The most popular example.

Assume that we have 2 (can be >2) gaussian distribution, each with mu (central point) and sigma (variance). Assume each point is generated by the following steps: 1. select which gaussian distribution it comes from (by another variables tau1, and tau2=1-tau1) and 2. place the point according to corresponding distribution. Then we want to estimate all above parameters (mu, sigma, tau, which constitutes theta). X here are each data points, and Z is which Gaussian distribution each data point belongs to, i.e., P(Z=1|x1), P(Z=2|x1), P(Z=1|x2)...

So first step is just choose random values of theta as initial values.

E-step: Calculate
P(Z|xj, theta) for each x and z value. Btw, this step is always simple because we can usually use Bayesian formula here.

M-step: Now we know all the P(Z|xj, theta) values. Just maximize the function by corresponding mu, sigma and tau.

The most simple scenario is that, we can directly calculate the partial derivative of each parameter, and set it to zero to find the value of parameter that minimize the function.

From section3. Two things i want to mention here:

1. Z here is a combination of (Zx,Zy). So the E-step actually calculates P(Zx,Zy|xi,yi,ri) for each rating i and each Zx, Zy.

2. At M-step, we need to use Lagrange multiplier to estimate parameter.

That's probably it. I'll keep updating if i find anything more interesting.

## Sunday, June 7, 2009

### Hash & VC6 Release Mode

VC6的release mode设置在 Build -> Set Action Configuration
* To do: 分析一下大规模数据的hash Vs BST。我老板说，大规模数据下hash根本没法用。听符老大的观点似乎hash在大数据量的时候反而更有优势。做一做

### 64-bit integer under VC6

using __int64 and unsigned __int64

generally it works fine (as you're using ordinary int / unsigned int), and the only problem i encountered until now is that, you cannot use cout<< (cin) to output (input) them, which means that you need to handle them seperately...

### Word, backspace doesn't work, but delete works

For example, if you highlight a text, want to delete it, but backspace key doens't work (so you can only use e.g. delete key), then:

Word 2003:
Tools -> Options -> check "Typing replaces selected text"

Word 2007:
Start -> Word Options -> check "Typing replaces selected text"