<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	>
<channel>
	<title>Comments for Code Blip</title>
	<atom:link href="http://code.blip.pt/comments/feed/" rel="self" type="application/rss+xml" />
	<link>http://code.blip.pt</link>
	<description>A place to share and discuss problems and solutions we find along the way.</description>
	<pubDate>Sun, 05 Feb 2012 12:07:26 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.7.1</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>Comment on K-Means Clustering in PHP by José</title>
		<link>http://code.blip.pt/2009/04/06/k-means-clustering-in-php/comment-page-1/#comment-132</link>
		<dc:creator>José</dc:creator>
		<pubDate>Sat, 23 May 2009 23:22:38 +0000</pubDate>
		<guid isPermaLink="false">http://code.blip.pt/?p=106#comment-132</guid>
		<description>I think you misread the diff in my previous comment. The function that recalculates $cPositions returns the updated array, as it should:

&lt;pre lang="php"&gt;
function kmeans_recalculate_cpositions($cPositions, $data, $clusters)
{
        $kValues = kmeans_get_cluster_values($clusters, $data);
        foreach($cPositions as $k =&gt; $position)
        {
                $cPositions[$k] = empty($kValues[$k]) ? 0 : kmeans_avg($kValues[$k]);
        }
        return $cPositions;
}
&lt;/pre&gt;

If you copy the updated code you should get the correct results; however, for the 1,1,1,1,1,2,20,20,20,20 data set you will still get only 2 clusters most of the time.</description>
		<content:encoded><![CDATA[<p>I think you misread the diff in my previous comment. The function that recalculates $cPositions returns the updated array, as it should:</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">function</span> kmeans_recalculate_cpositions<span style="color: #009900;">&#40;</span><span style="color: #000088;">$cPositions</span><span style="color: #339933;">,</span> <span style="color: #000088;">$data</span><span style="color: #339933;">,</span> <span style="color: #000088;">$clusters</span><span style="color: #009900;">&#41;</span>
<span style="color: #009900;">&#123;</span>
        <span style="color: #000088;">$kValues</span> <span style="color: #339933;">=</span> kmeans_get_cluster_values<span style="color: #009900;">&#40;</span><span style="color: #000088;">$clusters</span><span style="color: #339933;">,</span> <span style="color: #000088;">$data</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        <span style="color: #b1b100;">foreach</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$cPositions</span> <span style="color: #b1b100;">as</span> <span style="color: #000088;">$k</span> <span style="color: #339933;">=&gt;</span> <span style="color: #000088;">$position</span><span style="color: #009900;">&#41;</span>
        <span style="color: #009900;">&#123;</span>
                <span style="color: #000088;">$cPositions</span><span style="color: #009900;">&#91;</span><span style="color: #000088;">$k</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #990000;">empty</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$kValues</span><span style="color: #009900;">&#91;</span><span style="color: #000088;">$k</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span> ? <span style="color: #cc66cc;">0</span> <span style="color: #339933;">:</span> kmeans_avg<span style="color: #009900;">&#40;</span><span style="color: #000088;">$kValues</span><span style="color: #009900;">&#91;</span><span style="color: #000088;">$k</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
        <span style="color: #009900;">&#125;</span>
        <span style="color: #b1b100;">return</span> <span style="color: #000088;">$cPositions</span><span style="color: #339933;">;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>If you copy the updated code you should get the correct results; however, for the 1,1,1,1,1,2,20,20,20,20 data set you will still get only 2 clusters most of the time.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on K-Means Clustering in PHP by Didac</title>
		<link>http://code.blip.pt/2009/04/06/k-means-clustering-in-php/comment-page-1/#comment-127</link>
		<dc:creator>Didac</dc:creator>
		<pubDate>Sat, 23 May 2009 09:07:47 +0000</pubDate>
		<guid isPermaLink="false">http://code.blip.pt/?p=106#comment-127</guid>
		<description>By the way, in the function kmeans_recalculate_cpositions() there is a variable called $cPosition.

If you check your code, you will see that you want to recalculate the cPositions, but you make all the changes to $cPosition, and this variable is never returned or copied to $cPositions.

I guess that the variable called $cPosition should be called $cPositions.

At least, I made that change and now the code runs properly. With this change, the empty clusters do not really disappear; they keep the cPosition value they had before, in case some objects can later move to this cluster.

I tried assigning random initial cPosition values, and sometimes I get repeated values, let's say 4, 4 and 10.

For the example with the set
1,1,1,1,1,2,10,10,10,10
after one iteration, the cPositions become
1.111, 4 and 10

Even though that's not the best example, it still gives a chance to split the values into 3 clusters.</description>
		<content:encoded><![CDATA[<p>By the way, in the function kmeans_recalculate_cpositions() there is a variable called $cPosition.</p>
<p>If you check your code, you will see that you want to recalculate the cPositions, but you make all the changes to $cPosition, and this variable is never returned or copied to $cPositions.</p>
<p>I guess that the variable called $cPosition should be called $cPositions.</p>
<p>At least, I made that change and now the code runs properly. With this change, the empty clusters do not really disappear; they keep the cPosition value they had before, in case some objects can later move to this cluster.</p>
<p>I tried assigning random initial cPosition values, and sometimes I get repeated values, let&#8217;s say 4, 4 and 10.</p>
<p>For the example with the set<br />
1,1,1,1,1,2,10,10,10,10<br />
after one iteration, the cPositions become<br />
1.111, 4 and 10</p>
<p>Even though that&#8217;s not the best example, it still gives a chance to split the values into 3 clusters.</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on K-Means Clustering in PHP by José</title>
		<link>http://code.blip.pt/2009/04/06/k-means-clustering-in-php/comment-page-1/#comment-122</link>
		<dc:creator>José</dc:creator>
		<pubDate>Fri, 22 May 2009 22:55:36 +0000</pubDate>
		<guid isPermaLink="false">http://code.blip.pt/?p=106#comment-122</guid>
		<description>Hi Didac

Thanks for your comment. 

You are right, the code I posted was removing the clusters with no values assigned to them. This can be fixed by making some changes to the function that recalculates the cluster positions:

&lt;pre lang="php"&gt;
function kmeans_recalculate_cpositions($cPositions, $data, $clusters)
 {
 	$kValues = kmeans_get_cluster_values($clusters, $data);
-	foreach($kValues as $k =&gt; $values)
+	foreach($cPositions as $k =&gt; $position)
 	{
-		$cPosition[$k] = kmeans_avg($values);
+		$cPositions[$k] = empty($kValues[$k]) ? rand(min($data), max($data)) : kmeans_avg($kValues[$k]);
 	}
 	return $cPositions;
 }
&lt;/pre&gt;

In this case, if a cluster is empty I just pick a random position in the data set and wait for the next iteration.

For the data set 1,1,1,1,1,1,1,2,20,20,20,20,20, finding the clusters 1s, 2 and 20s would be the optimal solution, but with this algorithm you will, most of the time, be stuck with a suboptimal solution: the one with only 2 clusters, 1s-2 and 20s.

To find the optimal solution, you would have to improve the algorithm a lot. You would need to create a function to evaluate the quality of your solution, for example assuming that a solution where all clusters have values is better than a solution where some clusters are empty, and optimize it. 
&lt;a href="http://en.wikipedia.org/wiki/Simulated_annealing" rel="nofollow"&gt;Simulated annealing&lt;/a&gt;, for example, is a technique to optimize such functions - "...At each step, the SA heuristic considers some neighbour s' of the current state s, and probabilistically decides between moving the system to state s' or staying in state s. The probabilities are chosen so that the system ultimately tends to move to states of lower energy. Typically this step is repeated until the system reaches a state that is good enough for the application, or until a given computation budget has been exhausted."

I hope this helps and thanks for visiting our blog,

José</description>
		<content:encoded><![CDATA[<p>Hi Didac</p>
<p>Thanks for your comment. </p>
<p>You are right, the code I posted was removing the clusters with no values assigned to them. This can be fixed by making some changes to the function that recalculates the cluster positions:</p>

<div class="wp_syntax"><div class="code"><pre class="php" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">function</span> kmeans_recalculate_cpositions<span style="color: #009900;">&#40;</span><span style="color: #000088;">$cPositions</span><span style="color: #339933;">,</span> <span style="color: #000088;">$data</span><span style="color: #339933;">,</span> <span style="color: #000088;">$clusters</span><span style="color: #009900;">&#41;</span>
 <span style="color: #009900;">&#123;</span>
 	<span style="color: #000088;">$kValues</span> <span style="color: #339933;">=</span> kmeans_get_cluster_values<span style="color: #009900;">&#40;</span><span style="color: #000088;">$clusters</span><span style="color: #339933;">,</span> <span style="color: #000088;">$data</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #339933;">-</span>	<span style="color: #b1b100;">foreach</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$kValues</span> <span style="color: #b1b100;">as</span> <span style="color: #000088;">$k</span> <span style="color: #339933;">=&gt;</span> <span style="color: #000088;">$values</span><span style="color: #009900;">&#41;</span>
<span style="color: #339933;">+</span>	<span style="color: #b1b100;">foreach</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$cPositions</span> <span style="color: #b1b100;">as</span> <span style="color: #000088;">$k</span> <span style="color: #339933;">=&gt;</span> <span style="color: #000088;">$position</span><span style="color: #009900;">&#41;</span>
 	<span style="color: #009900;">&#123;</span>
<span style="color: #339933;">-</span>		<span style="color: #000088;">$cPosition</span><span style="color: #009900;">&#91;</span><span style="color: #000088;">$k</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> kmeans_avg<span style="color: #009900;">&#40;</span><span style="color: #000088;">$values</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
<span style="color: #339933;">+</span>		<span style="color: #000088;">$cPositions</span><span style="color: #009900;">&#91;</span><span style="color: #000088;">$k</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #990000;">empty</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$kValues</span><span style="color: #009900;">&#91;</span><span style="color: #000088;">$k</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span> ? <span style="color: #990000;">rand</span><span style="color: #009900;">&#40;</span><span style="color: #990000;">min</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$data</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">,</span> <span style="color: #990000;">max</span><span style="color: #009900;">&#40;</span><span style="color: #000088;">$data</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">:</span> kmeans_avg<span style="color: #009900;">&#40;</span><span style="color: #000088;">$kValues</span><span style="color: #009900;">&#91;</span><span style="color: #000088;">$k</span><span style="color: #009900;">&#93;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
 	<span style="color: #009900;">&#125;</span>
 	<span style="color: #b1b100;">return</span> <span style="color: #000088;">$cPositions</span><span style="color: #339933;">;</span>
 <span style="color: #009900;">&#125;</span></pre></div></div>

<p>In this case, if a cluster is empty I just pick a random position in the data set and wait for the next iteration.</p>
<p>For the data set 1,1,1,1,1,1,1,2,20,20,20,20,20, finding the clusters 1s, 2 and 20s would be the optimal solution, but with this algorithm you will, most of the time, be stuck with a suboptimal solution: the one with only 2 clusters, 1s-2 and 20s.</p>
<p>To find the optimal solution, you would have to improve the algorithm a lot. You would need to create a function to evaluate the quality of your solution, for example assuming that a solution where all clusters have values is better than a solution where some clusters are empty, and optimize it.<br />
<a href="http://en.wikipedia.org/wiki/Simulated_annealing" rel="nofollow">Simulated annealing</a>, for example, is a technique to optimize such functions - &#8220;&#8230;At each step, the SA heuristic considers some neighbour s&#8217; of the current state s, and probabilistically decides between moving the system to state s&#8217; or staying in state s. The probabilities are chosen so that the system ultimately tends to move to states of lower energy. Typically this step is repeated until the system reaches a state that is good enough for the application, or until a given computation budget has been exhausted.&#8221;</p>
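<p>A minimal sketch of one possible quality function, assuming the score is the sum of squared distances from each value to the mean of its cluster (lower is better). The name kmeans_solution_cost and the nested-array input format are hypothetical and not part of the posted code; an optimizer such as simulated annealing could try to minimize a score like this one.</p>

```php
<?php
// Hypothetical scoring function (not part of the original post):
// sum of squared distances from each value to the mean of its cluster.
// Lower is better. Empty clusters contribute nothing to the sum.
function kmeans_solution_cost(array $clusters): float
{
    $cost = 0.0;
    foreach ($clusters as $values) {
        if (empty($values)) {
            continue; // an empty cluster has no mean and no members
        }
        $mean = array_sum($values) / count($values);
        foreach ($values as $value) {
            $cost += ($value - $mean) ** 2;
        }
    }
    return $cost;
}

// The two candidate solutions for 1,1,1,1,1,1,1,2,20,20,20,20,20
$twoClusters = [
    [1, 1, 1, 1, 1, 1, 1, 2],
    [],
    [20, 20, 20, 20, 20],
];
$threeClusters = [
    [1, 1, 1, 1, 1, 1, 1],
    [2],
    [20, 20, 20, 20, 20],
];

echo kmeans_solution_cost($twoClusters), "\n";
echo kmeans_solution_cost($threeClusters), "\n";
```

<p>Under this score the three-cluster solution costs less than the two-cluster one, so it would be preferred.</p>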
<p>I hope this helps and thanks for visiting our blog,</p>
<p>José</p>
]]></content:encoded>
	</item>
	<item>
		<title>Comment on K-Means Clustering in PHP by Didac</title>
		<link>http://code.blip.pt/2009/04/06/k-means-clustering-in-php/comment-page-1/#comment-117</link>
		<dc:creator>Didac</dc:creator>
		<pubDate>Mon, 18 May 2009 12:17:34 +0000</pubDate>
		<guid isPermaLink="false">http://code.blip.pt/?p=106#comment-117</guid>
		<description>You have a small problem in your code, I guess.

If one of the clusters does not get any value, that cluster is deleted outright.
You could ask for 45 clusters, but if in one step only 30 clusters get values, your function kmeans_get_cluster_values() will reduce the array to just 30 clusters, and in the following steps you don't search for 45 clusters anymore. What about the cluster with mean=0?

What if you want 3 clusters and you have the dataset:
1,1,1,1,1,1,1,2,20,20,20,20,20

The result should be 1, 2 and 20, but since the initial representative values are 1, 10 and 20, in the second step the 1s and the 2 go to 1, the 20s go to 20, the three-cluster array becomes a two-cluster array, and no more changes are made.</description>
		<content:encoded><![CDATA[<p>You have a small problem in your code, I guess.</p>
<p>If one of the clusters does not get any value, that cluster is deleted outright.<br />
You could ask for 45 clusters, but if in one step only 30 clusters get values, your function kmeans_get_cluster_values() will reduce the array to just 30 clusters, and in the following steps you don&#8217;t search for 45 clusters anymore. What about the cluster with mean=0?</p>
<p>What if you want 3 clusters and you have the dataset:<br />
1,1,1,1,1,1,1,2,20,20,20,20,20</p>
<p>The result should be 1, 2 and 20, but since the initial representative values are 1, 10 and 20, in the second step the 1s and the 2 go to 1, the 20s go to 20, the three-cluster array becomes a two-cluster array, and no more changes are made.</p>
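<p>The assignment step for this data set can be sketched as follows. This is an illustration only: nearest_centroid_assignments is a hypothetical helper, not a function from the post, and it assumes one-dimensional data with absolute difference as the distance.</p>

```php
<?php
// Hypothetical helper (not from the original post): map each value to the
// key of its nearest centroid, using absolute difference as the distance.
function nearest_centroid_assignments(array $data, array $centroids): array
{
    $assignments = [];
    foreach ($data as $i => $value) {
        $best = null;
        $bestDistance = INF;
        foreach ($centroids as $k => $position) {
            $distance = abs($value - $position);
            if ($distance < $bestDistance) {
                $bestDistance = $distance;
                $best = $k;
            }
        }
        $assignments[$i] = $best;
    }
    return $assignments;
}

$data = [1, 1, 1, 1, 1, 1, 1, 2, 20, 20, 20, 20, 20];
$centroids = [1, 10, 20]; // the initial representative values from the example

// Count how many values each centroid receives.
$sizes = array_fill_keys(array_keys($centroids), 0);
foreach (nearest_centroid_assignments($data, $centroids) as $k) {
    $sizes[$k]++;
}
print_r($sizes);
```

<p>The centroid at 10 receives no values, so any code that only keeps clusters with members will drop it.</p>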
]]></content:encoded>
	</item>
	<item>
		<title>Comment on K-Means Clustering in PHP by aldo</title>
		<link>http://code.blip.pt/2009/04/06/k-means-clustering-in-php/comment-page-1/#comment-82</link>
		<dc:creator>aldo</dc:creator>
		<pubDate>Thu, 14 May 2009 08:27:17 +0000</pubDate>
		<guid isPermaLink="false">http://code.blip.pt/?p=106#comment-82</guid>
		<description>thanks for your code</description>
		<content:encoded><![CDATA[<p>thanks for your code</p>
]]></content:encoded>
	</item>
</channel>
</rss>

