<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>A Random Walk</title>
	<atom:link href="http://larkin.nuclearwinter.com/blog/index.php/feed/" rel="self" type="application/rss+xml" />
	<link>http://larkin.nuclearwinter.com/blog</link>
	<description>From Disorder and Chaos, comes more Disorder and Chaos</description>
	<lastBuildDate>Tue, 27 Mar 2012 00:01:22 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1</generator>
		<item>
		<title>Adding timestamps to each line of output</title>
		<link>http://larkin.nuclearwinter.com/blog/index.php/2012/03/adding-timestamps-to-each-line-of-output/</link>
		<comments>http://larkin.nuclearwinter.com/blog/index.php/2012/03/adding-timestamps-to-each-line-of-output/#comments</comments>
		<pubDate>Tue, 27 Mar 2012 00:01:22 +0000</pubDate>
		<dc:creator>llowrey</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://larkin.nuclearwinter.com/blog/?p=77</guid>
		<description><![CDATA[Have a long running script or command? Would you like to know where it&#8217;s spending its time? The tool to use is ts which is distributed as part of moreutils. If you&#8217;re using a Red Hat product (Fedora / Red Hat Enterprise Linux, CentOS, etc) you can get the ts command by installing the moreutils [...]]]></description>
			<content:encoded><![CDATA[<p>Have a long-running script or command? Would you like to know where it&#8217;s spending its time?</p>
<p>The tool to use is ts, which is distributed as part of <a href="http://kitenet.net/~joey/code/moreutils/">moreutils</a>.</p>
<p>If you&#8217;re using a Red Hat product (Fedora, Red Hat Enterprise Linux, CentOS, etc.) you can get the ts command by installing the moreutils rpm via yum.</p>
<p><code>yum install moreutils</code></p>
<p>Usage is simple. Pipe the output from your script/command to ts and, optionally, supply a format spec. The example below prints the hour of the day (%H, 24-hour clock), the minute (%M), and the second with a microsecond fraction (%.S). I added the pipe character (|) to the format just to make the output easier to read.</p>
<p><code><br />
$ ping google.com | ts '%H:%M:%.S |'<br />
18:51:47.874676 | PING google.com (74.125.227.96) 56(84) bytes of data.<br />
18:51:47.874889 | 64 bytes from dfw06s16-in-f0.1e100.net (74.125.227.96): icmp_req=1 ttl=55 time=18.0 ms<br />
18:51:48.860208 | 64 bytes from dfw06s16-in-f0.1e100.net (74.125.227.96): icmp_req=2 ttl=55 time=16.3 ms<br />
18:51:49.862409 | 64 bytes from dfw06s16-in-f0.1e100.net (74.125.227.96): icmp_req=3 ttl=55 time=17.0 ms<br />
</code></p>
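<p>For readers without moreutils, the idea behind ts can be sketched in a few lines of Python. This is only an illustration of the technique (prefix each line with the time it was read), not how ts itself is implemented. The function name <code>stamp_lines</code> and its parameters are hypothetical, and Python&#8217;s %f microsecond field stands in for the fraction that ts writes with %.S.</p>

```python
from datetime import datetime


def stamp_lines(lines, fmt="%H:%M:%S.%f |", now=datetime.now):
    """Yield each input line prefixed with the time it was read.

    fmt is a strftime format string. The now parameter exists only so a
    fixed clock can be injected when experimenting.
    """
    for line in lines:
        stamp = now().strftime(fmt)
        yield stamp + " " + line.rstrip("\n")
```

<p>Reading from a pipe would then look like <code>for s in stamp_lines(sys.stdin): print(s, flush=True)</code> after an <code>import sys</code>.</p>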
]]></content:encoded>
			<wfw:commentRss>http://larkin.nuclearwinter.com/blog/index.php/2012/03/adding-timestamps-to-each-line-of-output/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Grub2 and RAID 0.90 Superblocks</title>
		<link>http://larkin.nuclearwinter.com/blog/index.php/2012/03/grub2-and-raid-0-90-superblocks/</link>
		<comments>http://larkin.nuclearwinter.com/blog/index.php/2012/03/grub2-and-raid-0-90-superblocks/#comments</comments>
		<pubDate>Tue, 13 Mar 2012 20:32:54 +0000</pubDate>
		<dc:creator>llowrey</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://larkin.nuclearwinter.com/blog/?p=69</guid>
		<description><![CDATA[Grub2 (currently v1.99) allows for booting from mdraid devices. That&#8217;s the good news. The bad news is that its auto-detection mechanism cannot differentiate between whole disk raid devices and raid partitions that end at or near the last sector of the disk when version 0.90 superblocks are used. The problem is that a version 0.90 [...]]]></description>
			<content:encoded><![CDATA[<p>Grub2 (currently v1.99) allows for booting from mdraid devices. That&#8217;s the good news. The bad news is that its auto-detection mechanism cannot differentiate between whole disk raid devices and raid partitions that end at or near the last sector of the disk when version 0.90 superblocks are used.</p>
<p>The problem is that a version 0.90 superblock is placed at the end of the device (disk or partition, as the case may be). If you have a partition that extends to the end of the disk, then both the disk and the partition would use the same superblock location.</p>
<p>Version 1.x superblocks are more sophisticated and no such ambiguity is possible, so we&#8217;re just concerned about legacy 0.90 superblocks.</p>
<p>Why should you care? You will care because grub2 will fail to boot, telling you something like:</p>
<blockquote><p>error: superfluous RAID member (2 found)</p></blockquote>
<p>What can you do about it?</p>
<ol>
<li>Convert to version 1.x superblocks. There is no direct way to do this and the methods that do exist are risky and will not be discussed here.</li>
<li>Properly position the end of your partitions.</li>
</ol>
<p>Option #1 is too risky (for most) so option #2 is your best bet.</p>
<p><a href="https://raid.wiki.kernel.org/articles/r/a/i/RAID_superblock_formats_fd05.html#The_version-0.90_Superblock_Format">Superblock Location</a></p>
<blockquote><p>The superblock is 4K long and is written into a 64K aligned block that starts at least 64K and less than 128K from the end of the device (i.e. to get the address of the superblock round the size of the device down to a multiple of 64K and then subtract 64K). The available size of each device is the amount of space before the super block, so between 64K and 128K is lost when a device in incorporated into an MD array.</p></blockquote>
<p>What should the last sector of the last partition be?</p>
<p>Example: 2TB drive with 3,907,029,168 sectors of 512 bytes each, so 64K is 128 sectors</p>
<p>Last 64K aligned sector: trunc(3,907,029,168 / 128) * 128 = 3,907,029,120<br />
Start of whole disk superblock: 3,907,029,120 &#8211; 128 = 3,907,028,992<br />
Last sector before whole disk superblock: 3,907,028,992 &#8211; 1 = 3,907,028,991</p>
<p>If your last partition ends on or before sector 3,907,028,991, you will not suffer from superfluous RAID members.</p>
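<p>The arithmetic above can be sketched in Python for other disk sizes. This is a minimal sketch that assumes 512-byte sectors; the function name <code>last_safe_sector</code> is hypothetical.</p>

```python
SECTOR_BYTES = 512                           # assumed sector size
ALIGN_SECTORS = 64 * 1024 // SECTOR_BYTES    # 64K expressed in sectors (128)


def last_safe_sector(total_sectors):
    """Return the last sector a partition may use without reaching the
    location of a whole-disk version 0.90 superblock."""
    aligned_end = (total_sectors // ALIGN_SECTORS) * ALIGN_SECTORS
    superblock_start = aligned_end - ALIGN_SECTORS
    return superblock_start - 1


print(last_safe_sector(3_907_029_168))  # 3907028991
```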
<p>Of course, if you have an existing array you will probably need to shrink the array, then remove, repartition (fdisk), and re-add each member one at a time.</p>
]]></content:encoded>
			<wfw:commentRss>http://larkin.nuclearwinter.com/blog/index.php/2012/03/grub2-and-raid-0-90-superblocks/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Droid 3 vs Droid 1 LCD Matrix</title>
		<link>http://larkin.nuclearwinter.com/blog/index.php/2011/07/droid-3-vs-droid-1-lcd-matrix/</link>
		<comments>http://larkin.nuclearwinter.com/blog/index.php/2011/07/droid-3-vs-droid-1-lcd-matrix/#comments</comments>
		<pubDate>Sat, 09 Jul 2011 18:10:43 +0000</pubDate>
		<dc:creator>llowrey</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://larkin.nuclearwinter.com/blog/?p=58</guid>
		<description><![CDATA[There has been much chatter and complaining about the &#8220;PenTile&#8221; displays used by Motorola in their latest phones. I decided to break out the old microscope to see what the big deal was all about. The photos above are of the digits of the time display in the notification area at the top. The photo [...]]]></description>
			<content:encoded><![CDATA[<p>There has been much chatter and complaining about the &#8220;PenTile&#8221; displays used by Motorola in their latest phones. I decided to break out the old microscope to see what the big deal was all about.</p>
<p><img src="http://larkin.nuclearwinter.com/blog/wp-content/uploads/2011/07/droid3lcd.png" alt="" title="droid3lcd" width="238" height="300" class="alignnone size-full wp-image-60" /> <img src="http://larkin.nuclearwinter.com/blog/wp-content/uploads/2011/07/droid1lcd-300x277.png" alt="" title="droid1lcd" width="300" height="277" class="alignnone size-medium wp-image-59" /></p>
<p>The photos above are of the digits of the time display in the notification area at the top of the screen. The photo on the left is the Droid 3 and the one on the right is the Droid 1. The Droid 3 utilizes four squares of (clockwise from upper left) Red, Green, White, and Blue. The Droid 1 utilizes three vertical bars of (left to right) Red, Green, and Blue.</p>
<p>PenTile Reference: <a href="http://www.nouvoyance.com/technology.html">www.nouvoyance.com/technology.html</a></p>
<table border="1">
<tr>
<th>Property</th>
<th>Droid 3</th>
<th>Droid 1</th>
</tr>
<tr>
<th>Diagonal</th>
<td align="right">4.0&#8243;</td>
<td align="right">3.7&#8243;</td>
</tr>
<tr>
<th>Aspect Ratio</th>
<td align="right">16:9</td>
<td align="right">16:9</td>
</tr>
<tr>
<th>Width (in, portrait)</th>
<td align="right">1.96&#8243;</td>
<td align="right">1.81&#8243;</td>
</tr>
<tr>
<th>Height (in, portrait)</th>
<td align="right">3.49&#8243;</td>
<td align="right">3.22&#8243;</td>
</tr>
<tr>
<th>Width (px, landscape)</th>
<td align="right">960</td>
<td align="right">854</td>
</tr>
<tr>
<th>Height (px, landscape)</th>
<td align="right">540</td>
<td align="right">480</td>
</tr>
<tr>
<th>PPI</th>
<td align="right">275.4</td>
<td align="right">264.8</td>
</tr>
</table>
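<p>The derived rows in the table (side lengths and PPI) follow from the diagonal, the aspect ratio, and the pixel counts. A minimal Python sketch of that arithmetic, with hypothetical function names:</p>

```python
import math


def ppi(width_px, height_px, diagonal_in):
    """Pixels per inch, measured along the diagonal of the panel."""
    return math.hypot(width_px, height_px) / diagonal_in


def side_lengths_in(diagonal_in, aspect_long=16, aspect_short=9):
    """Return (short side, long side) in inches for a given aspect ratio."""
    unit = diagonal_in / math.hypot(aspect_long, aspect_short)
    return aspect_short * unit, aspect_long * unit


print(round(ppi(960, 540, 4.0), 1))   # Droid 3: 275.4
print(round(ppi(854, 480, 3.7), 1))   # Droid 1: 264.8
```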
]]></content:encoded>
			<wfw:commentRss>http://larkin.nuclearwinter.com/blog/index.php/2011/07/droid-3-vs-droid-1-lcd-matrix/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Powernow-k8 AMD Turbo Core Bug</title>
		<link>http://larkin.nuclearwinter.com/blog/index.php/2010/05/powernow-k8-amd-turbo-core-bug/</link>
		<comments>http://larkin.nuclearwinter.com/blog/index.php/2010/05/powernow-k8-amd-turbo-core-bug/#comments</comments>
		<pubDate>Mon, 03 May 2010 16:57:47 +0000</pubDate>
		<dc:creator>llowrey</dc:creator>
				<category><![CDATA[Linux]]></category>

		<guid isPermaLink="false">http://larkin.nuclearwinter.com/blog/?p=34</guid>
		<description><![CDATA[I just recently replaced my quad-core Phenom II x4 955 with a hexa-core Phenom II x6 1090T. The &#8216;T&#8217; designation in the model number indicates that the processor supports AMD&#8217;s Turbo Core functionality which allows the processor to &#8216;overclock&#8217; up to three cores under certain circumstances. The default maximum core clock is 3.2GHz, but when [...]]]></description>
			<content:encoded><![CDATA[<p>I just recently replaced my quad-core Phenom II x4 955 with a hexa-core Phenom II x6 1090T. The &#8216;T&#8217; designation in the model number indicates that the processor supports AMD&#8217;s Turbo Core functionality which allows the processor to &#8216;overclock&#8217; up to three cores under certain circumstances. The default maximum core clock is 3.2GHz, but when three or more cores are idle, Turbo Core can boost the remaining cores to 3.6GHz. Read more on Turbo Core <a href="http://www.anandtech.com/show/3641/amd-divulges-phenom-ii-x6-secrets-turbo-core-enabled">here</a>.</p>
<p>Linux kernel 2.6.33.3 and the 1090T do not play nice together. The powernow-k8 driver does not properly detect the CPU&#8217;s p-states. A kernel patch has been committed to address this (see <a href="http://git.kernel.org/?p=linux/kernel/git/x86/linux-2.6-tip.git;a=commitdiff;h=b810e94c9d8e3fff6741b66cd5a6f099a7887871">this</a>). Linux Magazine has an <a href="http://www.linux-magazine.com/Online/News/Phenom-II-X6-Performance-Under-Linux-Below-Expectations">article</a> that describes the problem and the upcoming patch, and recommends disabling Cool&#8217;n&#8217;Quiet until a patched kernel is released.</p>
<p>UPDATE: Kernel version 2.6.33.4, released Wed May 12 2010, resolves this issue.</p>
<p><span id="more-34"></span>AMD uses ACPI trickery to implement Turbo Core. Ordinarily, p-state 0 is the highest performance mode of the processor (highest clock), p-state 1 is slower than p-state 0, p-state 2 is slower than p-state 1, and so on. On a Turbo Core processor, hardware p-state 0 is the &#8216;overclocked&#8217; value. In the case of the 1090T this is 3.6GHz. Turbo Core exposes hardware p-state 1 as software p-state 0 to the operating system kernel. Essentially, each p-state number the operating system sees is one less than the corresponding hardware p-state number.</p>
<table border="1">
<tbody>
<tr>
<th>Clock Speed</th>
<th>Hardware</th>
<th>Software</th>
</tr>
<tr>
<td align="right">3600MHz</td>
<td align="center">p-state 0</td>
<td align="center">n/a</td>
</tr>
<tr>
<td align="right">3200MHz</td>
<td align="center">p-state 1</td>
<td align="center">p-state 0</td>
</tr>
<tr>
<td align="right">2400MHz</td>
<td align="center">p-state 2</td>
<td align="center">p-state 1</td>
</tr>
<tr>
<td align="right">1600MHz</td>
<td align="center">p-state 3</td>
<td align="center">p-state 2</td>
</tr>
<tr>
<td align="right">800MHz</td>
<td align="center">p-state 4</td>
<td align="center">p-state 3</td>
</tr>
</tbody>
</table>
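<p>The renumbering in the table above can be expressed as a short Python sketch. It only illustrates the mapping; it is not how the kernel or the BIOS implements it, and the names <code>HARDWARE_PSTATES_MHZ</code>, <code>software_pstates</code>, and <code>boost_states</code> are hypothetical.</p>

```python
# Hardware p-state frequencies in MHz for the 1090T, from the table above.
HARDWARE_PSTATES_MHZ = [3600, 3200, 2400, 1600, 800]


def software_pstates(hardware_mhz, boost_states=1):
    """Map software p-state numbers to frequencies by hiding boost states.

    boost_states is the number of leading hardware p-states reserved for
    Turbo Core (one on the 1090T, per the table).
    """
    visible = hardware_mhz[boost_states:]
    return dict(enumerate(visible))


print(software_pstates(HARDWARE_PSTATES_MHZ))
# {0: 3200, 1: 2400, 2: 1600, 3: 800}
```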
<p>That&#8217;s the theory, now for the practice. The following is what the powernow-k8 driver in kernel 2.6.33.3 reports:</p>
<pre>powernow-k8: Found 1 AMD Phenom(tm) II X6 1090T Processor processors (6 cpu cores) (version 2.20.00)
powernow-k8:    0 : pstate 0 (3600 MHz)
powernow-k8:    1 : pstate 1 (3200 MHz)
powernow-k8:    2 : pstate 2 (2400 MHz)
powernow-k8:    3 : pstate 3 (1600 MHz)</pre>
<p>It knows it should only see 4 p-states, but it is seeing the hardware p-states, not the software p-states. I have posted bug <a href="https://bugzilla.kernel.org/show_bug.cgi?id=15896">#15896</a> about this. It is not clear to me if this is a motherboard BIOS issue or an issue with powernow-k8. I am running this processor on a Gigabyte GA-890GPA-UD3H v1 with the latest non-beta BIOS vF6. There is a beta vF7B, which I will try, but I&#8217;d rather not need to use a beta BIOS.</p>
<p>Given the above output from powernow-k8, the observed behavior is somewhat expected. The idle clock state for the ondemand governor is 1.6GHz instead of 800MHz, and the top actual clock is 3.2GHz. Since software control of hardware p-state 0 is not possible, cpufreq is not able to switch to p-state 0 and thus records no time spent at 3.6GHz.</p>
<p>I manually applied the patch identified above and recompiled the kernel. The patched kernel now correctly identifies the p-states of the 1090T.</p>
<pre>powernow-k8: Found 1 AMD Phenom(tm) II X6 1090T Processor processors (6 cpu cores) (version 2.20.00)
powernow-k8:    0 : pstate 0 (3200 MHz)
powernow-k8:    1 : pstate 1 (2400 MHz)
powernow-k8:    2 : pstate 2 (1600 MHz)
powernow-k8:    3 : pstate 3 (800 MHz)</pre>
<p>The CPU now properly idles at 800MHz. Further, I have been able to verify that Turbo Core is in fact working as advertised. I used a benchmarking tool called <a href="http://kornelix.squarespace.com/lbench/">lbench</a> to verify. Some of the tests produced erratic results, but the asin() tests were stable and repeatable, so I used them. Here are the results:</p>
<table border="1">
<tbody>
<tr>
<th>Threads</th>
<th>asin() MOps</th>
<th>Avg GHz</th>
</tr>
<tr>
<td align="center">1</td>
<td align="center">29.4</td>
<td align="center">3.6</td>
</tr>
<tr>
<td align="center">2</td>
<td align="center">27.4</td>
<td align="center">3.3</td>
</tr>
<tr>
<td align="center">3</td>
<td align="center">27.1</td>
<td align="center">3.3</td>
</tr>
<tr>
<td align="center">4</td>
<td align="center">26.5</td>
<td align="center">3.2</td>
</tr>
<tr>
<td align="center">5</td>
<td align="center">26.6</td>
<td align="center">3.2</td>
</tr>
<tr>
<td align="center">6</td>
<td align="center">26.5</td>
<td align="center">3.2</td>
</tr>
</tbody>
</table>
<p>I used the value of 26.5 as the normal value for 3.2GHz since Turbo Core would be disabled at 6 threads and all cores would run at 3.2GHz. I was a bit disappointed to see that the boost for two and three threads was not very substantial.</p>
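<p>The Avg GHz column in the table above can be reproduced by scaling the 3.2GHz baseline by the throughput ratio. A minimal Python sketch, assuming asin() throughput is proportional to core clock (the function name <code>estimated_ghz</code> is hypothetical):</p>

```python
BASE_GHZ = 3.2     # all-core clock with Turbo Core inactive
BASE_MOPS = 26.5   # asin() result at 6 threads, taken as the 3.2GHz baseline

# Threads -> asin() MOps, copied from the table above.
RESULTS = {1: 29.4, 2: 27.4, 3: 27.1, 4: 26.5, 5: 26.6, 6: 26.5}


def estimated_ghz(mops, base_mops=BASE_MOPS, base_ghz=BASE_GHZ):
    """Scale the baseline clock by the throughput ratio."""
    return base_ghz * mops / base_mops


for threads, mops in RESULTS.items():
    print(threads, round(estimated_ghz(mops), 1))
```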
<p>UPDATE 3-17-2011: I re-ran the lbench tests with kernel 2.6.37 with all 6 cores set to 3.2GHz via the userspace governor. I no longer had the lbench I used previously, so I downloaded v1.6. I am not sure which version I used before. The asin tests were a bit odd, so I also ran the fibonacci (44) test.</p>
<table border="1">
<tbody>
<tr>
<th>Threads</th>
<th>asin() MOps</th>
<th>Avg GHz (asin)</th>
<th>Fibo44 secs</th>
<th>Avg GHz (Fibo44)</th>
</tr>
<tr>
<td align="center">1</td>
<td align="center">32.5</td>
<td align="center">3.54</td>
<td align="center">12.4</td>
<td align="center">3.63</td>
</tr>
<tr>
<td align="center">2</td>
<td align="center">33.3</td>
<td align="center">3.63</td>
<td align="center">12.6</td>
<td align="center">3.59</td>
</tr>
<tr>
<td align="center">3</td>
<td align="center">31.4</td>
<td align="center">3.42</td>
<td align="center">13.0</td>
<td align="center">3.45</td>
</tr>
<tr>
<td align="center">4</td>
<td align="center">29.8</td>
<td align="center">3.25</td>
<td align="center">13.7</td>
<td align="center">3.28</td>
</tr>
<tr>
<td align="center">5</td>
<td align="center">29.9</td>
<td align="center">3.26</td>
<td align="center">13.8</td>
<td align="center">3.27</td>
</tr>
<tr>
<td align="center">6</td>
<td align="center">29.4</td>
<td align="center">3.20</td>
<td align="center">14.1</td>
<td align="center">3.20</td>
</tr>
</tbody>
</table>
<p>The results for 2 and 3 threads are clearly improved vs kernel 2.6.33.4. However, the slightly slower single thread asin performance suggests a slight asymmetry in floating point performance with Turbo Core active.</p>
]]></content:encoded>
			<wfw:commentRss>http://larkin.nuclearwinter.com/blog/index.php/2010/05/powernow-k8-amd-turbo-core-bug/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Journals, Metadata, and Performance</title>
		<link>http://larkin.nuclearwinter.com/blog/index.php/2010/04/journals-metadata-and-performance/</link>
		<comments>http://larkin.nuclearwinter.com/blog/index.php/2010/04/journals-metadata-and-performance/#comments</comments>
		<pubDate>Wed, 21 Apr 2010 18:48:10 +0000</pubDate>
		<dc:creator>llowrey</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[Storage]]></category>

		<guid isPermaLink="false">http://larkin.nuclearwinter.com/blog/?p=17</guid>
		<description><![CDATA[Linux Magazine has a few interesting articles relating to filesystem journals, internal and external, and how metadata performance is affected. Size Can Matter: Improving Metadata Performance with Ext4 Journal Sizing &#8211; Part I Size Can Matter: Ramdisk Journal Metadata Performance &#8211; Part 2 Size Can Matter: Would You Prefer the Hard Drive or the Ramdisk [...]]]></description>
			<content:encoded><![CDATA[<p>Linux Magazine has a few interesting articles relating to filesystem journals, internal and external, and how metadata performance is affected.</p>
<ul>
<li><a href="http://www.linux-mag.com/cache/7666/1.html">Size Can Matter: Improving Metadata Performance with Ext4 Journal Sizing &#8211; Part I</a></li>
<li><a href="http://www.linux-mag.com/cache/7675/1.html">Size Can Matter: Ramdisk Journal Metadata Performance &#8211; Part 2</a></li>
<li><a href="http://www.linux-mag.com/cache/7682/1.html">Size Can Matter: Would You Prefer the Hard Drive or the Ramdisk this Evening? Part 3</a></li>
<li><a href="http://www.linux-mag.com/id/7742/1/">Harping on Metadata Performance: New Benchmarks</a></li>
</ul>
<p><span id="more-17"></span>The take-home lesson is that moving the journal to another device, whether mechanical or solid state, markedly improves performance. The incremental benefit of moving from mechanical to solid state is not substantial, but this is to be expected since journal operations are sequential and write-only during normal on-line operation.</p>
<p>The testing done in the articles was on filesystems on single disks. Given the write performance penalties imposed by raid5/6 parity, especially for small writes such as metadata, I expect the improvement from using an external journal to be far more pronounced for parity raid.</p>
<p>Unless one is very lucky and can write an entire stripe in one go, or the entire stripe is in cache, it is necessary to read the rest of the stripe (worst case) so that parity can be computed. This read-modify-write operation is costly and has a significant impact on write performance.</p>
<p>The rationale behind writing to an external journal for parity raid (5/6) is that the external device can be either a single device or a non-parity raid (0/1/10) and thus would not require the read-modify-write transaction of the parity raid volume. This has two benefits. First, by eliminating the read, the data will be durably persisted to disk much faster than if the journal had been on the parity raid device. This reduces the exposure to loss of data in the event of a crash, panic, or power loss. Second, by deferring the metadata writes via the external journal, it will be possible to coalesce multiple writes into a single, more efficient transaction on the parity raid volume, and/or defer the metadata writes to a later time when the raid i/o system is idle or less busy.</p>
<p>The security of the journal is critical. The loss or corruption of the journal would require a full fsck that could take an unimaginably long time for large volumes. Furthermore, this fsck would be off-line thus taking dependent resources off-line for a substantial period of time.</p>
<p>Given the failure rates of mechanical and solid state disks, it is clear that for a single drive journal, solid state is the less risky choice. Both, however, are susceptible to non-device failures such as SATA port failure, accidental cable disconnection, etc. The safer solution is to employ raid 1 or 10. It would be foolish to utilize raid 0 since the failure probability roughly doubles (by doubling the hardware) and there is no protection from failure (no redundancy).</p>
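<p>The effect of the raid level on the chance of losing the journal can be sketched in Python. This is a rough model that assumes device failures are independent; the per-device probability used below is a made-up illustrative value, and the function names are hypothetical.</p>

```python
def stripe_failure_probability(p_device, n_devices):
    """Probability of losing a raid 0 set: any one member failing is fatal."""
    return 1 - (1 - p_device) ** n_devices


def mirror_failure_probability(p_device, n_devices):
    """Probability of losing a raid 1 set: every member has to fail."""
    return p_device ** n_devices


p = 0.03  # hypothetical per-device failure probability over some period
print(round(stripe_failure_probability(p, 1), 4))  # 0.03
print(round(stripe_failure_probability(p, 2), 4))  # 0.0591
print(round(mirror_failure_probability(p, 2), 4))  # 0.0009
```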
<p>Though foolish, I do intend to employ raid 0 with my ANS-9010 RAM based solid state disk. My rationale is that the ANS-9010 is a single device with a nearly equal failure rate whether I use one of its SATA ports or both. Yes, one port could fail, but I consider it more likely that the entire device will fail rather than just a portion. Regardless, I must consider the ANS-9010 to have the same journal redundancy as a single device (none).</p>
<p>Stay tuned for parts III and IV of my <a href="http://larkin.nuclearwinter.com/blog/index.php/2010/04/raid5-journal-on-ssd/">Linux Raid5 + Journal on SSD = Speed</a> to see if real world results match the theory.</p>
]]></content:encoded>
			<wfw:commentRss>http://larkin.nuclearwinter.com/blog/index.php/2010/04/journals-metadata-and-performance/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Java Thread Priority Annoyances</title>
		<link>http://larkin.nuclearwinter.com/blog/index.php/2010/04/java-thread-priority-annoyances/</link>
		<comments>http://larkin.nuclearwinter.com/blog/index.php/2010/04/java-thread-priority-annoyances/#comments</comments>
		<pubDate>Tue, 20 Apr 2010 17:39:33 +0000</pubDate>
		<dc:creator>llowrey</dc:creator>
				<category><![CDATA[Java]]></category>
		<category><![CDATA[Linux]]></category>

		<guid isPermaLink="false">http://larkin.nuclearwinter.com/blog/?p=10</guid>
		<description><![CDATA[There is a post by Endre Stølsvik entitled Linux Java Thread Priorities workaround that is definitely worth a read. A few years ago I attempted to make use of low priority background threads to use idle cores to offload work. I quickly found that setting java thread priorities has no effect when running on Linux. [...]]]></description>
			<content:encoded><![CDATA[<p>There is a post by Endre Stølsvik entitled <a href="http://tech.stolsvik.com/2010/01/linux-java-thread-priorities-workaround.html">Linux Java Thread Priorities workaround</a> that is definitely worth a read.</p>
<p>A few years ago I attempted to make use of low priority background threads to use idle cores to offload work. I quickly found that setting java thread priorities has no effect when running on Linux. Being unaware of workarounds, I abandoned the idea.</p>
<p>My solution was to manage the priority of individual work units by feeding small(ish) operations into a queue and starting (n) worker threads to service the queue. This worked well for non-blocking work (nio) but failed miserably for work that required blocking i/o.  Ideally, it would be possible to have a very low priority thread that runs mostly during periods when  other threads are blocked on i/o. Endre Stølsvik&#8217;s workaround should almost certainly make this possible.</p>
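<p>The queue-plus-workers approach described above can be sketched as follows. This is a Python stand-in written for illustration, not the original Java code; the function name <code>run_prioritized</code> and its parameters are hypothetical. Priority is applied per work unit when it is dequeued, rather than per thread.</p>

```python
import itertools
import queue
import threading


def run_prioritized(jobs, workers=4):
    """Run (priority, callable) jobs on a fixed pool of worker threads.

    Lower priority numbers are dequeued first. Returns the job results in
    the order the jobs completed.
    """
    pending = queue.PriorityQueue()
    order = itertools.count()   # tie-breaker so callables are never compared
    for priority, job in jobs:
        pending.put((priority, next(order), job))

    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                _, _, job = pending.get_nowait()
            except queue.Empty:
                return          # queue drained, this worker exits
            value = job()
            with lock:
                results.append(value)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

<p>With a single worker the results come back strictly in priority order; with several workers the dequeue order is still by priority, but completion order can vary.</p>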
]]></content:encoded>
			<wfw:commentRss>http://larkin.nuclearwinter.com/blog/index.php/2010/04/java-thread-priority-annoyances/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Linux Raid5 + Journal on SSD = Speed &#8212; Part I</title>
		<link>http://larkin.nuclearwinter.com/blog/index.php/2010/04/raid5-journal-on-ssd/</link>
		<comments>http://larkin.nuclearwinter.com/blog/index.php/2010/04/raid5-journal-on-ssd/#comments</comments>
		<pubDate>Tue, 13 Apr 2010 16:51:08 +0000</pubDate>
		<dc:creator>llowrey</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[Storage]]></category>

		<guid isPermaLink="false">http://larkin.nuclearwinter.com/blog/?p=1</guid>
		<description><![CDATA[The Problem RAID level 5 (and 4 and 6) suffers from poor write performance due to the need to recalculate parity. Writes smaller than (n-1) * chunk size will necessitate that the remaining chunks be read in order to compute a new parity chunk. This read-modify-write requirement significantly reduces performance for all non large [...]]]></description>
			<content:encoded><![CDATA[<h2>The Problem</h2>
<p>RAID level 5 (and 4 and 6) suffers from poor write performance due to the need to recalculate parity. Writes smaller than (n-1) * chunk size (n being the number of member devices in a RAID 5 array) will necessitate that the remaining chunks be read in order to compute a new parity chunk. This read-modify-write requirement significantly reduces performance for everything except large sequential writes. To add insult to injury, the additional seeks necessary to read the missing chunks add further I/O load, which slows down random reads. During periods of high write load, concurrent reads will suffer.</p>
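<p>To make the read-modify-write cost concrete, here is a small Python sketch of RAID 5 style XOR parity. It is a toy model with a hypothetical 4-disk array and 4-byte chunks, not the md driver&#8217;s implementation; the function names are made up for illustration.</p>

```python
from functools import reduce


def full_stripe_bytes(n_devices, chunk_bytes):
    """Data bytes in one full RAID 5 stripe: (n - 1) data chunks."""
    return (n_devices - 1) * chunk_bytes


def parity(chunks):
    """XOR equal-length chunks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


# Three data chunks of a hypothetical 4-disk array (the fourth holds parity).
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xff\x00\xff\x00"]
old_parity = parity(data)

# Rewriting only chunk 0 is a partial-stripe write. The new parity needs
# either the untouched chunks or the old chunk plus the old parity, and
# both options mean reading from disk before the write can complete.
new_chunk0 = b"\xaa\xbb\xcc\xdd"
from_other_chunks = parity([new_chunk0, data[1], data[2]])
from_old_parity = parity([old_parity, data[0], new_chunk0])
```

<p>Both routes give the same parity chunk. A write that covers the full stripe needs neither read, which is why large sequential writes escape the penalty.</p>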
<p><span id="more-1"></span></p>
<h2>The Solution</h2>
<p>Typically, the problem of small random writes is solved using an expensive hardware RAID card that employs a large battery-backed cache (NVRAM) to buffer small writes into larger and (hopefully) less burdensome writes. Those of us not willing to spend a lot of money on a proprietary card have one other option: placing the filesystem journal on a Solid State Disk (SSD). Most modern filesystems employ a small region of storage to record a log (journal) of disk write intents. If the host system fails while writes are in progress, the filesystem can recover (to a great, but not necessarily complete, extent) by replaying the transactions stored in the journal. The primary goal of the journal is to improve the integrity of filesystem structures in the event of a failure that prevents the clean shutdown and closing of the filesystem. That same mechanism can be used to improve write performance for Linux software RAID.</p>
<p>By placing the journal on a high performance device, such as a flash or RAM based SSD, the filesystem can tolerate writes at a faster rate than the underlying disk (or RAID) can sustain, at least until the journal fills. The theory is that, much like the cache of a hardware raid card, the SSD based journal will allow small writes to be combined into larger sequential writes which are much more efficiently handled by RAID 5.</p>
<p>The original work that inspired this effort can be found <a href="http://insights.oetiker.ch/linux/external-journal-on-ssd/">here</a>.</p>
<h2>The Hardware</h2>
<p>There are two options for SSD to consider: flash and RAM based storage. Flash based SSDs are becoming increasingly fast and inexpensive. They offer the benefit of being non-volatile but at the cost of limited write cycles, on the order of 10,000 writes for MLC flash. RAM based SSDs require a battery in order to be non-volatile but do not suffer the write cycle limits of flash. The cost of unlimited writes is a very high cost per gigabyte of storage. As of this writing, RAM can be had for ~$20 per gigabyte, but flash SSDs can be had for as little as $2 per gigabyte. That&#8217;s one tenth the cost!</p>
<p><strong>Storage capacity doesn&#8217;t matter (much).</strong> Filesystem journals should be limited in capacity because, for every gigabyte of journal, an equal amount of system RAM will be required to buffer the journal. A full 1GB journal will require 1GB of system RAM. Unless you have an immense amount of system RAM, the total journal capacity for all filesystems will likely be limited to just a few gigabytes.</p>
<p><strong>Limited write cycles don&#8217;t matter (much).</strong> If a flash cell will burn out after 10,000 writes, and a filesystem journal does nothing but write, how long will the journal last? A 60GB flash SSD with 10,000 write cycles could (theoretically) write 600,000GB before burning out the drive. If journaling both data and metadata, that&#8217;s a lot of data. If only journaling metadata, that&#8217;s a nearly infinite amount of data.</p>
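<p>The endurance estimate above can be turned into a rough lifetime figure with a few lines of Python. This is a back-of-the-envelope sketch that assumes perfect wear leveling and no write amplification; the 10MB/s average journal write rate is a made-up illustrative value, and the function names are hypothetical.</p>

```python
def total_writable_gb(capacity_gb, write_cycles):
    """Theoretical data volume before every cell reaches its cycle limit."""
    return capacity_gb * write_cycles


def lifetime_years(capacity_gb, write_cycles, mb_per_second):
    """Years of continuous writing at a sustained rate (1GB = 1000MB)."""
    total_mb = total_writable_gb(capacity_gb, write_cycles) * 1000
    seconds = total_mb / mb_per_second
    return seconds / (365 * 24 * 3600)


print(total_writable_gb(60, 10_000))              # 600000
print(round(lifetime_years(60, 10_000, 10), 1))   # 1.9
```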
<p><strong>Speed matters.</strong> Typical consumer grade flash SSDs can sustain writes in excess of 100MB/s, but usually not more than about 200MB/s. In order to get high performance, one would have to employ RAID level 0 to get above those numbers. When you increase the number of drives, you increase the risk of failure. At some point, having a RAID of SSDs that journal your RAID of disks gets silly, and risky. Minimizing the risk means either RAID 10 or RAID 5/6. Both involve a lot of cost and system overhead. There are PCI Express solutions like the z-drive from OCZ which offer very high performance (900MB/s+), but at a very high cost. If you&#8217;re going to spend that kind of money you might as well buy a hardware RAID card.</p>
<p><strong>The alternative to flash.</strong> The solution investigated by this series is the ACARD ANS-9010 RAM based SSD. It offers dual SATA II connections for RAID 0 performance, a battery to maintain data during a power failure, and a Compact Flash socket to support backing up to flash. Write performance is on the order of 300MB/s, and the unit accepts up to 8 DDR2 DIMMs.</p>
<p><strong>The Math:</strong></p>
<p>OCZ Vertex 60GB x 3: 135MB/s x 3 = 405MB/s, $190 x 3 = $570</p>
<p>ACARD ANS-9010 x 1: 350MB/s, $350 (chassis) + $80 (4 x 1GB DIMM at $20 each) + $15 (4GB CF) = $445</p>
<p>The ANS-9010 is $125 cheaper and can tolerate unlimited writes, but is 55MB/s slower.</p>
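<p>The comparison can be reproduced with a short Python sketch using the prices and speeds listed above. It assumes RAID 0 throughput scales linearly with the number of drives; the function name <code>ssd_raid0</code> is hypothetical.</p>

```python
def ssd_raid0(unit_mb_s, unit_cost, count):
    """Aggregate throughput and cost of identical SSDs striped in RAID 0."""
    return unit_mb_s * count, unit_cost * count


vertex_mb_s, vertex_cost = ssd_raid0(135, 190, 3)
ans_mb_s = 350
ans_cost = 350 + 4 * 20 + 15   # chassis + four 1GB DIMMs + 4GB CF card

print(vertex_mb_s, vertex_cost)   # 405 570
print(ans_mb_s, ans_cost)         # 350 445
print(vertex_cost - ans_cost)     # 125
print(vertex_mb_s - ans_mb_s)     # 55
```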
<h2>The Test Plan</h2>
<ol>
<li>Part II: Assess the performance of the ANS-9010 in single and dual SATA configurations and with varying RAID 0 chunk sizes. Measure baseline RAID 5 performance.</li>
<li>Part III: Measure the performance of journal_data mode for various journal sizes from 256MB through 2GB.</li>
<li>Part IV: Repeat step 2 for journal_ordered.</li>
<li>Part V: Operational Considerations and Conclusions.</li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://larkin.nuclearwinter.com/blog/index.php/2010/04/raid5-journal-on-ssd/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

