<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Splitting names</title>
	<atom:link href="http://www.onlineaspect.com/2009/08/17/splitting-names/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.onlineaspect.com/2009/08/17/splitting-names/</link>
	<description>a blog about building stuff on the web</description>
	<lastBuildDate>Mon, 09 Jan 2012 18:55:04 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
	<item>
		<title>By: Josh Fraser</title>
		<link>http://www.onlineaspect.com/2009/08/17/splitting-names/comment-page-1/#comment-1226</link>
		<dc:creator>Josh Fraser</dc:creator>
		<pubDate>Fri, 10 Sep 2010 22:25:08 +0000</pubDate>
		<guid isPermaLink="false">http://www.onlineaspect.com/?p=431#comment-1226</guid>
		<description>Nice!  Thanks for sharing.  One thing I&#039;ve realized is that proper parsing depends a lot on context: where the names came from and how they are being used.  For example, in my use case, anything in parentheses should be ignored, whereas in yours it&#039;s a nickname.  Ideally, we should write a class where people can change that behavior with a single variable to customize it for their own purposes.  Let me know if you&#039;re interested.  Perhaps we could combine forces and see what we can come up with.</description>
		<content:encoded><![CDATA[<p>Nice!  Thanks for sharing.  One thing I&#039;ve realized is that proper parsing depends a lot on context: where the names came from and how they are being used.  For example, in my use case, anything in parentheses should be ignored, whereas in yours it&#039;s a nickname.  Ideally, we should write a class where people can change that behavior with a single variable to customize it for their own purposes.  Let me know if you&#039;re interested.  Perhaps we could combine forces and see what we can come up with.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jason Priem</title>
		<link>http://www.onlineaspect.com/2009/08/17/splitting-names/comment-page-1/#comment-1223</link>
		<dc:creator>Jason Priem</dc:creator>
		<pubDate>Tue, 07 Sep 2010 18:44:07 +0000</pubDate>
		<guid isPermaLink="false">http://www.onlineaspect.com/?p=431#comment-1223</guid>
		<description>Hey Josh, nice work.  I just finished writing something similar, along with a test suite of names.  It does pretty much what yours does, although it&#039;s object-oriented and captures nicknames and first-initials separately. Here are a few names your lib misses that &lt;a href=&quot;http://jasonpriem.com/human-name-parse/&quot; rel=&quot;nofollow&quot;&gt;HumanNameParser.php&lt;/a&gt; parses correctly: 
George (gob) bluth // gets &quot;gob&quot; as a nickname (not part of first name) 
smith, john // reverses around the comma 
carlos garcia y luz // gets &quot;garcia y luz&quot; as a last name 
e.e. cummings // keeps original capitalization 
 
I like your idea of matching all middle names as part of the first name; that way you never miss names like &#039;Billie Jo&#039;.  However, I&#039;d argue that missing those is less of a problem than always treating middle names as part of the first name, since single-word first names are far more common. My lib is on GitHub, and of course it&#039;s open, so take or fork anything you like. 
 </description>
		<content:encoded><![CDATA[<p>Hey Josh, nice work.  I just finished writing something similar, along with a test suite of names.  It does pretty much what yours does, although it&#039;s object-oriented and captures nicknames and first-initials separately. Here are a few names your lib misses that <a href="http://jasonpriem.com/human-name-parse/" rel="nofollow">HumanNameParser.php</a> parses correctly:<br />
George (gob) bluth // gets &quot;gob&quot; as a nickname (not part of first name)<br />
smith, john // reverses around the comma<br />
carlos garcia y luz // gets &quot;garcia y luz&quot; as a last name<br />
e.e. cummings // keeps original capitalization </p>
<p>I like your idea of matching all middle names as part of the first name; that way you never miss names like &#039;Billie Jo&#039;.  However, I&#039;d argue that missing those is less of a problem than always treating middle names as part of the first name, since single-word first names are far more common. My lib is on GitHub, and of course it&#039;s open, so take or fork anything you like.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Josh Fraser</title>
		<link>http://www.onlineaspect.com/2009/08/17/splitting-names/comment-page-1/#comment-640</link>
		<dc:creator>Josh Fraser</dc:creator>
		<pubDate>Tue, 18 Aug 2009 20:06:11 +0000</pubDate>
		<guid isPermaLink="false">http://www.onlineaspect.com/?p=431#comment-640</guid>
		<description>Glad you found this useful, and good catch on the parentheses issue.  Perhaps you can merge in your code for handling &quot;last name, first name&quot;?  That&#039;s definitely a common use case that I missed.  I&#039;ve set up Google Code and given you commit access at &lt;a href=&quot;http://code.google.com/p/php-name-parser/&quot; target=&quot;_blank&quot;&gt;http://code.google.com/p/php-name-parser/&lt;/a&gt;.</description>
		<content:encoded><![CDATA[<p>Glad you found this useful, and good catch on the parentheses issue.  Perhaps you can merge in your code for handling &quot;last name, first name&quot;?  That&#039;s definitely a common use case that I missed.  I&#039;ve set up Google Code and given you commit access at <a href="http://code.google.com/p/php-name-parser/" target="_blank">http://code.google.com/p/php-name-parser/</a>.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Pete Warden</title>
		<link>http://www.onlineaspect.com/2009/08/17/splitting-names/comment-page-1/#comment-638</link>
		<dc:creator>Pete Warden</dc:creator>
		<pubDate>Tue, 18 Aug 2009 15:16:39 +0000</pubDate>
		<guid isPermaLink="false">http://www.onlineaspect.com/?p=431#comment-638</guid>
		<description>That rocks, thanks Josh! I have very similar problems, but nowhere near so comprehensive a solution. 
 
In my case I&#039;m trying to canonicalize display names from email address headers. One common case is that the name will appear as &quot;Warden, Pete&quot;; I try to detect and flip those, but I&#039;m guessing that&#039;s not an issue for your data set? Also, there are sometimes multiple words inside the parentheses, e.g. &quot;Pete Warden (Mailana Inc)&quot;, but from inspection it looks like you&#039;re only catching the first word with your parentheses check? 
 
I&#039;d love to see this on Google Code. There&#039;s some other functionality I&#039;m working on that might fit here, like gender guessing from first names: 
&lt;a href=&quot;http://search.cpan.org/~edaly/Text-GenderFromName-0.32/GenderFromName.pm&quot; target=&quot;_blank&quot;&gt;http://search.cpan.org/~edaly/Text-GenderFromName...&lt;/a&gt; </description>
		<content:encoded><![CDATA[<p>That rocks, thanks Josh! I have very similar problems, but nowhere near so comprehensive a solution. </p>
<p>In my case I&#039;m trying to canonicalize display names from email address headers. One common case is that the name will appear as &quot;Warden, Pete&quot;; I try to detect and flip those, but I&#039;m guessing that&#039;s not an issue for your data set? Also, there are sometimes multiple words inside the parentheses, e.g. &quot;Pete Warden (Mailana Inc)&quot;, but from inspection it looks like you&#039;re only catching the first word with your parentheses check? </p>
<p>I&#039;d love to see this on Google Code. There&#039;s some other functionality I&#039;m working on that might fit here, like gender guessing from first names:<br />
<a href="http://search.cpan.org/~edaly/Text-GenderFromName-0.32/GenderFromName.pm" target="_blank">http://search.cpan.org/~edaly/Text-GenderFromName&#8230;</a></p>
]]></content:encoded>
	</item>
</channel>
</rss>

