9.1.08 Javascript Natural Sort Algorithm With Unicode Support

PROBLEM
Javascript is not capable of sorting Array’s that contain strings and numeric values as human readable.

SOLUTION Added in Version 0.2
Here’s a very simple “Natural Sort” implementation in 11 lines of Javascript code that shares the same core “chunking” concept as displayed in many implementations that appears to have been first adopted by Dave Koelle. The earliest concept of this type of parsing I’ve seen/worked with was String Tokenization aka Lexical Analysis.

DATE SORTING SUPPORT
Finally added the ability to sort off Date objects based off the .getTime() unix epoch timestamp comparisons. This will only work against valuation of fields that in their entirety can instantiate a valid Date object via the Date constructor. This means that any valid input string that can create a date via the Date constructor, i.e. new Date('10/10/2005') or new Date('Tue Sep 09 2008 20:32:28 GMT-0700 (Pacific Daylight Time)'), can work. The value cannot contain any other character that might throw the Date constructor off, i.e. a ‘10/10/2005 original’ will not work.

UNICODE SUPPORT
This sort algorithm works against UNICODE characters/scripts as well because it relies on the browser’s built in string comparison operators. I have a working demo of a this in action sorting a column in my http://www.overset.com/2008/08/30/animated-sortable-datagrid-jquery-plugin-jtps/ post.

I wanted to build a function that split each string to compare for sorting into an array in the appropriate order based off blocks of strings and blocks of numbers similar to how the String.split() functionality works. This is the core of the “chunking” so that you can do the sort comparison off portions of the string and rely on the browser’s built in < and > operators. Another goal was NOT to rely on the String.charCodeAt() and compare ASCII character values on a per-character basis. This seemed like complete overkill considering once each “chunk” can be properly sorted off the browser’s built-in comparison operators properly.

The approach I used was to treat everything as a string, delimitate the numeric and alpha portions of each string by the null char(0) and then split the string into an array off the null char(0) delimiter. This null char(0) will almost never appear in a string - and if it did we’d not really be able to do a comparison operation off of it.

Here’s the simple naturalSort function to be used to point the Array.sort() method to:

SPEED
Upon initial testing this function appears to be 60% faster than Brian Huisman’s Javascript implementation of Dave Koelle’s Alphanum algorithm when sorting against string fields of around 20 characters on average and 500 entries in the array. It appears to be the same speed when sorting against purely numeric integer-based array values at around 500 entries.

TESTED BROWSERS

An example of sorted arrays using this functionality:

['10/12/2008','10/11/2008','10/11/2007','10/12/2007'].sort(naturalSort);
['10/11/2007', '10/12/2007', '10/11/2008', '10/12/2008']

[0,100,7,1,5,"10aay","10abZ","10aby","asd","Asd","Asc"].sort(naturalSort)
[0, 1, 5, 7, "10aay", "10aby", "10abZ", 100, "Asc", "asd", "Asd"]

["$10002.00","$10001.00",10000].sort(naturalSort)
[10000, "$10001.00", "$10002.00"]

["1 Title - The Big Lebowski","1 Title - Gattaca","1 Title - Last Picture Show"].sort(naturalSort)
["1 Title - Gattaca", "1 Title - Last Picture Show", "1 Title - The Big Lebowski"]

[99.99,99.9,1.123,1.1].sort(naturalSort)
[1.1, 1.123, 99.9, 99.99]

NOTE
This function does case-insensitive sorting. This can easily be disabled by removing the .toLowerCase() calls on the x and y variables.


12 Comments
George Hindle @ 9.5.08 10am

Just a spelling mistake. Other than that the code looks good, any speed improvement has to be a good thing, even with chrome and v8 doing a lot of the work for us.

“The approach I used was to treat everything as a string, [b]deliminate[/b] the numeric and alpha portions …” Is this meant to be “eliminate”?

#436 311chars
Jim Palmer @ 9.6.08 4pm

@George

Thanks for catching this! I really meant delimitate which is an odd verb form of the noun “delimiter”. This was how I wanted to describe the “chunking” concept from Dave Koelle.

I’m also in the processes of adding intelligent sorting of dates as they can be converted from strings and testing for the fastest form of sorting. The idea, as you point out, is to rely on the browser - i.e. chrome/v8 - doing as much work for us as possible.

#440 451chars
Philippe Leone @ 9.17.08 9pm

Hello I have some questions about making terrain could you email me back.

Thank you,

Philippe

#463 99chars
claudiohs @ 10.24.08 11am

Very nice work.

#501 15chars
Mike @ 4.4.09 2pm

Semantics, really, but I think the more common verb is delimit. When I looked it up, it did give delimitate as an alternate, but I have never seen that word before, while I’ve seen delimit numerous times. Delimitate _seems_ like one of those words people make up (i.e. they say, is that even a word?), though in this case, it’s not. ;-)

Thanks for the function. It’s really why I’m here.

#590 390chars
Mike @ 4.4.09 3pm

Too bad I can’t edit my previous comment. I didn’t see the function until just now (dummy me forgot noscript was on), so I looked in the source.

You can shorten the regex a bit and put it in a var, since you use it twice, therefore saving the overhead of compiling an extra new RegExp object [ / / syntax is just shorthand for new RegExp("","") ].

re = /(-?[0-9.]+)/g,
xN = x.replace(re, nC + ‘$1′ +nC).split(nC),
yN = y.replace(re, nC + ‘$1′ + nC).split(nC),

- is not a metacharacter outside of a character class, therefore does not need to be in a character class.

? is the same as {0,1}

+ is the same as {1,}

#591 628chars
Mike @ 4.4.09 3pm

I just thought of something else.

Declare the re variable outside of the function (global). Then you won’t need to compile and destroy a new RegExp every time the function is called. I think this will speed it up quite a bit.

#592 228chars
Mike @ 4.4.09 4pm

Some more optimizations:

var re=/(-?[0-9.]+)/g;

function naturalSort(a,b){
// setup temp-scope variables for comparison evauluation
var x=a.toString().toLowerCase()||”,y=b.toString().toLowerCase()||”,
nC=String.fromCharCode(0),
xN=x.replace(re,nC+’$1′+nC).split(nC),
yN=y.replace(re, nC + ‘$1′ + nC).split(nC),
xD=(new Date(x)).getTime(),
if(xD)yD=(new Date(y)).getTime(); //no point in getting yD if xD is not a date

// natural sorting of dates
if(xD&&yD){ // check for valid dates only once
if(xDyD)return 1;
}

// natural sorting through split numeric strings and default strings
var cLoc,numS=Math.max(xN.length,yN.length);
for(cLoc=0;cLoc<numS;cLoc++){
// instead of performing these next 6 operations in the if
// and the same 6 operations in the else if, just do them once
// so we can reuse results instead of computing twice
xNcL=xN[cLoc];
yNcL=yN[cLoc];
FxNcL=parseFloat(xNcL);
FyNcL=parseFloat(yNcL);
oFxNcL=FxNcL||xNcL;
oFyNcL=FyNcL||yNcL;

if(oFxNcLoFyNcL)return 1;
}
return 0;
}

#593 1116chars
Mike @ 4.4.09 4pm

see code in website
http://mgrier.com/code/natsort.optimized.txt

#594 65chars
air jordan shoes @ 4.23.09 7pm

Very nice work

#604 14chars
Lloyd A @ 5.6.09 7am

["car.mov","organic2.0001.sgi","my.string_41299.tif","01alpha.sgi"].sort(naturalSort)

does not sort properly, this is the result:

["car.mov","01alpha.sgi","organic2.0001.sgi","my.string_41299.tif"]

#609 203chars
nike dunks @ 6.6.09 6pm

It is a wonderful article,i like it,thank you very much!

#625 56chars



monetary contributions are always accepted $
0.362s