Tuesday, December 4, 2007

More Regex

When the problem of validating a date came up, it was regular expressions to the rescue.
The following will validate a date in the form of mm/dd/yyyy and could be modified for other forms:
^((0[13578]|1[02])\/(0[1-9]|[12]\d|3[01])\/(19|20)\d{2}|
(0[469]|11)\/(0[1-9]|[12]\d|30)\/(19|20)\d{2}|
02\/(0[1-9]|1\d|2[0-8])\/(19|20)\d{2}|
02\/29\/(2[048]00|(19|20)(0[48]|[2468][048]|[13579][26])))$

(note: join on a single line)

If you end up using this, keep in mind that the date has to be in exactly the mm/dd/yyyy form, leading zeros included. That means:
01/06/2045 will work
2/7/1932 will not

What does this check?
It checks that a month from 01 to 12 is entered.
For months 1, 3, 5, 7, 8, 10, and 12 it checks that the day is 01-31.
For months 4, 6, 9, and 11 it checks that the day is 01-30.
For month 2 it checks that the day is 01-28.
For all of the above it checks that the year is 1900-2099.
If the month is 2 and the day is 29 it checks that the year is 2000, 2400, 2800, or a 19xx or 20xx year divisible by 4 but not by 100.
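If you would rather see those rules as code, here is a minimal sketch in C# that assembles the expression from named pieces, one alternative per rule. It is not the expression exactly as posted: the class name DateCheck, the method IsValidDate, and the constant names are made up for illustration, and the slashes are left unescaped because a .NET pattern is an ordinary string with no delimiters.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DateCheck
{
    // Years 1900-2099.
    private const string Year = @"(19|20)\d{2}";

    // Leap years: 2000, 2400, 2800, or a 19xx/20xx year divisible by 4 but not by 100.
    private const string LeapYear = @"(2[048]00|(19|20)(0[48]|[2468][048]|[13579][26]))";

    // One alternative per rule: 31-day months, 30-day months,
    // February 01-28, and February 29 in a leap year.
    private static readonly Regex Date = new Regex(
        "^(" +
        @"(0[13578]|1[02])/(0[1-9]|[12]\d|3[01])/" + Year + "|" +
        @"(0[469]|11)/(0[1-9]|[12]\d|30)/" + Year + "|" +
        @"02/(0[1-9]|1\d|2[0-8])/" + Year + "|" +
        @"02/29/" + LeapYear +
        ")$");

    public static bool IsValidDate(string text)
    {
        return Date.IsMatch(text);
    }

    public static void Main()
    {
        // Sample inputs chosen for illustration only.
        string[] samples = { "02/29/2004", "02/29/1900", "09/30/2007", "2/7/1932" };
        foreach (string sample in samples)
        {
            Console.WriteLine(sample + " -> " + IsValidDate(sample));
        }
    }
}
```

With this sketch, 02/29/2004 and 09/30/2007 should pass, while 02/29/1900 (not a leap year) and 2/7/1932 (missing leading zeros) should not.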

Wednesday, November 14, 2007

The Joys of Regex

I had written a service as part of an application involving an old mainframe system. The mainframe would send me text that I was to turn into a PDF and save to a database.

Shortly after the app went live, I started getting several transaction failures on my end. My logging displayed "FAILURE: Can't show character with Unicode value U+FFFD".

Since the data is stored in a varbinary field, I was sure the error was being raised while I was writing the PDF, not while saving it.

After talking to the mainframe guys, I found that they would sometimes send me strange Unicode characters such as �.

I asked for whichever would be easier, a list of characters denied or a list of characters allowed, and was sent:
~!@#$%^&*()_+`1234567890-=QWERTYUIOP{}|ASDFGHJKL:"ZXCVBNM<>?qwertyuiop[]\asdfghjkl;'zxcvbnm,./

Well, this looked like a job for regular expressions.
I am no regex expert, having only used regular expressions for simple pattern matching, so I began working at this in a similar manner: match all of those characters and concatenate the matching strings with spaces.

After hammering away I came up with:

public static string SanatizeInput(string file)
{
    // Matches runs of allowed characters; "\\\\" adds the literal backslash from the allowed list.
    Regex reg = new Regex("[\\r\\n\\sA-Za-z0-9~!@#$%^&*()_+`\\-={} |:\"<>?[\\]\\\\;',\\./]*", RegexOptions.Compiled);
    // Without RegexOptions.Singleline, ".*" stops at each newline, so this walks the input line by line.
    Regex line = new Regex(".*");
    Match lineMatch = line.Match(file);
    string output = string.Empty;
    while (lineMatch.Success)
    {
        string cleanLine = string.Empty;
        Match m = reg.Match(lineMatch.Value);
        while (m.Success)
        {
            // Join the allowed runs with spaces, skipping the empty matches that "*" produces.
            foreach (Capture capture in m.Captures)
            {
                if (!string.IsNullOrEmpty(capture.Value))
                    cleanLine += capture.Value + " ";
            }
            m = m.NextMatch();
        }
        if (!string.IsNullOrEmpty(cleanLine))
            output += cleanLine + "\n";
        lineMatch = lineMatch.NextMatch();
    }
    return output;
}


This grabbed each line, looped through all of its matches, joined them with spaces, and put the newlines back in.
In the end it worked.

But I was unhappy. It was far too verbose, and just seemed wrong.
My first hint was using ^ to negate a collection of characters.
My second was using the regex replace as opposed to concatenating.

The better solution:
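public static string SanatizeInput(string file)
{
    // Anything outside the allowed set becomes a space; "\\\\" adds the literal backslash from the allowed list.
    return new Regex("[^A-Za-z0-9~!@#$%\\^&*()_+`\\-={} |:\"<>?[\\]\\\\;',./\\s]").Replace(file, " ");
}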
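
To see the negated character class and Replace at work on a small input, here is a self-contained sketch. The class name SanitizeDemo, the method name Sanitize, and the sample string are made up for illustration; the allowed set follows the character list the mainframe team sent (backslash included) plus whitespace, and the pattern is written as a verbatim string so that only the quote needs doubling.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SanitizeDemo
{
    // Negated character class: matches any single character that is NOT in the allowed list.
    // Built once and reused, rather than constructed on every call.
    private static readonly Regex Disallowed = new Regex(
        @"[^A-Za-z0-9~!@#$%\^&*()_+`\-={}|:""<>?\[\]\\;',./\s]",
        RegexOptions.Compiled);

    public static string Sanitize(string text)
    {
        // Replace every disallowed character with a single space.
        return Disallowed.Replace(text, " ");
    }

    public static void Main()
    {
        // \uFFFD is the character named in the log message above.
        string input = "ACCT\uFFFD1234 [OK]\r\nNAME: O'BRIEN\uFFFD";
        Console.WriteLine(Sanitize(input));
    }
}
```

Each U+FFFD in the sample comes back as a space, and everything else, line breaks included, passes through untouched.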

Wednesday, July 11, 2007

Automate your build

Doing things manually sucks.
That's why we develop software: people are sick of doing things the hard way.
Seeing as our job is automation, it amazes me that so many people are still developing software the hard way.

What am I talking about? Builds.

Right now at my office, whenever a change is made and we need to update our test server, we must produce a build. And right now, we're using the VS2005 IDE, cleaning the solution, building the solution, and publishing the website by hand.

Furthermore, once that is complete, we have to go in and delete certain files from our published site, and then copy it over.

There are several XML-based build tools for a task like this, the main ones being MSBuild and NAnt. However, if you want to skip all that, the process can be easily automated with a batch file.


cd\
cd "program files\microsoft visual studio 8\common7\ide\"
devenv /clean release "C:\Dev\MyProject\Project.sln"
devenv /build release "C:\Dev\MyProject\Project.sln"
rd "C:\Dev\Precompiled Webs\MyProject" /s /q
cd\
cd "WINNT\Microsoft.NET\Framework\v2.0.50727"
aspnet_compiler -v /Website -p "C:\Dev\MyProject\Website" "C:\Dev\Precompiled Webs\MyProject"
cd\
cd "dev\precompiled webs\MyProject"
del *.log /f /s /q
del *.pdb /f /s /q
del web.config /f /q


Let's look at what's going on:


cd\
cd "program files\microsoft visual studio 8\common7\ide\"
devenv /clean release "C:\Dev\MyProject\Project.sln"
devenv /build release "C:\Dev\MyProject\Project.sln"


This uses the devenv command line to clean and build your solution without opening the IDE.


rd "C:\Dev\Precompiled Webs\MyProject" /s /q


This deletes the folder where the compiled site will go. /s removes everything inside it, and /q suppresses the confirmation prompt.


cd\
cd "WINNT\Microsoft.NET\Framework\v2.0.50727"
aspnet_compiler -v /Website -p "C:\Dev\MyProject\Website" "C:\Dev\Precompiled Webs\MyProject"


Here we use the aspnet_compiler utility to compile the site.
-v is the virtual path (the name of your website within the solution).
-p is the physical path to your website.
The last path is where the compiled site will end up.


cd\
cd "dev\precompiled webs\MyProject"
del *.log /f /s /q
del *.pdb /f /s /q
del web.config /f /q


Lastly, I remove any pdb and log files, as well as the web.config, because our test server has a different config file from the development machine.

Here are a few links that may help you with your batch file:
aspnet_compiler info
devenv command line info
MS-DOS commands