May 13, 2013

Create a random sample using PowerShell

Very often you will need a random sample of a file. This is really handy for quickly prototyping a script before you run it on a really large file. Or, if you are just doing some statistical analysis, it is very likely that you won’t need to run it on the full file at all. Therefore, I generally create 10% and 1% samples of any large files that I am currently working with. On Windows I find this easiest to do with PowerShell. Here is the command that I use (replace the ’10’ with ‘100’ to get a 1% sample). It always keeps the first line, so a header row survives, and keeps each remaining line with a probability of 1 in 10, so the sample size is approximate rather than exact:

cat file.txt | ?{$_.ReadCount -eq 1 -or (Get-Random -max 10) -eq 1} > file_sample10.txt
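If you use this often, the same one-liner can be wrapped in a small function so that the sampling rate becomes a parameter. This is only a sketch: the function name New-RandomSample and its parameters Path, OutPath and OneIn are made up for illustration and are not built-in cmdlets. Like the command above, it assumes the first line is a header worth keeping.

```powershell
# Hypothetical helper (not a built-in cmdlet): keeps the first line
# (assumed to be a header) and each later line with probability 1/$OneIn.
function New-RandomSample {
    param(
        [Parameter(Mandatory = $true)][string]$Path,
        [Parameter(Mandatory = $true)][string]$OutPath,
        [int]$OneIn = 10   # 10 gives roughly a 10% sample, 100 roughly 1%
    )

    # Get-Content tags each line with its 1-based line number in ReadCount.
    # Get-Random -Maximum $OneIn returns an integer from 0 to $OneIn - 1,
    # so comparing against 0 succeeds about once in every $OneIn lines.
    Get-Content $Path |
        Where-Object { $_.ReadCount -eq 1 -or (Get-Random -Maximum $OneIn) -eq 0 } |
        Set-Content $OutPath
}
```

For example, New-RandomSample -Path file.txt -OutPath file_sample10.txt -OneIn 10 should produce roughly the same kind of sample as the one-liner. Set-Content is used here instead of > mainly because it accepts an -Encoding parameter if the output encoding matters to you.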

