Skip to content
May 13, 2013 / jphoward

Create a random sample using PowerShell

Very often you will need a random sample of a file. This is really handy to quickly prototype script, before you run it on a really large file. Or, if you are just doing some statistical analysis, it is very likely that you won’t even need to run it on the full file at all. Therefore, I generally create 10% and 1% samples of any large files that I am working with correctly. When using Windows I find this easiest to do using PowerShell. Here is the command that I use (replace the ’10’ with ‘100’ to get a 1% sample):

cat file.txt | ?{$_.ReadCount -eq 1 -or (Get-Random -max 10) -eq 1} > file_sample10.txt
Advertisements

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: