Thursday, March 15, 2007

An implementation of 'du -s *' in PowerShell

While reading about PowerShell I noticed that there's no out-of-the-box way (that I know of) to do something similar to 'du -s *' in Unix/Linux. In this post I'm going to show how to implement similar functionality.

The -s option of du shows only the total size of each argument. This is very useful when trying to determine which directories or files are taking up too much space. For example:



$ cd /cygdrive/c/Python25/
$ du -bs *
4794012 DLLs
4160038 Doc
13817 LICENSE.txt
13752327 Lib
88573 NEWS.txt
56691 README.txt
547784 Tools
382592 include
948600 libs
24064 python.exe
24576 pythonw.exe
3248808 tcl
4608 w9xpopen.exe


The output is easier to read when it is piped through sort:


$ du -bs * | sort -n
4608 w9xpopen.exe
13817 LICENSE.txt
24064 python.exe
24576 pythonw.exe
56691 README.txt
88573 NEWS.txt
382592 include
547784 Tools
948600 libs
3248808 tcl
4160038 Doc
4794012 DLLs
13752327 Lib



To implement this in PowerShell (with the little knowledge I have of it) I settled on the following strategy:


  1. get all the child elements from the base directory

  2. for each element get the total size

  3. generate a tuple with the element name and the total size



For #1, a plain get-childitem call is enough. For #2, I used a combination of foreach-object, get-childitem -r and measure-object. Finally, for #3, I used select-object.
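
To see step #2 on its own, here is a minimal sketch that builds a small scratch directory and sums the sizes of everything below it. The temporary directory and the file names (a.bin, sub, b.bin) are made up for this illustration; only get-childitem and measure-object come from the approach described above.

```powershell
# Hypothetical scratch directory; all names below are illustrative only.
$root = Join-Path ([System.IO.Path]::GetTempPath()) ("du-demo-" + [guid]::NewGuid())
$sub  = Join-Path $root "sub"
New-Item -ItemType Directory -Path $root | Out-Null
New-Item -ItemType Directory -Path $sub  | Out-Null

# Two files with known sizes: 100 bytes at the top level, 200 bytes one level down.
[System.IO.File]::WriteAllBytes((Join-Path $root "a.bin"), (New-Object byte[] 100))
[System.IO.File]::WriteAllBytes((Join-Path $sub "b.bin"), (New-Object byte[] 200))

# Step #2: walk the tree and add up the Length of every item that has one.
$stats = Get-ChildItem -Recurse $root | Measure-Object -Property Length -Sum
$stats.Sum

# Remove the scratch directory again.
Remove-Item -Recurse -Force $root
```

With files of 100 and 200 bytes, the sum should come out as 300. Directories have no Length property, so they add nothing to the total.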

One implementation for this functionality is the following:


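# Print the total size in bytes of each child of $dir, similar to 'du -bs *'.
function directory-summary($dir=".") {
    get-childitem $dir |
        % { $f = $_   # remember the child: inside select's Expression, $_ is the measure-object result
            # -r is short for -Recurse
            get-childitem -r $_.FullName |
                measure-object -property length -sum |
                select @{Name="Name";Expression={$f}},Sum } }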
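
The $f = $_ assignment matters because of how select evaluates a calculated property. Inside the Expression script block, $_ refers to the object coming down the pipeline, which at that point is the measure-object result and not the file or directory. A small sketch with made-up stand-in objects (the $f and $measurement values here are hypothetical, not real file system items) shows the difference:

```powershell
# Hypothetical stand-ins: $f plays the child item, $measurement the measure-object result.
$f = New-Object PSObject -Property @{ Name = "Lib" }
$measurement = New-Object PSObject -Property @{ Sum = 13752327 }

# $f is read from the enclosing scope, so the name of the saved item is available.
$row = $measurement | Select-Object @{Name="Name"; Expression={ $f.Name }}, Sum

# $_ is the measurement here, which has no Name property, so Name ends up empty.
$wrong = $measurement | Select-Object @{Name="Name"; Expression={ $_.Name }}, Sum

$row
```

Only $row carries both the name and the size; $wrong keeps the size but loses the name.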



Running this command in the same directory shows:


PS C:\Python25> directory-summary

Name              Sum
----              ---
DLLs          4794012
Doc           4160038
include        382592
Lib          13752327
libs           948600
tcl           3248808
Tools          547784
LICENSE.txt     13817
NEWS.txt        88573
python.exe      24064
pythonw.exe     24576
README.txt      56691
w9xpopen.exe     4608


We can also sort the results:


PS C:\Python25> directory-summary | sort sum

Name              Sum
----              ---
w9xpopen.exe     4608
LICENSE.txt     13817
python.exe      24064
pythonw.exe     24576
README.txt      56691
NEWS.txt        88573
include        382592
Tools          547784
libs           948600
tcl           3248808
Doc           4160038
DLLs          4794012
Lib          13752327


There may be a shorter or better way to implement this, but it was interesting and fun to learn a little more about PowerShell while trying to solve it.

3 comments:

Si said...

Very neat, Luis, thanks for that.

Doesn't Powershell look "at home" with a more "functional" indenting style? I have been using a more conventional C indent/brace pattern, but I think I prefer yours.

Anonymous said...

And... wow, that's slow. I have a C language version that is probably 10x faster.

Anonymous said...

Thanks so much for this useful post. I modified the command slightly to sort descending by size and include the size in MB:

gci . | %{$f=$_; gci -r $_.FullName| measure-object -property length -sum | select @{Name="Name"; Expression={$f}} , @{Name="Sum (MB)"; Expression={ "{0:N3}" -f ($_.sum / 1MB) }}, Sum } | sort Sum -desc | format-table -Property Name,"Sum (MB)", Sum -autosize