I am looking for a way to rank values in a CSV column in PowerShell, similar to Python, where you would read the column into a list.

Should I pass the CSV column into a PowerShell list? Or use a PowerShell module? I am trying to count the frequency of IP addresses in a device log. Thanks!

Your description is a bit vague. I assume you’ve heard of Import-Csv? If not, please read the complete help topic, including the examples, to learn how to use it.
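For instance, a minimal sketch; the file name devices.csv and the IP column header here are assumptions, so adjust them to match your actual log:

```powershell
# Write a tiny sample file so the example is self-contained (assumed layout).
@'
IP,ComputerName
10.0.0.1,host1
10.0.0.2,host2
'@ | Set-Content -Path devices.csv

# Import-Csv turns each row into an object with one property per column.
$rows = Import-Csv -Path devices.csv
$rows.IP    # the IP column as an array of strings, much like a Python list
```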


As well as Import-Csv, you might look at Group-Object.

Example, using ConvertFrom-Csv in lieu of Import-Csv:

$data = @'
IP,ComputerName
10.0.0.1,host1
10.0.0.2,host2
10.0.0.1,host3
10.0.0.1,host3
10.0.0.2,host4
10.0.0.3,host5
'@

$csv = $data | ConvertFrom-Csv
$csv | Group-Object IP

Output:

Count Name                      Group
----- ----                      -----
    3 10.0.0.1                  {@{IP=10.0.0.1; ComputerName=host1}, @{IP=10.0.0.1; ComputerName=host3}, @{IP=10.0.0.1; ComputerName=host3}}
    2 10.0.0.2                  {@{IP=10.0.0.2; ComputerName=host2}, @{IP=10.0.0.2; ComputerName=host4}}
    1 10.0.0.3                  {@{IP=10.0.0.3; ComputerName=host5}}
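To rank the IPs by frequency, which was the original ask, the grouped output can be sorted by count. A sketch using the same sample data:

```powershell
# Same sample CSV text as above.
$data = @'
IP,ComputerName
10.0.0.1,host1
10.0.0.2,host2
10.0.0.1,host3
10.0.0.1,host3
10.0.0.2,host4
10.0.0.3,host5
'@

# Group by IP, then sort descending by group size to get a ranking.
$ranked = $data | ConvertFrom-Csv | Group-Object IP |
    Sort-Object Count -Descending |
    Select-Object Count, Name
$ranked    # 10.0.0.1 (3), then 10.0.0.2 (2), then 10.0.0.3 (1)
```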

@greg00m - when you are looking for the frequency of n in set X, find the Mode of the set.

  • The most frequent IP address in the source data (which I shamelessly copied from an 8-year-old post in another forum) is 10.1.1.2.
  • The output from the code is a single value that is the most frequent IP address in the set.
  • I scrambled the order of the arrays to show that sorting is not required.
    • Luckily, we are not calculating modal intervals for continuous data.
  • I used an array of arrays instead of CSV because your text file could be JSON, and JSON and arrays swap well in PowerShell.

Demo Code

using namespace System.Collections;

$logFile = @(
    ('10.1.1.1','sssched','Mon Jul 22 08:39:38','2013','SNBSCH5607J','Starting job Mail-BASE (start job with sched_id 4496778)'),
    ('10.1.1.3','sstptmm','Mon Jul 22 08:45:56','2013','SNBTMM5461O','Node(10.1.1.3) drive(dev007) volser(TP0011) mount rc=(0)'),
    ('10.1.1.2','ssjobhnd','Mon Jul 22 08:39:38','2013 SNBJH_3075J','Syncsort Backup Express version 3.1; Copyright(c) Syncsort'),
    ('10.1.1.2','ssjobhnd','Mon Jul 22 08:39:44','2013 SNBJH_3462J','--- Building file list for node MAIL2 disk C: of job 1374496778'),
    ('10.1.1.2','ssjobhnd','Mon Jul 22 08:40:06','2013 SNBJH_3463J','--- Finished building file list for node MAIL1 disk C: of job 1374496778'),
    ('10.1.1.2','ssjobhnd','Mon Jul 22 08:40:06','2013 SNBJH_3463J','--- Finished building file list for node MAIL2 disk C: of job 1374496778'),
    ('10.1.1.3','sstptmm','Mon Jul 22 08:43:11','2013','SNBTMM5462O','Attempting mount for node(10.1.1.3) drive(dev007) volser(TP0011)'),
    ('10.1.1.3','sstptmm','Mon Jul 22 08:45:29','2013','SNBTMM5462O','Attempting mount for node(10.1.1.3) drive(dev006) volser(TP0010)'),
    ('10.1.1.2','ssjobhnd','Mon Jul 22 08:39:38','2013 SNBJH_3208J','===== New Job Solar (job ID 1374496778 / hex 51ED280A) starts'),
    ('10.1.1.2','ssjobhnd','Mon Jul 22 08:39:41','2013 SNBJH_3462J','--- Building file list for node MAIL1 disk C: of job 1374496778'),
    ('10.1.1.3','sstptmm','Mon Jul 22 08:48:21','2013','SNBTMM5461O','Node(10.1.1.3) drive(dev006) volser(TP0010) mount rc=(0)')
)

# A small class to hold an IP address and its number of occurrences.
Class Tally {
    [string] $IP
    [int] $tally
    Tally($q1, $q2) { $this.IP = $q1; $this.tally = $q2 }
}

$tallyTable = [ArrayList]::new()
for ($i = 0; $i -lt $logFile.Count; $i++)
{
    # Count how many log rows share this row's IP (element 0 of each array).
    $u = [Tally]::new($logFile[$i][0], $logFile.Where({ $_[0] -eq $logFile[$i][0] }).Count)
    [void]$tallyTable.Add($u)
}
Clear-Host
# Return the IP(s) whose tally equals the maximum tally, i.e. the mode of the set.
$tallyTable.Where({ $_.tally -eq ($tallyTable.tally | Measure-Object -Maximum).Maximum }).IP | Select-Object -Unique

Results

10.1.1.2

Remarks

@matt-bloomfield showed the counts of IPs in the set, which addresses your question. So, I showed how the most frequent IP address could be returned by determining the mode. The output could be modified to show the top 2 most frequent IPs, and so forth. A handy summary when your log files are huge.
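For the top-2 variant, a sketch using Group-Object on a small made-up list of IPs (an assumption, not your real log data):

```powershell
# Made-up sample IPs, deliberately unsorted.
$ips = '10.1.1.2','10.1.1.3','10.1.1.2','10.1.1.1','10.1.1.2','10.1.1.3'

# Group identical values, rank by frequency, keep the two most frequent.
$top2 = $ips | Group-Object | Sort-Object Count -Descending | Select-Object -First 2
$top2.Name    # -> 10.1.1.2, 10.1.1.3
```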

I’ve used this method to determine the most frequent (recurring) number of members in a collection of AD security groups. I used a similar method to calculate the most frequent database size range across a server farm (that’s the modal interval for continuous data again).