Compare file contents on remote server

I am having trouble implementing the logic to compare file contents across two directory locations. In production, the Target path will be a mapped drive, and the Source will be compared against that mapped Target.

I need to create a script that will accept two parameters

  • Source Path
  • Target Path
I need to compare the Source Path with the Target Path and report on objects as follows:
  1. Not in Target but in the Source
  2. Not in Source but in the Target
  3. If there are Object Files in both Source and Target, indicate if the contents are the same.
I can accomplish the bulk of what I need with Compare-Object and Get-FileHash, but I am stuck on the logic for comparing the Target file hashes with the Source file hashes. In production the Source and Target might be mapped UNC paths accessed from a single server; for testing, I am using two differently named local directories on my machine.

This is what I have so far:

#Get Source and Target Objects and get the differences:

$SourceObjects = Get-ChildItem -Recurse C:\utility\Source\
$Targetobjects = Get-ChildItem -Recurse C:\utility\Destination\
$ObjectResults = Compare-Object -ReferenceObject $SourceObjects -DifferenceObject $TargetObjects -IncludeEqual

Here are my $ObjectResults at this point.

InputObject    SideIndicator
-----------    -------------
SubFolder      ==           
Duplicate.txt  ==           
Hash.txt       ==           
UserCheck.exe  ==           
UserChoice.exe ==           
UserNote.exe   ==           
Duplicate.txt  ==           
Test.txt       <=

Now at this point, with this information, I have criteria 1 and 2 met. I know that the Source file c:\utility\Source\Test.txt is in the Source Directory and not the c:\utility\Destination directory. All the other files are the same as far as file name. I need to now compare these files by hashing each file and comparing the hash values with the corresponding Target File Objects.

Now I create a Hash Variable of my $SourceObjects.

$SourceHashFiles = $SourceObjects | Get-FileHash
$SourceHashFiles

Algorithm Hash                                                             Path
--------- ----                                                             ----
SHA256    E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855 C:\utility\Source\Duplicate.txt
SHA256    CA978112CA1BBDCAFAC231B39A23DC4DA786EFF8147C4E72B9807785AFEE48BB C:\utility\Source\Hash.txt
SHA256    9F86D081884C7D659A2FEAA0C55AD015A3BF4F1B2B0B822CD15D6C15B0F00A08 C:\utility\Source\Test.txt
SHA256    E0C9CC517FFC572206816F6C6D2E3733E7952AFDDBA125D6B059E3228CB306D4 C:\utility\Source\UserCheck.exe
SHA256    2F0275B809B545E4CFFD2ED7F5F39E7852018A0C35E13B28D4883EE38E923D89 C:\utility\Source\UserChoice.exe
SHA256    2F8FCA808B67E2B78E55F94087F2A0CE50CD5C395110227C07A554AFE6ABC1E0 C:\utility\Source\UserNote.exe
SHA256    E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855 C:\utility\Source\SubFolder\Duplicate.txt

 

The idea is to iterate through my $ObjectResults array and, for each file that is present in both Source and Target (where SideIndicator equals “==”), find the corresponding Target path, hash that file, and compare the two hash values to see if the contents are the same. Here is the problem: each Target object has a different full path from its Source counterpart, and the same hash value can appear for multiple files in different locations. As I loop through the Source hashes, I don’t know how to look up the matching Target hash.

 

$SourceHashFiles

Algorithm       Hash                                                                   Path                                                                                                          
---------       ----                                                                   ----                                                                                                          
SHA256          E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855       C:\utility\Source\Duplicate.txt                                                                               
SHA256          CA978112CA1BBDCAFAC231B39A23DC4DA786EFF8147C4E72B9807785AFEE48BB       C:\utility\Source\Hash.txt                                                                                    
SHA256          9F86D081884C7D659A2FEAA0C55AD015A3BF4F1B2B0B822CD15D6C15B0F00A08       C:\utility\Source\Test.txt                                                                                    
SHA256          E0C9CC517FFC572206816F6C6D2E3733E7952AFDDBA125D6B059E3228CB306D4       C:\utility\Source\UserCheck.exe                                                                               
SHA256          2F0275B809B545E4CFFD2ED7F5F39E7852018A0C35E13B28D4883EE38E923D89       C:\utility\Source\UserChoice.exe                                                                              
SHA256          2F8FCA808B67E2B78E55F94087F2A0CE50CD5C395110227C07A554AFE6ABC1E0       C:\utility\Source\UserNote.exe                                                                                
SHA256          E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855       C:\utility\Source\SubFolder\Duplicate.txt   

$TargetHashFiles
Algorithm       Hash                                                                   Path                                                                                                          
---------       ----                                                                   ----                                                                                                          
SHA256          E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855       C:\utility\Destination\Duplicate.txt                                                                          
SHA256          E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855       C:\utility\Destination\Hash.txt                                                                               
SHA256          E0C9CC517FFC572206816F6C6D2E3733E7952AFDDBA125D6B059E3228CB306D4       C:\utility\Destination\UserCheck.exe                                                                          
SHA256          2F0275B809B545E4CFFD2ED7F5F39E7852018A0C35E13B28D4883EE38E923D89       C:\utility\Destination\UserChoice.exe                                                                         
SHA256          2F8FCA808B67E2B78E55F94087F2A0CE50CD5C395110227C07A554AFE6ABC1E0       C:\utility\Destination\UserNote.exe                                                                           
SHA256          E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855       C:\utility\Destination\SubFolder\Duplicate.txt

As you can see above, ‘Duplicate.txt’ exists in two locations with identical contents, so the same hash value can occur more than once. That means I can’t use the hash as a unique identifier.
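One possible way around this, sketched here under the assumption that a file's path relative to its root folder is the unique identifier (neither the name nor the hash is). The function names Get-RelativeHashTable and Compare-FolderContent are made up for illustration; only Get-ChildItem, Resolve-Path and Get-FileHash are real cmdlets.

```powershell
# Hypothetical helper: hash every file under $Root and key each hash by the
# file's path relative to $Root, so Source and Target entries line up.
function Get-RelativeHashTable {
    param([Parameter(Mandatory)][string]$Root)

    $rootPath = (Resolve-Path -LiteralPath $Root).ProviderPath
    $table = @{}   # PowerShell hashtable keys are case-insensitive
    Get-ChildItem -LiteralPath $rootPath -Recurse -File | ForEach-Object {
        # Strip the root prefix, then any leading separator that remains.
        $relative = $_.FullName.Substring($rootPath.Length).TrimStart('\', '/')
        $table[$relative] = (Get-FileHash -LiteralPath $_.FullName -Algorithm SHA256).Hash
    }
    $table
}

# Hypothetical wrapper: one output row per relative path found on either side.
function Compare-FolderContent {
    param(
        [Parameter(Mandatory)][string]$SourcePath,
        [Parameter(Mandatory)][string]$TargetPath
    )

    $source = Get-RelativeHashTable -Root $SourcePath
    $target = Get-RelativeHashTable -Root $TargetPath

    # Union of the relative paths seen in Source and Target.
    $allPaths = @($source.Keys) + @($target.Keys) | Sort-Object -Unique

    foreach ($path in $allPaths) {
        $inSource = $source.ContainsKey($path)
        $inTarget = $target.ContainsKey($path)
        [PSCustomObject]@{
            RelativePath = $path
            InSource     = $inSource
            InTarget     = $inTarget
            # Only meaningful when the file exists on both sides.
            SameContent  = if ($inSource -and $inTarget) { $source[$path] -eq $target[$path] } else { $null }
        }
    }
}
```

With the test folders above this would be called as Compare-FolderContent -SourcePath C:\utility\Source -TargetPath C:\utility\Destination. Because the key is the relative path, Duplicate.txt and SubFolder\Duplicate.txt are treated as two separate entries even though their hashes are identical.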

When this script is in production, the Source or Target may be a mapped drive, so the Target path will be a UNC path that looks something like \\w16-Server\c$\Path\, and the Source may also be on a server or be a local path such as c:\path\to\different\directory.

So I feel I have the cmdlet constructs I need, but I am not sure how to implement the logic to achieve this.

 

 

# Script to compare 2 folders including file content
# Sam Boutros - 2 June 2019 - v0.1

#region Input
$SourceFolder = 'D:\Sandbox\Source'
$TargetFolder = 'D:\Sandbox\Target'
#endregion

#region Error check
if (!(Test-Path $SourceFolder)) { "Source folder '$SourceFolder' does not exist"; return }
if (!(Test-Path $TargetFolder)) { "Target folder '$TargetFolder' does not exist"; return }
#endregion

#region Compare folders
$SourceList = Get-ChildItem $SourceFolder -File
'Source file list:'; $SourceList
$TargetList = Get-ChildItem $TargetFolder -File
'Target file list:'; $TargetList

$myOutput = foreach ($File in $SourceList) {
    if ($Found = $TargetList | where {$_.Name -eq $File.Name}) {
        if ((Get-FileHash $File.FullName).Hash -eq (Get-FileHash $Found.FullName).Hash) {
            [PSCustomObject]@{
                FileName                                = $File.Name
                "In$($SourceFolder | Split-Path -Leaf)" = $true
                "In$($TargetFolder | Split-Path -Leaf)" = $true
                SameContent                             = $true
            }
        } else {
            [PSCustomObject]@{
                FileName                                = $File.Name
                "In$($SourceFolder | Split-Path -Leaf)" = $true
                "In$($TargetFolder | Split-Path -Leaf)" = $true
                SameContent                             = $false
            }
        }
    } else {
        [PSCustomObject]@{
            FileName                                = $File.Name
            "In$($SourceFolder | Split-Path -Leaf)" = $true
            "In$($TargetFolder | Split-Path -Leaf)" = $false
            SameContent                             = $null
        }
    }
}

# Files that exist only in the target. The filter below already excludes every
# name found in the source, so no lookup or hash comparison is needed here.
# Wrapping both sides in @() keeps the addition valid when either is a single object.
$myOutput = @($myOutput) + @(foreach ($File in ($TargetList | Where-Object { $_.Name -notin $SourceList.Name })) {
    [PSCustomObject]@{
        FileName                                = $File.Name
        "In$($SourceFolder | Split-Path -Leaf)" = $false
        "In$($TargetFolder | Split-Path -Leaf)" = $true
        SameContent                             = $null
    }
})

'Comparison:'
$myOutput | Format-Table -AutoSize
#endregion

Sample output:

Source file list:


    Directory: D:\Sandbox\Source


Mode                LastWriteTime         Length Name                                                                                                                            
----                -------------         ------ ----                                                                                                                            
-a----         8/3/2016   7:54 PM           8343 Assign-O365License.ps1                                                                                                          
-a----         6/2/2019   6:04 PM           3572 Connect-CiscoVPN.ps1                                                                                                            
-a----         6/2/2019   6:31 PM           1365 SS-Monitor-test.ps1                                                                                                             
Target file list:


    Directory: D:\Sandbox\Target


Mode                LastWriteTime         Length Name                                                                                                                            
----                -------------         ------ ----                                                                                                                            
-a----        12/8/2014   4:28 PM           3699 Add-Admin.ps1                                                                                                                   
-a----         8/3/2016   7:54 PM           8343 Assign-O365License.ps1                                                                                                          
-a----        9/17/2016   2:01 PM           1360 SS-Monitor-test.ps1                                                                                                             
Comparison:



FileName               InSource InTarget SameContent
--------               -------- -------- -----------
Assign-O365License.ps1     True     True        True
Connect-CiscoVPN.ps1       True    False            
SS-Monitor-test.ps1        True     True       False
Add-Admin.ps1             False     True            

I believe this would not give accurate results if the same file name was in different folders. The ‘Duplicate’ file, which is in two locations, would not work if searching for the file name.

Line 21 of the code above

$myOutput = foreach ($File in $SourceList) {
    if ($Found = $TargetList | where {$_.Name -eq $File.Name}) {
        if ((Get-FileHash $File.FullName).Hash -eq (Get-FileHash $Found.FullName).Hash) {
            [PSCustomObject]@{
                FileName = $File.Name
                "In$($SourceFolder | Split-Path -Leaf)" = $true
                "In$($TargetFolder | Split-Path -Leaf)" = $true
                SameContent = $true
            }

I am referring here to the ‘Duplicate’ example.

 

 

C:\Users\bclanton\Google Drive\Code\Projects\CompareTPTS\Target\Duplicate.txt                                 
C:\Users\bclanton\Google Drive\Code\Projects\CompareTPTS\Target\Hash.txt                                      
C:\Users\bclanton\Google Drive\Code\Projects\CompareTPTS\Target\testdoc.txt                                   
C:\Users\bclanton\Google Drive\Code\Projects\CompareTPTS\Target\UserCheck.exe                                 
C:\Users\bclanton\Google Drive\Code\Projects\CompareTPTS\Target\UserChoice.exe                                
C:\Users\bclanton\Google Drive\Code\Projects\CompareTPTS\Target\UserNote.exe                                  
C:\Users\bclanton\Google Drive\Code\Projects\CompareTPTS\Target\SubFolder\Duplicate.txt

 

So, this is designed to compare the files in 2 folders. It does not recurse into subfolders, and it does not compare more than 2 folders.
Please indicate if either of these is required.

If two files have the same hash, then they have the same content (barring a hash collision, which is extremely unlikely with SHA256). Is finding the same content under different names spread around your network a situation that you anticipate running into frequently? It seems like that would be an error condition that you should handle specifically (unless they’re backup files, in which case you should probably write in a section that can recognize backup files and either indicate their presence or ignore them).

If finding the same file with the same name in multiple places in your directory tree is an issue, you’ll need to make your output indicators more complex to show that (maybe show a count rather than just a symbol).
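As a rough sketch of the "show a count" idea, the snippet below groups the hashes under one root and reports any content that occurs more than once. Get-DuplicateContent is a made-up name, and the output shape is only one possible choice.

```powershell
# Hypothetical helper: list every hash that occurs more than once under $Root,
# with how many files share it and where those files live.
function Get-DuplicateContent {
    param([Parameter(Mandatory)][string]$Root)

    Get-ChildItem -LiteralPath $Root -Recurse -File |
        Get-FileHash -Algorithm SHA256 |
        Group-Object -Property Hash |
        Where-Object { $_.Count -gt 1 } |
        ForEach-Object {
            [PSCustomObject]@{
                Hash  = $_.Name          # Group-Object exposes the grouping value as Name
                Count = $_.Count
                Paths = $_.Group.Path    # full paths of the files sharing this hash
            }
        }
}
```

Run against the Source folder from the original post, this should surface the two Duplicate.txt copies as a single row with a count of 2, which could then feed whatever indicator the report uses.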

Logically:
[pre]
(get source)
     |
(scan target)
     |
(file name match?) --NO--> (hash match?) --NO--> [OK: different files]
     |                           |
    YES                         YES --> (backup file?) --NO--> [Error: resolve file names]
     |                                        |
(hash match?) --NO--> [update old file]      YES --> [OK: backup present/ignore]
     |
    YES
     |
(multiple matches?) --NO--> [OK: same file]
     |
    YES
     |
[OK: file found in (dir1, dir2, …)]
[/pre]
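The decision tree above could be expressed as a small function along these lines. This is only a sketch: Get-ComparisonVerdict and its parameters are invented for illustration, and how a "backup file" is recognized is left to the caller because nothing in the thread defines it.

```powershell
# Hypothetical classifier that mirrors the flowchart one branch at a time.
function Get-ComparisonVerdict {
    param(
        [Parameter(Mandatory)][bool]$NameMatch,   # a target file with the same name exists
        [Parameter(Mandatory)][bool]$HashMatch,   # a target file with the same hash exists
        [bool]$IsBackup = $false,                 # caller decides what counts as a backup
        [int]$MatchCount = 1                      # how many target files matched
    )

    if (-not $NameMatch) {
        if (-not $HashMatch) { return 'OK: different files' }
        if ($IsBackup)       { return 'OK: backup present/ignore' }
        return 'Error: resolve file names'
    }

    if (-not $HashMatch)   { return 'Update old file' }
    if ($MatchCount -gt 1) { return "OK: file found in $MatchCount locations" }
    'OK: same file'
}
```

A caller would compute the four inputs per source file (for example from a hash lookup of the target) and pass them in, which keeps the reporting rules separate from the file system access.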

Yes, full recursion through multiple subfolders is required. The example in my original post already did a full recursive comparison between two folders:

$SourceObjects = Get-ChildItem -Recurse C:\utility\Source\
$Targetobjects = Get-ChildItem -Recurse C:\utility\Destination\
$ObjectResults = Compare-Object -ReferenceObject $SourceObjects -DifferenceObject $TargetObjects -IncludeEqual
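One caveat with the snippet above: as far as I know, Compare-Object without -Property compares the objects by their string representation, which for file objects may be just the name or the full path depending on the PowerShell version, so it is not reliably subfolder-aware. A possible variation, assuming the relative path is what should be compared, adds a calculated RelativePath property and compares on that. Get-RelativeItem and Compare-RelativePath are made-up names, and the sketch assumes each folder contains at least one file, since Compare-Object can reject null input.

```powershell
# Hypothetical helper: emit one object per file carrying its path relative to $Root.
function Get-RelativeItem {
    param([Parameter(Mandatory)][string]$Root)

    $rootPath = (Resolve-Path -LiteralPath $Root).ProviderPath
    Get-ChildItem -LiteralPath $rootPath -Recurse -File |
        Select-Object FullName, @{
            Name       = 'RelativePath'
            Expression = { $_.FullName.Substring($rootPath.Length).TrimStart('\', '/') }
        }
}

# Hypothetical wrapper: compare two trees on RelativePath only.
# '<=' means only in Source, '=>' only in Target, '==' present in both.
function Compare-RelativePath {
    param(
        [Parameter(Mandatory)][string]$SourcePath,
        [Parameter(Mandatory)][string]$TargetPath
    )

    $sourceItems = @(Get-RelativeItem -Root $SourcePath)
    $targetItems = @(Get-RelativeItem -Root $TargetPath)
    Compare-Object -ReferenceObject $sourceItems -DifferenceObject $targetItems `
        -Property RelativePath -IncludeEqual
}
```

The rows marked '==' are the candidates for a follow-up hash comparison; the rows marked '<=' and '=>' already answer requirements 1 and 2. Folders themselves are skipped here because of -File, which differs from the original snippet.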