Remove diacritics

Hello! I am trying to create a PS script that does the following:

It runs recursively in a certain directory, checks the name of each folder and file, and if a name contains diacritics, replaces each one with the corresponding "normal" letter. A filename can contain several diacritics, or none.

I tried the following to remove only the "ă", but it does not work. Strangely, everything else in the script works.

$path = "Mypath"

# Get all the files and folders in the directory
$files = Get-ChildItem -Path $path -Recurse -File
$folders = Get-ChildItem -Path $path -Recurse -Directory

# Loop through all the files and folders
foreach ($item in $files + $folders) {
  # Get the current name of the file or folder
  $name = $item.Name

  # Replace ă with a
  $newName = $name -replace 'ă', 'a' 

  # Add a suffix to the new name to make it different from the old name
  $newName = $newName + "_new"

  # Rename the file or folder
  Rename-Item $item.FullName $newName
}

Octavian,
Welcome to the forum. :wave:t3:

This task is potentially more complex than you might think it is. Especially when you approach it like you show in your code snippet …

If a folder has a diacritic in its name and you rename it, the path of every file inside it changes. When the loop later tries to rename one of those files using the old path, the rename command will fail … :point_up_2:t3: :man_shrugging:t3: :wink:

I’d recommend treating folders and files in separate loops to avoid such collisions.
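
One way to sidestep the ordering problem, as a rough sketch rather than a finished tool: collect everything first, sort by path depth, and rename the deepest items first, so a parent folder is only touched after all of its children. The function name Rename-TreeDeepestFirst and its -Transform parameter are made up for this illustration.

```powershell
# Hypothetical helper: rename everything under $Root, deepest paths first,
# so a folder is renamed only after all the items inside it.
function Rename-TreeDeepestFirst {
    param(
        [Parameter(Mandatory)] [string]      $Root,
        [Parameter(Mandatory)] [scriptblock] $Transform   # maps old name -> new name
    )

    # Sort-Object collects the whole listing before emitting anything,
    # so every FullName below was captured before the first rename happens.
    Get-ChildItem -LiteralPath $Root -Recurse |
        Sort-Object -Descending -Property {
            $_.FullName.Split([IO.Path]::DirectorySeparatorChar).Count
        } |
        ForEach-Object {
            $newName = & $Transform $_.Name
            # Ordinal comparison: only rename when the name really changed.
            if (-not [string]::Equals($newName, $_.Name, [StringComparison]::Ordinal)) {
                Rename-Item -LiteralPath $_.FullName -NewName $newName
            }
        }
}
```

It could be called like `Rename-TreeDeepestFirst -Root 'C:\some\folder' -Transform { param($n) $n -replace 'X', 'Y' }`. Adding -WhatIf to the Rename-Item call gives a dry run that only prints what would be renamed.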

Regardless of that … for the replace part … in the vast majority of cases you’re not the very first one with a given task or issue. Before you try to come up with a clumsy solution yourself, you should search online for others who have already solved this particular issue. For this particular task there are probably already hundreds if not thousands of solutions out there. You just have to pick one you like.
For example:

So no need to re-invent the wheel again and again.
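
For reference, the technique most of those solutions rely on is Unicode normalization: decompose each letter into a base letter plus combining marks, then drop the marks. The sketch below is a minimal version under that assumption. The name Remove-Diacritics is invented here, and this is not necessarily the code behind the link.

```powershell
# Hypothetical function: strips combining marks after Unicode decomposition.
function Remove-Diacritics {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [AllowEmptyString()]
        [string] $Text
    )
    process {
        # FormD splits a letter such as U+0103 (a with breve) into "a" + U+0306.
        $decomposed = $Text.Normalize([Text.NormalizationForm]::FormD)
        $sb = [Text.StringBuilder]::new()
        foreach ($ch in $decomposed.ToCharArray()) {
            $category = [Globalization.CharUnicodeInfo]::GetUnicodeCategory($ch)
            # Keep everything except the combining (non-spacing) marks.
            if ($category -ne [Globalization.UnicodeCategory]::NonSpacingMark) {
                [void] $sb.Append($ch)
            }
        }
        # Recompose whatever is left into the usual precomposed form.
        $sb.ToString().Normalize([Text.NormalizationForm]::FormC)
    }
}
```

Letters that have no decomposition (for example "ø" or "ß") pass through unchanged, so they would still need an explicit replacement if they matter.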

Hello Olaf, thank you for the feedback and suggestions.
The point is that I am not a great coder; I used (slightly adapted) what I could find online. I went back and checked again and could not find a working solution for my case (I went through at least 20 examples).
I know the code above has an issue with the files + folders combination. However, I presented my current starting point. :grinning:
If I manage to get it working for files, I can make it work for folders as well…

Have you tried my suggestion?

Have you tried the function from the link I posted?

Yes, I tried that one and, following the advice, I also searched for similar solutions, but they don’t work out of the box (with minimal code changes) and I don’t have the knowledge to fix the code.
The code you suggested (the last example) runs without errors but does not modify anything in the filenames.

Show your code! What happens? Or what does not happen but you expect it to happen? Please keep in mind we cannot see your screen or read your mind! :wink:

You are right, sorry.
Here is the code. From the example I have only changed the path to the files

# Modify the function to make it compatible with the pipeline
function Remove-StringLatinCharacters
{
    PARAM (
        [parameter(ValueFromPipeline = $true)]
        [string]$String
    )
    PROCESS
    {
        [Text.Encoding]::ASCII.GetString([Text.Encoding]::GetEncoding("Cyrillic").GetBytes($String))
    }
}


# Example with multiple text files located in the directory below
Foreach ($file in (Get-ChildItem C:\Users\OctavianAldescu\Downloads\TEST\*.txt))
{
    # Get the content of the current file and remove the diacritics
    $NewContent = Get-content $file | Remove-StringLatinCharacters
    
    # Overwrite the current file with the new content
    $NewContent | Set-Content $file
}

I think the problem (or at least one of them) comes from the way the diacritics are read. Could that be?
If I run the code below, I get an error, and in the error text the diacritics look different.

# Replace "C:\Users\OctavianAldescu\Downloads\TEST" with the actual path to the directory you want to process
$folderPath = "C:\Users\OctavianAldescu\Downloads\TEST"

# Function to remove diacritics from a string
function RemoveDiacritics($text) {
    $diacriticChars = @{
        'ă' = 'a'; 'â' = 'a'; 'î' = 'i'; 'ș' = 's'; 'ț' = 't';
        'Ă' = 'A'; 'Â' = 'A'; 'Î' = 'I'; 'Ș' = 'S'; 'Ț' = 'T';
    }

    $newText = ""
    foreach ($char in $text) {
        if ($diacriticChars.ContainsKey($char)) {
            $newText += $diacriticChars[$char]
        } else {
            $newText += $char
        }
    }

    return $newText
}

# Get a list of files in the directory and its subdirectories
$files = Get-ChildItem $folderPath -File -Recurse

# Loop through files, check for diacritics in filenames, and print filenames
foreach ($file in $files) {
    $hasDiacritics = $file.Name -match '[ăâîșțĂÂÎȘȚ]'
    
    if ($hasDiacritics) {
        Write-Host "Filename with Diacritics: $($file.Name)"
    }
    
    $newFileName = RemoveDiacritics($file.Name)
    
    if ($newFileName -ne $file.Name) {
        $newFullPath = Join-Path -Path $file.DirectoryName -ChildPath $newFileName
        Rename-Item -Path $file.FullName -NewName $newFileName -Force
        Write-Host "Original Filename: $($file.Name)"
        Write-Host "New Filename: $newFileName"
        Write-Host "-----------------------------"
    }
}

and the error (Ș should have been Ș):
At C:\Users\OctavianAldescu\Downloads\TEST\s.ps1:8 char:17
+         'Ă' = 'A'; 'Â' = 'A'; 'Î' = 'I'; 'Ș' = 'S'; 'Ț' = 'T';
+                 ~
Missing '=' operator after key in hash literal.
At C:\Users\OctavianAldescu\Downloads\TEST\s.ps1:12 char:12
+     foreach ($char in $text) {
+            ~
Missing '=' operator after key in hash literal.
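
A possible explanation for this parse error, offered as an assumption since the script file itself is not shown: Windows PowerShell 5.1 reads a .ps1 saved as UTF-8 without a BOM using the system ANSI code page. The two bytes of "Ă" then decode to two characters, the second being "‚" (U+201A), which PowerShell accepts as a single quote, so the string ends early and the hash literal no longer parses. Saving the script as UTF-8 with BOM is one way around it. Another is to keep the script pure ASCII by writing the letters as code points, as in this sketch (the function name Remove-RomanianDiacritics is invented for the example):

```powershell
# Hypothetical function: the file contains only ASCII, so its encoding cannot
# change what the patterns mean.
function Remove-RomanianDiacritics {
    param([string] $Text)

    # -creplace is the case-sensitive form of -replace; \uXXXX is a .NET regex escape.
    $Text = $Text -creplace '[\u0103\u00E2]', 'a'   # a-breve, a-circumflex
    $Text = $Text -creplace '[\u0102\u00C2]', 'A'
    $Text = $Text -creplace '\u00EE',         'i'   # i-circumflex
    $Text = $Text -creplace '\u00CE',         'I'
    $Text = $Text -creplace '[\u0219\u015F]', 's'   # s-comma and the older s-cedilla
    $Text = $Text -creplace '[\u0218\u015E]', 'S'
    $Text = $Text -creplace '[\u021B\u0163]', 't'   # t-comma and the older t-cedilla
    $Text = $Text -creplace '[\u021A\u0162]', 'T'
    $Text
}
```

The same encoding issue could also explain why a plain `-replace 'ă', 'a'` silently matches nothing: the pattern stored in the script would no longer be the character that appears in the filename.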

No. The problem is you’re not listening and you make it harder than necessary. :man_shrugging:t3:

I created some filenames in a test folder imported the function I linked above and ran the following code:

PS C:\_Sample\Test_01> Get-ChildItem -File

    Directory: C:\_Sample\Test_01

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a---          29.08.2023    13:28              0 StrăngeFileNÂme_01.txt
-a---          29.08.2023    13:28              0 SȚrangêFiléNämè_02.txt
-a---          29.08.2023    13:28              0 StrãngeFileName_03.txt
-a---          29.08.2023    13:28              0 ȘträngeFîleNÄme_04.txt
-a---          29.08.2023    13:28              0 StrangeFileName_05.txt

PS C:\_Sample\Test_01> Get-ChildItem -File |
>> ForEach-Object {
>>     $NewBaseName = Remove-StringDiacritic $_.BaseName
>>     $NewName = $NewBaseName + $_.Extension
>>     Rename-Item -Path $_.FullName -NewName $NewName
>> }
PS C:\_Sample\Test_01> Get-ChildItem  -File

    Directory: C:\_Sample\Test_01

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a---          29.08.2023    13:28              0 StrangeFileNAme_01.txt
-a---          29.08.2023    13:28              0 STrangeFileName_02.txt
-a---          29.08.2023    13:28              0 StrangeFileName_03.txt
-a---          29.08.2023    13:28              0 StrangeFileNAme_04.txt
-a---          29.08.2023    13:28              0 StrangeFileName_05.txt

PS C:\_Sample\Test_01> 

As you can see the filenames had diacritics before and then they’re gone. :man_shrugging:t3:

Olaf, please believe me, I am trying to be useful :grinning:
But I feel completely overwhelmed.

I am totally new to PS scripting (and I last coded something like… 20 years ago).
I think I am missing some points, so I will go and read more about it. I can’t find how to import the function so that I can run it the way you did. I only know how to save it as a .ps1 file.
For somebody who knows, it might be very obvious; for someone trying to figure out how to do it… it’s harder.

You just created a function in the code you posted. :man_shrugging:t3: Instead of that function you created yourself you use the function from the link. That’s all. And the rest of the code is quite basic … don’t you think? :wink:

Olaf, in the code I provided as reference I could not find “Remove-StringDiacritic”. If not too much trouble, can you please put the whole input/output?
Indeed, the code seems pretty straightforward. However, if something does not work, I am not very good at debugging. :grinning:

You just copy the code from the link and run it. Just like you did with your function definition. That’s really it. :man_shrugging:t3:
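
If the function lives in its own .ps1 file rather than being pasted into the console, "importing" it usually means dot-sourcing that file. A small self-contained sketch, where the file name MyFunctions.ps1 and the function Get-Greeting are placeholders invented for the demo:

```powershell
# Hypothetical library file holding a function definition.
$lib = Join-Path ([IO.Path]::GetTempPath()) 'MyFunctions.ps1'
Set-Content -LiteralPath $lib -Value 'function Get-Greeting { "hello from the library" }'

# Dot-sourcing: a dot, a space, then the path. The file runs in the current
# scope, so the functions it defines remain available afterwards.
. $lib

Get-Greeting   # prints: hello from the library
```

Running the file without the leading dot would define the function in a child scope that disappears as soon as the script ends, which is one common reason a function seems to "not exist" afterwards.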

I ran:

PS C:\Users\OctavianAldescu\Downloads\TEST\Sursa\aaîîșșțțîîăă> function Remove-StringLatinCharacters5
{
    PARAM (
        [parameter(ValueFromPipeline = $true)]
        [string]$String
    )
    PROCESS
    {
        [Text.Encoding]::ASCII.GetString([Text.Encoding]::GetEncoding("Cyrillic").GetBytes($String))
    }
}


# Exemple with multiple Text files located in the directory c:\test\
Foreach ($file in (Get-ChildItem  C:\Users\OctavianAldescu\Downloads\TEST\Sursa\aaîîșșțțîîăă\*.txt))
{
    # Get the content of the current file and remove the diacritics
    $NewContent = Get-content $file | Remove-StringLatinCharacters5
    
    # Overwrite the current file with the new content
    $NewContent | Set-Content  $file
}

PS C:\Users\OctavianAldescu\Downloads\TEST\Sursa\aaîîșșțțîîăă> Get-ChildItem -File -Recurse


    Directory: C:\Users\OctavianAldescu\Downloads\TEST\Sursa\aaîîșșțțîîăă


Mode                 LastWriteTime         Length Name                                                                                       
----                 -------------         ------ ----                                                                                       
-a----         8/29/2023   8:42 PM           1463 b.ps1                                                                                      
-a----         8/16/2023   5:25 PM              0 ăîăăăîîșțșțș.txt                                                                           


I tried different options for the path, same output.
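
One thing worth noting about the loop above: Get-Content and Set-Content read and write what is inside a file, so they never touch its name. Changing the name is the job of Rename-Item. A tiny demonstration in a throwaway temp folder (all paths and names here are made up for the example):

```powershell
# Throwaway folder and file for the demo.
$dir  = Join-Path ([IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $dir | Out-Null
$file = New-Item -ItemType File -Path (Join-Path $dir 'old_name.txt')

# Set-Content changes the file's content; the name is still "old_name.txt".
'some text' | Set-Content -LiteralPath $file.FullName

# Rename-Item is what changes the name itself.
Rename-Item -LiteralPath $file.FullName -NewName ($file.Name -replace 'old', 'new')

(Get-ChildItem -LiteralPath $dir).Name   # new_name.txt
```

So a diacritic-removing function has to be applied to the item's Name (or BaseName) and the result passed to Rename-Item -NewName, not piped through Set-Content.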

I really don’t understand why you don’t simply follow the steps I showed in my code suggestion above. I used the function Remove-StringDiacritic !!! from the link I posted, ran a Get-ChildItem on a folder with files with diacritics in their names, and in the Foreach-Object loop are three lines of code. Are you really unable to follow these simple steps?
Sorry but I give up. :man_shrugging:t3: