Converting characters

plaza · May 15, 2021, 5:07pm

I need some help converting Swedish special characters (ö, ä, å) to English ones when importing a CSV file with employees’ names and roles in it. I’m new to PS, so I don’t know if I’m explaining this right or not…
Because the system doesn’t accept non-English characters, my script should convert the characters to their English versions like this (ö=o; ä=a; å=a).
Are there any special commands for such things?

Olaf · May 15, 2021, 8:39pm

plaza,
welcome to the forums.

… and welcome to the club.

As far as I know there is no built-in command to convert such characters. We Germans often have the same issue with our German Umlaute.

But as the number of special characters is quite limited, you can use a simple chain of replacements to deal with strings containing them. Here is an example covering the most common German Umlaute:

$InputString = "Größter Übeltäter der Straße"

$InputString -creplace 'Ö','Oe' -creplace 'Ä','Ae' -creplace 'Ü','Ue' -creplace 'ö','oe' -creplace 'ä','ae' -creplace 'ü','ue' -creplace 'ß','ss'

$InputString.Replace('Ö', 'Oe').Replace('Ä', 'Ae').Replace('Ü', 'Ue').Replace('ö', 'oe').Replace('ä', 'ae').Replace('ü', 'ue').Replace('ß', 'ss')

Both of these commands will return the same string:

'Groesster Uebeltaeter der Strasse'

The “normal” -replace operator wouldn’t have been suitable in this case: it is case-insensitive, so the pattern 'Ö' would also match 'ö' and lowercase characters would get the uppercase replacement.
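To make the difference visible, here is a small sketch comparing the two operators. The sample string is made up for illustration and is not from the example above.

```powershell
# Made-up sample with one uppercase 'Ä' and one lowercase 'ä'.
$sample = 'Äpfel und Übeltäter'

# -replace ignores case, so the pattern 'Ä' also hits the lowercase 'ä'.
$insensitive = $sample -replace 'Ä', 'Ae'

# -creplace is case-sensitive, so only the uppercase 'Ä' is replaced.
$sensitive = $sample -creplace 'Ä', 'Ae'

$insensitive   # Aepfel und ÜbeltAeter
$sensitive     # Aepfel und Übeltäter
```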

matt-bloomfield · May 15, 2021, 9:16pm

I PowerShellified this Stack Overflow answer.

$myString = 'Crème Brulée'
$newString = $myString.Normalize('FormD') -replace "\p{M}"
Write-Output $newString

We’re doing the same as described in the Stack Overflow answer: normalizing the string so that Crème Brulée becomes Cre<accent>me Brule<accent>e, then using -replace to remove the accents.

The regex uses \p{…} to match a Unicode character category. In this case it’s ‘M’, the mark category, which covers combining diacritical marks. A full list of the classes can be found here.
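Applied to the original question, this could look roughly like the sketch below. The function name Remove-Diacritics, the column names Name and Role, and the sample rows are all assumptions for illustration. With a real file the rows would come from Import-Csv (probably with -Encoding UTF8, assuming the file is saved as UTF-8) instead of the inline text.

```powershell
# Hypothetical helper wrapping the normalize-then-strip approach.
function Remove-Diacritics {
    param([string]$Text)
    # FormD splits each accented letter into a base letter plus combining marks;
    # \p{M} matches those marks so -replace can delete them.
    $Text.Normalize([System.Text.NormalizationForm]::FormD) -replace '\p{M}', ''
}

# Made-up sample rows standing in for the imported CSV file.
$employees = @'
Name,Role
Åsa Öberg,Säljare
Björn Sjöström,Chef
'@ | ConvertFrom-Csv

# Build new objects with the cleaned-up values (column names are assumed).
$converted = $employees | ForEach-Object {
    [pscustomobject]@{
        Name = Remove-Diacritics $_.Name
        Role = Remove-Diacritics $_.Role
    }
}

$converted
```

For the sample rows this should give “Asa Oberg / Saljare” and “Bjorn Sjostrom / Chef”.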

Olaf · May 15, 2021, 9:51pm

I tried that for my input strings and unfortunately it does not look good at all. It might remove the special characters, but at least for German it changes the words to incorrectly spelled versions.

… and it does not work for “ß” at all.

matt-bloomfield · May 16, 2021, 10:30am

The eszett is a letter so it won’t match the diacritic character class.

After your comment I spent some time last night looking at the substitution rules for characters in various languages, and it seems pretty common in Swedish to just drop the diacritical marks if they’re not available. It also seems OK to do this in Finnish, Romanian, Hungarian, and French.

By contrast it’s pretty much verboten to do this in German. Not only does it mess up the spelling, as you pointed out, but it can also change the pronunciation and meaning.
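For German text the two approaches from this thread could be combined: do the explicit substitutions first, then strip whatever accents are left. This is only a sketch; the function name ConvertTo-AsciiText is made up, and the mapping is just the German one from Olaf’s example, so it would not be the right choice for Swedish names, where dropping the marks is enough.

```powershell
# Hypothetical helper: German substitutions first, then drop remaining accents.
function ConvertTo-AsciiText {
    param([string]$Text)

    # Make sure umlauts are in their precomposed form so the pairs below match.
    $Text = $Text.Normalize([System.Text.NormalizationForm]::FormC)

    # German substitutions from earlier in the thread.
    # String.Replace is case-sensitive, so 'Ö' and 'ö' are handled separately.
    $germanPairs = @(
        @('Ä', 'Ae'), @('Ö', 'Oe'), @('Ü', 'Ue'),
        @('ä', 'ae'), @('ö', 'oe'), @('ü', 'ue'),
        @('ß', 'ss')
    )
    foreach ($pair in $germanPairs) {
        $Text = $Text.Replace($pair[0], $pair[1])
    }

    # Any accents that are left (é, è, å, ...) are removed via normalization.
    $Text.Normalize([System.Text.NormalizationForm]::FormD) -replace '\p{M}', ''
}

ConvertTo-AsciiText 'Größter Übeltäter der Straße'   # Groesster Uebeltaeter der Strasse
ConvertTo-AsciiText 'Crème Brulée'                   # Creme Brulee
```

An array of pairs is used here instead of a hashtable because PowerShell hashtable literals treat keys case-insensitively, so 'Ä' and 'ä' would count as duplicate keys.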

Olaf · May 16, 2021, 3:23pm

Oh wow … I usually count sheep or drink warm milk if I cannot sleep …

That’s what I actually meant. While it might still be possible to understand a text without umlauts if there is enough context, simply omitting the umlauts in names is a “no go”. In those cases you have to have the correct substitutions for them.

But - fortunately - as I already said there are only 4 of them, and only 3 of the 4 actually have uppercase versions. So it’s manageable.

Nevertheless, it’s missing in some character sets or fonts and will be shown as “□” or something even stranger.
