Converting characters

I need some help converting Swedish special characters (ö, ä, å) to their English equivalents when importing a CSV file with employees’ names and roles in it. I’m new to PowerShell, so I don’t know if I’m explaining this right or not…
Because the system doesn’t accept non-English characters, my script should convert the characters to an English version like this: ö=o, ä=a, å=a.
Are there any special commands for such things?

plaza,
welcome to the forums.

… and welcome to the club. :wink:

As far as I know there is no built-in command to convert such characters. We Germans often have the same issue with our German Umlaute.

But as the number of special characters is quite limited, you can use a simple chain of replacements to deal with strings containing them. Here is an example covering the most common German Umlaute:

$InputString = "Größter Übeltäter der Straße"

$InputString -creplace 'Ö','Oe' -creplace 'Ä','Ae' -creplace 'Ü','Ue' -creplace 'ö','oe' -creplace 'ä','ae' -creplace 'ü','ue' -creplace 'ß','ss'

$InputString.Replace('Ö', 'Oe').Replace('Ä', 'Ae').Replace('Ü', 'Ue').Replace('ö', 'oe').Replace('ä', 'ae').Replace('ü', 'ue').Replace('ß', 'ss')

Both of these commands will return the same string:

'Groesster Uebeltaeter der Strasse'

The “normal” -replace operator wouldn’t have been suitable in this case: it is case-insensitive, so it does not preserve uppercase characters.
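For the original question, the same replacement chain could be applied to each row of the imported CSV. This is only a rough sketch: the helper name `Convert-SwedishChar`, the column names `Name` and `Role`, and the inline sample data are all assumptions, since the real file layout wasn't posted.

```powershell
# Hypothetical helper: maps the Swedish letters to the ASCII letters the asker listed.
function Convert-SwedishChar {
    param([string]$Text)
    # -creplace is case-sensitive, so upper- and lowercase are handled separately
    $Text -creplace 'Å','A' -creplace 'Ä','A' -creplace 'Ö','O' -creplace 'å','a' -creplace 'ä','a' -creplace 'ö','o'
}

# Inline sample data standing in for the real file; the column names are assumptions.
$csvText = @'
Name,Role
Åsa Söderström,Säljare
Göran Åkesson,Chef
'@

# Build new objects with the converted values for each row
$employees = $csvText | ConvertFrom-Csv | ForEach-Object {
    [pscustomobject]@{
        Name = Convert-SwedishChar $_.Name
        Role = Convert-SwedishChar $_.Role
    }
}

$employees
```

With a real file you would presumably swap the here-string for something like `Import-Csv -Path .\employees.csv -Encoding UTF8` (the path is made up), and check that the encoding matches how the file was saved, otherwise the special characters may already be garbled before any replacement runs.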


I PowerShellified this Stack Overflow answer.

$myString = 'Crème Brulée'
$newString = $myString.Normalize('FormD') -replace "\p{M}"
Write-Output $newString

We’re doing the same as described in the Stack Overflow answer: normalizing the string so that Crème Brulée becomes Cre<accent>me Brule<accent>e, then using -replace to remove the accents.

The regex uses \p{…} to match a Unicode character category. In this case it’s ‘M’, the category for combining marks such as diacritics. A full list of the classes can be found here.
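To make the decomposition step a bit more visible, here is a small sketch. The function name `Remove-Diacritic` is made up for illustration, and the final re-normalization to FormC is my own addition rather than part of the linked answer.

```powershell
# Hypothetical helper wrapping the normalize-then-strip approach described above.
function Remove-Diacritic {
    param([string]$Text)
    # FormD splits each accented letter into a base letter plus combining mark(s)
    $decomposed = $Text.Normalize([Text.NormalizationForm]::FormD)
    # \p{M} matches the combining marks; replacing them with nothing drops them
    ($decomposed -replace '\p{M}').Normalize([Text.NormalizationForm]::FormC)
}

$e = [string][char]0x00E9                   # precomposed "é" as a single character
$lengthBefore = $e.Length                   # 1
$lengthAfter  = $e.Normalize('FormD').Length  # 2: "e" plus the combining acute accent

$result = Remove-Diacritic 'Åsa Söderström'   # Asa Soderstrom
$result
```

The length check shows why the regex works at all: after FormD the accent is a separate character that `\p{M}` can match on its own.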

I tried that for my input strings and unfortunately it does not look good at all. It might remove the special characters, but at least for German it changes the words to incorrectly spelled versions. :cry:

… and it does not work for “ß” at all. :thinking:

The eszett is a letter so it won’t match the diacritic character class.
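A quick way to check this, plus one possible workaround (an assumption on my part, not something from the Stack Overflow answer): replace the ß explicitly before stripping the marks.

```powershell
$word = 'Straße'

# ß is in the letter category (L), not the mark category (M)
$isLetter = 'ß' -match '\p{L}'    # True
$isMark   = 'ß' -match '\p{M}'    # False

# so the normalize-and-strip approach leaves it unchanged
$stripped = $word.Normalize('FormD') -replace '\p{M}'    # Straße

# possible workaround: replace ß explicitly first, then strip the marks
$fixed = ($word -creplace 'ß','ss').Normalize('FormD') -replace '\p{M}'    # Strasse

$isLetter, $isMark, $stripped, $fixed
```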

After your comment I spent some time last night looking at the substitution rules for characters in various languages, and it seems pretty common in Swedish to just drop the diacritical marks if they’re not available. It also seems OK to do this in Finnish, Romanian, Hungarian, and French.

By contrast it’s pretty much verboten to do this in German. Not only does it mess up the spelling, as you pointed out, but it can also change the pronunciation and meaning.

Oh wow … I usually count sheep or drink warm milk if I cannot sleep … :flushed: :crazy_face: :laughing: :laughing:

That’s what I actually meant. Even if a text without umlauts may still be understandable when there is enough context, simply omitting the umlauts in names is a “no go”. In those cases you have to use the correct substitutions for them.

But - fortunately - as I already said there are only 4 of them and only 3 of 4 have uppercase versions actually. So it’s manageable. :wink:

Nevertheless it’s missing in some character sets or fonts and will be shown as “□” or something even stranger. :flushed: