Parsing through a document

Hello, I have a DLP-type program up and running but would like to be able to parse through it for tags. An example would be watermarking documents as “secret” or “Private”, or using white text to hide such tags in the general document text.

Can PowerShell do this on any PC, or only on Windows Server? Or is it too compute-intensive overall? Yes, I am looking to avoid paying for actual DLP software.

Thanks!

If you’re talking about controlling the program with scripts, it depends on the particular program. If it has an API for PowerShell, you could do it. If not, you will probably be out of luck. There are some options to control the GUI of a program with something like AutoIt or AutoHotkey, but that’s another discussion. :wink:

Hi Olaf,

Thanks. I don’t think I spelled out what I am looking for. I have a working file status monitoring program and am looking for a package, library, or lines of code that I could add to detect the following in a document:

SSNs

Credit card numbers

embedded tags

etc.

Thank you

You’ll have to be more specific about your setup. Get-Content only gives you usable text from plaintext files, which your documents probably aren’t. Handling other document formats requires more complicated methods, and the method is different for each format that you want to handle. Also, writing new information into those documents will be more complicated than getting information out of them.

For instance, this blog post from the Scripting Guy describes a method for importing a .docx file as an object and then finding specific words within the file.
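If you would rather not depend on Word being installed, here is a rough sketch of a different route (not necessarily the one the blog post takes). It relies on the fact that a .docx file is a zip archive of XML parts. The function names Get-DocxText and Find-DocxTag are made up for this example, and I am assuming that text watermarks are stored as VML text shapes in a header part, which is how Word usually writes them. Treat it as a starting point rather than a finished detector.

```powershell
Add-Type -AssemblyName System.IO.Compression.FileSystem

# Hypothetical helper: pull the text out of a .docx without using Word.
function Get-DocxText {
    param([Parameter(Mandatory)][string]$Path)

    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $zip = [System.IO.Compression.ZipFile]::OpenRead($fullPath)
    try {
        # Body text plus headers and footers (assumption: text watermarks sit in a header part).
        $parts = $zip.Entries |
            Where-Object { $_.FullName -match '^word/(document|header\d*|footer\d*)\.xml$' }

        $text = foreach ($part in $parts) {
            $reader = New-Object System.IO.StreamReader($part.Open())
            try { $xml = [xml]$reader.ReadToEnd() } finally { $reader.Dispose() }

            $ns = New-Object System.Xml.XmlNamespaceManager($xml.NameTable)
            $ns.AddNamespace('w', 'http://schemas.openxmlformats.org/wordprocessingml/2006/main')
            $ns.AddNamespace('v', 'urn:schemas-microsoft-com:vml')

            # One output line per paragraph; runs are joined so a word split across runs stays whole.
            foreach ($p in $xml.SelectNodes('//w:p', $ns)) {
                ($p.SelectNodes('.//w:t', $ns) | ForEach-Object { $_.InnerText }) -join ''
            }
            # Text carried by VML text shapes, such as a "SECRET" watermark.
            foreach ($attr in $xml.SelectNodes('//v:textpath/@string', $ns)) { $attr.Value }
        }
    }
    finally { $zip.Dispose() }

    ($text | Where-Object { $_ }) -join "`n"
}

# Hypothetical helper: return the tags from -Tag that occur in the document.
function Find-DocxTag {
    param(
        [Parameter(Mandatory)][string]$Path,
        [string[]]$Tag = @('secret', 'private')
    )
    $text = Get-DocxText -Path $Path
    # -match is case-insensitive; \b keeps "secretary" from counting as "secret".
    $Tag | Where-Object { $text -match "\b$([regex]::Escape($_))\b" }
}
```

Because this reads the raw text and never looks at font color, a tag hidden as white text is returned like any other text. Something like `Find-DocxTag -Path .\report.docx` (the file name is just a placeholder) should work on any PC with Windows PowerShell 5.1 or PowerShell 7, not only on a server.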

This forum discussion is about finding text in a .pdf document, but it relies on the now-deprecated iTextSharp. You can probably apply the same method using its successor, iText 7, but that library is dual-licensed (AGPL or commercial), so depending on your usage you may not be able to use it for free. It is unclear to me whether it can handle .ps documents in addition to .pdf.

If you need to handle other document types, like .odt, that’s another specific solution that you’ll have to find.
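Whatever the format, once you have the text out as a string, the SSN and credit card part of your list is mostly pattern matching. Below is a minimal sketch, not a vetted detector. Test-Luhn and Find-SensitiveData are names I made up, the SSN pattern assumes the dashed 123-45-6789 form, and the card pattern assumes 13 to 19 digits with optional single spaces or hyphens. The Luhn checksum is there to cut down false positives from random digit runs.

```powershell
# Hypothetical helper: Luhn checksum used by payment card numbers.
function Test-Luhn {
    param([Parameter(Mandatory)][string]$Number)

    $digits = $Number -replace '\D', ''
    if ($digits.Length -lt 13 -or $digits.Length -gt 19) { return $false }

    $sum = 0
    $double = $false
    # Walk right to left, doubling every second digit.
    for ($i = $digits.Length - 1; $i -ge 0; $i--) {
        $d = [int]$digits.Substring($i, 1)
        if ($double) {
            $d *= 2
            if ($d -gt 9) { $d -= 9 }
        }
        $sum += $d
        $double = -not $double
    }
    return (($sum % 10) -eq 0)
}

# Hypothetical helper: emit one object per suspected SSN or card number.
function Find-SensitiveData {
    param([Parameter(Mandatory)][AllowEmptyString()][string]$Text)

    # Assumption: SSNs are written with dashes; bare nine-digit runs are ignored.
    foreach ($m in [regex]::Matches($Text, '\b\d{3}-\d{2}-\d{4}\b')) {
        [pscustomobject]@{ Type = 'SSN'; Value = $m.Value }
    }
    # 13 to 19 digits, optionally separated by single spaces or hyphens.
    foreach ($m in [regex]::Matches($Text, '\b\d(?:[ -]?\d){12,18}\b')) {
        if (Test-Luhn -Number $m.Value) {
            [pscustomobject]@{ Type = 'CreditCard'; Value = $m.Value }
        }
    }
}
```

For a plaintext file you could feed it with `Find-SensitiveData -Text (Get-Content -Raw -LiteralPath .\notes.txt)`, where the file name is a placeholder. For other formats you would pass in whatever text your format-specific extraction step returns. Expect some false positives either way, since a dashed nine-digit number is not necessarily an SSN.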