Hi, I don’t know if this is easy to do; I’m not very used to text editing in PowerShell.
Here is my starting point: I have a few PDF files with a variable number of pages (generally 2 to 5), and I need to count the occurrences of a specific word (11256) in them. I’ll need to process one file per day.
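Because the "word" is a number, it helps to match it as a whole word so that 11256 is not also counted inside a longer number such as 112560. Here is a minimal sketch of that count in Python. The language and the name `count_word` are illustrative choices, not part of the original setup. The same `\b11256\b` pattern should also work from PowerShell through `[regex]::Matches`.

```python
import re


def count_word(text: str, word: str = "11256") -> int:
    """Count whole-word occurrences of `word` in `text`.

    The \\b boundaries stop "11256" from matching inside longer
    tokens such as "112560" or "x11256".
    """
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text))


print(count_word("11256 112560 x11256 11256."))  # prints: 2
```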
The file gets imported as text, but the pages come out oddly: some are duplicated for no apparent reason. In the last import, a PDF with 3 pages came out as (in order): Page 1, Page 1, Page 2, Page 1, Page 2, Page 3.
To get to the point, I would like to fix the text after the import. Every page begins with the same header, “Export Data”, and has the page number at the bottom in the form “1 / x”, “2 / x”, and so on.
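Given that footer format, a regular expression can read the page number from a page's text. This Python sketch makes two assumptions. First, the footer is always two numbers separated by a slash, such as `2 / 3`. Second, the footer is the last such pattern on the page, so a similar pattern in the body is ignored. `page_number` is a hypothetical helper name.

```python
import re
from typing import Optional, Tuple

# Assumed footer format: "<page> / <total>", e.g. "2 / 3".
FOOTER = re.compile(r"(\d+)\s*/\s*(\d+)")


def page_number(page_text: str) -> Optional[Tuple[int, int]]:
    """Return (page, total) read from the footer, or None if none is found.

    The footer sits at the bottom of the page, so the LAST "n / x"
    match is used in case the body contains a similar pattern.
    """
    matches = FOOTER.findall(page_text)
    if not matches:
        return None
    page, total = matches[-1]
    return int(page), int(total)


print(page_number("Export Data\nfoo 10 / 20 bar\n2 / 3"))  # prints: (2, 3)
```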
My idea is to extract the text between two consecutive headers (it doesn’t matter if a header gets duplicated) and save each chunk to a variable. Then I would check the page number in each chunk and discard the pages I already have.
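The approach described above could look roughly like the following Python sketch. It splits the text on the header, reads each chunk's page number, keeps only the first copy of each page, and then counts the word. The sketch assumes three things: the header text is exactly `Export Data`, the footer looks like `n / x`, and duplicated copies of a page are identical, so keeping any one copy is enough. `unique_pages` and `count_word` are hypothetical names. Instead of one variable per page, it stores pages in a dictionary keyed by page number, which handles the discarding step directly. Chunks with no recognizable footer are dropped, which may need revisiting if the extracted text sometimes loses a footer. The same steps should translate to PowerShell with `-split`, `[regex]::Matches` and a hashtable.

```python
import re
from typing import Dict, List

HEADER = "Export Data"
# Assumed footer format: "<page> / <total>", e.g. "2 / 3".
FOOTER = re.compile(r"(\d+)\s*/\s*(\d+)")


def unique_pages(text: str) -> List[str]:
    """Split the imported text on the header and keep one copy of each page."""
    pages: Dict[int, str] = {}
    for chunk in text.split(HEADER):
        if not chunk.strip():
            # Empty chunks appear before the first header and between
            # back-to-back (duplicated) headers; skip them.
            continue
        matches = FOOTER.findall(chunk)
        if not matches:
            continue  # no page number found, so the page cannot be identified
        page = int(matches[-1][0])  # last "n / x" match is taken as the footer
        pages.setdefault(page, chunk)  # first copy wins, later duplicates are ignored
    return [pages[n] for n in sorted(pages)]


def count_word(text: str, word: str = "11256") -> int:
    """Count whole-word occurrences of `word`."""
    return len(re.findall(r"\b" + re.escape(word) + r"\b", text))


# Sample input mimicking the page order 1, 1, 2, 1, 2, 3 from the last import.
raw = (
    "Export Data\nA 11256\n1 / 3\n"
    "Export Data\nA 11256\n1 / 3\n"
    "Export Data\nB\n2 / 3\n"
    "Export Data\nA 11256\n1 / 3\n"
    "Export Data\nB\n2 / 3\n"
    "Export Data\nC 11256 11256\n3 / 3\n"
)
pages = unique_pages(raw)
print(len(pages), sum(count_word(p) for p in pages))  # prints: 3 3
```

In this sample, counting on the raw text would report 5 occurrences instead of 3, because the duplicated copies of page 1 are counted again.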