Loop cycle with multiple data

Hi, I have a .txt file formatted like a CSV, and I have to check and modify some values based on conditions. I wrote the code below and it works, but since the source file contains 664,765 rows it takes forever to execute, so I was wondering if there is a faster way to do it.

The first line contains the headers (stop_id, arrival_time, departure_time, index).
The arrival and departure times in each line are always identical and formatted as hh:mm:ss.
Here is what I have to fix: the file contains runs of consecutive lines with the same stop_id, and it is possible that the arrival_time in two consecutive lines (with the same stop_id) is the same. That is what I want to change.

For example:
line2 = 1234,08:10:00,08:10:00,1
line3 = 1234,08:10:00,08:10:00,2

In this case I want to change line3 like this:
line3 = 1234,08:10:02,08:10:02,2

Since I wasn’t quite able to find out how to modify the lines in my file in place, I created a second file and I’m writing the lines there as I check (and possibly modify) them. I don’t know whether this costs extra time; maybe modifying the original file directly would be faster.

I also added a Write-Host $lineNumberX to check the progress, and that may slow things down a bit, but it was the only way I could think of to keep track.

Thanks in advance for the help

$Path1 = "C:\Users\Administrator\Desktop\Mia cartella 5" #Folder with origin file and modified file
$Path2 = ($Path1 + '\' + "stop_times.txt") #origin file
$PathNew = $Path1 + "\OUT.txt" #modified file
$lineNumberX = 2
$line1 = (Get-Content -Path $Path2 -TotalCount 1)

New-Item $PathNew #create modified file

Add-Content -Path $PathNew -Value $line1

Get-Content -Path $Path2 | ForEach-Object {

    $lineX = (Get-Content -Path $Path2 -TotalCount $lineNumberX)[-1] # the line I'm checking
    $trip_idX = $lineX.Substring(0, 11)
    $arrival_timeX = $lineX.Substring(12, 8)

    $lineNumberY = $lineNumberX + 1
    $lineY = (Get-Content -Path $Path2 -TotalCount $lineNumberY)[-1] # the line after
    $trip_idY = $lineY.Substring(0, 11)
    $arrival_timeY = $lineY.Substring(12, 8)

    if ($trip_idX -eq $trip_idY)
    {
        if ($arrival_timeX -eq $arrival_timeY)
        {
            $lineY = $lineY -replace ":00," , ":02," # note: only fixes lines whose seconds are :00

            Add-Content -Path $PathNew -Value $lineX
            Add-Content -Path $PathNew -Value $lineY
            $lineNumberX = $lineNumberX + 2
            Write-Host $lineNumberX # print progress to the screen
        }
        else
        {
            Add-Content -Path $PathNew -Value $lineX
            $lineNumberX = $lineNumberX + 1
            Write-Host $lineNumberX # print progress to the screen
        }
    }
    else
    {
        Add-Content -Path $PathNew -Value $lineX
        $lineNumberX = $lineNumberX + 1
        Write-Host $lineNumberX # print progress to the screen
    }

}
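The main cost in the loop above is that every iteration calls Get-Content -TotalCount again, re-reading the file from the beginning, so the total work grows quadratically with the row count. A single pass that only remembers the previous line avoids that. Below is a minimal sketch (not taken from the thread), assuming the same fixed-width fields as the code above and times below 24:00:

```powershell
# Single-pass rewrite: read each line exactly once and compare it
# only with the previous line kept in memory.
$reader = [System.IO.StreamReader]::new($Path2)
$writer = [System.IO.StreamWriter]::new($PathNew)
try {
    $writer.WriteLine($reader.ReadLine())          # copy the header unchanged
    $previous = $null
    while ($null -ne ($line = $reader.ReadLine())) {
        if ($previous -and
            $line.Substring(0, 11) -eq $previous.Substring(0, 11) -and   # same id
            $line.Substring(12, 8) -eq $previous.Substring(12, 8)) {     # same time
            $oldTime = $line.Substring(12, 8)
            $newTime = ([datetime]$oldTime).AddSeconds(2).ToString('HH:mm:ss')
            $line = $line.Replace($oldTime, $newTime)  # fixes arrival AND departure
        }
        $writer.WriteLine($line)
        $previous = $line
    }
}
finally {
    $reader.Close()
    $writer.Close()
}
```

Because nothing but the previous line is kept in memory, this stays fast regardless of file size.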

You categorized your question as “DSC”. You may change this to “PowerShell Help” instead. :wink:

What’s the condition?

Could you share some (sanitized but representative) lines of the file in the original format (formatted as code please)?

If your input file is a proper CSV file you should use Import-Csv to import the data.

If that’s “the condition” you could use Group-Object to extract those data sets.

Since you only showed one example of how you want to change the data … Instead of doing string acrobatics you could cast the input data to proper [DateTime] values and change them that way.
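For illustration (a minimal sketch, not taken from the thread), the [DateTime] cast makes the "add 2 seconds" step a one-liner:

```powershell
# Cast the time string to [datetime], shift it, format it back -
# no character-position surgery needed:
$arrival = '08:10:00'                  # sample value from the thread
$shifted = ([datetime]$arrival).AddSeconds(2).ToString('HH:mm:ss')
$shifted   # -> 08:10:02
```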

Sorry for the misplaced category. The condition, as you guessed, is the one in the last code excerpt. The file I have is not a proper CSV; it is formatted like one, but it is a .txt file.

Isn’t the example I provided clear enough? I believed it would be, but anyway, there you go:

trip_id,arrival_time,departure_time,stop_id,stop_sequence
491_1017690,05:00:00,05:00:00,7001,1
491_1017690,05:01:00,05:01:00,7002,2
491_1017690,05:03:00,05:03:00,7003,3

:thinking: What?

The extension of a file does not matter. If the internal formatting is CSV then you can use Import-Csv. :point_up:t3:

You mentioned …

And you share 3 of them? :man_shrugging:t3:

The 3 example lines do not have the same stop_id.

Please keep in mind: We cannot see your screen and we cannot read your mind. You will have to explain it with enough detail for us to be able to understand and reproduce your situation. And you should provide enough sample data for us to play with. Ideally it contains some “white noise” data as well; that makes it easier to design and check the logic.

And I asked to format the sample data as code. :smirk: :face_with_raised_eyebrow:

Sorry, I confused trip_id and stop_id when writing. As you can see, it is the trip_id that is constant. Sorry for not formatting the data as code, my bad.

I’d rather not post too much content because I’m not sure I can disclose the file.

I have the same trip_id on multiple lines. Usually the arrival_time is different from one line to the next, but sometimes it is the same, so I have to manually change the arrival_time in the next line by adding 2 seconds.

I’ll try using Import-Csv.

Reading online I found the following suggested as faster than Get-Content, and it seems to improve things a bit. With Get-Content the processing got slower and slower as the file went on.

foreach($lineX in [System.IO.File]::ReadLines($Path2))

That’s why I asked for …

If there are sensitive data you should obfuscate or randomize them. :man_shrugging:t3:

You still can do. :point_up:t3:

If I do this:

$Data = Import-Csv -Path .\stop_times.txt
$Data | Format-Table -AutoSize

… with your sample data the output looks like this:

trip_id     arrival_time departure_time stop_id stop_sequence
-------     ------------ -------------- ------- -------------
491_1017690 05:00:00     05:00:00       7001    1
491_1017690 05:01:00     05:01:00       7002    2
491_1017690 05:03:00     05:03:00       7003    3

How would you like the data to be changed? :face_with_raised_eyebrow: :man_shrugging:t3:

Import-Csv seems really helpful, I’ll try it immediately, thank you.

Sometimes in my file I have consecutive lines with the same trip_id and arrival_time (no more than 2 lines in a row); in this case I want to add 2 seconds to the second line so the arrival_time is different.

After I use Import-Csv, I can use foreach to loop through every line and check the values, correct?
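(For reference: yes, Import-Csv returns one object per data row, with properties named after the header line, so a foreach can compare each row with the one before it. A minimal sketch along those lines, using the column names from the sample data:)

```powershell
# Each row is an object with .trip_id, .arrival_time, etc.,
# so the "same trip, same time" check reads almost like English:
$Data = Import-Csv -Path .\stop_times.txt   # path assumed from the thread
$previous = $null
foreach ($row in $Data) {
    if ($previous -and
        $row.trip_id -eq $previous.trip_id -and
        $row.arrival_time -eq $previous.arrival_time) {
        # shift the duplicate by 2 seconds, keep both time columns in sync
        $row.arrival_time   = ([datetime]$row.arrival_time).AddSeconds(2).ToString('HH:mm:ss')
        $row.departure_time = $row.arrival_time
    }
    $previous = $row
}
```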

Again, if you’d share some REPRESENTATIVE sample data I could try to help you. But it does not make any sense if I have to guess what data you’re dealing with. :man_shrugging:t3:

“Representative” in this case means that the data contains the issue you want to correct. And it wouldn’t hurt if there were a few more data sets in this sample NOT having the issue. :point_up:t3: :smirk:

Here you go, I hope it is enough. I modified a value in line 7, so lines 6 and 7 show the case I described.

trip_id,arrival_time,departure_time,stop_id,stop_sequence
491_1428304,11:56:00,11:56:00,1032,1
491_1428304,11:57:49,11:57:49,1034,2
491_1428304,11:57:51,11:58:51,1035,3
491_1428304,12:00:21,12:00:21,1036,4
491_1428304,12:01:00,12:01:00,1037,5
491_1428304,12:01:00,12:01:00,2002,6
491_1428304,12:02:24,12:02:24,2095,7
491_1428304,12:03:07,12:03:07,2003,8
491_1428304,12:03:37,12:03:37,2004,9
491_1428304,12:05:00,12:05:00,1039,10
491_1428304,12:05:50,12:05:50,2037,11
491_1428304,12:06:46,12:06:46,2005,12
491_1428304,12:09:00,12:09:00,1040,13
491_1428304,12:11:11,12:11:11,1041,14
491_1428304,12:15:00,12:15:00,2141,15
491_1428304,12:17:00,12:17:00,1042,16
491_1428304,12:18:08,12:18:08,1043,17
491_1428304,12:19:29,12:19:29,1044,18
491_1428304,12:20:38,12:20:38,326,19
491_1428304,12:21:00,12:21:00,327,20
491_1428304,12:21:51,12:21:51,323,21
491_1428304,12:22:29,12:22:29,2104,22
491_1428304,12:23:00,12:23:00,329,23
491_1428304,12:23:42,12:23:42,389,24
491_1428304,12:24:19,12:24:19,390,25
491_1428304,12:25:14,12:25:14,391,26
491_1428304,12:26:00,12:26:00,392,27
491_1428304,12:27:00,12:27:00,393,28
491_1428304,12:28:07,12:28:07,251,29
491_1428304,12:29:09,12:29:09,252,30
491_1428304,12:30:22,12:30:22,253,31
491_1428304,12:31:19,12:31:19,135,32
491_1428304,12:32:06,12:32:06,136,33
491_1428304,12:33:00,12:33:00,68,34
491_1428304,12:34:00,12:34:00,159,35
491_1428304,12:36:00,12:36:00,137,36
491_1428304,12:37:55,12:37:55,575,37
491_1428304,12:39:32,12:39:32,576,38
491_1428304,12:40:49,12:40:49,394,39
491_1428304,12:42:00,12:42:00,395,40
491_1428304,12:43:15,12:43:15,396,41
491_1428304,12:44:00,12:44:00,397,42
491_1428304,12:45:59,12:45:59,398,43
491_1428304,12:47:15,12:47:15,399,44
491_1428304,12:48:38,12:48:38,141,45
491_1428304,12:50:55,12:50:55,401,46
491_1428304,12:53:51,12:53:51,402,47
491_1428304,12:56:00,12:56:00,403,48
491_1428304,12:57:08,12:57:08,404,49
491_1428304,12:57:54,12:57:54,405,50
491_1428304,12:59:00,12:59:00,406,51
491_1428304,12:59:37,12:59:37,407,52
491_1428304,13:00:42,13:00:42,408,53
491_1428304,13:01:18,13:01:18,409,54
491_1428304,13:01:50,13:01:50,939,55
491_1428304,13:03:11,13:03:11,940,56
491_1428304,13:04:00,13:04:00,941,57
491_1428304,13:04:32,13:04:32,942,58
491_1428304,13:06:00,13:06:00,361,59

The trip_id of all 59 data sets is the same. Is this the case in your real data as well? I understood that there would be some cases where the trip_id is the same as the one before. :man_shrugging:t3: I’ll soon be fed up. Is it really that hard to understand that the sample data should contain a few test cases - not just a single one - plus some random data, so that we can actually check whether the logic you come up with actually works?

Using your sample data and making a lot of assumptions you could start with something like this:

$Data = 
    Import-Csv -Path .\stop_times.txt | 
        Select-Object -Property trip_id, arrival_time, departure_time, stop_id, 
            @{Name = 'stop_sequence'; Expression = {$_.stop_sequence -as [int]}}
$GroupedData = 
    $Data | 
        Group-Object -Property arrival_time 

$UnchangedOriginals =
    ($GroupedData | 
        Where-Object -Property Count -eq 1 ).Group

$CorrectedData = 
    $GroupedData | 
        Where-Object -Property Count -gt 1 |
            ForEach-Object {
                $_.Group | Sort-Object -Property stop_sequence | Select-Object -First 1 
                $LastElement = $_.Group | Sort-Object -Property stop_sequence | Select-Object -Last 1 
                $LastElement.arrival_time = ($LastElement.arrival_time -as [datetime] ).AddSeconds(2).ToLongTimeString()
                $LastElement
            }

$Results = 
    $CorrectedData + $UnchangedOriginals

The output looks like this:

$Results | 
    Sort-Object -Property stop_sequence |    
        Format-Table -AutoSize

trip_id     arrival_time departure_time stop_id stop_sequence
-------     ------------ -------------- ------- -------------
491_1428304 11:56:00     11:56:00       1032                1
491_1428304 11:57:49     11:57:49       1034                2
491_1428304 11:57:51     11:58:51       1035                3
491_1428304 12:00:21     12:00:21       1036                4
491_1428304 12:01:00     12:01:00       1037                5
491_1428304 12:01:02     12:01:00       2002                6
491_1428304 12:02:24     12:02:24       2095                7
491_1428304 12:03:07     12:03:07       2003                8
491_1428304 12:03:37     12:03:37       2004                9
491_1428304 12:05:00     12:05:00       1039               10
491_1428304 12:05:50     12:05:50       2037               11
491_1428304 12:06:46     12:06:46       2005               12
491_1428304 12:09:00     12:09:00       1040               13
491_1428304 12:11:11     12:11:11       1041               14
491_1428304 12:15:00     12:15:00       2141               15
491_1428304 12:17:00     12:17:00       1042               16
491_1428304 12:18:08     12:18:08       1043               17
491_1428304 12:19:29     12:19:29       1044               18
491_1428304 12:20:38     12:20:38       326                19
491_1428304 12:21:00     12:21:00       327                20
491_1428304 12:21:51     12:21:51       323                21
491_1428304 12:22:29     12:22:29       2104               22
491_1428304 12:23:00     12:23:00       329                23
491_1428304 12:23:42     12:23:42       389                24
491_1428304 12:24:19     12:24:19       390                25
491_1428304 12:25:14     12:25:14       391                26
491_1428304 12:26:00     12:26:00       392                27
491_1428304 12:27:00     12:27:00       393                28
491_1428304 12:28:07     12:28:07       251                29
491_1428304 12:29:09     12:29:09       252                30
491_1428304 12:30:22     12:30:22       253                31
491_1428304 12:31:19     12:31:19       135                32
491_1428304 12:32:06     12:32:06       136                33
491_1428304 12:33:00     12:33:00       68                 34
491_1428304 12:34:00     12:34:00       159                35
491_1428304 12:36:00     12:36:00       137                36
491_1428304 12:37:55     12:37:55       575                37
491_1428304 12:39:32     12:39:32       576                38
491_1428304 12:40:49     12:40:49       394                39
491_1428304 12:42:00     12:42:00       395                40
491_1428304 12:43:15     12:43:15       396                41
491_1428304 12:44:00     12:44:00       397                42
491_1428304 12:45:59     12:45:59       398                43
491_1428304 12:47:15     12:47:15       399                44
491_1428304 12:48:38     12:48:38       141                45
491_1428304 12:50:55     12:50:55       401                46
491_1428304 12:53:51     12:53:51       402                47
491_1428304 12:56:00     12:56:00       403                48
491_1428304 12:57:08     12:57:08       404                49
491_1428304 12:57:54     12:57:54       405                50
491_1428304 12:59:00     12:59:00       406                51
491_1428304 12:59:37     12:59:37       407                52
491_1428304 13:00:42     13:00:42       408                53
491_1428304 13:01:18     13:01:18       409                54
491_1428304 13:01:50     13:01:50       939                55
491_1428304 13:03:11     13:03:11       940                56
491_1428304 13:04:00     13:04:00       941                57
491_1428304 13:04:32     13:04:32       942                58
491_1428304 13:06:00     13:06:00       361                59

Sorry I didn’t answer earlier, but other things at work required my time and I wasn’t in the office much. Thank you so much for your help; the code now runs smoothly and takes less than a minute to execute.

I changed the loop part a bit, but Import-Csv really saved me.

Hi again, I finished it and it worked OK; it ran in about 40 seconds, but there was something off with the final export. Here is my code:

$Path1 = "C:\Users\bm607134\Desktop\Temp" #Folder with origin and edited files
$Path2 = ($Path1 + '\' + "stop_times.txt") #origin file
$PathNew = $Path1 + "\OUT.txt" #edited file
#New-Item $PathNew #create the edited file

$Data = Import-Csv -Path $Path2

$RowIndex = 0
foreach ($row in $Data)
{
    # skip the first row: $Data[$RowIndex - 1] would wrap around to the LAST row
    if ($RowIndex -gt 0 -and $Data[$RowIndex].arrival_time -eq $Data[$RowIndex - 1].arrival_time)
    {
        if ($Data[$RowIndex].trip_id -eq $Data[$RowIndex - 1].trip_id)
        {
            $Data[$RowIndex].arrival_time = ($Data[$RowIndex].arrival_time -as [datetime]).AddSeconds(2).ToLongTimeString()
            $Data[$RowIndex].departure_time = $Data[$RowIndex].arrival_time
            #Write-Output $Data[$RowIndex].arrival_time
            #Write-Output $Data[$RowIndex].departure_time
        }
    }

    $RowIndex++
}

$Data | Export-Csv $PathNew

I don’t know why, but the export put everything in quotation marks:

#TYPE System.Management.Automation.PSCustomObject
"trip_id","arrival_time","departure_time","stop_id","stop_sequence"
"491_1017690","05:00:00","05:00:00","7001","1"
"491_1017690","05:01:00","05:01:00","7002","2"

OK, I solved it. I just needed to modify the file after the export by replacing the quotation marks with nothing:

$Data | Export-Csv $PathNew -NoTypeInformation
(Get-Content $PathNew).Replace('"', '') | Set-Content $PathNew
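As a side note: on PowerShell 7 or later, Export-Csv has a -UseQuotes parameter, so the quotes can be suppressed at export time and the extra rewrite pass is not needed:

```powershell
# PowerShell 7+ only: tell Export-Csv not to quote the fields at all
$Data | Export-Csv $PathNew -NoTypeInformation -UseQuotes Never
```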

That is part of the CSV standard. Every current tool or service should be able to work with that if it claims to be able to work with CSV data. :man_shrugging:t3:

I haven’t worked much with CSV files; this is the first time I’ve seen that. As I mentioned, I was able to edit the file afterwards to remove the quotation marks.

Thank you so much for your help and your time

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.