I can aggregate 10000 records in a minute or less assuming a useable machine.
In the end an enterprise database server can do this much faster because it is built for data. The point of pulling it down and using PowerShell is to offload the corporate servers.
I would use a analysis server if this is required on a regular basis.
I am a DBA and have specific training in data warehouses, data marts and analysis servers on a number of platforms. Systems and hardware dedicated to analysis are the most efficient and easiest to manage.
1000 records is trivial for PowerShell and can be aggregated very quickly. Perhaps your approach is not optimal.
Linq?
Forum rules
Do not post any licensing information in this forum.
Any code longer than three lines should be added as code using the 'Select Code' dropdown menu or attached as a file.
Do not post any licensing information in this forum.
Any code longer than three lines should be added as code using the 'Select Code' dropdown menu or attached as a file.
- LtMandella
- Posts: 61
- Last visit: Mon May 07, 2018 4:03 pm
Re: Linq?
[[Perhaps your approach is not optimal.]]
Certainly possible.
[[I can aggregate 10000 records in a minute or less assuming a useable machine.]]
yes, that is about the performance I am getting from the powershell script I posted. 5K records in about 30secs. (although I then have to push it into a datagrid).
But I hate to ask the user wait for 30 seconds watching the hourglass in the GUI for ~ 5K records to be de-duped and aggregated. If I send the recordset across the wire to sql server and use sql to bring those 5K records back in deduped and aggregated, it takes about 5 secs.
And I expect the difference will be even more noticeable for greater record count.
[[I would use a analysis server if this is required on a regular basis.]]
There is nothing to aggregate or dedup before the user interactively makes some choices using the GUI for how they want to modify the records.
ken
Certainly possible.
[[I can aggregate 10000 records in a minute or less assuming a useable machine.]]
yes, that is about the performance I am getting from the powershell script I posted. 5K records in about 30secs. (although I then have to push it into a datagrid).
But I hate to ask the user wait for 30 seconds watching the hourglass in the GUI for ~ 5K records to be de-duped and aggregated. If I send the recordset across the wire to sql server and use sql to bring those 5K records back in deduped and aggregated, it takes about 5 secs.
And I expect the difference will be even more noticeable for greater record count.
[[I would use a analysis server if this is required on a regular basis.]]
There is nothing to aggregate or dedup before the user interactively makes some choices using the GUI for how they want to modify the records.
ken
Re: Linq?
Actually, in my experience, the startup cost is the most expensive.
If your SQLServer can take the hit with no issues to production then go for it. I had a database with 16 thread capability (4X4 cores an) and a multichannel optimally striped 3 terabyte array with about 220 disks. Nothing was faster.
If your SQLServer can take the hit with no issues to production then go for it. I had a database with 16 thread capability (4X4 cores an) and a multichannel optimally striped 3 terabyte array with about 220 disks. Nothing was faster.
- LtMandella
- Posts: 61
- Last visit: Mon May 07, 2018 4:03 pm
Re: Linq?
I polled the users today and they expect they will have > 1000 records in a batch only very rarely.
So for that reason, I am going with the powershell version. I may regret it. I can always revert to the sql server version if needed.
In fact I did run into a limitation of powershell measure-object cmdlet that stumped me today. I have a datetime column in the datatable and measure object choked on the values in it, would only return nulls for maximum.
I had to change my group statement to include the datetime column as a shortdatestring. Luckily that value in this data will always group the way I need.
But if I needed to get the max datetime value from a powershell group using measure-object? Can't be done. Measure-object only supports numeric data.
Unlike sql of course...
So for that reason, I am going with the powershell version. I may regret it. I can always revert to the sql server version if needed.
In fact I did run into a limitation of powershell measure-object cmdlet that stumped me today. I have a datetime column in the datatable and measure object choked on the values in it, would only return nulls for maximum.
I had to change my group statement to include the datetime column as a shortdatestring. Luckily that value in this data will always group the way I need.
But if I needed to get the max datetime value from a powershell group using measure-object? Can't be done. Measure-object only supports numeric data.
Unlike sql of course...
Re: Linq?
Measure Object can only measure datetime objects. SQLServer returns strings under most circumstances and they have to be converted. You can also design your query to return datetime objects.
- LtMandella
- Posts: 61
- Last visit: Mon May 07, 2018 4:03 pm
Re: Linq?
trying the simplest possible test psobject with a strongly typed date. measureobject throws error saying date is not numeric.
Should measure-object be able to return the max of a group with dates?
# create file with character strings for dates
@”
OrderId,Date,CustomerId
1,6/6/2012 9:12:44 AM,1
2,6/7/2012 6:12:10 PM,1
3,6/5/2012 12:05:03 PM,2
“@ > .\orders.txt
# import as strings
$orders=import-csv ".\orders.txt"
$orders | gm | format-table -AutoSize
$orders | format-table -AutoSize
OrderId Date CustomerId
------- ---- ----------
1 6/6/2012 9:12:44 AM 1
2 6/7/2012 6:12:10 PM 1
3 6/5/2012 12:05:03 PM 2
#convert to strongly typed date
$StronglyTypedOrders = $orders | Select OrderId, @{N='Date';E={[datetime]$_.Date}}, CustomerId
$StronglyTypedOrders | gm | format-table -AutoSize
TypeName: Selected.System.Management.Automation.PSCustomObject
Name MemberType Definition
---- ---------- ----------
Equals Method bool Equals(System.Object obj)
GetHashCode Method int GetHashCode()
GetType Method type GetType()
ToString Method string ToString()
CustomerId NoteProperty System.String CustomerId=1
Date NoteProperty System.DateTime Date=6/6/2012 9:12:44 AM
OrderId NoteProperty System.String OrderId=1
$measured = $StronglyTypedOrders | Measure-Object -Property Date -max
Measure-Object : Input object "6/6/2012 9:12:44 AM" is not numeric.
At C:\powershelltesting\MeasureDateTest.ps1:20 char:50
+ $measured = $StronglyTypedOrders | Measure-Object <<<< -Property Date -max
+ CategoryInfo : InvalidType: (6/6/2012 9:12:44 AM:PSObject) [Mea
sure-Object], PSInvalidOperationException
+ FullyQualifiedErrorId : NonNumericInputObject,Microsoft.PowerShell.Comma
nds.MeasureObjectCommand
Measure-Object : Input object "6/7/2012 6:12:10 PM" is not numeric.
At C:\powershelltesting\MeasureDateTest.ps1:20 char:50
+ $measured = $StronglyTypedOrders | Measure-Object <<<< -Property Date -max
+ CategoryInfo : InvalidType: (6/7/2012 6:12:10 PM:PSObject) [Mea
sure-Object], PSInvalidOperationException
+ FullyQualifiedErrorId : NonNumericInputObject
Should measure-object be able to return the max of a group with dates?
# create file with character strings for dates
@”
OrderId,Date,CustomerId
1,6/6/2012 9:12:44 AM,1
2,6/7/2012 6:12:10 PM,1
3,6/5/2012 12:05:03 PM,2
“@ > .\orders.txt
# import as strings
$orders=import-csv ".\orders.txt"
$orders | gm | format-table -AutoSize
$orders | format-table -AutoSize
OrderId Date CustomerId
------- ---- ----------
1 6/6/2012 9:12:44 AM 1
2 6/7/2012 6:12:10 PM 1
3 6/5/2012 12:05:03 PM 2
#convert to strongly typed date
$StronglyTypedOrders = $orders | Select OrderId, @{N='Date';E={[datetime]$_.Date}}, CustomerId
$StronglyTypedOrders | gm | format-table -AutoSize
TypeName: Selected.System.Management.Automation.PSCustomObject
Name MemberType Definition
---- ---------- ----------
Equals Method bool Equals(System.Object obj)
GetHashCode Method int GetHashCode()
GetType Method type GetType()
ToString Method string ToString()
CustomerId NoteProperty System.String CustomerId=1
Date NoteProperty System.DateTime Date=6/6/2012 9:12:44 AM
OrderId NoteProperty System.String OrderId=1
$measured = $StronglyTypedOrders | Measure-Object -Property Date -max
Measure-Object : Input object "6/6/2012 9:12:44 AM" is not numeric.
At C:\powershelltesting\MeasureDateTest.ps1:20 char:50
+ $measured = $StronglyTypedOrders | Measure-Object <<<< -Property Date -max
+ CategoryInfo : InvalidType: (6/6/2012 9:12:44 AM:PSObject) [Mea
sure-Object], PSInvalidOperationException
+ FullyQualifiedErrorId : NonNumericInputObject,Microsoft.PowerShell.Comma
nds.MeasureObjectCommand
Measure-Object : Input object "6/7/2012 6:12:10 PM" is not numeric.
At C:\powershelltesting\MeasureDateTest.ps1:20 char:50
+ $measured = $StronglyTypedOrders | Measure-Object <<<< -Property Date -max
+ CategoryInfo : InvalidType: (6/7/2012 6:12:10 PM:PSObject) [Mea
sure-Object], PSInvalidOperationException
+ FullyQualifiedErrorId : NonNumericInputObject
- LtMandella
- Posts: 61
- Last visit: Mon May 07, 2018 4:03 pm
Re: Linq?
format table was only in there to show the objects data, and to show the data type of the date element was type date.
removing all those format-table clauses and same error. I am on version 2 and 3. Maybe it only works in newer powershell version? You should be able to cut and paste the exact few lines of code below and repro?
----------------------------code--------
@”
OrderId,Date,CustomerId
1,6/6/2012 9:12:44 AM,1
2,6/7/2012 6:12:10 PM,1
3,6/5/2012 12:05:03 PM,2
“@ > .\orders.txt
$orders=import-csv ".\orders.txt"
$StronglyTypedOrders = $orders | Select OrderId, @{N='Date';E={[datetime]$_.Date}}, CustomerId
$measured = $StronglyTypedOrders | Measure-Object -Property Date -max
----------------------------------------------
Measure-Object : Input object "6/6/2012 9:12:44 AM" is not numeric.
At C:\powershelltesting\MeasureDateTest.ps1:15 char:50
+ $measured = $StronglyTypedOrders | Measure-Object <<<< -Property Date -max
+ CategoryInfo : InvalidType: (6/6/2012 9:12:44 AM:PSObject) [Mea
sure-Object], PSInvalidOperationException
+ FullyQualifiedErrorId : NonNumericInputObject,Microsoft.PowerShell.Comma
nds.MeasureObjectCommand
Measure-Object : Input object "6/7/2012 6:12:10 PM" is not numeric.
At C:\powershelltesting\MeasureDateTest.ps1:15 char:50
+ $measured = $StronglyTypedOrders | Measure-Object
removing all those format-table clauses and same error. I am on version 2 and 3. Maybe it only works in newer powershell version? You should be able to cut and paste the exact few lines of code below and repro?
----------------------------code--------
@”
OrderId,Date,CustomerId
1,6/6/2012 9:12:44 AM,1
2,6/7/2012 6:12:10 PM,1
3,6/5/2012 12:05:03 PM,2
“@ > .\orders.txt
$orders=import-csv ".\orders.txt"
$StronglyTypedOrders = $orders | Select OrderId, @{N='Date';E={[datetime]$_.Date}}, CustomerId
$measured = $StronglyTypedOrders | Measure-Object -Property Date -max
----------------------------------------------
Measure-Object : Input object "6/6/2012 9:12:44 AM" is not numeric.
At C:\powershelltesting\MeasureDateTest.ps1:15 char:50
+ $measured = $StronglyTypedOrders | Measure-Object <<<< -Property Date -max
+ CategoryInfo : InvalidType: (6/6/2012 9:12:44 AM:PSObject) [Mea
sure-Object], PSInvalidOperationException
+ FullyQualifiedErrorId : NonNumericInputObject,Microsoft.PowerShell.Comma
nds.MeasureObjectCommand
Measure-Object : Input object "6/7/2012 6:12:10 PM" is not numeric.
At C:\powershelltesting\MeasureDateTest.ps1:15 char:50
+ $measured = $StronglyTypedOrders | Measure-Object
Re: Linq?
Works for me just fi ne:
- PS D:\scripts> $orders=@”
- >> OrderId,Date,CustomerId
- >> 1,6/6/2012 9:12:44 AM,1
- >> 2,6/7/2012 6:12:10 PM,1
- >> 3,6/5/2012 12:05:03 PM,2
- >> “@ |ConvertFrom-Csv
- PS D:\scripts> $orders | Select OrderId, @{N='Date';E={[datetime]$_.Date}}, CustomerId
- OrderId Date CustomerId
- ------- ---- ----------
- 1 6/6/2012 9:12:44 AM 1
- 2 6/7/2012 6:12:10 PM 1
- 3 6/5/2012 12:05:03 PM 2
- PS D:\scripts> $fixed=$orders | Select OrderId, @{N='Date';E={[datetime]$_.Date}}, CustomerId
- PS D:\scripts> $fixed | Measure-Object -Max Date
- Count : 3
- Average :
- Sum :
- Maximum : 6/7/2012 6:12:10 PM
- Minimum :
- Property : Date
Re: Linq?
Measure-Object does not work with dates in Version 2.
In V3 I believe you need to have the correct Net versions installed. In my V4 and V5 copies it works.
We should not be using V3 or earlier as they are out of support.
In V3 I believe you need to have the correct Net versions installed. In my V4 and V5 copies it works.
We should not be using V3 or earlier as they are out of support.