U-SQL has no built-in construct for generating a unique identifier for each row of a text file. The script below generates a unique identifier for every row of the input file by calling a Python script from a reducer.
The steps are:
- Extract the data file with the EXTRACT statement.
- Reducers are spun up based on the customer code, which is the column named in REDUCE ... ON. Too few or too many reducers can both cause performance problems, so pick a column that splits the data fairly evenly, but do not pick a unique column.
- For every reduced data set, the Python script is invoked with the rows as a data frame. The script adds a column named "sguid" to the data frame and fills it with a newly generated, encoded UID.
- The output of the reducer contains the new sguid column.
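The advice on choosing a reduce column can be checked before running the job. The following is a minimal sketch in plain Python, not part of the U-SQL script: the helper `key_distribution` and the sample rows are hypothetical and exist only for illustration. It counts how many distinct values a candidate column has and what share of the rows falls into its largest group.

```python
from collections import Counter

def key_distribution(rows, column):
    """Return (distinct_keys, largest_group_share) for a candidate reduce key."""
    counts = Counter(row[column] for row in rows)
    total = sum(counts.values())
    # Share of all rows that land in the single largest group.
    largest_share = max(counts.values()) / total if total else 0.0
    return len(counts), largest_share

# Hypothetical sample rows; real data would come from the input file.
sample_rows = [
    {"OrderNo": "1", "CustomerCode": "C1"},
    {"OrderNo": "2", "CustomerCode": "C1"},
    {"OrderNo": "3", "CustomerCode": "C2"},
    {"OrderNo": "4", "CustomerCode": "C3"},
]

print(key_distribution(sample_rows, "CustomerCode"))  # (3, 0.5)
print(key_distribution(sample_rows, "OrderNo"))       # (4, 0.25)
```

A column whose number of distinct values equals the row count, such as OrderNo here, is unique and would give every row its own group. A column with one dominant group would concentrate most of the work in a single reducer.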
REFERENCE ASSEMBLY [ExtPython];

// Python script invoked for each reduced data set; df holds that set's rows.
DECLARE @ReduceScript = @"
import uuid
import base64

def usqlml_main(df):
    # Add an sguid column holding a URL-safe Base64 encoding of a version-1 UUID.
    # decode('ascii') yields plain text instead of the repr of a bytes object.
    df['sguid'] = ''
    df['sguid'] = df.sguid.apply(lambda row: base64.urlsafe_b64encode(uuid.uuid1().bytes).decode('ascii'))
    return df
";

// Read every column as a string and skip the header row.
@AllData =
    EXTRACT OrderNo string,
            Date string,
            CustomerCode string,
            ProductCode string,
            SalesArea string,
            OrderValue string
    FROM "/DataLoads/Input/TempFile.csv"
    USING Extractors.Text(delimiter: ',', skipFirstNRows: 1);

// Group the rows by CustomerCode and run the Python script on each group.
@ReducedData =
    REDUCE @AllData ON CustomerCode
    PRODUCE sguid string,
            OrderNo string,
            Date string,
            CustomerCode string,
            ProductCode string,
            SalesArea string,
            OrderValue string
    USING new Extension.Python.Reducer(pyScript:@ReduceScript);

OUTPUT @ReducedData
TO "/DataLoads/CSVOutputwithGUID.txt"
USING Outputters.Text();
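The per-row identifier logic in the script above can be tried outside ADL-A. The sketch below is a local approximation only, written with the Python standard library instead of a data frame: `encoded_uid` and `reduce_on` are hypothetical helper names, and the sample rows are made up. It groups rows by a key, as REDUCE ... ON does, and attaches an encoded version-1 UUID to each row.

```python
import base64
import uuid
from collections import defaultdict

def encoded_uid():
    """Return a URL-safe Base64 text encoding of a version-1 UUID (16 bytes -> 24 chars)."""
    return base64.urlsafe_b64encode(uuid.uuid1().bytes).decode("ascii")

def reduce_on(rows, key):
    """Group rows by key, then add an 'sguid' field to every row, group by group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    result = []
    for group in groups.values():
        for row in group:
            # Copy the row so the input list is left unchanged.
            result.append(dict(row, sguid=encoded_uid()))
    return result

# Hypothetical sample rows standing in for the extracted file.
rows = [
    {"OrderNo": "1", "CustomerCode": "C1"},
    {"OrderNo": "2", "CustomerCode": "C2"},
    {"OrderNo": "3", "CustomerCode": "C1"},
]

tagged = reduce_on(rows, "CustomerCode")
for row in tagged:
    print(row["CustomerCode"], row["OrderNo"], row["sguid"])
```

Each identifier is 24 characters long because 16 UUID bytes encode to 22 Base64 characters plus two `=` padding characters. The rows come back grouped by key, so their order can differ from the input order.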
Note: Enable the U-SQL extensions on your ADL-A account before running this script.