Nov-02-2020, 11:40 PM
For a classification task, I want to split my data into train, validation and test sets (60%, 20%, 20%) on the condition that no data is shared between the sets within each class: the train set should contain unique data, the test set a different set of data, and the validation set yet another set of data.
For each number i in the region list, I take the content of the matrix cells whose region number equals i.
My data table is:
Quote: 5 5 5 6 6 6 6
124 254 558 541 57 120 212
As illustrated in the example, "5" is included in the region list, so we take the unique content for region 5, and the same for 6, 7, and so on until reaching 10 (as 10 is the final number).
My data are numpy arrays. My idea for the code is:
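A minimal sketch of what "take the unique content of each region" could look like with NumPy. The array names `region` and `ids` are assumptions (the table above has no row labels), and I assume the first row holds the region numbers and the second row the IDs.

```python
import numpy as np

# Toy data copied from the table above; the row names are an assumption.
region = np.array([5, 5, 5, 6, 6, 6, 6])
ids = np.array([124, 254, 558, 541, 57, 120, 212])

# For each distinct region number, collect the unique IDs found in that region.
ids_per_region = {
    int(r): np.unique(ids[region == r]) for r in np.unique(region)
}

for r, unique_ids in ids_per_region.items():
    print(r, unique_ids.tolist())
# 5 [124, 254, 558]
# 6 [57, 120, 212, 541]
```

Note that `np.unique` also sorts the values, so the order within each region differs from the table.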
Quote:def sets_split(id, name, region):
1- extract the unique list of regions
2- loop over the regions and extract the unique IDs for each region
3- randomly shuffle the IDs
4- split the ID list into train, valid and test
5- in the end, take idx_train, idx_valid and idx_test, maybe with np.where(np.isin(id, Train/Valid/Test))
return train_bands, valid_bands, test_bands, train_label, valid_label, test_label
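The five steps of this outline could be sketched roughly as follows. This is only one possible reading, not a verified solution: it assumes `ids` and `region` have one entry per sample (they are flattened to 1-D), that each ID occurs in only one region, and it returns index arrays rather than the bands and labels. The `fractions` and `seed` parameters are hypothetical additions, and the `name` argument is left out because the outline does not use it.

```python
import numpy as np

def sets_split(ids, region, fractions=(0.6, 0.2, 0.2), seed=0):
    """Return sample indices (idx_train, idx_valid, idx_test).

    Assumes ids and region hold one entry per sample and have equal size.
    Only the first two fractions are used; the test set gets the remainder.
    """
    ids = np.asarray(ids).ravel()
    region = np.asarray(region).ravel()
    rng = np.random.default_rng(seed)
    parts = ([], [], [])                              # train, valid, test
    for r in np.unique(region):                       # step 1: unique regions
        in_region = region == r
        unique_ids = np.unique(ids[in_region])        # step 2: unique IDs
        rng.shuffle(unique_ids)                       # step 3: shuffle IDs
        n = len(unique_ids)
        n_train = int(round(fractions[0] * n))
        n_valid = int(round(fractions[1] * n))
        # step 4: cut the shuffled ID list into three pieces
        chunks = np.split(unique_ids, [n_train, n_train + n_valid])
        for part, chunk in zip(parts, chunks):
            # step 5: indices of the samples whose ID fell into this piece
            part.append(np.where(in_region & np.isin(ids, chunk))[0])
    return tuple(np.concatenate(p) for p in parts)

# Toy usage: 2 regions, 5 IDs per region, every ID appears twice.
region = np.repeat([5, 6], 10)
ids = np.repeat(np.arange(10), 2)
idx_train, idx_valid, idx_test = sets_split(ids, region)
```

The returned indices can then be used to slice the other arrays, for example `bands[idx_train]` and `label[idx_train]` (hypothetical names taken from the return line of the outline). Because the split is done on IDs and not on samples, the sample counts only approximate 60/20/20 when IDs have different numbers of samples.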
My data are:
region: an array of 2145 regions -->
Quote:list(set(region.flat)) gives a list of 10 values: [1,2,3,4,5,6,7,8,9,10]
ID: a 2D array of 14587 features
Note: my algorithm could be wrong, so please feel free to give me hints.