loop function that parses arrays with condition: no redundant data

amela · Nov-02-2020, 11:40 PM

For a classification task, I want to split my data (60%, 20%, 20%) on the condition that no data are shared between the sets: the train set should contain unique data, the test set a different set of data, and the validation set yet another set of data.

For each number in the region list, I take the content of the matrix cells whose region number equals that number.
My data table is:

Quote: 5 5 5 6 6 6 6
124 254 558 541 57 120 212

As illustrated in the example, "5" is included in the region list, so we take the unique content for region 5, then the same for 6, 7, and so on until reaching 10 (as 10 is the final number).

My data are NumPy arrays. The code I have in mind is:
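A minimal sketch of that per-region lookup on the small table above. It assumes the first row holds the region numbers and the second row the matching IDs; the names `region`, `ids` and `ids_by_region` are only illustrative.

```python
import numpy as np

# Assumed reading of the example table: row 1 = region, row 2 = ID
region = np.array([5, 5, 5, 6, 6, 6, 6])
ids = np.array([124, 254, 558, 541, 57, 120, 212])

# For every region number, collect the unique IDs found in that region
ids_by_region = {int(r): np.unique(ids[region == r]) for r in np.unique(region)}

for r, unique_ids in ids_by_region.items():
    print(r, unique_ids)
```

Note that `np.unique` returns the values sorted, so the original order inside a region is not preserved.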

Quote:def sets_split(id, name, region):
1- extract the unique list of regions
2- loop over the regions and extract the unique IDs for each region
3- randomly shuffle the IDs
4- split the ID list into train, valid and test
5- in the end, get idx_train, idx_valid and idx_test, maybe with np.where(np.isin(id, Train/Valid/Test))
return train_bands, valid_bands, test_bands, train_label, valid_label, test_label
My data are:
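The five steps above could be sketched roughly as follows. This is only one possible reading, under stated assumptions: `ids` and `region` are taken to have one entry per sample (flattened to the same length), each ID is assumed to occur in a single region, and the function name `split_indices_by_region` is made up for illustration.

```python
import numpy as np

def split_indices_by_region(ids, region, fractions=(0.6, 0.2, 0.2), seed=0):
    """Return (idx_train, idx_valid, idx_test) so that no ID is shared between sets.

    Assumes ids and region align element-wise and each ID lives in one region.
    """
    ids = np.asarray(ids).ravel()
    region = np.asarray(region).ravel()
    rng = np.random.default_rng(seed)

    train_ids, valid_ids, test_ids = [], [], []
    for r in np.unique(region):                    # 1- unique list of regions
        unique_ids = np.unique(ids[region == r])   # 2- unique IDs of this region
        rng.shuffle(unique_ids)                    # 3- random shuffle of the IDs
        n_train = int(round(fractions[0] * len(unique_ids)))
        n_valid = int(round(fractions[1] * len(unique_ids)))
        # 4- split the ID list into train / valid / test (test gets the rest)
        train_ids.append(unique_ids[:n_train])
        valid_ids.append(unique_ids[n_train:n_train + n_valid])
        test_ids.append(unique_ids[n_train + n_valid:])

    # 5- turn the ID lists back into sample indices
    idx_train = np.where(np.isin(ids, np.concatenate(train_ids)))[0]
    idx_valid = np.where(np.isin(ids, np.concatenate(valid_ids)))[0]
    idx_test = np.where(np.isin(ids, np.concatenate(test_ids)))[0]
    return idx_train, idx_valid, idx_test
```

The returned indices can then be used to slice the other arrays, for example `bands[idx_train]` and `label[idx_train]`, assuming those arrays share the same first axis as `ids`.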

region: array of 2145 regions -->

Quote:list(set(region.flat))

gives a list of 10 values: [1,2,3,4,5,6,7,8,9,10]
ID: 2D array of 14587 features
Note: my algorithm could be wrong, so please feel free to give me hints.

jefsummers · Nov-03-2020, 12:40 AM

Use scikit-learn's sklearn.model_selection.train_test_split function twice: once to split off your validation set, then once more to get your train and test sets.

In other words, the wheel has been invented already. Don't write your own routine for this when a fast, debugged one already exists.

amela · (This post was last modified: Nov-03-2020, 06:15 PM by amela.)

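Yes, I found that the split can be done like this:

Quote:from sklearn.model_selection import train_test_split

# First split: hold out 20% of the data as the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# Second split: 25% of the remaining 80% becomes the validation set (0.25 x 0.8 = 0.2)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=1)

My problem is how to ensure that the data in the train set are different from those in the test and validation sets.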

jefsummers · Nov-04-2020, 12:06 PM

Pretty sure it does that.
Review Documentation
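One caveat worth spelling out: train_test_split places each row in exactly one subset, but it knows nothing about IDs. If several rows share an ID and all of them must end up in the same set, a group-aware splitter such as scikit-learn's GroupShuffleSplit may be closer to what is wanted. The sketch below is a suggestion, not something from the posts above; the helper name `group_train_valid_test` is made up, and `groups` is assumed to hold one ID per row.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

def group_train_valid_test(groups, seed=1):
    """Split row positions roughly 60/20/20 by group, keeping each ID in one set only."""
    groups = np.asarray(groups)
    rows = np.arange(len(groups))

    # First split: hold out about 20% of the groups (IDs) as the test set
    outer = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=seed)
    trainval_pos, test_pos = next(outer.split(rows, groups=groups))

    # Second split: 25% of the remaining groups become validation (0.25 x 0.8 = 0.2)
    inner = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=seed)
    train_pos, valid_pos = next(
        inner.split(trainval_pos, groups=groups[trainval_pos])
    )

    # Map positions inside the train+valid subset back to original row positions
    return trainval_pos[train_pos], trainval_pos[valid_pos], test_pos
```

The proportions apply to the number of distinct IDs, not the number of rows, so the row counts can deviate from 60/20/20 when IDs have different numbers of rows. A check such as `np.intersect1d(groups[idx_train], groups[idx_test])` returning an empty array confirms that no ID is shared.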

amela · Nov-05-2020, 07:29 PM

Is it the cross_validate function?
