Python Forum
loop function that parses arrays with condition: no redundant data
#1
For a classification task, I want to split my data 60%/20%/20% on the condition that no data is shared between the sets: the training set should contain its own unique data, the test set another set of data, and the validation set yet another new set of data.

For each number i in the region list, I want to take the contents of the matrix cells whose region number equals i.
My data table looks like this:

Quote:
region: 5    5    5    6    6    6    6
ID:     124  254  558  541  57   120  212

As illustrated in the example, 5 is included in the region list, so we take the unique contents of region 5; the same goes for 6, 7 and so on up to 10 (10 being the last number).
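For the small example table above, collecting the unique IDs of each region needs one boolean mask per region. A minimal NumPy sketch, assuming the first row of the table is the region and the second row is the ID (the variable names are illustrative):

```python
import numpy as np

# Example table from the post: first row = region, second row = ID
region = np.array([5, 5, 5, 6, 6, 6, 6])
ids = np.array([124, 254, 558, 541, 57, 120, 212])

# For every region, keep the sorted unique IDs whose row has that region
ids_per_region = {int(r): np.unique(ids[region == r]) for r in np.unique(region)}

print(ids_per_region[5])  # [124 254 558]
print(ids_per_region[6])  # [ 57 120 212 541]
```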

My data are NumPy arrays. The algorithm I have in mind is:

Quote:
def sets_split(id, name, region):
    1. extract the unique list of regions
    2. loop over the regions and extract the unique IDs for each region
    3. randomly shuffle the IDs
    4. split the ID list into train, valid and test
    5. at the end, get idx_train, idx_valid and idx_test, maybe with np.where(np.isin(id, train/valid/test))
    return train_bands, valid_bands, test_bands, train_label, valid_label, test_label
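The five steps above can be sketched in NumPy as follows. This is a minimal sketch under assumptions not stated in the post: the signature `sets_split(bands, labels, ids, region, train_frac, valid_frac, seed)` is hypothetical (it replaces the post's `id, name, region`), and `bands`, `labels`, `ids` and `region` are assumed to be aligned row by row, with one 1-D ID and one region per sample. The split is made per region over unique IDs, so every row with a given ID ends up in exactly one set.

```python
import numpy as np


def sets_split(bands, labels, ids, region, train_frac=0.6, valid_frac=0.2, seed=0):
    """Split rows into train/valid/test so that no ID appears in two sets.

    Hypothetical signature: bands, labels, ids and region are assumed to be
    aligned along axis 0, one ID and one region per row.
    """
    rng = np.random.default_rng(seed)
    train_ids, valid_ids, test_ids = [], [], []
    assigned = set()  # IDs already placed, so an ID found in two regions is used once

    for r in np.unique(region):  # 1. unique list of regions
        # 2. unique IDs of this region that no earlier region has claimed
        region_ids = [i for i in np.unique(ids[region == r]) if i not in assigned]
        region_ids = rng.permutation(region_ids)  # 3. random shuffle
        n = len(region_ids)
        n_train = int(round(train_frac * n))
        n_valid = int(round(valid_frac * n))
        # 4. split this region's IDs; whatever remains goes to test
        train_ids.extend(region_ids[:n_train])
        valid_ids.extend(region_ids[n_train:n_train + n_valid])
        test_ids.extend(region_ids[n_train + n_valid:])
        assigned.update(region_ids.tolist())

    # 5. map each ID list back to row indices
    idx_train = np.flatnonzero(np.isin(ids, train_ids))
    idx_valid = np.flatnonzero(np.isin(ids, valid_ids))
    idx_test = np.flatnonzero(np.isin(ids, test_ids))
    return (bands[idx_train], bands[idx_valid], bands[idx_test],
            labels[idx_train], labels[idx_valid], labels[idx_test])
```

Splitting the IDs inside each region keeps roughly 60/20/20 of every region in each set; the fractions apply to IDs, so the row counts will only match 60/20/20 if each ID has about the same number of rows.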
My data are:

region: an array of 2145 region values; list(set(region.flat)) gives a list of 10 regions, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ID: a 2-D array of 14587 features
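For extracting the unique regions, `np.unique` does the same job as `list(set(region.flat))` but also flattens the array and returns the values sorted, whereas a Python set has no guaranteed order. A small sketch on a toy array (the values are illustrative, not the real data):

```python
import numpy as np

# Toy 2-D region array (illustrative values, not the real data)
region = np.array([[1, 2, 2], [10, 3, 1]])

# The post's approach: flatten and deduplicate with a set (order not guaranteed)
as_set = list(set(region.flat))

# NumPy equivalent: np.unique flattens the array and returns the values sorted
as_unique = np.unique(region)
print(as_unique)  # [ 1  2  3 10]
```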
Note: my algorithm may be wrong, so please feel free to give me hints.
#2
Use scikit-learn's sklearn.model_selection.train_test_split function twice: once to split off your validation set, then once more to get your train and test sets.

In other words, the wheel has already been invented. Don't write your own routine for this when a fast, debugged one already exists.
#3
Yes, I found that the split can be done like this:

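Quote:
from sklearn.model_selection import train_test_split

# first split: hold out 20% of the rows as the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# second split: 25% of the remaining 80% is 0.25 x 0.8 = 0.2 of the data for validation
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=1)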

My problem is how to ensure that the data in the training set are different from those in the test and validation sets.
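`train_test_split` partitions the row indices, so no row ends up in two sets; however, if several rows share the same ID, rows with that ID can still land in different sets. A minimal check, assuming a 1-D array of IDs aligned with the rows (the helper name `shared_ids` and the toy data are hypothetical):

```python
import numpy as np


def shared_ids(ids_a, ids_b):
    """Return the IDs that occur in both arrays (empty if the sets are disjoint)."""
    return np.intersect1d(ids_a, ids_b)


# Toy example: six rows, and ID 7 appears on two of them
ids = np.array([3, 7, 7, 9, 4, 5])
train_rows = np.array([0, 1, 3, 4])  # a row-level split: the rows are disjoint...
test_rows = np.array([2, 5])
print(shared_ids(ids[train_rows], ids[test_rows]))  # [7]  ...but ID 7 leaks
```

To avoid the leak, split on IDs rather than rows. scikit-learn's `GroupShuffleSplit` (in `sklearn.model_selection`) takes a `groups` argument in its `split` method and keeps each group entirely on one side; note that its `test_size` is a fraction of groups, not of rows.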
#4
Pretty sure it does that.
Review the documentation.
#5
Is it the cross_validate function?