[Numpy] How to store different data type in one numpy array? - Python Forum (https://python-forum.io), Forum: Data Science
[Numpy] How to store different data type in one numpy array? - water - Mar-23-2024
I want to store different data types in one numpy array,

    b = np.array([['2024-03-22', 71.0, 'ceh'], ['2024-03-23', 63.0, 'abc']])

and specify dtypes like:

    [['datetime64[D]', 'float64', 'string'], ['datetime64[D]', 'float64', 'string']]

How do I define that?

RE: [Numpy] How to store different data type in one numpy array? - deanhystad - Mar-24-2024
Look at:
https://numpy.org/doc/stable/reference/generated/numpy.recarray.html
https://numpy.org/doc/stable/user/basics.rec.html

RE: [Numpy] How to store different data type in one numpy array? - snippsat - Mar-24-2024
An example like this:

    >>> import numpy as np
    >>>
    >>> dtype = [('date', 'datetime64[D]'), ('value', 'float64'), ('code', 'U3')]
    >>> n = np.array([('2024-03-22', 71.0, 'ceh'), ('2024-03-23', 63.0, 'abc')], dtype=dtype)
    >>> n
    array([('2024-03-22', 71., 'ceh'), ('2024-03-23', 63., 'abc')],
          dtype=[('date', '<M8[D]'), ('value', '<f8'), ('code', '<U3')])

<U3 is a string of up to 3 characters, so longer strings get truncated:

    >>> n = np.array([('2024-03-22', 71.0, 'cehar'), ('2024-03-23', 63.0, 'abchhhhhhhhhh')], dtype=dtype)
    >>> n
    array([('2024-03-22', 71., 'ceh'), ('2024-03-23', 63., 'abc')],
          dtype=[('date', '<M8[D]'), ('value', '<f8'), ('code', '<U3')])

To a Pandas DataFrame: as Pandas is built on top of NumPy, the array transfers over seamlessly.

    >>> import pandas as pd
    >>>
    >>> df = pd.DataFrame(n)
    >>> df
            date  value code
    0 2024-03-22   71.0  ceh
    1 2024-03-23   63.0  abc

RE: [Numpy] How to store different data type in one numpy array? - water - Mar-25-2024
It seems different data types can only be stored in a tuple that is then an element of the array; they can't be stored directly as standalone elements of one array.

RE: [Numpy] How to store different data type in one numpy array? - snippsat - Mar-25-2024
It's one array if you add it like this:

    >>> n = np.array([('2024-03-22', 71.0, 'ceh'), ('2024-03-23', 63.0, 'abc')],
    ...              dtype=[('date', 'datetime64[D]'), ('value', 'float64'), ('code', 'U3')])
    >>> n
    array([('2024-03-22', 71., 'ceh'), ('2024-03-23', 63., 'abc')],
          dtype=[('date', '<M8[D]'), ('value', '<f8'), ('code', '<U3')])

You can make it shorter like this, see the docs. The fields then get the default names f0, f1, f2.

    >>> n = np.array([('2024-03-22', 71.0, 'ceh'), ('2024-03-23', 63.0, 'abcyyyyyy')], dtype='datetime64[D], float64, U3')
    >>> n
    array([('2024-03-22', 71., 'ceh'), ('2024-03-23', 63., 'abc')],
          dtype=[('f0', '<M8[D]'), ('f1', '<f8'), ('f2', '<U3')])

It works the same; if you add a date that's wrong you get an error message.

    >>> n = np.array([('2024-03-2299', 71.0, 'ceh'), ('2024-03-23', 63.0, 'abcyyyyyy')], dtype='datetime64[D], float64, U3')
    Traceback (most recent call last):
      File "<interactive input>", line 1, in <module>
    ValueError: Error parsing datetime string "2024-03-2299" at position 10

RE: [Numpy] How to store different data type in one numpy array? - deanhystad - Mar-25-2024
What are you planning to do with the array?

RE: [Numpy] How to store different data type in one numpy array? - paul18fr - Mar-26-2024
I agree with @deanhystad.
For instance, if the goal is to recover data per type, I would imagine the following, if you can use 2 arrays (possible?):

    import numpy as np

    # Mixed rows: without a dtype, NumPy stores every element as a string
    Array1 = np.array([['2024-03-22', 71.0, 'ceh'],
                       ['2024-03-23', 63.0, 'abc'],
                       ['2024-03-24', -50.6, 'zzzzzzz'],
                       ['2024-03-25', 13.8, 'lkj'],
                       ['2024-03-26', 5.2, 'dsfdssss'],
                       [935.2, 'hgjhg', '2024-03-27']
                       ])

    # Same shape as Array1: the intended type of each element
    TypeArray = np.array([['datetime64[D]', 'float64', 'string'],
                          ['datetime64[D]', 'float64', 'string'],
                          ['datetime64[D]', 'float64', 'string'],
                          ['datetime64[D]', 'float64', 'string'],
                          ['datetime64[D]', 'float64', 'string'],
                          ['float64', 'string', 'datetime64[D]']   # !!!!!!!!!!!!!!
                          ])

    # Distinct type names found in TypeArray
    NumberOfTypes = np.unique(TypeArray)

    # results are stored in a dictionary PER type but you can proceed differently
    RecoveringDictionary = {}
    for ntype in NumberOfTypes:
        Index = np.where(TypeArray == ntype)
        Extract = Array1[Index]
        if ntype == 'float64':
            Extract = Extract.astype(np.float64)
        # if ntype == 'datetime64[D]': Extract = Extract.astype(np.datetime64)
        RecoveringDictionary.update({ntype: Extract})

    # print results
    for ntype in NumberOfTypes:
        print(f"{ntype} = {RecoveringDictionary[ntype]}\n")

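A side note on Array1 above: without an explicit dtype, NumPy coerces the mixed rows to a single string dtype, which is why the astype() call is needed to get floats back. Below is a minimal sketch of that behaviour, with dtype=object shown as one possible alternative. The object-array variant is an assumption added for illustration (nobody in the thread proposed it), and the names rows, mixed, obj and values are made up.

```python
import numpy as np

rows = [['2024-03-22', 71.0, 'ceh'], ['2024-03-23', 63.0, 'abc']]

# Without a dtype, every element is converted to a string.
mixed = np.array(rows)
print(mixed.dtype.kind)      # 'U' is the kind code for unicode strings
print(repr(mixed[0, 1]))     # the float has become text

# dtype=object keeps the original Python objects (illustrative alternative),
# at the cost of NumPy's typed, vectorized storage.
obj = np.array(rows, dtype=object)
print(type(obj[0, 1]))

# A single column can still be converted to a real numeric array.
values = obj[:, 1].astype(np.float64)
print(values.sum())
```

An object array behaves like a grid of Python references, so it answers the "standalone element" wish, but a structured array is usually the better fit when each column has one fixed type.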
RE: [Numpy] How to store different data type in one numpy array? - snippsat - Mar-26-2024
As deanhystad posted, more info may be needed.
paul18fr, good effort, but I would say that looks wrong in most cases. The TypeArray does not work (e.g. try it with a wrong date) and repeats data unnecessarily.

(Mar-23-2024, 08:55 PM) water wrote: and specific

In the first post he asks about specifying the dtype in a NumPy array. Then we are talking about structured arrays.
To give one more example of how structured arrays work:

    import numpy as np

    # Sample data: Transaction ID, Date, Amount, Transaction Type
    data = [
        (1001, '2023-01-01', 250.00, 'Deposit'),
        (1002, '2023-01-03', -100.00, 'Withdrawal'),
        (1003, '2023-01-05', 200.00, 'Deposit'),
        (1004, '2023-01-07', -50.00, 'Withdrawal'),
        (1005, '2023-01-09', 300.00, 'Deposit'),
    ]

    # Define the dtype for the structured array
    dtype = [
        ('trans_id', 'int32'),
        ('date', 'datetime64[D]'),
        ('amount', 'float64'),
        ('type', 'U10')  # Transaction type with up to 10 characters
    ]

    transactions = np.array(data, dtype=dtype)

Structured arrays are particularly useful when working with tabular data that mixes different data types and you want to perform efficient, vectorized operations on it.
Take a look at this data manipulation; it would not be possible without specifying the dtype.

    >>> # Get all dates
    >>> transactions['date']
    array(['2023-01-01', '2023-01-03', '2023-01-05', '2023-01-07',
           '2023-01-09'], dtype='datetime64[D]')

    >>> # Find all withdrawals
    >>> withdrawals = transactions[transactions['type'] == 'Withdrawal']
    >>> withdrawals
    array([(1002, '2023-01-03', -100., 'Withdrawal'),
           (1004, '2023-01-07',  -50., 'Withdrawal')],
          dtype=[('trans_id', '<i4'), ('date', '<M8[D]'), ('amount', '<f8'), ('type', '<U10')])

    >>> # Calculate the total amount of deposits
    >>> total_deposits = transactions[transactions['type'] == 'Deposit']['amount'].sum()
    >>> total_deposits
    750.0

For operations that are easily vectorized, staying within NumPy can be faster and more memory efficient.
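A few more vectorized operations on the same kind of structured array, as a rough sketch rather than a definitive recipe. The array is rebuilt here so the block runs on its own; the cutoff date and the names recent, by_amount and net_balance are hypothetical, chosen only for illustration.

```python
import numpy as np

dtype = [('trans_id', 'int32'), ('date', 'datetime64[D]'),
         ('amount', 'float64'), ('type', 'U10')]
transactions = np.array([
    (1001, '2023-01-01', 250.00, 'Deposit'),
    (1002, '2023-01-03', -100.00, 'Withdrawal'),
    (1003, '2023-01-05', 200.00, 'Deposit'),
    (1004, '2023-01-07', -50.00, 'Withdrawal'),
    (1005, '2023-01-09', 300.00, 'Deposit'),
], dtype=dtype)

# Date comparison works because the field is a real datetime64, not a string.
recent = transactions[transactions['date'] >= np.datetime64('2023-01-05')]
print(recent['trans_id'])

# Sort whole records by one field.
by_amount = np.sort(transactions, order='amount')
print(by_amount['trans_id'])

# Net balance over all transactions.
net_balance = transactions['amount'].sum()
print(net_balance)
```

Each field access returns a plain typed array, so the usual NumPy comparisons and reductions apply column by column.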
Structured arrays can also easily be taken into Pandas if you need more advanced stuff like grouping, plotting...

    import pandas as pd

    df = pd.DataFrame(transactions)
    print(df)
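As an example of the grouping mentioned above, here is a small sketch that sums the amounts per transaction type once the structured array is in a DataFrame. It assumes pandas is installed, rebuilds the array so it is self-contained, and the name totals is made up for illustration.

```python
import numpy as np
import pandas as pd

dtype = [('trans_id', 'int32'), ('date', 'datetime64[D]'),
         ('amount', 'float64'), ('type', 'U10')]
transactions = np.array([
    (1001, '2023-01-01', 250.00, 'Deposit'),
    (1002, '2023-01-03', -100.00, 'Withdrawal'),
    (1003, '2023-01-05', 200.00, 'Deposit'),
    (1004, '2023-01-07', -50.00, 'Withdrawal'),
    (1005, '2023-01-09', 300.00, 'Deposit'),
], dtype=dtype)

# Field names of the structured array become the column names.
df = pd.DataFrame(transactions)

# Group by transaction type and sum the amounts in each group.
totals = df.groupby('type')['amount'].sum()
print(totals)
```

The same DataFrame can then be passed to the plotting or resampling tools in Pandas if needed.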