Python Forum
[Numpy] How to store different data type in one numpy array?




[Numpy] How to store different data type in one numpy array? - water - Mar-23-2024

I want to store different data types in one NumPy array, for example:
b = np.array([['2024-03-22', 71.0, 'ceh'], ['2024-03-23', 63.0, 'abc']])
and specify a dtype for each element, like:
[['datetime64[D]', 'float64', 'string'], ['datetime64[D]', 'float64', 'string']]
How do I define that?


RE: [Numpy] How to store different data type in one numpy array? - deanhystad - Mar-24-2024

Look at:

https://numpy.org/doc/stable/reference/generated/numpy.recarray.html
https://numpy.org/doc/stable/user/basics.rec.html


RE: [Numpy] How to store different data type in one numpy array? - snippsat - Mar-24-2024

Here is an example.
>>> import numpy as np
>>>
>>> dtype = [('date', 'datetime64[D]'), ('value', 'float64'), ('code', 'U3')]
>>> n = np.array([('2024-03-22', 71.0, 'ceh'), ('2024-03-23', 63.0, 'abc')], dtype=dtype)
>>> n
array([('2024-03-22', 71., 'ceh'), ('2024-03-23', 63., 'abc')],
      dtype=[('date', '<M8[D]'), ('value', '<f8'), ('code', '<U3')])
<U3 means a string of up to 3 characters; longer strings are silently truncated.
>>> n = np.array([('2024-03-22', 71.0, 'cehar'), ('2024-03-23', 63.0, 'abchhhhhhhhhh')], dtype=dtype)
>>> n
array([('2024-03-22', 71., 'ceh'), ('2024-03-23', 63., 'abc')],
      dtype=[('date', '<M8[D]'), ('value', '<f8'), ('code', '<U3')])
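To avoid the silent truncation shown above, one option is to size the string field from the data before building the array. This is only a minimal sketch, assuming the rows are available as a Python list; the `width` variable is an illustration and not part of the original post.

```python
import numpy as np

rows = [('2024-03-22', 71.0, 'cehar'), ('2024-03-23', 63.0, 'abchhhhhhhhhh')]

# Assumption for illustration: derive the string width from the longest code
width = max(len(code) for _, _, code in rows)
dtype = [('date', 'datetime64[D]'), ('value', 'float64'), ('code', f'U{width}')]

n = np.array(rows, dtype=dtype)
print(n['code'])        # the full strings are kept
print(n.dtype['code'])  # the string field is now wide enough
```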
To convert it to a pandas DataFrame: pandas is built on top of NumPy, so the structured array transfers over seamlessly.
>>> import pandas as pd
>>> 
>>> df = pd.DataFrame(n)
>>> df
        date  value code
0 2024-03-22   71.0  ceh
1 2024-03-23   63.0  abc



RE: [Numpy] How to store different data type in one numpy array? - water - Mar-25-2024

It seems that different data types can only be stored inside tuples, which then become the elements of one array; they can't be stored directly as standalone elements of one array.


RE: [Numpy] How to store different data type in one numpy array? - snippsat - Mar-25-2024

It is one array if you create it like this.
>>> n = np.array([('2024-03-22', 71.0, 'ceh'), ('2024-03-23', 63.0, 'abc')], dtype= [('date', 'datetime64[D]'), ('value', 'float64'), ('code', 'U3')]) 
>>> n
array([('2024-03-22', 71., 'ceh'), ('2024-03-23', 63., 'abc')],
      dtype=[('date', '<M8[D]'), ('value', '<f8'), ('code', '<U3')])
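A quick way to check that this really is a single one-dimensional array, where each element is a record and each field can be read as its own typed column. This is a small sketch that reuses the same data and dtype as above.

```python
import numpy as np

dtype = [('date', 'datetime64[D]'), ('value', 'float64'), ('code', 'U3')]
n = np.array([('2024-03-22', 71.0, 'ceh'), ('2024-03-23', 63.0, 'abc')], dtype=dtype)

print(n.shape)        # one dimension, one element per record
print(n.dtype.names)  # the field names
print(n['value'])     # one field as a float64 array
print(n[0]['code'])   # one field of one record
```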
You can make it shorter like this (read the docs).
>>> n = np.array([('2024-03-22', 71.0, 'ceh'), ('2024-03-23', 63.0, 'abcyyyyyy')], dtype='datetime64[D], float64, U3')
>>> n
array([('2024-03-22', 71., 'ceh'), ('2024-03-23', 63., 'abc')],
      dtype=[('f0', '<M8[D]'), ('f1', '<f8'), ('f2', '<U3')])
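With the short string form the fields get the default names f0, f1, f2. If readable names are wanted afterwards, one possible approach is `rename_fields` from `numpy.lib.recfunctions`. This is a sketch; the new names are just the ones used earlier in this thread.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

n = np.array([('2024-03-22', 71.0, 'ceh'), ('2024-03-23', 63.0, 'abc')],
             dtype='datetime64[D], float64, U3')
print(n['f1'])  # fields are addressed by their default names

# Return an array with the same data but renamed fields
named = rfn.rename_fields(n, {'f0': 'date', 'f1': 'value', 'f2': 'code'})
print(named.dtype.names)
```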
It works the same way: if a date is wrong, you get an error message.
>>> n = np.array([('2024-03-2299', 71.0, 'ceh'), ('2024-03-23', 63.0, 'abcyyyyyy')], dtype='datetime64[D], float64, U3')
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
ValueError: Error parsing datetime string "2024-03-2299" at position 10
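If the input may contain bad dates, the ValueError can be caught where the array is built. This is a minimal sketch; `to_records` is a hypothetical helper name, not something from NumPy.

```python
import numpy as np

DTYPE = 'datetime64[D], float64, U3'

def to_records(rows):
    """Hypothetical helper: return a structured array, or None if a value cannot be parsed."""
    try:
        return np.array(rows, dtype=DTYPE)
    except ValueError as exc:
        print(f"rejected rows: {exc}")
        return None

good = to_records([('2024-03-22', 71.0, 'ceh')])
bad = to_records([('2024-03-2299', 71.0, 'ceh')])
```

Here `good` is a structured array with one record, and `bad` is None because the date string cannot be parsed.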



RE: [Numpy] How to store different data type in one numpy array? - deanhystad - Mar-25-2024

What are you planning to do with the array?


RE: [Numpy] How to store different data type in one numpy array? - paul18fr - Mar-26-2024

I agree with @deanhystad.

For instance, if the goal is to recover the data per type, I would imagine something like the following, if you can use 2 arrays (is that possible?):

import numpy as np

Array1 = np.array([['2024-03-22', 71.0, 'ceh'], 
                   ['2024-03-23', 63.0, 'abc'],
                   ['2024-03-24', -50.6, 'zzzzzzz'],
                   ['2024-03-25', 13.8, 'lkj'],
                   ['2024-03-26', 05.2, 'dsfdssss'],
                   [935.2, 'hgjhg', '2024-03-27']                   
                   ])

TypeArray = np.array([['datetime64[D]', 'float64', 'string'],
                      ['datetime64[D]', 'float64', 'string'],
                      ['datetime64[D]', 'float64', 'string'],
                      ['datetime64[D]', 'float64', 'string'],
                      ['datetime64[D]', 'float64', 'string'],
                      ['float64', 'string', 'datetime64[D]']   # !!!!!!!!!!!!!!                  
                     ])


NumberOfTypes = np.unique(TypeArray)


# results are stored in a dictionary PER type but you can proceed differently
RecoveringDictionary = {}

for ntype in NumberOfTypes:
    Index = np.where(TypeArray == ntype)
    Extract = Array1[Index]
    
    if ntype == 'float64': Extract = Extract.astype(np.float64)
    # if ntype == 'datetime64[D]': Extract = Extract.astype(np.datetime64)   
    
    RecoveringDictionary.update({ ntype: Extract, })

    
# print results
for ntype in NumberOfTypes:
    print(f"{ntype} = {RecoveringDictionary[ntype]}\n")
Output:
datetime64[D] = ['2024-03-22' '2024-03-23' '2024-03-24' '2024-03-25' '2024-03-26' '2024-03-27']

float64 = [ 71. 63. -50.6 13.8 5.2 935.2]

string = ['ceh' 'abc' 'zzzzzzz' 'lkj' 'dsfdssss' 'hgjhg']



RE: [Numpy] How to store different data type in one numpy array? - snippsat - Mar-26-2024

As deanhystad posted, more info may be needed.
paul18fr, good effort, but I would say that looks wrong in most cases.
The TypeArray does not work (e.g. try it with a wrong date) and repeats data unnecessarily.
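One way to see the underlying problem: when np.array is given rows that mix strings and floats without a dtype, it stores everything as text, so the per-column types are already lost before any type array is consulted. This is a small check using the data from the first post.

```python
import numpy as np

b = np.array([['2024-03-22', 71.0, 'ceh'], ['2024-03-23', 63.0, 'abc']])

print(b.dtype)        # a single Unicode string dtype for the whole array
print(repr(b[0, 1]))  # the float was converted to text
```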

(Mar-23-2024, 08:55 PM)water Wrote: and specific dtype likes:
In the first post he asks how to specify dtypes in a NumPy array.
That is why we are talking about structured arrays.
Here is one more example of how structured arrays work:
import numpy as np

# Sample data: Transaction ID, Date, Amount, Transaction Type
data = [
    (1001, '2023-01-01', 250.00, 'Deposit'),
    (1002, '2023-01-03', -100.00, 'Withdrawal'),
    (1003, '2023-01-05', 200.00, 'Deposit'),
    (1004, '2023-01-07', -50.00, 'Withdrawal'),
    (1005, '2023-01-09', 300.00, 'Deposit'),
]

# Define the dtype for the structured array
dtype = [
    ('trans_id', 'int32'),
    ('date', 'datetime64[D]'),
    ('amount', 'float64'),
    ('type', 'U10')  # Transaction type with up to 10 characters
]

transactions = np.array(data, dtype=dtype)
Structured arrays are particularly useful when working with tabular data that mixes different data types,
and when you want to perform efficient, vectorized operations on this data.

Take a look at the data manipulation below; this would not be possible without specifying the dtype.
# Get all dates
>>> transactions['date']
array(['2023-01-01', '2023-01-03', '2023-01-05', '2023-01-07',
       '2023-01-09'], dtype='datetime64[D]')

# Find all withdrawals
>>> withdrawals = transactions[transactions['type'] == 'Withdrawal']
>>> withdrawals
array([(1002, '2023-01-03', -100., 'Withdrawal'),
       (1004, '2023-01-07',  -50., 'Withdrawal')],
      dtype=[('trans_id', '<i4'), ('date', '<M8[D]'), ('amount', '<f8'), ('type', '<U10')])

# Calculate the total amount of deposits
>>> total_deposits = transactions[transactions['type'] == 'Deposit']['amount'].sum()
>>> total_deposits
750.0
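A few more operations in the same spirit, as a sketch on the same sample data: filtering on the date field and sorting by one field with the `order` argument of np.sort. The cutoff date is an arbitrary choice for illustration.

```python
import numpy as np

data = [
    (1001, '2023-01-01', 250.00, 'Deposit'),
    (1002, '2023-01-03', -100.00, 'Withdrawal'),
    (1003, '2023-01-05', 200.00, 'Deposit'),
    (1004, '2023-01-07', -50.00, 'Withdrawal'),
    (1005, '2023-01-09', 300.00, 'Deposit'),
]
dtype = [('trans_id', 'int32'), ('date', 'datetime64[D]'),
         ('amount', 'float64'), ('type', 'U10')]
transactions = np.array(data, dtype=dtype)

# Records on or after a cutoff date (the comparison is vectorized)
recent = transactions[transactions['date'] >= np.datetime64('2023-01-05')]
print(recent['trans_id'])

# Records sorted by the 'amount' field, smallest first
by_amount = np.sort(transactions, order='amount')
print(by_amount['trans_id'])
```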
For operations that are easily vectorized, staying within NumPy can be faster and more memory efficient.
Structured arrays can also easily be moved into pandas if you need more advanced features like grouping, plotting, and so on.
import pandas as pd

df = pd.DataFrame(transactions)
print(df)
Output:
   trans_id       date  amount        type
0      1001 2023-01-01   250.0     Deposit
1      1002 2023-01-03  -100.0  Withdrawal
2      1003 2023-01-05   200.0     Deposit
3      1004 2023-01-07   -50.0  Withdrawal
4      1005 2023-01-09   300.0     Deposit
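As an example of the grouping mentioned above, here is a minimal sketch that sums the amounts per transaction type once the structured array is in a DataFrame. It assumes pandas is installed and rebuilds the same sample data so it can run on its own.

```python
import numpy as np
import pandas as pd

data = [
    (1001, '2023-01-01', 250.00, 'Deposit'),
    (1002, '2023-01-03', -100.00, 'Withdrawal'),
    (1003, '2023-01-05', 200.00, 'Deposit'),
    (1004, '2023-01-07', -50.00, 'Withdrawal'),
    (1005, '2023-01-09', 300.00, 'Deposit'),
]
dtype = [('trans_id', 'int32'), ('date', 'datetime64[D]'),
         ('amount', 'float64'), ('type', 'U10')]
transactions = np.array(data, dtype=dtype)

df = pd.DataFrame(transactions)

# Total amount per transaction type
totals = df.groupby('type')['amount'].sum()
print(totals)
```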