Hello! I'm working with a dataset that has a rather inconvenient format where one of the columns is basically a list of name-value pair dictionaries. I would like to expand that column such that each of the names is it's own column. So far, I've found a way to do it by manually extracting each of the values, but ideally, I would prefer a more general solution that is also efficient. Here's an example:
Maybe the solution is to use pd.json_normalize on the preferences column, pivot that, then append the various dataframes?
import pandas as pd data = {'name': ['Alice', 'Bob', 'Clark'], 'preferences': [[{'name': 'fruit', 'value': 'apple'}, {'name': 'drink', 'value': 'lemonade'}, {'name': 'food', 'value': 'pizza'}], [{'name': 'fruit', 'value': 'orange'}, {'name': 'drink', 'value': 'soda'}, {'name': 'food', 'value': 'soup'}], [{'name': 'fruit', 'value': 'pear'}, {'name': 'drink', 'value': 'water'}, {'name': 'food', 'value': 'chicken'}]]} df = pd.DataFrame(data) # Extract values from 'preferences' column df['fruit'] = df['preferences'].apply(lambda x: [item['value'] for item in x if item['name'] == 'fruit'][0]) df['drink'] = df['preferences'].apply(lambda x: [item['value'] for item in x if item['name'] == 'drink'][0]) df['food'] = df['preferences'].apply(lambda x: [item['value'] for item in x if item['name'] == 'food'][0]) # Drop the 'preferences' column df = df.drop(columns=['preferences'])An additional complication is that not every column has the same name-value pairs. In that case, the method above fails (IndexError) without doing an additional check, which is even more inefficient.
Maybe the solution is to use pd.json_normalize on the preferences column, pivot that, then append the various dataframes?