
Restructure an array of arrays and combine same terms

I'm trying to write a function that takes an array of arrays, and restructures it into a different form, with certain conditions. For example let's say:

array = [
    ["City1","Spanish", "163"],
    ["City1", "French", "194"],
    ["City2","English", "1239"],
    ["City2","Spanish", "1389"],
    ["City2", "French", "456"]
]

I want to create a new array whose rows are the cities, sorted alphabetically, and whose columns are the languages (sorting the columns is optional). Any missing city/language combination should be replaced by 0. For example, the output for the above array should be:

[
[0, 163, 194],
[1239, 1389, 456]
]

I wrote this method, but I'm not sure if it makes sense logically. It is definitely hard-coded, and I am trying to make it work for any input in the above format.

import numpy as np

new_array = [[]]
x = 'City1'
y = 'City2'

def solution(arr):
    for row in arr:
        if row[0]==x:
            new_array[-1].append(row[2])
        else:
            x = x + 1
            c.append([row[2]])
solution(array)

I know I need to fix the syntax, and also write a loop for sorting things alphabetically. Any help on this would be appreciated. I would like to understand how to iterate through an array like this, perform different operations on it, and restructure it into the new format.

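If performance is not your overriding concern, you can use Pandas with Categorical Data and groupby. This works because groupby on categorical columns can produce the Cartesian product of the categories, so combinations that never occur in the data still get a row (this is controlled by the observed parameter of groupby):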

import pandas as pd

# construct dataframe
df = pd.DataFrame(array, columns=['city', 'language', 'value'])

# convert to categories
for col in ['city', 'language']:
    df[col] = df[col].astype('category')

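# groupby.first or groupby.sum works if you have unique combinations;
# observed=False keeps category combinations absent from the data (e.g. City1/English)
res = df.sort_values(['city', 'language'])\
        .groupby(['city', 'language'], observed=False).first().fillna(0).reset_index()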

print(res)

    city language value
0  City1  English     0
1  City1   French   194
2  City1  Spanish   163
3  City2  English  1239
4  City2   French   456
5  City2  Spanish  1389

Then, for your desired list of lists output:

res_lst = res.groupby('city')['value'].apply(list).tolist()
res_lst = [list(map(int, x)) for x in res_lst]

print(res_lst)

[[0, 194, 163], [1239, 456, 1389]]
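
If you would rather not depend on Pandas, the same restructuring can be sketched in plain Python with a dictionary lookup. This is only a minimal sketch, not part of the approach above: the function name restructure is made up for illustration, it assumes every row has exactly three items with an integer-like string as the third, and if a (city, language) pair appears more than once the last value wins.

```python
def restructure(rows):
    # Sorted unique cities (rows of the result) and languages (columns).
    cities = sorted({city for city, _, _ in rows})
    languages = sorted({language for _, language, _ in rows})

    # Map each (city, language) pair to its integer value.
    # Assumption: pairs are unique; a repeated pair keeps the last value seen.
    lookup = {(city, language): int(value) for city, language, value in rows}

    # One output row per city, with 0 for combinations missing from the input.
    return [[lookup.get((city, language), 0) for language in languages]
            for city in cities]


array = [
    ["City1", "Spanish", "163"],
    ["City1", "French", "194"],
    ["City2", "English", "1239"],
    ["City2", "Spanish", "1389"],
    ["City2", "French", "456"],
]

print(restructure(array))
# [[0, 194, 163], [1239, 456, 1389]]
```

The columns come out in alphabetical language order (English, French, Spanish), matching the Pandas result above rather than the order in which the languages first appear in the input.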


 