Selecting data from multiple dataframe columns and compiling into one list

I'm new to Python and its associated libraries and am hacking around on my own in a forest of syntax, datatypes, etc. I'd greatly appreciate any advice on the following problem. I'm trying to select values from multiple columns of a DataFrame ("Number" and "Number2" in the example below) based on the value in another column ("Letter"), and then combine the values from those columns into one list per letter so I can do statistical analysis on the combined values. After some trial and error I've got the following, which seems to work... but it feels a bit clunky. Is there a better way?!

Many thanks!

import pandas as pd

Letters = ["A", "B", "C", "C", "D", "D", "D"]
Numbers = [1, 1, 1, 2, 1, 2, 3]
Numbers2 = [10, 10, 10, 20, 10, 20, 30]

test_dict = {"Letter": Letters, "Number": Numbers, "Number2": Numbers2}
test = pd.DataFrame(test_dict)

numbers_by_letters = []

# For each distinct letter (in order of first appearance), collect the
# matching values from column 1 ("Number") and then column 2 ("Number2").
for unique_letter in test["Letter"].unique():
    numbers_by_letter = []
    for col in range(1, 3):
        number_by_letter = test[test["Letter"] == unique_letter].iloc[:, col]
        numbers_by_letter.extend(number_by_letter)
    numbers_by_letters.append(numbers_by_letter)

print(numbers_by_letters)

The output I get is shown below and is what I think I want!

[[1, 10], [1, 10], [1, 2, 10, 20], [1, 2, 3, 10, 20, 30]]
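If you prefer to keep the explicit loop, here is a minimal sketch with less indexing, assuming the goal is to keep exactly the ordering shown above (all "Number" values for a letter, then all "Number2" values). It selects both columns by name with .loc and flattens column by column; the names value_cols and block are only illustrative.

```python
import pandas as pd

Letters = ["A", "B", "C", "C", "D", "D", "D"]
Numbers = [1, 1, 1, 2, 1, 2, 3]
Numbers2 = [10, 10, 10, 20, 10, 20, 30]
test = pd.DataFrame({"Letter": Letters, "Number": Numbers, "Number2": Numbers2})

# Select by column name instead of the positional iloc[:, 1] / iloc[:, 2].
value_cols = ["Number", "Number2"]

numbers_by_letters = []
for unique_letter in test["Letter"].unique():
    # The boolean mask picks this letter's rows; .loc takes both columns at once,
    # giving a 2-D array with one row per matching DataFrame row.
    block = test.loc[test["Letter"] == unique_letter, value_cols].to_numpy()
    # order="F" flattens column by column, so all "Number" values come before
    # all "Number2" values, matching the nested loop above.
    numbers_by_letters.append(block.flatten(order="F").tolist())

print(numbers_by_letters)
```

The inner "for col in range(1, 3)" loop disappears because one .loc call returns both columns; selecting by name also keeps working if the column order of the DataFrame changes.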

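A more compact alternative uses groupby: for each letter, take the "Number" and "Number2" columns as a NumPy array, flatten it, and sort the values.

print(
    test.groupby("Letter")
    # flatten() reads row by row ([1, 10, 2, 20] for "C"); sorted() then
    # reorders numerically, which matches the loop's output for this data
    .apply(lambda x: sorted(x[["Number", "Number2"]].to_numpy().flatten()))
    .to_list()
)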

Prints:

[[1, 10], [1, 10], [1, 2, 10, 20], [1, 2, 3, 10, 20, 30]]
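Another option, sketched here as an alternative rather than a replacement: reshape the two columns into one long column with DataFrame.melt and group that. This keeps the column order without relying on sorted(), and since the stated end goal is statistical analysis, the same grouped column can also produce per-letter statistics directly. The names long_df, lists and stats, and the particular statistics chosen, are only illustrative.

```python
import pandas as pd

test = pd.DataFrame({
    "Letter": ["A", "B", "C", "C", "D", "D", "D"],
    "Number": [1, 1, 1, 2, 1, 2, 3],
    "Number2": [10, 10, 10, 20, 10, 20, 30],
})

# melt() stacks the chosen columns into one long "value" column: every
# "Number" row first, then every "Number2" row, with "Letter" alongside.
long_df = test.melt(id_vars="Letter", value_vars=["Number", "Number2"])

# One list per letter. Row order inside each group is preserved, so the
# values keep the column order produced by melt() (no sorting involved).
lists = long_df.groupby("Letter")["value"].agg(list)
print(lists.to_list())

# If the end goal is statistics, compute them per letter directly,
# without building Python lists first (these four are just examples).
stats = long_df.groupby("Letter")["value"].agg(["count", "mean", "std", "median"])
print(stats)
```

Note that groupby sorts the group keys by default, whereas unique() in the original loop keeps the order of first appearance; pass sort=False to groupby if that order matters (for this data both give A, B, C, D).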
