I'm new to Python and its associated libraries and am hacking around on my own in a forest of syntax, datatypes etc. I'd greatly appreciate any advice on the following problem: I'm trying to select values from multiple columns ("Number" and "Number2" in the example below) in a dataframe based on a value in one column ("Letter") and then combine the values from the different columns into one list so I can do statistical analysis on the combined values. After a certain amount of trial and error, I've got the following, which seems to work but feels a bit clunky. Is there a better way?
Many thanks!
import pandas as pd

Letters = ["A", "B", "C", "C", "D", "D", "D"]
Numbers = [1, 1, 1, 2, 1, 2, 3]
Numbers2 = [10, 10, 10, 20, 10, 20, 30]
test_dict = {"Letter": Letters, "Number": Numbers, "Number2": Numbers2}
test = pd.DataFrame(test_dict)

numbers_by_letters = []
for unique_letter in test["Letter"].unique():
    # Collect the values for this letter, one column at a time
    numbers_by_letter = []
    for col in range(1, 3):  # positions 1 and 2 are "Number" and "Number2"
        number_by_letter = test[test["Letter"] == unique_letter].iloc[:, col]
        numbers_by_letter.extend(number_by_letter)
    numbers_by_letters.append(numbers_by_letter)
print(numbers_by_letters)
The output I get is shown below and is what I think I want!
[[1, 10], [1, 10], [1, 2, 10, 20], [1, 2, 3, 10, 20, 30]]
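One option is to group by "Letter" and flatten both number columns for each group:
print(
    test.groupby("Letter")
    # Flatten each group's two columns into one array, then sort by value.
    # Sorting reproduces the column-by-column order for this data only
    # because every "Number" value is smaller than every "Number2" value.
    .apply(lambda x: sorted(x[["Number", "Number2"]].to_numpy().flatten()))
    .to_list()
)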
Prints:
[[1, 10], [1, 10], [1, 2, 10, 20], [1, 2, 3, 10, 20, 30]]
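Since the goal is statistical analysis on the combined values, another possible route is to reshape the frame to long form with melt and then group. This is only a sketch under a few assumptions: the names long_form and stats are made up for illustration, the "value" column name is just melt's default, and the choice of count, mean and median is an arbitrary example. It keeps the values in column order (all of "Number", then all of "Number2") instead of sorting them, and for the sample data it should give the same four lists. Depending on your NumPy version, the lists from the apply-based version may print as np.int64(...) values rather than plain integers.

```python
import pandas as pd

test = pd.DataFrame({
    "Letter": ["A", "B", "C", "C", "D", "D", "D"],
    "Number": [1, 1, 1, 2, 1, 2, 3],
    "Number2": [10, 10, 10, 20, 10, 20, 30],
})

# Stack "Number" and "Number2" into a single "value" column, keeping "Letter".
long_form = test.melt(id_vars="Letter", value_vars=["Number", "Number2"])

# One list per letter; rows keep their stacked order (Number first, then Number2).
numbers_by_letters = long_form.groupby("Letter")["value"].agg(list).tolist()
print(numbers_by_letters)

# Summary statistics per letter, computed on the combined values directly.
stats = long_form.groupby("Letter")["value"].agg(["count", "mean", "median"])
print(stats)
```

If you only need the statistics, the intermediate lists can be skipped entirely and the grouped "value" column aggregated as in the last step.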