Python: Creating a 2-column dataframe from list and a computation on the list

I'm taking my first baby steps in Python and I'm hoping you can help me with the following:

I have a list:

scores = [1,1,1,2,2,2,3,3,3,3,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5,5,5]

And I would like to create a dataframe that has the scores in column 1 and the frequency of each score in column 2.

Any help or pointers are appreciated. Thanks!

My first attempt was not very good:

scores = [1,1,1,2,2,2,3,3,3,3,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5,5,5]
freq = []
df = {'col1': scores, 'col2': freq}

First off, create a Counter object to count the frequency of each score, then build the dataframe from it.

In [1]: scores = [1,1,1,2,2,2,3,3,3,3,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5,5,5]

In [2]: from collections import Counter

In [3]: score_counts = Counter(scores)

In [4]: score_counts
Out[4]: Counter({5: 12, 4: 8, 3: 4, 1: 3, 2: 3})

In [5]: import pandas as pd

In [6]: pd.DataFrame.from_dict(score_counts, orient='index')
Out[6]: 

    0
1   3
2   3
3   4
4   8
5  12

[5 rows x 1 columns]

The part that may trip up some users is pd.DataFrame.from_dict(). With orient='index', the dictionary keys (the scores) become the row index and the counts become a single column. The documentation is here: http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.from_dict.html
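
Note that the frame above keeps the scores in the index and has only one data column (named 0). If the goal is two ordinary columns, one possible sketch is to pass the counter's (score, count) pairs to the DataFrame constructor. The column names 'score' and 'freq' are illustrative choices, not part of the original answer.

```python
from collections import Counter

import pandas as pd

scores = [1,1,1,2,2,2,3,3,3,3,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5,5,5]

# Count how often each score occurs.
score_counts = Counter(scores)

# Each (score, count) pair becomes one row; sorting orders the rows by score.
# The column names are assumptions chosen for this example.
df = pd.DataFrame(sorted(score_counts.items()), columns=['score', 'freq'])
print(df)
```

Sorting the pairs first is optional; without it the rows follow the order in which the scores were first seen.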

I would use value_counts (see, for example, the Series docs). Note that I've changed the data here a little:

>>> import pandas as pd
>>> scores = [1]*3 + [2]*3 + [3]*4 + [4]*1 + [5]*4
>>> pd.value_counts(scores)
5    4
3    4
2    3
1    3
4    1
dtype: int64

And you can change the output as you like:

>>> pd.value_counts(scores, ascending=True)
4    1
1    3
2    3
3    4
5    4
dtype: int64
>>> pd.value_counts(scores).sort_index()
1    3
2    3
3    4
4    1
5    4
dtype: int64
>>> pd.value_counts(scores).sort_index().to_frame()
   0
1  3
2  3
3  4
4  1
5  4
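
As far as I know, newer pandas releases deprecate the top-level pd.value_counts function in favor of the Series method, and the to_frame() result above still keeps the scores in the index. A hedged sketch that uses the Series method and ends with two named columns might look like this (the names 'score' and 'freq' are my own choices):

```python
import pandas as pd

scores = [1]*3 + [2]*3 + [3]*4 + [4]*1 + [5]*4

df = (
    pd.Series(scores)
    .value_counts()            # count occurrences of each distinct score
    .sort_index()              # order rows by score rather than by count
    .rename_axis('score')      # name the index so it becomes a named column
    .reset_index(name='freq')  # move the index into a column; counts go in 'freq'
)
print(df)
```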

To calculate the frequencies:

freq = {}
for score in scores:
    freq[score] = freq.get(score, 0) + 1

This will give you a dictionary that maps each distinct score to its frequency. Then, to create two columns of equal length, you can build a dictionary from its keys and values, such as:

data = {'scores': list(freq.keys()), 'freq': list(freq.values())}
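
To go from the freq dictionary to an actual pandas DataFrame, both columns need the same length, so one option is to take the scores from the dictionary's keys and the counts from its values. A minimal sketch, assuming pandas is available and reusing the names from above:

```python
import pandas as pd

scores = [1,1,1,2,2,2,3,3,3,3,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5,5,5]

# Tally each score with a plain dictionary.
freq = {}
for score in scores:
    freq[score] = freq.get(score, 0) + 1

# Keys and values are iterated in the same order, so each row pairs
# a score with its own count.
df = pd.DataFrame({'scores': list(freq.keys()), 'freq': list(freq.values())})
print(df)
```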

You could also accomplish this with a list comprehension, where the index of the list is the score and the value is its frequency. However, if the range of your scores is large, this requires a large, sparse list, so you may be better off using a dictionary as above.
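
For illustration, the list-based variant could be sketched as follows. It assumes the scores are non-negative integers; the name counts is hypothetical.

```python
scores = [1,1,1,2,2,2,3,3,3,3,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5,5,5]

# counts[i] is the number of times score i occurs.
# Index 0 stays at 0 here because no score equals 0.
counts = [scores.count(i) for i in range(max(scores) + 1)]
print(counts)  # [0, 3, 3, 4, 8, 12]
```

Each call to scores.count() scans the whole list, so this does more work than the dictionary loop when there are many distinct scores.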
