
What is the most efficient way to add a value to a column in a pandas DataFrame in Python?

I have a DataFrame df with two columns: a word and the meaning (definition) of that word. For each definition, I want to build a collections.Counter object that counts the frequency of the words occurring in that definition, in the most Pythonic way possible.

The traditional approach would be to iterate over the DataFrame with the iterrows() method and do the computation row by row.

Sample output

 <table border="1" width="340"> <tbody> <tr> <td>Word</td> <td>Meaning</td> <td>Word Freq</td> </tr> <tr> <td>Array</td> <td>collection of homogeneous datatype</td> <td>{'collection':1,'of':1....}</td> </tr> </tbody> </table> 
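For reference, the row-by-row baseline described in the question might look like the sketch below. The sample rows are made up for illustration, and the column names Word, Meaning and Word Freq are taken from the sample output table; adjust them to the real DataFrame.

```python
from collections import Counter

import pandas as pd

# Hypothetical sample data; column names follow the sample output table.
df = pd.DataFrame({
    "Word": ["Array", "Stack"],
    "Meaning": [
        "collection of homogeneous datatype",
        "collection of elements with last in first out access",
    ],
})

# Baseline: visit each row with iterrows() and count the words
# of that row's definition with a Counter.
word_freq = []
for _, row in df.iterrows():
    word_freq.append(Counter(row["Meaning"].split()))

# Store one Counter per row in a new column.
df["Word Freq"] = word_freq
print(df["Word Freq"].iloc[0])
```

This works, but iterrows() builds a Series for every row, which is usually the slowest way to walk a DataFrame; the answers below avoid it.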

I would take advantage of the pandas str accessor methods and do this:

from collections import Counter
Counter(df.definition.str.cat(sep=' ').split())

Some test data:

import pandas as pd

df = pd.DataFrame({'word': ['some', 'words', 'yes'], 'definition': ['this is a definition', 'another definition', 'one final definition']})

print(df)
             definition   word
0  this is a definition   some
1    another definition  words
2  one final definition    yes

Then concatenate the definitions, split on whitespace, and pass the result to Counter:

Counter(df.definition.str.cat(sep=' ').split())

Counter({'a': 1,
         'another': 1,
         'definition': 3,
         'final': 1,
         'is': 1,
         'one': 1,
         'this': 1})

Assuming that df has two columns, 'word' and 'definition', you can simply use the .map method with Counter on the definition Series after splitting on whitespace. Then sum the result.

from collections import Counter

def_counts = df.definition.map(lambda x: Counter(x.split()))
all_counts = def_counts.sum()
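As a minimal sketch of how the per-row counters could be stored next to each word, which is what the question's sample output asks for: the column name word_freq is a hypothetical choice, and the total is accumulated with Counter.update instead of Series.sum, on the assumption that an explicit loop behaves the same across pandas versions.

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({
    "word": ["some", "words", "yes"],
    "definition": ["this is a definition", "another definition", "one final definition"],
})

# One Counter per row: the word frequencies of that row's definition.
def_counts = df["definition"].map(lambda text: Counter(text.split()))

# Keep the per-row result as a new column ("word_freq" is a hypothetical name).
df["word_freq"] = def_counts

# Total over all definitions, accumulated explicitly with Counter.update.
all_counts = Counter()
for row_counts in def_counts:
    all_counts.update(row_counts)

print(df[["word", "word_freq"]])
print(all_counts.most_common(1))
```

The per-row column answers the original question, while all_counts matches the aggregate that the concatenate-and-split approach produces.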

I intend for this answer to be useful, but not to be the chosen answer. In fact, I'm only making an argument for Counter and @TedPetrou's answer.

Create a large example of random words:

import numpy as np
import pandas as pd
from string import ascii_lowercase

a = np.random.choice(list(ascii_lowercase), size=(100000, 5))

definitions = pd.Series(
    pd.DataFrame(a).sum(1).values.reshape(-1, 10).tolist()).str.join(' ')

definitions.head()

0    hmwnp okuat sexzr jsxhh bdoyc kdbas nkoov moek...
1    iiuot qnlgs xrmss jfwvw pmogp vkrvl bygit qqon...
2    ftcap ihuto ldxwo bvvch zuwpp bdagx okhtt lqmy...
3    uwmcs nhmxa qeomd ptlbg kggxr hpclc kwnix rlon...
4    npncx lnors gyomb dllsv hyayw xdynr ctwvh nsib...
dtype: object

Timing
Counter is on the order of 1000 times faster than the fastest alternative I could think of.
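The alternative that was benchmarked is not shown, so the sketch below does not reproduce that measurement. It only illustrates how the two Counter-based approaches from the answers above could be timed with timeit on random data; the helper names, the data size, and the use of Counter.update for the per-row total are assumptions made for the example.

```python
import random
import timeit
from collections import Counter
from string import ascii_lowercase

import pandas as pd

random.seed(0)


def random_word(length=5):
    """Return a random lowercase word (illustrative helper)."""
    return "".join(random.choices(ascii_lowercase, k=length))


# 2000 definitions of 10 random words each; kept small so the sketch runs quickly.
definitions = pd.Series(
    [" ".join(random_word() for _ in range(10)) for _ in range(2000)]
)


def count_concatenated(s):
    # Join all definitions into one string, split once, count once.
    return Counter(s.str.cat(sep=" ").split())


def count_per_row(s):
    # Build one Counter per definition, then merge them.
    total = Counter()
    for row_counts in s.map(lambda text: Counter(text.split())):
        total.update(row_counts)
    return total


# Both approaches should produce the same totals.
assert count_concatenated(definitions) == count_per_row(definitions)

for fn in (count_concatenated, count_per_row):
    seconds = timeit.timeit(lambda: fn(definitions), number=5)
    print(f"{fn.__name__}: {seconds / 5:.4f} s per run")
```

Absolute numbers depend on the machine and the pandas version, so only the relative difference between the two lines is meaningful.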


