
Python, Pandas: Filter dataframe to a subset and update this subset in place

I have a pandas dataframe that looks like this:

cleanText.head()
    source      word    count
0   twain_ess            988
1   twain_ess   works    139
2   twain_ess   short    139
3   twain_ess   complete 139
4   twain_ess   would    98
5   twain_ess   push     94

And a dictionary that contains the total word count for each source:

titles
{'orw_ess': 1729, 'orw_novel': 15534, 'twain_ess': 7680, 'twain_novel': 60004}

My goal is to normalize the word counts for each source by the total number of words in that source, i.e. turn them into percentages. This seems like it should be trivial, but Python seems to make it very difficult (if anyone could explain the rules for in-place operations to me, that would be great).

The catch is that I need to filter the entries in cleanText down to those from a single source, and then I try to divide the counts for this subset, in place, by that source's value in the dictionary.

# Adjust total word counts and normalize
for key, value in titles.items():

    # This corrects the total words for overcounting the '' entries
    overcounted= cleanText[cleanText.iloc[:,0]== key].iloc[0,2]
    titles[key]= titles[key]-overcounted

    # This is where I divide by total words, however it does not save inplace, or at all for that matter
    cleanText[cleanText.iloc[:,0]== key].iloc[:,2]= cleanText[cleanText.iloc[:,0]== key]['count']/titles[key]
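The division in the loop has no lasting effect because boolean indexing, cleanText[mask], returns a new DataFrame (a copy), so .iloc[:,2] = ... writes into that temporary copy rather than into cleanText. A minimal sketch of the same loop that assigns through a single .loc[mask, 'count'] call instead, so pandas writes into cleanText itself. The sample frame is a hypothetical stand-in built from the rows shown above, with titles trimmed to the one source it contains; like the original loop, it assumes the first row of each source is the overcounted '' entry.

```python
import pandas as pd

# Hypothetical sample built from the rows shown above (row 0 is the '' entry)
cleanText = pd.DataFrame({
    "source": ["twain_ess"] * 6,
    "word": ["", "works", "short", "complete", "would", "push"],
    "count": [988, 139, 139, 139, 98, 94],
})
titles = {"twain_ess": 7680}  # trimmed to the single source in the sample

# The results are fractions, so store the column as float before assigning into it
cleanText["count"] = cleanText["count"].astype(float)

for key in titles:
    mask = cleanText["source"] == key  # boolean Series aligned with cleanText's index

    # Same correction as the question: assumes the first row per source is the '' entry
    overcounted = cleanText.loc[mask, "count"].iloc[0]
    titles[key] -= overcounted

    # One .loc call with (row mask, column label) writes into cleanText itself,
    # whereas cleanText[mask].iloc[:, 2] = ... writes into a temporary copy
    cleanText.loc[mask, "count"] = cleanText.loc[mask, "count"] / titles[key]

print(cleanText)
```

Changing the dictionary's values while looping over its keys is safe here because no keys are added or removed.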

If anyone could explain how to change this division statement so that the result is actually saved in the original column, that would be great.

Thanks

If I understand correctly:

cleanText['count']/cleanText['source'].map(titles)

Which gives you:

0    0.128646
1    0.018099
2    0.018099
3    0.018099
4    0.012760
5    0.012240
dtype: float64

To write these normalized values back into your count column, use:

cleanText['count'] = cleanText['count']/cleanText['source'].map(titles)
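A self-contained version of this answer, using a hypothetical frame that reproduces the rows from the question. Series.map with a dict replaces each source with its total, so the division is an ordinary element-wise operation aligned on the index, and assigning the result to cleanText['count'] replaces the whole column with no filtering or chained indexing involved. Note that it divides by the unadjusted totals in titles (988 / 7680 ≈ 0.128646, as in the output above); to apply the question's '' correction, subtract those counts from titles before mapping.

```python
import pandas as pd

# Hypothetical sample reproducing the rows shown in the question
cleanText = pd.DataFrame({
    "source": ["twain_ess"] * 6,
    "word": ["", "works", "short", "complete", "would", "push"],
    "count": [988, 139, 139, 139, 98, 94],
})
titles = {"orw_ess": 1729, "orw_novel": 15534, "twain_ess": 7680, "twain_novel": 60004}

# Look up each row's source in the dict; a source missing from titles becomes NaN
totals = cleanText["source"].map(titles)

# Element-wise division aligned on the index; assigning back replaces the column
cleanText["count"] = cleanText["count"] / totals
print(cleanText["count"].round(6))
```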
