
DataFrame, apply, lambda, list comprehension

I'm trying to do a bit of cleansing on some data sets. I can accomplish the task with a couple of for loops, but I want a more pythonic/pandorable way to do it.

This is the code I came up with. The data is not real, but it should work:

import pandas as pd

# This is a dataframe containing the correct values
correct = pd.DataFrame([{"letters":"abc","data":1},{"letters":"ast","data":2},{"letters":"bkgf","data":3}])

# This is the dataframe containing source data
source = pd.DataFrame([{"c":"ab"},{"c":"kh"},{"c":"bkg"}])

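# For each source word, find the first 'letters' string that contains it
# (Series.iteritems() was removed in pandas 2.0; .items() is the equivalent)
for i, word in source["c"].items():
    for j, row in correct.iterrows():
        if word in row["letters"]:
            # Replace the word with the matching 'data' value; stop at the first match
            source.at[i, "c"] = row["data"]
            break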

This is my attempt at a pandorable way, but it fails because the comprehension inside apply is a generator expression, so apply receives a generator instead of a function:

source["c"] = source["c"].apply(
    lambda x: row["data"] if x in row["letters"] else x
    for row in correct.iterrows()
)
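A minimal sketch of why the attempt fails and one way to keep its lambda shape. The argument parses as a generator expression that yields lambdas, so apply gets a non-callable generator. Separately, iterrows() yields (index, row) tuples, so row["data"] would index a tuple. The fix below is an assumption of what was intended: the lambda stays the callable and the first-match search moves inside it via next(). The name broken_arg is hypothetical and used only for the demonstration.

```python
import pandas as pd

correct = pd.DataFrame([{"letters": "abc", "data": 1},
                        {"letters": "ast", "data": 2},
                        {"letters": "bkgf", "data": 3}])
source = pd.DataFrame([{"c": "ab"}, {"c": "kh"}, {"c": "bkg"}])

# Same shape as the original argument: a generator expression yielding lambdas,
# which is not something apply() can call on each element.
broken_arg = (lambda x: x for _ in range(1))
print(callable(broken_arg))  # False

# Fix (sketch): the lambda is the callable; the generator lives inside it.
# next() returns the first matching 'data' value, or x itself if none match.
source["c"] = source["c"].apply(
    lambda x: next(
        (data for letters, data in zip(correct["letters"], correct["data"])
         if x in letters),
        x,
    )
)
print(source["c"].tolist())  # [1, 'kh', 3]
```

Zipping the two columns avoids iterrows() entirely, so there is no (index, row) tuple to unpack and no per-row Series to build.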

Here's one solution using pd.Series.apply with next and a generator expression:

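# Build the data -> letters lookup once instead of on every call
lookup = correct.set_index('data')['letters']

def update_value(x):
    # Return the first 'data' key whose 'letters' contain x; otherwise keep x
    return next((k for k, v in lookup.items() if x in v), x)

source['c'] = source['c'].apply(update_value)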

print(source)

    c
0   1
1  kh
2   3
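Since the question title mentions list comprehensions, here is a sketch of the same first-match logic written as a plain list comprehension instead of apply. The helper name pairs is hypothetical. It assumes, like the answer above, that the first matching row in correct wins, so the row order of correct matters.

```python
import pandas as pd

correct = pd.DataFrame([{"letters": "abc", "data": 1},
                        {"letters": "ast", "data": 2},
                        {"letters": "bkgf", "data": 3}])
source = pd.DataFrame([{"c": "ab"}, {"c": "kh"}, {"c": "bkg"}])

# Materialize the lookup pairs once; order is preserved, so the first match wins.
pairs = list(zip(correct["letters"], correct["data"]))

# For each word, take the first 'data' whose 'letters' contain the word,
# otherwise keep the word unchanged.
source["c"] = [
    next((data for letters, data in pairs if word in letters), word)
    for word in source["c"]
]
print(source["c"].tolist())  # [1, 'kh', 3]
```

Neither this nor the apply version is vectorized: both perform one Python-level substring check per (source word, correct row) pair. They are shorter than the nested loops, not asymptotically faster.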

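Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.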

 