
Efficient way to loop with if statement

I have sample data that looks like this (the real dataset has more columns):

import pandas as pd

data = {'stringID':['AB CD Efdadasfd','RFDS EDSfdsadf dsa','FDSADFDSADFFDSA'],'IDct':[1,3,4]}
data = pd.DataFrame(data)
data['Index1'] = [[3,6],[7,9],[5,6]]
data['Index2'] = [[4,8],[10,13],[8,9]]


What I want to achieve is to slice the stringID column using the second element of Index1 and the second element of Index2 (both columns hold lists) as the start and end positions, but only if the IDct value is greater than 1. Otherwise the result should be NaN.

I tried the following, and it produces the Output1 column as expected, but there must be a better way (I mean a faster one when applied to a large dataset). Please advise, thanks!

data['pos'] = data.Index1.map(lambda x: x[1])
data['pos1'] = data.Index2.map(lambda x: x[1])

def cal(m):
    if m['IDct'] > 1:
        return m['stringID'][m['pos']:m['pos1']]
    else:
        return 'NaN'

data['Output1'] = data.apply(cal,axis=1)


I love pandas, but realistically speaking it's just one of many tools that belong in your tool belt.

pandas and numpy really shine for computation and analysis. It's fine to use pandas to visualize and analyze your data, but that doesn't make it the right tool for every job.

This kind of problem is better suited to regular Python. Assuming we can, let's move StringID and IDct out of the dict and back into lists. If the data is regular in shape (all lists are of equal length), we can iterate over the lists together with zip:

StringID = ['AB CD Efdadasfd','RFDS EDSfdsadf dsa','FDSADFDSADFFDSA']
IDct = [1,3,4]
Index1 = [[3,6],[7,9],[5,6]]
Index2 = [[4,8],[10,13],[8,9]]

# Create the result list once, before the loop, so it accumulates across rows.
result = []
for string_id, id_ct, idx1, idx2 in zip(StringID, IDct, Index1, Index2):
    if id_ct > 1:
        # Slice between the second elements of Index1 and Index2.
        result.append(string_id[idx1[1]:idx2[1]])
    else:
        result.append(None)
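
The loop above can also be written as a list comprehension wrapped in a small helper. This is only a minimal sketch. It assumes the slice runs from the second element of Index1 to the second element of Index2, as the question describes. The function name slice_by_second_index and its min_count parameter are made up for illustration and are not part of the original answer.

```python
def slice_by_second_index(strings, counts, starts, ends, min_count=1):
    """Slice each string from starts[i][1] to ends[i][1] when counts[i] > min_count.

    Rows that do not meet the threshold get None instead of a slice.
    """
    return [
        s[start[1]:end[1]] if count > min_count else None
        for s, count, start, end in zip(strings, counts, starts, ends)
    ]


# Sample data from the question.
StringID = ['AB CD Efdadasfd', 'RFDS EDSfdsadf dsa', 'FDSADFDSADFFDSA']
IDct = [1, 3, 4]
Index1 = [[3, 6], [7, 9], [5, 6]]
Index2 = [[4, 8], [10, 13], [8, 9]]

result = slice_by_second_index(StringID, IDct, Index1, Index2)
print(result)  # [None, 'dsad', 'DSA']
```

Keeping the threshold as a parameter is just a convenience. If the condition never changes, hard-coding the comparison works equally well.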

You can then blend the result data back in as you see fit.

data = {
    'StringID': StringID,
    'IDct': IDct,
    'Index1': Index1,
    'Index2': Index2,
    'Result': result
}

pd.DataFrame(data)
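
If the data already lives in a DataFrame, one possible variation is to pull the relevant columns out as plain lists, build the result in regular Python, and assign it back as a new column. The sketch below assumes the column names from the question. The column name Output is illustrative only. Whether this is actually faster than apply(axis=1) on your data is worth timing rather than assuming.

```python
import pandas as pd

# Same sample data as in the question.
data = pd.DataFrame({
    'stringID': ['AB CD Efdadasfd', 'RFDS EDSfdsadf dsa', 'FDSADFDSADFFDSA'],
    'IDct': [1, 3, 4],
    'Index1': [[3, 6], [7, 9], [5, 6]],
    'Index2': [[4, 8], [10, 13], [8, 9]],
})

# Work on plain Python lists rather than calling a function once per row.
data['Output'] = [
    s[i1[1]:i2[1]] if n > 1 else None
    for s, n, i1, i2 in zip(
        data['stringID'].tolist(),
        data['IDct'].tolist(),
        data['Index1'].tolist(),
        data['Index2'].tolist(),
    )
]

output = data['Output'].tolist()
print(data[['stringID', 'IDct', 'Output']])
```

Rows where IDct is not greater than 1 end up as missing values in the new column, so they can be checked with pd.isna.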

