TypeError: string indices must be integers when updating a dataframe column using .apply
I am updating a column based on a substring in another column. So far I have done this by iterating through the rows.
import pandas as pd
my_DestSystemNote1_string = 'ISIN=XS1906311763|CUSIP= |CalTyp=1'
dfDest = [('DestSystemNote1', ['ISIN=XS1906311763|CUSIP= |CalTyp=1',
'ISIN=XS0736418962|CUSIP= |CalTyp=1',
'ISIN=XS1533910508|CUSIP= |CalTyp=1',
'ISIN=US404280AS86|CUSIP=404280AS8|CalTyp=1',
'ISIN=US404280BW89|CUSIP=404280BW8|CalTyp=21',
'ISIN=US06738EBC84|CUSIP=06738EBC8|CalTyp=21',
'ISIN=XS0736418962|CUSIP= |CalTyp=1',]),
]
# create pandas df (DataFrame.from_items was removed in pandas 1.0; a dict of the pairs works on current versions)
dfDest = pd.DataFrame(dict(dfDest))
def findnth(haystack, needle, n):
    parts = haystack.split(needle, n+1)
    if len(parts) <= n+1:
        return -1
    return len(haystack) - len(parts[-1]) - len(needle)

def split_between(input_string,
                  start_str, start_occurence,
                  end_str, end_occurence
                  ):
    start_index = findnth(input_string, start_str, start_occurence-1) + len(start_str)
    end_index = findnth(input_string, end_str, end_occurence-1) + len(end_str) - 1
    return input_string[start_index:end_index]
dfDest['FOUND_ISIN'] = ""
dfDest['FOUND_CUSIP'] = ""
dfDest.info()
for index, row in dfDest.iterrows():
    try:
        print(row.DestSystemNote1)
        # Write through dfDest.at: the row yielded by iterrows() may be a copy,
        # so assigning to row.FOUND_ISIN is not guaranteed to update dfDest.
        dfDest.at[index, 'FOUND_ISIN'] = split_between(row.DestSystemNote1, "ISIN=", 1, "|", 1)
        dfDest.at[index, 'FOUND_CUSIP'] = split_between(row.DestSystemNote1, "CUSIP=", 1, "|", 2)
        # print ('DestSystemNote1=' + row.DestSystemNote1 + " " + 'FOUND_ISIN= ' + row.FOUND_ISIN)
        # print ('DestSystemNote1=' + row.DestSystemNote1 + " " + 'FOUND_CUSIP= ' + row.FOUND_CUSIP)
    except Exception:
        pass  # doing nothing on exception
To aid my learning, I would like to do the same thing using the apply method with a lambda function, i.e. populate a third column, FOUND_ISIN2, but I'm getting TypeError: string indices must be integers:
dfDest['FOUND_ISIN2'] = dfDest["DestSystemNote1"].apply(lambda x: split_between(x['DestSystemNote1'], "ISIN=", 1, "|", 1))
When I pass a sample string to the function directly, it returns a value:
dfDest['FOUND_ISIN2'] = dfDest["DestSystemNote1"].apply(lambda x: split_between('ISIN=XS1906311763|CUSIP= |CalTyp=1',"ISIN=", 1, "|", 1) )
With this in mind, I tried converting DestSystemNote1 to a string, but the error was raised again:
dfDest['FOUND_ISIN2'] = dfDest["DestSystemNote1"].apply(lambda x: split_between(x['DestSystemNote1'].astype('str'), "ISIN=", 1, "|", 1))
When using .apply, do I need to convert the value passed to the function to a string? What is going on under the hood here?
You don't need lambda or apply. Stick to the pandas string methods and you're done in three steps (this can probably be done in fewer, too):
# 1 - Create DataFrame
import pandas as pd
# dfDest is the list of (column name, values) pairs from the question;
# DataFrame.from_items was removed in pandas 1.0, so build it from a dict instead
dfDest = pd.DataFrame(dict(dfDest))
# 2 - String parsing
cols = ['ISIN', 'CUSIP', 'CalTyp']  # Define Columns
dfDest[cols] = dfDest['DestSystemNote1'].str.split('|', n=-1, expand=True)  # Split Strings to columns
# 3 - Replace unwanted parts of raw data
for header in cols:  # look at every column and remove its header string from the data
    dfDest[header] = dfDest[header].str.replace(header + "=", '', regex=False)  # remove the literal "header=" prefix
print(dfDest)
Output:
                               DestSystemNote1          ISIN      CUSIP  CalTyp
0           ISIN=XS1906311763|CUSIP= |CalTyp=1  XS1906311763                  1
1           ISIN=XS0736418962|CUSIP= |CalTyp=1  XS0736418962                  1
2           ISIN=XS1533910508|CUSIP= |CalTyp=1  XS1533910508                  1
3   ISIN=US404280AS86|CUSIP=404280AS8|CalTyp=1  US404280AS86  404280AS8       1
4  ISIN=US404280BW89|CUSIP=404280BW8|CalTyp=21  US404280BW89  404280BW8      21
5  ISIN=US06738EBC84|CUSIP=06738EBC8|CalTyp=21  US06738EBC84  06738EBC8      21
6           ISIN=XS0736418962|CUSIP= |CalTyp=1  XS0736418962                  1
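As for what .apply does under the hood: Series.apply calls the function once per element, so in dfDest["DestSystemNote1"].apply(...) the lambda's x is already a plain Python str. Writing x['DestSystemNote1'] then indexes that string with another string, which raises the TypeError. The .astype('str') attempt fails the same way, because x['DestSystemNote1'] is evaluated before .astype is reached. Below is a minimal sketch of the two working forms, assuming data shaped like the question's. The helper named between is a simplified, hypothetical stand-in and not the asker's split_between.

```python
import pandas as pd

df = pd.DataFrame({"DestSystemNote1": [
    "ISIN=XS1906311763|CUSIP= |CalTyp=1",
    "ISIN=US404280AS86|CUSIP=404280AS8|CalTyp=1",
]})

def between(text, start, end):
    """Hypothetical helper: text between the first `start` and the next `end`."""
    begin = text.index(start) + len(start)
    return text[begin:text.index(end, begin)]

# Series.apply: x is each cell value (a str), so pass it straight through.
df["FOUND_ISIN2"] = df["DestSystemNote1"].apply(lambda x: between(x, "ISIN=", "|"))

# DataFrame.apply with axis=1: x is a whole row (a Series), so x["col"] is valid.
df["FOUND_CUSIP2"] = df.apply(
    lambda x: between(x["DestSystemNote1"], "CUSIP=", "|"), axis=1
)

# Reproduce the error from the question: indexing a str with a str.
try:
    df["DestSystemNote1"].apply(lambda x: x["DestSystemNote1"])
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(df[["FOUND_ISIN2", "FOUND_CUSIP2"]])
print(error_message)
```

So the smallest change to the failing line in the question is to replace x['DestSystemNote1'] with x.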
Happy coding.
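On the remark that this can probably be done in fewer steps: one possibility is Series.str.extract with named groups, which splits and strips the prefixes in a single pass. This is only a sketch, assuming every note follows the ISIN=...|CUSIP=...|CalTyp=... layout in that order; the pattern and the df name are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"DestSystemNote1": [
    "ISIN=XS1906311763|CUSIP= |CalTyp=1",
    "ISIN=US404280BW89|CUSIP=404280BW8|CalTyp=21",
]})

# Assumed layout: three key=value fields separated by "|", always in this order.
pattern = r"ISIN=(?P<ISIN>[^|]*)\|CUSIP=(?P<CUSIP>[^|]*)\|CalTyp=(?P<CalTyp>[^|]*)"

# Each named group becomes a column of the extracted frame.
parsed = df["DestSystemNote1"].str.extract(pattern)
df = df.join(parsed)

print(df[["ISIN", "CUSIP", "CalTyp"]])
```

Rows that do not match the pattern would come back as NaN in all three columns, so the split-based version may be more forgiving if the field order varies.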