I have written some code that takes a DataFrame with two columns: one is a string and the other is an idea count. The code tries several delimiters on the string and cross-references the number of resulting pieces with the count to check that it is using the correct one. The result I am looking for is a new column called "Ideas" that contains the list of broken-out ideas. My code is below:
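```python
import re

import pandas as pd


def getIdeas(row):
    # Column 0 holds the text, column 1 the expected number of ideas.
    # .iloc gives positional access; positional row[0] on a label-indexed Series is deprecated.
    s = str(row.iloc[0])
    ic = row.iloc[1]
    # Try each delimiter in turn (";;" first) until one yields exactly ic pieces
    my_dels = [";;", ";", ",", "\\", "//"]
    for d in my_dels:
        ideas = s.split(d)
        if len(ideas) == ic:
            return ideas
    # Try to break on numbers "N)"
    ideas = re.split(r'[0-9]\)', s)
    if len(ideas) == ic:
        return ideas
    ideas = []
    return ideas

# k = getIdeas(str_contents3, idea_count3)
xl = pd.ExcelFile("data/Total Dataset.xlsx")
df = xl.parse("Sheet3")
df1 = df.iloc[:, 1:3]
df1 = df1.loc[df1.iloc[:, 1] != 0]
df1["Ideas"] = df1.apply(getIdeas, axis=1)
```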
When I run this, I get the following error:

```
ValueError: could not broadcast input array from shape (5) into shape (2)
```
Could someone tell me how to fix this?
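In older pandas versions (before 0.23), you have two options when using `apply` with `axis=1`: either return a single value, or return a list whose length matches the number of columns. If the list length matches the number of columns, it is broadcast across the entire row; if you return a single value, `apply` returns a pandas Series. In your case, some rows most likely return a list of two ideas, which matches your two columns, so pandas expects every result to have shape (2); a row that returns five ideas then fails with the broadcast error above.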
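A minimal sketch of a more explicit fix, assuming pandas 0.23 or later, where `DataFrame.apply` accepts a `result_type` argument. With `result_type="reduce"`, pandas returns a Series whose values are the lists themselves instead of trying to expand list results into columns. The two-row DataFrame and its column names `text` and `count` are hypothetical stand-ins for the Excel data, and `getIdeas` is the question's function with positional `.iloc` access.

```python
import re

import pandas as pd


def getIdeas(row):
    # Column 0: text to split; column 1: expected number of ideas
    s = str(row.iloc[0])
    ic = row.iloc[1]
    for d in [";;", ";", ",", "\\", "//"]:
        ideas = s.split(d)
        if len(ideas) == ic:
            return ideas
    ideas = re.split(r"[0-9]\)", s)
    if len(ideas) == ic:
        return ideas
    return []


# Hypothetical sample data: one row splits into 2 ideas, the other into 5
df1 = pd.DataFrame({"text": ["a;;b", "a,b,c,d,e"], "count": [2, 5]})

# result_type="reduce" keeps each returned list as a single value,
# so different rows may return lists of different lengths
df1["Ideas"] = df1.apply(getIdeas, axis=1, result_type="reduce")
print(df1["Ideas"].tolist())
```

In pandas 0.23 and later, the default `result_type=None` also returns list results as a Series, so passing `"reduce"` mainly documents the intent and guards against the shape-matching behaviour described above.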
One workaround is not to use `apply` at all and to build the list in a plain loop:
```python
result = []
for idx, row in df1.iterrows():
    result.append(getIdeas(row))
df1['Ideas'] = result
```
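The same workaround can skip `iterrows` (which builds a new Series for every row) by zipping the two columns directly. This is a sketch under assumptions: `split_ideas` is a hypothetical two-argument variant of `getIdeas`, and the three-row DataFrame is made-up sample data.

```python
import re

import pandas as pd


def split_ideas(text, count):
    """Hypothetical two-argument variant of getIdeas."""
    s = str(text)
    for d in [";;", ";", ",", "\\", "//"]:
        ideas = s.split(d)
        if len(ideas) == count:
            return ideas
    ideas = re.split(r"[0-9]\)", s)
    return ideas if len(ideas) == count else []


df1 = pd.DataFrame({"text": ["a;;b", "x//y//z", "no match"], "count": [2, 3, 4]})

# Pair the text and count columns by position and build one list per row;
# pandas stores each inner list as a single object value in the new column
df1["Ideas"] = [split_ideas(t, c) for t, c in zip(df1.iloc[:, 0], df1.iloc[:, 1])]
print(df1["Ideas"].tolist())
```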