Compare each string element in a dataframe to a list and assign it to a column, python pandas
How can I rearrange my dataframe according to column names while searching for specific strings in cells?
My dataframe:
0 | 1 | 2 | 3 | 4
---|---|---|---|---
apple pie | banana bread | orange juice | nan | nan
apple cookies | orange lemonade | nan | nan | nan
banana muffin | orange ice | berry candy | nan | nan
berry juice | nan | nan | nan | nan
I want to arrange the rows according to a list of column names, where each column collects the cells that contain a specific string of text.
apple | banana | orange | berry | lemon
---|---|---|---|---
apple pie | banana bread | orange juice | nan | nan
apple cookies | nan | orange lemonade | nan | nan
nan | banana muffin | orange ice | berry candy | nan
nan | nan | nan | berry juice | nan
I have tried to create a column/list for each fruit, searching for the right string and adding the cell if it matches. However, I do not know how to iterate through the dataframe and assign values; I just get a column of NaNs.
col_names = ['apple', 'banana', 'orange', 'berry', 'lemonade']
apples = np.where(df_fruits.str.contains("apple", case=False, na=False), df_fruits, np.nan)
bananas = np.where(df_fruits.str.contains("banana", case=False, na=False), df_fruits, np.nan)
etc...
Edit: I got the dataframe from a CSV file, so the original data is in rows of strings: "apple pie, banana bread, orange juice, nan, nan", etc.
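To make the question reproducible, the sketch below rebuilds the dataframe from the CSV-style rows quoted in the question and shows one way the `np.where` attempt could be made to work. The `.str` accessor exists on a Series, not on a DataFrame, so the mask is built column by column. The inline `raw` text, the `read_csv` options, and the `first_match` helper are assumptions for illustration, not the asker's actual file or code.

```python
import io

import numpy as np
import pandas as pd

# Assumed stand-in for the CSV file: the rows shown in the question.
raw = """apple pie, banana bread, orange juice, nan, nan
apple cookies, orange lemonade, nan, nan, nan
banana muffin, orange ice, berry candy, nan, nan
berry juice, nan, nan, nan, nan
"""

# header=None keeps the integer column labels 0..4; skipinitialspace strips
# the blank after each comma so that "nan" is parsed as a missing value.
df_fruits = pd.read_csv(io.StringIO(raw), header=None, skipinitialspace=True)

# Build the boolean mask one column at a time, because .str is a Series accessor.
mask = df_fruits.apply(
    lambda col: col.astype(str).str.contains("apple", case=False, na=False)
)


def first_match(row):
    """Return the first non-missing cell in a row, or NaN if there is none."""
    hits = row.dropna()
    return hits.iloc[0] if len(hits) else np.nan


# Keep only the matching cells, then collapse each row to a single value.
apples = df_fruits.where(mask).apply(first_match, axis=1)
print(apples)
```

The same mask-and-collapse step could be repeated for each entry in `col_names`, although the answers below avoid writing one statement per fruit.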
We can do some reshaping using .unstack and .str.extractall:
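# Build one alternation pattern from the target column names.
pat = '|'.join(col_names)
# Reshape to a Series indexed by (row, original column).
s = df.stack()
# Attach to each cell the list of keywords found in it.
s1 = s.to_frame('vals').join(
    s.str.extractall(f'({pat})').groupby(level=[0, 1]).agg(list))
# One row per (cell, keyword) pair, then pivot the keywords into columns.
out = s1.explode(0).set_index(0, append=True).reset_index(1, drop=True).unstack(-1)
print(out)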
            vals
0          apple         banana        berry         lemonade           orange
0      apple pie   banana bread          NaN              NaN     orange juice
1  apple cookies            NaN          NaN  orange lemonade  orange lemonade
2            NaN  banana muffin  berry candy              NaN       orange ice
3            NaN            NaN  berry juice              NaN              NaN
# if you want to drop the level on the multi index.
out.columns = out.columns.droplevel(None)
0          apple         banana        berry         lemonade           orange
0      apple pie   banana bread          NaN              NaN     orange juice
1  apple cookies            NaN          NaN  orange lemonade  orange lemonade
2            NaN  banana muffin  berry candy              NaN       orange ice
3            NaN            NaN  berry juice              NaN              NaN
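For readers who find the stack/extractall chain dense, the same placement rule can be sketched with plain loops: every cell is copied into each keyword column whose keyword it contains. This is an alternative illustration, not the answer's method. The function name `arrange_by_keyword` and the sample frame are hypothetical, and the keywords are assumed to be lowercase.

```python
import numpy as np
import pandas as pd


def arrange_by_keyword(df, keywords):
    """Copy each cell into every keyword column whose keyword it contains."""
    out = pd.DataFrame(np.nan, index=df.index, columns=keywords, dtype=object)
    for row_label, row in df.iterrows():
        for cell in row.dropna():
            for kw in keywords:
                # Case-insensitive substring test; keywords assumed lowercase.
                if kw in str(cell).lower():
                    # A later match in the same row overwrites an earlier one.
                    out.at[row_label, kw] = cell
    return out


# Sample data copied from the question.
df = pd.DataFrame([
    ["apple pie", "banana bread", "orange juice", np.nan, np.nan],
    ["apple cookies", "orange lemonade", np.nan, np.nan, np.nan],
    ["banana muffin", "orange ice", "berry candy", np.nan, np.nan],
    ["berry juice", np.nan, np.nan, np.nan, np.nan],
])
col_names = ["apple", "banana", "orange", "berry", "lemonade"]

out = arrange_by_keyword(df, col_names)
print(out)
```

As in the output above, "orange lemonade" lands in both the orange and the lemonade columns because it contains both keywords. The loops are slower than the vectorised version on large frames, but the columns come out in the order given in `col_names`.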
Try this:
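# Flatten every cell of the dataframe into a single list.
list_values = [item for row in df_fruits.values for item in row]
list_series = []
for col in col_names:
    # A string times True keeps the cell; times False gives an empty string.
    # name=col labels the resulting column with the fruit name.
    series = pd.Series([x * (col in str(x)) for x in list_values], name=col)
    list_series.append(series)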
The first line collects all the dataframe values into one list. Next, we create a pandas Series for each fruit type and append it to a list. Finally, we build a new dataframe from that list:
new_df=pd.concat(list_series,axis=1)
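The loop above relies on multiplying a string by a boolean: in Python, `True` and `False` behave as 1 and 0, so a matching cell is kept and a non-matching one becomes an empty string. Because the values are flattened first, each resulting Series has one entry per cell rather than one per original row. A small sketch on made-up values (the list contents and variable names are illustrative only):

```python
import pandas as pd

# Hypothetical flattened cell values.
list_values = ["apple pie", "banana bread", "apple cookies"]
col = "apple"

# Matching strings are multiplied by True (kept); the rest by False (emptied).
masked = pd.Series([x * (col in str(x)) for x in list_values])
print(masked.tolist())
```

If empty strings are not wanted in the final dataframe, they could be turned into missing values afterwards, for example with `new_df.replace("", float("nan"))`.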