If a cell contains more than one string, put each into a new cell in Pandas

So I'm working with Pandas and I have multiple words (i.e. strings) in one cell, and I need to put every word into a new row while keeping the associated data. I've found a method which could help me, but it works with numbers, not strings. So what method do I need to use?

Simple example of my table:

id name     method
1  adenosis mammography, mri

And I need it to be:

id name     method
1  adenosis mammography
            mri

Thanks!

UPDATE:

This is what I'm trying to do, following @jezrael's suggestion:

import pandas as pd
import numpy as np
xl = pd.ExcelFile("./dev/eyetoai/google_form_pure.xlsx")
xl.sheet_names
df = xl.parse("Form Responses 1")
df.groupby(['Name of condition','Condition description','Relevant Modality','Type of finding Mammography', 'Type of finding MRI', 'Type of finding US']).mean()
splitted = df['Relevant Modality'].str.split(',')
l = splitted.str.len()
df = pd.DataFrame({col: np.repeat(df[col], l) for col in ['Name of condition','Condition description']})
df['Relevant Modality'] = np.concatenate(splitted)

But I get this error: TypeError: repeat() takes exactly 2 arguments (3 given)
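A likely cause, assuming an older pandas release, is that np.repeat hands the call over to Series.repeat with an extra argument that method does not accept. A minimal sketch of the usual workaround is to pass plain NumPy arrays instead of Series. The toy DataFrame below is a made-up stand-in for the spreadsheet, and the names lengths, keep and out are only illustrative:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Excel sheet; column names follow the question.
df = pd.DataFrame({
    "Name of condition": ["adenosis", "cyst"],
    "Condition description": ["desc a", "desc b"],
    "Relevant Modality": ["mammography, mri", "us"],
})

# One list of pieces per row, plus the number of pieces in each row.
splitted = df["Relevant Modality"].str.split(",")
lengths = splitted.str.len().to_numpy()

# Repeat the other columns as NumPy arrays, not as Series.
keep = ["Name of condition", "Condition description"]
out = pd.DataFrame({col: np.repeat(df[col].to_numpy(), lengths) for col in keep})

# Flatten the lists and strip the spaces left over after each comma.
out["Relevant Modality"] = [part.strip() for parts in splitted for part in parts]
print(out)
```

On pandas versions that predate to_numpy(), the .values attribute should serve the same purpose.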

You can use read_excel + split + stack + drop + join + reset_index:

import pandas as pd

#define the columns to split on ',' and then flatten
cols = ['Condition description','Relevant Modality']

#read the Excel file into a DataFrame
df = pd.read_excel('Untitled 1.xlsx')
#print (df)

df1 = pd.DataFrame({col: df[col].str.split(',', expand=True).stack() for col in cols})
print (df1)
                                 Condition description Relevant Modality
0 0  Fibroadenomas are the most common cause of a b...       Mammography
  1                                                NaN                US
  2                                                NaN               MRI
1 0                    Papillomas are benign neoplasms       Mammography
  1                                  arising in a duct                US
  2   either centrally or peripherally within the b...               MRI
  3   leading to a nipple discharge. As they are of...               NaN
  4                 the discharge may be bloodstained.               NaN
2 0                                                 OK       Mammography
3 0                                      breast cancer       Mammography
  1                                                NaN                US
4 0                                breast inflammation       Mammography
  1                                                NaN                US

#remove the original columns
df = df.drop(cols, axis=1)
#create a MultiIndex in the original df so its rows align with df1
df.index = [df.index, [0]* len(df.index)]
#join the original to the flattened columns, then drop the MultiIndex
df = df1.join(df).reset_index(drop=True)
#print (df)
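The same steps can be tried without the Excel file. This is only a sketch on the toy table from the question (the second row is invented for illustration), with a dropna() and a strip() added so that missing pieces and the spaces after commas do not end up in the result:

```python
import pandas as pd

# Toy table from the question; the 'cyst' row is made up for illustration.
df = pd.DataFrame({
    "id": [1, 2],
    "name": ["adenosis", "cyst"],
    "method": ["mammography, mri", "us"],
})
cols = ["method"]

# Split on commas, stack the pieces into a (row, piece) MultiIndex,
# drop the padding for shorter rows, and strip leftover spaces.
df1 = pd.DataFrame({
    col: df[col].str.split(",", expand=True).stack().dropna().str.strip()
    for col in cols
})

# Give the remaining columns a second index level of 0 so that each
# original row lines up with its first piece only.
rest = df.drop(cols, axis=1)
rest.index = pd.MultiIndex.from_arrays([rest.index, [0] * len(rest)])

# Left-join on the MultiIndex, then flatten the index again.
result = df1.join(rest).reset_index(drop=True)
print(result)
```

Rows that hold a second or later piece get NaN in id and name, which corresponds to the blank cells in the desired output.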

The previous answer is correct; I think you should use the id as the reference. An easier way might be to just parse the method string into a list:

method_list = method.split(',')
method_list = np.asarray(method_list)

If you have any trouble with indexing when initializing your DataFrame, just set the index explicitly:

df = pd.DataFrame(data, index=[0,0])
df = df.set_index('id')

Passing the list as the value for your method key will automatically copy the index 'id' and the 'name' value to each row:

id       method      name
1   mammography  adenosis
1           mri  adenosis
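Put together, the idea above might look like the following sketch. It assumes a single record whose id and name are known scalars, so the literal values here are just the ones from the question's example:

```python
import numpy as np
import pandas as pd

method = "mammography, mri"

# Parse the comma-separated string into an array of clean method names.
method_list = np.asarray([m.strip() for m in method.split(",")])

# The scalar 'id' and 'name' values are broadcast to the length of method_list.
df = pd.DataFrame({"id": 1, "name": "adenosis", "method": method_list})
df = df.set_index("id")
print(df)
```

Because at least one value in the dict is array-like, pandas can infer the number of rows and no explicit index is needed here.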

I hope this helps, all the best.
