How can I split the output of a str.split() column in pandas?
Here's the thing: I have this sort of dataset (let's call it df):
id text
A1 How was your experience?: Great\nWhat did you buy?: A book\n
B1 How was your experience?: Good\nWhat did you buy?: A pen\n
C2 How was your experience?: Awful\nWhat did you buy?: A pencil\n
As you can see, this is a table containing a survey, and I'm trying to get only the answers from the text column. My first thought was to split the text, like this:
df['text_splitted'] = df.text.str.split('\n')
And then I would do something like this:
df['final_text'] = df.text_splitted.str.split(':')
However, final_text is returning NaN. What just happened? Why is the new column returning null? Is there any way I can fix this (or a better way to do what I'm trying to do here)?
As you wrote, you need to split your text column twice. Afterward you can create a dataframe with 3 columns:
id: taken from your original dataframe
question: the even rows of the previous split
answer: the odd rows of the previous split
text = df["text"].str.strip().str.split("\n").explode().str.split(": ").explode()
out = pd.merge(df["id"], pd.DataFrame({"question": text[0::2], "answer": text[1::2]}),
left_index=True, right_index=True).reset_index(drop=True)
What do you think about this format?
>>> out
id question answer
0 A1 How was your experience? Great
1 A1 What did you buy? A book
2 B1 How was your experience? Good
3 B1 What did you buy? A pen
4 C2 How was your experience? Awful
5 C2 What did you buy? A pencil
You can use a combination of .apply() and .split() to get the answers:
df = pd.DataFrame({'text': ['How was your experience?: Great\nWhat did you buy?: A book\n']})
Input DF
text
0 How was your experience?: Great\nWhat did you ..
Split into questions and answers (splitting on ": " and skipping the empty string left by the trailing newline)
df['questions'] = df['text'].apply(lambda x: [y.split(": ")[0] for y in x.split("\n") if y])
df['answers'] = df['text'].apply(lambda x: [y.split(": ")[1] for y in x.split("\n") if y])
Output DF (questions and answers columns)
                                       questions          answers
0  [How was your experience?, What did you buy?]  [Great, A book]
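If only the answers are needed, a vectorized alternative to apply might be str.findall with a capture group. This is a sketch rather than a drop-in solution: it assumes real newline characters in text and that every line has the form "question: answer"; the two-row frame is made up for illustration.

```python
import pandas as pd

# Hypothetical sample in the same shape as the question's data.
df = pd.DataFrame({
    "id": ["A1", "B1"],
    "text": [
        "How was your experience?: Great\nWhat did you buy?: A book\n",
        "How was your experience?: Good\nWhat did you buy?: A pen\n",
    ],
})

# Capture everything after ": " up to the end of each line.
# "." does not match a newline, so each match stops at the line break.
df["answers"] = df["text"].str.findall(r": (.+)")
```

Each cell of answers is then a list with one entry per survey line, with no leading space and no empty trailing element.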
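You can try this (it assumes the text contains a literal backslash followed by n rather than a real newline; regex=True is needed in pandas 2.0 and later, where str.replace treats the pattern as a literal string by default):
df.set_index('id')['text'].str.replace(r'\\n$', '', regex=True).str.split(r'\\n').explode().str.split(': ', expand=True)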
0 1
id
A1 How was your experience? Great
A1 What did you buy? A book
B1 How was your experience? Good
B1 What did you buy? A pen
C2 How was your experience? Awful
C2 What did you buy? A pencil
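The same chain can also give named columns instead of 0 and 1. The sketch below assumes the text holds real newline characters (so it uses str.strip() and a plain "\n" split), and the n=1 argument is an added choice so that an answer containing ": " stays in one piece; the column names question and answer are picked for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["A1", "B1", "C2"],
    "text": [
        "How was your experience?: Great\nWhat did you buy?: A book\n",
        "How was your experience?: Good\nWhat did you buy?: A pen\n",
        "How was your experience?: Awful\nWhat did you buy?: A pencil\n",
    ],
})

out = (
    df.set_index("id")["text"]
    .str.strip()                               # drop the trailing newline
    .str.split("\n")                           # one list of lines per id
    .explode()                                 # one line per row, id repeated
    .str.split(": ", n=1, expand=True)         # question | answer columns
    .set_axis(["question", "answer"], axis=1)  # name the two columns
    .reset_index()                             # turn id back into a column
)
```

Selecting out["answer"] then gives just the answers, one per row, with the matching id alongside in out["id"].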