How to split the output of a str.split() column in pandas?

Question

Here's the thing: I have this sort of dataset (let's call it df):

id       text
A1       How was your experience?: Great\nWhat did you buy?: A book\n
B1       How was your experience?: Good\nWhat did you buy?: A pen\n
C2       How was your experience?: Awful\nWhat did you buy?: A pencil\n

As you can see, this is a table containing a survey, and I'm trying to get only the answers from the text column. My first thought was to split the text, like this:

df['text_splitted'] = df.text.str.split('\n')

And then I would do something like this:

df['final_text'] = df.text_splitted.str.split(':')

However, final_text is returning NaN. What just happened? Why is the new column returning null? Is there any way I can fix this (or a better way to do what I'm trying to do here)?
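A minimal sketch of what happens, using a hypothetical one-row frame in the question's format (and assuming the text holds real newline characters): after the first split every cell holds a Python list, and the .str methods only operate on string elements, so each list comes back as NaN. Exploding the lists first turns them back into one string per row, which can then be split again.

```python
import pandas as pd

# Hypothetical one-row frame in the shape of the question's data
df = pd.DataFrame({"text": ["How was your experience?: Great\nWhat did you buy?: A book\n"]})

# First split: each cell now holds a Python list, not a string
df["text_splitted"] = df["text"].str.split("\n")
print(type(df.loc[0, "text_splitted"]))  # <class 'list'>

# .str.split only works on string elements; list elements come back as NaN
df["final_text"] = df["text_splitted"].str.split(":")
print(df["final_text"].isna().all())  # True

# One fix: explode the lists into one row per line, then split each string
lines = df["text_splitted"].explode()
print(lines.str.split(": ").tolist())
```

Note that the trailing newline leaves an empty string as the last exploded line, which is why the answers below strip the text first.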

Answer 1

As you wrote, you need to split your text column twice. Afterwards, you can create a DataFrame with 3 columns:

id from your original DataFrame
question (even rows) from the previous split
answer (odd rows) from the previous split

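import pandas as pd

# Split each cell into lines, then each line into its question and answer parts
text = df["text"].str.strip().str.split("\n").explode().str.split(": ").explode()

# Even positions hold questions, odd positions hold answers; merge them back onto id by index
out = pd.merge(df["id"], pd.DataFrame({"question": text.iloc[0::2], "answer": text.iloc[1::2]}),
               left_index=True, right_index=True).reset_index(drop=True)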

What do you think about this format?

>>> out
   id                  question    answer
0  A1  How was your experience?     Great
1  A1         What did you buy?    A book
2  B1  How was your experience?      Good
3  B1         What did you buy?     A pen
4  C2  How was your experience?     Awful
5  C2         What did you buy?  A pencil
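If a single pass is preferred, a regular expression with Series.str.extractall can pull out both parts at once. This is a different technique from the two splits above, shown only as a sketch; the sample frame and the pattern are assumptions (each answer is taken to sit on one line, after the first ": ").

```python
import pandas as pd

# Sample frame rebuilt from the question (assumes real newline characters in "text")
df = pd.DataFrame({
    "id": ["A1", "B1", "C2"],
    "text": [
        "How was your experience?: Great\nWhat did you buy?: A book\n",
        "How was your experience?: Good\nWhat did you buy?: A pen\n",
        "How was your experience?: Awful\nWhat did you buy?: A pencil\n",
    ],
})

# One regex match per "question: answer" line; the named groups become column names
pattern = r"(?P<question>[^\n]+?): (?P<answer>[^\n]+)"
pairs = df.set_index("id")["text"].str.extractall(pattern)

# extractall returns an (id, match) MultiIndex; drop the match counter, restore id as a column
out = pairs.reset_index(level="match", drop=True).reset_index()
print(out)
```

Because the pattern never crosses a newline, the trailing "\n" needs no separate strip step.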

Answer 2

You can use a combination of .apply() and .split() to get the answers:

import pandas as pd
df = pd.DataFrame({'text': ['How was your experience?: Great\nWhat did you buy?: A book\n']})

Input DF:

    text
0   How was your experience?: Great\nWhat did you ..

Split into questions and answers:

df['questions'] = df['text'].apply(lambda x: [y.split(":")[0] for y in x.split("\n")])
df['answers'] = df['text'].apply(lambda x: [y.split(":")[1] for y in x.split("\n") if len(y)>1])

Output DF:

    answers              questions
0   [ Great, A book]    [How was your experience?, What did you buy?, ]
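The output above keeps a trailing empty question (from the final newline) and a leading space on each answer. A minimal variant of the same apply-based idea that skips blank lines and strips whitespace could look like this; split_pairs is a hypothetical helper name, and every non-blank line is assumed to contain a colon.

```python
import pandas as pd

df = pd.DataFrame({"text": ["How was your experience?: Great\nWhat did you buy?: A book\n"]})

def split_pairs(text):
    """Hypothetical helper: return (questions, answers) from 'question: answer' lines."""
    # Skip blank lines, such as the one produced by the trailing newline
    lines = [line for line in text.split("\n") if line.strip()]
    # Split on the first colon only, then trim surrounding whitespace
    pairs = [line.split(":", 1) for line in lines]
    return [q.strip() for q, _ in pairs], [a.strip() for _, a in pairs]

parsed = df["text"].apply(split_pairs)  # one (questions, answers) tuple per row
df["questions"] = parsed.map(lambda pair: pair[0])
df["answers"] = parsed.map(lambda pair: pair[1])
print(df[["questions", "answers"]])
```

Splitting on the first colon only keeps any later colon inside the answer text.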

Answer 3

You can try this:

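# Assumes the cells contain literal backslash-n sequences; regex=True is needed on pandas >= 2.0
df.set_index('id')['text'].str.replace(r'\\n$', '', regex=True).str.split(r'\\n').explode().str.split(': ', expand=True)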

                           0         1
id                                    
A1  How was your experience?     Great
A1         What did you buy?    A book
B1  How was your experience?      Good
B1         What did you buy?     A pen
C2  How was your experience?     Awful
C2         What did you buy?  A pencil
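The r'\\n' patterns above match a literal backslash followed by n, not a newline character. If the cells contain real newline characters instead (an assumption about your data), the same chain can be sketched with a plain "\n" and strip(); splitting with n=1 keeps any later ": " inside the answer, and set_axis names the two resulting columns.

```python
import pandas as pd

# Hypothetical sample frame mirroring the question; "text" holds real newline characters
df = pd.DataFrame({
    "id": ["A1", "B1", "C2"],
    "text": [
        "How was your experience?: Great\nWhat did you buy?: A book\n",
        "How was your experience?: Good\nWhat did you buy?: A pen\n",
        "How was your experience?: Awful\nWhat did you buy?: A pencil\n",
    ],
})

out = (
    df.set_index("id")["text"]
    .str.strip()                        # drop the trailing newline so no empty line appears
    .str.split("\n")                    # one list of "question: answer" lines per id
    .explode()                          # one row per line, id repeated as the index
    .str.split(": ", n=1, expand=True)  # split on the first ": " into two columns
    .set_axis(["question", "answer"], axis=1)
)
print(out)
```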

Answers: 3

Answer 1: score 1, 2021-05-17 23:45:32
Answer 2: score 0, 2021-05-17 23:19:40
Answer 3: score 0, 2021-05-18 00:15:51
