How to split a list in a column into two columns in a DataFrame using Python?

How can I split a list in a column into two columns in a DataFrame using Python? For example:

  row  |  column_A                
  ==================================
  1    |[('Ahli', 'NNP'),          |
       | ('paleontologi', 'NNP'),  | 
       | ('Thomas', 'NNP'),        |
       | ('dan', 'CC'),            |
       | ('timnya', 'RB'),         |
       | ('.', 'Z')],              |
  2    |[('fosil', 'NN'),          |
       | ('mamalia', 'NN'),        |
       | ('yang', 'SC'),           |
       | ('menghuni', 'VB'),       |
       | ('Antartika', 'NNP')]     |

I only want to get the second string from each tuple in the list:

  row  |  column_A                 | postag
  ============================================
  1    |[('Ahli', 'NNP'),          |[('NNP'),
       | ('paleontologi', 'NNP'),  | ('NNP'),
       | ('Thomas', 'NNP'),        | ('NNP'),
       | ('dan', 'CC'),            | ('CC'),
       | ('timnya', 'RB'),         | ('RB'),
       | ('.', 'Z')],              | ('Z')],
  2    |[('fosil', 'NN'),          |[('NN'),
       | ('mamalia', 'NN'),        | ('NN'),
       | ('yang', 'SC'),           | ('SC'),
       | ('menghuni', 'VB'),       | ('VB'),
       | ('Antartika', 'NNP')]     | ('NNP')]

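Adding to @Biranchi's answer: to match the parenthesized output shown in the question, wrap each tag in a one-element tuple: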

df['postag'] = df['column_A'].apply(lambda x: [(i[1],) for i in x])

The result will be:

# print(df)

                                        column_A                      postag
0  [(Ahli, NNP), (paleontologi, NNP), (Thomas, NN...  [(NNP,), (NNP,), (NNP,), ...
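The question writes each tag as ('NNP'), which in Python is just the string 'NNP' in parentheses; only a trailing comma makes a tuple. The following minimal sketch, using a shortened sample list that is an assumption for illustration, contrasts the two forms so you can pick the one you need:

```python
# Shortened sample of (word, POS tag) pairs, for illustration only
pairs = [("Ahli", "NNP"), ("dan", "CC")]

# One-element tuples: note the trailing comma inside the parentheses
as_tuples = [(tag,) for _word, tag in pairs]

# Plain strings: usually the more convenient form for later processing
as_strings = [tag for _word, tag in pairs]

print(as_tuples)   # [('NNP',), ('CC',)]
print(as_strings)  # ['NNP', 'CC']

# Parentheses without a comma do not create a tuple
print(("NNP") == "NNP")  # True
```

If the tags are only going to be counted or joined later, the plain-string form is probably what you want.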

Try using the apply function on the existing column to create a new column with the desired result.

Example pseudocode:

df['postag'] = df['column_A'].apply(your_function)

Inside your_function, write the logic that separates the POS tags from the list of tuples.
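As one possible way to fill in the pseudocode above, here is a minimal sketch. The function name extract_tags and the shortened sample data are assumptions for illustration, not part of the original answer:

```python
import pandas as pd


def extract_tags(pairs):
    """Return the second element (the POS tag) of each (word, tag) tuple."""
    return [tag for _word, tag in pairs]


# Shortened version of the question's data, for illustration only
df = pd.DataFrame({
    "column_A": [
        [("Ahli", "NNP"), ("dan", "CC"), (".", "Z")],
        [("fosil", "NN"), ("yang", "SC")],
    ]
})

# apply calls extract_tags once per row, passing that row's list of tuples
df["postag"] = df["column_A"].apply(extract_tags)
print(df["postag"].tolist())
# [['NNP', 'CC', 'Z'], ['NN', 'SC']]
```

A named function like this is easier to test and extend than an inline lambda, for example if some rows later need special handling.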

Use Series.map to transform each list in column_A as required:

df['postag'] = df['column_A'].map(lambda l: [b for a, b in l])

Another possible idea, using a nested list comprehension:

df['postag'] = [[y for x, y in lst] for lst in df['column_A']]

Result:

# print(df)

                                            column_A                      postag
0  [(Ahli, NNP), (paleontologi, NNP), (Thomas, NN...  [NNP, NNP, NNP, CC, RB, Z]
1  [(fosil, NN), (mamalia, NN), (yang, SC), (meng...       [NN, NN, SC, VB, NNP]
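Since the title asks about splitting the list into two columns, the same pattern can also pull the words out into their own column. This is a sketch under assumptions: the column name word and the shortened sample data are hypothetical and chosen only for illustration:

```python
import pandas as pd

# Shortened version of the question's data, for illustration only
df = pd.DataFrame({
    "column_A": [
        [("Ahli", "NNP"), ("dan", "CC"), (".", "Z")],
        [("fosil", "NN"), ("yang", "SC")],
    ]
})

# First element of each tuple goes to "word" (hypothetical column name)
df["word"] = df["column_A"].map(lambda pairs: [w for w, _t in pairs])

# Second element of each tuple goes to "postag"
df["postag"] = df["column_A"].map(lambda pairs: [t for _w, t in pairs])

print(df[["word", "postag"]])
```

Each row keeps its words and tags in the same order, so word[i] still corresponds to postag[i] within a row.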

You can achieve this by applying a function as follows:

import pandas as pd

# Sample data: each row holds a list of (word, POS tag) tuples
data = [{'column_A': [('Ahli', 'NNP'),
        ('paleontologi', 'NNP'),
        ('Thomas', 'NNP'),
        ('dan', 'CC'),
        ('timnya', 'RB'),
        ('.', 'Z')]},
        {'column_A': [('fosil', 'NN'),
        ('mamalia', 'NN'),
        ('yang', 'SC'),
        ('menghuni', 'VB'),
        ('Antartika', 'NNP')]}]

df = pd.DataFrame(data)
# Keep only the second element (the POS tag) of each tuple
df['postag'] = df['column_A'].apply(lambda x: [y[1] for y in x])
print(df)

Output:

    column_A                                            postag
0   [(Ahli, NNP), (paleontologi, NNP), (Thomas, NN...   [NNP, NNP, NNP, CC, RB, Z]
1   [(fosil, NN), (mamalia, NN), (yang, SC), (meng...   [NN, NN, SC, VB, NNP]

