Update list of tuples in python

I have a dataframe where each row is a list of tuples, such as:

[('This', 'DET'), ('is', 'VERB'), ('an', 'DET'), ('example', 'NOUN'), ('text', 'NOUN'), ('that', 'DET'), ('I', 'PRON'), ('use', 'VERB'), ('in', 'ADP'), ('order', 'NOUN'), ('to', 'PART'), ('get', 'VERB'), ('an', 'DET'), ('answer', 'NOUN')]

Then, in each row, I mark the words of some tuples with <IN>word</IN> or <TA>word</TA>. For example:

updated_word : <IN>example</IN>
updated_word  : <TA>answer</TA>

I want to update each row of the dataframe so that it contains the updated version of my tuples, like this:

[('This', 'DET'), ('is', 'VERB'), ('an', 'DET'), ('<IN>example</IN>', 'NOUN'), ('text', 'NOUN'), ('that', 'DET'), ('I', 'PRON'), ('use', 'VERB'), ('in', 'ADP'), ('order', 'NOUN'), ('to', 'PART'), ('get', 'VERB'), ('an', 'DET'), ('<TA>answer</TA>', 'NOUN')]
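
The transformation from the first list to the second can be sketched in plain Python, independent of pandas and spaCy. The helper name `mark_tokens` and the `marks` mapping (tag name to a set of zero-based token positions) are assumptions made for illustration; they are not part of the original code.

```python
def mark_tokens(pos_tags, marks):
    """Return a new list of (word, pos) tuples with selected words wrapped in tags.

    pos_tags: list of (word, pos) tuples.
    marks: hypothetical mapping of tag name -> collection of zero-based positions.
    """
    updated = []
    for position, (word, pos) in enumerate(pos_tags):
        for tag, positions in marks.items():
            if position in positions:
                word = f"<{tag.upper()}>{word}</{tag.upper()}>"
        # Tuples are immutable, so a new tuple is built instead of editing in place.
        updated.append((word, pos))
    return updated


pos_tags = [("This", "DET"), ("is", "VERB"), ("an", "DET"), ("example", "NOUN")]
result = mark_tokens(pos_tags, {"in": {3}})
print(result)
```

Building a new list leaves the input untouched, which makes it easier to compare the rows before and after marking.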

I have managed to update each tuple separately, but I cannot find a way to write them back to the dataframe row so that each row holds its updated list of tuples. Can someone help me?

Here is the code:

cols = list(df.columns)[4:]
for idx, row in df.iterrows():
    doc = nlp(row['title'])
    pos_tags = [(token.text, token.pos_) for token in doc if not token.pos_ == "PUNCT"]

    for position, tuple_ in enumerate(pos_tags, start=1):
        word = tuple_[0]
        spacy_pos_tag = tuple_[1]
        word = re.sub(r'[^\w\s]', '', word)
        for col in cols:
            if position in row[col]:
                word = f'<{col.upper()}>{word}</{col.upper()}>'
            else:
                word = word
        new_text.append(' '.join(word))
        tuple_ = (word, spacy_pos_tag)
        pos_tags[position] = tuple_
df['title'] = pos_tags
print(df.title)

UPDATE

I used @Peter White's suggestion to get the list of tuples, but I still get an error when I try to store each pos_tags list of tuples in the corresponding row of my dataframe column df['title']. The error message is:

    raise ValueError(
 ValueError: Length of values (23) does not match length of index (500)
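
This `ValueError` is what pandas raises when a single list (here the 23 tuples of one title) is assigned to a whole column with 500 rows. One possible way around it, sketched under assumptions, is to collect one list per row and assign them all at once after the loop. The `tag_title` function below is a hypothetical stand-in for the spaCy tagging and marking step, and the sample data is invented for illustration.

```python
import pandas as pd


def tag_title(title):
    # Hypothetical stand-in for nlp(title) plus marking: tags every token as "X".
    return [(word, "X") for word in title.split()]


df = pd.DataFrame({"title": ["This is an example", "get an answer"]})

updated_rows = []
for idx, row in df.iterrows():
    pos_tags = tag_title(row["title"])
    # Keep the whole list for this row; do not assign it to the column yet.
    updated_rows.append(pos_tags)

# One list per row, so the number of values matches the length of the index.
df["title"] = updated_rows
print(df["title"].tolist())
```

Assigning after the loop also avoids modifying the dataframe while `iterrows()` is still iterating over it.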

Put pos_tags[position] = tuple_ at the end of the loop body, and remove the start=1 from enumerate (list indices are zero-based, so with start=1 each tuple is written one slot too far):

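import re

# df (the dataframe) and nlp (the loaded spaCy pipeline) come from the question.
cols = list(df.columns)[4:]
for idx, row in df.iterrows():
    doc = nlp(row['title'])
    pos_tags = [(token.text, token.pos_) for token in doc if token.pos_ != "PUNCT"]

    # enumerate() now starts at 0, so position is a valid index into pos_tags
    for position, tuple_ in enumerate(pos_tags):
        word = tuple_[0]
        spacy_pos_tag = tuple_[1]
        word = re.sub(r'[^\w\s]', '', word)
        for col in cols:
            if position in row[col]:
                word = f'<{col.upper()}>{word}</{col.upper()}>'
        # new_text is not defined in the code shown; it is assumed to be a list created earlier
        new_text.append(' '.join(word))
        tuple_ = (word, spacy_pos_tag)
        print(tuple_)
        # write the updated tuple back once all columns have been checked
        pos_tags[position] = tuple_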
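
The effect of `start=1` can be seen on a plain list. The sketch below uses made-up data (the names `items`, `shifted`, and `fixed` are illustrative only) and shows that counting from 1 writes each value into the next slot and finally runs past the end of the list, while the default zero-based counter updates each element in place.

```python
items = ["a", "b", "c"]

# With start=1 the counter is one ahead of the real index.
shifted = list(items)
error = None
try:
    for position, value in enumerate(shifted, start=1):
        shifted[position] = value.upper()
except IndexError as exc:
    # The last iteration tries to write to index 3, which does not exist.
    error = exc

# With the default start=0 the counter matches the list index.
fixed = list(items)
for position, value in enumerate(fixed):
    fixed[position] = value.upper()

print(shifted, type(error).__name__)
print(fixed)
```

If the positions stored in the dataframe columns are one-based, an alternative is to keep `start=1` for the membership test and index the list with `position - 1`.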
