How to quickly add a large list of values to the corresponding Python pandas DataFrame

I have a large csv file with the following format (example); the report_date column is currently empty:

| ids | disease_code | report_date |
| --- | ------------ | ----------- |
| 10  |    I202      |             |
| 11  |    I232      |             |
| 11  |    I242      |             |

I generated a list of tuples from a data source, like the following:

[(10, ['I202'], '2021-10-22'), (11, ['I232', 'I242'], '2021-11-22'), (11, ['I232', 'I242'], '2021-11-12'),.....]

The order above is patient_id, disease_code and reported_date (the dates are in the same order as the diseases). For a patient who has more than one disease, the reported dates were unfortunately split across two tuples. Now I want to fill the report_date column by matching the first two values of each tuple against the current csv, like this:

| ids | disease_code | report_date |
| --- | ------------ | ----------- |
| 10  |    I202      | 2021-10-22  |
| 11  |    I232      | 2021-11-22  |
| 11  |    I242      | 2021-11-12  |

I tried a nested loop, but it looks like it would take 480 hours to complete. I believe there is a simpler answer, but I could not figure it out. Any hint would be appreciated.

First, you can create a dataframe from your data. You'll see that the "disease_code" column contains a list of values, just as you mentioned:

>> import pandas as pd
>> df = pd.DataFrame(
    [(10, ['I202'], "2021-10-22"), (11, ['I232', 'I242'], "2021-11-22"), (11, ['I232', 'I242'], "2021-11-12")],
    columns=["ids", "disease_code", "report_date"],
)
>> df["report_date"] = pd.to_datetime(df["report_date"])
>> df
   ids  disease_code report_date
0   10        [I202]  2021-10-22
1   11  [I232, I242]  2021-11-22
2   11  [I232, I242]  2021-11-12

Now you need to separate the values in the "disease_code" column, repeating the values in the other columns. pd.DataFrame.explode does exactly that: it turns each element of a list-like column into its own row:

>> df.explode(["disease_code"])  # Explode the "disease_code" column
   ids disease_code report_date
0   10         I202  2021-10-22
1   11         I232  2021-11-22
1   11         I242  2021-11-22
2   11         I232  2021-11-12
2   11         I242  2021-11-12
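The exploded frame still pairs every code of patient 11 with every date. A minimal sketch of one way to reach the pairing shown in the question, assuming the n-th tuple for a patient carries the date of the n-th code in its list (my reading of "the dates are in order corresponding to the disease"). The helper columns `tuple_id`, `tuple_pos` and `code_pos` are illustrative names, not part of the original data:

```python
import pandas as pd

records = [
    (10, ["I202"], "2021-10-22"),
    (11, ["I232", "I242"], "2021-11-22"),
    (11, ["I232", "I242"], "2021-11-12"),
]
dates = pd.DataFrame(records, columns=["ids", "disease_code", "report_date"])

# Remember which tuple each row came from (hypothetical helper column).
dates["tuple_id"] = range(len(dates))
# Position of each tuple among the tuples of the same patient: 0, 1, ...
dates["tuple_pos"] = dates.groupby("ids").cumcount()

# One row per (tuple, code); ignore_index gives a clean 0..n-1 index.
exploded = dates.explode("disease_code", ignore_index=True)
# Position of each code inside its original list: 0, 1, ...
exploded["code_pos"] = exploded.groupby("tuple_id").cumcount()

# Assumption: the n-th tuple of a patient dates the n-th code of the list,
# so keep only the rows where both positions agree.
paired = exploded.loc[
    exploded["tuple_pos"] == exploded["code_pos"],
    ["ids", "disease_code", "report_date"],
].reset_index(drop=True)

print(paired)
```

With the three example tuples this keeps one row per code, with I232 dated 2021-11-22 and I242 dated 2021-11-12. If the tuples are not ordered that way in the real data, the position-matching step would need a different rule.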

For the new DataFrame, use a list comprehension:

import pandas as pd

L = [(10, ['I202'], '2021-10-22'),
     (11, ['I232', 'I242'], '2021-11-22'),
     (11, ['I232', 'I242'], '2021-11-12')]

df1 = pd.DataFrame([(a, x, c) for a, b, c in L for x in b], 
                   columns=["ids", "disease_code", "report_date"])
print (df1)
   ids disease_code report_date
0   10         I202  2021-10-22
1   11         I232  2021-11-22
2   11         I242  2021-11-22
3   11         I232  2021-11-12
4   11         I242  2021-11-12

Then use DataFrame.merge to join it to the original DataFrame df. Because df1 has duplicates in the ids and disease_code columns, remove them first:

print (df)
   ids disease_code  report_date
0   10         I202          NaN
1   11         I232          NaN
2   11         I242          NaN

print (df1.drop_duplicates(['ids','disease_code']))
   ids disease_code report_date
0   10         I202  2021-10-22
1   11         I232  2021-11-22
2   11         I242  2021-11-22

df = (df.drop('report_date', axis=1)
        .merge(df1.drop_duplicates(['ids','disease_code']), 
               on=['ids','disease_code']))
print (df)
   ids disease_code report_date
0   10         I202  2021-10-22
1   11         I232  2021-11-22
2   11         I242  2021-11-22
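Two details of this merge are worth keeping in mind. `drop_duplicates` keeps the first row of each (ids, disease_code) pair, so I242 ends up with 2021-11-22 rather than the 2021-11-12 shown in the question's expected output. And `merge` defaults to an inner join, so any csv row without a matching tuple would be dropped. The sketch below addresses only the second point, assuming you would rather keep unmatched rows with an empty date. The row for patient 12 / `I999` is made up for illustration:

```python
import pandas as pd

# Stand-in for the csv; the last row (12, "I999") is a hypothetical
# record with no matching tuple in the date list.
df = pd.DataFrame({
    "ids": [10, 11, 11, 12],
    "disease_code": ["I202", "I232", "I242", "I999"],
})

# Flattened tuples, one row per (patient, code, date).
df1 = pd.DataFrame(
    [
        (10, "I202", "2021-10-22"),
        (11, "I232", "2021-11-22"),
        (11, "I242", "2021-11-22"),
        (11, "I232", "2021-11-12"),
        (11, "I242", "2021-11-12"),
    ],
    columns=["ids", "disease_code", "report_date"],
)
lookup = df1.drop_duplicates(["ids", "disease_code"])  # keeps the first date

filled = df.merge(
    lookup,
    on=["ids", "disease_code"],
    how="left",               # keep every csv row, matched or not
    validate="many_to_one",   # raise if lookup keys are not unique
)
print(filled)
```

The `validate="many_to_one"` argument makes pandas raise a `MergeError` if the lookup side still contains duplicate keys, which guards against silently multiplying rows in a large file.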
